ShowAndTell
• SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models (arXiv:2412.11605, 18 upvotes)
• Byte Latent Transformer: Patches Scale Better Than Tokens (arXiv:2412.09871, 108 upvotes)
• Fourier Position Embedding: Enhancing Attention's Periodic Extension for Length Generalization (arXiv:2412.17739, 41 upvotes)
• SKETCH: Structured Knowledge Enhanced Text Comprehension for Holistic Retrieval (arXiv:2412.15443, 10 upvotes)
• ProgCo: Program Helps Self-Correction of Large Language Models (arXiv:2501.01264, 26 upvotes)
• SDPO: Segment-Level Direct Preference Optimization for Social Agents (arXiv:2501.01821, 20 upvotes)
• ToolHop: A Query-Driven Benchmark for Evaluating Large Language Models in Multi-Hop Tool Use (arXiv:2501.02506, 10 upvotes)
• PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models (arXiv:2501.03124, 14 upvotes)
• Evaluating Sample Utility for Data Selection by Mimicking Model Weights (arXiv:2501.06708, 5 upvotes)
• Atla Selene Mini: A General Purpose Evaluation Model (arXiv:2501.17195, 35 upvotes)
• Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning (arXiv:2502.06781, 58 upvotes)
• SpargeAttn: Accurate Sparse Attention Accelerating Any Model Inference (arXiv:2502.18137, 60 upvotes)
• StructFlowBench: A Structured Flow Benchmark for Multi-turn Instruction Following (arXiv:2502.14494, 15 upvotes)
• Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems (arXiv:2502.19328, 23 upvotes)
• Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning? (arXiv:2502.19361, 28 upvotes)
• Towards an AI co-scientist (arXiv:2502.18864, 52 upvotes)
• Predictive Data Selection: The Data That Predicts Is the Data That Teaches (arXiv:2503.00808, 56 upvotes)
• From Hours to Minutes: Lossless Acceleration of Ultra Long Sequence Generation up to 100K Tokens (arXiv:2502.18890, 30 upvotes)
• SampleMix: A Sample-wise Pre-training Data Mixing Strategy by Coordinating Data Quality and Diversity (arXiv:2503.01506, 10 upvotes)
• General Reasoning Requires Learning to Reason from the Get-go (arXiv:2502.19402, 5 upvotes)
• LADDER: Self-Improving LLMs Through Recursive Problem Decomposition (arXiv:2503.00735, 23 upvotes)
• Process-based Self-Rewarding Language Models (arXiv:2503.03746, 39 upvotes)
• IFIR: A Comprehensive Benchmark for Evaluating Instruction-Following in Expert-Domain Information Retrieval (arXiv:2503.04644, 21 upvotes)
• Know You First and Be You Better: Modeling Human-Like User Simulators via Implicit Profiles (arXiv:2502.18968, 3 upvotes)
• TruthPrInt: Mitigating LVLM Object Hallucination Via Latent Truthful-Guided Pre-Intervention (arXiv:2503.10602, 4 upvotes)
• Temporal Consistency for LLM Reasoning Process Error Identification (arXiv:2503.14495, 11 upvotes)
• EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees (arXiv:2503.08893, 6 upvotes)
• Discovering Knowledge Deficiencies of Language Models on Massive Knowledge Base (arXiv:2503.23361, 5 upvotes)
• Bridging Evolutionary Multiobjective Optimization and GPU Acceleration via Tensorization (arXiv:2503.20286, 3 upvotes)
• ScholarCopilot: Training Large Language Models for Academic Writing with Accurate Citations (arXiv:2504.00824, 43 upvotes)
• Agentic Knowledgeable Self-awareness (arXiv:2504.03553, 27 upvotes)
• Heimdall: test-time scaling on the generative verification (arXiv:2504.10337, 33 upvotes)
• arXiv:2504.11442 (30 upvotes)
• Efficient Process Reward Model Training via Active Learning (arXiv:2504.10559, 13 upvotes)
• AI-University: An LLM-based platform for instructional alignment to scientific classrooms (arXiv:2504.08846, 9 upvotes)
• Learning Adaptive Parallel Reasoning with Language Models (arXiv:2504.15466, 44 upvotes)
• Self-Generated In-Context Examples Improve LLM Agents for Sequential Decision-Making Tasks (arXiv:2505.00234, 26 upvotes)
• 100 Days After DeepSeek-R1: A Survey on Replication Studies and More Directions for Reasoning Language Models (arXiv:2505.00551, 36 upvotes)
• Toward Evaluative Thinking: Meta Policy Optimization with Evolving Reward Models (arXiv:2504.20157, 37 upvotes)
• TreeHop: Generate and Filter Next Query Embeddings Efficiently for Multi-hop Question Answering (arXiv:2504.20114, 4 upvotes)
• SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning (arXiv:2504.19162, 18 upvotes)
• Enhancing LLM Reasoning with Iterative DPO: A Comprehensive Empirical Investigation (arXiv:2503.12854)
• LLaMA-Omni2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis (arXiv:2505.02625, 23 upvotes)
• Beyond One-Size-Fits-All: Inversion Learning for Highly Effective NLG Evaluation Prompts (arXiv:2504.21117, 26 upvotes)
• CORG: Generating Answers from Complex, Interrelated Contexts (arXiv:2505.00023, 9 upvotes)
• RetroInfer: A Vector-Storage Approach for Scalable Long-Context LLM Inference (arXiv:2505.02922, 28 upvotes)
• Scalable Chain of Thoughts via Elastic Reasoning (arXiv:2505.05315, 26 upvotes)
• X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains (arXiv:2505.03981, 15 upvotes)
• AutoLibra: Agent Metric Induction from Open-Ended Feedback (arXiv:2505.02820, 3 upvotes)
• Phare: A Safety Probe for Large Language Models (arXiv:2505.11365, 7 upvotes)
• ConvSearch-R1: Enhancing Query Reformulation for Conversational Search with Reasoning via Reinforcement Learning (arXiv:2505.15776, 11 upvotes)
• BARREL: Boundary-Aware Reasoning for Factual and Reliable LRMs (arXiv:2505.13529, 11 upvotes)
• Text Generation Beyond Discrete Token Sampling (arXiv:2505.14827, 10 upvotes)
• Scaling Diffusion Transformers Efficiently via μP (arXiv:2505.15270, 35 upvotes)
• TabSTAR: A Foundation Tabular Model With Semantically Target-Aware Representations (arXiv:2505.18125, 112 upvotes)
• QwenLong-CPRS: Towards ∞-LLMs with Dynamic Context Optimization (arXiv:2505.18092, 43 upvotes)
• Quartet: Native FP4 Training Can Be Optimal for Large Language Models (arXiv:2505.14669, 78 upvotes)
• Learning to Reason without External Rewards (arXiv:2505.19590, 29 upvotes)
• Can Large Language Models Infer Causal Relationships from Real-World Text? (arXiv:2505.18931, 1 upvote)
• MiniCPM4: Ultra-Efficient LLMs on End Devices (arXiv:2506.07900, 95 upvotes)
• ExpertLongBench: Benchmarking Language Models on Expert-Level Long-Form Generation Tasks with Structured Checklists (arXiv:2506.01241, 9 upvotes)
• What Is Seen Cannot Be Unseen: The Disruptive Effect of Knowledge Conflict on Large Language Models (arXiv:2506.06485, 5 upvotes)
• Cartridges: Lightweight and general-purpose long context representations via self-study (arXiv:2506.06266, 7 upvotes)
• Improving large language models with concept-aware fine-tuning (arXiv:2506.07833, 3 upvotes)
• HASHIRU: Hierarchical Agent System for Hybrid Intelligent Resource Utilization (arXiv:2506.04255, 5 upvotes)
• Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning (arXiv:2505.24726, 277 upvotes)
• MemMamba: Rethinking Memory Patterns in State Space Model (arXiv:2510.03279, 73 upvotes)