Youtu-LLM: Unlocking the Native Agentic Potential for Lightweight Large Language Models
Paper
• 2512.24618
• Published
• 151
Let It Flow: Agentic Crafting on Rock and Roll, Building the ROME Model within an Open Agentic Learning Ecosystem
Paper
• 2512.24873
• Published
• 105
AI Meets Brain: Memory Systems from Cognitive Neuroscience to Autonomous Agents
Paper
• 2512.23343
• Published
• 29
Figure It Out: Improving the Frontier of Reasoning with Active Visual Thinking
Paper
• 2512.24297
• Published
• 6
Valori: A Deterministic Memory Substrate for AI Systems
Paper
• 2512.22280
• Published
• 5
Improving Multi-step RAG with Hypergraph-based Memory for Long-Context Complex Relational Modeling
Paper
• 2512.23959
• Published
• 112
Youtu-Agent: Scaling Agent Productivity with Automated Generation and Hybrid Policy Optimization
Paper
• 2512.24615
• Published
• 119
Nested Learning: The Illusion of Deep Learning Architectures
Paper
• 2512.24695
• Published
• 44
SenseNova-MARS: Empowering Multimodal Agentic Reasoning and Search via Reinforcement Learning
Paper
• 2512.24330
• Published
• 35
Fast-weight Product Key Memory
Paper
• 2601.00671
• Published
• 6
SimpleMem: Efficient Lifelong Memory for LLM Agents
Paper
• 2601.02553
• Published
• 37
Falcon-H1R: Pushing the Reasoning Frontiers with a Hybrid Model for Efficient Test-Time Scaling
Paper
• 2601.02346
• Published
• 26
OpenNovelty: An LLM-powered Agentic System for Verifiable Scholarly Novelty Assessment
Paper
• 2601.01576
• Published
• 18
UniCorn: Towards Self-Improving Unified Multimodal Models through Self-Generated Supervision
Paper
• 2601.03193
• Published
• 47
NitroGen: An Open Foundation Model for Generalist Gaming Agents
Paper
• 2601.02427
• Published
• 45
MindWatcher: Toward Smarter Multimodal Tool-Integrated Reasoning
Paper
• 2512.23412
• Published
• 41
Token-Level LLM Collaboration via FusionRoute
Paper
• 2601.05106
• Published
• 40
AT^2PO: Agentic Turn-based Policy Optimization via Tree Search
Paper
• 2601.04767
• Published
• 28
Paper
• 2601.05111
• Published
• 20
WebGym: Scaling Training Environments for Visual Web Agents with Realistic Tasks
Paper
• 2601.02439
• Published
• 16
The Illusion of Specialization: Unveiling the Domain-Invariant "Standing Committee" in Mixture-of-Experts Models
Paper
• 2601.03425
• Published
• 16
Scaling Behavior Cloning Improves Causal Reasoning: An Open Model for Real-Time Video Game Playing
Paper
• 2601.04575
• Published
• 9
DocDancer: Towards Agentic Document-Grounded Information Seeking
Paper
• 2601.05163
• Published
• 5
Entropy-Adaptive Fine-Tuning: Resolving Confident Conflicts to Mitigate Forgetting
Paper
• 2601.02151
• Published
• 109
AgentDevel: Reframing Self-Evolving LLM Agents as Release Engineering
Paper
• 2601.04620
• Published
• 3
Evolving Programmatic Skill Networks
Paper
• 2601.03509
• Published
• 87
Atlas: Orchestrating Heterogeneous Models and Tools for Multi-Domain Complex Reasoning
Paper
• 2601.03872
• Published
• 43
Thinking with Map: Reinforced Parallel Map-Augmented Agent for Geolocalization
Paper
• 2601.05432
• Published
• 166
The Molecular Structure of Thought: Mapping the Topology of Long Chain-of-Thought Reasoning
Paper
• 2601.06002
• Published
• 55
Agentic Rubrics as Contextual Verifiers for SWE Agents
Paper
• 2601.04171
• Published
• 12
Chaining the Evidence: Robust Reinforcement Learning for Deep Search Agents with Citation-Aware Rubric Rewards
Paper
• 2601.06021
• Published
• 47
MDAgent2: Large Language Model for Code Generation and Knowledge Q&A in Molecular Dynamics
Paper
• 2601.02075
• Published
• 8
EnvScaler: Scaling Tool-Interactive Environments for LLM Agent via Programmatic Synthesis
Paper
• 2601.05808
• Published
• 36
Why LLMs Aren't Scientists Yet: Lessons from Four Autonomous Research Attempts
Paper
• 2601.03315
• Published
• 6
AgentOCR: Reimagining Agent History via Optical Self-Compression
Paper
• 2601.04786
• Published
• 30
MAGMA: A Multi-Graph based Agentic Memory Architecture for AI Agents
Paper
• 2601.03236
• Published
• 6
Can We Predict Before Executing Machine Learning Agents?
Paper
• 2601.05930
• Published
• 27
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization
Paper
• 2601.05242
• Published
• 227
An Empirical Study on Preference Tuning Generalization and Diversity Under Domain Shift
Paper
• 2601.05882
• Published
• 21
Illusions of Confidence? Diagnosing LLM Truthfulness via Neighborhood Consistency
Paper
• 2601.05905
• Published
• 20
SmartSearch: Process Reward-Guided Query Refinement for Search Agents
Paper
• 2601.04888
• Published
• 10
Over-Searching in Search-Augmented Large Language Models
Paper
• 2601.05503
• Published
• 7
DR-LoRA: Dynamic Rank LoRA for Mixture-of-Experts Adaptation
Paper
• 2601.04823
• Published
• 7
Memory Matters More: Event-Centric Memory as a Logic Map for Agent Searching and Reasoning
Paper
• 2601.04726
• Published
• 7
TCAndon-Router: Adaptive Reasoning Router for Multi-Agent Collaboration
Paper
• 2601.04544
• Published
• 6
IIB-LPO: Latent Policy Optimization via Iterative Information Bottleneck
Paper
• 2601.05870
• Published
• 3
Distilling Feedback into Memory-as-a-Tool
Paper
• 2601.05960
• Published
• 3
BabyVision: Visual Reasoning Beyond Language
Paper
• 2601.06521
• Published
• 196
PaCoRe: Learning to Scale Test-Time Compute with Parallel Coordinated Reasoning
Paper
• 2601.05593
• Published
• 84
Lost in the Noise: How Reasoning Models Fail with Contextual Distractors
Paper
• 2601.07226
• Published
• 33
Dr. Zero: Self-Evolving Search Agents without Training Data
Paper
• 2601.07055
• Published
• 22
OS-Symphony: A Holistic Framework for Robust and Generalist Computer-Using Agent
Paper
• 2601.07779
• Published
• 28
Controllable Memory Usage: Balancing Anchoring and Innovation in Long-Term Human-Agent Interaction
Paper
• 2601.05107
• Published
• 24
ET-Agent: Incentivizing Effective Tool-Integrated Reasoning Agent via Behavior Calibration
Paper
• 2601.06860
• Published
• 16
MegaFlow: Large-Scale Distributed Orchestration System for the Agentic Era
Paper
• 2601.07526
• Published
• 24
Forest Before Trees: Latent Superposition for Efficient Visual Reasoning
Paper
• 2601.06803
• Published
• 10
TourPlanner: A Competitive Consensus Framework with Constraint-Gated Reinforcement Learning for Travel Planning
Paper
• 2601.04698
• Published
• 10
How Do Large Language Models Learn Concepts During Continual Pre-Training?
Paper
• 2601.03570
• Published
• 4
OpenTinker: Separating Concerns in Agentic Reinforcement Learning
Paper
• 2601.07376
• Published
• 7
ShowUI-Aloha: Human-Taught GUI Agent
Paper
• 2601.07181
• Published
• 3
Are LLM Decisions Faithful to Verbal Confidence?
Paper
• 2601.07767
• Published
• 5
Structured Episodic Event Memory
Paper
• 2601.06411
• Published
• 4
Artificial Entanglement in the Fine-Tuning of Large Language Models
Paper
• 2601.06788
• Published
• 5
User-Oriented Multi-Turn Dialogue Generation with Tool Use at scale
Paper
• 2601.08225
• Published
• 52
ArenaRL: Scaling RL for Open-Ended Agents via Tournament-based Relative Ranking
Paper
• 2601.06487
• Published
• 53
On the Non-decoupling of Supervised Fine-tuning and Reinforcement Learning in Post-training
Paper
• 2601.07389
• Published
• 2
MemoBrain: Executive Memory as an Agentic Brain for Reasoning
Paper
• 2601.08079
• Published
• 38
MemGovern: Enhancing Code Agents through Learning from Governed Human Experiences
Paper
• 2601.06789
• Published
• 79
The Confidence Dichotomy: Analyzing and Mitigating Miscalibration in Tool-Use Agents
Paper
• 2601.07264
• Published
• 24
Parallel Context-of-Experts Decoding for Retrieval Augmented Generation
Paper
• 2601.08670
• Published
• 20
Aligning Text, Code, and Vision: A Multi-Objective Reinforcement Learning Framework for Text-to-Visualization
Paper
• 2601.04582
• Published
• 10
JudgeRLVR: Judge First, Generate Second for Efficient Reasoning
Paper
• 2601.08468
• Published
• 7
EpiCaR: Knowing What You Don't Know Matters for Better Reasoning in LLMs
Paper
• 2601.06786
• Published
• 6
Controlled Self-Evolution for Algorithmic Code Optimization
Paper
• 2601.07348
• Published
• 114
MAXS: Meta-Adaptive Exploration with LLM Agents
Paper
• 2601.09259
• Published
• 95
EvoFSM: Controllable Self-Evolution for Deep Research with Finite State Machines
Paper
• 2601.09465
• Published
• 41
OpenDecoder: Open Large Language Model Decoding to Incorporate Document Quality in RAG
Paper
• 2601.09028
• Published
• 34
ExpSeek: Self-Triggered Experience Seeking for Web Agents
Paper
• 2601.08605
• Published
• 16
Imagine-then-Plan: Agent Learning from Adaptive Lookahead with World Models
Paper
• 2601.08955
• Published
• 13
No More Stale Feedback: Co-Evolving Critics for Open-World Agent Learning
Paper
• 2601.06794
• Published
• 4
The AI Hippocampus: How Far are We From Human Memory?
Paper
• 2601.09113
• Published
• 5
DPWriter: Reinforcement Learning with Diverse Planning Branching for Creative Writing
Paper
• 2601.09609
• Published
• 3
Omni-R1: Towards the Unified Generative Paradigm for Multimodal Reasoning
Paper
• 2601.09536
• Published
• 5
SCALER: Synthetic Scalable Adaptive Learning Environment for Reasoning
Paper
• 2601.04809
• Published
• 3
Rewarding the Rare: Uniqueness-Aware RL for Creative Problem Solving in LLMs
Paper
• 2601.08763
• Published
• 148
Collaborative Multi-Agent Test-Time Reinforcement Learning for Reasoning
Paper
• 2601.09667
• Published
• 91
Beyond Static Tools: Test-Time Tool Evolution for Scientific Reasoning
Paper
• 2601.07641
• Published
• 47
Toward Ultra-Long-Horizon Agentic Science: Cognitive Accumulation for Machine Learning Engineering
Paper
• 2601.10402
• Published
• 37
MatchTIR: Fine-Grained Supervision for Tool-Integrated Reasoning via Bipartite Matching
Paper
• 2601.10712
• Published
• 24
LaViT: Aligning Latent Visual Thoughts for Multi-modal Reasoning
Paper
• 2601.10129
• Published
• 12
PACEvolve: Enabling Long-Horizon Progress-Aware Consistent Evolution
Paper
• 2601.10657
• Published
• 20
LSRIF: Logic-Structured Reinforcement Learning for Instruction Following
Paper
• 2601.06431
• Published
• 12
PRL: Process Reward Learning Improves LLMs' Reasoning Ability and Broadens the Reasoning Boundary
Paper
• 2601.10201
• Published
• 9
Agent Skills in the Wild: An Empirical Study of Security Vulnerabilities at Scale
Paper
• 2601.10338
• Published
• 6
Memory Bank Compression for Continual Adaptation of Large Language Models
Paper
• 2601.00756
• Published
• 2
Distribution-Aligned Sequence Distillation for Superior Long-CoT Reasoning
Paper
• 2601.09088
• Published
• 63
Your Group-Relative Advantage Is Biased
Paper
• 2601.08521
• Published
• 154
The Poisoned Apple Effect: Strategic Manipulation of Mediated Markets via Technology Expansion of AI Agents
Paper
• 2601.11496
• Published
• 47
Unlocking Implicit Experience: Synthesizing Tool-Use Trajectories from Text
Paper
• 2601.10355
• Published
• 39
BAPO: Boundary-Aware Policy Optimization for Reliable Agentic Search
Paper
• 2601.11037
• Published
• 18
ProFit: Leveraging High-Value Signals in SFT via Probability-Guided Token Selection
Paper
• 2601.09195
• Published
• 15
Reasoning Models Generate Societies of Thought
Paper
• 2601.10825
• Published
• 14
PersonalAlign: Hierarchical Implicit Intent Alignment for Personalized GUI Agent with Long-Term User-Centric Records
Paper
• 2601.09636
• Published
• 8
Language of Thought Shapes Output Diversity in Large Language Models
Paper
• 2601.11227
• Published
• 9
Multiplex Thinking: Reasoning via Token-wise Branch-and-Merge
Paper
• 2601.08808
• Published
• 39
NAACL: Noise-AwAre Verbal Confidence Calibration for LLMs in RAG Systems
Paper
• 2601.11004
• Published
• 30
Spurious Rewards Paradox: Mechanistically Understanding How RLVR Activates Memorization Shortcuts in LLMs
Paper
• 2601.11061
• Published
• 7
YaPO: Learnable Sparse Activation Steering Vectors for Domain Adaptation
Paper
• 2601.08441
• Published
• 8
CLARE: Continual Learning for Vision-Language-Action Models via Autonomous Adapter Routing and Expansion
Paper
• 2601.09512
• Published
• 4
Think3D: Thinking with Space for Spatial Reasoning
Paper
• 2601.13029
• Published
• 47
Toward Efficient Agents: Memory, Tool learning, and Planning
Paper
• 2601.14192
• Published
• 54
DARC: Decoupled Asymmetric Reasoning Curriculum for LLM Evolution
Paper
• 2601.13761
• Published
• 16
Aligning Agentic World Models via Knowledgeable Experience Learning
Paper
• 2601.13247
• Published
• 15
Agentic-R: Learning to Retrieve for Agentic Search
Paper
• 2601.11888
• Published
• 19
Which Reasoning Trajectories Teach Students to Reason Better? A Simple Metric of Informative Alignment
Paper
• 2601.14249
• Published
• 12
InT: Self-Proposed Interventions Enable Credit Assignment in LLM Reasoning
Paper
• 2601.14209
• Published
• 6
Uncertainty-Aware Gradient Signal-to-Noise Data Selection for Instruction Tuning
Paper
• 2601.13697
• Published
• 4
Agentic Reasoning for Large Language Models
Paper
• 2601.12538
• Published
• 197
Paper2Rebuttal: A Multi-Agent Framework for Transparent Author Response Assistance
Paper
• 2601.14171
• Published
• 50
Behavior Knowledge Merge in Reinforced Agentic Models
Paper
• 2601.13572
• Published
• 24
Render-of-Thought: Rendering Textual Chain-of-Thought as Images for Visual Latent Reasoning
Paper
• 2601.14750
• Published
• 17
Numina-Lean-Agent: An Open and General Agentic Reasoning System for Formal Mathematics
Paper
• 2601.14027
• Published
• 12
Lost in the Prompt Order: Revealing the Limitations of Causal Attention in Language Models
Paper
• 2601.14152
• Published
• 5
The Responsibility Vacuum: Organizational Failure in Scaled Agent Systems
Paper
• 2601.15059
• Published
• 3
Facilitating Proactive and Reactive Guidance for Decision Making on the Web: A Design Probe with WebSeek
Paper
• 2601.15100
• Published
• 3
EvoCUA: Evolving Computer Use Agents via Learning from Scalable Synthetic Experience
Paper
• 2601.15876
• Published
• 90
LLM-in-Sandbox Elicits General Agentic Intelligence
Paper
• 2601.16206
• Published
• 84
PROGRESSLM: Towards Progress Reasoning in Vision-Language Models
Paper
• 2601.15224
• Published
• 12
Agentic Uncertainty Quantification
Paper
• 2601.15703
• Published
• 8
Agentic Confidence Calibration
Paper
• 2601.15778
• Published
• 5
From Passive Metric to Active Signal: The Evolving Role of Uncertainty Quantification in Large Language Models
Paper
• 2601.15690
• Published
• 4
SWE-Pruner: Self-Adaptive Context Pruning for Coding Agents
Paper
• 2601.16746
• Published
• 89
VisGym: Diverse, Customizable, Scalable Environments for Multimodal Agents
Paper
• 2601.16973
• Published
• 40
Inference-Time Scaling of Verification: Self-Evolving Deep Research Agents via Test-Time Rubric-Guided Verification
Paper
• 2601.15808
• Published
• 20
Endless Terminals: Scaling RL Environments for Terminal Agents
Paper
• 2601.16443
• Published
• 16
Dancing in Chains: Strategic Persuasion in Academic Rebuttal via Theory of Mind
Paper
• 2601.15715
• Published
• 13
ChartVerse: Scaling Chart Reasoning via Reliable Programmatic Synthesis from Scratch
Paper
• 2601.13606
• Published
• 11
MeepleLM: A Virtual Playtester Simulating Diverse Subjective Experiences
Paper
• 2601.07251
• Published
• 11
Knowledge is Not Enough: Injecting RL Skills for Continual Adaptation
Paper
• 2601.11258
• Published
• 9
Guidelines to Prompt Large Language Models for Code Generation: An Empirical Characterization
Paper
• 2601.13118
• Published
• 1
daVinci-Dev: Agent-native Mid-training for Software Engineering
Paper
• 2601.18418
• Published
• 124
Teaching Models to Teach Themselves: Reasoning at the Edge of Learnability
Paper
• 2601.18778
• Published
• 40
Paying Less Generalization Tax: A Cross-Domain Generalization Study of RL Training for LLM Agents
Paper
• 2601.18217
• Published
• 11
DRPG (Decompose, Retrieve, Plan, Generate): An Agentic Framework for Academic Rebuttal
Paper
• 2601.18081
• Published
• 8
Least-Loaded Expert Parallelism: Load Balancing An Imbalanced Mixture-of-Experts
Paper
• 2601.17111
• Published
• 5
Agentic Search in the Wild: Intents and Trajectory Dynamics from 14M+ Real Search Requests
Paper
• 2601.17617
• Published
• 4
RouteMoA: Dynamic Routing without Pre-Inference Boosts Efficient Mixture-of-Agents
Paper
• 2601.18130
• Published
• 1
AdaReasoner: Dynamic Tool Orchestration for Iterative Visual Reasoning
Paper
• 2601.18631
• Published
• 47
Self-Distillation Enables Continual Learning
Paper
• 2601.19897
• Published
• 26
Harder Is Better: Boosting Mathematical Reasoning via Difficulty-Aware GRPO and Multi-Aspect Question Reformulation
Paper
• 2601.20614
• Published
• 118
Innovator-VL: A Multimodal Large Language Model for Scientific Discovery
Paper
• 2601.19325
• Published
• 79
Reinforcement Learning via Self-Distillation
Paper
• 2601.20802
• Published
• 40
Spark: Strategic Policy-Aware Exploration via Dynamic Branching for Long-Horizon Agentic Learning
Paper
• 2601.20209
• Published
• 22
Linear representations in language models can change dramatically over a conversation
Paper
• 2601.20834
• Published
• 21
SERA: Soft-Verified Efficient Repository Agents
Paper
• 2601.20789
• Published
• 13
Group Distributionally Robust Optimization-Driven Reinforcement Learning for LLM Reasoning
Paper
• 2601.19280
• Published
• 9
OmegaUse: Building a General-Purpose GUI Agent for Autonomous Task Execution
Paper
• 2601.20380
• Published
• 8
How AI Impacts Skill Formation
Paper
• 2601.20245
• Published
• 8
VERGE: Formal Refinement and Guidance Engine for Verifiable LLM Reasoning
Paper
• 2601.20055
• Published
• 6
Training Reasoning Models on Saturated Problems via Failure-Prefix Conditioning
Paper
• 2601.20829
• Published
• 6
Idea2Story: An Automated Pipeline for Transforming Research Concepts into Complete Scientific Narratives
Paper
• 2601.20833
• Published
• 177
Scaling Embeddings Outperforms Scaling Experts in Language Models
Paper
• 2601.21204
• Published
• 99
ConceptMoE: Adaptive Token-to-Concept Compression for Implicit Compute Allocation
Paper
• 2601.21420
• Published
• 42
Exploring Reasoning Reward Model for Agents
Paper
• 2601.22154
• Published
• 22
Language-based Trial and Error Falls Behind in the Era of Experience
Paper
• 2601.21754
• Published
• 16
Self-Improving Pretraining: using post-trained models to pretrain better models
Paper
• 2601.21343
• Published
• 17
Scalable Power Sampling: Unlocking Efficient, Training-Free Reasoning for LLMs via Distribution Sharpening
Paper
• 2601.21590
• Published
• 13
Beyond Imitation: Reinforcement Learning for Active Latent Planning
Paper
• 2601.21598
• Published
• 9
DeepSearchQA: Bridging the Comprehensiveness Gap for Deep Research Agents
Paper
• 2601.20975
• Published
• 9
VTC-R1: Vision-Text Compression for Efficient Long-Context Reasoning
Paper
• 2601.22069
• Published
• 7
Reinforcement Learning from Meta-Evaluation: Aligning Language Models Without Ground-Truth Labels
Paper
• 2601.21268
• Published
• 4
BMAM: Brain-inspired Multi-Agent Memory Framework
Paper
• 2601.20465
• Published
• 4
FROST: Filtering Reasoning Outliers with Attention for Efficient Reasoning
Paper
• 2601.19001
• Published
• 4
WebArbiter: A Principle-Guided Reasoning Process Reward Model for Web Agents
Paper
• 2601.21872
• Published