Less is More: Recursive Reasoning with Tiny Networks
Paper
• 2510.04871
• Published • 513
Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play
Paper
• 2509.25541
• Published • 141
Agent Learning via Early Experience
Paper
• 2510.08558
• Published • 276
DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search
Paper
• 2509.25454
• Published • 148
MCPMark: A Benchmark for Stress-Testing Realistic and Comprehensive MCP Use
Paper
• 2509.24002
• Published • 179
A Survey of Reinforcement Learning for Large Reasoning Models
Paper
• 2509.08827
• Published • 193
VLA-Adapter: An Effective Paradigm for Tiny-Scale Vision-Language-Action Model
Paper
• 2509.09372
• Published • 254
Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens
Paper
• 2508.01191
• Published • 240
Multimodal Prompt Optimization: Why Not Leverage Multiple Modalities for MLLMs
Paper
• 2510.09201
• Published • 50
The Art of Scaling Reinforcement Learning Compute for LLMs
Paper
• 2510.13786
• Published • 33