SOD: Step-wise On-policy Distillation for Small Language Model Agents • Paper • 2605.07725 • Published 23 days ago • 24
π-Bench: Evaluating Proactive Personal Assistant Agents in Long-Horizon Workflows • Paper • 2605.14678 • Published 12 days ago • 102
Anti-Self-Distillation for Reasoning RL via Pointwise Mutual Information • Paper • 2605.11609 • Published 19 days ago • 195
CiteVQA: Benchmarking Evidence Attribution for Trustworthy Document Intelligence • Paper • 2605.12882 • Published 18 days ago • 269
DeltaRubric: Generative Multimodal Reward Modeling via Joint Planning and Verification • Paper • 2605.09269 • Published 21 days ago • 6
trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 • Text Generation • 2.43M • Updated Dec 19, 2025 • 5.53M • 6