Stabilizing Reinforcement Learning with LLMs: Formulation and Practices Paper • 2512.01374 • Published 11 days ago • 88
RMTBench: Benchmarking LLMs Through Multi-Turn User-Centric Role-Playing Paper • 2507.20352 • Published Jul 27
ExpertPrompting: Instructing Large Language Models to be Distinguished Experts Paper • 2305.14688 • Published May 24, 2023
Benchmarking Large Language Models on Controllable Generation under Diversified Instructions Paper • 2401.00690 • Published Jan 1, 2024 • 1
Building Chinese Biomedical Language Models via Multi-Level Text Discrimination Paper • 2110.07244 • Published Oct 14, 2021
Self-Evolved Diverse Data Sampling for Efficient Instruction Tuning Paper • 2311.08182 • Published Nov 14, 2023
Rationales Are Not Silver Bullets: Measuring the Impact of Rationales on Model Performance and Reliability Paper • 2505.24147 • Published May 30
From Real to Synthetic: Synthesizing Millions of Diversified and Complicated User Instructions with Attributed Grounding Paper • 2506.03968 • Published Jun 4 • 15
DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents Paper • 2506.11763 • Published Jun 13 • 72
Training LLM-Based Agents with Synthetic Self-Reflected Trajectories and Partial Masking Paper • 2505.20023 • Published May 26
MCP-AgentBench: Evaluating Real-World Language Agent Performance with MCP-Mediated Tools Paper • 2509.09734 • Published Sep 10 • 15