LLMs as Scalable, General-Purpose Simulators For Evolving Digital Agent Training • Paper • 2510.14969 • Published Oct 16
Vibe Checker: Aligning Code Evaluation with Human Preference • Paper • 2510.07315 • Published Oct 8
QLASS: Boosting Language Agent Inference via Q-Guided Stepwise Search • Paper • 2502.02584 • Published Feb 4