SWE-Universe: Scale Real-World Verifiable Environments to Millions Paper • 2602.02361 • Published 18 days ago • 60
VideoAgentTrek: Computer Use Pretraining from Unlabeled Videos Paper • 2510.19488 • Published Oct 22, 2025 • 20
RealCritic: Towards Effectiveness-Driven Evaluation of Language Model Critiques Paper • 2501.14492 • Published Jan 24, 2025 • 27