AgentCoMa: A Compositional Benchmark Mixing Commonsense and Mathematical Reasoning in Real-World Scenarios Paper • 2508.19988 • Published Aug 27, 2025
How to Improve the Robustness of Closed-Source Models on NLI Paper • 2505.20209 • Published May 26, 2025
Meta-Reasoning Improves Tool Use in Large Language Models Paper • 2411.04535 • Published Nov 7, 2024
How can representation dimension dominate structurally pruned LLMs? Paper • 2503.04377 • Published Mar 6, 2025
Enhancing LLM Robustness to Perturbed Instructions: An Empirical Study Paper • 2504.02733 • Published Apr 3, 2025
Reverse Engineering Human Preferences with Reinforcement Learning Paper • 2505.15795 • Published May 21, 2025