LongProc: Benchmarking Long-Context Language Models on Long Procedural Generation Paper • 2501.05414 • Published Jan 9, 2025
ChartMuseum: Testing Visual Reasoning Capabilities of Large Vision-Language Models Paper • 2505.13444 • Published May 19, 2025
To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning Paper • 2409.12183 • Published Sep 18, 2024