RefineBench: Evaluating Refinement Capability of Language Models via Checklists Paper • 2511.22173 • Published Nov 27, 2025 • 14
Decomposed Attention Fusion in MLLMs for Training-Free Video Reasoning Segmentation Paper • 2510.19592 • Published Oct 22, 2025 • 12
Multi-Granular Spatio-Temporal Token Merging for Training-Free Acceleration of Video LLMs Paper • 2507.07990 • Published Jul 10, 2025 • 45
Thinking with Images for Multimodal Reasoning: Foundations, Methods, and Future Frontiers Paper • 2506.23918 • Published Jun 30, 2025 • 89
Robot-R1: Reinforcement Learning for Enhanced Embodied Reasoning in Robotics Paper • 2506.00070 • Published May 29, 2025 • 29
Speaking Beyond Language: A Large-Scale Multimodal Dataset for Learning Nonverbal Cues from Video-Grounded Dialogues Paper • 2506.00958 • Published Jun 1, 2025 • 20
Don't Look Only Once: Towards Multimodal Interactive Reasoning with Selective Visual Revisitation Paper • 2505.18842 • Published May 24, 2025 • 36
Web-Shepherd: Advancing PRMs for Reinforcing Web Agents Paper • 2505.15277 • Published May 21, 2025 • 104
VisEscape: A Benchmark for Evaluating Exploration-driven Decision-making in Virtual Escape Rooms Paper • 2503.14427 • Published Mar 18, 2025 • 19
EgoSpeak: Learning When to Speak for Egocentric Conversational Agents in the Wild Paper • 2502.14892 • Published Feb 17, 2025 • 6
SEAL: Entangled White-box Watermarks on Low-Rank Adaptation Paper • 2501.09284 • Published Jan 16, 2025 • 10
Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation Paper • 2410.13232 • Published Oct 17, 2024 • 44