JavisGPT: A Unified Multi-modal LLM for Sounding-Video Comprehension and Generation Paper • 2512.22905 • Published 7 days ago • 16
Exploring Response Uncertainty in MLLMs: An Empirical Evaluation under Misleading Scenarios Paper • 2411.02708 • Published Nov 5, 2024 • 1
MOSS-ChatV: Reinforcement Learning with Process Reasoning Reward for Video Temporal Reasoning Paper • 2509.21113 • Published Sep 25, 2025 • 5
Mind the Third Eye! Benchmarking Privacy Awareness in MLLM-powered Smartphone Agents Paper • 2508.19493 • Published Aug 27, 2025 • 11
VRBench: A Benchmark for Multi-Step Reasoning in Long Narrative Videos Paper • 2506.10857 • Published Jun 12, 2025 • 30
SAMA: Towards Multi-Turn Referential Grounded Video Chat with Large Language Models Paper • 2505.18812 • Published May 24, 2025 • 2
VideoMark: A Distortion-Free Robust Watermarking Framework for Video Diffusion Models Paper • 2504.16359 • Published Apr 23, 2025 • 3
AV-Reasoner: Improving and Benchmarking Clue-Grounded Audio-Visual Counting for MLLMs Paper • 2506.05328 • Published Jun 5, 2025 • 20
Unveiling Instruction-Specific Neurons & Experts: An Analytical Framework for LLM's Instruction-Following Capabilities Paper • 2505.21191 • Published May 27, 2025 • 3
PhyX: Does Your Model Have the "Wits" for Physical Reasoning? Paper • 2505.15929 • Published May 21, 2025 • 49
SAVEn-Vid: Synergistic Audio-Visual Integration for Enhanced Understanding in Long Video Context Paper • 2411.16213 • Published Nov 25, 2024 • 2
PhysicsArena: The First Multimodal Physics Reasoning Benchmark Exploring Variable, Process, and Solution Dimensions Paper • 2505.15472 • Published May 21, 2025 • 3
RTV-Bench: Benchmarking MLLM Continuous Perception, Understanding and Reasoning through Real-Time Video Paper • 2505.02064 • Published May 4, 2025 • 4
JavisDiT: Joint Audio-Video Diffusion Transformer with Hierarchical Spatio-Temporal Prior Synchronization Paper • 2503.23377 • Published Mar 30, 2025 • 57