HySparse: A Hybrid Sparse Attention Architecture with Oracle Token Selection and KV Cache Sharing Paper • 2602.03560 • Published 13 days ago • 43
OmniSIFT: Modality-Asymmetric Token Compression for Efficient Omni-modal Large Language Models Paper • 2602.04804 • Published 12 days ago • 46
Elastic Attention: Test-time Adaptive Sparsity Ratios for Efficient Transformers Paper • 2601.17367 • Published 23 days ago • 33
VideoMaMa: Mask-Guided Video Matting via Generative Prior Paper • 2601.14255 • Published 27 days ago • 15
HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding Paper • 2601.14724 • Published 26 days ago • 74
Watching, Reasoning, and Searching: A Video Deep Research Benchmark on Open Web for Agentic Video Reasoning Paper • 2601.06943 • Published Jan 11 • 212
VideoAuto-R1: Video Auto Reasoning via Thinking Once, Answering Twice Paper • 2601.05175 • Published Jan 8 • 36
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization Paper • 2601.05242 • Published Jan 8 • 226
Nemotron 3 Nano: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning Paper • 2512.20848 • Published Dec 23, 2025 • 38
QwenLong-L1.5: Post-Training Recipe for Long-Context Reasoning and Memory Management Paper • 2512.12967 • Published Dec 15, 2025 • 108
ReVSeg: Incentivizing the Reasoning Chain for Video Segmentation with Reinforcement Learning Paper • 2512.02835 • Published Dec 2, 2025 • 10
Infinity-RoPE: Action-Controllable Infinite Video Generation Emerges From Autoregressive Self-Rollout Paper • 2511.20649 • Published Nov 25, 2025 • 48
LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling Paper • 2511.20785 • Published Nov 25, 2025 • 185
VisMem: Latent Vision Memory Unlocks Potential of Vision-Language Models Paper • 2511.11007 • Published Nov 14, 2025 • 15