From Segments to Scenes: Temporal Understanding in Autonomous Driving via Vision-Language Model Paper • 2512.05277 • Published 22 days ago • 4
CASP: Compression of Large Multimodal Models Based on Attention Sparsity Paper • 2503.05936 • Published Mar 7 • 2
GOLD: Generalized Knowledge Distillation via Out-of-Distribution-Guided Language Data Generation Paper • 2403.19754 • Published Mar 28, 2024
Spatial Reasoning with Vision-Language Models in Ego-Centric Multi-View Scenes Paper • 2509.06266 • Published Sep 8 • 11