nanoVLM: The simplest repository to train your VLM in pure PyTorch Article • ariG23498, lusxvr, andito, sergiopaniego, merve, pcuenq, reach-vb • May 21, 2025 • 257
KV Cache from scratch in nanoVLM Article • ariG23498, kashif, lusxvr, andito, pcuenq • Jun 4, 2025 • 119
Sparse-vDiT: Unleashing the Power of Sparse Attention to Accelerate Video Diffusion Transformers Paper • 2506.03065 • Published Jun 3, 2025 • 27
Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding Paper • 2505.22618 • Published May 28, 2025 • 45
MME-VideoOCR: Evaluating OCR-Based Capabilities of Multimodal LLMs in Video Scenarios Paper • 2505.21333 • Published May 27, 2025 • 38
Quartet: Native FP4 Training Can Be Optimal for Large Language Models Paper • 2505.14669 • Published May 20, 2025 • 78
Binary and Scalar Embedding Quantization for Significantly Faster & Cheaper Retrieval Article • aamirshakir, tomaarsen, SeanLee97 • Mar 22, 2024 • 132
Absolute Zero: Reinforced Self-play Reasoning with Zero Data Paper • 2505.03335 • Published May 6, 2025 • 192
🪆 Introduction to Matryoshka Embedding Models Article • tomaarsen, Xenova, osanseviero • Feb 23, 2024 • 207
Words or Vision: Do Vision-Language Models Have Blind Faith in Text? Paper • 2503.02199 • Published Mar 4, 2025 • 8
LLM-Microscope: Uncovering the Hidden Role of Punctuation in Context Memory of Transformers Paper • 2502.15007 • Published Feb 20, 2025 • 175
Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation Paper • 2502.14846 • Published Feb 20, 2025 • 16
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention Paper • 2502.11089 • Published Feb 16, 2025 • 169
π0 and π0-FAST: Vision-Language-Action Models for General Robot Control Article • danaaubakirova, Molbap, mshukor, cadene • Feb 4, 2025 • 192