IndexCache: Accelerating Sparse Attention via Cross-Layer Index Reuse Paper • 2603.12201 • Published 8 days ago • 51
Spatial-TTT: Streaming Visual-based Spatial Intelligence with Test-Time Training Paper • 2603.12255 • Published 8 days ago • 90
LLM2Vec-Gen: Generative Embeddings from Large Language Models Paper • 2603.10913 • Published 10 days ago • 40
Flash-KMeans: Fast and Memory-Efficient Exact K-Means Paper • 2603.09229 • Published 11 days ago • 79
MM-Zero: Self-Evolving Multi-Model Vision Language Models From Zero Data Paper • 2603.09206 • Published 11 days ago • 51
Article MARL: Runtime Middleware That Reduces LLM Hallucination Without Fine-Tuning 12 days ago • 15
Article 🏟️ Smol AI WorldCup: A 5-Axis Benchmark That Reveals What Small Language Models Can Really Do 11 days ago • 38
Penguin-VL: Exploring the Efficiency Limits of VLM with LLM-based Vision Encoders Paper • 2603.06569 • Published 14 days ago • 113
Streaming Sequence-to-Sequence Learning with Delayed Streams Modeling Paper • 2509.08753 • Published Sep 10, 2025 • 3
Article NEO-unify: Building Native Multimodal Unified Models End to End 15 days ago • 102
Code2World: A GUI World Model via Renderable Code Generation Paper • 2602.09856 • Published Feb 10 • 201
Learn Hard Problems During RL with Reference Guided Fine-tuning Paper • 2603.01223 • Published 19 days ago • 12
PersonaPlex: Voice and Role Control for Full Duplex Conversational Speech Models Paper • 2602.06053 • Published Jan 14 • 8
Beyond Language Modeling: An Exploration of Multimodal Pretraining Paper • 2603.03276 • Published 17 days ago • 97
OmniLottie: Generating Vector Animations via Parameterized Lottie Tokens Paper • 2603.02138 • Published 18 days ago • 147
From Scale to Speed: Adaptive Test-Time Scaling for Image Editing Paper • 2603.00141 • Published 25 days ago • 138
CUDA Agent: Large-Scale Agentic RL for High-Performance CUDA Kernel Generation Paper • 2602.24286 • Published 21 days ago • 96