MiniMax-01: Scaling Foundation Models with Lightning Attention • arXiv:2501.08313 • Published Jan 14, 2025
FastKV: Decoupling of Context Reduction and KV Cache Compression for Prefill-Decoding Acceleration • arXiv:2502.01068 • Published Feb 3, 2025
Taming the Titans: A Survey of Efficient LLM Inference Serving • arXiv:2504.19720 • Published Apr 28, 2025
EmbeddingGemma: Powerful and Lightweight Text Representations • arXiv:2509.20354 • Published Sep 24, 2025