MiniMax-01: Scaling Foundation Models with Lightning Attention • arXiv:2501.08313 • Published Jan 14, 2025
FastKV: Decoupling of Context Reduction and KV Cache Compression for Prefill-Decoding Acceleration • arXiv:2502.01068 • Published Feb 3, 2025
Taming the Titans: A Survey of Efficient LLM Inference Serving • arXiv:2504.19720 • Published Apr 28, 2025
EmbeddingGemma: Powerful and Lightweight Text Representations • arXiv:2509.20354 • Published Sep 24, 2025