Does Your Reasoning Model Implicitly Know When to Stop Thinking? Paper • 2602.08354 • Published 26 days ago • 259
Discovering Multiagent Learning Algorithms with Large Language Models Paper • 2602.16928 • Published 17 days ago • 16
LoPA: Scaling dLLM Inference via Lookahead Parallel Decoding Paper • 2512.16229 • Published Dec 18, 2025 • 16
view article Article What’s MXFP4? The 4-Bit Secret Powering OpenAI’s GPT‑OSS Models on Modest Hardware Aug 8, 2025 • 33
view article Article Falcon-H1: A Family of Hybrid-Head Language Models Redefining Efficiency and Performance May 21, 2025 • 39
Better & Faster Large Language Models via Multi-token Prediction Paper • 2404.19737 • Published Apr 30, 2024 • 81
Beyond Homogeneous Attention: Memory-Efficient LLMs via Fourier-Approximated KV Cache Paper • 2506.11886 • Published Jun 13, 2025 • 20
Scaling Laws for Native Multimodal Models Scaling Laws for Native Multimodal Models Paper • 2504.07951 • Published Apr 10, 2025 • 30
Training Large Language Models to Reason in a Continuous Latent Space Paper • 2412.06769 • Published Dec 9, 2024 • 94
Large Language Monkeys: Scaling Inference Compute with Repeated Sampling Paper • 2407.21787 • Published Jul 31, 2024 • 13
LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning Paper • 2410.02884 • Published Oct 3, 2024 • 54
view article Article Llama can now see and run on your device - welcome Llama 3.2 +5 Sep 25, 2024 • 191
view article Article Fine-tuning LLMs to 1.58bit: extreme quantization made easy +4 Sep 18, 2024 • 276
MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding Paper • 2408.11049 • Published Aug 20, 2024 • 14
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing Paper • 2406.08464 • Published Jun 12, 2024 • 72