ThreadWeaver: Adaptive Threading for Efficient Parallel Reasoning in Language Models Paper • 2512.07843 • Published 21 days ago • 19
Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models Paper • 2411.04996 • Published Nov 7, 2024 • 51
Nearest Neighbor Speculative Decoding for LLM Generation and Attribution Paper • 2405.19325 • Published May 29, 2024 • 14