Forget BIT, It is All about TOKEN: Towards Semantic Information Theory for LLMs Paper • 2511.01202 • Published Nov 3 • 5
Welcome Falcon Mamba: The first strong attention-free 7B model Article • Published Aug 12, 2024 • 113
Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory Paper • 2405.08707 • Published May 14, 2024 • 34