Submitted by yfdeng
MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head
DAGroup-PKU