Learning from Your Own Mistakes: Constructing Learnable Micro-Reflective Trajectories for Self-Distillation Paper • 2606.18844 • Published 6 days ago • 12
Multi-Turn Reflective Masking Elicits Reasoning in Mask Diffusion Models Paper • 2606.16700 • Published 8 days ago • 10
RepSelect: Robust LLM Unlearning via Representation Selectivity Paper • 2606.17168 • Published 8 days ago • 4
Rethinking the Role of Efficient Attention in Hybrid Architectures Paper • 2606.15378 • Published 10 days ago • 17
Morpheus: A Morphology-Aware Neural Tokenizer and Word Embedder for Turkish Paper • 2606.18717 • Published 6 days ago • 5
STARE: Surprisal-Guided Token-Level Advantage Reweighting for Policy Entropy Stability Paper • 2606.19236 • Published 6 days ago • 12
Sumi: Open Uniform Diffusion Language Model from Scratch Paper • 2606.19005 • Published 6 days ago • 11
The Reward Was in Your Data All Along: Correcting Flow Matching with Discriminator-Guided RL Paper • 2606.19162 • Published 6 days ago • 20
Learning from the Self-future: On-policy Self-distillation for dLLMs Paper • 2606.18195 • Published 7 days ago • 74
Zone of Proximal Policy Optimization: Teacher in Prompts, Not Gradients Paper • 2606.18216 • Published 7 days ago • 59
LoopCoder-v2: Only Loop Once for Efficient Test-Time Computation Scaling Paper • 2606.18023 • Published 7 days ago • 201
MaskAlign: Token-Subset Representation Alignment for Efficient Diffusion Training Paper • 2606.08788 • Published 16 days ago • 4
SG-OPD: Sign-Gated On-Policy Distillation via Sign-Consistency Gating and Phased Teacher Sampling Paper • 2606.09304 • Published 15 days ago • 6
Risk Under Pressure: Compute-Aware Evaluation of Adversarial Robustness in Language Models Paper • 2606.11409 • Published 14 days ago • 9
Demystifying Hidden-State Recurrence: Switchable Latent Reasoning with On-Policy Reinforcement Learning Paper • 2606.13106 • Published 12 days ago • 21
N-GRPO: Embedding-Level Neighbor Mixing for Enhanced Policy Optimization Paper • 2606.10768 • Published 14 days ago • 24
MaxProof: Scaling Mathematical Proof with Generative-Verifier RL and Population-Level Test-Time Scaling Paper • 2606.13473 • Published 12 days ago • 90
Robust-U1: Can MLLMs Self-Recover Corrupted Visual Content for Robust Understanding? Paper • 2606.08063 • Published 17 days ago • 79