Uni-OPD: Unifying On-Policy Distillation with a Dual-Perspective Recipe • Paper • 2605.03677 • Published 29 days ago • 27
LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation with Diffusion Large Language Model • Paper • 2604.20796 • Published Apr 22 • 242
MHPO: Modulated Hazard-aware Policy Optimization for Stable Reinforcement Learning • Paper • 2603.16929 • Published Mar 14 • 13