Boundary-Guided Policy Optimization for Memory-Efficient RL of Diffusion Large Language Models
- THU-KEG/LLaDA-8B-BGPO-math
  Reinforcement Learning • 8B
- THU-KEG/LLaDA-8B-BGPO-code
  Reinforcement Learning • 8B
- THU-KEG/LLaDA-8B-BGPO-countdown
  Reinforcement Learning • 8B
- THU-KEG/LLaDA-8B-BGPO-sudoku
  Reinforcement Learning • 8B