Rl/GRPO - a talrejaa8 Collection

Rl/GRPO

updated Nov 8, 2025

AutoTriton: Automatic Triton Programming with Reinforcement Learning in LLMs

Paper • 2507.05687 • Published Jul 8, 2025 • 27

Perception-Aware Policy Optimization for Multimodal Reasoning

Paper • 2507.06448 • Published Jul 8, 2025 • 47

Re:Form -- Reducing Human Priors in Scalable Formal Software Verification with RL in LLMs: A Preliminary Study on Dafny

Paper • 2507.16331 • Published Jul 22, 2025 • 20

Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm

Paper • 2511.04570 • Published Nov 6, 2025 • 211