Bugai's Collection
Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning
Paper • 2508.20751 • Published • 89
TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling
Paper • 2508.17445 • Published • 80
VoxHammer: Training-Free Precise and Coherent 3D Editing in Native 3D Space
Paper • 2508.19247 • Published • 43
VibeVoice Technical Report
Paper • 2508.19205 • Published • 141
USO: Unified Style and Subject-Driven Generation via Disentangled and Reward Learning
Paper • 2508.18966 • Published • 56
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
Paper • 2509.02547 • Published • 228
SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning
Paper • 2509.02479 • Published • 83
LLaVA-Critic-R1: Your Critic Model is Secretly a Strong Policy Model
Paper • 2509.00676 • Published • 84
VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use
Paper • 2509.01055 • Published • 76
Gated Associative Memory: A Parallel O(N) Architecture for Efficient Sequence Modeling
Paper • 2509.00605 • Published • 42
Open Data Synthesis For Deep Research
Paper • 2509.00375 • Published • 70
DeepResearch Arena: The First Exam of LLMs' Research Abilities via Seminar-Grounded Tasks
Paper • 2509.01396 • Published • 57
Spatial Forcing: Implicit Spatial Representation Alignment for Vision-language-action Model
Paper • 2510.12276 • Published • 146
Agent Lightning: Train ANY AI Agents with Reinforcement Learning
Paper • 2508.03680 • Published • 122
Brain-IT: Image Reconstruction from fMRI via Brain-Interaction Transformer
Paper • 2510.25976 • Published • 14
Don't Blind Your VLA: Aligning Visual Representations for OOD Generalization
Paper • 2510.25616 • Published • 96
VCode: a Multimodal Coding Benchmark with SVG as Symbolic Visual Representation
Paper • 2511.02778 • Published • 101
When Visualizing is the First Step to Reasoning: MIRA, a Benchmark for Visual Chain-of-Thought
Paper • 2511.02779 • Published • 58
Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm
Paper • 2511.04570 • Published • 211
V-Thinker: Interactive Thinking with Images
Paper • 2511.04460 • Published • 97
Scaling Agent Learning via Experience Synthesis
Paper • 2511.03773 • Published • 81
The Strong Lottery Ticket Hypothesis for Multi-Head Attention Mechanisms
Paper • 2511.04217 • Published • 16
HaluMem: Evaluating Hallucinations in Memory Systems of Agents
Paper • 2511.03506 • Published • 93
IterResearch: Rethinking Long-Horizon Agents via Markovian State Reconstruction
Paper • 2511.07327 • Published • 76
SofT-GRPO: Surpassing Discrete-Token LLM Reinforcement Learning via Gumbel-Reparameterized Soft-Thinking Policy Optimization
Paper • 2511.06411 • Published • 17