VisualSphinx: Large-Scale Synthetic Vision Logic Puzzles for RL Paper • 2505.23977 • Published May 29, 2025 • 10
Personalized Safety in LLMs: A Benchmark and A Planning-Based Agent Approach Paper • 2505.18882 • Published May 24, 2025 • 14
TinyV: Reducing False Negatives in Verification Improves RL for LLM Reasoning Paper • 2505.14625 • Published May 20, 2025 • 13
Small Models Struggle to Learn from Strong Reasoners Paper • 2502.12143 • Published Feb 17, 2025 • 39
GRAPE: Generalizing Robot Policy via Preference Alignment Paper • 2411.19309 • Published Nov 28, 2024 • 47
AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases Paper • 2407.12784 • Published Jul 17, 2024 • 51