ReasoningDiversity/openthoughts3-math-300k-embedding-k4-similar Viewer • Updated 25 days ago • 75k • 16
ReasoningDiversity/openthoughts3-math-300k-embedding-k4-diverse Viewer • Updated 25 days ago • 75k • 15
SocialVeil: Probing Social Intelligence of Language Agents under Communication Barriers Paper • 2602.05115 • Published Feb 4 • 18 • 10
Good SFT Optimizes for SFT, Better SFT Prepares for Reinforcement Learning Paper • 2602.01058 • Published Feb 1 • 42
Absolute Zero: Reinforced Self-play Reasoning with Zero Data Paper • 2505.03335 • Published May 6, 2025 • 191
CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training Paper • 2504.13161 • Published Apr 17, 2025 • 97
Building A Proof-Oriented Programmer That Is 64% Better Than GPT-4o Under Data Scarsity Paper • 2502.11901 • Published Feb 17, 2025 • 6
Only-IF:Revealing the Decisive Effect of Instruction Diversity on Generalization Paper • 2410.04717 • Published Oct 7, 2024 • 18