Hugging Face
Sneha R
Sneha7
0 followers · 16 following
AI & ML interests
GenAI
Recent Activity
reacted to sergiopaniego's post with 🔥 about 3 hours ago
ICYMI, you can fine-tune open LLMs using Claude Code. Just tell it: “Fine-tune Qwen3-0.6B on open-r1/codeforces-cots” and Claude submits a real training job on HF GPUs using TRL. It handles everything:
> dataset validation
> GPU selection
> training + Trackio monitoring
> job submission + cost estimation
When it’s done, your model is on the Hub, ready to use. Read more about the process: https://huggingface.co/blog/hf-skills-training
reacted to sergiopaniego's post with 🚀 about 3 hours ago
TRL v0.27.0 is out!! 🥳 It includes GDPO, the latest variant of GRPO for multi-reward RL ✨
GDPO decouples reward normalization to avoid reward collapse and improve per-reward convergence. Developed by @sliuau @SimonX et al.
Explore the paper: https://huggingface.co/papers/2601.05242
Explore the full set of changes here: https://github.com/huggingface/trl/releases/tag/v0.27.0
upvoted a paper 3 days ago
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Organizations
None yet
spaces
1
Phi2 Helpfulness Grpo Demo 🐨 (phi2-helpfulness-grpo-demo): Runtime error
models
0
None public yet
datasets
0
None public yet