Sneha R

Sneha7

AI & ML interests

GenAI

Recent Activity

reacted to sergiopaniego's post with 🔥 about 3 hours ago

ICYMI, you can fine-tune open LLMs using Claude Code

just tell it: “Fine-tune Qwen3-0.6B on open-r1/codeforces-cots” and Claude submits a real training job on HF GPUs using TRL.

it handles everything:
> dataset validation
> GPU selection
> training + Trackio monitoring
> job submission + cost estimation

when it’s done, your model is on the Hub, ready to use

read more about the process: https://huggingface.co/blog/hf-skills-training

reacted to sergiopaniego's post with 🚀 about 3 hours ago

TRL v0.27.0 is out!! 🥳

It includes GDPO, the latest variant of GRPO for multi-reward RL ✨

GDPO decouples reward normalization to avoid reward collapse and improve per-reward convergence, developed by @sliuau @SimonX et al.

Explore the paper: https://huggingface.co/papers/2601.05242

Explore the full set of changes here: https://github.com/huggingface/trl/releases/tag/v0.27.0

upvoted a paper 3 days ago

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Organizations

None yet

spaces 1

Phi2 Helpfulness Grpo Demo

phi2-helpfulness-grpo-demo

models 0

None public yet

datasets 0

None public yet