llm-course-hw2
llm course @ HSE and vk llm

A collection of SmolLM-135M models fine-tuned with DPO, PPO, and reward modeling to enhance human-like expressiveness.

tsessk/llm-course-hw2-dpo • Text Generation • 0.1B • Updated Mar 8 • 8
tsessk/llm-course-hw2-reward-model • Text Classification • 0.1B • Updated Mar 8 • 10
tsessk/llm-course-hw2-ppo • Text Generation • 0.1B • Updated Mar 8 • 6