AI & ML interests
DeepRL, RL finetuning
Organizations
skandermoalla/qrpo-paper-llama-sft-leetcode-sandbox-temp1-ref50-offpolicy10random-sandbox
Viewer
• Updated
• 27k • 53
skandermoalla/qrpo-paper-llama-sft-leetcode-sandbox-temp1-ref50-offline-sandbox
skandermoalla/qrpo-paper-mistral-sft-ultrafeedback-armorm-temp1-ref50-offpolicy2random-armorm
Viewer
• Updated
• 62.1k • 114
skandermoalla/qrpo-paper-mistral-sft-ultrafeedback-armorm-temp1-ref50-offpolicy2best-armorm
Viewer
• Updated
• 62.1k • 107
skandermoalla/qrpo-paper-mistral-sft-ultrafeedback-armorm-temp1-ref50-offline-armorm
Viewer
• Updated
• 62.1k • 115
skandermoalla/qrpo-paper-mistral-sft-magpieair-armorm-temp1-ref50-offpolicy2random-armorm
Viewer
• Updated
• 91.9k • 245
skandermoalla/qrpo-paper-mistral-sft-magpieair-armorm-temp1-ref50-offpolicy2best-armorm
Viewer
• Updated
• 91.9k • 101
skandermoalla/qrpo-paper-mistral-sft-magpieair-armorm-temp1-ref50-offline-armorm
Viewer
• Updated
• 91.9k • 188
skandermoalla/qrpo-paper-mistral-nosft-ultrafeedback-armorm-temp1-ref50-offpolicy2random-armorm
Viewer
• Updated
• 62.5k • 55
skandermoalla/qrpo-paper-mistral-nosft-ultrafeedback-armorm-temp1-ref50-offpolicy2best-armorm
Viewer
• Updated
• 62.5k • 78
skandermoalla/qrpo-paper-mistral-nosft-ultrafeedback-armorm-temp1-ref50-offline-armorm
Viewer
• Updated
• 62.5k • 64
skandermoalla/qrpo-paper-mistral-nosft-magpieair-armorm-temp1-ref50-offpolicy2random-armorm
Viewer
• Updated
• 99k • 235
skandermoalla/qrpo-paper-mistral-nosft-magpieair-armorm-temp1-ref50-offpolicy2best-armorm
Viewer
• Updated
• 99k • 114
skandermoalla/qrpo-paper-mistral-nosft-magpieair-armorm-temp1-ref50-offline-armorm
Viewer
• Updated
• 99.1k • 176
skandermoalla/qrpo-paper-llama-sft-ultrafeedback-armorm-temp1-ref50-offpolicy2random-armorm
Viewer
• Updated
• 62k • 41
skandermoalla/qrpo-paper-llama-sft-ultrafeedback-armorm-temp1-ref50-offpolicy2best-armorm
Viewer
• Updated
• 62k • 48
skandermoalla/qrpo-paper-llama-sft-ultrafeedback-armorm-temp1-ref50-offline-armorm
Viewer
• Updated
• 62k • 63
skandermoalla/qrpo-paper-llama-sft-magpieair-armorm-temp1-ref50-offpolicy2random-armorm
Viewer
• Updated
• 100k • 95
skandermoalla/qrpo-paper-llama-sft-magpieair-armorm-temp1-ref50-offpolicy2best-armorm
Viewer
• Updated
• 100k • 165
skandermoalla/qrpo-paper-llama-sft-magpieair-armorm-temp1-ref50-offline-armorm
Viewer
• Updated
• 100k • 71
skandermoalla/qrpo-paper-llama-nosft-ultrafeedback-armorm-temp1-ref50-offpolicy2random-armorm
Viewer
• Updated
• 61.6k • 55
skandermoalla/qrpo-paper-llama-nosft-ultrafeedback-armorm-temp1-ref50-offpolicy2best-armorm
Viewer
• Updated
• 61.6k • 61
skandermoalla/qrpo-paper-llama-nosft-ultrafeedback-armorm-temp1-ref50-offline-armorm
Viewer
• Updated
• 61.6k • 165
skandermoalla/qrpo-paper-llama-nosft-magpieair-armorm-temp1-ref50-offpolicy2random-armorm
Viewer
• Updated
• 93.8k • 55
skandermoalla/qrpo-paper-llama-nosft-magpieair-armorm-temp1-ref50-offpolicy2best-armorm
Viewer
• Updated
• 93.8k • 68
skandermoalla/qrpo-paper-llama-nosft-magpieair-armorm-temp1-ref50-offline-armorm
Viewer
• Updated
• 96.6k • 132
skandermoalla/qrpo-paper-llama-nosft-leetcode-sandbox-temp1-ref50-offpolicy10random-sandbox
Viewer
• Updated
• 26.6k • 156
skandermoalla/qrpo-paper-llama-nosft-leetcode-sandbox-temp1-ref50-offline-sandbox