deqing/convergent-llama-300M-adamw-addition_3digit_seed123 Text Generation • 0.3B • Updated about 1 month ago • 764
deqing/convergent-llama-300M-adamw-addition_3digit Text Generation • 0.3B • Updated about 1 month ago • 429
deqing/convergent-llama-300M-muon-addition_3digit Text Generation • 0.3B • Updated about 1 month ago • 407
deqing/convergent-lstm-12layer-muon-original Text Generation • 0.2B • Updated about 1 month ago • 182
deqing/convergent-mamba2-300M-adamw-original Text Generation • 0.3B • Updated about 1 month ago • 283