Distilled Reasoning Models with Activation Sparse
AI & ML interests
ML algorithms and systems
Reproductions of DeepSeek distilled models, based on open-r1.
InfiniAILab/OpenR1-Qwen-3B-SFT-Instruct
Text Generation • 3B • Updated • 5 • 1
InfiniAILab/OpenR1-Qwen-7B-SFT-Instruct
Text Generation • 8B • Updated • 3 • 2
InfiniAILab/OpenR1-Qwen-7B-Math-Instruct
Text Generation • 8B • Updated • 4
InfiniAILab/OpenR1-Qwen-1.5B-SFT-Instruct
Text Generation • 2B • Updated • 4
models
96
InfiniAILab/Autoregressive-7B-2
2B • Updated • 4
InfiniAILab/Autoregressive-7B
1.0B • Updated • 3 • 1
InfiniAILab/Multiverse-7B
1B • Updated • 59
InfiniAILab/Autoregressive-1.5B-2
0.2B • Updated • 6
InfiniAILab/Autoregressive-1.5B
0.2B • Updated • 3 • 1
InfiniAILab/Autoregressive-1.5B-no-structure
0.2B • Updated • 3
InfiniAILab/Multiverse-1.5B
0.2B • Updated • 3 • 1
InfiniAILab/S1-claude-1K-32B-bs16-new-tokenizer
33B • Updated • 1
InfiniAILab/S1-claude-1K-32B-bs16
33B • Updated • 2
InfiniAILab/S1.1-1K-32B-bs16-new-tokenizer-parallel-7.1-v6-true-mix-prompt
33B • Updated • 1
datasets
22
InfiniAILab/multiverse-sample
Updated • 10
InfiniAILab/gsm_infinite_symbolic_32k
Updated • 363
InfiniAILab/gsm_infinite_hard_128k
Viewer • Updated • 12.3k • 317
InfiniAILab/gsm_infinite_symbolic_16k
Updated • 549
InfiniAILab/gsm_infinite_medium_128k
Viewer • Updated • 12.7k • 402
InfiniAILab/gsm_infinite_symbolic_8k
Updated • 662
InfiniAILab/gsm_infinite_hard_64k
Viewer • Updated • 12.3k • 63
InfiniAILab/gsm_infinite_symbolic_0
Updated • 513
InfiniAILab/gsm_infinite_medium_64k
Viewer • Updated • 21.3k • 101
InfiniAILab/gsm_infinite_symbolic_128k
Updated • 176