I. Pretrained MTPs of MiMo-7B

This model repository contains the pretrained MTP (Multi-Token Prediction) weights of MiMo-7B: model.mtp_layers.1 and model.mtp_layers.2.

Currently, each MiMo-7B model has 1 MTP layer (model.mtp_layers.0). Users may load the pretrained MTP weights from this repository as additional layers for a potential rollout speedup (please refer to "Power Up Speculative Decoding In Reinforcement Learning").

We tuned 1 MTP layer during SFT and froze it during RL. We HAVE NOT tested the performance of the post-trained models with the 2 additional pretrained MTP layers.
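
As a minimal sketch of what loading these weights could look like, the snippet below merges the extra MTP tensors into an existing MiMo-7B state dict by key prefix. The function name `merge_mtp_weights` and the assumption that both checkpoints are available as plain name-to-tensor mappings are ours, not part of the released code. Only the prefixes model.mtp_layers.1 and model.mtp_layers.2 come from this repository's description. How the serving or training framework then uses the extra layers is outside the scope of this sketch.

```python
from typing import Any, Dict, Mapping, Tuple

# Key prefixes of the two extra MTP layers described in this repository.
MTP_PREFIXES: Tuple[str, ...] = ("model.mtp_layers.1.", "model.mtp_layers.2.")


def merge_mtp_weights(
    base_state: Mapping[str, Any],
    mtp_state: Mapping[str, Any],
    prefixes: Tuple[str, ...] = MTP_PREFIXES,
) -> Dict[str, Any]:
    """Return a copy of base_state extended with the extra MTP tensors.

    Hypothetical helper: both arguments are assumed to be plain
    name -> tensor mappings (for example, loaded from safetensors files).
    """
    merged = dict(base_state)
    for name, tensor in mtp_state.items():
        # Ignore anything that is not one of the extra MTP layers.
        if not name.startswith(prefixes):
            continue
        # Refuse to silently overwrite a tensor the base model already has.
        if name in merged:
            raise KeyError(f"tensor already present in base model: {name}")
        merged[name] = tensor
    return merged
```

The helper copies the base mapping rather than mutating it, and it raises on a name collision so that the tuned model.mtp_layers.0 of a post-trained model is never replaced by accident.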

II. Contact

Please contact us at mimo@xiaomi.com or open an issue if you have any questions.

Safetensors: model size 0.4B params, tensor type BF16

Paper for XiaomiMiMo/MiMo-7B-MTPs: "MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining" (2505.07608, published May 12, 2025)