MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining
Paper: arXiv 2505.07608
How to use XiaomiMiMo/MiMo-7B-MTPs with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("feature-extraction", model="XiaomiMiMo/MiMo-7B-MTPs", trust_remote_code=True)
# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("XiaomiMiMo/MiMo-7B-MTPs", trust_remote_code=True, dtype="auto")
This model repository is licensed under the MIT License.
This model repository contains the pretrained MTP (Multi-Token Prediction) weights of MiMo-7B (model.mtp_layers.1 and model.mtp_layers.2).
Currently, each MiMo-7B model has 1 MTP layer (model.mtp_layers.0). Users may load the weights of the pretrained MTP layers for a potential rollout speedup (please refer to "Power Up Speculative Decoding In Reinforcement Learning").
We tuned 1 MTP layer during SFT and froze it during RL, and we HAVE NOT tested the performance of post-trained models with the 2 additional pretrained MTP layers.
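As a rough illustration of what adding the extra layers might look like at the state-dict level, the sketch below groups parameter names by MTP layer index and merges layers 1 and 2 into a base set of weights without touching the tuned layer 0. Only the `model.mtp_layers.<i>` prefix comes from this card. The helper names `group_mtp_keys` and `merge_mtp_weights`, the parameter suffixes, and the placeholder values are hypothetical. Plain dicts stand in for real tensors, and any configuration change the model may need to use the extra layers is not covered.

```python
import re
from collections import defaultdict

# Prefix taken from the model card; whatever follows the layer index in the
# examples below is a hypothetical parameter name used only for illustration.
MTP_KEY = re.compile(r"^model\.mtp_layers\.(\d+)\.")


def group_mtp_keys(keys):
    """Map each MTP layer index to the parameter names that belong to it."""
    groups = defaultdict(list)
    for key in keys:
        match = MTP_KEY.match(key)
        if match:
            groups[int(match.group(1))].append(key)
    return dict(groups)


def merge_mtp_weights(base_state, mtp_state):
    """Return a new state dict with the MTP layers of mtp_state added.

    Raises ValueError if an incoming layer index already exists in
    base_state, so the tuned layer 0 is never silently overwritten.
    """
    new_layers = group_mtp_keys(mtp_state)
    clash = set(group_mtp_keys(base_state)) & set(new_layers)
    if clash:
        raise ValueError(f"MTP layers already present: {sorted(clash)}")
    merged = dict(base_state)  # shallow copy; base_state is left untouched
    for names in new_layers.values():
        for name in names:
            merged[name] = mtp_state[name]
    return merged


# Hypothetical parameter names and placeholder values instead of tensors.
base = {
    "model.embed_tokens.weight": "w_embed",
    "model.mtp_layers.0.proj.weight": "w_mtp0",
}
extra = {
    "model.mtp_layers.1.proj.weight": "w_mtp1",
    "model.mtp_layers.2.proj.weight": "w_mtp2",
}
merged = merge_mtp_weights(base, extra)
print(sorted(group_mtp_keys(merged)))  # [0, 1, 2]
```

Refusing to overwrite an existing layer index is a deliberate choice in this sketch: it keeps the post-trained layer 0 intact when the pretrained layers are added alongside it.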
Please contact us at mimo@xiaomi.com or open an issue if you have any questions.