Add vLLM to supported inference engines
#3
by wzhao18 - opened
README.md
CHANGED
````diff
@@ -61,6 +61,7 @@ Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated sys
 ## Software Integration:
 **Runtime Engine(s):** <br>
 * SGLang <br>
+* vLLM <br>
 
 **Supported Hardware Microarchitecture Compatibility:** <br>
 * NVIDIA Blackwell <br>
@@ -95,7 +96,7 @@ The model is quantized with nvidia-modelopt **v0.43.0** <br>
 
 
 ## Inference:
-**Engine:** SGLang <br>
+**Engine:** SGLang, vLLM <br>
 **Test Hardware:** B200 <br>
 
 ## Post Training Quantization
@@ -109,6 +110,17 @@ To serve this checkpoint with [SGLang](https://github.com/sgl-project/sglang), y
 python3 -m sglang.launch_server --model nvidia/MiniMax-M2.5-NVFP4 --tensor-parallel-size 8 --quantization modelopt_fp4 --trust-remote-code --reasoning-parser minimax-append-think --tool-call-parser minimax-m2 --moe-runner-backend flashinfer_cutlass --attention-backend flashinfer
 ```
 
+To serve this checkpoint with [vLLM](https://github.com/vllm-project/vllm), you can launch the docker image `vllm/vllm-openai:latest` and run the sample command (for B200) below:
+
+```sh
+vllm serve nvidia/MiniMax-M2.5-NVFP4 \
+    --tensor-parallel-size 2 \
+    --tool-call-parser minimax_m2 \
+    --reasoning-parser minimax_m2_append_think \
+    --enable-auto-tool-choice \
+    --trust-remote-code
+```
+
 ### Evaluation
 The accuracy benchmark results are presented in the table below:
 <table>
````
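Both launch commands start an OpenAI-compatible HTTP server (vLLM's `vllm serve` listens on port 8000 by default). As a quick smoke test of a deployed checkpoint, here is a minimal client sketch using only the Python standard library; the base URL `http://localhost:8000` and the helper names `build_chat_request`/`chat` are illustrative assumptions, not part of the model card:

```python
import json
import urllib.request


def build_chat_request(model: str, prompt: str) -> dict:
    """Build a minimal OpenAI-compatible /v1/chat/completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }


def chat(base_url: str, model: str, prompt: str) -> str:
    """POST the payload to a running server and return the reply text."""
    payload = build_chat_request(model, prompt)
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # Standard chat-completions response shape: first choice's message content.
    return body["choices"][0]["message"]["content"]
```

With a server running locally, a call like `chat("http://localhost:8000", "nvidia/MiniMax-M2.5-NVFP4", "Hello!")` should return the model's reply; for the SGLang command, substitute that server's host and port.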