[Question] gate_proj and up_proj weight_scale_2 mismatch in layer 3 of Kimi-K2.5-NVFP4

by MickJ - opened 21 days ago

Hi, thank you for releasing the Kimi-K2.5-NVFP4 quantized checkpoint!

We inspected the checkpoint and found that, in every MoE layer except layer 3, the per-tensor weight_scale_2 values for gate_proj and up_proj are bit-for-bit identical for every expert, which suggests they were intentionally aligned during quantization.

Layer 3 is the exception: all 384 experts in this layer have independently calibrated gate_proj and up_proj scales, with relative differences of up to ~70% in a few outlier experts.

It appears that during quantization, gate_proj and up_proj in layer 3 were calibrated independently rather than being treated as a single fused matrix.
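
For anyone who wants to reproduce the comparison, here is a minimal sketch of the kind of check described above. It is not the exact script we ran. The tensor-name pattern (`layers.<L>.mlp.experts.<E>.<proj>.weight_scale_2`), the helper names, and the relative-difference definition (`|gate - up| / max(|gate|, |up|)`) are assumptions for illustration, so adjust them to the actual key layout in the checkpoint.

```python
import re
import struct
from collections import defaultdict

# Assumed key layout; the real checkpoint may use a different prefix or naming.
KEY_RE = re.compile(
    r"layers\.(\d+)\.mlp\.experts\.(\d+)\.(gate_proj|up_proj)\.weight_scale_2$"
)


def f32_bits(x):
    """Return the float32 byte pattern of x, for a bit-for-bit comparison."""
    return struct.pack("<f", float(x))


def scale_mismatches(scales):
    """Find experts whose gate_proj and up_proj weight_scale_2 differ.

    scales: mapping of tensor name -> scalar weight_scale_2 value.
    Returns {layer: [(expert, gate_scale, up_scale, rel_diff), ...]}.
    """
    pairs = defaultdict(dict)
    for name, value in scales.items():
        m = KEY_RE.search(name)
        if m:
            layer, expert, proj = int(m.group(1)), int(m.group(2)), m.group(3)
            pairs[(layer, expert)][proj] = float(value)

    report = defaultdict(list)
    for (layer, expert), p in sorted(pairs.items()):
        if "gate_proj" not in p or "up_proj" not in p:
            continue  # skip incomplete pairs
        g, u = p["gate_proj"], p["up_proj"]
        if f32_bits(g) != f32_bits(u):
            denom = max(abs(g), abs(u))
            rel = abs(g - u) / denom if denom else 0.0
            report[layer].append((expert, g, u, rel))
    return dict(report)


# Hypothetical values, only to show the output shape.
demo = {
    "model.layers.2.mlp.experts.0.gate_proj.weight_scale_2": 0.5,
    "model.layers.2.mlp.experts.0.up_proj.weight_scale_2": 0.5,
    "model.layers.3.mlp.experts.0.gate_proj.weight_scale_2": 0.25,
    "model.layers.3.mlp.experts.0.up_proj.weight_scale_2": 1.0,
}
result = scale_mismatches(demo)
print(result)
```

To run this on the real checkpoint, the `scales` dict could be filled by iterating over the shards with `safetensors.safe_open` and calling `.item()` on each `weight_scale_2` tensor. A layer whose entry lists every expert would correspond to what we see in layer 3.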

Could you confirm whether this was intentional? If not, would it be possible to release an updated checkpoint with aligned scales for layer 3?

Thanks in advance for your time!
