Gemma fp8: problematic structure? Question regarding the selective "mixed precision" quantization in specific layers (e.g., Layer 46), comfy_quant

#8
by Sikaworld1990 - opened

Hi there,
I was recently analyzing the safetensors structure of this mixed-precision model while researching different quantization strategies. In doing so, I noticed a very specific pattern in the deeper layers, for example in Layer 46.
It appears that several weights in this layer were explicitly excluded from quantization: they lack the .comfy_quant and weight_scale parameters and were presumably kept in BF16/FP16. The affected tensors are:
model.layers.46.mlp.down_proj.weight
model.layers.46.self_attn.k_proj.weight
model.layers.46.self_attn.o_proj.weight
model.layers.46.self_attn.v_proj.weight
However, the query projection in the very same layer does appear to be quantized:
model.layers.46.self_attn.q_proj.comfy_quant
Is this the result of deliberate selective quantization, or a technical artifact of the native ComfyUI quantizer?
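For reference, this is roughly how I checked which weights lack quantization metadata: scan the key set from the safetensors header and flag every .weight tensor with no matching .comfy_quant companion. This is a minimal sketch assuming the companion-key naming seen above; the key list here is just the Layer-46 example, not the full file.

```python
def find_unquantized(keys):
    """Return weight keys that have no matching '.comfy_quant' companion key."""
    key_set = set(keys)
    unquantized = []
    for k in keys:
        if not k.endswith(".weight"):
            continue
        base = k[: -len(".weight")]  # e.g. "model.layers.46.self_attn.q_proj"
        if base + ".comfy_quant" not in key_set:
            unquantized.append(k)
    return sorted(unquantized)

# Layer-46 keys as observed in the checkpoint (sample, not exhaustive):
keys = [
    "model.layers.46.self_attn.q_proj.weight",
    "model.layers.46.self_attn.q_proj.comfy_quant",
    "model.layers.46.self_attn.k_proj.weight",
    "model.layers.46.self_attn.v_proj.weight",
    "model.layers.46.self_attn.o_proj.weight",
    "model.layers.46.mlp.down_proj.weight",
]
print(find_unquantized(keys))
```

With a real file you would populate `keys` via `safetensors.safe_open(path, framework="pt").keys()` instead of hard-coding them; only q_proj comes back as quantized for this layer.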
Is there a specific technical reason why q_proj was quantized here while k_proj, v_proj, o_proj, and down_proj were protected? Was this done via automated sensitivity analysis (e.g., AutoRound/AWQ), or was it a manual choice because q_proj is less sensitive to precision loss in deeper layers for video/diffusion workflows?

My first impression is that this structure could be highly problematic: it looks like random, inconsistent quantization failures across various layers, mixing high and low precision within the same attention blocks, which can cause severe signal mismatch and erratic motion in video diffusion models.