How to use EricRollei/HunyuanImage-3.0-Instruct-Distil-NF4-v2 with Transformers:
# Load model directly
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("EricRollei/HunyuanImage-3.0-Instruct-Distil-NF4-v2", trust_remote_code=True, dtype="auto")
"""
Quick loader for the NF4-quantized HunyuanImage-3.0-Instruct-Distil model.
Generated automatically by hunyuan_quantize_instruct_distil_nf4.py

This model is optimized for fast inference:
- CFG distillation: no classifier-free guidance needed (no batch doubling)
- Meanflow: improved sampling in 8 steps
- MoE: only 13B active params despite 80B total
- NF4: 4-bit weights, roughly 40 GB for 80B parameters (weights only)
"""
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig


# The default path is a raw string so the backslashes in the Windows path
# are not interpreted as escape sequences.
def load_quantized_instruct_distil_nf4(model_path=r"H:\Testing\HunyuanImage-3.0-Instruct-Distil-NF4-v2"):
    """Load the NF4-quantized HunyuanImage-3.0-Instruct-Distil model."""
    # 4-bit NormalFloat weights, double-quantized scales, bf16 compute.
    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_use_double_quant=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    model = AutoModelForCausalLM.from_pretrained(
        model_path,
        quantization_config=quant_config,
        device_map="cuda:0",  # place everything on one GPU; it must hold all weights
        trust_remote_code=True,  # the architecture ships as custom code in the repo
        torch_dtype=torch.bfloat16,
        attn_implementation="sdpa",
    )
    # load_tokenizer is provided by the model's custom (remote) code.
    model.load_tokenizer(model_path)
    return model


if __name__ == "__main__":
    print("Loading NF4 quantized Instruct-Distil model...")
    model = load_quantized_instruct_distil_nf4()
    print("Model loaded successfully!")
    print(f"Device map: {model.hf_device_map}")
    if torch.cuda.is_available():
        # 1024**3 bytes is one GiB, so the figures are reported in GiB.
        print(f"GPU memory allocated: {torch.cuda.memory_allocated() / 1024**3:.2f} GiB")
        print(f"GPU memory reserved: {torch.cuda.memory_reserved() / 1024**3:.2f} GiB")
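
Before loading onto a single device, it can help to estimate how much memory the quantized weights alone would need. The sketch below is a back-of-envelope calculation, not part of the generated loader. It assumes the 80B total parameter count given in the loader's docstring, that every parameter is stored in NF4, and the usual bitsandbytes layout of 64-weight blocks with double quantization over 256-block groups. The function names are hypothetical. Real checkpoints often keep some layers in higher precision, and activations and caches add to the total, so treat the result as a lower bound.

```python
def nf4_bits_per_param(block_size: int = 64,
                       double_quant: bool = True,
                       dq_block_size: int = 256) -> float:
    """Approximate storage cost per weight for block-wise NF4 (hypothetical helper)."""
    if double_quant:
        # One 8-bit quantized scale per block, plus one fp32 constant
        # shared by every dq_block_size blocks.
        overhead = 8 / block_size + 32 / (block_size * dq_block_size)
    else:
        # One fp32 scale per block.
        overhead = 32 / block_size
    return 4 + overhead


def estimate_weight_gib(n_params: float, bits_per_param: float) -> float:
    """Convert a parameter count and per-weight cost into GiB (weights only)."""
    return n_params * bits_per_param / 8 / 1024**3


if __name__ == "__main__":
    total_params = 80e9  # assumption: total count from the loader's docstring
    bits = nf4_bits_per_param()
    print(f"Bits per parameter: {bits:.3f}")
    print(f"Estimated weight memory: {estimate_weight_gib(total_params, bits):.1f} GiB")
```

Comparing this estimate with the free memory reported by `torch.cuda.mem_get_info()` gives a quick indication of whether `device_map="cuda:0"` is realistic on a given card.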