Training data: pixparse/cc3m-wds
How to use gabehubner/vae-256px-8z with Diffusers:

```shell
pip install -U diffusers transformers accelerate
```

```python
import torch
from diffusers import DiffusionPipeline

# Switch device_map to "mps" for Apple Silicon devices.
pipe = DiffusionPipeline.from_pretrained(
    "gabehubner/vae-256px-8z",
    dtype=torch.bfloat16,
    device_map="cuda",
)

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt).images[0]
```

This model is a UNet-style Variational Autoencoder (VAE) trained on the CC3M dataset for high-quality image reconstruction and generation. It combines adversarial, perceptual, and identity-preserving loss terms to improve semantic and visual fidelity.
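The VAE objective mentioned above pairs a reconstruction term with a KL regularizer on the latent posterior. As a minimal sketch (not the model's actual code), the closed-form KL between a diagonal Gaussian posterior and a standard normal prior, together with the reparameterization trick, looks like this in plain Python:

```python
import math
import random

def kl_divergence(mu, logvar):
    """Closed-form KL( N(mu, sigma^2) || N(0, I) ), summed over latent dims."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, logvar))

def reparameterize(mu, logvar):
    """Sample z = mu + sigma * eps so gradients flow through mu and logvar."""
    return [m + math.exp(0.5 * lv) * random.gauss(0.0, 1.0)
            for m, lv in zip(mu, logvar)]

# A posterior that already matches the prior incurs no KL cost.
zero_kl = kl_divergence([0.0, 0.0], [0.0, 0.0])
```

In the full model this KL term is weighted against the reconstruction, perceptual, and adversarial losses; the annealing of that weight is listed in the table below.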
| Hyperparameter | Value |
|---|---|
| Dataset | CC3M (850k images) |
| Image Resolution | 256 x 256 |
| Batch Size | 16 |
| Optimizer | AdamW |
| Learning Rate | 5e-5 |
| Precision | bf16 (mixed precision) |
| Total Steps | 210,000 |
| GAN Start Step | 50,000 |
| KL Annealing | Yes (10% of training) |
| Augmentations | Crop, flip, jitter, blur, rotation |
Trained with a cosine learning rate schedule, gradient clipping, and automatic mixed precision (`torch.cuda.amp`).