Delving into Latent Spectral Biasing of Video VAEs for Superior Diffusability

Website arXiv

Most existing video VAEs prioritize reconstruction fidelity and often overlook how the structure of the latent space affects downstream diffusion training. Through statistical analysis of VAE latents, we identify properties of video VAE latent spaces that facilitate diffusion training. Our key finding is that biased, rather than uniform, spectra lead to improved diffusability. Motivated by this, we introduce SSVAE (Spectral-Structured VAE), which optimizes the spectral properties of the latent space to enhance its diffusability.

Figure 1

🔥 Key Highlights

  • Spectral Analysis of Latents: We identify two statistical properties essential for efficient diffusion training: a low-frequency biased spatio-temporal spectrum and a few-mode biased channel eigenspectrum.
  • Local Correlation Regularization (LCR): A lightweight regularizer that explicitly enhances local spatio-temporal correlations to induce low-frequency bias.
  • Latent Masked Reconstruction (LMR): A mechanism that simultaneously promotes few-mode bias and improves decoder robustness against noise.
  • Superior Performance:
    • 🚀 3× Faster Convergence: Accelerates text-to-video generation convergence by 3× compared to strong baselines.
    • 📈 Higher Quality: Achieves a 10% gain in video reward scores (UnifiedReward).
    • 🏆 Outperforms SOTA: Surpasses open-source VAEs (e.g., Wan 2.2, CogVideoX) in generation quality with fewer parameters.
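
The two statistical properties named above can be probed with simple diagnostics. The following is a minimal sketch, not the measurement code used in the paper: it assumes a single latent laid out as a NumPy array of shape `(C, T, H, W)`, and the function names, the `cutoff` parameter, and the radial low-frequency definition are all illustrative assumptions.

```python
import numpy as np


def low_frequency_energy_fraction(latent, cutoff=0.25):
    """Share of spatio-temporal spectral power at low frequencies.

    latent: array of shape (C, T, H, W) (assumed layout).
    cutoff: hypothetical threshold, as a fraction of the largest
            radial frequency present in the FFT grid.
    """
    # Remove each channel's mean so the DC term does not dominate.
    x = latent - latent.mean(axis=(1, 2, 3), keepdims=True)
    # Power spectrum over (T, H, W), summed across channels.
    power = (np.abs(np.fft.fftn(x, axes=(1, 2, 3))) ** 2).sum(axis=0)

    _, t, h, w = latent.shape
    ft = np.fft.fftfreq(t)[:, None, None]
    fh = np.fft.fftfreq(h)[None, :, None]
    fw = np.fft.fftfreq(w)[None, None, :]
    radius = np.sqrt(ft ** 2 + fh ** 2 + fw ** 2)

    total = power.sum()
    if total == 0.0:
        return 0.0  # constant latent: no spectral content to measure
    low = power[radius <= cutoff * radius.max()].sum()
    return float(low / total)


def channel_eigenspectrum(latent):
    """Normalized eigenvalues of the channel covariance, largest first.

    A few-mode biased latent concentrates most of the mass in the
    first few entries; a uniform one spreads it evenly.
    """
    c = latent.shape[0]
    flat = latent.reshape(c, -1)
    flat = flat - flat.mean(axis=1, keepdims=True)
    cov = flat @ flat.T / flat.shape[1]
    eig = np.linalg.eigvalsh(cov)[::-1]  # eigvalsh returns ascending order
    eig = np.clip(eig, 0.0, None)        # guard against tiny negative values
    return eig / eig.sum()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = rng.standard_normal((8, 8, 16, 16))  # stand-in for a real latent
    print("low-frequency share:", low_frequency_energy_fraction(z))
    print("top eigenvalue share:", channel_eigenspectrum(z)[0])
```

Under these assumptions, a latent with stronger local spatio-temporal correlation should yield a larger low-frequency share, and a latent dominated by a few channel modes should yield a larger leading eigenvalue share; comparing the two numbers across VAEs gives a rough view of how biased each latent space is.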

Using the Model

Please see our GitHub repository.

Citation

If you find this work useful in your research, please consider citing:

@misc{liu2025delvinglatentspectralbiasing,
      title={Delving into Latent Spectral Biasing of Video VAEs for Superior Diffusability}, 
      author={Shizhan Liu and Xinran Deng and Zhuoyi Yang and Jiayan Teng and Xiaotao Gu and Jie Tang},
      year={2025},
      eprint={2512.05394},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2512.05394}, 
}