---
license: mit
datasets:
  - APRIL-AIGC/UltraVideo
  - DropletX/DropletVideo-10M
language:
  - en
---

Pyramidal Spectrum

Frequency-based Hierarchically Vector Quantized VAE for Videos

Official Implementation (WACV 2026)

This repository provides the official implementation of the paper:

Pyramidal Spectrum: Frequency-based Hierarchically Vector Quantized VAE for Videos
Accepted at WACV 2026

We introduce a video autoencoder, trained on 4K-resolution video data, that uses hierarchical frequency-based vector quantization.
The model relies on a pyramidal spectral representation to produce high-fidelity video reconstructions with an efficient latent structure.
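The sketch below illustrates the general idea of a pyramidal spectral representation. It uses a generic Haar-style frequency pyramid on a 1-D signal. At each level the signal is split into a low-frequency band (pairwise averages) and a high-frequency detail band (pairwise half-differences), and the bands can be recombined exactly. This is an illustrative stand-in, not the paper's decomposition. The helper names `haar_split`, `haar_pyramid` and `reconstruct` are hypothetical, and the actual model operates on video tensors rather than short lists.

```python
from typing import List, Tuple


def haar_split(x: List[float]) -> Tuple[List[float], List[float]]:
    """One Haar step: pairwise averages (low band) and half-differences (high band)."""
    if len(x) % 2:
        raise ValueError("signal length must be even at every level")
    low = [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) / 2 for i in range(0, len(x), 2)]
    return low, high


def haar_pyramid(x: List[float], levels: int) -> Tuple[List[float], List[List[float]]]:
    """Hypothetical helper: repeatedly split off high-frequency detail bands.

    Returns the coarsest low-frequency band and the detail bands, finest first.
    """
    details: List[List[float]] = []
    low = list(x)
    for _ in range(levels):
        low, high = haar_split(low)
        details.append(high)
    return low, details


def reconstruct(low: List[float], details: List[List[float]]) -> List[float]:
    """Invert haar_pyramid by merging detail bands back in, coarsest first."""
    x = list(low)
    for high in reversed(details):
        merged: List[float] = []
        for a, d in zip(x, high):
            merged.extend([a + d, a - d])  # a + d = x[2i], a - d = x[2i + 1]
        x = merged
    return x


signal = [4.0, 2.0, 5.0, 7.0, 1.0, 1.0, 8.0, 0.0]
low, details = haar_pyramid(signal, levels=2)
print(low)      # [4.5, 2.5]
print(details)  # [[1.0, -1.0, 0.0, 4.0], [-1.5, -1.5]]
print(reconstruct(low, details) == signal)  # True
```

Each level halves the length of the low band, so most of the signal's coarse structure ends up in a few coefficients. The detail bands hold the finer, higher-frequency content.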


πŸ“¦ Installation

This implementation requires installing Diffusers from the custom MMVQVae branch of the Onkarsus13/diffusers fork:

pip install git+https://github.com/Onkarsus13/diffusers@MMVQVae

🚀 Features

  • Novel hierarchical frequency-domain quantization
  • Trained on 4K-resolution video datasets
  • Multi-level pyramidal spectral decomposition
  • Highly efficient latent video representation
  • High-quality reconstructions suitable for generative pipelines
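The following sketch shows one possible reading of hierarchical frequency-domain quantization. Each band of a frequency pyramid gets its own codebook, and every vector in a band is replaced by the index of its nearest codeword. The latent therefore becomes a small set of integer codes per level. This is a generic nearest-neighbour vector quantization sketch under that assumption, not the paper's quantizer. The band names, the hand-picked codebooks, and the helpers `quantize_levels` and `dequantize_levels` are hypothetical. A trained model would learn its codebooks instead of fixing them by hand.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]


def nearest_code(v: Sequence[float], codebook: Sequence[Vector]) -> int:
    """Index of the codebook entry closest to v under squared Euclidean distance."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(v, codebook[k])),
    )


def quantize_levels(
    levels: Dict[str, List[Vector]], codebooks: Dict[str, List[Vector]]
) -> Dict[str, List[int]]:
    """Hypothetical helper: quantize each pyramid level with its own codebook."""
    return {name: [nearest_code(v, codebooks[name]) for v in vecs] for name, vecs in levels.items()}


def dequantize_levels(
    codes: Dict[str, List[int]], codebooks: Dict[str, List[Vector]]
) -> Dict[str, List[Vector]]:
    """Map integer codes back to their codewords, level by level."""
    return {name: [codebooks[name][k] for k in idx] for name, idx in codes.items()}


# Hand-picked toy codebooks: 2 codewords for the low band, 3 for the high band.
codebooks = {
    "low": [(0.0, 0.0), (4.0, 4.0)],
    "high": [(0.0, 0.0), (1.0, -1.0), (-1.0, 1.0)],
}
levels = {"low": [(4.5, 2.5)], "high": [(1.0, -1.0), (0.0, 4.0)]}

codes = quantize_levels(levels, codebooks)
print(codes)  # {'low': [1], 'high': [1, 2]}
print(dequantize_levels(codes, codebooks))
```

Storing indices instead of floats is what makes a VQ latent compact. With K codewords in a level, each vector in that level costs ceil(log2 K) bits. Here that is 1 bit for the low band and 2 bits for the high band.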

Citation

@inproceedings{pyramidal_spectrum_wacv2026,
  title     = {Pyramidal Spectrum: Frequency-based Hierarchically Vector Quantized VAE for Videos},
  author    = {Prakash, Tushar and Susladkar, Onkar and Dhillon, Inderjit and Mittal, Sparsh},
  booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
  year      = {2026}
}