---
license: mit
datasets:
- APRIL-AIGC/UltraVideo
- DropletX/DropletVideo-10M
language:
- en
---

# Pyramidal Spectrum

### Frequency-based Hierarchically Vector Quantized VAE for Videos

**Official Implementation (WACV 2026)**


This repository provides the **official implementation** of the paper:

**Pyramidal Spectrum: Frequency-based Hierarchically Vector Quantized VAE for Videos**
*Accepted at WACV 2026*

We introduce a **new autoencoder trained on 4K-resolution video data**, featuring a **hierarchical frequency-based vector quantization** method.
The model leverages a **pyramidal spectral representation** to produce high-fidelity video reconstructions with an efficient latent structure.

---

## 📦 Installation

This implementation requires installing Diffusers from the custom branch:

```bash
pip install git+https://github.com/Onkarsus13/diffusers@MMVQVae
```
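
After running the pip command above, a quick way to confirm that a `diffusers` package is visible to your Python environment is the small check below. It is only a sketch: the helper name `installed_version` is hypothetical, and the check reports the installed version string without being able to tell whether the package came from the custom branch or from PyPI.

```python
from importlib import metadata


def installed_version(package: str = "diffusers"):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("diffusers is not installed; run the pip command above first.")
    else:
        print(f"diffusers {version} is installed.")
```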

## Features

- Novel **hierarchical frequency-domain quantization**
- Trained on **4K-resolution** video datasets
- Multi-level **pyramidal spectral decomposition**
- Highly efficient latent video representation
- High-quality reconstructions suitable for generative pipelines
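
To give a rough intuition for the ideas listed above, the following NumPy sketch splits a single 2D frame into octave-spaced frequency bands and quantizes each band with its own codebook. It is a toy illustration under stated assumptions, not the model in this repository: the radial FFT masks, the 4x4 patch vectors, the untrained codebooks sampled from the data, and every function name here are hypothetical choices made for the example. The actual model uses learned components and operates on video.

```python
import numpy as np


def split_frequency_bands(frame, num_levels=3):
    """Split a 2D frame into bands ordered from lowest to highest frequency.

    Bands are disjoint radial masks in the centred 2D FFT domain, so they sum to the input.
    """
    h, w = frame.shape
    spectrum = np.fft.fftshift(np.fft.fft2(frame))
    yy, xx = np.meshgrid(np.arange(h) - h // 2, np.arange(w) - w // 2, indexing="ij")
    radius = np.sqrt((yy / (h / 2)) ** 2 + (xx / (w / 2)) ** 2)
    # Octave-spaced band edges, e.g. [0, 0.25, 0.5, inf] for three levels.
    edges = [0.0] + [2.0 ** -(num_levels - 1 - i) for i in range(num_levels - 1)] + [np.inf]
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (radius >= lo) & (radius < hi)
        bands.append(np.fft.ifft2(np.fft.ifftshift(spectrum * mask)).real)
    return bands


def to_patches(band, p):
    """Cut an (H, W) band into non-overlapping p x p patches, flattened to vectors."""
    h, w = band.shape
    return band.reshape(h // p, p, w // p, p).swapaxes(1, 2).reshape(-1, p * p)


def from_patches(patches, shape, p):
    """Inverse of `to_patches`."""
    h, w = shape
    return patches.reshape(h // p, w // p, p, p).swapaxes(1, 2).reshape(h, w)


def quantize(vectors, codebook):
    """Nearest-neighbour vector quantization: returns (indices, quantized vectors)."""
    dists = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    indices = dists.argmin(axis=1)
    return indices, codebook[indices]


def encode_decode(frame, num_levels=3, patch=4, codebook_size=16, seed=0):
    """Quantize each frequency band separately and sum the quantized bands."""
    rng = np.random.default_rng(seed)
    codes, recon = [], np.zeros_like(frame)
    for band in split_frequency_bands(frame, num_levels):
        vectors = to_patches(band, patch)
        # Assumption: an untrained codebook sampled from the band itself.
        picks = rng.choice(len(vectors), size=codebook_size, replace=False)
        indices, quantized = quantize(vectors, vectors[picks])
        codes.append(indices)
        recon += from_patches(quantized, band.shape, patch)
    return codes, recon


if __name__ == "__main__":
    frame = np.random.default_rng(1).standard_normal((32, 32))
    codes, recon = encode_decode(frame)
    print("codes per level:", [c.shape for c in codes])
    print("reconstruction MSE:", float(np.mean((frame - recon) ** 2)))
```

Each level yields its own grid of discrete indices, which is the sense in which the representation is hierarchical; a trained model would replace the sampled codebooks with learned ones.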

---

## Citation

```bibtex
@inproceedings{pyramidal_spectrum_wacv2026,
  title     = {Pyramidal Spectrum: Frequency-based Hierarchically Vector Quantized VAE for Videos},
  author    = {Tushar Prakash and Onkar Susladkar and Inderjit Dhillon and Sparsh Mittal},
  booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
  year      = {2026}
}
```