	---
	license: mit
	datasets:
	- APRIL-AIGC/UltraVideo
	- DropletX/DropletVideo-10M
	language:
	- en
	---

	# Pyramidal Spectrum
	### Frequency-based Hierarchically Vector Quantized VAE for Videos
	This repository provides the official implementation of the paper *Pyramidal Spectrum: Frequency-based Hierarchically Vector Quantized VAE for Videos*, accepted at WACV 2026.

	We introduce a video autoencoder, trained on 4K-resolution video data, that uses hierarchical frequency-based vector quantization.
	The model builds on a pyramidal spectral representation to produce high-fidelity video reconstructions with an efficient latent structure.

	---

	## 📦 Installation

	This implementation requires Diffusers installed from the custom `MMVQVae` branch of the author's fork:

	```bash
	pip install git+https://github.com/Onkarsus13/diffusers@MMVQVae
	```
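
To confirm that the environment picked up the custom branch rather than a stock Diffusers release, the install origin can be inspected from Python. This is a minimal sketch using only the standard library; `install_origin` is a hypothetical helper name, and it assumes the package was installed with pip, which records `direct_url.json` for URL and VCS installs (PEP 610).

```python
import json
from importlib import metadata


def install_origin(package="diffusers"):
    """Return (version, url, requested_revision), or None if the package is missing."""
    try:
        dist = metadata.distribution(package)
    except metadata.PackageNotFoundError:
        return None
    # pip writes direct_url.json only for direct URL/VCS installs (PEP 610);
    # for a regular index install read_text returns None.
    raw = dist.read_text("direct_url.json")
    info = json.loads(raw) if raw else {}
    revision = info.get("vcs_info", {}).get("requested_revision")
    return dist.version, info.get("url"), revision


if __name__ == "__main__":
    print(install_origin("diffusers"))
```

After the install command above, the returned revision would be expected to read `MMVQVae`; a revision of `None` suggests a regular PyPI install is shadowing the branch.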

	## 🚀 Features
	- Novel hierarchical frequency-domain quantization
	- Trained on 4K-resolution video datasets
	- Multi-level pyramidal spectral decomposition
	- Highly efficient latent video representation
	- High-quality reconstructions suitable for generative pipelines
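
The ideas of a multi-level spectral decomposition and per-level quantization listed above can be pictured with a toy NumPy example. This is a generic illustration, not the architecture from the paper: the function names, the equal-width radial frequency bands, the 4-sample patches, and the random codebooks are all assumptions chosen for brevity, whereas the released model learns its representation end to end.

```python
import numpy as np


def split_frequency_bands(video, num_bands=3):
    """Split a (T, H, W) clip into band-limited clips that sum back to the input."""
    spectrum = np.fft.fftn(video)
    # Radial frequency magnitude of every FFT bin (cycles per sample).
    axes = [np.fft.fftfreq(n) for n in video.shape]
    grids = np.meshgrid(*axes, indexing="ij")
    radius = np.sqrt(sum(g ** 2 for g in grids))
    # Equal-width band edges (an assumption); the last edge is infinite
    # so that every bin lands in exactly one band.
    edges = np.linspace(0.0, radius.max(), num_bands + 1)
    edges[-1] = np.inf
    bands = []
    for low, high in zip(edges[:-1], edges[1:]):
        mask = (radius >= low) & (radius < high)
        bands.append(np.fft.ifftn(spectrum * mask).real)
    return bands


def quantize(vectors, codebook):
    """Map each row of `vectors` (N, D) to its nearest row of `codebook` (K, D)."""
    distances = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    indices = distances.argmin(axis=1)
    return indices, codebook[indices]


rng = np.random.default_rng(0)
video = rng.standard_normal((4, 8, 8))  # toy grayscale clip: 4 frames of 8x8
bands = split_frequency_bands(video, num_bands=3)

codes = []
for band in bands:
    vectors = band.reshape(-1, 4)  # hypothetical 4-sample patches
    # Random stand-in for a learned codebook, one per frequency band.
    codebook = rng.standard_normal((16, 4)) * band.std()
    indices, _ = quantize(vectors, codebook)
    codes.append(indices)
```

Each band receives its own codebook, so coarse low-frequency structure and fine high-frequency detail are encoded by separate sets of discrete indices; summing the unquantized bands recovers the original clip exactly.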

	---


	## 📖 Citation

	```bibtex
	@inproceedings{pyramidal_spectrum_wacv2026,
	  title = {Pyramidal Spectrum: Frequency-based Hierarchically Vector Quantized VAE for Videos},
	  author = {Prakash, Tushar and Susladkar, Onkar and Dhillon, Inderjit and Mittal, Sparsh},
	  booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
	  year = {2026}
	}
	```