---
license: apache-2.0
tags:
- vae
- video
- image
- autoencoder
- 3d-convolution
library_name: image-video-vae
---

# Image-Video-VAE

3D Convolutional VAE for encoding and decoding both images and video, trained from scratch by Linum AI. [Read the blog post](https://www.linum.ai/field-notes/vae-reconstruction-vs-generation).

## Model Description

A 346.6M-parameter 3D convolutional autoencoder that compresses images and video into a compact latent space.

| Property | Value |
|----------|-------|
| Spatial compression | 8x |
| Temporal compression | 4x |
| Latent channels | 16 |
| Parameters | 346.6M (170.1M encoder, 176.5M decoder) |
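
The compression factors above determine how large the latent tensor is for a given input. The sketch below is a back-of-the-envelope estimate, not code from the repository: `latent_shape` and `compression_ratio` are hypothetical helpers, and it assumes RGB input, a (channels, frames, height, width) layout, and plain floor division on each axis. The actual encoder may pad inputs or treat the first frame differently.

```python
# Values taken from the Model Description table.
SPATIAL_FACTOR = 8
TEMPORAL_FACTOR = 4
LATENT_CHANNELS = 16
INPUT_CHANNELS = 3  # assumption: RGB input


def latent_shape(frames: int, height: int, width: int) -> tuple:
    """Estimate the latent shape as (channels, frames, height, width).

    Assumption: each axis is floor-divided by its compression factor,
    and a single image (frames=1) maps to one latent frame.
    """
    return (
        LATENT_CHANNELS,
        max(1, frames // TEMPORAL_FACTOR),
        height // SPATIAL_FACTOR,
        width // SPATIAL_FACTOR,
    )


def compression_ratio(frames: int, height: int, width: int) -> float:
    """Ratio of input element count to estimated latent element count."""
    channels, t, h, w = latent_shape(frames, height, width)
    input_elems = INPUT_CHANNELS * frames * height * width
    return input_elems / (channels * t * h * w)


if __name__ == "__main__":
    print(latent_shape(16, 256, 256))       # (16, 4, 32, 32)
    print(compression_ratio(16, 256, 256))  # 48.0
    print(latent_shape(1, 512, 512))        # (16, 1, 64, 64)
    print(compression_ratio(1, 512, 512))   # 12.0
```

Under these assumptions, a 16-frame 256x256 clip shrinks by a factor of 48 in element count, while a single image shrinks by a factor of 12 because there is no temporal axis to compress.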

## Quick Start

**Full documentation: [GitHub - Linum-AI/image-video-vae](https://github.com/Linum-AI/image-video-vae)**

```bash
git clone https://github.com/Linum-AI/image-video-vae.git
cd image-video-vae
uv sync
uv run python encode_decode.py --mode image --input examples/images/original/camel_closeup.jpg
```

Weights are downloaded automatically on first run (~1.3GB).

## Examples

### Image

```bash
uv run python encode_decode.py \
  --mode image \
  --input examples/images/original/camel_closeup.jpg
```

### Video

```bash
uv run python encode_decode.py \
  --mode video \
  --input examples/videos/original/woman_in_breeze.mp4
```

<video src="https://huggingface.co/Linum-AI/image-video-vae/resolve/main/examples/woman_in_breeze.mp4" controls autoplay muted loop width="100%"></video>

## Files

```
└── vae.safetensors    # VAE model weights (1.3GB)
```
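
As a rough sanity check, the listed file size can be compared against the parameter counts from the model description. This is a sketch under a stated assumption: the card does not say which dtype the weights are stored in, so the bytes-per-parameter values below are guesses to test, and `checkpoint_size_gb` is a hypothetical helper.

```python
# Parameter counts from the Model Description table.
ENCODER_PARAMS = 170.1e6
DECODER_PARAMS = 176.5e6

# Assumption: candidate storage dtypes; the card does not state the actual one.
BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def checkpoint_size_gb(num_params: float, dtype: str) -> float:
    """Estimate raw tensor storage in decimal gigabytes (ignores file metadata)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


total_params = ENCODER_PARAMS + DECODER_PARAMS  # about 346.6M

if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: {checkpoint_size_gb(total_params, dtype):.2f} GB")
```

Under the float32 assumption the estimate is about 1.39 GB (roughly 1.29 GiB), which is in line with the listed size. That suggests, but does not confirm, 32-bit storage.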

## License

[Apache 2.0](LICENSE)

## Citation

```bibtex
@online{image_video_vae_2026,
  title = {VAE: Reconstruction vs. Generation},
  author = {Linum AI},
  year = {2026},
  url = {https://www.linum.ai/field-notes/vae-reconstruction-vs-generation}
}
```