+---
+license: apache-2.0
+tags:
+  - vae
+  - video
+  - image
+  - autoencoder
+  - 3d-convolution
+library_name: image-video-vae
+---
+# Image-Video-VAE
+3D Convolutional VAE for encoding and decoding both images and video, trained from scratch by [Linum AI](https://linum.ai). [Read the blog post](https://www.linum.ai/field-notes/vae-reconstruction-vs-generation).
+## Model Description
+A 346.6M-parameter 3D convolutional autoencoder that compresses images and video into a compact latent space.
+| Property | Value |
+|----------|-------|
+| Spatial compression | 8x |
+| Temporal compression | 4x |
+| Latent channels | 16 |
+| Parameters | 346.6M (170.1M encoder, 176.5M decoder) |
+| Pixel normalization | [0, 1] |
+| Precision | bfloat16 |
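To make the compression factors in the table concrete, here is a minimal sketch of the latent shape they imply for a given input. The helper `latent_shape` is hypothetical and not part of the repository. It assumes ceiling division along time (so a single image maps to one latent frame) and requires height and width to be multiples of 8. The actual model may pad inputs or treat the first frame differently.

```python
import math

# Values taken from the table above.
SPATIAL_FACTOR = 8
TEMPORAL_FACTOR = 4
LATENT_CHANNELS = 16


def latent_shape(frames, height, width):
    """Return an assumed (channels, frames, height, width) latent shape.

    Hypothetical helper: the rounding rule along time is an assumption,
    not something the model card specifies.
    """
    if frames < 1 or height < 1 or width < 1:
        raise ValueError("frames, height, and width must be positive")
    if height % SPATIAL_FACTOR or width % SPATIAL_FACTOR:
        raise ValueError("this sketch requires height and width divisible by 8")
    # Assumption: partial temporal groups round up, so 1 frame -> 1 latent frame.
    latent_frames = math.ceil(frames / TEMPORAL_FACTOR)
    return (
        LATENT_CHANNELS,
        latent_frames,
        height // SPATIAL_FACTOR,
        width // SPATIAL_FACTOR,
    )


print(latent_shape(1, 512, 512))   # (16, 1, 64, 64)
print(latent_shape(16, 256, 256))  # (16, 4, 32, 32)
```

Under these assumptions, a 512x512 RGB image (3 x 512 x 512 values) becomes a 16 x 64 x 64 latent, a 12x reduction in element count.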
+## Quick Start
+**Full documentation: [GitHub - Linum-AI/image-video-vae](https://github.com/Linum-AI/image-video-vae)**
+```bash
+git clone https://github.com/Linum-AI/image-video-vae.git
+cd image-video-vae
+uv sync
+uv run python encode_decode.py --mode image --input examples/images/original/camel_closeup.jpg
+```
+Weights are downloaded automatically on first run (~1.3GB).
+## Files
+```
+└── vae.safetensors    # VAE model weights (1.3GB)
+```
+## License
+[Apache 2.0](LICENSE)
+## Citation
+```bibtex
+@online{image_video_vae_2026,
+  title = {VAE: Reconstruction vs. Generation},
+  author = {Linum AI},
+  year = {2026},
+  url = {https://www.linum.ai/field-notes/vae-reconstruction-vs-generation}
+}
+```