---
license: apache-2.0
tags:
- vae
- video
- image
- autoencoder
- 3d-convolution
library_name: image-video-vae
---

# Image-Video-VAE

3D Convolutional VAE for encoding and decoding both images and video, trained from scratch by Linum AI.

[Read the blog post](https://www.linum.ai/field-notes/vae-reconstruction-vs-generation).

## Model Description

A 346.6M parameter 3D convolutional autoencoder that compresses images and video into a compact latent space.

| Property | Value |
|----------|-------|
| Spatial compression | 8x |
| Temporal compression | 4x |
| Latent channels | 16 |
| Parameters | 346.6M (170.1M encoder, 176.5M decoder) |

## Quick Start

**Full documentation: [GitHub - Linum-AI/image-video-vae](https://github.com/Linum-AI/image-video-vae)**

```bash
git clone https://github.com/Linum-AI/image-video-vae.git
cd image-video-vae
uv sync
uv run python encode_decode.py --mode image --input examples/images/original/camel_closeup.jpg
```

Weights are downloaded automatically on first run (~1.3GB).

## Examples

### Image

```bash
uv run python encode_decode.py \
  --mode image \
  --input examples/images/original/camel_closeup.jpg
```

![Camel closeup](examples/camel_closeup.jpg)

### Video

```bash
uv run python encode_decode.py \
  --mode video \
  --input examples/videos/original/woman_in_breeze.mp4
```

## Files

```
└── vae.safetensors    # VAE model weights (1.3GB)
```

## License

[Apache 2.0](LICENSE)

## Citation

```bibtex
@online{image_video_vae_2026,
  title  = {VAE: Reconstruction vs. Generation},
  author = {Linum AI},
  year   = {2026},
  url    = {https://www.linum.ai/field-notes/vae-reconstruction-vs-generation}
}
```
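
## Appendix: Latent Size Sketch

The compression factors listed under Model Description (8x spatial, 4x temporal, 16 latent channels) can be turned into a rough estimate of latent size. The sketch below is only back-of-the-envelope arithmetic, not code from the repository: the function names are hypothetical, a 3-channel RGB input is assumed, and ceiling division is an assumption that may not match how the model pads, crops, or handles the first frame.

```python
# Illustrative only: these helpers are NOT part of the image-video-vae repository.
SPATIAL_FACTOR = 8     # "Spatial compression" from the Model Description table
TEMPORAL_FACTOR = 4    # "Temporal compression" from the Model Description table
LATENT_CHANNELS = 16   # "Latent channels" from the Model Description table
INPUT_CHANNELS = 3     # assumption: RGB input


def ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)


def approx_latent_shape(frames, height, width):
    """Estimate the latent shape as (channels, frames, height, width).

    Assumes ceiling division on every axis, so a single image
    (frames=1) maps to one latent frame. The real model may differ.
    """
    return (
        LATENT_CHANNELS,
        ceil_div(frames, TEMPORAL_FACTOR),
        ceil_div(height, SPATIAL_FACTOR),
        ceil_div(width, SPATIAL_FACTOR),
    )


def approx_compression_ratio(frames, height, width):
    """Ratio of input element count to estimated latent element count."""
    c, t, h, w = approx_latent_shape(frames, height, width)
    input_elements = INPUT_CHANNELS * frames * height * width
    return input_elements / (c * t * h * w)


if __name__ == "__main__":
    print(approx_latent_shape(16, 256, 256))       # 16-frame clip
    print(approx_compression_ratio(16, 256, 256))
    print(approx_latent_shape(1, 256, 256))        # single image
    print(approx_compression_ratio(1, 256, 256))
```

Under these assumptions, a clip whose dimensions divide evenly by the compression factors is reduced by 3 × 8 × 8 × 4 / 16 = 48x in element count, and a single image by 3 × 8 × 8 / 16 = 12x.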