---
license: apache-2.0
tags:
  - vae
  - video
  - image
  - autoencoder
  - 3d-convolution
library_name: image-video-vae
---

# Image-Video-VAE

A 3D convolutional VAE that encodes and decodes both images and video, trained from scratch by Linum AI. [Read the blog post](https://www.linum.ai/field-notes/vae-reconstruction-vs-generation).

## Model Description

A 346.6M-parameter 3D convolutional autoencoder that compresses images and video into a compact latent space.

| Property | Value |
|----------|-------|
| Spatial compression | 8x |
| Temporal compression | 4x |
| Latent channels | 16 |
| Parameters | 346.6M (170.1M encoder, 176.5M decoder) |
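To make the compression factors concrete, here is a minimal sketch that estimates the latent tensor shape for a given input. It uses only the numbers in the table; the function names, the ceiling-division rounding, and the 3-channel RGB input are assumptions for illustration, and the actual model may pad or handle the first frame differently.

```python
import math

# Values taken from the table above.
SPATIAL_FACTOR = 8
TEMPORAL_FACTOR = 4
LATENT_CHANNELS = 16


def latent_shape(frames, height, width):
    """Estimate (channels, frames, height, width) of the latent.

    Assumption: each axis is rounded up to a multiple of its factor.
    The real encoder's padding rules are not documented in this card.
    """
    return (
        LATENT_CHANNELS,
        math.ceil(frames / TEMPORAL_FACTOR),
        math.ceil(height / SPATIAL_FACTOR),
        math.ceil(width / SPATIAL_FACTOR),
    )


def compression_ratio(frames, height, width, in_channels=3):
    """Ratio of input element count to latent element count (RGB assumed)."""
    c, t, h, w = latent_shape(frames, height, width)
    return (in_channels * frames * height * width) / (c * t * h * w)


print(latent_shape(16, 256, 256))       # 16-frame clip
print(latent_shape(1, 256, 256))        # single image
print(compression_ratio(16, 256, 256))  # video: elements in / elements out
print(compression_ratio(1, 256, 256))   # image: no temporal gain
```

Under these assumptions a single image only benefits from the spatial compression, while a video clip also gains the 4x temporal factor.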

## Quick Start

**Full documentation: [GitHub - Linum-AI/image-video-vae](https://github.com/Linum-AI/image-video-vae)**

```bash
git clone https://github.com/Linum-AI/image-video-vae.git
cd image-video-vae
uv sync
uv run python encode_decode.py --mode image --input examples/images/original/camel_closeup.jpg
```

Weights are downloaded automatically on first run (~1.3GB).

## Examples

### Image

```bash
uv run python encode_decode.py \
  --mode image \
  --input examples/images/original/camel_closeup.jpg
```

![Camel closeup](examples/camel_closeup.jpg)

### Video

```bash
uv run python encode_decode.py \
  --mode video \
  --input examples/videos/original/woman_in_breeze.mp4
```

<video src="https://huggingface.co/Linum-AI/image-video-vae/resolve/main/examples/woman_in_breeze.mp4" controls autoplay muted loop width="100%"></video>

## Files

```
└── vae.safetensors    # VAE model weights (1.3GB)
```
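
As a rough sanity check on the file size, the sketch below estimates the checkpoint size from the parameter count. The stored dtype is not stated in this card, so the bytes-per-parameter values are assumptions; the arithmetic only suggests that 32-bit floats would be consistent with a file of about 1.3GB.

```python
NUM_PARAMS = 346.6e6  # total parameter count from the Model Description


def checkpoint_size_gib(num_params, bytes_per_param):
    """Estimate raw weight size in GiB, ignoring file-format overhead."""
    return num_params * bytes_per_param / 2**30


# Hypothetical storage dtypes; which one the file uses is an assumption.
for dtype, nbytes in {"float32": 4, "float16/bfloat16": 2}.items():
    print(f"{dtype}: {checkpoint_size_gib(NUM_PARAMS, nbytes):.2f} GiB")
```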

## License

[Apache 2.0](LICENSE)

## Citation

```bibtex
@online{image_video_vae_2026,
  title = {VAE: Reconstruction vs. Generation},
  author = {Linum AI},
  year = {2026},
  url = {https://www.linum.ai/field-notes/vae-reconstruction-vs-generation}
}
```