---
license: apache-2.0
tags:
  - vae
  - video
  - image
  - autoencoder
  - 3d-convolution
library_name: image-video-vae
---

# Image-Video-VAE

A 3D convolutional VAE for encoding and decoding both images and video, trained from scratch by Linum AI. [Read the blog post](https://www.linum.ai/field-notes/vae-reconstruction-vs-generation).

## Model Description

A 346.6M-parameter 3D convolutional autoencoder that compresses images and video into a compact latent space.

| Property | Value |
|----------|-------|
| Spatial compression | 8x |
| Temporal compression | 4x |
| Latent channels | 16 |
| Parameters | 346.6M (170.1M encoder, 176.5M decoder) |
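To make the compression factors concrete, here is a rough sketch of how an input size might map to a latent size. The function `latent_shape` is hypothetical and not part of the repository. It assumes height and width are multiples of 8 and that the frame count is rounded up to the next multiple of 4; the actual model may handle padding, the first frame, or the axis order differently, so check the GitHub documentation for the real behavior.

```python
import math

# Values taken from the table above.
SPATIAL_FACTOR = 8
TEMPORAL_FACTOR = 4
LATENT_CHANNELS = 16


def latent_shape(frames: int, height: int, width: int) -> tuple[int, int, int, int]:
    """Estimate a (channels, frames, height, width) latent shape.

    Hypothetical helper: the rounding rule for frames is an assumption,
    not something confirmed by the model card.
    """
    if height % SPATIAL_FACTOR or width % SPATIAL_FACTOR:
        raise ValueError("height and width are assumed to be multiples of 8")
    # Assumption: partial groups of frames still produce one latent frame.
    latent_frames = math.ceil(frames / TEMPORAL_FACTOR)
    return (
        LATENT_CHANNELS,
        latent_frames,
        height // SPATIAL_FACTOR,
        width // SPATIAL_FACTOR,
    )


print(latent_shape(1, 256, 256))   # single image -> (16, 1, 32, 32)
print(latent_shape(16, 512, 512))  # 16-frame clip -> (16, 4, 64, 64)
```

Under these assumptions, a 16-frame 512x512 RGB clip (3 x 16 x 512 x 512 values) would shrink to 16 x 4 x 64 x 64 latent values, a 48x reduction in element count.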

## Quick Start

**Full documentation: [GitHub - Linum-AI/image-video-vae](https://github.com/Linum-AI/image-video-vae)**

```bash
git clone https://github.com/Linum-AI/image-video-vae.git
cd image-video-vae
uv sync
uv run python encode_decode.py --mode image --input examples/images/original/camel_closeup.jpg
```

Weights are downloaded automatically on first run (~1.3GB).
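As a quick sanity check on the download size, the arithmetic below estimates the weight file size from the parameter count. It assumes the weights are stored as 32-bit floats, which this model card does not state, and it ignores the small safetensors header.

```python
PARAMS = 346.6e6       # total parameters, from the Model Description table
BYTES_PER_PARAM = 4    # assumption: float32 storage

size_bytes = PARAMS * BYTES_PER_PARAM
size_gb = size_bytes / 1e9       # decimal gigabytes
size_gib = size_bytes / 2**30    # binary gibibytes

print(f"{size_gb:.2f} GB, {size_gib:.2f} GiB")  # 1.39 GB, 1.29 GiB
```

If the float32 assumption holds, the estimate lines up with the quoted ~1.3GB when that figure is read in binary units.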

## Examples

### Image

```bash
uv run python encode_decode.py \
  --mode image \
  --input examples/images/original/camel_closeup.jpg
```

![Camel closeup](examples/camel_closeup.jpg)

### Video

```bash
uv run python encode_decode.py \
  --mode video \
  --input examples/videos/original/woman_in_breeze.mp4
```

<video src="https://huggingface.co/Linum-AI/image-video-vae/resolve/main/examples/woman_in_breeze.mp4" controls autoplay muted loop width="100%"></video>

## Files

```
└── vae.safetensors    # VAE model weights (1.3GB)
```

## License

[Apache 2.0](LICENSE)

## Citation

```bibtex
@online{image_video_vae_2026,
  title = {VAE: Reconstruction vs. Generation},
  author = {Linum AI},
  year = {2026},
  url = {https://www.linum.ai/field-notes/vae-reconstruction-vs-generation}
}
```