
	---
	license: apache-2.0
	tags:
	- vae
	- video
	- image
	- autoencoder
	- 3d-convolution
	library_name: image-video-vae
	---

	# Image-Video-VAE

	3D Convolutional VAE for encoding and decoding both images and video, trained from scratch by Linum AI. [Read the blog post](https://www.linum.ai/field-notes/vae-reconstruction-vs-generation).

	## Model Description

	A 346.6M-parameter 3D convolutional autoencoder that compresses images and video into a compact latent space.

	| Property | Value |
	|----------|-------|
	| Spatial compression | 8x |
	| Temporal compression | 4x |
	| Latent channels | 16 |
	| Parameters | 346.6M (170.1M encoder, 176.5M decoder) |
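
To make the compression factors concrete, here is a minimal sketch that estimates the latent tensor shape for a given input. The constants come from the table above; the function names, the rounding-up behaviour for sizes that do not divide evenly, and the 3-channel RGB input are assumptions for illustration, not details confirmed by this model card.

```python
from typing import NamedTuple

SPATIAL_FACTOR = 8    # from the table: 8x spatial compression
TEMPORAL_FACTOR = 4   # from the table: 4x temporal compression
LATENT_CHANNELS = 16  # from the table: 16 latent channels


class LatentShape(NamedTuple):
    channels: int
    frames: int
    height: int
    width: int


def _ceil_div(a: int, b: int) -> int:
    """Integer division rounded up."""
    return -(-a // b)


def latent_shape(frames: int, height: int, width: int) -> LatentShape:
    """Estimate the latent shape for an input of the given size.

    Assumption (not from the model card): sizes that are not multiples of
    the compression factor are rounded up, so a single image (frames=1)
    maps to one latent frame. The real model may pad or crop differently.
    """
    if min(frames, height, width) < 1:
        raise ValueError("frames, height and width must be positive")
    return LatentShape(
        channels=LATENT_CHANNELS,
        frames=_ceil_div(frames, TEMPORAL_FACTOR),
        height=_ceil_div(height, SPATIAL_FACTOR),
        width=_ceil_div(width, SPATIAL_FACTOR),
    )


def compression_ratio(frames: int, height: int, width: int,
                      in_channels: int = 3) -> float:
    """Ratio of input values to latent values (assumes RGB input by default)."""
    lat = latent_shape(frames, height, width)
    n_input = in_channels * frames * height * width
    n_latent = lat.channels * lat.frames * lat.height * lat.width
    return n_input / n_latent


print(latent_shape(16, 256, 256))
# LatentShape(channels=16, frames=4, height=32, width=32)
print(compression_ratio(16, 256, 256))
# 48.0
```

Under these assumptions, a 16-frame 256x256 RGB clip shrinks by 8 x 8 x 4 = 256x in positions while the channel count grows from 3 to 16, which gives the 48x reduction in stored values.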

	## Quick Start

	Full documentation: [GitHub - Linum-AI/image-video-vae](https://github.com/Linum-AI/image-video-vae)

	```bash
	git clone https://github.com/Linum-AI/image-video-vae.git
	cd image-video-vae
	uv sync
	uv run python encode_decode.py --mode image --input examples/images/original/camel_closeup.jpg
	```

	Weights are downloaded automatically on first run (~1.3GB).
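
If you would rather fetch the weights ahead of time, the sketch below uses `hf_hub_download` from the `huggingface_hub` package. The repository ID and filename are taken from this model card; the helper name `fetch_weights` is hypothetical, and whether `encode_decode.py` looks in the same cache location is not stated here, so treat this as a standalone download rather than a replacement for the automatic one.

```python
REPO_ID = "Linum-AI/image-video-vae"   # repository ID of this model card
WEIGHTS_FILENAME = "vae.safetensors"   # listed under "Files"


def fetch_weights() -> str:
    """Download the weights into the local Hugging Face cache.

    Returns the local file path. Hypothetical helper: the project's own
    script may store or locate the weights differently.
    """
    # Imported lazily so this module loads even without huggingface_hub.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=REPO_ID, filename=WEIGHTS_FILENAME)
```

Calling `fetch_weights()` downloads roughly 1.3GB on the first call and returns the cached path on later calls.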

	## Examples

	### Image

	```bash
	uv run python encode_decode.py \
	  --mode image \
	  --input examples/images/original/camel_closeup.jpg
	```

	![Camel closeup](examples/camel_closeup.jpg)

	### Video

	```bash
	uv run python encode_decode.py \
	  --mode video \
	  --input examples/videos/original/woman_in_breeze.mp4
	```

	<video src="https://huggingface.co/Linum-AI/image-video-vae/resolve/main/examples/woman_in_breeze.mp4" controls autoplay muted loop width="100%"></video>

	## Files

	```
	└── vae.safetensors # VAE model weights (1.3GB)
	```
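
A `.safetensors` file starts with an 8-byte little-endian header length followed by a JSON header describing every tensor, so the parameter count can be checked without loading any weights. The sketch below uses only the standard library; the function names are illustrative, and the tensor names inside `vae.safetensors` are not documented here.

```python
import json
import struct
from math import prod


def read_safetensors_header(path: str) -> dict:
    """Return the tensor entries from a safetensors header.

    Each entry maps a tensor name to its dtype, shape and data offsets.
    """
    with open(path, "rb") as f:
        # First 8 bytes: unsigned 64-bit little-endian header size.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    # Optional free-form metadata is not a tensor, so drop it.
    header.pop("__metadata__", None)
    return header


def count_parameters(path: str) -> int:
    """Sum the element counts of all tensors listed in the header."""
    return sum(prod(info["shape"])
               for info in read_safetensors_header(path).values())


# Example (assumes the weights file is in the current directory):
# print(f"{count_parameters('vae.safetensors') / 1e6:.1f}M parameters")
```

If the file stores only the model parameters, the result should be close to the 346.6M figure quoted above; any extra buffers saved alongside them would raise the count.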

	## License

	[Apache 2.0](LICENSE)

	## Citation

	```bibtex
	@online{image_video_vae_2026,
	  title  = {VAE: Reconstruction vs. Generation},
	  author = {Linum AI},
	  year   = {2026},
	  url    = {https://www.linum.ai/field-notes/vae-reconstruction-vs-generation}
	}
	```