---
license: mit
---

# Image compression autoencoder

A convolutional autoencoder trained to compress 256×256 RGB images into a compact 1024-dimensional latent representation, a 192× reduction in the number of values per image.

## Model description

This model learns to compress high-quality images by encoding them into a compact latent space and then reconstructing them with as little quality loss as possible. The encoder reduces a 196,608-value image (256×256×3) to 1,024 numbers, and the decoder reconstructs an approximation of the original image from this compressed representation.
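The 192× figure follows directly from the two sizes quoted above. A quick check of the arithmetic, counting values rather than bytes:

```python
# Values in one input image: height x width x RGB channels
height, width, channels = 256, 256, 3
input_values = height * width * channels

# Size of the latent vector produced by the encoder
latent_values = 1024

compression_ratio = input_values / latent_values
print(f"{input_values:,} values -> {latent_values:,} values ({compression_ratio:.0f}x)")
# → 196,608 values -> 1,024 values (192x)
```

Note that this ratio compares value counts. If the pixels were stored as 8-bit integers and the latent as 32-bit floats, the ratio in bytes would be 48× instead.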

**Architecture:**
- Encoder: Convolutional layers with downsampling (256×256×3 → 1024)
- Decoder: Transposed convolutional layers with upsampling (1024 → 256×256×3)
- Activation: LeakyReLU and Sigmoid
- Normalization: Batch normalization
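
As a rough illustration of the components listed above, the following sketch builds a small Keras autoencoder with the same input size, latent size, and layer types. The number of layers, the filter counts, the kernel size, and the choice to hold the latent as a 4×4×64 feature map (1,024 values) are all assumptions made for this example; the published model may be structured differently, for instance with a flatten and dense bottleneck.

```python
from tensorflow import keras

layers = keras.layers


def conv_block(x, filters, transpose=False):
    """Strided (transposed) convolution, then batch norm and LeakyReLU."""
    conv = layers.Conv2DTranspose if transpose else layers.Conv2D
    x = conv(filters, kernel_size=3, strides=2, padding="same")(x)
    x = layers.BatchNormalization()(x)
    return layers.LeakyReLU()(x)


def build_autoencoder():
    """Hypothetical layout; filter counts are illustrative only."""
    inputs = keras.Input(shape=(256, 256, 3))

    # Encoder: five stride-2 blocks halve the resolution, 256 -> 8
    x = inputs
    for filters in (32, 64, 128, 128, 128):
        x = conv_block(x, filters)

    # Bottleneck: 4 x 4 x 64 = 1024 latent values
    latent = layers.Conv2D(64, 3, strides=2, padding="same", name="latent")(x)

    # Decoder: five stride-2 transposed blocks, 4 -> 128
    x = latent
    for filters in (128, 128, 128, 64, 32):
        x = conv_block(x, filters, transpose=True)

    # Final upsampling to 256 x 256 x 3, sigmoid keeps outputs in [0, 1]
    outputs = layers.Conv2DTranspose(
        3, 3, strides=2, padding="same", activation="sigmoid"
    )(x)
    return keras.Model(inputs, outputs, name="autoencoder_sketch")


model = build_autoencoder()
print(model.output_shape)
```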

**Performance:**
- Compression ratio: 192×
- Target PSNR: >30 dB
- Target SSIM: >0.90
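
For reference, PSNR can be computed from the mean squared error between an image and its reconstruction. The sketch below assumes pixel values scaled to [0, 1]; the function name `psnr` and the synthetic test image are illustrative and not taken from the project code.

```python
import numpy as np


def psnr(original, reconstructed, max_value=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, max_value]."""
    original = np.asarray(original, dtype=np.float64)
    reconstructed = np.asarray(reconstructed, dtype=np.float64)
    mse = np.mean((original - reconstructed) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))


# Synthetic example: a random image plus a small amount of Gaussian noise
rng = np.random.default_rng(0)
image = rng.random((256, 256, 3))
noisy = np.clip(image + rng.normal(0.0, 0.01, image.shape), 0.0, 1.0)
print(f"PSNR: {psnr(image, noisy):.1f} dB")
```

SSIM is more involved to implement by hand; one option is `tf.image.ssim`, which takes two image tensors and a `max_val` argument.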

## Intended use

This model is designed for **educational purposes**: it demonstrates how an autoencoder can learn compression automatically from data, rather than relying on hand-crafted codecs such as JPEG or PNG.

**Use cases:**
- Understanding autoencoder architectures
- Learning about lossy compression
- Exploring latent space representations
- Teaching AI/ML concepts in bootcamps

## Training data

Trained on [DF2K_OST](https://huggingface.co/datasets/gperdrizet/DF2K_OST), a combined dataset of high-quality images from:
- DIV2K (800 images)
- Flickr2K (2,650 images)
- OutdoorSceneTraining (10,424 images)

All images were resized to 256×256 pixels using Lanczos resampling.
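
A minimal sketch of this resizing step using Pillow is shown below. The helper name `resize_to_input` is hypothetical, and the example resizes directly to the target size without preserving aspect ratio; whether the original preprocessing cropped or padded images first is not stated here.

```python
from PIL import Image

TARGET_SIZE = (256, 256)


def resize_to_input(image, size=TARGET_SIZE):
    """Convert to RGB and resize with Lanczos resampling."""
    return image.convert("RGB").resize(size, resample=Image.LANCZOS)


# Stand-in for a dataset image; in practice this would come from Image.open(...)
example = Image.new("RGB", (640, 480), color=(120, 180, 90))
resized = resize_to_input(example)
print(resized.size, resized.mode)
# → (256, 256) RGB
```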

## Training details

**Hyperparameters:**
- Optimizer: Adam (lr=1e-3)
- Loss function: Mean Squared Error (MSE)
- Batch size: 16
- Epochs: Up to 100 (with early stopping)
- Train/validation split: 90/10

**Callbacks:**
- Early stopping (patience=5, monitoring validation loss)
- Learning rate reduction (factor=0.5, patience=3)
- Model checkpoint (best validation loss)
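
These callbacks map onto standard Keras classes. The sketch below shows one way to configure them with the values listed above; the checkpoint filename is a hypothetical placeholder, and any options not listed above are left at their Keras defaults.

```python
from tensorflow import keras

callbacks = [
    # Stop when validation loss has not improved for 5 epochs
    keras.callbacks.EarlyStopping(monitor="val_loss", patience=5),
    # Halve the learning rate after 3 epochs without improvement
    keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=3),
    # Keep only the model with the best validation loss so far
    keras.callbacks.ModelCheckpoint(
        filepath="best_autoencoder.keras",  # hypothetical path
        monitor="val_loss",
        save_best_only=True,
    ),
]
```

The list would then be passed to training via the `callbacks` argument of `fit`, with the model compiled using `keras.optimizers.Adam(learning_rate=1e-3)` and `loss="mse"` to match the hyperparameters above.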

**Hardware:**
- Single NVIDIA GPU with memory growth enabled
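
Memory growth tells TensorFlow to allocate GPU memory as needed instead of reserving it all at startup. A minimal sketch of enabling it follows; the helper name `enable_memory_growth` is illustrative, and the call has to run before any GPU work starts.

```python
import tensorflow as tf


def enable_memory_growth():
    """Ask TensorFlow to allocate GPU memory on demand; returns the GPUs found."""
    gpus = tf.config.list_physical_devices("GPU")
    for gpu in gpus:
        try:
            tf.config.experimental.set_memory_growth(gpu, True)
        except RuntimeError as error:
            # Memory growth must be set before the GPU is initialized
            print(f"Could not set memory growth for {gpu.name}: {error}")
    return gpus


gpus = enable_memory_growth()
print(f"GPUs visible to TensorFlow: {len(gpus)}")
```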

## How to use

```python
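import numpy as np
from tensorflow import keras
from huggingface_hub import hf_hub_download

# Download the model file from the Hugging Face Hub
downloaded_model = hf_hub_download(
    repo_id='gperdrizet/compression_autoencoder',
    filename='models/compression_ae.keras',
    repo_type='model'
)

# Load the full autoencoder (encoder + decoder)
autoencoder = keras.models.load_model(downloaded_model)

# Placeholder batch; replace with real images of shape (N, 256, 256, 3).
# Pixel values in [0, 1] are an assumption based on the sigmoid output.
images = np.random.rand(4, 256, 256, 3).astype('float32')

# The full model compresses and decompresses in one pass,
# so the output is the reconstruction, not the latent code
reconstructed = autoencoder.predict(images)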
```

For complete examples, see the [training notebook](https://github.com/gperdrizet/autoencoders/blob/main/notebooks/01-compression.ipynb).

## Limitations

- Fixed input size (256×256 RGB images)
- Lossy compression (some quality loss)
- Not optimized for specific image types
- Slower than traditional codecs
- Educational model, not production-ready

## Project repository

Full code, training notebooks, and interactive demo: [gperdrizet/autoencoders](https://github.com/gperdrizet/autoencoders)

## Citation

If you use this model for educational purposes, please reference the project repository.