---
license: mit
---

# Image compression autoencoder

A convolutional autoencoder trained to compress 256×256 RGB images into a compact 1024-dimensional latent representation, achieving a 192× compression ratio.

## Model description

This model learns to compress high-quality images by encoding them into a compact latent space, then reconstructing them with minimal quality loss. The encoder reduces a 196,608-value image (256×256×3) to just 1,024 numbers, while the decoder reconstructs the original image from this compressed representation.
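
The 192× figure follows directly from the shapes above; a quick sanity check:

```python
# Each 256x256 RGB image holds 256 * 256 * 3 = 196,608 values;
# the latent vector holds 1,024, giving the 192x compression ratio.
input_values = 256 * 256 * 3
latent_values = 1024
ratio = input_values // latent_values
print(ratio)  # 192
```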

**Architecture:**
- Encoder: convolutional layers with downsampling (256×256×3 → 1024)
- Decoder: transposed convolutional layers with upsampling (1024 → 256×256×3)
- Activations: LeakyReLU and sigmoid
- Normalization: batch normalization

**Performance:**
- Compression ratio: 192×
- Target PSNR: >30 dB
- Target SSIM: >0.90
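
The architecture above can be sketched in Keras. The layer counts, filter sizes, and latent head below are illustrative assumptions, not the released model's exact configuration:

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_autoencoder(latent_dim: int = 1024):
    """Hypothetical sketch: conv encoder to a 1024-value latent, transposed-conv decoder back."""
    # Encoder: 256x256x3 -> latent vector of `latent_dim` values
    encoder_input = keras.Input(shape=(256, 256, 3))
    x = encoder_input
    for filters in (32, 64, 128, 256):
        # Each stride-2 conv halves the spatial resolution: 256 -> 128 -> 64 -> 32 -> 16
        x = layers.Conv2D(filters, 3, strides=2, padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.LeakyReLU()(x)
    x = layers.Flatten()(x)
    latent = layers.Dense(latent_dim)(x)
    encoder = keras.Model(encoder_input, latent, name="encoder")

    # Decoder: latent vector -> 256x256x3 reconstruction
    decoder_input = keras.Input(shape=(latent_dim,))
    x = layers.Dense(16 * 16 * 256)(decoder_input)
    x = layers.Reshape((16, 16, 256))(x)
    for filters in (128, 64, 32, 16):
        # Each stride-2 transposed conv doubles the resolution: 16 -> 32 -> 64 -> 128 -> 256
        x = layers.Conv2DTranspose(filters, 3, strides=2, padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.LeakyReLU()(x)
    # Sigmoid keeps reconstructed pixel values in [0, 1]
    decoded = layers.Conv2D(3, 3, padding="same", activation="sigmoid")(x)
    decoder = keras.Model(decoder_input, decoded, name="decoder")

    autoencoder = keras.Model(encoder_input, decoder(encoder(encoder_input)))
    return encoder, decoder, autoencoder
```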

## Intended use

This model is designed for **educational purposes** to demonstrate how autoencoders can learn compression automatically from data, rather than using hand-crafted rules like JPEG or PNG.

**Use cases:**
- Understanding autoencoder architectures
- Learning about lossy compression
- Exploring latent space representations
- Teaching AI/ML concepts in bootcamps

## Training data

Trained on [DF2K_OST](https://huggingface.co/datasets/gperdrizet/DF2K_OST), a combined dataset of high-quality images from:
- DIV2K (800 images)
- Flickr2K (2,650 images)
- OutdoorSceneTraining (10,424 images)

All images were resized to 256×256 pixels using Lanczos resampling.
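
The resizing step can be reproduced with Pillow's Lanczos filter; this is an illustrative sketch, not the exact preprocessing code from the project repository:

```python
import numpy as np
from PIL import Image


def preprocess(path: str) -> np.ndarray:
    """Load an image, resize to 256x256 with Lanczos resampling, scale to [0, 1]."""
    image = Image.open(path).convert("RGB")
    image = image.resize((256, 256), resample=Image.LANCZOS)
    return np.asarray(image, dtype=np.float32) / 255.0
```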

## Training details

**Hyperparameters:**
- Optimizer: Adam (lr=1e-3)
- Loss function: mean squared error (MSE)
- Batch size: 16
- Epochs: up to 100 (with early stopping)
- Train/validation split: 90/10

**Callbacks:**
- Early stopping (patience=5, monitoring validation loss)
- Learning rate reduction (factor=0.5, patience=3)
- Model checkpoint (best validation loss)

**Hardware:**
- Single NVIDIA GPU with memory growth enabled
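
The hyperparameters and callbacks above map onto the standard Keras callback API; a sketch, with an illustrative checkpoint filename:

```python
from tensorflow import keras


def make_training_setup(checkpoint_path: str = "compression_ae.keras"):
    """Build the optimizer and callback list described in the training details."""
    optimizer = keras.optimizers.Adam(learning_rate=1e-3)
    callbacks = [
        # Stop when validation loss stops improving for 5 epochs
        keras.callbacks.EarlyStopping(monitor="val_loss", patience=5),
        # Halve the learning rate after 3 stagnant epochs
        keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=3),
        # Keep only the weights with the best validation loss
        keras.callbacks.ModelCheckpoint(checkpoint_path, monitor="val_loss", save_best_only=True),
    ]
    return optimizer, callbacks


# Usage (assuming `autoencoder`, `train_ds`, and `val_ds` already exist):
# optimizer, callbacks = make_training_setup()
# autoencoder.compile(optimizer=optimizer, loss="mse")
# autoencoder.fit(train_ds, validation_data=val_ds, epochs=100, callbacks=callbacks)
```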

## How to use

```python
from tensorflow import keras
from huggingface_hub import hf_hub_download

# Download the model file from the Hugging Face Hub
downloaded_model = hf_hub_download(
    repo_id='gperdrizet/compression_autoencoder',
    filename='models/compression_ae.keras',
    repo_type='model'
)

# Load the trained autoencoder
autoencoder = keras.models.load_model(downloaded_model)

# Run images through the full model (compress, then decompress in one pass);
# predict() returns the reconstructions, not the latent codes
reconstructed = autoencoder.predict(images)  # images shape: (N, 256, 256, 3)
```

For complete examples, see the [training notebook](https://github.com/gperdrizet/autoencoders/blob/main/notebooks/01-compression.ipynb).
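
To sanity-check reconstructions against the >30 dB PSNR target, a minimal NumPy helper (assumes images scaled to [0, 1]):

```python
import numpy as np


def psnr(original: np.ndarray, reconstructed: np.ndarray) -> float:
    """Peak signal-to-noise ratio in dB for images scaled to [0, 1]."""
    mse = float(np.mean((original - reconstructed) ** 2))
    if mse == 0.0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(1.0 / mse))
```

TensorFlow also ships equivalent metrics (`tf.image.psnr` and `tf.image.ssim`) if you prefer to stay in the TF graph.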

## Limitations

- Fixed input size (256×256 RGB images)
- Lossy compression (some quality loss)
- Not optimized for specific image types
- Slower than traditional codecs
- Educational model, not production-ready

## Project repository

Full code, training notebooks, and an interactive demo: [gperdrizet/autoencoders](https://github.com/gperdrizet/autoencoders)

## Citation

If you use this model for educational purposes, please reference the project repository.