| --- |
| license: mit |
| --- |
| |
| # Image compression autoencoder |
|
|
| A convolutional autoencoder trained to compress 256×256 RGB images into a compact 1024-dimensional latent representation, achieving a 192× compression ratio. |
|
|
| ## Model description |
|
|
| This model learns to compress high-quality images by encoding them into a compact latent space, then reconstructing them with minimal quality loss. The encoder reduces a 196,608-value image (256×256×3) to just 1024 numbers, while the decoder reconstructs the original image from this compressed representation. |
|
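The figures above follow from simple arithmetic, sketched below. Note that the ratio counts values rather than bytes; the ratio in stored bytes depends on the data types used for pixels and latent values, which this card does not specify.

```python
# Count the values in one input image and in its latent code.
height, width, channels = 256, 256, 3
latent_dim = 1024

input_values = height * width * channels
compression_ratio = input_values / latent_dim

print(f'Input values: {input_values:,}')               # 196,608
print(f'Compression ratio: {compression_ratio:.0f}x')  # 192x
```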
|
| **Architecture:** |
| - Encoder: Convolutional layers with downsampling (256×256×3 → 1024) |
| - Decoder: Transposed convolutional layers with upsampling (1024 → 256×256×3) |
| - Activation: LeakyReLU and Sigmoid |
| - Normalization: Batch normalization |
|
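As a rough illustration of the encoder/decoder layout listed above, here is a minimal Keras sketch. Only the input shape, the 1024-dimensional latent size, and the layer types come from this card; the number of layers, filter counts, kernel sizes, the dense bottleneck, and the placement of the sigmoid on the output layer are all assumptions, so the published model may differ.

```python
from tensorflow import keras

layers = keras.layers
LATENT_DIM = 1024  # from the model card


def down_block(x, filters):
    # Stride-2 convolution halves height and width.
    x = layers.Conv2D(filters, 3, strides=2, padding='same')(x)
    x = layers.BatchNormalization()(x)
    return layers.LeakyReLU()(x)


def up_block(x, filters):
    # Stride-2 transposed convolution doubles height and width.
    x = layers.Conv2DTranspose(filters, 3, strides=2, padding='same')(x)
    x = layers.BatchNormalization()(x)
    return layers.LeakyReLU()(x)


# Encoder: 256x256x3 -> 8x8x64 -> 1024 (filter counts are assumptions).
encoder_input = keras.Input(shape=(256, 256, 3))
x = encoder_input
for filters in (32, 64, 128, 128, 64):
    x = down_block(x, filters)
x = layers.Flatten()(x)
latent = layers.Dense(LATENT_DIM)(x)
encoder = keras.Model(encoder_input, latent, name='encoder')

# Decoder: 1024 -> 8x8x64 -> 128x128x32 -> 256x256x3.
decoder_input = keras.Input(shape=(LATENT_DIM,))
x = layers.Dense(8 * 8 * 64)(decoder_input)
x = layers.Reshape((8, 8, 64))(x)
for filters in (128, 128, 64, 32):
    x = up_block(x, filters)
# Final upsampling step; the sigmoid keeps pixel values in [0, 1].
decoder_output = layers.Conv2DTranspose(
    3, 3, strides=2, padding='same', activation='sigmoid'
)(x)
decoder = keras.Model(decoder_input, decoder_output, name='decoder')

# Chain the two halves into a single trainable model.
autoencoder = keras.Model(
    encoder_input, decoder(encoder.output), name='autoencoder'
)
```

Keeping the encoder and decoder as separate models makes it possible to call `encoder.predict(...)` to obtain latent vectors and `decoder.predict(...)` to reconstruct from them.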
|
| **Performance:** |
| - Compression ratio: 192× |
| - Target PSNR: >30 dB |
| - Target SSIM: >0.90 |
|
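PSNR, one of the target metrics listed above, can be computed directly from the mean squared error. The sketch below assumes images scaled to [0, 1] and uses synthetic data in place of real model output; `psnr` is an illustrative helper, not a function from the project repository.

```python
import numpy as np


def psnr(original, reconstructed, max_value=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, max_value]."""
    diff = original.astype(np.float64) - reconstructed.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float('inf')  # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)


# Synthetic stand-in data: a random image plus small Gaussian noise.
rng = np.random.default_rng(0)
original = rng.random((256, 256, 3))
noise = rng.normal(0.0, 0.01, original.shape)
reconstructed = np.clip(original + noise, 0.0, 1.0)

print(f'PSNR: {psnr(original, reconstructed):.1f} dB')
```

SSIM is more involved; existing implementations such as `tf.image.ssim` or `skimage.metrics.structural_similarity` are usually preferable to writing it by hand.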
|
| ## Intended use |
|
|
| This model is designed for **educational purposes**: it demonstrates how an autoencoder can learn compression directly from data, rather than relying on hand-designed codecs such as JPEG or PNG. |
|
|
| **Use cases:** |
| - Understanding autoencoder architectures |
| - Learning about lossy compression |
| - Exploring latent space representations |
| - Teaching AI/ML concepts in bootcamps |
|
|
| ## Training data |
|
|
| Trained on [DF2K_OST](https://huggingface.co/datasets/gperdrizet/DF2K_OST), a combined dataset of high-quality images from: |
| - DIV2K (800 images) |
| - Flickr2K (2,650 images) |
| - OutdoorSceneTraining (10,424 images) |
|
|
| All images were resized to 256×256 pixels using Lanczos resampling. |
|
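A resize step like the one described above might look like this with Pillow. The helper name and the stand-in image are illustrative, and the card does not say whether images were cropped before resizing, so this sketch simply resizes (which changes the aspect ratio of non-square images).

```python
from PIL import Image


def resize_to_model_input(image, size=(256, 256)):
    """Convert to RGB and resize with Lanczos resampling."""
    return image.convert('RGB').resize(size, Image.LANCZOS)


# Stand-in image; in practice this would come from Image.open(path).
source = Image.new('RGB', (1024, 768), color=(120, 180, 90))
resized = resize_to_model_input(source)

print(resized.size, resized.mode)
```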
|
| ## Training details |
|
|
| **Hyperparameters:** |
| - Optimizer: Adam (lr=1e-3) |
| - Loss function: Mean Squared Error (MSE) |
| - Batch size: 16 |
| - Epochs: Up to 100 (with early stopping) |
| - Train/validation split: 90/10 |
|
|
| **Callbacks:** |
| - Early stopping (patience=5, monitoring validation loss) |
| - Learning rate reduction (factor=0.5, patience=3) |
| - Model checkpoint (best validation loss) |
|
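The hyperparameters and callbacks listed above map onto standard Keras classes roughly as follows. The checkpoint path is hypothetical, and any argument not listed on this card is left at its Keras default, which may not match the original training run.

```python
from tensorflow import keras

callbacks = [
    # Stop when validation loss has not improved for 5 epochs.
    keras.callbacks.EarlyStopping(monitor='val_loss', patience=5),
    # Halve the learning rate after 3 epochs without improvement.
    keras.callbacks.ReduceLROnPlateau(
        monitor='val_loss', factor=0.5, patience=3
    ),
    # Keep only the weights with the best validation loss.
    keras.callbacks.ModelCheckpoint(
        'compression_ae.keras',  # hypothetical output path
        monitor='val_loss',
        save_best_only=True,
    ),
]

optimizer = keras.optimizers.Adam(learning_rate=1e-3)

# With a model and image arrays in hand, training would look like:
# autoencoder.compile(optimizer=optimizer, loss='mse')
# autoencoder.fit(
#     train_images, train_images,  # the input is also the target
#     validation_data=(val_images, val_images),
#     batch_size=16, epochs=100, callbacks=callbacks,
# )
```

Because patience for the learning rate reduction (3) is shorter than for early stopping (5), the learning rate is lowered at least once before training stops.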
|
| **Hardware:** |
| - Single NVIDIA GPU with memory growth enabled |
|
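Memory growth is typically enabled with a few lines at the start of a TensorFlow script, before any GPU work begins. This is a generic sketch rather than the project's exact setup code.

```python
import tensorflow as tf

# Ask TensorFlow to allocate GPU memory on demand instead of
# reserving all of it up front.
gpus = tf.config.list_physical_devices('GPU')
for gpu in gpus:
    try:
        tf.config.experimental.set_memory_growth(gpu, True)
    except RuntimeError as error:
        # Memory growth must be set before the GPU is initialized.
        print(f'Could not enable memory growth: {error}')

print(f'GPUs visible to TensorFlow: {len(gpus)}')
```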
|
| ## How to use |
|
|
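| ```python |
| import numpy as np |
| from tensorflow import keras |
| from huggingface_hub import hf_hub_download |
| |
| # Download the model file from the Hugging Face Hub |
| downloaded_model = hf_hub_download( |
|     repo_id='gperdrizet/compression_autoencoder', |
|     filename='models/compression_ae.keras', |
|     repo_type='model' |
| ) |
| |
| # Load the full autoencoder (encoder + decoder) |
| autoencoder = keras.models.load_model(downloaded_model) |
| |
| # Placeholder batch; replace with real images of shape (N, 256, 256, 3). |
| # Scaling to [0, 1] is an assumption based on the sigmoid activation. |
| images = np.random.rand(4, 256, 256, 3).astype('float32') |
| |
| # predict() runs the encoder and decoder together, so it returns |
| # reconstructed images, not the 1024-dimensional latent codes |
| reconstructed = autoencoder.predict(images)  # shape: (N, 256, 256, 3) |
| ``` |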
|
|
| For complete examples, see the [training notebook](https://github.com/gperdrizet/autoencoders/blob/main/notebooks/01-compression.ipynb). |
|
|
| ## Limitations |
|
|
| - Fixed input size (256×256 RGB images) |
| - Lossy compression (some quality loss) |
| - Not optimized for specific image types |
| - Slower than traditional codecs |
| - Educational model, not production-ready |
|
|
| ## Project repository |
|
|
| Full code, training notebooks, and interactive demo: [gperdrizet/autoencoders](https://github.com/gperdrizet/autoencoders) |
|
|
| ## Citation |
|
|
| If you use this model for educational purposes, please reference the project repository. |