mnist-gan / README.md
rvikranth10's picture
fix: update dependencies and Gradio 5 compat for HF Spaces rebuild
81e99e4

A newer version of the Gradio SDK is available: 6.13.0

Upgrade
metadata
title: MNIST GAN Digit Generator
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.29.0
app_file: app.py
pinned: false
license: mit

MNIST Digit Generator

A production-ready Generative Adversarial Network (GAN) for synthesizing realistic handwritten digits, deployed as an interactive web application.

Hugging Face Spaces PyTorch License: MIT

Overview

This project implements a Generative Adversarial Network from scratch using PyTorch, demonstrating proficiency in:

  • Deep learning architecture design and implementation
  • Adversarial training with stability techniques
  • Production-grade code with error handling and logging
  • Interactive application deployment

Launch Demo on Hugging Face πŸš€

Key Features

  • High-Quality Generation: Synthesizes realistic handwritten digits indistinguishable from MNIST samples
  • Production-Ready Code: Comprehensive error handling, logging, input validation, and type hints
  • Interactive Interface: Professional Gradio application with configurable parameters
  • Stable Training: Achieved convergence over 200 epochs without mode collapse

Technical Specifications

Model Architecture

Generator Network

Input: 100-dim noise vector (N(0,1))
  ↓ Linear(100β†’256) + LeakyReLU + BatchNorm
  ↓ Linear(256β†’512) + LeakyReLU + BatchNorm  
  ↓ Linear(512β†’1024) + LeakyReLU + BatchNorm
  ↓ Linear(1024β†’784) + Tanh
Output: 28Γ—28 image (range: [-1,1])

Parameters: 1,489,936

Discriminator Network

Input: 784-dim flattened image
  ↓ Linear(784β†’1024) + LeakyReLU + Dropout(0.3)
  ↓ Linear(1024β†’512) + LeakyReLU + Dropout(0.3)
  ↓ Linear(512β†’256) + LeakyReLU + Dropout(0.3)
  ↓ Linear(256β†’1) + Sigmoid
Output: Probability [0,1]

Parameters: 1,460,225

Training Configuration

Parameter Value
Dataset MNIST (60,000 samples)
Epochs 200
Batch Size 128
Optimizer Adam
Learning Rate 0.0002
Beta1 0.5
Loss Function Binary Cross-Entropy
Device CPU / GPU (MPS/CUDA)
Training Time ~30-45 min (GPU) / ~2-3 hours (CPU)

Performance Metrics

Metric Value
Final Generator Loss 0.981
Final Discriminator Loss 1.213
Mode Collapse None observed
Output Quality Realistic, diverse
Convergence Stable

Results

Training Progress

Epoch Discriminator Loss Generator Loss Quality
1 0.698 1.638 Random noise
50 1.314 0.816 Emerging shapes
100 1.298 0.838 Clear digits
150 1.253 0.916 Refined quality
200 1.213 0.981 High quality

Sample Outputs


Epoch 50

Epoch 100

Epoch 200

Training Loss

Project Structure

mnist-gan/
β”œβ”€β”€ app.py                          # Production Gradio application
β”œβ”€β”€ GAN_MNIST_Assignment.ipynb      # Training notebook with analysis
β”œβ”€β”€ generator_model.pth             # Trained model weights (~17 MB)
β”œβ”€β”€ requirements.txt                # Python dependencies
β”œβ”€β”€ README.md                       # Project documentation
β”œβ”€β”€ GITHUB_SETUP.md                 # Deployment instructions
β”œβ”€β”€ LICENSE                         # MIT License
β”œβ”€β”€ .gitignore                      # Git configuration
β”œβ”€β”€ losses.png                      # Training visualization
β”œβ”€β”€ samples/                         # Generated samples
β”‚   β”œβ”€β”€ epoch_001.png
β”‚   β”œβ”€β”€ epoch_050.png
β”‚   β”œβ”€β”€ epoch_100.png
β”‚   β”œβ”€β”€ epoch_150.png
β”‚   └── epoch_200.png
└── data/                           # MNIST dataset (created automatically)
    └── MNIST/                      # Downloaded by torchvision on first run

Note: The data/ directory is created automatically when you run the notebook. It's excluded from git via .gitignore to keep the repository size small.

Hardware Acceleration

This project automatically detects and uses GPU acceleration when available:

Device Type Technology Training Time Speedup
Apple Silicon (M1/M2/M3/M4) Metal (MPS) ~30-45 min 3-5x
NVIDIA GPU CUDA ~20-30 min 5-10x
CPU Native ~2-3 hours 1x (baseline)

No configuration needed - the code automatically selects the best available device!

Quick Start

Option 1: Use Pre-trained Model (Fast)

If you just want to run the application with the pre-trained model:

# Clone repository
git clone https://github.com/vikranth1000/mnist-gan.git
cd mnist-gan

# Install dependencies
pip install -r requirements.txt

# Launch application (uses pre-trained generator_model.pth)
python app.py

Access at http://localhost:7860

Option 2: Reproduce Training from Scratch

To reproduce the entire training process:

Prerequisites

  • Python 3.9 or higher
  • Jupyter Notebook or JupyterLab
  • ~2-3 hours for training (on CPU)

Step-by-Step Instructions

  1. Clone the repository

    git clone https://github.com/vikranth1000/mnist-gan.git
    cd mnist-gan
    
  2. Install dependencies

    pip install -r requirements.txt
    
  3. Open the training notebook

    jupyter notebook GAN_MNIST_Assignment.ipynb
    # Or use: jupyter lab GAN_MNIST_Assignment.ipynb
    
  4. Run all cells sequentially

    • The notebook will automatically download the MNIST dataset (stored in data/ directory)
    • Dataset size: ~60MB (downloaded automatically on first run)
    • Training takes approximately 2-3 hours on CPU
  5. Model is automatically saved

    • Cell 26 automatically saves generator_model.pth after training completes
    • This file is required for the Gradio application
  6. Verify outputs

    • Check samples/ directory for generated images at epochs 1, 50, 100, 150, 200
    • Check losses.png for training loss visualization
    • Verify generator_model.pth exists (~17 MB)

Expected Results

After training, you should see:

  • Final Generator Loss: ~0.98
  • Final Discriminator Loss: ~1.21
  • Training Time:
    • Apple Silicon M4: ~30-45 minutes
    • NVIDIA GPU: ~20-30 minutes
    • CPU: ~2-3 hours
  • Model File: generator_model.pth (~17 MB)
  • Sample Images: samples/epoch_*.png (5 files)
  • Loss Plot: losses.png

Notes

  • Data Download: MNIST dataset downloads automatically via torchvision.datasets.MNIST (no manual download needed)
  • Reproducibility: The notebook uses seed=42 by default for reproducible results
  • GPU Acceleration: Automatically detects and uses available GPU:
    • Apple Silicon (M1/M2/M3/M4): Uses Metal Performance Shaders (MPS) - 3-5x faster
    • NVIDIA GPUs: Uses CUDA - 5-10x faster
    • CPU Fallback: Works on any system, just slower
  • Memory: Training requires ~2-4 GB RAM

Deployment

See GITHUB_SETUP.md for detailed deployment instructions to Hugging Face Spaces.

Code Quality

Production Features

  • Type Hints: Full type annotation for better code maintainability
  • Error Handling: Comprehensive exception handling with graceful degradation
  • Logging: Structured logging for debugging and monitoring
  • Input Validation: Parameter validation and sanitization
  • Memory Management: Proper resource cleanup and matplotlib figure handling
  • Documentation: Detailed docstrings and inline comments
  • Configuration: Constants and configuration management
  • Modularity: Clean separation of concerns with dedicated classes

Best Practices

  • PEP 8 style compliance
  • Defensive programming
  • Single Responsibility Principle
  • DRY (Don't Repeat Yourself)
  • Clear error messages
  • Resource management

Technical Skills Demonstrated

  • Deep Learning: GAN architecture, adversarial training, loss functions
  • PyTorch: Model building, training loops, gradient management
  • Software Engineering: Code organization, error handling, logging
  • Web Development: Gradio interface, user experience design
  • DevOps: Deployment, monitoring, documentation
  • Best Practices: Type hints, validation, testing

Implementation Details

Key Design Decisions

  1. LeakyReLU Activation: Prevents dying neurons in discriminator (Ξ±=0.2)
  2. Batch Normalization: Stabilizes generator training and improves convergence
  3. Dropout Regularization: Prevents discriminator overfitting (p=0.3)
  4. Tanh Output: Maps to [-1, 1] range matching normalized MNIST data
  5. Temperature Scaling: Allows control over output diversity

Training Stability Techniques

  • Balanced learning rates for generator and discriminator
  • Batch normalization in generator
  • Dropout in discriminator
  • Separate optimization steps
  • Gradient clipping (implicit through Adam)

Future Enhancements

  • Conditional GAN (cGAN) for digit-specific generation
  • Deep Convolutional GAN (DCGAN) architecture
  • Wasserstein GAN (WGAN) for improved stability
  • FID score evaluation
  • Latent space interpolation visualization
  • Batch generation API
  • Model versioning and A/B testing

Reproducing This Project

Quick Summary:

  1. Clone repository
  2. Install dependencies: pip install -r requirements.txt
  3. Open GAN_MNIST_Assignment.ipynb in Jupyter
  4. Run all cells sequentially (Cells 1-26)
  5. MNIST dataset downloads automatically (~60 MB)
  6. Training automatically uses GPU if available:
    • Apple Silicon: ~30-45 minutes
    • NVIDIA GPU: ~20-30 minutes
    • CPU: ~2-3 hours
  7. Cell 26 automatically saves generator_model.pth after training completes

References

License

This project is licensed under the MIT License - see LICENSE for details.

Author

Vikranth Reddimasu

Acknowledgments

  • MNIST dataset by Yann LeCun, Corinna Cortes, and Christopher Burges
  • PyTorch team for the deep learning framework
  • Hugging Face for the Spaces platform
  • Ian Goodfellow for pioneering GANs

Built with PyTorch and Gradio