---
title: Person Classification Demo
emoji: πŸ“±
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
---

# πŸ“± MobileNetV2 Image Classification Demo

A lightweight, interactive image classification demo built with Hugging Face Transformers and Gradio. It brings the clean interface style of the popular Whisper demos to computer-vision classification tasks.

## ✨ Features

- **Mobile-Optimized Model**: Uses MobileNetV2, designed specifically for efficient mobile and edge deployment
- **Interactive Web Interface**: Clean, modern Gradio interface similar to the Whisper demos
- **Real-Time Classification**: Instant image classification with top-5 predictions
- **1,000 ImageNet Classes**: Recognizes a wide variety of objects, animals, vehicles, and scenes
- **Confidence Scores**: Shows prediction confidence as percentages
- **Example Images**: Pre-loaded example images for quick testing
- **Responsive Design**: Works seamlessly on desktop and mobile devices
- **Lightweight**: Only ~3.4M parameters for fast inference

## πŸš€ Quick Start

### Option 1: Hugging Face Spaces (Recommended)

Deploy instantly to Hugging Face Spaces:

1. Create a new Space on Hugging Face Spaces
2. Upload these files: `app.py`, `requirements.txt`, `README.md`
3. Your demo goes live automatically!

### Option 2: Local Development

```bash
# Clone this repository
git clone <your-repo-url>
cd mobilenetv2-classification-demo

# Install dependencies
pip install -r requirements.txt

# Run the demo
python app.py
```

The demo will be available at http://localhost:7860.

## 🎯 How to Use

1. **Upload Image**: Click the upload area or drag and drop an image
2. **Get Results**: Classification runs automatically, showing the top-5 predictions
3. **Try Examples**: Use the example buttons to test with sample images
4. **View Confidence**: Each prediction shows a confidence percentage

## πŸ“Š Model Information

- **Model**: `google/mobilenet_v2_1.0_224`
- **Architecture**: MobileNetV2 with a 1.0 width multiplier
- **Input Size**: 224Γ—224 pixels
- **Parameters**: ~3.4 million (lightweight!)
- **Classes**: 1,000 ImageNet categories
- **Optimization**: Designed for mobile and edge devices

## πŸ”§ Technical Details

### Dependencies

- **PyTorch**: Deep learning framework
- **Transformers**: Hugging Face model library
- **Gradio**: Web interface framework
- **Pillow**: Image processing
- **NumPy**: Numerical computing
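
For reference, a `requirements.txt` covering these dependencies might look like the sketch below; the version pins are illustrative, except for Gradio, which should match the `sdk_version` declared in the Space metadata:

```
torch>=2.0
transformers>=4.40
gradio==4.44.0
Pillow>=9.0
numpy>=1.24
```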

### Model Architecture

MobileNetV2 uses:

- **Depthwise Separable Convolutions**: Reduce computational cost
- **Inverted Residuals**: Efficient feature extraction
- **Linear Bottlenecks**: Preserve representational power
- **ReLU6 Activation**: Well suited to mobile hardware
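
These pieces fit together in the inverted residual block. The sketch below is a simplified PyTorch illustration of one such block (expand with a 1Γ—1 conv, filter with a depthwise 3Γ—3 conv, project back through a linear bottleneck), not the exact implementation shipped in the library:

```python
import torch
import torch.nn as nn

class InvertedResidual(nn.Module):
    """Simplified MobileNetV2-style inverted residual block (illustrative)."""

    def __init__(self, in_ch, out_ch, stride=1, expand=6):
        super().__init__()
        hidden = in_ch * expand
        # The skip connection only applies when the shape is unchanged
        self.use_res = stride == 1 and in_ch == out_ch
        self.block = nn.Sequential(
            # 1x1 expansion to a wider representation
            nn.Conv2d(in_ch, hidden, 1, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            # 3x3 depthwise conv: one filter per channel, so it is cheap
            nn.Conv2d(hidden, hidden, 3, stride, 1, groups=hidden, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            # 1x1 linear bottleneck projection: note, no activation here
            nn.Conv2d(hidden, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_res else out

x = torch.randn(1, 32, 56, 56)
y = InvertedResidual(32, 32)(x)
print(y.shape)  # torch.Size([1, 32, 56, 56])
```

Keeping the bottleneck projection linear (no ReLU6 on the last 1Γ—1 conv) is what the paper means by "linear bottlenecks": a nonlinearity in the narrow representation would destroy information.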

## 🌟 Key Advantages

### vs. Heavy Models (ResNet, EfficientNet)

- βœ… Faster inference (optimized for mobile)
- βœ… Smaller memory footprint
- βœ… Better battery efficiency
- βœ… Ready for edge deployment

### vs. Other Demos

- βœ… Whisper-style interface (familiar UX)
- βœ… Auto-classification (no manual buttons needed)
- βœ… Clean, modern design
- βœ… Mobile-responsive

## πŸ“± Mobile Deployment

This model is specifically designed for mobile deployment:

```python
import torch
from transformers import MobileNetV2ForImageClassification

# Example mobile-friendly loading options
model = MobileNetV2ForImageClassification.from_pretrained(
    "google/mobilenet_v2_1.0_224",
    torch_dtype=torch.float16,  # half precision halves the weight memory
    low_cpu_mem_usage=True,     # reduces peak RAM while loading
)
```

## 🎨 Customization

### Adding New Examples

```python
example_urls = {
    "Your Category": "https://your-image-url.com/image.jpg",
    # Add more examples here
}
```

### Adjusting the UI Theme

```python
theme = gr.themes.Soft()  # Options: Soft, Default, Monochrome
```

### Changing the Model

```python
MODEL_NAME = "google/mobilenet_v2_1.4_224"   # larger variant
MODEL_NAME = "google/mobilenet_v2_0.75_224"  # smaller variant
```

## πŸ”„ Image Processing Pipeline

1. **Input**: User uploads an image (any format/size)
2. **Preprocessing**: Resize to 224Γ—224 and normalize
3. **Inference**: MobileNetV2 forward pass
4. **Postprocessing**: Apply softmax and take the top-5
5. **Output**: Formatted predictions with confidence scores
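
The postprocessing and output steps can be sketched as below, using random stand-in logits so the snippet runs without downloading the model; a real app would use the model's `id2label` mapping instead of the placeholder labels:

```python
import numpy as np

def top5(logits, labels):
    # softmax turns raw logits into probabilities
    e = np.exp(logits - logits.max())  # subtract max for numerical stability
    probs = e / e.sum()
    # indices of the five highest probabilities, descending
    idx = np.argsort(probs)[::-1][:5]
    return [(labels[i], float(probs[i]) * 100) for i in idx]

labels = [f"class_{i}" for i in range(1000)]  # placeholder for ImageNet labels
logits = np.random.randn(1000)               # stand-in for a model forward pass
for name, pct in top5(logits, labels):
    print(f"{name}: {pct:.1f}%")
```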

## πŸš€ Performance

- **Inference Speed**: ~50 ms on CPU, ~10 ms on GPU
- **Memory Usage**: ~200 MB RAM
- **Model Size**: ~14 MB
- **Throughput**: 20+ images/second on modern hardware
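
These numbers vary by hardware, so it is worth measuring locally. A minimal timing harness like the one below can be pointed at the real model call; the `fake_model` stand-in is only there to keep the sketch self-contained:

```python
import time
import numpy as np

def benchmark(fn, x, warmup=3, runs=20):
    # average wall-clock latency per call, in milliseconds
    for _ in range(warmup):  # warm up caches / lazy initialization
        fn(x)
    start = time.perf_counter()
    for _ in range(runs):
        fn(x)
    return (time.perf_counter() - start) / runs * 1000

# stand-in for model inference so the sketch runs without downloading weights
fake_model = lambda x: x.mean()

x = np.random.randn(1, 3, 224, 224).astype(np.float32)
latency_ms = benchmark(fake_model, x)
print(f"~{latency_ms:.3f} ms/image (~{1000 / latency_ms:.0f} images/sec)")
```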

## πŸ“š Example Classes

The model can classify 1,000 ImageNet categories, including:

- **Animals**: Dogs, cats, birds, wildlife
- **Vehicles**: Cars, trucks, motorcycles, aircraft
- **Objects**: Furniture, electronics, tools
- **Food**: Fruits, vegetables, dishes
- **Nature**: Plants, landscapes, weather

## 🀝 Contributing

1. Fork the repository
2. Create a feature branch
3. Make your improvements
4. Submit a pull request

## πŸ“„ License

This project is open source and available under the MIT License.

πŸ™ Acknowledgments

  • Google Research: For MobileNetV2 architecture
  • Hugging Face: For the Transformers library and model hosting
  • Gradio Team: For the amazing web interface framework
  • ImageNet: For the comprehensive dataset

*Built with ❀️ using Hugging Face Transformers and Gradio*