metadata
title: Computer Vision | Image Classification
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
pinned: false

Intel Scene Classifier - Parfait TOLEFO

CNN-based image classification · 6 scene categories · PyTorch & TensorFlow



1. Project Overview

This project implements a complete image classification pipeline for the Intel Image Classification dataset. It includes:

  • Two independent CNN models: one in PyTorch, one in TensorFlow/Keras
  • A unified CLI entry point (main.py) with --mode train and --mode eval
  • A Flask web application with file upload and URL-based image loading
  • A professional green/black UI with real-time probability bars

Classes (6 categories): buildings · forest · glacier · mountain · sea · street


2. Dataset

| Property | Value |
| --- | --- |
| Source | Kaggle: Intel Image Classification |
| Images | ~25,000 RGB images (150×150 px) |
| Train split | ~14,000 images (seg_train) |
| Test split | ~3,000 images (seg_test) |
| Prediction | ~7,000 images (seg_pred, unlabeled) |
| Format | JPEG, organized in class-named subdirectories |

Expected folder structure after download:

data/
├── seg_train/
│   └── seg_train/
│       ├── buildings/
│       ├── forest/
│       ├── glacier/
│       ├── mountain/
│       ├── sea/
│       └── street/
├── seg_test/
│   └── seg_test/
│       └── (same 6 subdirectories)
└── seg_pred/
    └── seg_pred/
        └── (unlabeled images)
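
A quick sanity check after download can catch a misplaced or missing class folder before training starts. The sketch below is not part of the project; `count_images` is a hypothetical helper that assumes the nested `<split>/<split>/<class>/` layout shown above and counts JPEG files per class.

```python
from pathlib import Path

CLASSES = ["buildings", "forest", "glacier", "mountain", "sea", "street"]


def count_images(data_dir, split):
    """Return {class_name: number of JPEG files} for a labeled split.

    Assumes the nested layout data_dir/<split>/<split>/<class>/.
    Raises FileNotFoundError if a class folder is missing.
    """
    root = Path(data_dir) / split / split
    counts = {}
    for name in CLASSES:
        class_dir = root / name
        if not class_dir.is_dir():
            raise FileNotFoundError(f"missing class folder: {class_dir}")
        counts[name] = sum(
            1 for p in class_dir.iterdir() if p.suffix.lower() in {".jpg", ".jpeg"}
        )
    return counts
```

Calling `count_images("./data", "seg_train")` should return six entries whose total is close to the train split size listed in the table.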

3. Project Architecture

project/
├── app.py                  ← Flask web server (inference via file or URL)
├── main.py                 ← Unified CLI: train + eval
├── models/
│   ├── __init__.py         ← Exports CNN_Torch, build_cnn_tf, Trainer
│   ├── cnn.py              ← CNN architectures (PyTorch + TensorFlow)
│   └── train.py            ← Trainer class (PyTorch only)
├── utils/
│   ├── __init__.py         ← Exports all preprocessing functions
│   └── prep.py             ← Transforms, DataLoaders, inference preprocessing
├── templates/
│   └── index.html          ← Web UI (green/black terminal aesthetic)
├── parfait_model.pth       ← Trained PyTorch weights (after training)
├── parfait_model.keras     ← Trained TensorFlow weights (after training)
├── requirements.txt
└── README.md

4. Model Architecture

4.1 TensorFlow / Keras model

Input: (228, 228, 3)

Block 1: Conv2D(32, 5×5, ReLU) → MaxPool(2×2)         → 224×224×32 → 112×112×32
Block 2: Conv2D(32, 5×5, ReLU) → MaxPool(2×2)         → 108×108×32 → 54×54×32
Block 3: Conv2D(32, 3×3, ReLU) → MaxPool(2×2)         → 52×52×32  → 26×26×32
Block 4: Conv2D(64, 3×3, ReLU) → MaxPool(2×2)         → 24×24×64  → 12×12×64
Block 5: Conv2D(64, 3×3, ReLU) → MaxPool(2×2)         → 10×10×64  → 5×5×64

Flatten                                                  → 1600
Dense(1024, ReLU)
Dropout(0.20)
Dense(124, ReLU)
Dropout(0.20)
Dense(6, Softmax)

Trainable parameters : 1.86M
Input size           : 228 × 228 × 3 (RGB)
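
The shapes and the parameter total above can be checked with plain arithmetic. The sketch below does not build the Keras model; it assumes 'valid' padding and stride 1 for every Conv2D (which is what the listed 228 → 224 shrinkage implies) and that every layer has a bias term. `trace_tf_cnn` is a hypothetical name.

```python
# (filters, kernel_size) for the five Conv2D blocks listed above
CONV_BLOCKS = [(32, 5), (32, 5), (32, 3), (64, 3), (64, 3)]
DENSE_UNITS = [1024, 124, 6]


def trace_tf_cnn(size=228, channels=3):
    """Return (flatten_size, total_params).

    Assumes 'valid' padding, stride 1, 2x2 max pooling and biases everywhere.
    """
    params = 0
    for filters, k in CONV_BLOCKS:
        params += k * k * channels * filters + filters  # kernel weights + biases
        size = size - k + 1   # a 'valid' convolution shrinks each side by k - 1
        size //= 2            # 2x2 max pooling halves the side (floor division)
        channels = filters
    flat = size * size * channels
    fan_in = flat
    for units in DENSE_UNITS:
        params += fan_in * units + units
        fan_in = units
    return flat, params


flat, total = trace_tf_cnn()
print(flat, total)  # 1600 1860010
```

Under these assumptions, most of the parameters sit in the first dense layer (1600 × 1024 weights), not in the convolutional blocks.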

4.2 PyTorch model

Input: (B, 3, 150, 150)

Block 1:
  Conv2d(3 → 32, 3×3, padding=1)
  BatchNorm2d(32)
  ReLU
  Conv2d(32 → 32, 3×3, padding=1)
  BatchNorm2d(32)
  ReLU
  MaxPool2d(2)

Block 2:
  Conv2d(32 → 64, 3×3, padding=1)
  BatchNorm2d(64)
  ReLU
  Conv2d(64 → 64, 3×3, padding=1)
  BatchNorm2d(64)
  ReLU
  MaxPool2d(2)
  Dropout2d(0.10)

Block 3:
  Conv2d(64 → 128, 3×3, padding=1)
  BatchNorm2d(128)
  ReLU
  Conv2d(128 → 128, 3×3, padding=1)
  BatchNorm2d(128)
  ReLU
  MaxPool2d(2)
  Dropout2d(0.15)

Block 4:
  Conv2d(128 → 256, 3×3, padding=1)
  BatchNorm2d(256)
  ReLU
  Conv2d(256 → 256, 3×3, padding=1)
  BatchNorm2d(256)
  ReLU
  MaxPool2d(2)
  Dropout2d(0.20)

AdaptiveAvgPool2d(1)     → (B, 256, 1, 1)
Flatten                  → (B, 256)
Linear(256 → 256)
ReLU
Dropout(0.30)
Linear(256 → 6)


Trainable parameters : 1.24M
Input size           : 150 × 150 × 3 (RGB)
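
As a cross-check of the 1.24M figure, the parameter count can be reproduced by hand. This is a sketch, not the project's `CNN_Torch` class: the helper names are hypothetical, and it assumes every Conv2d keeps its default bias and every BatchNorm2d contributes a learnable weight and bias per channel.

```python
def conv_bn(c_in, c_out, k=3):
    """Parameters of Conv2d(c_in, c_out, k) with bias, plus BatchNorm2d(c_out)."""
    return (k * k * c_in * c_out + c_out) + 2 * c_out


def linear(n_in, n_out):
    """Parameters of Linear(n_in, n_out) with bias."""
    return n_in * n_out + n_out


def count_torch_cnn(widths=(32, 64, 128, 256), in_channels=3, num_classes=6):
    total, c_in = 0, in_channels
    for c_out in widths:
        # each block holds two conv + batch-norm pairs
        total += conv_bn(c_in, c_out) + conv_bn(c_out, c_out)
        c_in = c_out
    total += linear(c_in, 256) + linear(256, num_classes)
    return total


def spatial_sizes(size=150, blocks=4):
    """Side length after each MaxPool2d(2); padding=1 keeps 3x3 convs size-preserving."""
    sizes = []
    for _ in range(blocks):
        size //= 2
        sizes.append(size)
    return sizes


print(count_torch_cnn())  # 1241510
print(spatial_sizes())    # [75, 37, 18, 9]
```

Because AdaptiveAvgPool2d(1) collapses the final 9×9 map to a single value per channel, the classifier head stays at 256 inputs regardless of the input resolution.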

Training configuration:

| Parameter | Value |
| --- | --- |
| Optimizer | Adam |
| Learning rate | 1e-4 |
| LR scheduler | ReduceLROnPlateau (factor=0.5, patience=3) |
| Early stopping | patience=5 |
| Batch size | 32 |
| Max epochs | 50 |
| Loss function | CrossEntropyLoss (PyTorch) / SparseCategoricalCrossentropy (TensorFlow) |
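
To show how the scheduler and early stopping interact, here is a minimal pure-Python sketch of the control logic. `PlateauController` is a hypothetical helper, not the project's Trainer, and it is deliberately simpler than `torch.optim.lr_scheduler.ReduceLROnPlateau` (no threshold, cooldown, or minimum LR). It assumes both mechanisms monitor the validation loss.

```python
class PlateauController:
    """Halve the learning rate on plateaus and signal when to stop training."""

    def __init__(self, lr=1e-4, factor=0.5, lr_patience=3, stop_patience=5):
        self.lr = lr
        self.factor = factor
        self.lr_patience = lr_patience
        self.stop_patience = stop_patience
        self.best = float("inf")
        self.bad_epochs = 0      # epochs since the last improvement
        self.since_lr_drop = 0   # non-improving epochs since the last LR reduction

    def step(self, val_loss):
        """Record this epoch's validation loss; return True if training should stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
            self.since_lr_drop = 0
            return False
        self.bad_epochs += 1
        self.since_lr_drop += 1
        # like ReduceLROnPlateau, reduce once the bad-epoch count exceeds the patience
        if self.since_lr_drop > self.lr_patience:
            self.lr *= self.factor
            self.since_lr_drop = 0
        return self.bad_epochs >= self.stop_patience


controller = PlateauController()
for epoch, loss in enumerate([1.0, 0.9, 0.95, 0.95, 0.95, 0.95, 0.95]):
    if controller.step(loss):
        print(f"stopping at epoch {epoch}, lr={controller.lr}")
        break
```

With the values in the table, the learning rate is halved once (on the fourth non-improving epoch) before early stopping triggers on the fifth.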

5. Dependencies & Installation

Python 3.9+ is required.

# Install dependencies
pip install -r requirements.txt

requirements.txt:

torch>=2.0.0
torchvision>=0.15.0
tensorflow>=2.13.0
flask>=3.0.0
pillow>=10.0.0
numpy>=1.24.0
matplotlib>=3.7.0
tqdm>=4.65.0
scikit-learn>=1.3.0
gunicorn>=21.0.0

6. Usage

6.1 Training

# Train with PyTorch (saves → parfait_model.pth)
python main.py --model pytorch --mode train

# Train with TensorFlow (saves → parfait_model.keras)
python main.py --model tensorflow --mode train

# Full example with all options
python main.py \
    --model      pytorch \
    --mode       train \
    --data_dir   ./data \
    --output_dir ./outputs \
    --epochs     50 \
    --batch_size 32 \
    --lr         1e-4 \
    --patience   15

All CLI arguments:

| Argument | Default | Description |
| --- | --- | --- |
| --model | (required) | pytorch or tensorflow |
| --mode | (required) | train or eval |
| --data_dir | /kaggle/input/.../intel-image-... | Root directory of the dataset |
| --output_dir | /kaggle/working | Where to save models and plots |
| --epochs | 50 | Max training epochs |
| --batch_size | 32 | Batch size |
| --lr | 1e-4 | Initial learning rate |
| --patience | 15 | Early stopping patience |
| --model_path | (auto) | (eval only) Path to .pth or .keras |
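
For readers who want to see how such an interface is typically declared, the following is a sketch using argparse with the defaults from the table. It is an assumption about how main.py could be structured, not a copy of it; `build_parser` is a hypothetical name, and the `--data_dir` default is left as `None` because the Kaggle path is elided in the table.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented CLI arguments."""
    parser = argparse.ArgumentParser(description="Train or evaluate the scene classifier")
    parser.add_argument("--model", required=True, choices=["pytorch", "tensorflow"])
    parser.add_argument("--mode", required=True, choices=["train", "eval"])
    # The documented default is an elided Kaggle path, so None is used here instead
    parser.add_argument("--data_dir", default=None, help="Root directory of the dataset")
    parser.add_argument("--output_dir", default="/kaggle/working",
                        help="Where to save models and plots")
    parser.add_argument("--epochs", type=int, default=50, help="Max training epochs")
    parser.add_argument("--batch_size", type=int, default=32)
    parser.add_argument("--lr", type=float, default=1e-4, help="Initial learning rate")
    parser.add_argument("--patience", type=int, default=15, help="Early stopping patience")
    parser.add_argument("--model_path", default=None,
                        help="(eval only) Path to .pth or .keras")
    return parser


args = build_parser().parse_args(
    ["--model", "pytorch", "--mode", "train", "--data_dir", "./data"]
)
print(args.epochs, args.lr, args.patience)  # 50 0.0001 15
```

Using `choices` for `--model` and `--mode` makes argparse reject any value other than the documented ones before any training code runs.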

Training outputs:

outputs/
├── parfait_model.pth          ← Best PyTorch weights
├── parfait_model.keras        ← Best TensorFlow weights
├── history_pytorch.png        ← Train/Val Loss & Accuracy curves
└── history_tf.png

6.2 Evaluation

The eval mode loads a saved model and produces a full diagnostic report:

  • Global accuracy & loss
  • Per-class accuracy
  • Precision / Recall / F1-score (classification report)
  • Confusion matrix (saved as PNG)
  • 4×4 grid of sample predictions (color-coded: green=correct, red=wrong)

# Evaluate PyTorch model
python main.py \
    --model      pytorch \
    --mode       eval \
    --model_path parfait_model.pth \
    --data_dir   ../data \
    --output_dir ./outputs

# Evaluate TensorFlow model
python main.py \
    --model      tensorflow \
    --mode       eval \
    --model_path parfait_model.keras \
    --data_dir   ../data \
    --output_dir ./outputs

Evaluation outputs:

outputs/
├── confusion_matrix_pytorch.png      ← Confusion matrix heatmap
├── confusion_matrix_tf.png
├── sample_predictions_pytorch.png    ← 16-image prediction grid
└── sample_predictions_tf.png
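
The metrics in the report all derive from the confusion matrix. The project depends on scikit-learn, which provides these computations; the dependency-free sketch below only illustrates the arithmetic, with hypothetical function names and integer class indices as labels.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """matrix[t][p] = number of samples with true class t predicted as class p."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def per_class_report(matrix):
    """Return one (precision, recall, f1) tuple per class."""
    n = len(matrix)
    report = []
    for c in range(n):
        tp = matrix[c][c]
        predicted = sum(matrix[r][c] for r in range(n))  # column sum
        actual = sum(matrix[c])                          # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report.append((precision, recall, f1))
    return report


cm = confusion_matrix([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], num_classes=3)
print(cm)  # [[1, 1, 0], [0, 2, 0], [1, 0, 1]]
```

Per-class accuracy, as usually reported, is the same quantity as per-class recall: the diagonal entry divided by its row sum.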

6.3 Web Application

# Start Flask server
gunicorn app:app --bind 0.0.0.0:8000 --workers 1 --timeout 120

Live demo

A hosted instance of the app is available at: https://huggingface.co/spaces/CyberAl/Image_Classification_Parfait_TOLEFO

Features:

  • Model selector: PyTorch or TensorFlow
  • Input: file upload (drag & drop) or image URL
  • Output: predicted class + confidence score + probability bars for all 6 classes
  • Animated plexus background with terminal green/black aesthetic
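
The confidence score and probability bars require class probabilities. The PyTorch model listed earlier ends in a plain Linear layer, so its raw outputs (logits) would need a softmax first; the Keras model already ends in Softmax. The sketch below shows that step in pure Python. It is an illustration under that assumption, not the code in app.py, and `format_prediction` is a hypothetical helper that renders text bars instead of HTML.

```python
import math

CLASSES = ["buildings", "forest", "glacier", "mountain", "sea", "street"]


def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def format_prediction(logits, width=20):
    """Return text lines: the top class with its confidence, then one bar per class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    lines = [f"{CLASSES[best]} ({probs[best]:.1%})"]
    for name, p in zip(CLASSES, probs):
        bar = "#" * round(p * width)
        lines.append(f"{name:<10} {bar:<{width}} {p:.3f}")
    return lines


for line in format_prediction([0.0, 0.0, 0.0, 0.0, 5.0, 0.0]):
    print(line)
```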

7. Performance

Results on the Intel Image Classification test set (3,000 images), obtained after training with the default hyperparameters on a Kaggle T4 GPU.

| Model | Test Accuracy | Test Loss |
| --- | --- | --- |
| PyTorch CNN | ~89–91% | ~0.30 |
| TF/Keras CNN | ~88–90% | ~0.32 |

Per-class performance (approximate):

| Class | Precision | Recall | F1-score |
| --- | --- | --- | --- |
| buildings | 0.87 | 0.85 | 0.86 |
| forest | 0.97 | 0.97 | 0.97 |
| glacier | 0.88 | 0.86 | 0.87 |
| mountain | 0.84 | 0.87 | 0.85 |
| sea | 0.92 | 0.93 | 0.92 |
| street | 0.90 | 0.91 | 0.90 |

Note: buildings vs street is the hardest pair due to visual overlap.


8. Preprocessing & Augmentation

All preprocessing is centralized in utils/prep.py.

Training augmentation pipeline (PyTorch)

Resize(150×150)
RandomHorizontalFlip(p=0.5)
RandomVerticalFlip(p=0.1)
RandomRotation(±40°)
ColorJitter(brightness=0.3, contrast=0.2, saturation=0.1, hue=0.05)
RandomGrayscale(p=0.05)         ← forces texture learning over color
ToTensor()
Normalize(mean=[0.485,0.456,0.406], std=[0.229,0.224,0.225])   ← ImageNet stats
RandomErasing(p=0.15, scale=[0.02,0.15])    ← occlusion simulation

Validation / inference (no augmentation)

Resize(150×150)
ToTensor()
Normalize(mean=[0.485,0.456,0.406], std=[0.229,0.224,0.225])

Why ImageNet normalization?

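The dataset consists of natural outdoor scenes (3-channel RGB images similar to ImageNet), so the ImageNet mean/std are a reasonable approximation of its per-channel statistics. Normalizing with them brings inputs close to zero mean and unit variance, which generally helps gradient stability and convergence speed, even for a CNN trained from scratch.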
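
To make the effect of the normalization concrete, the sketch below applies the same arithmetic to a single 8-bit pixel: scale to [0, 1] as ToTensor does, then subtract the mean and divide by the standard deviation per channel. `normalize_pixel` is a hypothetical helper for illustration only; the project itself uses the torchvision transforms listed above.

```python
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb):
    """Map an 8-bit (R, G, B) pixel to normalized floats, channel by channel."""
    return tuple(
        (value / 255.0 - mean) / std
        for value, mean, std in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)
    )


# A pixel near the ImageNet mean color lands close to (0, 0, 0)
print(normalize_pixel((124, 116, 104)))
# Pure black and pure white mark the extremes of the normalized range
print(normalize_pixel((0, 0, 0)))
print(normalize_pixel((255, 255, 255)))
```

Normalized values therefore fall roughly between -2.1 and 2.7, centered near zero for typical natural-image colors.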


9. Reproducibility (Seed)

The project sets a global seed (SEED=42) so that results are reproducible between runs and consistent between training and production inference.

The seed fixes:

  • Python random module
  • NumPy RNG
  • PyTorch CPU and GPU (torch.manual_seed, torch.cuda.manual_seed_all)
  • torch.backends.cudnn.deterministic=True, torch.backends.cudnn.benchmark=False
  • TensorFlow RNG (tf.random.set_seed)
  • PYTHONHASHSEED environment variable
  • DataLoader worker seeds (via worker_init_fn)
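
The items above could be combined into a single helper along these lines. This is a sketch under stated assumptions, not the project's actual code: `set_seed` and `seed_worker` are hypothetical names, and the optional imports are wrapped so the function also runs in an environment where NumPy, PyTorch, or TensorFlow is not installed.

```python
import os
import random


def set_seed(seed=42):
    """Seed every RNG that is importable in the current environment."""
    # Only affects hash randomization in Python processes started afterwards
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # silently ignored without CUDA
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
    except ImportError:
        pass
    try:
        import tensorflow as tf
        tf.random.set_seed(seed)
    except ImportError:
        pass


def seed_worker(worker_id):
    """worker_init_fn for a PyTorch DataLoader: derive per-worker seeds."""
    import numpy as np
    import torch
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)
```

A DataLoader would then receive `worker_init_fn=seed_worker`, usually together with a seeded `torch.Generator` passed as `generator`. Even with all of this in place, some GPU operations can remain non-deterministic, so small run-to-run differences are still possible.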