Scripts Directory

This directory contains reusable Python modules and helper shell scripts for the MNIST digit classification project.

Modules

Data Processing

  • data_loader.py - MNIST data loading from IDX binary format

    • MnistDataloader class for loading train/test data
  • preprocessing.py - Data preprocessing and PyTorch Dataset

    • MnistDataset - PyTorch Dataset with normalization
    • create_train_val_split() - Split training data into train/val
  • data_quality.py - Data quality analysis functions

    • Quality checks: missing values, outliers, class balance
    • generate_quality_report() - Comprehensive quality report
  • augmentation.py - Data augmentation pipeline

    • get_augmentation_pipeline() - Transform composition for training
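
For readers unfamiliar with the IDX binary format mentioned above, the following is a minimal sketch of how such a file is laid out and parsed. It is not the code in data_loader.py; the function name parse_idx is hypothetical, and the sketch handles only unsigned-byte data (the type MNIST uses). An IDX file starts with a 4-byte magic number (two zero bytes, a data-type code, and the number of dimensions), followed by one big-endian 32-bit size per dimension, then the raw values.

```python
import struct
from typing import Tuple


def parse_idx(buffer: bytes) -> Tuple[Tuple[int, ...], bytes]:
    """Parse an unsigned-byte IDX buffer into (shape, raw payload).

    Hypothetical helper for illustration; not the project's MnistDataloader.
    """
    # Magic number: 2 zero bytes, 1 data-type byte, 1 dimension-count byte.
    zeros, dtype_code, ndim = struct.unpack_from(">HBB", buffer, 0)
    if zeros != 0 or dtype_code != 0x08:  # 0x08 means unsigned byte
        raise ValueError("not an unsigned-byte IDX buffer")

    # Each dimension size is a big-endian unsigned 32-bit integer.
    shape = struct.unpack_from(f">{ndim}I", buffer, 4)
    offset = 4 + 4 * ndim

    # The payload holds exactly one byte per element.
    expected = 1
    for dim in shape:
        expected *= dim
    data = buffer[offset:offset + expected]
    if len(data) != expected:
        raise ValueError("truncated IDX payload")
    return shape, data


# A tiny hand-built label file: one dimension of length 3, labels 5, 0, 4.
demo = struct.pack(">HBBI", 0, 0x08, 1, 3) + bytes([5, 0, 4])
demo_shape, demo_data = parse_idx(demo)
```

For the MNIST image files the shape has three dimensions (count, rows, columns), and for the label files it has one.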

Model

  • models.py - CNN architectures

    • BaselineCNN - 2-layer CNN baseline model
  • train.py - Training pipeline

    • train_epoch() - Single epoch training
    • validate() - Validation evaluation
    • train_model() - Complete training loop with MLflow logging
  • evaluate.py - Model evaluation

    • evaluate_model() - Comprehensive metrics computation
    • Accuracy, precision, recall, confusion matrix
  • inference.py - Production inference

    • DigitClassifier - Inference wrapper for deployment
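
To make the metrics listed under evaluate.py concrete, here is a small pure-Python sketch of how accuracy, per-class precision, and per-class recall follow from a confusion matrix. The function names are hypothetical and this is not the implementation of evaluate_model(), whose signature is not shown here.

```python
from typing import Dict, List, Sequence


def confusion_matrix(
    y_true: Sequence[int], y_pred: Sequence[int], num_classes: int
) -> List[List[int]]:
    """Count predictions: rows are true labels, columns are predicted labels."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for true_label, predicted_label in zip(y_true, y_pred):
        matrix[true_label][predicted_label] += 1
    return matrix


def summarize(matrix: List[List[int]]) -> Dict[str, object]:
    """Derive accuracy and per-class precision/recall from a confusion matrix."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    precision: List[float] = []
    recall: List[float] = []
    for c in range(n):
        predicted_as_c = sum(matrix[r][c] for r in range(n))  # column sum
        actually_c = sum(matrix[c])  # row sum
        # Classes with no predictions or no samples are reported as 0.0.
        precision.append(matrix[c][c] / predicted_as_c if predicted_as_c else 0.0)
        recall.append(matrix[c][c] / actually_c if actually_c else 0.0)
    return {
        "accuracy": correct / total if total else 0.0,
        "precision": precision,
        "recall": recall,
    }


# Toy example with three classes instead of MNIST's ten.
demo_matrix = confusion_matrix([0, 0, 1, 1, 2], [0, 1, 1, 1, 2], num_classes=3)
demo_metrics = summarize(demo_matrix)
```

Returning 0.0 for an empty class is a choice made for this sketch; a library implementation may warn or behave differently in that case.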

Experiment Tracking

  • mlflow_setup.py - MLflow configuration
    • setup_mlflow() - Initialize MLflow experiment
    • Tracking URI and experiment management
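
As a rough illustration of what tracking URI and experiment management involve, the sketch below resolves a tracking URI and selects an experiment. It is not the project's setup_mlflow(); the names resolve_tracking_uri and init_experiment and the fallback URI are assumptions made for this example. The MLflow calls used, mlflow.set_tracking_uri() and mlflow.set_experiment(), are part of MLflow's public API, and MLFLOW_TRACKING_URI is the environment variable MLflow itself reads.

```python
import os
from typing import Optional

# Hypothetical fallback for this sketch; point it at wherever your server runs.
DEFAULT_TRACKING_URI = "http://localhost:5000"


def resolve_tracking_uri(explicit_uri: Optional[str] = None) -> str:
    """Pick a tracking URI: explicit argument, then environment, then fallback."""
    return (
        explicit_uri
        or os.environ.get("MLFLOW_TRACKING_URI")
        or DEFAULT_TRACKING_URI
    )


def init_experiment(experiment_name: str, tracking_uri: Optional[str] = None) -> str:
    """Point MLflow at a tracking server and select (or create) an experiment."""
    # Imported lazily so the URI helper above works without MLflow installed.
    import mlflow

    mlflow.set_tracking_uri(resolve_tracking_uri(tracking_uri))
    # In recent MLflow versions set_experiment returns the Experiment object.
    experiment = mlflow.set_experiment(experiment_name)
    return experiment.experiment_id
```

Keeping the URI resolution separate from the MLflow calls makes the precedence rule easy to unit test without a running tracking server.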

Utilities

  • launch_mlflow_ui.sh - Launch MLflow UI server
  • docker_start.sh - Start Docker containers
  • docker_stop.sh - Stop Docker containers
  • docker_logs.sh - View Docker container logs

Usage

All Python modules are designed to be imported and used in notebooks or other scripts:

# Example: Load data
from scripts.data_loader import MnistDataloader

loader = MnistDataloader(
    training_images_filepath='data/raw/train-images.idx3-ubyte',
    training_labels_filepath='data/raw/train-labels.idx1-ubyte',
    test_images_filepath='data/raw/t10k-images.idx3-ubyte',
    test_labels_filepath='data/raw/t10k-labels.idx1-ubyte'
)
(x_train, y_train), (x_test, y_test) = loader.load_data()
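
Once the data is loaded, a validation set is typically carved out of the training set. The sketch below shows one way to do that with a seeded shuffle of sample indices. It is not the project's create_train_val_split(), whose signature is not documented here; split_indices, val_fraction, and seed are hypothetical names chosen for illustration.

```python
import random
from typing import List, Tuple


def split_indices(
    n_samples: int, val_fraction: float = 0.1, seed: int = 42
) -> Tuple[List[int], List[int]]:
    """Return (train_indices, val_indices) from a reproducible shuffle."""
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be strictly between 0 and 1")
    indices = list(range(n_samples))
    # A dedicated Random instance keeps the split independent of global state.
    random.Random(seed).shuffle(indices)
    n_val = int(n_samples * val_fraction)
    return indices[n_val:], indices[:n_val]


# Example: split ten samples in half, then index into the loaded arrays
# (for instance x_train and y_train) with the resulting lists.
train_idx, val_idx = split_indices(10, val_fraction=0.5, seed=0)
```

Splitting indices rather than the arrays themselves keeps images and labels aligned, since the same index lists are applied to both.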

Development Guidelines

  • All functions include type hints
  • All public functions have docstrings
  • Follow naming conventions (snake_case for functions, PascalCase for classes)
  • Run ruff check . --fix before committing
  • Add unit tests for critical functions in the tests/ directory