# Scripts Directory

This directory contains reusable Python modules for the MNIST digit classification project.

## Modules

### Data Processing

- **`data_loader.py`** - MNIST data loading from the IDX binary format
  - `MnistDataloader` - Class for loading the train/test data
- **`preprocessing.py`** - Data preprocessing and PyTorch Dataset
  - `MnistDataset` - PyTorch Dataset with normalization
  - `create_train_val_split()` - Split the training data into train/val sets
- **`data_quality.py`** - Data quality analysis functions
  - Quality checks: missing values, outliers, class balance
  - `generate_quality_report()` - Comprehensive quality report
- **`augmentation.py`** - Data augmentation pipeline
  - `get_augmentation_pipeline()` - Transform composition for training

### Model

- **`models.py`** - CNN architectures
  - `BaselineCNN` - 2-layer CNN baseline model
- **`train.py`** - Training pipeline
  - `train_epoch()` - Train for a single epoch
  - `validate()` - Evaluate on the validation set
  - `train_model()` - Complete training loop with MLflow logging
- **`evaluate.py`** - Model evaluation
  - `evaluate_model()` - Comprehensive metrics computation
  - Metrics: accuracy, precision, recall, confusion matrix
- **`inference.py`** - Production inference
  - `DigitClassifier` - Inference wrapper for deployment

### Experiment Tracking

- **`mlflow_setup.py`** - MLflow configuration
  - `setup_mlflow()` - Initialize the MLflow experiment
  - Tracking URI and experiment management

### Utilities

- **`launch_mlflow_ui.sh`** - Launch the MLflow UI server
- **`docker_start.sh`** - Start the Docker containers
- **`docker_stop.sh`** - Stop the Docker containers
- **`docker_logs.sh`** - View the Docker container logs

## Usage

All modules are designed to be imported and used in notebooks or other scripts:

```python
# Example: Load data
from scripts.data_loader import MnistDataloader

loader = MnistDataloader(
    training_images_filepath='data/raw/train-images.idx3-ubyte',
    training_labels_filepath='data/raw/train-labels.idx1-ubyte',
    test_images_filepath='data/raw/t10k-images.idx3-ubyte',
    test_labels_filepath='data/raw/t10k-labels.idx1-ubyte'
)
(x_train, y_train), (x_test, y_test) = loader.load_data()
```

## Development Guidelines

- All functions include type hints
- All public functions have docstrings
- Follow naming conventions (snake_case for functions, PascalCase for classes)
- Run `ruff check . --fix` before committing
- Add unit tests in the `tests/` directory for critical functions
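The `data_loader.py` module reads MNIST from the IDX binary format. For readers unfamiliar with that format, here is a minimal, standard-library-only sketch of parsing an unsigned-byte IDX buffer. It is not the code inside `MnistDataloader`. The function name `parse_idx_ubyte` is hypothetical, and the sketch assumes the whole file fits in memory. An IDX file starts with two zero bytes, a type code (`0x08` for unsigned byte), and a dimension count, then one big-endian 32-bit size per dimension, followed by the raw data.

```python
import struct
from typing import List, Tuple


def parse_idx_ubyte(raw: bytes) -> Tuple[Tuple[int, ...], List[int]]:
    """Parse an IDX buffer whose element type is unsigned byte (0x08).

    Hypothetical helper for illustration; returns (dims, flat_values).
    """
    # Header: two zero bytes, a type code, and the number of dimensions.
    zero1, zero2, type_code, ndim = struct.unpack(">BBBB", raw[:4])
    if zero1 != 0 or zero2 != 0:
        raise ValueError("not an IDX buffer: first two bytes must be zero")
    if type_code != 0x08:
        raise ValueError(f"unsupported IDX type code: {type_code:#04x}")

    # Each dimension size is a 4-byte big-endian unsigned integer.
    dims = struct.unpack(f">{ndim}I", raw[4 : 4 + 4 * ndim])
    offset = 4 + 4 * ndim

    # The payload must contain exactly prod(dims) bytes.
    expected = 1
    for d in dims:
        expected *= d
    data = raw[offset:]
    if len(data) != expected:
        raise ValueError(f"expected {expected} data bytes, found {len(data)}")

    # Iterating over a bytes object yields ints in the range 0..255.
    return dims, list(data)


if __name__ == "__main__":
    # Synthetic "images" buffer: 2 images of 2x3 pixels.
    header = struct.pack(">BBBB", 0, 0, 0x08, 3) + struct.pack(">3I", 2, 2, 3)
    dims, values = parse_idx_ubyte(header + bytes(range(12)))
    print(dims, values)
```

For the real MNIST files, the first four bytes read as the big-endian integer 2051 (`0x00000803`) for image files and 2049 (`0x00000801`) for label files, so a file could be parsed with `parse_idx_ubyte(open(path, "rb").read())`. Returning a flat list keeps the sketch dependency-free; a practical loader would more likely hand the payload to NumPy and reshape it to `dims`.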
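`preprocessing.py` provides `create_train_val_split()` for splitting the training data into train/val sets; its actual signature and strategy are not documented here. As a hedged illustration of one common approach, the sketch below does a stratified split on label indices, holding out the same fraction of every class so the class balance checked by `data_quality.py` carries over to the validation set. The function name `train_val_split_indices`, the default `val_fraction=0.1`, and the stratification itself are assumptions, not the project's implementation.

```python
import random
from typing import Dict, List, Sequence, Tuple


def train_val_split_indices(
    labels: Sequence[int], val_fraction: float = 0.1, seed: int = 42
) -> Tuple[List[int], List[int]]:
    """Hypothetical stratified split returning (train_indices, val_indices)."""
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be strictly between 0 and 1")

    # Group sample indices by their class label.
    by_class: Dict[int, List[int]] = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    # A local RNG makes the split reproducible without touching global state.
    rng = random.Random(seed)
    train_idx: List[int] = []
    val_idx: List[int] = []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Hold out the same fraction of each class for validation.
        n_val = round(len(indices) * val_fraction)
        val_idx.extend(indices[:n_val])
        train_idx.extend(indices[n_val:])

    return sorted(train_idx), sorted(val_idx)
```

The returned indices can then be used to select samples from whatever `loader.load_data()` produces for `x_train` and `y_train`. Splitting indices rather than the data itself keeps the function independent of whether images are stored as lists, NumPy arrays, or tensors.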