Scripts Directory
This directory contains reusable Python modules for the MNIST digit classification project.
Modules
Data Processing
- data_loader.py - MNIST data loading from IDX binary format
  - MnistDataloader - class for loading train/test data
- preprocessing.py - Data preprocessing and PyTorch Dataset
  - MnistDataset - PyTorch Dataset with normalization
  - create_train_val_split() - Split training data into train/val
- data_quality.py - Data quality analysis functions
  - Quality checks: missing values, outliers, class balance
  - generate_quality_report() - Comprehensive quality report
- augmentation.py - Data augmentation pipeline
  - get_augmentation_pipeline() - Transform composition for training
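To illustrate what "loading from IDX binary format" involves, here is a minimal sketch of an IDX parser using only the standard library. It is not the MnistDataloader implementation; the function name parse_idx is hypothetical. It relies on the IDX layout used by the MNIST files: a 4-byte magic number (two zero bytes, a data-type code, and the number of dimensions), followed by one big-endian 32-bit size per dimension, then the raw data. The sketch handles only the unsigned-byte type (code 0x08) that MNIST uses.

```python
from __future__ import annotations

import struct


def parse_idx(buf: bytes) -> tuple[tuple[int, ...], bytes]:
    """Parse an unsigned-byte IDX buffer into (shape, raw_data).

    Hypothetical helper for illustration; not part of this project's API.
    """
    if len(buf) < 4:
        raise ValueError("buffer too short to contain an IDX header")

    # Magic number: two zero bytes, a data-type code, and the dimension count.
    zero, dtype_code, ndim = struct.unpack(">HBB", buf[:4])
    if zero != 0 or dtype_code != 0x08:
        raise ValueError("not an unsigned-byte IDX buffer")

    # One big-endian uint32 per dimension, e.g. (60000, 28, 28) for images.
    header_end = 4 + 4 * ndim
    shape = struct.unpack(f">{ndim}I", buf[4:header_end])

    # The payload length must match the product of the dimensions.
    expected = 1
    for dim in shape:
        expected *= dim
    data = buf[header_end:]
    if len(data) != expected:
        raise ValueError(f"expected {expected} data bytes, found {len(data)}")

    return shape, data
```

For a real file, the buffer would come from something like open(path, "rb").read(); the returned bytes can then be reshaped with whichever array library the project uses.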
Model
- models.py - CNN architectures
  - BaselineCNN - 2-layer CNN baseline model
- train.py - Training pipeline
  - train_epoch() - Single epoch training
  - validate() - Validation evaluation
  - train_model() - Complete training loop with MLflow logging
- evaluate.py - Model evaluation
  - evaluate_model() - Comprehensive metrics computation
  - Accuracy, precision, recall, confusion matrix
- inference.py - Production inference
  - DigitClassifier - Inference wrapper for deployment
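As a rough illustration of the metrics listed for evaluate.py, the sketch below computes a confusion matrix and derives accuracy and per-class precision and recall from it in plain Python. The function names are hypothetical and this is not the evaluate_model() implementation, whose signature is not documented here.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Build a matrix where rows are true classes and columns are predictions."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for true_label, predicted_label in zip(y_true, y_pred):
        matrix[true_label][predicted_label] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal (correct predictions)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


def precision_recall(matrix):
    """Return (precision, recall) lists with one entry per class."""
    n = len(matrix)
    precision, recall = [], []
    for c in range(n):
        true_positives = matrix[c][c]
        predicted_as_c = sum(matrix[r][c] for r in range(n))  # column sum
        actually_c = sum(matrix[c])                           # row sum
        precision.append(true_positives / predicted_as_c if predicted_as_c else 0.0)
        recall.append(true_positives / actually_c if actually_c else 0.0)
    return precision, recall
```

For MNIST, num_classes would be 10; a class that is never predicted gets a precision of 0.0 here rather than raising a division error, which is one possible convention.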
Experiment Tracking
- mlflow_setup.py - MLflow configuration
  - setup_mlflow() - Initialize MLflow experiment
  - Tracking URI and experiment management
Utilities
- launch_mlflow_ui.sh - Launch MLflow UI server
- docker_start.sh - Start Docker containers
- docker_stop.sh - Stop Docker containers
- docker_logs.sh - View Docker container logs
Usage
All modules are designed to be imported and used in notebooks or other scripts:
# Example: Load data
from scripts.data_loader import MnistDataloader
loader = MnistDataloader(
    training_images_filepath='data/raw/train-images.idx3-ubyte',
    training_labels_filepath='data/raw/train-labels.idx1-ubyte',
    test_images_filepath='data/raw/t10k-images.idx3-ubyte',
    test_labels_filepath='data/raw/t10k-labels.idx1-ubyte'
)
(x_train, y_train), (x_test, y_test) = loader.load_data()
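Once the data is loaded, a typical next step is to hold out part of the training set for validation. The sketch below shows one way a seeded shuffle-and-split could work. It is an assumption for illustration only: split_train_val and its parameters are hypothetical, and the project's own create_train_val_split() may take different arguments and behave differently.

```python
import random


def split_train_val(images, labels, val_fraction=0.1, seed=42):
    """Shuffle indices with a fixed seed and split off a validation subset.

    Hypothetical helper; not the signature of create_train_val_split().
    """
    if len(images) != len(labels):
        raise ValueError("images and labels must have the same length")
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be between 0 and 1")

    # A dedicated Random instance keeps the split reproducible
    # without touching the global random state.
    indices = list(range(len(images)))
    random.Random(seed).shuffle(indices)

    n_val = int(len(indices) * val_fraction)
    val_idx, train_idx = indices[:n_val], indices[n_val:]

    train = ([images[i] for i in train_idx], [labels[i] for i in train_idx])
    val = ([images[i] for i in val_idx], [labels[i] for i in val_idx])
    return train, val
```

With the variables from the example above, this could be called as (x_tr, y_tr), (x_val, y_val) = split_train_val(x_train, y_train, val_fraction=0.1). Fixing the seed keeps the validation set identical across runs, which makes experiment comparisons fairer.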
Development Guidelines
- All functions include type hints
- All public functions have docstrings
- Follow naming conventions (snake_case for functions, PascalCase for classes)
- Run ruff check . --fix before committing
- Add unit tests in the tests/ directory for critical functions