# 🧠 MyTorch Elite MNIST Model
Curated by Aryan Singh Chandel (Shiro) at Rustamji Institute of Technology (RJIT).
This repository hosts the Elite MyTorch MNIST Model, which reaches 98.59% validation accuracy on the MNIST handwritten digit classification task.
The model is built entirely from scratch in NumPy, replicating core deep learning functionality. It incorporates an AdamW optimizer, Batch Normalization, Dropout, and Label Smoothing to achieve strong performance within a custom framework.
## 📐 Model Architecture
This is a multi-layer perceptron (MLP) with a "Diamond Architecture":
`784 (Input) -> 1024 -> 2048 (Expansion) -> 512 (Bottleneck) -> 10 (Output)`
Key components:
- Linear Layers with Kaiming Initialization
- Batch Normalization
- ReLU Activation Functions
- Dropout for regularization
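As a rough illustration (not the repository's actual MyTorch code), the forward pass of such a diamond MLP can be sketched in plain NumPy. The function names, the inference-style batch normalization (no learned gamma/beta), and the dropout rate below are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def kaiming_linear(fan_in, fan_out):
    """He/Kaiming-normal initialization, suited to ReLU layers."""
    W = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))
    b = np.zeros(fan_out)
    return W, b

def forward(x, layers, train=False, p_drop=0.2):
    """Linear -> BatchNorm (batch stats only) -> ReLU -> Dropout per hidden layer."""
    for i, (W, b) in enumerate(layers):
        x = x @ W + b
        if i < len(layers) - 1:             # hidden layers only; output stays raw logits
            mu, var = x.mean(0), x.var(0)   # simplified BN: batch statistics, no gamma/beta
            x = (x - mu) / np.sqrt(var + 1e-5)
            x = np.maximum(x, 0.0)          # ReLU
            if train:
                mask = rng.random(x.shape) > p_drop
                x = x * mask / (1.0 - p_drop)  # inverted dropout
    return x

# Diamond architecture: 784 -> 1024 -> 2048 -> 512 -> 10
dims = [784, 1024, 2048, 512, 10]
layers = [kaiming_linear(dims[i], dims[i + 1]) for i in range(len(dims) - 1)]
logits = forward(rng.normal(size=(4, 784)), layers)
print(logits.shape)  # (4, 10)
```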
## ⚡ Training Details
- Optimizer: AdamW (Adam with decoupled weight decay)
- Loss Function: Cross-Entropy Loss with Label Smoothing
- Learning Rate Scheduler: StepLR
- Epochs: 15
- Batch Size: 256
- Data Augmentation: Ultra-light Gaussian noise
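The two less common ingredients here, label smoothing and AdamW's decoupled weight decay, can be sketched in NumPy as follows. This is a simplified illustration under assumed hyperparameters, not the repository's implementation:

```python
import numpy as np

def label_smoothed_ce(logits, targets, eps=0.1):
    """Cross-entropy against smoothed targets: the true class gets
    (1 - eps) + eps/K, every other class gets eps/K."""
    n, K = logits.shape
    z = logits - logits.max(axis=1, keepdims=True)               # stable log-softmax
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    smooth = np.full_like(log_probs, eps / K)
    smooth[np.arange(n), targets] += 1.0 - eps
    return -(smooth * log_probs).sum(axis=1).mean()

def adamw_step(param, grad, m, v, t, lr=1e-3, betas=(0.9, 0.999),
               eps=1e-8, weight_decay=1e-2):
    """One AdamW update: Adam moment estimates plus decoupled weight decay
    applied directly to the parameters, outside the adaptive term."""
    m = betas[0] * m + (1 - betas[0]) * grad
    v = betas[1] * v + (1 - betas[1]) * grad ** 2
    m_hat = m / (1 - betas[0] ** t)                              # bias correction
    v_hat = v / (1 - betas[1] ** t)
    param = param - lr * (m_hat / (np.sqrt(v_hat) + eps) + weight_decay * param)
    return param, m, v

logits = np.array([[4.0, 0.0, 0.0], [0.0, 4.0, 0.0]])
loss = label_smoothed_ce(logits, np.array([0, 1]))
```

With `eps=0` this reduces to plain cross-entropy; the smoothing term keeps the model from pushing logits toward extreme confidence.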
## 🏆 Performance
- Validation Accuracy: 98.59%
- Confusion Matrix: see `visuals/final_heatmap.png` in the main GitHub repo
## 📦 Files
- `best_model.pkl`: the serialized Python pickle file containing the trained MyTorch model instance.
## 💡 Usage
To load and use this model in your MyTorch project:
```python
import pickle
import numpy as np

# Assuming you have MyTorch installed or on your Python path
# from mytorch.nn.sequential import Sequential  # (and other modules)

# Load the model
with open('best_model.pkl', 'rb') as f:
    loaded_model = pickle.load(f)

# Example inference (assuming X_test is your preprocessed test data)
# predictions = loaded_model(X_test)
# print(np.argmax(predictions, axis=1))
```
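The exact preprocessing depends on how the model was trained; a typical MNIST pipeline for an MLP like this (an assumption, not documented here) flattens each 28x28 image and scales pixels to [0, 1]:

```python
import numpy as np

def preprocess(images):
    """Hypothetical preprocessing: flatten 28x28 uint8 images to 784-dim
    float vectors scaled to [0, 1]. Adjust to match the training pipeline."""
    return images.reshape(len(images), -1).astype(np.float32) / 255.0

batch = np.random.randint(0, 256, size=(2, 28, 28), dtype=np.uint8)
X = preprocess(batch)
print(X.shape)  # (2, 784)
```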
## 📜 Citation

If you use this model in your research, please attribute: Chandel, A. S. (2026). MyTorch: Deep Learning from Scratch at RJIT.