Upload README.md with huggingface_hub
---
language:
- en
license: mit
library_name: numpy
pipeline_tag: image-classification
tags:
- mnist
- mytorch
- deep-learning
- from-scratch
- numpy-only
---

# 🧠 MyTorch Elite MNIST Model

Curated by **Aryan Singh Chandel (Shiro)** at **Rustamji Institute of Technology (RJIT)**.

This repository hosts the **Elite MyTorch MNIST Model**, which reaches **98.59% validation accuracy** on the MNIST handwritten-digit classification task.

The model is built entirely from scratch in NumPy, replicating core deep learning functionality. It incorporates an AdamW optimizer, Batch Normalization, Dropout, and Label Smoothing to achieve strong performance within a custom framework.

## 🚀 Model Architecture

This is a multi-layer perceptron (MLP) with a "Diamond Architecture":

`784 (Input) -> 1024 -> 2048 (Expansion) -> 512 (Bottleneck) -> 10 (Output)`

Key components:
- **Linear Layers** with Kaiming initialization
- **Batch Normalization**
- **ReLU** activations
- **Dropout** for regularization
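MyTorch's own layer classes are not shown in this card, so as an illustration the inference-time forward pass of this architecture can be sketched in plain NumPy. The function names (`kaiming_linear`, `batchnorm`, `forward`) and the simplified per-batch normalization statistics are assumptions for the sketch, not MyTorch's API:

```python
import numpy as np

rng = np.random.default_rng(0)

def kaiming_linear(fan_in, fan_out):
    """Weight/bias pair with Kaiming (He) initialization for ReLU layers."""
    W = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))
    b = np.zeros(fan_out)
    return W, b

def batchnorm(x, eps=1e-5):
    # Simplified: normalizes with current batch statistics.
    # (A real framework tracks running mean/var for inference.)
    mu = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def relu(x):
    return np.maximum(0.0, x)

# Diamond architecture: 784 -> 1024 -> 2048 -> 512 -> 10
dims = [784, 1024, 2048, 512, 10]
params = [kaiming_linear(a, b) for a, b in zip(dims[:-1], dims[1:])]

def forward(x):
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1:       # hidden layers only
            x = relu(batchnorm(x))    # (Dropout is disabled at inference)
    return x                          # raw logits, shape (batch, 10)

logits = forward(rng.normal(size=(32, 784)))
print(logits.shape)  # (32, 10)
```

The final layer returns raw logits; `np.argmax` over the class axis gives the predicted digit.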

## ⚡ Training Details
- **Optimizer:** AdamW with weight decay
- **Loss Function:** Cross-Entropy Loss with Label Smoothing
- **Learning Rate Scheduler:** StepLR
- **Epochs:** 15
- **Batch Size:** 256
- **Data Augmentation:** Ultra-light Gaussian noise
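For reference, cross-entropy with label smoothing can be written in a few lines of NumPy. This is a generic formulation (the smoothing mass is spread over the non-target classes), not necessarily MyTorch's exact implementation:

```python
import numpy as np

def smoothed_cross_entropy(logits, targets, num_classes=10, smoothing=0.1):
    """Cross-entropy against smoothed one-hot targets.

    Each true class gets probability 1 - smoothing; the remaining mass is
    spread uniformly over the other classes. This discourages the model
    from becoming over-confident on any single class.
    """
    # Numerically stable log-softmax
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))

    # Build the smoothed target distribution
    n = len(targets)
    y = np.full((n, num_classes), smoothing / (num_classes - 1))
    y[np.arange(n), targets] = 1.0 - smoothing

    return -(y * log_probs).sum(axis=1).mean()

rng = np.random.default_rng(0)
loss = smoothed_cross_entropy(rng.normal(size=(4, 10)), np.array([3, 1, 4, 1]))
print(f"loss = {loss:.4f}")
```

With `smoothing=0.0` this reduces to ordinary cross-entropy against one-hot targets.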
## 📊 Performance
- **Validation Accuracy:** 98.59%
- **Confusion Matrix:** see `visuals/final_heatmap.png` in the main GitHub repo

## 📦 Files
- `best_model.pkl`: the serialized Python pickle file containing the trained MyTorch model instance.

## 💡 Usage
To load and use this model in your MyTorch project:

```python
import pickle
import numpy as np

# The MyTorch classes must be importable when unpickling, e.g.:
# from mytorch.nn.sequential import Sequential  # (and other modules)

# Load the model
with open('best_model.pkl', 'rb') as f:
    loaded_model = pickle.load(f)

# Example inference (assuming X_test is your preprocessed test data)
# predictions = loaded_model(X_test)
# print(np.argmax(predictions, axis=1))
```
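The exact preprocessing depends on how the model was trained; a typical MNIST pipeline, assumed here for illustration, flattens each 28×28 image to a 784-vector and scales pixel values to [0, 1]:

```python
import numpy as np

def preprocess(images):
    """Flatten 28x28 uint8 MNIST images to (N, 784) floats in [0, 1]."""
    x = np.asarray(images, dtype=np.float64)
    return x.reshape(len(x), -1) / 255.0

# Example: a batch of two blank 28x28 images
batch = np.zeros((2, 28, 28), dtype=np.uint8)
X_test = preprocess(batch)
print(X_test.shape)  # (2, 784)
```

The resulting array matches the model's 784-dimensional input layer and can be passed to `loaded_model` as shown above.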

## 🎓 Citation

If you use this model in your research, please attribute:

Chandel, A. S. (2026). MyTorch: Deep Learning from Scratch at RJIT.