---
license: mit
datasets:
- ylecun/mnist
metrics:
- accuracy
pipeline_tag: image-classification
model-index:
- name: MoE-CNN
  results:
  - task:
      type: image-classification
    dataset:
      name: MNIST
      type: ylecun/mnist
    metrics:
    - name: Accuracy
      type: accuracy
      value: 99.75
---
# MoE-CNN
## Model Description
- **Model type:** Image Classification
- **License:** MIT
## How to Get Started with the Model
Use the code below to get started with the model.
```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# MixtureOfExperts is the model class shipped with this repository.
model = MixtureOfExperts(num_experts=10).to(device)

# Load the trained weights and the stored validation accuracy.
checkpoint_path = "FP_ML_MOE_SIMPLE_99_75.pth"
checkpoint = torch.load(checkpoint_path, map_location=device)
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()
print(f"Validation Accuracy: {checkpoint['val_accuracy']:.2f}")

# Run inference on a dummy 28x28 grayscale input.
input_data = torch.randn(1, 1, 28, 28)
results = model.predict(input_data.to(device))
print("Results:", results)
```
## Training Details
### Training Data
[ylecun/mnist](https://huggingface.co/datasets/ylecun/mnist)
### Training Procedure
#### Data Augmentation
- RandomRotation(10)
- RandomAffine(0, shear=10)
- RandomAffine(0, translate=(0.1, 0.1))
- RandomResizedCrop(28, scale=(0.8, 1.0))
- RandomPerspective(distortion_scale=0.2, p=0.5)
- Resize((28, 28))
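As a minimal sketch, these augmentations could be composed with `torchvision.transforms` as follows; the ordering and the trailing `ToTensor()` are assumptions, not taken from the actual training code.

```python
from torchvision import transforms

# Assumed composition of the augmentations listed above; the actual
# training pipeline may order or combine them differently.
train_transform = transforms.Compose([
    transforms.RandomRotation(10),
    transforms.RandomAffine(0, shear=10),
    transforms.RandomAffine(0, translate=(0.1, 0.1)),
    transforms.RandomResizedCrop(28, scale=(0.8, 1.0)),
    transforms.RandomPerspective(distortion_scale=0.2, p=0.5),
    transforms.Resize((28, 28)),
    transforms.ToTensor(),  # not listed in the card, but needed to get tensors
])
```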
#### Training Hyperparameters
The reported schedule combines two optimizers (sketched below):
- Adam with a learning rate of 0.001 for fast initial convergence
- SGD with a learning rate of 0.01, decayed to 0.001
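Reading this as a two-stage schedule (Adam first for warm-up, then SGD with decay), a sketch might look like the following, assuming `model` is the `MixtureOfExperts` instance from the quick-start snippet; the epoch split, the cosine decay curve, and `train_one_epoch` are assumptions for illustration.

```python
import torch

def train_one_epoch(model, optimizer):
    """Placeholder for the usual forward/backward/step loop."""
    ...

adam_epochs, sgd_epochs = 5, 25  # hypothetical split, not from the card

# Stage 1: Adam at 0.001 for fast initial convergence.
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
for _ in range(adam_epochs):
    train_one_epoch(model, optimizer)

# Stage 2: SGD at 0.01, decayed toward 0.001 (decay curve assumed).
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=sgd_epochs, eta_min=0.001)
for _ in range(sgd_epochs):
    train_one_epoch(model, optimizer)
    scheduler.step()
```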
#### Size
2,247,151 total parameters, of which 674,145 are effective (active for a single forward pass); the roughly 30% ratio is consistent with routing each input through a subset of the 10 experts.
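The total can be verified directly (assuming `model` is the instantiated `MixtureOfExperts`); the effective count reflects routing behavior and cannot be read off the parameter tensors alone.

```python
# Count all parameters in the model; the card reports 2,247,151.
total = sum(p.numel() for p in model.parameters())
print(f"Total parameters: {total:,}")
```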
## Evaluation
### Testing Data, Factors & Metrics
#### Testing Data
[ylecun/mnist](https://huggingface.co/datasets/ylecun/mnist)
#### Metrics
- Accuracy: 99.75%
- Error rate: 0.25%
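A minimal sketch of reproducing the test accuracy, assuming `model` is the loaded `MixtureOfExperts` from the quick-start snippet and that calling it returns class logits; the batch size and plain `ToTensor()` transform are assumptions.

```python
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
test_set = datasets.MNIST(root="data", train=False, download=True,
                          transform=transforms.ToTensor())
loader = DataLoader(test_set, batch_size=256)

model.eval()
correct = 0
with torch.no_grad():
    for images, labels in loader:
        logits = model(images.to(device))  # assumes forward() returns logits
        correct += (logits.argmax(dim=1).cpu() == labels).sum().item()

print(f"Accuracy: {100.0 * correct / len(test_set):.2f}%")  # reported: 99.75%
```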
## Technical Specifications
### Model Architecture
A Mixture-of-Experts (MoE) architecture whose experts are simple CNNs, combined by a learned gating network. A hypothetical sketch of this pattern follows.
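The card does not document the expert or gate internals; the following illustrates the general pattern (small CNN experts mixed by a learned gate) and is not the actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleCNNExpert(nn.Module):
    """Hypothetical expert: a small CNN over 28x28 grayscale input."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.fc = nn.Linear(32 * 7 * 7, num_classes)

    def forward(self, x):
        return self.fc(self.conv(x).flatten(1))

class MixtureOfExperts(nn.Module):
    """Hypothetical MoE: a softmax gate mixes the experts' logits."""
    def __init__(self, num_experts=10):
        super().__init__()
        self.experts = nn.ModuleList(SimpleCNNExpert() for _ in range(num_experts))
        self.gate = nn.Linear(28 * 28, num_experts)

    def forward(self, x):
        weights = F.softmax(self.gate(x.flatten(1)), dim=1)    # (B, E)
        outputs = torch.stack([e(x) for e in self.experts], 1)  # (B, E, C)
        return (weights.unsqueeze(-1) * outputs).sum(dim=1)     # (B, C)
```

The dense softmax gate shown here activates every expert on every input; the gap between the total and effective parameter counts reported above suggests the actual model routes each input through only a subset of experts (top-k gating).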