---
license: mit
datasets:
- ylecun/mnist
metrics:
- accuracy
pipeline_tag: image-classification
model-index:
- name: MoE-CNN
  results:
  - task:
      type: image-classification
    dataset:
      name: MNIST
      type: ylecun/mnist
    metrics:
    - name: Accuracy
      type: accuracy
      value: 99.75
---
# MoE-CNN
## Model Description
- **Model type:** Image Classification
- **License:** MIT
## How to Get Started with the Model
Use the code below to get started with the model.
```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# MixtureOfExperts is the model class shipped with this repository.
model = MixtureOfExperts(num_experts=10).to(device)

# Load the trained weights and the stored validation accuracy.
checkpoint_path = "FP_ML_MOE_SIMPLE_99_75.pth"
checkpoint = torch.load(checkpoint_path, map_location=device)
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()
print(f"Validation Accuracy: {checkpoint['val_accuracy']:.2f}")

# Run inference on a dummy 28x28 grayscale input.
input_data = torch.randn(1, 1, 28, 28)
results = model.predict(input_data.to(device))
print("Results:", results)
```
## Training Details
### Training Data
[ylecun/mnist](https://huggingface.co/datasets/ylecun/mnist)
### Training Procedure
#### Data Augmentation
- RandomRotation(10)
- RandomAffine(0, shear=10)
- RandomAffine(0, translate=(0.1, 0.1))
- RandomResizedCrop(28, scale=(0.8, 1.0))
- RandomPerspective(distortion_scale=0.2, p=0.5)
- Resize((28, 28))
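As a minimal sketch, these augmentations could be composed with `torchvision.transforms` as follows; the ordering and the trailing `ToTensor()` are assumptions, not taken from the actual training code.

```python
from torchvision import transforms

# Assumed composition of the augmentations listed above; the actual
# training pipeline may order or combine them differently.
train_transform = transforms.Compose([
    transforms.RandomRotation(10),
    transforms.RandomAffine(0, shear=10),
    transforms.RandomAffine(0, translate=(0.1, 0.1)),
    transforms.RandomResizedCrop(28, scale=(0.8, 1.0)),
    transforms.RandomPerspective(distortion_scale=0.2, p=0.5),
    transforms.Resize((28, 28)),
    transforms.ToTensor(),  # not listed in the card, but needed to get tensors
])
```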
#### Training Hyperparameters
The reported schedule combines two optimizers (sketched below):
- Adam with a learning rate of 0.001 for fast initial convergence
- SGD with a learning rate of 0.01, decayed to 0.001
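Reading this as a two-stage schedule (Adam first for warm-up, then SGD with decay), a sketch might look like the following, assuming `model` is the `MixtureOfExperts` instance from the quick-start snippet; the epoch split, the cosine decay curve, and `train_one_epoch` are assumptions for illustration.

```python
import torch

def train_one_epoch(model, optimizer):
    """Placeholder for the usual forward/backward/step loop."""
    ...

adam_epochs, sgd_epochs = 5, 25  # hypothetical split, not from the card

# Stage 1: Adam at 0.001 for fast initial convergence.
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
for _ in range(adam_epochs):
    train_one_epoch(model, optimizer)

# Stage 2: SGD at 0.01, decayed toward 0.001 (decay curve assumed).
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=sgd_epochs, eta_min=0.001)
for _ in range(sgd_epochs):
    train_one_epoch(model, optimizer)
    scheduler.step()
```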
#### Size
2,247,151 total parameters, of which 674,145 are effective (active for a single forward pass); the roughly 30% ratio is consistent with routing each input through a subset of the 10 experts.
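The total can be verified directly (assuming `model` is the instantiated `MixtureOfExperts`); the effective count reflects routing behavior and cannot be read off the parameter tensors alone.

```python
# Count all parameters in the model; the card reports 2,247,151.
total = sum(p.numel() for p in model.parameters())
print(f"Total parameters: {total:,}")
```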
## Evaluation
### Testing Data, Factors & Metrics
#### Testing Data
[ylecun/mnist](https://huggingface.co/datasets/ylecun/mnist)
#### Metrics
- Accuracy: 99.75%
- Error rate: 0.25%
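A minimal sketch of reproducing the test accuracy, assuming `model` is the loaded `MixtureOfExperts` from the quick-start snippet and that calling it returns class logits; the batch size and plain `ToTensor()` transform are assumptions.

```python
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
test_set = datasets.MNIST(root="data", train=False, download=True,
                          transform=transforms.ToTensor())
loader = DataLoader(test_set, batch_size=256)

model.eval()
correct = 0
with torch.no_grad():
    for images, labels in loader:
        logits = model(images.to(device))  # assumes forward() returns logits
        correct += (logits.argmax(dim=1).cpu() == labels).sum().item()

print(f"Accuracy: {100.0 * correct / len(test_set):.2f}%")  # reported: 99.75%
```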
## Technical Specifications
### Model Architecture
A Mixture-of-Experts (MoE) architecture whose experts are simple CNNs, combined by a learned gating network. A hypothetical sketch of this pattern follows.
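The card does not document the expert or gate internals; the following illustrates the general pattern (small CNN experts mixed by a learned gate) and is not the actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleCNNExpert(nn.Module):
    """Hypothetical expert: a small CNN over 28x28 grayscale input."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.fc = nn.Linear(32 * 7 * 7, num_classes)

    def forward(self, x):
        return self.fc(self.conv(x).flatten(1))

class MixtureOfExperts(nn.Module):
    """Hypothetical MoE: a softmax gate mixes the experts' logits."""
    def __init__(self, num_experts=10):
        super().__init__()
        self.experts = nn.ModuleList(SimpleCNNExpert() for _ in range(num_experts))
        self.gate = nn.Linear(28 * 28, num_experts)

    def forward(self, x):
        weights = F.softmax(self.gate(x.flatten(1)), dim=1)    # (B, E)
        outputs = torch.stack([e(x) for e in self.experts], 1)  # (B, E, C)
        return (weights.unsqueeze(-1) * outputs).sum(dim=1)     # (B, C)
```

The dense softmax gate shown here activates every expert on every input; the gap between the total and effective parameter counts reported above suggests the actual model routes each input through only a subset of experts (top-k gating).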