---
license: mit
datasets:
- ylecun/mnist
metrics:
- accuracy
pipeline_tag: image-classification
model-index:
  - name: MoE-CNN
    results:
      - task:
          type: image-classification
        dataset:
          name: MNIST
          type: ylecun/mnist
        metrics:
          - name: Accuracy
            type: accuracy
            value: 99.75
---
# MoE-CNN

### Model Description

- **Model type:** Image Classification
- **License:** MIT

## How to Get Started with the Model

Use the code below to get started with the model.

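```python
import torch

# MixtureOfExperts is the model class from the training code that accompanies
# this checkpoint; it is not part of PyTorch and must be imported separately.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = MixtureOfExperts(num_experts=10)

checkpoint_path = "FP_ML_MOE_SIMPLE_99_75.pth"
checkpoint = torch.load(checkpoint_path, map_location=device)

model.load_state_dict(checkpoint['model_state_dict'])
model.to(device)
model.eval()  # switch to inference mode

print(f"Validation Accuracy: {checkpoint['val_accuracy']:.2f}")

# Random tensor with the MNIST input shape (batch, channels, height, width)
input_data = torch.randn(1, 1, 28, 28)
results = model.predict(input_data.to(device))
print("Results:", results)
```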

## Training Details

### Training Data
https://huggingface.co/datasets/ylecun/mnist

### Training Procedure

#### Data Augmentation
- RandomRotation(10)
- RandomAffine(0, shear=10)
- RandomAffine(0, translate=(0.1, 0.1))
- RandomResizedCrop(28, scale=(0.8, 1.0))
- RandomPerspective(distortion_scale=0.2, p=0.5)
- Resize((28, 28))

#### Training Hyperparameters
- Adam with a learning rate of 0.001 for fast initial convergence
- SGD with a learning rate of 0.01, decayed to 0.001
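
One way to set up the two optimizers is sketched below. The placeholder model, the number of SGD epochs (`sgd_epochs`), and the exponential decay schedule are all assumptions made for illustration; the card only states the learning rates.

```python
import torch
from torch import nn

model = nn.Linear(28 * 28, 10)  # placeholder model, not the MoE-CNN itself

# Phase 1 (from the card): Adam with lr=0.001 for fast initial convergence.
adam = torch.optim.Adam(model.parameters(), lr=0.001)

# Phase 2 (from the card): SGD starting at lr=0.01 and decaying to 0.001.
sgd = torch.optim.SGD(model.parameters(), lr=0.01)
sgd_epochs = 10  # hypothetical epoch count
# Assumed schedule: exponential decay reaching 0.001 in the last epoch.
gamma = (0.001 / 0.01) ** (1 / (sgd_epochs - 1))
scheduler = torch.optim.lr_scheduler.ExponentialLR(sgd, gamma=gamma)

lrs = []
for epoch in range(sgd_epochs):
    lrs.append(sgd.param_groups[0]["lr"])
    # ... one epoch of training with `sgd` would go here ...
    sgd.step()
    scheduler.step()

print([round(lr, 5) for lr in lrs])
```

Any schedule that ends at 0.001 (step decay, cosine annealing, and so on) would fit the description equally well.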

#### Size
2,247,151 parameters with 674,145 effective parameters
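
As a quick check on these figures, the ratio of effective to total parameters can be computed directly. Reading "effective" as the parameters actually used for a single input is an assumption; the card does not define the term.

```python
total_params = 2_247_151      # from the card
effective_params = 674_145    # from the card

# Share of the model's parameters counted as effective.
active_fraction = effective_params / total_params
print(f"{active_fraction:.1%}")  # prints "30.0%"
```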

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data
https://huggingface.co/datasets/ylecun/mnist

#### Metrics
- Accuracy: 99.75%
- Error rate: 0.25%
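
The two metrics are complementary: the error rate is 100% minus the accuracy. The sketch below uses synthetic labels to illustrate the calculation; it assumes evaluation on the full 10,000-image MNIST test split, in which case 99.75% accuracy would correspond to 25 misclassified images.

```python
def accuracy_percent(predictions, labels):
    """Return the percentage of predictions that match the labels."""
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return 100.0 * correct / len(labels)

# Synthetic stand-in for a 10,000-image test set (not real model output).
labels = [i % 10 for i in range(10_000)]
predictions = list(labels)
for i in range(25):  # flip 25 predictions to simulate 25 errors
    predictions[i] = (labels[i] + 1) % 10

accuracy = accuracy_percent(predictions, labels)
error_rate = 100.0 - accuracy
print(f"Accuracy: {accuracy:.2f}%  Error rate: {error_rate:.2f}%")
```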

## Technical Specifications

### Model Architecture
Mixture-of-Experts (MoE) architecture with a simple CNN as the experts.
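
To illustrate the general idea, here is a minimal sketch of a mixture of CNN experts in PyTorch. It is not the released model: the class names `SmallCNNExpert` and `SketchMoE`, the layer sizes, the linear gate, and the dense (all experts active) weighting are hypothetical choices, and the card does not specify how routing is done.

```python
import torch
from torch import nn
import torch.nn.functional as F


class SmallCNNExpert(nn.Module):
    """Hypothetical expert: a small CNN mapping 1x28x28 images to class logits."""

    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),  # 28x28 -> 14x14
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),  # 14x14 -> 7x7
        )
        self.classifier = nn.Linear(32 * 7 * 7, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))


class SketchMoE(nn.Module):
    """Dense mixture: every expert runs and a gate weights their logits."""

    def __init__(self, num_experts=10, num_classes=10):
        super().__init__()
        self.experts = nn.ModuleList(
            [SmallCNNExpert(num_classes) for _ in range(num_experts)]
        )
        self.gate = nn.Linear(28 * 28, num_experts)  # assumed linear gate

    def forward(self, x):
        weights = F.softmax(self.gate(x.flatten(1)), dim=-1)  # (batch, experts)
        expert_logits = torch.stack(
            [expert(x) for expert in self.experts], dim=1
        )  # (batch, experts, classes)
        return (weights.unsqueeze(-1) * expert_logits).sum(dim=1)


model = SketchMoE()
logits = model(torch.randn(4, 1, 28, 28))
print(logits.shape)
```

A sparse variant would evaluate only the top-scoring experts per input, which is one way a model can have fewer effective parameters than total parameters.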