Mikask committed on
Commit 0c46e76 · verified · 1 Parent(s): 06ca6aa

Update README.md

  1. README.md +71 -1
README.md CHANGED
@@ -17,4 +17,74 @@ model-index:
  - name: Accuracy
    type: Accuracy
    value: 99.75
- ---
+ ---
+ # MoE-CNN
+
+ ## Model Details
+
+ ### Model Description
+
+ - **Model type:** Image Classification
+ - **License:** MIT
+
+ ## How to Get Started with the Model
+
+ Use the code below to load the checkpoint and print its stored validation accuracy.
+
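+ ```
+ import torch
+
+ # MixtureOfExperts must already be defined or imported; this README does not include the class.
+ model = MixtureOfExperts(num_experts=10)
+
+ checkpoint_path = "FP_ML_MOE_SIMPLE_99_75.pth"
+ # map_location="cpu" lets the checkpoint load on machines without a GPU.
+ checkpoint = torch.load(checkpoint_path, map_location="cpu")
+
+ model.load_state_dict(checkpoint['model_state_dict'])
+ model.eval()  # switch to inference mode
+
+ # Single quotes inside the f-string avoid a syntax error on Python versions before 3.12.
+ print(f"Validation Accuracy: {checkpoint['val_accuracy']:.2f}")
+ ```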
+
+ ## Training Details
+
+ ### Training Data
+ MNIST: https://huggingface.co/datasets/ylecun/mnist
+
+ ### Training Procedure
+
+ #### Data Augmentation
+ - `RandomRotation(10)`
+ - `RandomAffine(0, shear=10)`
+ - `RandomAffine(0, translate=(0.1, 0.1))`
+ - `RandomResizedCrop(28, scale=(0.8, 1.0))`
+ - `RandomPerspective(distortion_scale=0.2, p=0.5)`
+ - `Resize((28, 28))`
+
+ #### Training Hyperparameters
+ - Adam with a learning rate of 0.001 for fast initial convergence
+ - SGD with a learning rate of 0.01, decayed to 0.001
+
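One way to read the hyperparameters above is as a two-phase schedule: Adam first, then SGD with a decaying learning rate. The sketch below assumes that reading. The stand-in `nn.Linear` model, the epoch counts, the dummy batches, and the linear shape of the decay are all hypothetical, since the source states only the optimizers and learning rates.

```python
import torch
from torch import nn

model = nn.Linear(784, 10)  # stand-in for the real model (assumption)

ADAM_EPOCHS = 5   # hypothetical
SGD_EPOCHS = 10   # hypothetical


def run_epoch(optimizer):
    """One optimisation step on random data, standing in for a full epoch."""
    inputs = torch.randn(8, 784)
    targets = torch.randint(0, 10, (8,))
    optimizer.zero_grad()
    loss = nn.functional.cross_entropy(model(inputs), targets)
    loss.backward()
    optimizer.step()


# Phase 1: Adam at a learning rate of 0.001.
adam = torch.optim.Adam(model.parameters(), lr=0.001)
for _ in range(ADAM_EPOCHS):
    run_epoch(adam)

# Phase 2: SGD starting at 0.01 and decayed to 0.001.
# LinearLR scales the base rate from start_factor to end_factor over total_iters steps.
sgd = torch.optim.SGD(model.parameters(), lr=0.01)
scheduler = torch.optim.lr_scheduler.LinearLR(
    sgd, start_factor=1.0, end_factor=0.1, total_iters=SGD_EPOCHS
)
for _ in range(SGD_EPOCHS):
    run_epoch(sgd)
    scheduler.step()

final_lr = scheduler.get_last_lr()[0]
print(f"final SGD learning rate: {final_lr:.4f}")
```

Other decay shapes (step or exponential, for example) would satisfy the same start and end values; the linear schedule is used here only because it makes the endpoint easy to verify.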
+ #### Size
+ 2,247,151 total parameters, of which 674,145 are effective
+
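As a quick worked check of the two parameter counts above, the effective parameters make up about 30% of the total. The snippet only does the arithmetic; it makes no assumption about how the effective count was derived.

```python
total_params = 2_247_151
effective_params = 674_145

# Share of the total parameter count that is reported as effective.
effective_share = effective_params / total_params
print(f"{effective_share:.1%}")
```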
+ ## Evaluation
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+ MNIST: https://huggingface.co/datasets/ylecun/mnist
+
+ #### Metrics
+ model-index:
+ - name: MoE-CNN
+   results:
+   - task:
+       type: image-classification
+     dataset:
+       name: MNIST
+       type: image-classification
+     metrics:
+     - name: Accuracy
+       type: Accuracy
+       value: 99.75
+
+ ## Technical Specifications
+
+ ### Model Architecture
+ A Mixture-of-Experts (MoE) architecture in which each expert is a simple CNN.
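To make the architecture description concrete, here is a minimal sketch of a mixture of CNN experts for 1x28x28 inputs. It is not the released model: the class names `ExpertCNN` and `MoESketch`, the layer sizes, the linear gate, and the dense softmax weighting of all experts are assumptions chosen for illustration, and its weights are not compatible with the checkpoint.

```python
import torch
from torch import nn


class ExpertCNN(nn.Module):
    """A small CNN expert for 1x28x28 inputs (layer sizes are hypothetical)."""

    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),  # 28x28 -> 14x14
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),  # 14x14 -> 7x7
        )
        self.classifier = nn.Linear(32 * 7 * 7, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))


class MoESketch(nn.Module):
    """Dense mixture: every expert runs and a softmax gate weights the outputs."""

    def __init__(self, num_experts=10, num_classes=10):
        super().__init__()
        self.experts = nn.ModuleList(
            [ExpertCNN(num_classes) for _ in range(num_experts)]
        )
        self.gate = nn.Linear(28 * 28, num_experts)

    def forward(self, x):
        # Gate weights per sample: (batch, experts), each row sums to 1.
        weights = torch.softmax(self.gate(x.flatten(1)), dim=1)
        # Expert logits stacked along a new axis: (batch, experts, classes).
        outputs = torch.stack([expert(x) for expert in self.experts], dim=1)
        # Weighted sum over experts: (batch, classes).
        return (weights.unsqueeze(-1) * outputs).sum(dim=1)


model = MoESketch(num_experts=10)
logits = model(torch.randn(4, 1, 28, 28))
print(logits.shape)
```

A sparse variant would evaluate only the highest-weighted experts for each input, which is one common way an MoE ends up with fewer active parameters than total parameters. Whether the released model routes that way is not stated in the source.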