---
license: mit
datasets:
- ylecun/mnist
metrics:
- accuracy
pipeline_tag: image-classification
model-index:
- name: MoE-CNN
  results:
  - task:
      type: image-classification
    dataset:
      name: MNIST
      type: ylecun/mnist
    metrics:
    - name: Accuracy
      type: accuracy
      value: 99.75
---

# MoE-CNN

### Model Description

- **Model type:** Image Classification
- **License:** MIT

## How to Get Started with the Model

Use the code below to get started with the model.

```python
import torch

# MixtureOfExperts is the model class shipped with this repository.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = MixtureOfExperts(num_experts=10).to(device)

# Load the trained weights.
checkpoint_path = "FP_ML_MOE_SIMPLE_99_75.pth"
checkpoint = torch.load(checkpoint_path, map_location=device)
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()

print(f"Validation Accuracy: {checkpoint['val_accuracy']:.2f}")

# Run inference on a dummy 28x28 grayscale input.
input_data = torch.randn(1, 1, 28, 28)
results = model.predict(input_data.to(device))
print("Results:", results)
```

## Training Details

### Training Data
https://huggingface.co/datasets/ylecun/mnist

### Training Procedure

#### Data Augmentation
- RandomRotation(10)
- RandomAffine(0, shear=10)
- RandomAffine(0, translate=(0.1, 0.1))
- RandomResizedCrop(28, scale=(0.8, 1.0))
- RandomPerspective(distortion_scale=0.2, p=0.5)
- Resize((28, 28))
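
Assuming these are torchvision transforms (the names match its API), the training pipeline could be composed roughly as below. The ordering shown and the trailing `ToTensor()` step are assumptions not stated in this card.

```python
import torchvision.transforms as T

# Sketch of the augmentation pipeline listed above.
# Ordering and the final ToTensor() are assumptions.
train_transform = T.Compose([
    T.RandomRotation(10),
    T.RandomAffine(0, shear=10),
    T.RandomAffine(0, translate=(0.1, 0.1)),
    T.RandomResizedCrop(28, scale=(0.8, 1.0)),
    T.RandomPerspective(distortion_scale=0.2, p=0.5),
    T.Resize((28, 28)),
    T.ToTensor(),
])
```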

#### Training Hyperparameters
- Adam with a learning rate of 0.001 for fast initial convergence
- SGD with a learning rate of 0.01, decayed to 0.001
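
A minimal sketch of how such a two-phase schedule might look, assuming training starts with Adam and hands over to SGD partway through. The switch epoch, total epoch count, cosine decay, and the `train_one_epoch` helper are all hypothetical; the card only states the optimizers and learning rates.

```python
import torch

SWITCH_EPOCH = 10  # hypothetical hand-over point
NUM_EPOCHS = 30    # hypothetical total epoch count

optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
scheduler = None

for epoch in range(NUM_EPOCHS):
    if epoch == SWITCH_EPOCH:
        # Switch to SGD and decay its learning rate toward 0.001.
        optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
        scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
            optimizer, T_max=NUM_EPOCHS - SWITCH_EPOCH, eta_min=0.001)
    train_one_epoch(model, optimizer)  # hypothetical training step
    if scheduler is not None:
        scheduler.step()
```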

#### Size
2,247,151 total parameters, of which 674,145 are effective (active per forward pass)
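
The total count can be verified directly from the loaded model; the effective count depends on how the gate routes inputs, so it cannot be read off this sum.

```python
# Count every parameter in the model.
total = sum(p.numel() for p in model.parameters())
print(f"Total parameters: {total:,}")  # expected: 2,247,151
```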

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data
https://huggingface.co/datasets/ylecun/mnist

#### Metrics
- Accuracy: 99.75%
- Error rate: 0.25%
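
A sketch of how these numbers could be reproduced on the MNIST test split, assuming the model's forward pass returns class logits; the batch size and plain `ToTensor()` preprocessing are assumptions.

```python
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

test_set = datasets.MNIST(root="data", train=False, download=True,
                          transform=transforms.ToTensor())
loader = DataLoader(test_set, batch_size=256)

model.eval()
correct = 0
with torch.no_grad():
    for images, labels in loader:
        logits = model(images.to(device))  # assumes forward() returns logits
        correct += (logits.argmax(dim=1).cpu() == labels).sum().item()

accuracy = 100.0 * correct / len(test_set)
print(f"Accuracy: {accuracy:.2f}%  Error rate: {100.0 - accuracy:.2f}%")
```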

## Technical Specifications

### Model Architecture
A Mixture-of-Experts (MoE) architecture in which each expert is a simple CNN.
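
For illustration only, a minimal sketch of this architecture family: a softmax gate weighting the outputs of several small CNN experts. Layer sizes, the gating input, and the dense (soft) mixing are assumptions; the actual `MixtureOfExperts` implementation ships with this repository's code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleExpert(nn.Module):
    """Hypothetical small CNN expert for 28x28 grayscale inputs."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.head = nn.Linear(32 * 7 * 7, num_classes)

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

class MixtureOfExperts(nn.Module):
    """Hypothetical gate: softly mixes expert logits per input."""
    def __init__(self, num_experts=10, num_classes=10):
        super().__init__()
        self.experts = nn.ModuleList(
            SimpleExpert(num_classes) for _ in range(num_experts))
        self.gate = nn.Linear(28 * 28, num_experts)

    def forward(self, x):
        weights = F.softmax(self.gate(x.flatten(1)), dim=1)     # (B, E)
        outputs = torch.stack([e(x) for e in self.experts], 1)  # (B, E, C)
        return (weights.unsqueeze(-1) * outputs).sum(dim=1)     # (B, C)
```

With hard top-k routing instead of the dense mix shown here, only the selected experts would run per forward pass, which is presumably what the "effective parameters" figure above counts.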