---
license: mit
datasets:
- ylecun/mnist
metrics:
- accuracy
pipeline_tag: image-classification
model-index:
- name: MoE-CNN
  results:
  - task:
      type: image-classification
    dataset:
      name: MNIST
      type: ylecun/mnist
    metrics:
    - name: Accuracy
      type: accuracy
      value: 99.75
---

# MoE-CNN

### Model Description

- **Model type:** Image Classification
- **License:** MIT

## How to Get Started with the Model

Use the code below to get started with the model.

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Instantiate the model and restore the trained weights
model = MixtureOfExperts(num_experts=10).to(device)
checkpoint_path = "FP_ML_MOE_SIMPLE_99_75.pth"
checkpoint = torch.load(checkpoint_path, map_location=device)
model.load_state_dict(checkpoint['model_state_dict'])
print(f"Validation Accuracy: {checkpoint['val_accuracy']:.2f}")

# Run inference on a single 28x28 grayscale input
input_data = torch.randn(1, 1, 28, 28)
results = model.predict(input_data.to(device))
print("Results:", results)
```

## Training Details

### Training Data

https://huggingface.co/datasets/ylecun/mnist

### Training Procedure

#### Data Augmentation

- RandomRotation(10)
- RandomAffine(0, shear=10)
- RandomAffine(0, translate=(0.1, 0.1))
- RandomResizedCrop(28, scale=(0.8, 1.0))
- RandomPerspective(distortion_scale=0.2, p=0.5)
- Resize((28, 28))

#### Training Hyperparameters

- Adam with a learning rate of 0.001 for fast initial convergence
- SGD with a learning rate of 0.01, decayed to 0.001

#### Size

2,247,151 total parameters, with 674,145 effective parameters.

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

https://huggingface.co/datasets/ylecun/mnist

#### Metrics

- Accuracy: 99.75%
- Error rate: 0.25%

## Technical Specifications

### Model Architecture

Mixture-of-Experts (MoE) architecture with a simple CNN as each expert.
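The repository does not include the `MixtureOfExperts` class definition itself, so the sketch below is only an illustrative reconstruction of the architecture described here: a soft gating network weighting the logits of several small CNN experts. The expert layer sizes, the gating scheme, and the `predict` helper are assumptions, not the released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ExpertCNN(nn.Module):
    """A small CNN expert for 28x28 grayscale inputs (hypothetical layout)."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 16, 3, padding=1)
        self.conv2 = nn.Conv2d(16, 32, 3, padding=1)
        self.fc = nn.Linear(32 * 7 * 7, num_classes)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)  # -> (B, 16, 14, 14)
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)  # -> (B, 32, 7, 7)
        return self.fc(x.flatten(1))

class MixtureOfExperts(nn.Module):
    """Soft mixture: a gating network produces per-expert weights,
    and the final logits are the weighted sum of all expert logits."""
    def __init__(self, num_experts=10, num_classes=10):
        super().__init__()
        self.experts = nn.ModuleList(
            ExpertCNN(num_classes) for _ in range(num_experts)
        )
        self.gate = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, num_experts))

    def forward(self, x):
        weights = F.softmax(self.gate(x), dim=1)                        # (B, E)
        expert_out = torch.stack([e(x) for e in self.experts], dim=1)   # (B, E, C)
        return (weights.unsqueeze(-1) * expert_out).sum(dim=1)         # (B, C)

    @torch.no_grad()
    def predict(self, x):
        # Return the class index with the highest mixed logit
        return self.forward(x).argmax(dim=1)
```

With a hard (top-1) gate instead of the soft mixture above, only one expert runs per input, which is how a model can have far fewer effective parameters per forward pass than total parameters.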