---
license: mit
tags:
- vision
- image-classification
- pytorch
- efficientnet
datasets:
- Shad0wKillar/pizza_steak_sushi
metrics:
- accuracy
---
# EfficientNet-B5 Pizza/Steak/Sushi Classifier

I fine-tuned a pre-trained EfficientNet-B5 model to classify images into three categories: pizza, steak, and sushi.

## Model Details
* **Architecture:** `torchvision.models.efficientnet_b5`
* **Weights:** `EfficientNet_B5_Weights.DEFAULT`
* **Modifications:** I froze all the base feature layers and replaced the classifier head with a new three-class output layer.

## Training Procedure
I trained the model for 10 epochs using the Adam optimizer.

* **Batch Size:** 32
* **Learning Rate:** 0.001
* **Loss Function:** CrossEntropyLoss
* **Transforms:** I used the automatic transforms provided by the default EfficientNet-B5 weights.
* **Hardware:** Trained on `cuda` (if available) with a manual seed of 37 for reproducibility.
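A single epoch under these settings can be sketched like this. The linear model and random batches are stand-ins for illustration; in practice they would be the modified EfficientNet-B5 and the pizza/steak/sushi dataloader.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(37)  # seed from the card, for reproducibility
device = "cuda" if torch.cuda.is_available() else "cpu"

# Stand-in model and data; in practice these are the modified
# EfficientNet-B5 and the real train dataloader.
model = torch.nn.Linear(10, 3).to(device)
dataloader = DataLoader(
    TensorDataset(torch.randn(64, 10), torch.randint(0, 3, (64,))),
    batch_size=32,
)

loss_fn = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

model.train()
epoch_loss = 0.0
for X, y in dataloader:
    X, y = X.to(device), y.to(device)
    loss = loss_fn(model(X), y)       # forward pass + loss
    optimizer.zero_grad()
    loss.backward()                   # backprop
    optimizer.step()                  # Adam update
    epoch_loss += loss.item()
epoch_loss /= len(dataloader)         # mean loss over batches
```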

## Dataset
I used a 20% subset of a pizza, steak, and sushi dataset. The data was split into `train` and `test` directories.

## Evaluation Results

### Accuracy and Loss Curves
Over the 10 epochs, the training and testing loss decreased steadily, with the testing loss ending below 0.20. The testing accuracy consistently outperformed the training accuracy and stabilized at roughly 97.5%.

![Accuracy and Loss Curves](loss_curves.png)

### Confusion Matrix
The model performs well across all three classes on the test set, showing strong improvements over previous iterations:
* **Pizza:** 45 correct, 0 misclassified as steak, 1 misclassified as sushi.
* **Steak:** 55 correct, 0 misclassified as pizza, 3 misclassified as sushi.
* **Sushi:** 46 correct, 0 misclassified as pizza, 0 misclassified as steak.

![Confusion Matrix](confusion_matrix.png)
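As a sanity check, the per-class counts above can be totalled into an overall test accuracy, which lands close to the roughly 97.5% final value in the accuracy curves:

```python
# Rows are true classes (pizza, steak, sushi); columns are predictions
cm = [
    [45, 0, 1],
    [0, 55, 3],
    [0, 0, 46],
]
correct = sum(cm[i][i] for i in range(3))      # diagonal = correct predictions
total = sum(sum(row) for row in cm)            # all test images
print(f"{correct}/{total} = {correct / total:.4f}")  # 146/150 = 0.9733
```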

### Most Confident Wrong Predictions
I plotted the instances where the model was confident but incorrect. Its errors were concentrated on images with complex textures that it misclassified as sushi, though its highest confidence on these incorrect predictions was only around 0.51, indicating it was less certain in its errors than earlier models.

![Most Confident Wrong Predictions](most_wrong.png)
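Ranking wrong predictions by confidence can be sketched as follows; the toy probabilities and labels here stand in for the model's real test-set outputs:

```python
import torch

# Toy softmax outputs and true labels standing in for the real test set
probs = torch.tensor([[0.90, 0.05, 0.05],
                      [0.20, 0.51, 0.29],
                      [0.10, 0.30, 0.60]])
labels = torch.tensor([0, 2, 2])

pred_classes = probs.argmax(dim=1)        # predicted class per sample
pred_confs = probs.max(dim=1).values      # confidence of each prediction

# Indices of wrong predictions, sorted most-confident first
wrong_idx = torch.nonzero(pred_classes != labels).squeeze(1)
most_wrong = wrong_idx[pred_confs[wrong_idx].argsort(descending=True)]
```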

## How to use
```python
import torch
import torchvision

# Recreate the model architecture
weights = torchvision.models.EfficientNet_B5_Weights.DEFAULT
model = torchvision.models.efficientnet_b5(weights=weights)

# Swap the classifier head to match the three classes
model.classifier = torch.nn.Sequential(
    torch.nn.Dropout(p=0.2, inplace=True),
    torch.nn.Linear(in_features=2048, out_features=3, bias=True),
)

# Load the fine-tuned weights
model.load_state_dict(torch.load("EfficientNet_B5_20percent.pth", map_location="cpu"))
model.eval()
```