---
license: mit
datasets:
- sdtemple/colored-shapes
language:
- en
metrics:
- accuracy
- precision
- recall
- roc_auc
pipeline_tag: image-classification
tags:
- tutorial
---

This model predicts the shape (circle, rectangle, diamond, or triangle) of a single colored shape (one of 8 colors) in a 224 x 224 x 3 image.

This model is part of a how-to tutorial on fitting PyTorch models.

The model is trained on 2,000 examples for each color and shape combination (64,000 samples in total), simulated according to [https://github.com/sdtemple/zootopia3](https://github.com/sdtemple/zootopia3).

The model is evaluated on the dataset [https://huggingface.co/datasets/sdtemple/colored-shapes](https://huggingface.co/datasets/sdtemple/colored-shapes), whose simulated shapes are slightly smaller (out of distribution) than those in the training data.

- Accuracy: 75%
- Min precision (triangle): 56%
- Max precision (rectangle): 89%
- Min recall (diamond): 59%
- Max recall (triangle): 86%
- AUROC (macro-averaged): 91%
- Min AUROC (diamond): 90%
- Max AUROC (circle): 93%

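Metrics of this kind (accuracy, per-class precision/recall, macro-averaged one-vs-rest AUROC) can be computed with scikit-learn. A minimal sketch with hypothetical labels and softmax-style scores, not the actual evaluation outputs:

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score, roc_auc_score

# Hypothetical labels and class scores for the 4 shape classes;
# the real evaluation uses predictions on sdtemple/colored-shapes.
classes = ["circle", "rectangle", "diamond", "triangle"]
y_true = np.array([0, 1, 2, 3, 0, 1, 2, 3])
y_score = np.array([  # one row of class probabilities per sample
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.7, 0.1, 0.1],
    [0.1, 0.1, 0.6, 0.2],
    [0.1, 0.2, 0.1, 0.6],
    [0.6, 0.2, 0.1, 0.1],
    [0.2, 0.6, 0.1, 0.1],
    [0.1, 0.1, 0.2, 0.6],  # a diamond misclassified as triangle
    [0.1, 0.1, 0.1, 0.7],
])
y_pred = y_score.argmax(axis=1)

acc = accuracy_score(y_true, y_pred)
prec = precision_score(y_true, y_pred, average=None, zero_division=0)
rec = recall_score(y_true, y_pred, average=None, zero_division=0)
auroc = roc_auc_score(y_true, y_score, multi_class="ovr", average="macro")
print(f"accuracy={acc:.2f}, min precision={prec.min():.2f}, "
      f"min recall={rec.min():.2f}, macro AUROC={auroc:.2f}")
```

The "min/max precision (shape)" entries above correspond to taking the minimum and maximum of the per-class `average=None` vectors.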
Compared to [https://huggingface.co/sdtemple/color-prediction-model](https://huggingface.co/sdtemple/color-prediction-model), predicting the shape of the object is harder than predicting its color.

The model architecture is as follows. In light experimentation, I found it important to use multiple convolutions, and that too many parameters lead to noisy validation losses from epoch to epoch.

```
MyCNN(
  (conv_block): Sequential(
    (0): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (4): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (5): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (6): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (8): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (9): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (10): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (11): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (12): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (13): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (14): AvgPool2d(kernel_size=2, stride=2, padding=0)
  )
  (linear_block): Sequential(
    (0): Linear(in_features=784, out_features=16, bias=True)
    (1): BatchNorm1d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU()
    (3): Dropout(p=0.2, inplace=False)
    (4): Linear(in_features=16, out_features=16, bias=True)
    (5): BatchNorm1d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (6): ReLU()
    (7): Dropout(p=0.2, inplace=False)
  )
  (output_block): Linear(in_features=16, out_features=4, bias=True)
)
```
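The printout above can be reconstructed as a PyTorch module. This is a sketch, not the tutorial's exact source: the layers and block names follow the printout, but the `forward` pass (in particular, flattening between the convolutional and linear blocks) is an assumption. Note that five halvings of 224 give 7 x 7 feature maps, so 16 channels x 7 x 7 = 784 matches the first `Linear` layer.

```python
import torch
import torch.nn as nn

class MyCNN(nn.Module):
    """Reconstruction of the printed architecture; the forward pass
    (flatten between conv and linear blocks) is an assumption."""

    def __init__(self, num_classes: int = 4):
        super().__init__()

        def conv_stage(in_ch: int, out_ch: int) -> list[nn.Module]:
            # Conv -> BatchNorm -> 2x2 max-pool, as in stages (0)-(11).
            return [
                nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
                nn.BatchNorm2d(out_ch),
                nn.MaxPool2d(2),
            ]

        self.conv_block = nn.Sequential(
            *conv_stage(3, 16),
            *conv_stage(16, 16),
            *conv_stage(16, 16),
            *conv_stage(16, 16),
            # Final stage uses average pooling instead of max pooling.
            nn.Conv2d(16, 16, kernel_size=3, padding=1),
            nn.BatchNorm2d(16),
            nn.AvgPool2d(2),
        )
        self.linear_block = nn.Sequential(
            nn.Linear(784, 16),  # 16 channels * 7 * 7 after five 2x downsamples
            nn.BatchNorm1d(16),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(16, 16),
            nn.BatchNorm1d(16),
            nn.ReLU(),
            nn.Dropout(0.2),
        )
        self.output_block = nn.Linear(16, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.conv_block(x)     # (N, 16, 7, 7) for 224x224 input
        x = x.flatten(1)           # (N, 784)
        return self.output_block(self.linear_block(x))
```

Keeping every convolution at 16 channels keeps the parameter count small, consistent with the observation above that larger models produced noisy validation losses.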