Commit af4ee56 · Parent(s): a4c3924
Update write-up

write-up (changed)
@@ -1,7 +1,33 @@
**Summary & Overview:**
For this section, I used CIFAR-10 as the base dataset for constructing my new dataset:
a flipped version of CIFAR-10 with the same images, some flipped upside down (rotated by 180 degrees) and the rest left unchanged.
To obtain a balanced split between the training and test sets, as well as within each class,
I set the flip probability to p = 0.5, so that roughly half of the images are flipped upside down.
The labels are encoded so that 1 corresponds to an image that is flipped upside down and 0 to an image left unchanged.
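
A minimal sketch of how such a dataset could be built, assuming the CIFAR-10 images are already loaded as a NumPy array of shape (N, 32, 32, 3). The function name `make_flipped_dataset` and its `seed` argument are hypothetical, not taken from the original code. Each image is flipped independently with probability p, so the 50/50 split holds only approximately. "Upside down" is implemented here as a 180-degree rotation, which reverses both spatial axes; a pure vertical flip (reversing only the height axis, as torchvision's `RandomVerticalFlip` does) differs from it by an additional left-right mirror, and the write-up does not say which of the two was used.

```python
import numpy as np


def make_flipped_dataset(images: np.ndarray, p: float = 0.5, seed: int = 0):
    """Rotate each image by 180 degrees with probability p.

    images: array of shape (N, H, W, C), e.g. CIFAR-10's (N, 32, 32, 3).
    Returns (flipped_images, labels), where label 1 = flipped and 0 = unchanged.
    """
    rng = np.random.default_rng(seed)
    # One independent Bernoulli(p) draw per image decides whether it is flipped.
    labels = (rng.random(len(images)) < p).astype(np.int64)
    out = images.copy()
    mask = labels == 1
    # A 180-degree rotation in the (H, W) plane reverses both spatial axes.
    out[mask] = np.rot90(images[mask], k=2, axes=(1, 2))
    return out, labels


# Usage with random stand-in data (real code would pass the CIFAR-10 arrays).
fake_images = np.random.default_rng(1).integers(0, 256, size=(1000, 32, 32, 3), dtype=np.uint8)
flipped, labels = make_flipped_dataset(fake_images)
print(flipped.shape, labels.mean())  # shape (1000, 32, 32, 3); mean close to 0.5
```

Drawing the flips per image keeps the construction simple; if an exact 50/50 split per class were required, one could instead shuffle each class's indices and flip exactly half of them.
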
For the model, I built a relatively simple convolutional network (CustomNet) with batch normalization throughout.
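
The CustomNet definition itself is not shown here, so the following is a hypothetical PyTorch stand-in (`SmallFlipNet`, with assumed channel widths of 32/64/128) illustrating what a relatively simple convolutional network with batch normalization throughout might look like: each block is a 3x3 convolution, `BatchNorm2d`, ReLU, and 2x2 max pooling, followed by global average pooling and a linear layer that outputs two logits (flipped vs. not flipped).

```python
import torch
from torch import nn


def conv_bn_block(in_ch: int, out_ch: int) -> nn.Sequential:
    """3x3 convolution -> batch norm -> ReLU -> 2x2 max pooling (halves H and W)."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False),  # BN supplies the bias
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(2),
    )


class SmallFlipNet(nn.Module):
    """Hypothetical stand-in for CustomNet: three conv/BN blocks and a linear head."""

    def __init__(self, num_classes: int = 2) -> None:
        super().__init__()
        self.features = nn.Sequential(
            conv_bn_block(3, 32),    # 32x32 -> 16x16
            conv_bn_block(32, 64),   # 16x16 -> 8x8
            conv_bn_block(64, 128),  # 8x8 -> 4x4
        )
        self.pool = nn.AdaptiveAvgPool2d(1)  # global average pooling -> (N, 128, 1, 1)
        self.head = nn.Linear(128, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.pool(self.features(x))
        return self.head(torch.flatten(x, 1))  # raw logits, one per class


model = SmallFlipNet()
logits = model(torch.randn(8, 3, 32, 32))  # a batch of 8 CIFAR-sized RGB images
print(logits.shape)  # torch.Size([8, 2])
```

Two output logits pair naturally with `nn.CrossEntropyLoss` and the 0/1 labels; a single logit with `nn.BCEWithLogitsLoss` would be an equally valid choice.
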
Finally, for training, I use a batch size of 128 and stochastic gradient descent (SGD) with an initial learning rate of 0.1,
momentum of 0.9, and weight decay of 5e-4, training for a total of 60 epochs and saving the best-performing model(s) along the way.
To facilitate training, I also use a learning rate scheduler that halves the learning rate
whenever the training loss fails to improve by at least 1e-3 over a period of 3 epochs.
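
The scheduler rule can be made concrete with a small, dependency-free sketch. It assumes the rule behaves like the "min" mode of a plateau scheduler with an absolute threshold: an epoch counts as an improvement only if the loss drops below the best value so far by more than 1e-3, and the learning rate is halved once more than 3 consecutive epochs fail to improve (the same counting convention PyTorch's `ReduceLROnPlateau` uses for `patience`). The class name `PlateauHalver` is hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class PlateauHalver:
    """Halve the learning rate when the monitored loss plateaus (hypothetical helper)."""

    lr: float = 0.1          # initial learning rate from the write-up
    factor: float = 0.5      # multiply the learning rate by this on a plateau
    patience: int = 3        # bad epochs tolerated before reducing
    threshold: float = 1e-3  # minimum absolute decrease that counts as improvement
    best: float = field(default=math.inf, init=False)
    bad_epochs: int = field(default=0, init=False)

    def step(self, loss: float) -> float:
        """Record one epoch's training loss and return the learning rate for the next epoch."""
        if loss < self.best - self.threshold:
            self.best = loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        # Reduce only once the number of bad epochs exceeds the patience.
        if self.bad_epochs > self.patience:
            self.lr *= self.factor
            self.bad_epochs = 0
        return self.lr


sched = PlateauHalver()
history = [sched.step(loss) for loss in [1.0, 0.9, 0.9, 0.9, 0.9, 0.9]]
print(history)  # [0.1, 0.1, 0.1, 0.1, 0.1, 0.05]
```

In a PyTorch training loop, the built-in equivalent would presumably be `torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)` together with `torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode="min", factor=0.5, patience=3, threshold=1e-3, threshold_mode="abs")`, calling `scheduler.step(train_loss)` once per epoch. PyTorch's default `threshold_mode` is `"rel"`, under which 1e-3 is a relative rather than absolute improvement; the write-up does not say which was used.
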

**Future Work:**
The model performs reasonably well, with test accuracy reaching 82% after 15 epochs and 83% at the 43rd epoch.
While CNNs are approximately translation-invariant (convolution is translation-equivariant, and pooling adds some invariance) and their inductive biases are useful for many image classification and related tasks,
they are generally not invariant to rotations (such as flipping upside down, a 180-degree rotation) or to more general affine transformations on their own.
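
The translation/rotation asymmetry can be checked numerically. The sketch below applies a single random 3x3 filter (the cross-correlation a convolutional layer computes) to a random 8x8 image. Circular padding is an assumption made so that shift-equivariance holds exactly; real CNN layers with zero padding, striding, and pooling are only approximately equivariant. Shifting the input shifts the output identically, whereas rotating the input by 180 degrees rotates the output only if the filter itself is 180-degree symmetric.

```python
import numpy as np


def circular_conv2d(x: np.ndarray, k: np.ndarray) -> np.ndarray:
    """Single-channel 2-D cross-correlation, kernel centered, circular (wrap-around) padding."""
    out = np.zeros(x.shape, dtype=float)
    ca, cb = k.shape[0] // 2, k.shape[1] // 2
    for a in range(k.shape[0]):
        for b in range(k.shape[1]):
            # Each kernel weight multiplies a shifted copy of the whole image.
            out += k[a, b] * np.roll(x, shift=(ca - a, cb - b), axis=(0, 1))
    return out


def shift(z: np.ndarray) -> np.ndarray:
    return np.roll(z, shift=(2, 3), axis=(0, 1))


def rot180(z: np.ndarray) -> np.ndarray:
    return np.rot90(z, k=2)


rng = np.random.default_rng(0)
image = rng.random((8, 8))
kernel = rng.random((3, 3))            # a generic, non-symmetric filter
sym_kernel = kernel + rot180(kernel)   # forced to be 180-degree symmetric

translation_ok = np.allclose(circular_conv2d(shift(image), kernel), shift(circular_conv2d(image, kernel)))
rotation_ok = np.allclose(circular_conv2d(rot180(image), kernel), rot180(circular_conv2d(image, kernel)))
sym_ok = np.allclose(circular_conv2d(rot180(image), sym_kernel), rot180(circular_conv2d(image, sym_kernel)))
print(translation_ok, rotation_ok, sym_ok)  # True False True
```

Learned filters have no reason to be 180-degree symmetric, which is why a standard CNN responds differently to an upright and a flipped version of the same image.
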
The most straightforward way to address this is data augmentation during training (as done here): additional images,
rotated or otherwise, together with their corresponding labels, help CNNs recognize and classify flipped images.
Going further with this method, one can subject images to varying degrees of rotation (30, 45, 90 degrees, etc.) and/or phrase the task differently to better learn concepts associated with rotation (e.g., turning the classification task into a regression task in which the model
predicts how many degrees an image needs to be rotated to reach a given orientation).
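
A minimal sketch of the regression reformulation, restricted (as a simplifying assumption) to multiples of 90 degrees, which `np.rot90` performs exactly without interpolation; angles such as 30 or 45 degrees would need an interpolating rotation and a policy for the corners that fall outside the original frame. The function name and the convention that targets are counter-clockwise correction angles in degrees are hypothetical.

```python
import numpy as np


def make_rotation_targets(images: np.ndarray, seed: int = 0):
    """Rotate each square image by a random multiple of 90 degrees.

    images: array of shape (N, H, W, C) with H == W.
    Returns (rotated_images, targets), where each target is the counter-clockwise
    rotation in degrees (0, 90, 180 or 270) that restores the original orientation.
    """
    rng = np.random.default_rng(seed)
    quarter_turns = rng.integers(0, 4, size=len(images))  # counter-clockwise quarter turns applied
    rotated = np.stack(
        [np.rot90(img, k=int(k), axes=(0, 1)) for img, k in zip(images, quarter_turns)]
    )
    # Undoing k quarter turns takes (4 - k) % 4 further quarter turns.
    targets = (((4 - quarter_turns) % 4) * 90).astype(np.float32)
    return rotated, targets


imgs = np.random.default_rng(1).random((16, 32, 32, 3))
rotated, targets = make_rotation_targets(imgs)
```

Because angles wrap around (270 degrees is only 90 degrees away from 0), a plain squared-error regression on these targets treats some orientations as farther apart than they are; treating the four angles as classes is a simpler alternative.
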
Similarly, this approach can be extended to make CNNs more robust to images with different orientations, lighting conditions, etc.

Another possibility is to train CNNs on images of different (or higher) resolutions.
Given the low resolution of CIFAR-10 (32x32 pixels) compared to many other datasets, the CNN may have learned to associate
certain image artifacts with particular rotations rather than learning to detect rotation itself.
Additionally, in two misclassified examples from my model,
the image either appeared to contain several objects (e.g., due to the low resolution) or
showed the main object in an atypical position or orientation; again, this could be alleviated by further data augmentation
that includes images of objects in varying positions and orientations.
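
Positional augmentation of this kind is often implemented as pad-and-crop: pad each image and cut out a randomly placed window of the original size, which shifts the content by a few pixels. The sketch below assumes images in (N, H, W, C) layout, a maximum shift of 4 pixels, and reflection padding; these values and the function name `random_translate` are illustrative choices, not taken from the original code. Random shifts (and left-right mirroring) do not change whether an image is upside down, so they can be combined with the flip labels without relabeling.

```python
import numpy as np


def random_translate(images: np.ndarray, max_shift: int = 4, seed: int = 0) -> np.ndarray:
    """Shift each image by up to max_shift pixels in each direction via pad-then-crop."""
    rng = np.random.default_rng(seed)
    n, h, w, _ = images.shape
    pad = ((0, 0), (max_shift, max_shift), (max_shift, max_shift), (0, 0))
    padded = np.pad(images, pad, mode="reflect")
    out = np.empty_like(images)
    for i in range(n):
        # Top-left corner of the crop window inside the padded image.
        dy, dx = rng.integers(0, 2 * max_shift + 1, size=2)
        out[i] = padded[i, dy:dy + h, dx:dx + w]
    return out


batch = np.zeros((2, 32, 32, 3), dtype=np.uint8)
print(random_translate(batch).shape)  # (2, 32, 32, 3)
```
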
Finally, one could move beyond the inductive bias of convolutional kernels, and CNNs altogether,
to experiment with other architectures. For instance, vision transformers and their
attention mechanisms may have a different, more “relaxed” inductive bias, so their performance on
images may be more or less robust to transformations such as rotation and changes in orientation.