**Summary & Overview:**
For this section, I used CIFAR-10 as the base dataset for constructing my new dataset:
a flipped version of CIFAR-10, with the same images, except that some are flipped upside down (rotated 180 degrees) while others are left unchanged.
To keep the split balanced between the training and test sets, as well as within each class,
I flip each image with probability p = 0.5, so that roughly half the images end up flipped upside down.
The labels are encoded so that 1 corresponds to an image that is flipped upside down and 0 to an unchanged image.
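The construction described above can be sketched as follows; `make_flip_dataset` and the NumPy array handling are illustrative assumptions, not the exact code used here.

```python
import numpy as np

def make_flip_dataset(images, rng, p=0.5):
    """Flip each image upside down (180-degree rotation) with probability p.

    Returns the transformed images and binary labels: 1 = flipped,
    0 = unchanged. (Hypothetical sketch of the procedure in the text.)
    """
    out_images, labels = [], []
    for img in images:
        if rng.random() < p:
            # Two quarter turns == a 180-degree rotation (upside down)
            out_images.append(np.rot90(img, 2).copy())
            labels.append(1)
        else:
            out_images.append(img.copy())
            labels.append(0)
    return np.stack(out_images), np.array(labels)

# Tiny demo on random CIFAR-sized 32x32x3 images
rng = np.random.default_rng(0)
imgs = rng.integers(0, 256, size=(8, 32, 32, 3), dtype=np.uint8)
flipped, labels = make_flip_dataset(imgs, rng)
```

With p = 0.5, the expected class balance is 50/50, which is what keeps both the train/test split and the per-class label distribution roughly even.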
For the model, I built a relatively simple convolutional network (CustomNet) with batch normalization throughout.
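A model in this spirit might look like the sketch below; the text only specifies "a relatively simple convolutional network with batch normalization", so the layer counts and channel widths here are my own assumptions, not the actual CustomNet.

```python
import torch
import torch.nn as nn

class CustomNet(nn.Module):
    """A simple CNN with batch normalization after each convolution.

    Illustrative sketch only: the exact architecture of the report's
    CustomNet is not given, so these layer sizes are assumptions.
    """
    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.MaxPool2d(2),                      # 32x32 -> 16x16
            nn.Conv2d(32, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.MaxPool2d(2),                      # 16x16 -> 8x8
            nn.Conv2d(64, 128, 3, padding=1), nn.BatchNorm2d(128), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),              # 8x8 -> 1x1
        )
        self.classifier = nn.Linear(128, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

# Sanity check on a CIFAR-sized batch: output is one logit pair per image
logits = CustomNet()(torch.randn(4, 3, 32, 32))
```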
Finally, for training, I use a batch size of 128 and stochastic gradient descent (SGD) with an initial learning rate of 0.1,
momentum of 0.9, and weight decay of 5e-4, training for a total of 60 epochs while saving the best-performing model(s) along the way.
To facilitate training, I also use a learning rate scheduler that halves the learning rate
if the training loss fails to improve by a minimum amount (1e-3) over a set number of epochs (3).
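The plateau rule can be made concrete; the small class below is a hypothetical stdlib re-implementation of the logic described (halve the learning rate once the training loss has failed to improve by at least 1e-3 for 3 consecutive epochs), analogous in spirit to PyTorch's `ReduceLROnPlateau` with `factor=0.5, patience=3, threshold=1e-3`, though that scheduler's bookkeeping differs slightly.

```python
class HalveOnPlateau:
    """Halve lr when the monitored loss fails to improve by `threshold`
    for `patience` consecutive epochs. Illustrative sketch of the rule
    in the text, not PyTorch's actual ReduceLROnPlateau implementation.
    """
    def __init__(self, lr=0.1, factor=0.5, patience=3, threshold=1e-3):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.threshold = threshold
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, loss):
        if loss < self.best - self.threshold:  # meaningful improvement
            self.best = loss
            self.bad_epochs = 0
        else:                                  # plateau epoch
            self.bad_epochs += 1
            if self.bad_epochs >= self.patience:
                self.lr *= self.factor
                self.bad_epochs = 0
        return self.lr

# Loss improves twice, then stalls for three epochs -> lr is halved
sched = HalveOnPlateau()
lrs = [sched.step(loss) for loss in [1.0, 0.5, 0.4999, 0.4999, 0.4999]]
```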
**Future Work:**
The model does reasonably well, with the best checkpoints reaching 82% test accuracy after 15 epochs and 83% at the 43rd epoch.
While CNNs tend to be translation-invariant, and their inductive biases are useful for many image classification and related tasks,
CNNs by themselves are generally not invariant to rotations (such as flipping upside down, a 180-degree rotation) or to more general affine transformations.
The most straightforward way to address this is data augmentation during training (as done here), where additional images,
rotated or otherwise transformed, together with their corresponding labels, help the CNN learn to recognize and classify flipped images.
Going further with this approach, one can subject images to varying degrees of rotation (30/45/90 degrees, etc.) and/or reframe the task so as to better learn concepts associated with rotation (e.g. changing the classification task into a regression task where the model
must predict how many degrees an image needs to be rotated in order to reach a given orientation).
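One way such a regression-style reframing could generate training pairs is sketched below, under the simplifying assumption that rotations are restricted to multiples of 90 degrees; the function name and angle encoding are illustrative, not part of the work described above.

```python
import numpy as np

def make_rotation_regression_pair(img, rng):
    """Rotate an image by a random multiple of 90 degrees and return
    (rotated image, angle in degrees needed to undo the rotation).

    Hypothetical sketch: a model trained on such pairs would regress
    the correction angle instead of classifying flipped vs. unchanged.
    """
    k = int(rng.integers(0, 4))           # 0, 1, 2, or 3 quarter turns
    rotated = np.rot90(img, k).copy()
    angle_applied = 90 * k
    angle_to_undo = (360 - angle_applied) % 360
    return rotated, float(angle_to_undo)

rng = np.random.default_rng(1)
img = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
rotated, target = make_rotation_regression_pair(img, rng)
```

Applying `target // 90` further quarter turns to the rotated image recovers the original orientation, which is exactly the relationship a regression model would be asked to learn.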
Similarly, this can be extended to help make CNNs more robust to images under different orientations, lighting conditions, etc.
Another possibility is using images of differing (or higher) resolutions when training CNNs.
Given the low resolution of CIFAR-10 compared to some other datasets, the CNN may have learnt to associate
certain image artifacts with certain rotations, rather than learning to detect rotation itself.
Additionally, when looking at two examples of images misclassified by my model,
there were cases where there appear to be several objects in an image (e.g. due to the low resolution), or
where the main object is in an atypical position or orientation; this, again, can be alleviated by further data augmentation
that includes images of objects in varying positions and orientations.
Finally, one can move beyond the inductive bias of convolutional kernels and
CNNs altogether and experiment with other architectures; for instance, vision transformers and their
attention mechanisms may have a different or more "relaxed" inductive bias, so that their performance on
images may be more or less robust to things like rotation, orientation, etc.