**Summary & Overview:**
For this section, I used CIFAR-10 as the base dataset for constructing my new dataset:
a flipped version of CIFAR-10 (same images, but some are rotated 180 degrees, i.e. flipped upside down, while others are left unchanged).
To ensure a balanced split between the training and test sets, as well as within each class,
I flip each image independently with probability p = 0.5, so that roughly half the images end up flipped upside down.
The labels are encoded so that 1 corresponds to a flipped image and 0 to an unchanged one.
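This construction can be written as a thin wrapper around any base image dataset. The following is a minimal sketch, not the write-up's actual code: the class name `FlippedDataset` and the fixed seed are my own illustrative choices.

```python
import torch
from torch.utils.data import Dataset

class FlippedDataset(Dataset):
    """Wrap an image dataset; flip each image 180 degrees with probability p.
    Label 1 = flipped, 0 = unchanged. (Illustrative sketch, not the original code.)"""

    def __init__(self, base, p=0.5, seed=0):
        self.base = base
        # Pre-draw the flip decisions so each image's label is fixed across epochs.
        g = torch.Generator().manual_seed(seed)
        self.flip = torch.rand(len(base), generator=g) < p

    def __len__(self):
        return len(self.base)

    def __getitem__(self, i):
        img, _ = self.base[i]  # discard the original CIFAR-10 class label
        if self.flip[i]:
            # A 180-degree rotation is a flip over both spatial axes (H and W).
            return torch.flip(img, dims=[-2, -1]), 1
        return img, 0
```

In practice the base dataset would be `torchvision.datasets.CIFAR10` with a tensor transform; drawing the flips once up front keeps the binary labels consistent between passes over the data.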
For the model, I built a relatively simple convolutional network (CustomNet) with batch normalization throughout.
Finally, for training, I use a batch size of 128 and stochastic gradient descent (SGD) with an initial learning rate of 0.1,
momentum of 0.9, and weight decay of 5e-4, training for a total of 60 epochs while saving the best-performing model(s) along the way;
to facilitate training, I also use a learning rate scheduler that halves the learning rate
whenever the training loss fails to improve by a certain amount (1e-3) over a certain period of time (3 epochs).
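In PyTorch, this optimizer and scheduler configuration maps naturally onto `SGD` plus `ReduceLROnPlateau`. This is a sketch under assumptions: the placeholder model and the absolute threshold mode are my own choices, not confirmed details of the write-up.

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 2)  # placeholder; any nn.Module works here

# SGD with the stated hyperparameters: lr 0.1, momentum 0.9, weight decay 5e-4.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)

# Halve the LR when the monitored (training) loss fails to improve by at
# least 1e-3 for more than 3 consecutive epochs.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode='min', factor=0.5, patience=3,
    threshold=1e-3, threshold_mode='abs')

# Inside the epoch loop, step the scheduler on the epoch's training loss:
# scheduler.step(epoch_train_loss)
```

Stepping the scheduler with the epoch-level loss (rather than per batch) matches the "fails to improve after 3 epochs" behavior described above.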
**Future Work:**
The model does decently well, with the best checkpoints achieving 82% test accuracy after 15 epochs and 83% at the 43rd epoch.
While CNNs tend to be translationally invariant, and their inductive biases are useful for many image classification and related tasks,
CNNs by themselves are generally not invariant to rotations (such as flipping upside down, a rotation of 180 degrees) or to other, more general affine transformations.
The most straightforward way to address these issues is data augmentation during training (as done here), where additional images,
rotated or otherwise, and their corresponding labels can help CNNs recognize and classify flipped images.
Going further with this method, one can subject images to varying degrees of rotation (30/45/90 degrees, etc.) and/or phrase the task differently so as to better learn concepts associated with rotation (e.g. changing the classification task to a regression task where the model
predicts how many degrees an image needs to be rotated in order to reach a certain orientation).
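The regression reformulation could be sketched as follows for rotations that are multiples of 90 degrees (the helper name `rotated_batch` is my own; arbitrary angles would additionally need interpolation, e.g. torchvision's `rotate`):

```python
import torch

def rotated_batch(imgs, generator=None):
    """Rotate each image in a batch by a random multiple of 90 degrees and
    return the applied angle (in degrees) as a regression target.
    (Illustrative sketch of the proposed reformulation.)"""
    k = torch.randint(0, 4, (imgs.shape[0],), generator=generator)
    out = torch.stack([torch.rot90(img, int(ki), dims=[-2, -1])
                       for img, ki in zip(imgs, k)])
    return out, (k * 90).float()  # e.g. train with MSE on the angle
```

A model trained this way must locate an object's canonical orientation rather than just detect a single binary flip, which is the stronger rotation concept the paragraph above is after.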
Similarly, this can be extended to help make CNNs more robust to images from different orientations, lightings, etc.
Another possibility is to train CNNs on images of differing (or higher) resolutions.
Given the low resolution of CIFAR-10 compared to some other datasets, there may be certain image artifacts that
the CNN has learnt to associate with certain rotations, rather than the network detecting the rotated content itself.
Additionally, when looking at two examples of misclassified images from my model,
there were cases where an image appears to contain several objects (e.g. due to the low resolution), or where
the main object sits in an atypical position or orientation; this again can be alleviated by further data augmentation
that includes images of objects in varying positions and orientations.
Finally, one could move beyond the inductive bias of convolutional kernels and
CNNs altogether to experiment with other architectures: for instance, vision transformers and their
attention mechanisms may have a different or more “relaxed” inductive bias, so their performance on
images may be more or less robust to things like rotation, orientation, etc.