---
license: apache-2.0
tags:
- image-classification
- rotation-prediction
- resnet
- pytorch
- vision
datasets:
- ILSVRC/imagenet-1k
pipeline_tag: image-classification
library_name: transformers
---
# 🌀 GyroScope – Image Rotation Prediction
**GyroScope** is a ResNet-18 trained **from scratch** to detect whether an image is rotated by **0°, 90°, 180°, or 270°** – and correct it automatically.
> Is that photo upside down? Let GyroScope figure it out.
---
## 🎯 Task
Given any image, GyroScope classifies its orientation into one of **4 classes**:

| Label | Meaning | Correction |
|-------|---------|------------|
| 0 | 0° – upright ✅ | None |
| 1 | 90° CCW | Rotate 270° CCW |
| 2 | 180° – upside down | Rotate 180° |
| 3 | 270° CCW (= 90° CW) | Rotate 90° CCW |
**Correction formula:** `correction = (360 - detected_angle) % 360`
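A minimal sketch of this mapping in plain Python (the function name `correction_angle` is illustrative, not taken from the released code):

```python
def correction_angle(label: int) -> int:
    """Map a predicted class label (0-3) to the counter-clockwise
    rotation (in degrees) that restores the image to upright."""
    detected_angle = label * 90          # class label -> detected rotation
    return (360 - detected_angle) % 360  # e.g. 90° detected -> rotate 270° CCW

# Label 0 (upright) needs no correction; label 1 (90° CCW) needs 270° CCW.
print([correction_angle(label) for label in range(4)])  # [0, 270, 180, 90]
```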
---
## 📊 Benchmarks
Trained on **50,000 images** from [ImageNet-1k](https://huggingface.co/datasets/ILSVRC/imagenet-1k) × 4 rotations = **200k training samples**.
Validated on **5,000 images** × 4 rotations = **20k validation samples**.
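Generating the four rotated copies per source image is straightforward with Pillow. A sketch of the presumed setup (the released `train.py` may do this differently, e.g. on the fly inside the data loader):

```python
from PIL import Image

def make_rotation_samples(image: Image.Image):
    """Yield (rotated_image, label) pairs for the 4 rotation classes.
    Label k corresponds to a k*90-degree CCW rotation."""
    for label in range(4):
        # PIL rotates counter-clockwise; expand=True keeps the full frame
        yield image.rotate(label * 90, expand=True), label

img = Image.new("RGB", (20, 10))
samples = list(make_rotation_samples(img))
print([im.size for im, _ in samples])  # [(20, 10), (10, 20), (20, 10), (10, 20)]
```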
| Metric | Value |
|--------|-------|
| **Overall Val Accuracy** | **79.81%** |
| Per-class: 0Β° (upright) | 79.8% |
| Per-class: 90Β° CCW | 80.1% |
| Per-class: 180Β° | 79.4% |
| Per-class: 270Β° CCW | 79.8% |
| Training Epochs | 12 |
| Training Time | ~4h (Kaggle T4 GPU) |
### Training Curve
| Epoch | Train Acc | Val Acc |
|-------|----------|---------|
| 1 | 41.4% | 43.2% |
| 2 | 52.0% | 46.9% |
| 3 | 59.4% | 62.8% |
| 4 | 64.1% | 66.0% |
| 5 | 67.8% | 69.48% |
| 6 | 70.6% | 72.22% |
| 7 | 73.3% | 74.25% |
| 8 | 75.6% | 76.49% |
| 9 | 77.5% | 77.47% |
| 10 | 79.1% | 79.47% |
| 11 | 80.3% | 79.78% |
| 12 | 80.9% | 79.81% |
---
## 🏗️ Architecture
| Detail | Value |
|--------|-------|
| Base | ResNet-18 (from scratch, **no pretrained weights**) |
| Parameters | 11.2M |
| Input | 224 Γ 224 RGB |
| Output | 4 classes (0Β°, 90Β°, 180Β°, 270Β°) |
| Framework | 🤗 Hugging Face Transformers (`ResNetForImageClassification`) |
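The 4-class head can be instantiated in 🤗 Transformers roughly as follows. This is a sketch assuming the standard ResNet-18 layout (4 stages of 2 "basic" blocks, widths 64–512); the values here are illustrative, not read from the released checkpoint:

```python
from transformers import ResNetConfig, ResNetForImageClassification

config = ResNetConfig(
    layer_type="basic",                 # ResNet-18/34 use basic (not bottleneck) blocks
    depths=[2, 2, 2, 2],                # 2 blocks per stage -> 18 layers total
    hidden_sizes=[64, 128, 256, 512],
    embedding_size=64,
    num_labels=4,
    id2label={0: "0°", 1: "90°", 2: "180°", 3: "270°"},
    label2id={"0°": 0, "90°": 1, "180°": 2, "270°": 3},
)
model = ResNetForImageClassification(config)  # randomly initialized (~11M params)
```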
### Training Details
- **Optimizer:** AdamW (lr=1e-3, weight_decay=0.05)
- **Scheduler:** Cosine annealing with 1-epoch linear warmup
- **Loss:** CrossEntropy with label smoothing (0.1)
- **Augmentations:** RandomCrop, ColorJitter, RandomGrayscale, RandomErasing
- **⚠️ No flips** – horizontal/vertical flips would corrupt rotation labels
- **Mixed precision:** FP16 via `torch.cuda.amp`
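The augmentation list above might look like this in torchvision. This is a sketch: the resize size and jitter strengths are assumptions, and note the deliberate absence of any flip transform:

```python
from torchvision import transforms

# Rotation-safe augmentations: no RandomHorizontalFlip / RandomVerticalFlip,
# since flipping would silently change the ground-truth rotation label.
train_transform = transforms.Compose([
    transforms.Resize(256),                 # assumed; RandomCrop needs images >= 224px
    transforms.RandomCrop(224),             # matches the 224x224 model input
    transforms.ColorJitter(0.4, 0.4, 0.4),  # brightness / contrast / saturation
    transforms.RandomGrayscale(p=0.1),
    transforms.ToTensor(),
    transforms.RandomErasing(p=0.25),       # operates on the tensor, so after ToTensor
])
```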
---
## 🚀 Quick Start
### Installation
```bash
pip install transformers torch torchvision pillow requests
```
### Inference – Single Image from URL
```bash
python3 use_with_UI.py
```
→ Download `use_with_UI.py` from this repository first.
## 💡 Example
Input (rotated 90° CCW):

GyroScope Output:
**🔄 Recognized: 90° | Correction: 270°**
**📊 Probs: {'0°': '0.0257', '90°': '0.8706', '180°': '0.0735', '270°': '0.0300'}**
<br>
Corrected:

*Original Image Source: [Link to Pexels](https://www.pexels.com/de-de/foto/ruhiger-schlaf-einer-getigerten-hauskatze-auf-einem-sofa-32441547/)*
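The step from the probability dictionary above to the reported correction can be sketched in plain Python (variable names are illustrative):

```python
probs = {"0°": 0.0257, "90°": 0.8706, "180°": 0.0735, "270°": 0.0300}

# Pick the most likely rotation class...
detected = max(probs, key=probs.get)        # "90°"
detected_angle = int(detected.rstrip("°"))  # 90

# ...and rotate by the complement to restore the upright image.
correction = (360 - detected_angle) % 360   # 270
print(f"Recognized: {detected} | Correction: {correction}°")
```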
## ⚠️ Limitations
- Rotationally symmetric images (balls, textures, patterns) are inherently ambiguous – no model can reliably classify these.
- Trained on natural images (ImageNet). Performance may degrade on:
- Documents / text-heavy images
- Medical imaging
- Satellite / aerial imagery
- Abstract art
- Only handles 90° increments – arbitrary angles (e.g. 45° or 135°) are **not supported**.
- Trained from scratch on 50k images – fine-tuning a pretrained backbone would likely yield higher accuracy.
## 🔍 Use Cases
- 📸 Photo management – auto-correct phone/camera orientation
- 🗂️ Data preprocessing – fix rotated images in scraped datasets
- 🤖 ML pipelines – orientation normalization before feeding to downstream models
- 🖼️ Digital archives – batch-correct scanned/uploaded images
> Yesterday I was sorting photos and nearly every one was rotated the wrong way – that's what inspired this tool 😄
## 💻 Training code
The full training code can be found in `train.py`. Have fun π
## 📜 License
Apache 2.0
## 🙏 Acknowledgments
- Dataset: ILSVRC/ImageNet-1k
- Architecture: Microsoft ResNet via π€ Transformers
- Trained on Kaggle (Tesla T4 GPU)
---
> GyroScope – because every image deserves to stand upright.