| --- |
| license: apache-2.0 |
| tags: |
| - image-classification |
| - rotation-prediction |
| - resnet |
| - pytorch |
| - vision |
| datasets: |
| - ILSVRC/imagenet-1k |
| pipeline_tag: image-classification |
| library_name: transformers |
| --- |
| |
| # GyroScope: Image Rotation Prediction |
|
|
| **GyroScope** is a ResNet-18 trained **from scratch** to detect whether an image is rotated by **0°, 90°, 180°, or 270°**, and to correct it automatically. |
|
|
| > Is that photo upside down? Let GyroScope figure it out. |
|
|
| --- |
|
|
| ## Task |
|
|
| Given any image, GyroScope classifies its orientation into one of **4 classes**: |
|
|
| | Label | Meaning | Correction | |
| |-------|---------|------------| |
| | 0 | 0° (upright) | None | |
| | 1 | 90° CCW | Rotate 270° CCW | |
| | 2 | 180° (upside down) | Rotate 180° | |
| | 3 | 270° CCW (= 90° CW) | Rotate 90° CCW | |
|
|
| **Correction formula:** `correction = (360 - detected_angle) % 360` |
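The label table and the correction formula can be captured in a few lines of plain Python. This is a minimal sketch: the names `LABEL_TO_ANGLE` and `correction_angle` are illustrative and are not taken from the repository's code.

```python
# Hypothetical helper names; only the mapping and the formula come from the table above.
LABEL_TO_ANGLE = {0: 0, 1: 90, 2: 180, 3: 270}  # degrees counter-clockwise


def correction_angle(label: int) -> int:
    """Return the CCW rotation (in degrees) that brings the image upright."""
    detected_angle = LABEL_TO_ANGLE[label]
    return (360 - detected_angle) % 360


for label in LABEL_TO_ANGLE:
    print(f"label {label}: rotate {correction_angle(label)} degrees CCW")
```

Pillow's `Image.rotate(angle, expand=True)` also treats positive angles as counter-clockwise, so the returned value can be passed to it directly.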
|
|
| --- |
|
|
| ## Benchmarks |
|
|
| Trained on **50,000 images** from [ImageNet-1k](https://huggingface.co/datasets/ILSVRC/imagenet-1k) × 4 rotations = **200k training samples**. |
| Validated on **5,000 images** × 4 rotations = **20k validation samples**. |
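How each source image becomes four labeled samples is not spelled out here; `train.py` remains the reference. As a rough sketch, assuming label `k` means the image was rotated by `k × 90°` counter-clockwise (as in the Task table), the expansion might look like this in plain Python, with a nested list standing in for an image. The function names are hypothetical.

```python
def rot90_ccw(grid):
    """Rotate a 2D grid (list of rows) by 90 degrees counter-clockwise."""
    # Transposing and then reversing the row order is a 90-degree CCW turn.
    return [list(row) for row in zip(*grid)][::-1]


def make_rotation_samples(grid):
    """Return [(rotated_grid, label), ...] for labels 0..3 (0/90/180/270 degrees CCW)."""
    samples = []
    current = [list(row) for row in grid]
    for label in range(4):
        samples.append((current, label))
        current = rot90_ccw(current)
    return samples


tiny_image = [[1, 2],
              [3, 4]]
for rotated, label in make_rotation_samples(tiny_image):
    print(label, rotated)

# Every source image contributes four samples: 50,000 images -> 200,000 samples.
print(50_000 * len(make_rotation_samples(tiny_image)))
```

On real image arrays or tensors, `numpy.rot90` or `torch.rot90` would typically do the rotation instead of the list-based helper.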
|
|
| | Metric | Value | |
| |--------|-------| |
| | **Overall Val Accuracy** | **79.81%** | |
| | Per-class: 0° (upright) | 79.8% | |
| | Per-class: 90° CCW | 80.1% | |
| | Per-class: 180° | 79.4% | |
| | Per-class: 270° CCW | 79.8% | |
| | Training Epochs | 12 | |
| | Training Time | ~4h (Kaggle T4 GPU) | |
|
|
| ### Training Curve |
|
|
| | Epoch | Train Acc | Val Acc | |
| |-------|----------|---------| |
| | 1 | 41.4% | 43.2% | |
| | 2 | 52.0% | 46.9% | |
| | 3 | 59.4% | 62.8% | |
| | 4 | 64.1% | 66.0% | |
| | 5 | 67.8% | 69.48% | |
| | 6 | 70.6% | 72.22% | |
| | 7 | 73.3% | 74.25% | |
| | 8 | 75.6% | 76.49% | |
| | 9 | 77.5% | 77.47% | |
| | 10 | 79.1% | 79.47% | |
| | 11 | 80.3% | 79.78% | |
| | 12 | 80.9% | 79.81% | |
|
|
| --- |
|
|
| ## Architecture |
|
|
| | Detail | Value | |
| |--------|-------| |
| | Base | ResNet-18 (from scratch, **no pretrained weights**) | |
| | Parameters | 11.2M | |
| | Input | 224 × 224 RGB | |
| | Output | 4 classes (0°, 90°, 180°, 270°) | |
| | Framework | Hugging Face Transformers (`ResNetForImageClassification`) | |
|
|
| ### Training Details |
|
|
| - **Optimizer:** AdamW (lr=1e-3, weight_decay=0.05) |
| - **Scheduler:** Cosine annealing with 1-epoch linear warmup |
| - **Loss:** CrossEntropy with label smoothing (0.1) |
| - **Augmentations:** RandomCrop, ColorJitter, RandomGrayscale, RandomErasing |
| - **No flips:** horizontal/vertical flips would corrupt rotation labels |
| - **Mixed precision:** FP16 via `torch.cuda.amp` |
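The scheduler entry above (cosine annealing with a 1-epoch linear warmup) can be illustrated as a learning-rate multiplier. This is a sketch under assumptions: it updates once per optimizer step and decays to zero, neither of which is stated here, and `STEPS_PER_EPOCH` is a made-up value because the batch size is not given. `train.py` is the authoritative version.

```python
import math

BASE_LR = 1e-3          # from the optimizer settings above
EPOCHS = 12             # from the benchmark table
STEPS_PER_EPOCH = 1000  # hypothetical; depends on the batch size, which is not stated


def lr_multiplier(step: int, warmup_steps: int, total_steps: int) -> float:
    """Factor applied to the base learning rate at a given optimizer step."""
    if step < warmup_steps:
        # Linear warmup: ramps from 1/warmup_steps up to 1.0.
        return (step + 1) / warmup_steps
    # Cosine annealing from 1.0 down to 0.0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))


warmup_steps = STEPS_PER_EPOCH
total_steps = EPOCHS * STEPS_PER_EPOCH
for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(step, BASE_LR * lr_multiplier(step, warmup_steps, total_steps))
```

A function of this shape, wrapped with `functools.partial` or a closure so that it takes only the step, can be handed to `torch.optim.lr_scheduler.LambdaLR`, which multiplies the optimizer's base learning rate by the returned factor.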
| |
| --- |
| |
| ## Quick Start |
| |
| ### Installation |
| |
| ```bash |
| pip install transformers torch torchvision pillow requests |
| ``` |
| |
| ### Inference: Single Image from URL |
| ```bash |
| python3 use_with_UI.py |
| ``` |
| |
| Note: download `use_with_UI.py` first. |
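The contents of `use_with_UI.py` are not reproduced here. For readers who want to call the model directly, the following is a hedged sketch of single-image inference from a URL using the packages installed above. `REPO_ID` is a placeholder, not the real repository id; the sketch also assumes that the repo ships a preprocessor config loadable with `AutoImageProcessor` and that class indices follow the label table in the Task section.

```python
REPO_ID = "your-username/GyroScope"  # hypothetical placeholder: replace with the actual repo id
ANGLES = [0, 90, 180, 270]  # class index -> detected CCW rotation in degrees (assumed order)


def decode_prediction(probs):
    """Map four class probabilities to (detected_angle, correction_angle)."""
    label = max(range(len(probs)), key=lambda i: probs[i])
    detected = ANGLES[label]
    return detected, (360 - detected) % 360


def correct_image_from_url(url, repo_id=REPO_ID):
    """Download an image, predict its rotation, and return an upright copy."""
    # Heavy dependencies are imported lazily so decode_prediction works without them.
    from io import BytesIO

    import requests
    import torch
    from PIL import Image
    from transformers import AutoImageProcessor, ResNetForImageClassification

    # Assumption: the repo contains a preprocessor config next to the weights.
    processor = AutoImageProcessor.from_pretrained(repo_id)
    model = ResNetForImageClassification.from_pretrained(repo_id).eval()

    response = requests.get(url, timeout=30)
    response.raise_for_status()
    image = Image.open(BytesIO(response.content)).convert("RGB")

    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = logits.softmax(dim=-1)[0].tolist()

    detected, correction = decode_prediction(probs)
    # Pillow rotates counter-clockwise for positive angles.
    return image.rotate(correction, expand=True), detected, probs


# Example call (needs network access and the real repo id):
# upright, detected, probs = correct_image_from_url("https://example.com/photo.jpg")
# upright.save("corrected.jpg")
```

Keeping `decode_prediction` separate from the model call makes the angle logic easy to check on its own, for example against a probability list printed by the model.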
| |
| ## Example |
| |
| Input (rotated 180°): |
| |
|  |
| |
| GyroScope Output: |
| **Recognized: 90° | Correction: 270°** |
| **Probs: {'0°': '0.0257', '90°': '0.8706', '180°': '0.0735', '270°': '0.0300'}** |
| <br> |
| Corrected: |
| |
|  |
| |
| *Original Image Source: [Link to Pexels](https://www.pexels.com/de-de/foto/ruhiger-schlaf-einer-getigerten-hauskatze-auf-einem-sofa-32441547/)* |
| |
| ## Limitations |
| |
| - Rotationally symmetric images (balls, textures, patterns) are inherently ambiguous; no model can reliably classify these. |
| - Trained on natural images (ImageNet). Performance may degrade on: |
|   - Documents / text-heavy images |
|   - Medical imaging |
|   - Satellite / aerial imagery |
|   - Abstract art |
| - Only handles 90° increments; arbitrary angles (e.g. 45° or 135°) are **not supported**. |
| - Trained from scratch on 50k images; fine-tuning a pretrained backbone would likely yield higher accuracy. |
| |
| |
| |
| |
| ## Use Cases |
| |
| - Photo management: auto-correct phone/camera orientation |
| - Data preprocessing: fix rotated images in scraped datasets |
| - ML pipelines: normalize orientation before feeding images to downstream models |
| - Digital archives: batch-correct scanned/uploaded images |
| |
| > Yesterday I was sorting photos, and nearly every one was rotated wrong! That inspired me to build this tool. |
| |
| ## Training code |
| |
| The full training code can be found in `train.py`. Have fun! |
| |
| ## License |
| |
| Apache 2.0 |
| |
| ## Acknowledgments |
| |
| - Dataset: ILSVRC/ImageNet-1k |
| - Architecture: Microsoft ResNet via Hugging Face Transformers |
| - Trained on Kaggle (Tesla T4 GPU) |
| |
| --- |
| |
| > GyroScope: because every image deserves to stand upright. |