---
language: en
license: apache-2.0
pipeline_tag: image-classification
tags:
- computer-vision
- image-classification
- mobilenet-v2
- cifar100
- whirlwindai
datasets:
- cifar100
metrics:
- accuracy
---

<p align="center">

<img src="https://raw.githubusercontent.com/Platane/snk/output/github-contribution-grid-snake-dark.svg">

</p>

<div align="center">

<img src="https://readme-typing-svg.demolab.com?font=Space+Grotesk&weight=700&size=28&duration=2600&pause=1200&color=22D3EE&center=true&vCenter=true&width=720&lines=GVM;General+Vision+Model;Seeing+Patterns.;Built+for+Speed." />

<br>

<img src="https://img.shields.io/badge/Model-MobileNetV2-06B6D4?style=for-the-badge">
<img src="https://img.shields.io/badge/Model%20Size-14MB-0EA5E9?style=for-the-badge">
<img src="https://img.shields.io/badge/Dataset-CIFAR--100-14B8A6?style=for-the-badge">
<img src="https://img.shields.io/badge/Framework-PyTorch-0284C7?style=for-the-badge">

<br><br>

<img src="https://capsule-render.vercel.app/api?type=blur&height=180&text=GVM&fontSize=56&animation=twinkling&fontColor=ffffff&color=0:0891B2,50:06B6D4,100:22D3EE"/>

</div>

---

<div align="center">

# Vision, Simplified.

Small models can recognize more than their size suggests.

GVM explores efficient computer vision using lightweight architectures,
fast inference, and practical deployment.

Designed to run almost anywhere.

</div>

---

# Classification Performance

<div align="center">

| Epoch | Training Loss | Validation Accuracy |
|:------:|:-------------:|:------------------:|
| **1** | 3.36 | **41.75%** |
| **2** | 2.78 | **47.14%** |
| **3** | 2.64 | **47.40%** |

</div>
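
One hedged way to read the table is to look at the change in validation accuracy between consecutive epochs. The short sketch below uses only the numbers reported above (the variable names are illustrative). The gain shrinks from 5.39 points to 0.26 points, which suggests, but does not prove, that the run was already flattening out by epoch 3.

```python
# Validation accuracy (%) per epoch, copied from the table above.
val_accuracy = {1: 41.75, 2: 47.14, 3: 47.40}

epochs = sorted(val_accuracy)

# Gain in percentage points between consecutive epochs.
gains = [
    round(val_accuracy[later] - val_accuracy[earlier], 2)
    for earlier, later in zip(epochs, epochs[1:])
]

for epoch, gain in zip(epochs[1:], gains):
    print(f"epoch {epoch}: {gain:+.2f} points")
```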

---

# Quick Start

```python
import requests
import timm
import torch
from PIL import Image
from torchvision import transforms

# Fetch the model config (number of classes and class names).
response = requests.get(
    "https://huggingface.co/WhirlwindAI/GVM/resolve/main/config.json",
    timeout=30,
)
response.raise_for_status()
config = response.json()

# Build the architecture without pretrained weights; the checkpoint provides them.
model = timm.create_model(
    "mobilenetv2_100",
    pretrained=False,
    num_classes=config["num_classes"],
)

# Download the checkpoint and load it on CPU.
state = torch.hub.load_state_dict_from_url(
    "https://huggingface.co/WhirlwindAI/GVM/resolve/main/model.pth",
    map_location="cpu",
)
model.load_state_dict(state)
model.eval()

# Resize, crop, and normalize with the ImageNet mean and std.
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225],
    ),
])

image = Image.open("image.jpg").convert("RGB")
tensor = transform(image).unsqueeze(0)  # add a batch dimension

# Inference only, so skip gradient tracking.
with torch.no_grad():
    prediction = model(tensor).argmax(1).item()

print(config["class_names"][prediction])
```
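
With a validation accuracy under 50%, the single top prediction will often be wrong, so it can help to inspect the few most likely classes instead. The sketch below is one possible helper, not part of the GVM repository: `top_k` is a hypothetical name, and it is written without dependencies so it can be tried on plain lists. With the Quick Start code you would pass `model(tensor)[0].tolist()` as the logits and `config["class_names"]` as the labels.

```python
import math


def top_k(logits, class_names, k=5):
    """Return the k most likely (class_name, probability) pairs."""
    # Numerically stable softmax: subtract the max logit before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank class indices by probability, highest first.
    ranked = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    return [(class_names[i], probs[i]) for i in ranked[:k]]


# Made-up logits and labels, for illustration only.
demo = top_k([2.0, 0.5, 1.0], ["apple", "bear", "clock"], k=2)
for name, prob in demo:
    print(f"{name}: {prob:.3f}")
```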

---

# Highlights

<div align="center">

| | |
|:---:|:---|
| **Architecture** | MobileNetV2 |
| **Dataset** | CIFAR-100 |
| **Classes** | 100 |
| **Model Size** | 14 MB |
| **Framework** | PyTorch |
| **Inference** | CPU & GPU Friendly |

</div>

---

# Repository Contents

```
model.pth
config.json
README.md
```

---

# Current Limitations

- Trained for only **3 epochs**
- Backbone was frozen during training
- CIFAR-100 has 100 classes with only 500 training images each, which makes it considerably harder than CIFAR-10
- Intended as an efficient baseline rather than a state-of-the-art classifier
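
For readers unfamiliar with the frozen-backbone point, freezing usually means disabling gradients for every parameter except the classification head. The sketch below is not the training code used for GVM, which is not included in this repository. `freeze_backbone` and `TinyNet` are hypothetical names, and the assumption that the head is an attribute called `classifier` should be checked against the actual model before use.

```python
import torch.nn as nn


def freeze_backbone(model: nn.Module, head_name: str = "classifier") -> int:
    """Freeze everything except the head; return the trainable parameter count."""
    for name, param in model.named_parameters():
        # Only parameters under the head keep requires_grad=True.
        param.requires_grad = name.startswith(f"{head_name}.")
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


class TinyNet(nn.Module):
    """Hypothetical stand-in: a small feature extractor plus a `classifier` head."""

    def __init__(self) -> None:
        super().__init__()
        self.features = nn.Sequential(nn.Conv2d(3, 8, kernel_size=3), nn.ReLU())
        self.classifier = nn.Linear(8, 100)


model = TinyNet()
trainable = freeze_backbone(model)
# Only the head's 8 * 100 weights and 100 biases remain trainable.
print(trainable)
```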

---

# Roadmap

- Higher resolution training
- Full backbone fine-tuning
- Improved augmentation
- ONNX export
- TensorRT support
- Interactive demo

---

<div align="center">

<img src="https://readme-typing-svg.demolab.com?font=JetBrains+Mono&weight=600&size=17&duration=2000&pause=1200&color=22D3EE&center=true&vCenter=true&width=650&lines=capturing+features...;extracting+patterns...;classifying+images...;optimizing+accuracy...;ready." />

<br>

<img src="https://img.shields.io/badge/Built%20by-WhirlwindAI-0891B2?style=for-the-badge">
<img src="https://img.shields.io/badge/Open-Research-06B6D4?style=for-the-badge">
<img src="https://img.shields.io/badge/Computer-Vision-22D3EE?style=for-the-badge">

<br><br>

<img src="https://capsule-render.vercel.app/api?type=rect&height=120&text=General%20Vision%20Model&fontSize=34&animation=blinking&fontColor=ffffff&color=0:0F172A,50:0891B2,100:22D3EE"/>

</div>