---
language: en
license: apache-2.0
pipeline_tag: image-classification
tags:
- computer-vision
- image-classification
- mobilenet-v2
- cifar100
- whirlwindai
datasets:
- cifar100
metrics:
- accuracy
---


<div align="center">

<img src="https://readme-typing-svg.demolab.com?font=Space+Grotesk&weight=700&size=28&duration=2600&pause=1200&color=22D3EE&center=true&vCenter=true&width=720&lines=GVM;General+Vision+Model;Seeing+Patterns.;Built+for+Speed." />

<br>

<img src="https://img.shields.io/badge/Model-MobileNetV2-06B6D4?style=for-the-badge">
<img src="https://img.shields.io/badge/Model%20Size-14MB-0EA5E9?style=for-the-badge">
<img src="https://img.shields.io/badge/Dataset-CIFAR--100-14B8A6?style=for-the-badge">
<img src="https://img.shields.io/badge/Framework-PyTorch-0284C7?style=for-the-badge">

<br><br>

<img src="https://capsule-render.vercel.app/api?type=blur&height=180&text=GVM&fontSize=56&animation=twinkling&fontColor=ffffff&color=0:0891B2,50:06B6D4,100:22D3EE"/>

</div>

---

<div align="center">

# Vision, Simplified.

Small models can recognize more than their size suggests.

GVM explores efficient computer vision using lightweight architectures,
fast inference, and practical deployment.

Designed to run almost anywhere.

</div>

---

# Classification Performance

<div align="center">

| Epoch | Training Loss | Validation Accuracy |
|:------:|:-------------:|:------------------:|
| **1** | 3.36 | **41.75%** |
| **2** | 2.78 | **47.14%** |
| **3** | 2.64 | **47.40%** |

</div>
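
The evaluation script is not part of this card. As a minimal sketch, assuming the validation accuracy above is ordinary top-1 accuracy, it could be computed as follows. `top1_accuracy` is a hypothetical helper, and the toy logits stand in for real model outputs.

```python
import torch


def top1_accuracy(logits: torch.Tensor, targets: torch.Tensor) -> float:
    """Fraction of rows whose highest-scoring class equals the target label."""
    predictions = logits.argmax(dim=1)
    return (predictions == targets).float().mean().item()


# Toy batch: 4 samples, 3 classes (a stand-in for real model outputs).
logits = torch.tensor([
    [2.0, 0.1, 0.3],  # predicts class 0
    [0.2, 1.5, 0.1],  # predicts class 1
    [0.3, 0.2, 0.9],  # predicts class 2
    [1.1, 0.4, 0.2],  # predicts class 0
])
targets = torch.tensor([0, 1, 2, 1])  # the last sample is misclassified

accuracy = top1_accuracy(logits, targets)
print(f"{accuracy:.2%}")  # prints 75.00%
```

On a real validation set, the same function would be applied batch by batch and the per-batch correct counts summed before dividing by the dataset size.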

---

# Quick Start

```python
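import requests
import timm
import torch
import torchvision.transforms as transforms
from PIL import Image

# Fetch the model config (provides num_classes and class_names).
response = requests.get(
    "https://huggingface.co/WhirlwindAI/GVM/resolve/main/config.json",
    timeout=30,
)
response.raise_for_status()  # fail early on a bad download
config = response.json()

# Build the architecture without pretrained weights; the checkpoint supplies them.
model = timm.create_model(
    "mobilenetv2_100",
    pretrained=False,
    num_classes=config["num_classes"],
)

state = torch.hub.load_state_dict_from_url(
    "https://huggingface.co/WhirlwindAI/GVM/resolve/main/model.pth",
    map_location="cpu",
)

model.load_state_dict(state)
model.eval()  # inference mode for BatchNorm and dropout layers

# Resize, crop, and normalize with the ImageNet mean and standard deviation.
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225],
    ),
])

image = Image.open("image.jpg").convert("RGB")
tensor = transform(image).unsqueeze(0)  # add a batch dimension

# Gradients are not needed for inference.
with torch.no_grad():
    prediction = model(tensor).argmax(dim=1).item()

print(config["class_names"][prediction])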
```
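
The Quick Start prints only the single most likely class. With accuracy below 50%, it can be more informative to look at several candidates with their probabilities. The following is a minimal sketch of that idea: `top_k_predictions` is a hypothetical helper, and the logits and class names are placeholders standing in for `model(tensor)` and `config["class_names"]`.

```python
import torch


def top_k_predictions(logits: torch.Tensor, class_names: list[str], k: int = 5):
    """Return (class_name, probability) pairs for the k highest-scoring classes."""
    probabilities = torch.softmax(logits, dim=1)[0]  # first (only) image in the batch
    top = torch.topk(probabilities, k=min(k, len(class_names)))
    return [
        (class_names[index], probability)
        for probability, index in zip(top.values.tolist(), top.indices.tolist())
    ]


# Placeholder inputs: one image, four classes.
names = ["apple", "bear", "clock", "dolphin"]
logits = torch.tensor([[0.5, 2.0, 0.1, 1.0]])

results = top_k_predictions(logits, names, k=2)
for name, probability in results:
    print(f"{name}: {probability:.3f}")
```

With the real model, the call would be `top_k_predictions(model(tensor), config["class_names"])`, ideally inside a `torch.no_grad()` block.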

---

# Highlights

<div align="center">

| | |
|:---:|:---|
| **Architecture** | MobileNetV2 |
| **Dataset** | CIFAR-100 |
| **Classes** | 100 |
| **Model Size** | 14 MB |
| **Framework** | PyTorch |
| **Inference** | CPU & GPU Friendly |

</div>
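
The model size listed above presumably refers to the stored weights. As a rough sketch of how such a figure can be checked, the size of a state dict can be summed tensor by tensor. `state_dict_size_bytes` is a hypothetical helper, and the tiny network below is a stand-in rather than the actual GVM model.

```python
from torch import nn


def state_dict_size_bytes(model: nn.Module) -> int:
    """Total bytes used by every tensor in the state dict (parameters and buffers)."""
    return sum(
        tensor.numel() * tensor.element_size()
        for tensor in model.state_dict().values()
    )


# Stand-in network, used only to exercise the helper.
toy = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3),
    nn.BatchNorm2d(8),
    nn.Linear(8, 100),
)

size_bytes = state_dict_size_bytes(toy)
print(f"{size_bytes / 1e6:.6f} MB")
```

The file on disk is usually slightly larger than this sum because the serialized checkpoint also stores tensor names and metadata.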

---

# Repository Contents

```
model.pth
config.json
README.md
```

---

# Current Limitations

- Trained for only **3 epochs**
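- Backbone frozen during training, so only the classification head was updated
- CIFAR-100 has 100 classes with 500 training images each, which makes it considerably harder than CIFAR-10 (10 classes, 5,000 training images each)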
- Intended as an efficient baseline rather than a state-of-the-art classifier
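
The training script is not included here, so the following is only a sketch of what a frozen-backbone setup typically looks like in PyTorch. `freeze_backbone` is a hypothetical helper, the small network is a stand-in for MobileNetV2, and the optimizer and learning rate are assumptions.

```python
import torch
from torch import nn


def freeze_backbone(model: nn.Module, head: nn.Module) -> list[nn.Parameter]:
    """Disable gradients everywhere except `head`; return the trainable parameters."""
    for param in model.parameters():
        param.requires_grad = False
    for param in head.parameters():
        param.requires_grad = True
    return [param for param in model.parameters() if param.requires_grad]


# Stand-in network: a tiny "backbone" followed by a 100-class linear head.
backbone = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)
head = nn.Linear(8, 100)
model = nn.Sequential(backbone, head)

trainable = freeze_backbone(model, head)
# Only the head's parameters are handed to the optimizer (optimizer choice is an assumption).
optimizer = torch.optim.SGD(trainable, lr=0.01)
print(sum(param.numel() for param in trainable))  # prints 900
```

For a timm model such as the one in the Quick Start, the head can be obtained with `model.get_classifier()`. Unfreezing the backbone later, as listed in the roadmap, amounts to setting `requires_grad = True` on all parameters and rebuilding the optimizer.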

---

# Roadmap

- Higher resolution training
- Full backbone fine-tuning
- Improved augmentation
- ONNX export
- TensorRT support
- Interactive demo

---

<div align="center">

<img src="https://readme-typing-svg.demolab.com?font=JetBrains+Mono&weight=600&size=17&duration=2000&pause=1200&color=22D3EE&center=true&vCenter=true&width=650&lines=capturing+features...;extracting+patterns...;classifying+images...;optimizing+accuracy...;ready." />

<br>

<img src="https://img.shields.io/badge/Built%20by-WhirlwindAI-0891B2?style=for-the-badge">
<img src="https://img.shields.io/badge/Open-Research-06B6D4?style=for-the-badge">
<img src="https://img.shields.io/badge/Computer-Vision-22D3EE?style=for-the-badge">

<br><br>

<img src="https://capsule-render.vercel.app/api?type=rect&height=120&text=General%20Vision%20Model&fontSize=34&animation=blinking&fontColor=ffffff&color=0:0F172A,50:0891B2,100:22D3EE"/>

</div>