---
language: en
license: apache-2.0
pipeline_tag: image-classification
tags:
- computer-vision
- image-classification
- mobilenet-v2
- cifar100
- whirlwindai
datasets:
- cifar100
metrics:
- accuracy
---
<p align="center">
<img src="https://raw.githubusercontent.com/Platane/snk/output/github-contribution-grid-snake-dark.svg">
</p>
<div align="center">
<img src="https://readme-typing-svg.demolab.com?font=Space+Grotesk&weight=700&size=28&duration=2600&pause=1200&color=22D3EE&center=true&vCenter=true&width=720&lines=GVM;General+Vision+Model;Seeing+Patterns.;Built+for+Speed." />
<br>
<img src="https://img.shields.io/badge/Model-MobileNetV2-06B6D4?style=for-the-badge">
<img src="https://img.shields.io/badge/Model%20Size-14MB-0EA5E9?style=for-the-badge">
<img src="https://img.shields.io/badge/Dataset-CIFAR--100-14B8A6?style=for-the-badge">
<img src="https://img.shields.io/badge/Framework-PyTorch-0284C7?style=for-the-badge">
<br><br>
<img src="https://capsule-render.vercel.app/api?type=blur&height=180&text=GVM&fontSize=56&animation=twinkling&fontColor=ffffff&color=0:0891B2,50:06B6D4,100:22D3EE"/>
</div>
---
<div align="center">

# Vision, Simplified.

Small models can recognize more than their size suggests.
GVM explores efficient computer vision using lightweight architectures,
fast inference, and practical deployment.

Designed to run almost anywhere.

</div>
---
# Classification Performance
<div align="center">

| Epoch | Training Loss | Validation Accuracy |
|:------:|:-------------:|:------------------:|
| **1** | 3.36 | **41.75%** |
| **2** | 2.78 | **47.14%** |
| **3** | 2.64 | **47.40%** |

</div>
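
For readers who want to reproduce the accuracy column, here is a minimal sketch of a top-1 accuracy computation. It assumes the reported metric is top-1 accuracy, which the table does not state explicitly. The helper name `top1_accuracy` and the toy tensors are illustrative and are not part of this repository.

```python
import torch


def top1_accuracy(logits: torch.Tensor, targets: torch.Tensor) -> float:
    """Return the fraction of rows whose highest-scoring class equals the target."""
    predictions = logits.argmax(dim=1)
    return (predictions == targets).float().mean().item()


# Toy batch: 4 samples, 3 classes (illustrative values, not model output).
logits = torch.tensor([
    [2.0, 0.1, 0.3],
    [0.2, 1.5, 0.1],
    [0.9, 0.2, 0.4],
    [0.1, 0.3, 2.2],
])
targets = torch.tensor([0, 1, 2, 2])

# Three of the four toy predictions match their targets.
print(f"{top1_accuracy(logits, targets):.2%}")  # 75.00%
```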
---
# Quick Start
```python
import requests
import timm
import torch
import torchvision.transforms as transforms
from PIL import Image

# Fetch the model configuration (class count and class names).
response = requests.get(
    "https://huggingface.co/WhirlwindAI/GVM/resolve/main/config.json",
    timeout=30,
)
response.raise_for_status()
config = response.json()

# Build the MobileNetV2 architecture with a matching classification head.
model = timm.create_model(
    "mobilenetv2_100",
    pretrained=False,
    num_classes=config["num_classes"],
)

# Download the trained weights and load them on CPU.
state = torch.hub.load_state_dict_from_url(
    "https://huggingface.co/WhirlwindAI/GVM/resolve/main/model.pth",
    map_location="cpu",
)
model.load_state_dict(state)
model.eval()

# Resize, crop, and normalize with the ImageNet mean and std.
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225],
    ),
])

image = Image.open("image.jpg").convert("RGB")
tensor = transform(image).unsqueeze(0)  # add the batch dimension

# Gradients are not needed for inference.
with torch.no_grad():
    prediction = model(tensor).argmax(1).item()

print(config["class_names"][prediction])
```
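
With validation accuracy below 50%, the single top prediction is often wrong, so it can help to look at several candidates. The sketch below shows one way to get the top-k classes with softmax probabilities. The helper `top_k_predictions` and the four stand-in class names and logits are hypothetical. In practice you would pass the model output and `config["class_names"]` from the snippet above.

```python
import torch


def top_k_predictions(logits, class_names, k=5):
    """Return the k most probable (class_name, probability) pairs for one image."""
    probs = torch.softmax(logits, dim=1)        # convert logits to probabilities
    top_probs, top_idx = probs.topk(k, dim=1)   # best k entries per row, descending
    return [
        (class_names[i], p)
        for i, p in zip(top_idx[0].tolist(), top_probs[0].tolist())
    ]


# Stand-in values so the sketch runs without downloading the checkpoint.
demo_names = ["apple", "bear", "cloud", "dolphin"]
demo_logits = torch.tensor([[0.5, 3.0, 1.0, 2.0]])

for name, prob in top_k_predictions(demo_logits, demo_names, k=3):
    print(f"{name}: {prob:.3f}")
```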
---
# Highlights
<div align="center">

| | |
|:---:|:---|
| **Architecture** | MobileNetV2 |
| **Dataset** | CIFAR-100 |
| **Classes** | 100 |
| **Model Size** | 14 MB |
| **Framework** | PyTorch |
| **Inference** | CPU & GPU Friendly |

</div>
---
# Repository Contents
```
model.pth
config.json
README.md
```
---
# Current Limitations
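- Trained for only **3 epochs**; validation accuracy reached 47.40%
- The backbone was frozen during training, so its weights were not adapted to CIFAR-100
- CIFAR-100 is considerably harder than CIFAR-10: 100 classes with 600 images each, versus 10 classes with 6,000 images each
- Intended as an efficient baseline rather than a state-of-the-art classifier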
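
To make the frozen-backbone limitation concrete, here is a minimal sketch of freezing every parameter outside the classification head. It is not the training script used for this model. The helper `freeze_backbone`, the `TinyNet` stand-in, and the Adam settings are all hypothetical. The sketch also assumes the head is exposed under the attribute name `classifier`, which should be checked against the actual model before use.

```python
import torch
from torch import nn


def freeze_backbone(model: nn.Module, head_prefix: str = "classifier") -> list:
    """Disable gradients for every parameter outside the head; return trainable names."""
    trainable = []
    for name, param in model.named_parameters():
        param.requires_grad = name.startswith(head_prefix)
        if param.requires_grad:
            trainable.append(name)
    return trainable


class TinyNet(nn.Module):
    """Small stand-in with a backbone (`features`) and a 100-class head."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(8, 100)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))


model = TinyNet()
trainable = freeze_backbone(model)

# Only parameters that still require gradients are given to the optimizer.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)
print(trainable)  # ['classifier.weight', 'classifier.bias']
```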
---
# Roadmap
- Higher resolution training
- Full backbone fine-tuning
- Improved augmentation
- ONNX export
- TensorRT support
- Interactive demo
---
<div align="center">
<img src="https://readme-typing-svg.demolab.com?font=JetBrains+Mono&weight=600&size=17&duration=2000&pause=1200&color=22D3EE&center=true&vCenter=true&width=650&lines=capturing+features...;extracting+patterns...;classifying+images...;optimizing+accuracy...;ready." />
<br>
<img src="https://img.shields.io/badge/Built%20by-WhirlwindAI-0891B2?style=for-the-badge">
<img src="https://img.shields.io/badge/Open-Research-06B6D4?style=for-the-badge">
<img src="https://img.shields.io/badge/Computer-Vision-22D3EE?style=for-the-badge">
<br><br>
<img src="https://capsule-render.vercel.app/api?type=rect&height=120&text=General%20Vision%20Model&fontSize=34&animation=blinking&fontColor=ffffff&color=0:0F172A,50:0891B2,100:22D3EE"/>
</div>