	---
	language: en
	license: apache-2.0
	pipeline_tag: image-classification
	tags:
	- computer-vision
	- image-classification
	- mobilenet-v2
	- cifar100
	- whirlwindai
	datasets:
	- cifar100
	metrics:
	- accuracy
	---

	<p align="center">

	<img src="https://raw.githubusercontent.com/Platane/snk/output/github-contribution-grid-snake-dark.svg">

	</p>

	<div align="center">

	<img src="https://readme-typing-svg.demolab.com?font=Space+Grotesk&weight=700&size=28&duration=2600&pause=1200&color=22D3EE&center=true&vCenter=true&width=720&lines=GVM;General+Vision+Model;Seeing+Patterns.;Built+for+Speed." />

	<br>

	<img src="https://img.shields.io/badge/Model-MobileNetV2-06B6D4?style=for-the-badge">
	<img src="https://img.shields.io/badge/Model%20Size-14%20MB-0EA5E9?style=for-the-badge">
	<img src="https://img.shields.io/badge/Dataset-CIFAR--100-14B8A6?style=for-the-badge">
	<img src="https://img.shields.io/badge/Framework-PyTorch-0284C7?style=for-the-badge">

	<br><br>

	<img src="https://capsule-render.vercel.app/api?type=blur&height=180&text=GVM&fontSize=56&animation=twinkling&fontColor=ffffff&color=0:0891B2,50:06B6D4,100:22D3EE"/>

	</div>

	---

	<div align="center">

	# Vision, Simplified.

	Small models can recognize more than their size suggests.

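	GVM (General Vision Model) is a MobileNetV2 image classifier trained on CIFAR-100.
	It explores efficient computer vision with a lightweight architecture,
	fast inference, and practical deployment.

	At 14 MB, it is designed to run on both CPU and GPU.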

	</div>

	---

	# Classification Performance

	<div align="center">

	| Epoch | Training Loss | Validation Accuracy |
	|:-----:|:-------------:|:-------------------:|
	| 1 | 3.36 | 41.75% |
	| 2 | 2.78 | 47.14% |
	| 3 | 2.64 | 47.40% |

	</div>

	---

	# Quick Start

	```python
	import requests
	import timm
	import torch
	import torchvision.transforms as transforms
	from PIL import Image

	# Fetch the model config (provides "num_classes" and "class_names").
	response = requests.get(
	    "https://huggingface.co/WhirlwindAI/GVM/resolve/main/config.json",
	    timeout=30,
	)
	response.raise_for_status()
	config = response.json()

	# Build the architecture only; the GVM checkpoint is loaded below.
	model = timm.create_model(
	    "mobilenetv2_100",
	    pretrained=False,
	    num_classes=config["num_classes"],
	)

	state = torch.hub.load_state_dict_from_url(
	    "https://huggingface.co/WhirlwindAI/GVM/resolve/main/model.pth",
	    map_location="cpu",
	)

	model.load_state_dict(state)
	model.eval()  # switch BatchNorm and dropout layers to inference behavior

	# Resize, crop, and normalize with the ImageNet mean and std.
	transform = transforms.Compose([
	    transforms.Resize(256),
	    transforms.CenterCrop(224),
	    transforms.ToTensor(),
	    transforms.Normalize(
	        mean=[0.485, 0.456, 0.406],
	        std=[0.229, 0.224, 0.225],
	    ),
	])

	image = Image.open("image.jpg").convert("RGB")
	tensor = transform(image).unsqueeze(0)  # add a batch dimension

	# Gradients are not needed for inference.
	with torch.no_grad():
	    prediction = model(tensor).argmax(1).item()

	print(config["class_names"][prediction])
	```
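
The Quick Start prints only the single most likely class. With validation accuracy below 50%, it can help to look at the top few candidates as well. The sketch below is one way to do that; the `top_k` helper is hypothetical and not part of this repository. It is written in plain Python so it runs without extra dependencies, and the toy logits and class names stand in for real model output.

```python
import math


def top_k(logits, class_names, k=5):
    """Return the k most likely (class_name, probability) pairs."""
    # Numerically stable softmax: subtract the max logit before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Sort class indices by descending probability and keep the first k.
    order = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)[:k]
    return [(class_names[i], probs[i]) for i in order]


# Toy logits stand in for `model(tensor)[0].tolist()` from the Quick Start.
example = top_k([0.1, 2.0, -1.0, 0.5], ["apple", "bear", "clock", "dolphin"], k=2)
for name, prob in example:
    print(f"{name}: {prob:.1%}")
```

With the real model, the call would be `top_k(model(tensor)[0].tolist(), config["class_names"])`. Staying in PyTorch, `torch.softmax(logits, dim=1).topk(5)` gives the same ranking.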

	---

	# Highlights

	<div align="center">

	| Property | Detail |
	|:---:|:---|
	| Architecture | MobileNetV2 |
	| Dataset | CIFAR-100 |
	| Classes | 100 |
	| Model Size | 14 MB |
	| Framework | PyTorch |
	| Inference | CPU & GPU Friendly |

	</div>

	---

	# Repository Contents

	```
	model.pth
	config.json
	README.md
	```
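
The Quick Start reads two keys from `config.json`: `num_classes` and `class_names`. The full schema is not documented here, so the sketch below checks only those two keys. The `validate_config` helper and the three-class sample are hypothetical and serve only to illustrate the check.

```python
import json


def validate_config(config):
    """Check the two keys the Quick Start reads from config.json."""
    num_classes = config["num_classes"]
    class_names = config["class_names"]
    if not isinstance(num_classes, int) or num_classes <= 0:
        raise ValueError("num_classes must be a positive integer")
    if len(class_names) != num_classes:
        raise ValueError(
            f"expected {num_classes} class names, found {len(class_names)}"
        )
    return config


# Hypothetical three-class config, used only to exercise the check.
sample = json.loads('{"num_classes": 3, "class_names": ["apple", "bear", "clock"]}')
validate_config(sample)
```

Running a check like this right after downloading the config catches a mismatched label list before it shows up as an `IndexError` at prediction time.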

	---

	# Current Limitations

	- Trained for only 3 epochs, reaching 47.40% validation accuracy
	- Backbone frozen during training
	- CIFAR-100 has 100 classes, making it considerably harder than CIFAR-10
	- Intended as an efficient baseline rather than a state-of-the-art classifier

	---

	# Roadmap

	- Higher resolution training
	- Full backbone fine-tuning
	- Improved augmentation
	- ONNX export
	- TensorRT support
	- Interactive demo

	---

	<div align="center">

	<img src="https://readme-typing-svg.demolab.com?font=JetBrains+Mono&weight=600&size=17&duration=2000&pause=1200&color=22D3EE&center=true&vCenter=true&width=650&lines=capturing+features...;extracting+patterns...;classifying+images...;optimizing+accuracy...;ready." />

	<br>

	<img src="https://img.shields.io/badge/Built%20by-WhirlwindAI-0891B2?style=for-the-badge">
	<img src="https://img.shields.io/badge/Open-Research-06B6D4?style=for-the-badge">
	<img src="https://img.shields.io/badge/Computer-Vision-22D3EE?style=for-the-badge">

	<br><br>

	<img src="https://capsule-render.vercel.app/api?type=rect&height=120&text=General%20Vision%20Model&fontSize=34&animation=blinking&fontColor=ffffff&color=0:0F172A,50:0891B2,100:22D3EE"/>

	</div>