	---
	license: apache-2.0
	tags:
	- image-classification
	- mobilenet-v3
	- video-games
	- real-time
	pipeline_tag: image-classification
	library_name: pytorch
	---

	# Game classifier — MobileNetV3-Small (CS2 / Dota 2 / Valorant)

	3-class image classifier used as the §1.1 hot-path detector in
	[coach-api](https://github.com/bashiryounis/gamed), a real-time AI coaching
	service for competitive PC games. Decides which game is on screen so the
	service routes the frame to the correct per-game event-extraction pipeline.

	## Classes

	`cs2`, `dota2`, `valorant` — in that order along the logits axis.

	## Architecture

	- `torchvision.models.mobilenet_v3_small(weights=None)`
	- `classifier[3]` replaced with `nn.Linear(in_features, 3)`
	- Input: 224×224 RGB, ImageNet mean/std normalization
	- Output: 3 logits (one per class); apply softmax to get class probabilities
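
To make the class order concrete, here is a minimal sketch of turning the three raw scores from the replaced `nn.Linear` head into a label and a probability. It uses plain Python rather than `torch.softmax` so it runs anywhere; the `predict` helper and the example scores are illustrative assumptions, not part of the repo.

```python
import math

# Class order along the output axis, as stated in the "Classes" section.
CLASSES = ("cs2", "dota2", "valorant")


def softmax(logits):
    """Numerically stable softmax over a flat sequence of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits):
    """Return (class_name, probability) for one frame's raw scores."""
    if len(logits) != len(CLASSES):
        raise ValueError(f"expected {len(CLASSES)} logits, got {len(logits)}")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[best], probs[best]


# Hypothetical scores for a single frame.
label, confidence = predict([1.0, 3.0, 0.5])
print(label, round(confidence, 3))
```

With a real model, the list passed to `predict` would be something like `model(batch)[0].tolist()`.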

	## Files

	| File | Purpose |
	|---|---|
	| `game_classifier_mobilenet_v3_small.pt` | Checkpoint. Dict with `state_dict`, `imagenet_mean`, `imagenet_std`. |
	| `detector_comparison.json` | Custom-CNN vs MobileNetV3 head-to-head: per-class P/R/F1, latency p50/p95/p99, training time, size. |
	| `confusion_matrices.png` | Side-by-side confusion matrices for both candidate detectors on the training split. |

	## Loading

	```python
	import torch
	from torch import nn
	from torchvision import models

	CLASSES = ("cs2", "dota2", "valorant")
	ckpt = torch.load("game_classifier_mobilenet_v3_small.pt", map_location="cpu", weights_only=True)

	model = models.mobilenet_v3_small(weights=None)
	model.classifier[3] = nn.Linear(model.classifier[3].in_features, len(CLASSES))
	model.load_state_dict(ckpt["state_dict"])
	model.eval()
	```

	Preprocessing: BGR → RGB → resize 224×224 → `transforms.ToTensor()` →
	`transforms.Normalize(ckpt["imagenet_mean"], ckpt["imagenet_std"])`. See
	[`api/src/coach_api/services/detectors/game_classifier.py`](https://github.com/bashiryounis/gamed/blob/main/api/src/coach_api/services/detectors/game_classifier.py)
	for the exact preprocessing used in production.
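
The `ToTensor()` and `Normalize()` steps above amount to simple per-channel arithmetic. The NumPy sketch below spells that out for a frame that has already been resized to 224×224. It is not the production code (the linked file is the reference). The mean/std constants are the standard ImageNet values and are an assumption here, since the checkpoint ships its own `imagenet_mean` / `imagenet_std`; the `preprocess_bgr` name is hypothetical.

```python
import numpy as np

# Assumption: standard ImageNet statistics; prefer the values stored in the checkpoint.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def preprocess_bgr(frame_bgr):
    """Convert a 224x224 uint8 BGR frame (HWC) to a normalized float32 CHW array."""
    if frame_bgr.shape != (224, 224, 3) or frame_bgr.dtype != np.uint8:
        raise ValueError("expected a uint8 array of shape (224, 224, 3)")
    rgb = frame_bgr[:, :, ::-1]                             # BGR -> RGB
    scaled = rgb.astype(np.float32) / 255.0                 # ToTensor: scale to [0, 1]
    normalized = (scaled - IMAGENET_MEAN) / IMAGENET_STD    # Normalize per channel
    return np.ascontiguousarray(normalized.transpose(2, 0, 1))  # HWC -> CHW


# A pure-blue BGR frame: channel 0 (blue) is 255, the others are 0.
frame = np.zeros((224, 224, 3), dtype=np.uint8)
frame[:, :, 0] = 255
print(preprocess_bgr(frame).shape)
```

The result still needs a leading batch dimension (for example via `torch.from_numpy(x).unsqueeze(0)`) before it is passed to the model.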

	## Training

	- Data: 6450 frames extracted at 0.4 fps from 10 YouTube gameplay videos
	(3 cs2 / 3 dota2 / 4 valorant). The frames are indexed with the same
	`dataset/manifest.csv` schema that is published with the dataset repo.
	- Split: frame-level `random_split(0.70 / 0.15 / 0.15)`. **Same videos
	appear in train and val** — useful for hyperparameter search, not a
	fair generalization estimate. See "Honest eval" below.
	- Loss: cross-entropy
	- Optimizer / schedule: see the training notebook
	[`notebook/Game_Classifier_…ipynb`](https://github.com/bashiryounis/gamed/tree/main/notebook).
	- Random seed: 42
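
The leak described in the split bullet comes from shuffling frames rather than videos. For contrast, here is a minimal sketch of a video-level split, in which every frame of a video lands in exactly one partition. This is not what the training notebook does (it uses a frame-level `random_split`); the `split_by_video` helper and the `(frame_path, video_id)` input shape are assumptions made for illustration.

```python
import random
from collections import defaultdict


def split_by_video(frames, val_fraction=0.15, test_fraction=0.15, seed=42):
    """Partition (frame_path, video_id) pairs so that no video spans two splits.

    The fractions apply to the number of videos, not the number of frames.
    """
    by_video = defaultdict(list)
    for frame_path, video_id in frames:
        by_video[video_id].append(frame_path)

    video_ids = sorted(by_video)
    if len(video_ids) < 3:
        raise ValueError("need at least 3 videos to fill train/val/test")
    random.Random(seed).shuffle(video_ids)

    n_val = max(1, round(len(video_ids) * val_fraction))
    n_test = max(1, round(len(video_ids) * test_fraction))
    assignment = {
        "val": video_ids[:n_val],
        "test": video_ids[n_val:n_val + n_test],
        "train": video_ids[n_val + n_test:],
    }
    return {
        split: [(path, vid) for vid in vids for path in by_video[vid]]
        for split, vids in assignment.items()
    }


# Hypothetical manifest: 10 videos with 5 frames each.
frames = [(f"v{v}/frame_{i}.jpg", f"v{v}") for v in range(10) for i in range(5)]
splits = split_by_video(frames)
print({name: len(items) for name, items in splits.items()})
```

This sketch does not stratify by game, so with only a handful of videos per class a split can end up missing a class entirely; a real pipeline would need to shuffle within each class.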

	### Training-split metrics (`detector_comparison.json`)

	| Model | Accuracy | Size | Mean latency (CPU) |
	|---|---|---|---|
	| Custom CNN | 98.76 % | 1.5 MB | 6.3 ms |
	| MobileNetV3-Small (winner) | 99.28 % | 5.8 MB | 10.2 ms |

	MobileNetV3-Small was selected for its better F1 on the harder valorant
	class and its headroom against domain shift, despite its larger size and
	higher latency.

	## Honest eval — held-out videos (this is the number to cite)

	The training split is video-leaky. To get a fair estimate we extracted
	frames from 3 fresh YouTube videos (one per class) with zero
	`video_id` overlap with training:

	| Set | Frames | Accuracy | Errors |
	|---|---|---|---|
	| 150 / class evenly-spaced subsample | 450 | 100 % | 0 |
	| Full extraction | 1118 | 99.91 % | 1 |

	The single error is a Valorant round-end "WON" overlay misread as
	CS2 with confidence 0.89. Valorant's orange/red post-round Combat
	Report panel visually mimics CS2's MVP card. Spectator / round-end
	frames are under-represented in training.

	Eval tool: [`cli/gamed_classification_eval.py`](https://github.com/bashiryounis/gamed/blob/main/cli/gamed_classification_eval.py).
	Eval set: companion dataset
	[`ybashir/gamed-game-classification-dataset`](https://huggingface.co/datasets/ybashir/gamed-game-classification-dataset).

	## Latency

	Reported by `detector_comparison.json` on the training split:

	| Percentile | Custom CNN | MobileNetV3-Small |
	|---|---|---|
	| p50 | 5.96 ms | 9.73 ms |
	| p95 | 9.11 ms | 13.67 ms |
	| p99 | 10.41 ms | 16.88 ms |

	The held-out eval CLI measures 7.83 ms mean end-to-end (cv2 imread +
	preprocessing + forward pass) on CPU.
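
A latency summary of this kind (mean plus p50/p95/p99) can be gathered with a small timing harness. The sketch below is one way to do it under stated assumptions: it uses the nearest-rank percentile definition and a stand-in workload, and the `time_calls` / `summarize` helpers are hypothetical. The notebook and eval CLI may measure differently.

```python
import math
import statistics
import time


def percentile(samples_ms, pct):
    """Nearest-rank percentile: smallest sample with at least pct % of samples <= it."""
    if not samples_ms:
        raise ValueError("no samples")
    ordered = sorted(samples_ms)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def time_calls(fn, n_runs=200, n_warmup=10):
    """Call fn repeatedly and return per-call wall-clock latencies in milliseconds."""
    for _ in range(n_warmup):  # warm-up calls are not recorded
        fn()
    samples = []
    for _ in range(n_runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def summarize(samples_ms):
    return {
        "mean": statistics.fmean(samples_ms),
        "p50": percentile(samples_ms, 50),
        "p95": percentile(samples_ms, 95),
        "p99": percentile(samples_ms, 99),
    }


# Stand-in workload; in practice fn would wrap preprocessing plus model(batch).
print(summarize(time_calls(lambda: sum(range(10_000)))))
```

Timing the whole call path (image read, preprocessing, forward pass) rather than the forward pass alone is what makes numbers from different tools comparable.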

	In production (`coach-api` service, CPU container) the per-frame
	`t_classify_us` stamp averages ~21 ms — the difference is service
	overhead (frame ingest, payload decode, logging).

	All of these figures fall well inside the brief's 30 ms p95 budget.

	## Intended use

	Real-time game detection for a coaching service that needs to route
	frames to per-game event extractors. Designed for CPU inference at rates
	from 1–4 fps (VLM path) up to 30 fps (cheap CV path).

	## Limitations

	- Trained on YouTube "no-commentary" gameplay videos at 1080p. Frames
	with heavy streamer overlays, ultrawide aspect ratios, or HDR have
	not been evaluated.
	- Three classes only. Adding a fourth (e.g. LoL, Apex) requires
	retraining the head.
	- Can confidently misclassify round-end / spectator / replay frames
	(see "Honest eval" above). Mitigation in production: the pipeline gates
	downstream detectors on confidence and frame-class persistence, so a
	single misread doesn't propagate.
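
To illustrate the mitigation in the last bullet, here is a minimal sketch of gating on confidence and persistence. It is not the production implementation: the `GameGate` class, the 0.9 threshold, and the 3-frame window are all assumptions chosen for the example.

```python
from collections import deque


class GameGate:
    """Switch the active game only after several consecutive confident, agreeing frames."""

    def __init__(self, min_confidence=0.9, persistence=3):
        self.min_confidence = min_confidence       # assumed threshold
        self.recent = deque(maxlen=persistence)    # assumed window length
        self.active = None

    def update(self, label, confidence):
        # Low-confidence frames are recorded as None so they break any streak.
        self.recent.append(label if confidence >= self.min_confidence else None)
        window_full = len(self.recent) == self.recent.maxlen
        first = self.recent[0]
        if window_full and first is not None and all(x == first for x in self.recent):
            self.active = first
        return self.active


gate = GameGate()
for _ in range(3):
    gate.update("valorant", 0.99)
# One round-end frame misread as cs2 at 0.89 does not flip the routing.
print(gate.update("cs2", 0.89))  # valorant
```

The trade-off is reaction time: with a window of N frames, a genuine game switch is only picked up after N consecutive confident frames.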

	## Citation / source

	Repo: <https://github.com/bashiryounis/gamed>