Upload README.md with huggingface_hub
README.md
ADDED
@@ -0,0 +1,141 @@
---
license: apache-2.0
tags:
- image-classification
- mobilenet-v3
- video-games
- real-time
pipeline_tag: image-classification
library_name: pytorch
---

# Game classifier: MobileNetV3-Small (CS2 / Dota 2 / Valorant)

3-class image classifier used as the §1.1 hot-path detector in
[coach-api](https://github.com/bashiryounis/gamed), a real-time AI coaching
service for competitive PC games. It decides which game is on screen so that
the service can route each frame to the correct per-game event-extraction
pipeline.

## Classes

`cs2`, `dota2`, `valorant`, in that order along the logits axis.

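
To make the class ordering concrete, here is a minimal sketch of turning three raw logits into a label and a probability. It uses only the standard library; the function name `decode_logits` and the sample logits are illustrative assumptions, not part of the released code.

```python
import math

# Class order along the logits axis, as stated above.
CLASSES = ("cs2", "dota2", "valorant")


def decode_logits(logits):
    """Map raw logits to (label, probability) with a numerically stable softmax."""
    if len(logits) != len(CLASSES):
        raise ValueError(f"expected {len(CLASSES)} logits, got {len(logits)}")
    peak = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(CLASSES)), key=probs.__getitem__)
    return CLASSES[best], probs[best]


# Illustrative values only, not real model output.
label, prob = decode_logits([0.2, 3.1, -1.0])
print(label, round(prob, 3))
```

Index 1 is the largest logit here, so the label resolves to `dota2`.
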
## Architecture

- `torchvision.models.mobilenet_v3_small(weights=None)`
- `classifier[3]` replaced with `nn.Linear(in_features, 3)`
- Input: 224×224 RGB, ImageNet mean/std normalization
- Output: 3 logits, one per class; apply softmax to get class probabilities

## Files

| File | Purpose |
|---|---|
| `game_classifier_mobilenet_v3_small.pt` | Checkpoint. Dict with `state_dict`, `imagenet_mean`, `imagenet_std`. |
| `detector_comparison.json` | Custom-CNN vs MobileNetV3 head-to-head: per-class P/R/F1, latency p50/p95/p99, training time, size. |
| `confusion_matrices.png` | Side-by-side confusion matrices for both candidate detectors on the training split. |

## Loading

```python
import torch
from torch import nn
from torchvision import models

# Class order matches the logits axis of the trained head.
CLASSES = ("cs2", "dota2", "valorant")

# The checkpoint is a dict with `state_dict`, `imagenet_mean`, `imagenet_std`.
ckpt = torch.load("game_classifier_mobilenet_v3_small.pt", map_location="cpu", weights_only=True)

# Rebuild the architecture, swap in the 3-class head, then load the weights.
model = models.mobilenet_v3_small(weights=None)
model.classifier[3] = nn.Linear(model.classifier[3].in_features, len(CLASSES))
model.load_state_dict(ckpt["state_dict"])
model.eval()  # inference mode: disables dropout, uses stored batch-norm statistics
```

Preprocessing: BGR → RGB → resize 224×224 → `transforms.ToTensor()` →
`transforms.Normalize(ckpt["imagenet_mean"], ckpt["imagenet_std"])`. See
[`api/src/coach_api/services/detectors/game_classifier.py`](https://github.com/bashiryounis/gamed/blob/main/api/src/coach_api/services/detectors/game_classifier.py)
for the exact preprocessing used in production.

## Training

- Data: 6450 frames extracted at 0.4 fps from 10 YouTube gameplay videos
  (3 cs2 / 3 dota2 / 4 valorant). Uses the same `dataset/manifest.csv` schema
  published alongside the dataset repo.
- Split: frame-level `random_split(0.70 / 0.15 / 0.15)`. **The same videos
  appear in train and val**, so the split is useful for hyperparameter
  search but is *not* a fair generalization estimate. See "Honest eval" below.
- Loss: cross-entropy
- Optimizer / schedule: see the training notebook
  [`notebook/Game_Classifier_…ipynb`](https://github.com/bashiryounis/gamed/tree/main/notebook).
- Random seed: 42

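
The leak described in the split bullet comes from assigning individual frames, rather than whole videos, to splits. As an illustration only (this is not what the training notebook does), a video-level split and a leak check might look like the sketch below. It assumes each manifest row carries a `video_id` field; the function names and the toy rows are hypothetical.

```python
import random
from collections import defaultdict


def split_by_video(rows, fractions=(0.70, 0.15, 0.15), seed=42):
    """Assign whole videos (not single frames) to train / val / test."""
    by_video = defaultdict(list)
    for row in rows:
        by_video[row["video_id"]].append(row)
    video_ids = sorted(by_video)
    random.Random(seed).shuffle(video_ids)

    n = len(video_ids)
    n_train = max(1, round(fractions[0] * n))
    n_val = max(1, round(fractions[1] * n))
    groups = (
        video_ids[:n_train],
        video_ids[n_train:n_train + n_val],
        video_ids[n_train + n_val:],
    )
    return [[r for vid in g for r in by_video[vid]] for g in groups]


def leaked_videos(split_a, split_b):
    """Return the video_ids that occur in both splits."""
    return {r["video_id"] for r in split_a} & {r["video_id"] for r in split_b}


# Toy manifest: 10 hypothetical videos with 5 frames each.
rows = [{"video_id": f"vid{v}", "frame": f} for v in range(10) for f in range(5)]
train, val, test = split_by_video(rows)
print(len(train), len(val), len(test), leaked_videos(train, val))
```

With only 10 videos across three classes, a split like this would also need per-class stratification to be usable, which the sketch leaves out.
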
### Training-split metrics (`detector_comparison.json`)

| Model | Accuracy | Size | Mean latency (CPU) |
|---|---|---|---|
| Custom CNN | 98.76 % | 1.5 MB | 6.3 ms |
| **MobileNetV3-Small (winner)** | **99.28 %** | 5.8 MB | 10.2 ms |

MobileNetV3-Small was selected for its better F1 on the harder valorant
class and its headroom against domain shift, despite the larger size and
higher latency.

## Honest eval: held-out videos (this is the number to cite)

The training split is video-leaky. To get a fair estimate we extracted
frames from **3 fresh YouTube videos** (one per class) with zero
`video_id` overlap with training:

| Set | Frames | Accuracy | Errors |
|---|---|---|---|
| 150 / class evenly-spaced subsample | 450 | **100 %** | 0 |
| Full extraction | 1118 | **99.91 %** | 1 |

The single error is a Valorant **round-end "WON" overlay** misread as
CS2 with confidence 0.89. Valorant's orange/red post-round Combat
Report panel visually mimics CS2's MVP card, and spectator / round-end
frames are under-represented in training.

Eval tool: [`cli/gamed_classification_eval.py`](https://github.com/bashiryounis/gamed/blob/main/cli/gamed_classification_eval.py).
Eval set: companion dataset
[`ybashir/gamed-game-classification-dataset`](https://huggingface.co/datasets/ybashir/gamed-game-classification-dataset).

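
The two rows of the held-out table can be reproduced arithmetically. The sketch below shows one plausible way to take an evenly spaced subsample and confirms that 1 error in 1118 frames gives 99.91 %. The index formula and helper names are assumptions; the eval CLI linked above may select frames differently.

```python
def evenly_spaced(items, k):
    """Pick k items at evenly spaced positions (all items if there are k or fewer)."""
    n = len(items)
    if n <= k:
        return list(items)
    return [items[i * n // k] for i in range(k)]


def accuracy_pct(correct, total):
    """Accuracy as a percentage."""
    return 100.0 * correct / total


# Hypothetical file names for one class.
frames = [f"frame_{i:04d}.jpg" for i in range(400)]
subset = evenly_spaced(frames, 150)

# Full extraction: 1118 frames, 1 error.
print(len(subset), f"{accuracy_pct(1118 - 1, 1118):.2f} %")  # prints: 150 99.91 %
```
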
## Latency

Reported by `detector_comparison.json` on the training split:

| Percentile | Custom CNN | MobileNetV3-Small |
|---|---|---|
| p50 | 5.96 ms | 9.73 ms |
| p95 | 9.11 ms | 13.67 ms |
| p99 | 10.41 ms | 16.88 ms |

The held-out eval CLI measures **7.83 ms mean** end-to-end (cv2 imread +
preprocessing + forward pass) on CPU.

In production (`coach-api` service, CPU container) the per-frame
`t_classify_us` stamp averages ~21 ms; the difference is service
overhead (frame ingest, payload decode, logging).

All of these figures are well inside the brief's 30 ms p95 budget.

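
For readers who want to take comparable measurements, here is a minimal standard-library sketch of collecting per-call latencies and summarizing them as mean, p50, p95, and p99. The nearest-rank percentile definition, the warm-up count, and the stand-in workload are assumptions; how `detector_comparison.json` computed its percentiles is not stated here.

```python
import math
import time


def percentile(sorted_ms, p):
    """Nearest-rank percentile of an ascending list."""
    rank = max(1, math.ceil(p * len(sorted_ms) / 100))
    return sorted_ms[rank - 1]


def measure_latency(fn, runs=200, warmup=10):
    """Time fn() repeatedly and summarize the latency in milliseconds."""
    for _ in range(warmup):  # warm-up calls are not recorded
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    return {
        "mean": sum(samples) / len(samples),
        "p50": percentile(samples, 50),
        "p95": percentile(samples, 95),
        "p99": percentile(samples, 99),
    }


# Stand-in workload; in practice fn would wrap preprocessing plus the forward pass.
stats = measure_latency(lambda: sum(range(10_000)))
print({name: round(value, 3) for name, value in stats.items()})
```
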
## Intended use

Real-time game detection for a coaching service that needs to route
frames to per-game event extractors. Designed for CPU inference from
1–4 fps (VLM path) up to 30 fps (cheap CV path).

## Limitations

- Trained on YouTube "no-commentary" gameplay videos at 1080p. Frames
  with heavy streamer overlays, ultrawide aspect ratios, or HDR have
  not been evaluated.
- Three classes only. Adding a fourth (e.g. LoL, Apex) requires
  retraining the head.
- Can confidently misclassify round-end / spectator / replay frames
  (see "Honest eval" above). Mitigation in production: the pipeline gates
  downstream detectors on confidence and frame-class persistence, so a
  single misread does not propagate.

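
The confidence and persistence gating mentioned in the last limitation could look roughly like the sketch below. It is not the production gate: the class name `PersistenceGate`, the 0.95 confidence threshold, and the 3-frame window are hypothetical values chosen for illustration.

```python
from collections import deque


class PersistenceGate:
    """Accept a game label only after `min_frames` confident frames in a row agree."""

    def __init__(self, min_conf=0.95, min_frames=3):
        self.min_conf = min_conf            # hypothetical threshold
        self.recent = deque(maxlen=min_frames)
        self.current = None                 # last accepted label

    def update(self, label, conf):
        # Low-confidence frames are stored as None so they break any streak.
        self.recent.append(label if conf >= self.min_conf else None)
        window_full = len(self.recent) == self.recent.maxlen
        if window_full and self.recent[0] is not None and len(set(self.recent)) == 1:
            self.current = self.recent[0]
        return self.current


gate = PersistenceGate()
# Three confident valorant frames, one cs2 misread at 0.89, then valorant again.
stream = [("valorant", 0.99)] * 3 + [("cs2", 0.89)] + [("valorant", 0.98)] * 2
decisions = [gate.update(label, conf) for label, conf in stream]
print(decisions)
```

In this trace the gate settles on `valorant` after the third frame, and the single `cs2` misread never changes the accepted label.
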
## Citation / source

Repo: <https://github.com/bashiryounis/gamed>