---
license: apache-2.0
tags:
- image-classification
- mobilenet-v3
- video-games
- real-time
pipeline_tag: image-classification
library_name: pytorch
---
# Game classifier — MobileNetV3-Small (CS2 / Dota 2 / Valorant)
A 3-class image classifier used as the §1.1 hot-path detector in
[coach-api](https://github.com/bashiryounis/gamed), a real-time AI coaching
service for competitive PC games. It decides which game is on screen so the
service can route each frame to the correct per-game event-extraction pipeline.
## Classes
`cs2`, `dota2`, `valorant` — in that order along the logits axis.
## Architecture
- `torchvision.models.mobilenet_v3_small(weights=None)`
- `classifier[3]` replaced with `nn.Linear(in_features, 3)`
- Input: 224×224 RGB, ImageNet mean/std normalization
- Output: 3 logits (one per class); apply softmax to obtain class probabilities
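As a minimal sketch of how the class order and the output layer fit together, the snippet below turns a 3-element logit vector into a label and a confidence. It uses only the standard library; the logit values are made up for illustration and `decode` is a hypothetical helper name, not part of the repo.

```python
import math

# Class order along the logits axis, as stated in the "Classes" section.
CLASSES = ("cs2", "dota2", "valorant")


def softmax(logits):
    """Numerically stable softmax over a flat sequence of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits):
    """Return (class_name, probability) for the highest-scoring class."""
    if len(logits) != len(CLASSES):
        raise ValueError(f"expected {len(CLASSES)} logits, got {len(logits)}")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[best], probs[best]


# Hypothetical logits for one frame (illustrative values, not model output).
label, confidence = decode([0.2, -1.3, 3.1])
```

With a loaded PyTorch model, the same step is usually written as `torch.softmax(model(batch), dim=1)` followed by an argmax over the class axis.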
## Files
| File | Purpose |
|---|---|
| `game_classifier_mobilenet_v3_small.pt` | Checkpoint. Dict with `state_dict`, `imagenet_mean`, `imagenet_std`. |
| `detector_comparison.json` | Custom-CNN vs MobileNetV3 head-to-head: per-class P/R/F1, latency p50/p95/p99, training time, size. |
| `confusion_matrices.png` | Side-by-side confusion matrices for both candidate detectors on the training split. |
## Loading
```python
import torch
from torch import nn
from torchvision import models
CLASSES = ("cs2", "dota2", "valorant")
ckpt = torch.load("game_classifier_mobilenet_v3_small.pt", map_location="cpu", weights_only=True)
model = models.mobilenet_v3_small(weights=None)
model.classifier[3] = nn.Linear(model.classifier[3].in_features, len(CLASSES))
model.load_state_dict(ckpt["state_dict"])
model.eval()
```
Preprocessing: BGR → RGB → resize to 224×224 → `transforms.ToTensor()` →
`transforms.Normalize(ckpt["imagenet_mean"], ckpt["imagenet_std"])`. See
[`api/src/coach_api/services/detectors/game_classifier.py`](https://github.com/bashiryounis/gamed/blob/main/api/src/coach_api/services/detectors/game_classifier.py)
for the exact preprocessing used in production.
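To make the per-pixel arithmetic of that chain concrete, here is a small worked sketch for a single 8-bit BGR pixel. It assumes the checkpoint stores the standard ImageNet statistics (the values below); if `ckpt["imagenet_mean"]` and `ckpt["imagenet_std"]` differ, those take precedence. `normalize_bgr_pixel` is a hypothetical name used only for illustration, and resizing is left out because it does not change the per-pixel math.

```python
# Standard ImageNet statistics in RGB order. Assumption: the checkpoint's
# `imagenet_mean` / `imagenet_std` entries hold these same values.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_bgr_pixel(b, g, r, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Map one 8-bit BGR pixel to normalized RGB floats."""
    rgb = (r, g, b)                       # BGR -> RGB channel swap
    scaled = [v / 255.0 for v in rgb]     # ToTensor-style scaling to [0, 1]
    return tuple((v - m) / s for v, m, s in zip(scaled, mean, std))


black = normalize_bgr_pixel(0, 0, 0)
white = normalize_bgr_pixel(255, 255, 255)
pure_blue = normalize_bgr_pixel(255, 0, 0)  # blue is the first BGR channel
```

Forgetting the channel swap would feed the blue channel into the red channel's statistics, which is why the BGR → RGB step comes first for frames decoded with OpenCV.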
## Training
- Data: 6,450 frames extracted at 0.4 fps from 10 YouTube gameplay videos
(3 cs2 / 3 dota2 / 4 valorant). The frame manifest uses the same
`dataset/manifest.csv` schema as the one published with the dataset repo.
- Split: frame-level `random_split(0.70 / 0.15 / 0.15)`. **Same videos
appear in train and val** — useful for hyperparameter search, *not* a
fair generalization estimate. See "Honest eval" below.
- Loss: cross-entropy
- Optimizer / schedule: see the training notebook
[`notebook/Game_Classifier_…ipynb`](https://github.com/bashiryounis/gamed/tree/main/notebook).
- Random seed: 42
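For contrast with the frame-level split above, the following sketch shows one way to split at the video level so that no `video_id` lands in more than one subset. This is not how the published model was trained; the function name, the `(video_id, frame_path)` tuple layout, and the sample data are assumptions made for illustration.

```python
import random
from collections import defaultdict


def split_by_video(frames, fractions=(0.70, 0.15, 0.15), seed=42):
    """Split (video_id, frame_path) pairs so no video spans two subsets."""
    by_video = defaultdict(list)
    for video_id, frame_path in frames:
        by_video[video_id].append(frame_path)

    video_ids = sorted(by_video)              # sort so the shuffle is reproducible
    random.Random(seed).shuffle(video_ids)

    n = len(video_ids)
    cut_train = round(fractions[0] * n)
    cut_val = cut_train + round(fractions[1] * n)
    groups = {
        "train": video_ids[:cut_train],
        "val": video_ids[cut_train:cut_val],
        "test": video_ids[cut_val:],          # whatever remains
    }
    return {
        name: [(vid, path) for vid in ids for path in by_video[vid]]
        for name, ids in groups.items()
    }


# Hypothetical manifest rows: 10 videos with 5 frames each.
rows = [(f"vid{i}", f"vid{i}/frame_{j}.jpg") for i in range(10) for j in range(5)]
splits = split_by_video(rows)
```

With only 10 source videos, the validation and test subsets produced this way contain very few videos each, so a separate held-out set remains the more informative check.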
### Training-split metrics (`detector_comparison.json`)
| Model | Acc | Size | Mean latency (CPU) |
|---|---|---|---|
| Custom CNN | 98.76 % | 1.5 MB | 6.3 ms |
| **MobileNetV3-Small (winner)** | **99.28 %** | 5.8 MB | 10.2 ms |
MobileNetV3-Small was selected for its better F1 on the harder `valorant`
class and its headroom against domain shift, despite its larger size and higher latency.
## Honest eval — held-out videos (this is the number to cite)
The training split is video-leaky. To get a fair estimate we extracted
frames from **3 fresh YouTube videos** (one per class) with zero
`video_id` overlap with training:
| Set | Frames | Accuracy | Errors |
|---|---|---|---|
| 150 / class evenly-spaced subsample | 450 | **100 %** | 0 |
| Full extraction | 1118 | **99.91 %** | 1 |
The single error is a Valorant **round-end "WON" overlay** misread as
CS2 with confidence 0.89. Valorant's orange/red post-round Combat
Report panel visually mimics CS2's MVP card. Spectator / round-end
frames are under-represented in training.
Eval tool: [`cli/gamed_classification_eval.py`](https://github.com/bashiryounis/gamed/blob/main/cli/gamed_classification_eval.py).
Eval set: companion dataset
[`ybashir/gamed-game-classification-dataset`](https://huggingface.co/datasets/ybashir/gamed-game-classification-dataset).
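The evenly-spaced subsample and the accuracy figures in the table above can be reproduced along these lines. This is a sketch under assumptions: the eval CLI may pick its subsample differently, and `evenly_spaced`, `accuracy_pct`, and the frame names are hypothetical. Only the counts (1118 frames, 1 error, 450 frames, 0 errors) come from the table.

```python
def evenly_spaced(items, k):
    """Pick k items spread uniformly across the sequence, preserving order."""
    n = len(items)
    if k >= n:
        return list(items)
    # Index i * n // k stays in [0, n) and strictly increases while k <= n.
    return [items[i * n // k] for i in range(k)]


def accuracy_pct(total, errors):
    """Accuracy as a percentage, from frame and error counts."""
    return 100.0 * (total - errors) / total


# Hypothetical per-class frame list; only the counts below come from the table.
frames = [f"frame_{i:04d}.jpg" for i in range(400)]
subset = evenly_spaced(frames, 150)

full_run = round(accuracy_pct(1118, 1), 2)    # 1117 / 1118 correct
subsample_run = accuracy_pct(450, 0)          # 450 / 450 correct
```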
## Latency
Reported by `detector_comparison.json` on the training split:
| Percentile | Custom CNN | MobileNetV3-Small |
|---|---|---|
| p50 | 5.96 ms | 9.73 ms |
| p95 | 9.11 ms | 13.67 ms |
| p99 | 10.41 ms | 16.88 ms |
The held-out eval CLI measures **7.83 ms mean** end-to-end (`cv2.imread` +
preprocessing + forward pass) on CPU.
In production (`coach-api` service, CPU container) the per-frame
`t_classify_us` stamp averages ~21 ms; the difference is service
overhead (frame ingest, payload decode, logging).
All of these figures are well inside the brief's 30 ms p95 budget.
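For readers who want comparable numbers on their own hardware, a minimal timing harness might look like the sketch below. It uses the nearest-rank percentile definition, which is an assumption; the method behind `detector_comparison.json` is not stated here and may interpolate instead. The workload is a stand-in, and `time_calls` and `summarize` are hypothetical names.

```python
import math
import time


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with >= pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def time_calls(fn, n_runs=200, n_warmup=10):
    """Return per-call latencies in milliseconds, after discarding warm-up runs."""
    for _ in range(n_warmup):
        fn()
    latencies = []
    for _ in range(n_runs):
        start = time.perf_counter()
        fn()
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def summarize(latencies_ms):
    """Collect the three percentiles reported in the latency table."""
    return {f"p{p}": percentile(latencies_ms, p) for p in (50, 95, 99)}


# Stand-in workload; in practice `fn` would wrap preprocessing plus the forward pass.
stats = summarize(time_calls(lambda: sum(range(10_000))))
```

Discarding warm-up runs matters for a comparison like this one, since the first few calls typically include one-off allocation and cache effects.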
## Intended use
Real-time game detection for a coaching service that needs to route
frames to per-game event extractors. Designed for CPU inference at
frame rates from 1–4 fps (VLM path) up to 30 fps (cheap CV path).
## Limitations
- Trained on YouTube "no-commentary" gameplay videos at 1080p. Frames
with heavy streamer overlays, ultrawide aspect ratios, or HDR have
not been evaluated.
- Three classes only. Adding a fourth (e.g. LoL, Apex) requires
retraining the head.
- Can confidently misclassify round-end / spectator / replay frames
(see "Honest eval" above). Mitigation in production: the pipeline gates
downstream detectors on confidence and frame-class persistence, so a
single misread doesn't propagate.
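The confidence-and-persistence gating mentioned in the last item could be sketched as follows. This is an illustration of the idea, not the production implementation: the class name `GameGate` and both thresholds are assumptions chosen for the example.

```python
class GameGate:
    """Switch the active game only after several confident, agreeing frames."""

    def __init__(self, min_confidence=0.8, min_streak=3):
        # Both thresholds are illustrative assumptions, not production values.
        self.min_confidence = min_confidence
        self.min_streak = min_streak
        self.active = None        # game currently routed to downstream detectors
        self._candidate = None    # label of the current streak
        self._streak = 0

    def update(self, label, confidence):
        """Feed one per-frame prediction; return the active game (or None)."""
        if confidence < self.min_confidence:
            # Low-confidence frames break any streak but keep the active game.
            self._candidate, self._streak = None, 0
            return self.active
        if label == self._candidate:
            self._streak += 1
        else:
            self._candidate, self._streak = label, 1
        if self._streak >= self.min_streak:
            self.active = label
        return self.active


gate = GameGate()
for _ in range(3):
    gate.update("valorant", 0.99)
# A lone confident misread, like the 0.89 "cs2" frame in the held-out eval,
# starts a new streak of length 1 and leaves the active game unchanged.
after_misread = gate.update("cs2", 0.89)
```

Note that a confidence threshold alone would not have caught the 0.89 misread; in this sketch it is the streak requirement that absorbs it.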
## Citation / source
Repo: <https://github.com/bashiryounis/gamed>