| --- |
| license: apache-2.0 |
| tags: |
| - image-classification |
| - mobilenet-v3 |
| - video-games |
| - real-time |
| pipeline_tag: image-classification |
| library_name: pytorch |
| --- |
| |
| # Game classifier — MobileNetV3-Small (CS2 / Dota 2 / Valorant) |
|
|
| 3-class image classifier used as the §1.1 hot-path detector in |
| [coach-api](https://github.com/bashiryounis/gamed), a real-time AI coaching |
| service for competitive PC games. Decides which game is on screen so the |
| service routes the frame to the correct per-game event-extraction pipeline. |
|
|
| ## Classes |
|
|
| `cs2`, `dota2`, `valorant` — in that order along the logits axis. |
|
|
| ## Architecture |
|
|
| - `torchvision.models.mobilenet_v3_small(weights=None)` |
| - `classifier[3]` replaced with `nn.Linear(in_features, 3)` |
| - Input: 224×224 RGB, ImageNet mean/std normalization |
| - Output: 3 logits (one per class); apply softmax to get class probabilities |
|
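Given the class order listed under Classes, turning one frame's raw output into a label is a softmax followed by an argmax. The sketch below is pure Python and assumes the three logits have already been pulled out of the output tensor as floats; the `decode` helper is hypothetical and not part of coach-api.

```python
import math

CLASSES = ("cs2", "dota2", "valorant")  # order along the logits axis

def softmax(logits):
    """Numerically stable softmax over a flat sequence of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def decode(logits):
    """Return (label, confidence) for one frame's logits."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[best], probs[best]

# Illustrative logits, not real model output.
label, confidence = decode([0.2, 3.1, -1.0])
print(label, round(confidence, 3))
```

On tensors, the same step is `torch.softmax(model(x), dim=1)` followed by `argmax(dim=1)`.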
|
| ## Files |
|
|
| | File | Purpose | |
| |---|---| |
| | `game_classifier_mobilenet_v3_small.pt` | Checkpoint. Dict with `state_dict`, `imagenet_mean`, `imagenet_std`. | |
| | `detector_comparison.json` | Custom-CNN vs MobileNetV3 head-to-head: per-class P/R/F1, latency p50/p95/p99, training time, size. | |
| | `confusion_matrices.png` | Side-by-side confusion matrices for both candidate detectors on the training split. | |
|
|
| ## Loading |
|
|
| ```python |
| import torch |
| from torch import nn |
| from torchvision import models |
| |
| CLASSES = ("cs2", "dota2", "valorant") |
| ckpt = torch.load("game_classifier_mobilenet_v3_small.pt", map_location="cpu", weights_only=True) |
| |
| model = models.mobilenet_v3_small(weights=None) |
| model.classifier[3] = nn.Linear(model.classifier[3].in_features, len(CLASSES)) |
| model.load_state_dict(ckpt["state_dict"]) |
| model.eval() |
| ``` |
|
|
| Preprocessing: BGR → RGB → resize 224×224 → `transforms.ToTensor()` → |
| `transforms.Normalize(ckpt["imagenet_mean"], ckpt["imagenet_std"])`. See |
| [`api/src/coach_api/services/detectors/game_classifier.py`](https://github.com/bashiryounis/gamed/blob/main/api/src/coach_api/services/detectors/game_classifier.py) |
| for the exact production preprocessing. |
|
|
| ## Training |
|
|
| - Data: 6450 frames extracted at 0.4 fps from 10 YouTube gameplay videos |
| (3 cs2 / 3 dota2 / 4 valorant). Frame metadata uses the |
| `dataset/manifest.csv` schema published with the dataset repo. |
| - Split: frame-level `random_split(0.70 / 0.15 / 0.15)`. **Same videos |
| appear in train and val** — useful for hyperparameter search, *not* a |
| fair generalization estimate. See "Honest eval" below. |
| - Loss: cross-entropy |
| - Optimizer / schedule: see the training notebook |
| [`notebook/Game_Classifier_…ipynb`](https://github.com/bashiryounis/gamed/tree/main/notebook). |
| - Random seed: 42 |
|
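The leak in the frame-level split can be illustrated on a synthetic manifest. This sketch is not the notebook's code: the helper names are hypothetical, and a `video_id` field per row is an assumption about the manifest schema. It shuffles individual frames, cuts them 70 / 15 / 15, then lists the videos that contribute frames to both train and val.

```python
import random

def frame_level_split(rows, fractions=(0.70, 0.15, 0.15), seed=42):
    """Shuffle individual frames, then cut into train / val / test."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_train = round(fractions[0] * len(rows))
    n_val = round(fractions[1] * len(rows))
    return rows[:n_train], rows[n_train:n_train + n_val], rows[n_train + n_val:]

def shared_videos(split_a, split_b):
    """video_ids that contribute frames to both splits."""
    return {r["video_id"] for r in split_a} & {r["video_id"] for r in split_b}

# Synthetic manifest: 10 videos x 50 frames each (illustrative only).
manifest = [{"video_id": f"vid{v}", "frame": f} for v in range(10) for f in range(50)]
train, val, test = frame_level_split(manifest)
print(len(shared_videos(train, val)))  # any value above 0 means the split is video-leaky
```

A split that assigns whole videos to one side would make `shared_videos` return an empty set, which is the property the held-out evaluation relies on.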
|
| ### Training-split metrics (`detector_comparison.json`) |
| |
| | Model | Acc | Size | Mean latency (CPU) | |
| |---|---|---|---| |
| | Custom CNN | 98.76 % | 1.5 MB | 6.3 ms | |
| | **MobileNetV3-Small (winner)** | **99.28 %** | 5.8 MB | 10.2 ms | |
| |
| MobileNetV3-Small was selected for its better F1 on the harder valorant |
| class and its headroom against domain shift, despite its larger size and higher latency. |
| |
| ## Honest eval — held-out videos (this is the number to cite) |
| |
| The training split is video-leaky. To get a fair estimate we extracted |
| frames from **3 fresh YouTube videos** (one per class) with zero |
| `video_id` overlap with training: |
|
|
| | Set | Frames | Accuracy | Errors | |
| |---|---|---|---| |
| | 150 / class evenly-spaced subsample | 450 | **100 %** | 0 | |
| | Full extraction | 1118 | **99.91 %** | 1 | |
|
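As a rough illustration of the two rows above, the sketch below picks an evenly spaced subsample and recomputes the accuracy from the frame and error counts. The `evenly_spaced` helper is hypothetical; the eval CLI may choose its 150 frames per class differently.

```python
def evenly_spaced(items, k):
    """Pick k items at roughly uniform spacing, keeping the first and last."""
    n = len(items)
    if k >= n:
        return list(items)
    if k == 1:
        return [items[0]]
    # Integer arithmetic keeps the indices strictly increasing when n > k.
    return [items[(i * (n - 1)) // (k - 1)] for i in range(k)]

def accuracy_pct(total, errors):
    """Accuracy in percent from a frame count and an error count."""
    return 100.0 * (total - errors) / total

frames = [f"frame_{i:05d}.jpg" for i in range(400)]  # stand-in for one class's frames
subset = evenly_spaced(frames, 150)
print(len(subset), round(accuracy_pct(1118, 1), 2))  # 150 99.91
```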
|
| The single error is a Valorant **round-end "WON" overlay** misread as |
| CS2 with confidence 0.89. Valorant's orange/red post-round Combat |
| Report panel visually mimics CS2's MVP card. Spectator / round-end |
| frames are under-represented in training. |
|
|
| Eval tool: [`cli/gamed_classification_eval.py`](https://github.com/bashiryounis/gamed/blob/main/cli/gamed_classification_eval.py). |
| Eval set: companion dataset |
| [`ybashir/gamed-game-classification-dataset`](https://huggingface.co/datasets/ybashir/gamed-game-classification-dataset). |
|
|
| ## Latency |
|
|
| Reported by `detector_comparison.json` on the training split: |
|
|
| | Percentile | Custom CNN | MobileNetV3-Small | |
| |---|---|---| |
| | p50 | 5.96 ms | 9.73 ms | |
| | p95 | 9.11 ms | 13.67 ms | |
| | p99 | 10.41 ms | 16.88 ms | |
|
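Percentiles like those in the table can be derived from per-call timings. The sketch below uses a nearest-rank percentile and a stand-in workload; it is not how `detector_comparison.json` was produced, and both helper names are hypothetical.

```python
import math
import time

def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

def time_calls(fn, runs=200):
    """Wall-clock latency of fn() in milliseconds, one sample per call."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples

# Stand-in workload; in practice fn would wrap preprocessing plus the forward pass.
latencies_ms = time_calls(lambda: sum(range(10_000)))
for p in (50, 95, 99):
    print(f"p{p}: {percentile(latencies_ms, p):.3f} ms")
```

Discarding a few warm-up calls before sampling usually gives steadier tail percentiles.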
|
| The held-out eval CLI measures **7.83 ms mean** end-to-end (cv2 imread + |
| preprocessing + forward pass) on CPU. |
|
|
| In production (`coach-api` service, CPU container) the per-frame |
| `t_classify_us` stamp averages ~21 ms — the difference is service |
| overhead (frame ingest, payload decode, logging). |
|
|
| All of these figures fall inside the brief's 30 ms p95 budget; note that the production figure is an average, not a p95. |
|
|
| ## Intended use |
|
|
| Real-time game detection for a coaching service that needs to route |
| frames to per-game event extractors. Designed for CPU inference at |
| rates from 1–4 fps (VLM path) up to 30 fps (cheap CV path). |
|
|
| ## Limitations |
|
|
| - Trained on YouTube "no-commentary" gameplay videos at 1080p. Frames |
| with heavy streamer overlays, ultrawide aspect ratios, or HDR have |
| not been evaluated. |
| - Three classes only. Adding a fourth (e.g. LoL, Apex) requires |
| replacing the 3-way head and retraining it. |
| - Can confidently misclassify round-end / spectator / replay frames |
| (see "Honest eval" above). Mitigation in production: the pipeline gates |
| downstream detectors on confidence and frame-class persistence, so a |
| single misread doesn't propagate. |
|
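The confidence-and-persistence mitigation in the last bullet could look roughly like this. It is a sketch, not the coach-api implementation: the `GameGate` class and both thresholds are hypothetical.

```python
class GameGate:
    """Emit a game label only after it is seen confidently for several frames in a row."""

    def __init__(self, min_confidence=0.9, min_streak=3):
        # Both thresholds are illustrative, not the production values.
        self.min_confidence = min_confidence
        self.min_streak = min_streak
        self.stable = None       # last label that passed the gate
        self._candidate = None   # label currently building a streak
        self._streak = 0

    def update(self, label, confidence):
        """Feed one frame's prediction; return the current stable label (or None)."""
        if confidence < self.min_confidence:
            # Low-confidence frames never count toward a streak.
            self._candidate, self._streak = None, 0
            return self.stable
        if label == self._candidate:
            self._streak += 1
        else:
            self._candidate, self._streak = label, 1
        if self._streak >= self.min_streak:
            self.stable = label
        return self.stable

gate = GameGate()
predictions = [("valorant", 0.99)] * 3 + [("cs2", 0.89)] + [("valorant", 0.98)] * 3
outputs = [gate.update(label, conf) for label, conf in predictions]
print(outputs)
```

With these settings the 0.89-confidence misread from the held-out eval is dropped by the confidence check, and a single higher-confidence misread would still fall short of the three-frame streak.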
|
| ## Citation / source |
|
|
| Repo: <https://github.com/bashiryounis/gamed> |
|
|