ybashir committed on
Commit
43b3b73
·
verified ·
1 Parent(s): 1e7608c

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +141 -0
README.md ADDED
@@ -0,0 +1,141 @@
+ ---
+ license: apache-2.0
+ tags:
+ - image-classification
+ - mobilenet-v3
+ - video-games
+ - real-time
+ pipeline_tag: image-classification
+ library_name: pytorch
+ ---
+
+ # Game classifier: MobileNetV3-Small (CS2 / Dota 2 / Valorant)
+
+ 3-class image classifier used as the §1.1 hot-path detector in
+ [coach-api](https://github.com/bashiryounis/gamed), a real-time AI coaching
+ service for competitive PC games. It decides which game is on screen so the
+ service can route each frame to the correct per-game event-extraction pipeline.
+
+ ## Classes
+
+ `cs2`, `dota2`, `valorant`, in that order along the logits axis.
+
+ ## Architecture
+
+ - `torchvision.models.mobilenet_v3_small(weights=None)`
+ - `classifier[3]` replaced with `nn.Linear(in_features, 3)`
+ - Input: 224×224 RGB, ImageNet mean/std normalization
+ - Output: 3 logits from the final `nn.Linear`; apply softmax to get class probabilities
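Because the replaced head is a plain `nn.Linear`, the network emits raw logits and the softmax is applied afterwards. The sketch below shows that decoding step in plain Python for one frame; with PyTorch tensors, `torch.softmax` would do the same job. The helper names `softmax` and `decode` and the example logits are made up for illustration and are not taken from the repository.

```python
import math

CLASSES = ("cs2", "dota2", "valorant")  # order along the logits axis, per the model card


def softmax(logits):
    """Numerically stable softmax over a flat sequence of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits):
    """Return (label, confidence) for one frame's 3-element logit vector."""
    if len(logits) != len(CLASSES):
        raise ValueError(f"expected {len(CLASSES)} logits, got {len(logits)}")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[best], probs[best]


# Made-up logits for illustration only.
label, confidence = decode([0.2, -1.3, 4.1])
print(label, round(confidence, 3))
```

The confidence returned here is the quantity a downstream gate would threshold on.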
+
+ ## Files
+
+ | File | Purpose |
+ |---|---|
+ | `game_classifier_mobilenet_v3_small.pt` | Checkpoint. Dict with `state_dict`, `imagenet_mean`, `imagenet_std`. |
+ | `detector_comparison.json` | Custom-CNN vs MobileNetV3 head-to-head: per-class P/R/F1, latency p50/p95/p99, training time, size. |
+ | `confusion_matrices.png` | Side-by-side confusion matrices for both candidate detectors on the training split. |
+
+ ## Loading
+
+ ```python
+ import torch
+ from torch import nn
+ from torchvision import models
+
+ CLASSES = ("cs2", "dota2", "valorant")
+ # The checkpoint is a dict: state_dict, imagenet_mean, imagenet_std.
+ ckpt = torch.load("game_classifier_mobilenet_v3_small.pt", map_location="cpu", weights_only=True)
+
+ # Rebuild the architecture, swap in the 3-class head, then load the weights.
+ model = models.mobilenet_v3_small(weights=None)
+ model.classifier[3] = nn.Linear(model.classifier[3].in_features, len(CLASSES))
+ model.load_state_dict(ckpt["state_dict"])
+ model.eval()  # inference mode: disables dropout, uses BatchNorm running stats
+ ```
+
+ Preprocessing: BGR → RGB → resize 224×224 → `transforms.ToTensor()` →
+ `transforms.Normalize(ckpt["imagenet_mean"], ckpt["imagenet_std"])`. See
+ [`api/src/coach_api/services/detectors/game_classifier.py`](https://github.com/bashiryounis/gamed/blob/main/api/src/coach_api/services/detectors/game_classifier.py)
+ for the exact production preprocessing.
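As a rough illustration of the preprocessing chain described above, the NumPy sketch below covers the channel swap, the `ToTensor()`-style scaling to [0, 1], the mean/std normalization, and the HWC to CHW layout change. It is not the production code: it assumes the frame has already been resized to 224×224 (for example with `cv2.resize`), the function name `normalize_frame` is hypothetical, and the standard ImageNet statistics stand in for the values stored in the checkpoint.

```python
import numpy as np

# Standard ImageNet statistics, used here as stand-ins; the checkpoint ships its
# own values under ckpt["imagenet_mean"] / ckpt["imagenet_std"].
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_frame(frame_bgr, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Turn a 224x224 uint8 BGR frame (H, W, 3) into a float32 (1, 3, 224, 224) batch.

    Assumes the frame was already resized to 224x224.
    """
    if frame_bgr.shape != (224, 224, 3) or frame_bgr.dtype != np.uint8:
        raise ValueError("expected a 224x224x3 uint8 frame")
    rgb = frame_bgr[:, :, ::-1]                       # BGR -> RGB
    scaled = rgb.astype(np.float32) / 255.0           # same scaling as ToTensor()
    mean_arr = np.asarray(mean, dtype=np.float32)
    std_arr = np.asarray(std, dtype=np.float32)
    normed = (scaled - mean_arr) / std_arr            # per-channel normalization
    chw = np.transpose(normed, (2, 0, 1))             # HWC -> CHW
    return np.ascontiguousarray(chw)[np.newaxis]      # add the batch dimension


frame = np.zeros((224, 224, 3), dtype=np.uint8)
frame[..., 0] = 255  # pure blue when the channel order is BGR
batch = normalize_frame(frame)
print(batch.shape, batch.dtype)
```

The resulting array can be handed to the model with `torch.from_numpy(batch)`.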
+
+ ## Training
+
+ - Data: 6450 frames extracted at 0.4 fps from 10 YouTube gameplay videos
+ (3 cs2 / 3 dota2 / 4 valorant). Uses the same `dataset/manifest.csv` schema
+ published alongside the dataset repo.
+ - Split: frame-level `random_split(0.70 / 0.15 / 0.15)`. **The same videos
+ appear in train and val**, so the split is useful for hyperparameter search
+ but is *not* a fair generalization estimate. See "Honest eval" below.
+ - Loss: cross-entropy
+ - Optimizer / schedule: see the training notebook
+ [`notebook/Game_Classifier_…ipynb`](https://github.com/bashiryounis/gamed/tree/main/notebook).
+ - Random seed: 42
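The leakage noted in the split bullet comes from splitting at frame level. For contrast, here is a minimal sketch of a video-level split, where every frame of a given video lands in exactly one subset. This is not what the training notebook does; the function name `split_by_video` and the `(video_id, frame_path)` tuple layout are assumptions, since the manifest schema is not shown here. The split is also not stratified by class, so with only 10 videos a subset can miss a class entirely.

```python
import random
from collections import defaultdict


def split_by_video(frames, fractions=(0.70, 0.15, 0.15), seed=42):
    """Split (video_id, frame_path) pairs so each video lands in exactly one subset.

    Fractions are applied to the number of videos, not frames, so the resulting
    frame counts only approximate the requested proportions.
    """
    by_video = defaultdict(list)
    for video_id, frame_path in frames:
        by_video[video_id].append(frame_path)

    video_ids = sorted(by_video)            # sort first so the shuffle is reproducible
    random.Random(seed).shuffle(video_ids)

    n = len(video_ids)
    n_train = max(1, round(n * fractions[0]))
    n_val = max(1, round(n * fractions[1]))
    groups = {
        "train": video_ids[:n_train],
        "val": video_ids[n_train:n_train + n_val],
        "test": video_ids[n_train + n_val:],
    }
    return {name: [(v, p) for v in ids for p in by_video[v]] for name, ids in groups.items()}


# Synthetic manifest: 10 videos with 5 frames each (paths are made up).
frames = [(f"vid{v}", f"vid{v}/frame_{i:04d}.jpg") for v in range(10) for i in range(5)]
splits = split_by_video(frames)
seen = {name: {v for v, _ in items} for name, items in splits.items()}
print({name: len(items) for name, items in splits.items()})
```

With so few videos, evaluating on freshly collected videos, as the held-out evaluation in this card does, is often the more practical route.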
+
+ ### Training-split metrics (`detector_comparison.json`)
+
+ | Model | Acc | Size | Mean latency (CPU) |
+ |---|---|---|---|
+ | Custom CNN | 98.76 % | 1.5 MB | 6.3 ms |
+ | **MobileNetV3-Small (winner)** | **99.28 %** | 5.8 MB | 10.2 ms |
+
+ MobileNetV3-Small was selected for its better F1 on the harder `valorant`
+ class and its headroom against domain shift, despite being larger and slower.
+
+ ## Honest eval: held-out videos (this is the number to cite)
+
+ The training split is video-leaky. To get a fair estimate we extracted
+ frames from **3 fresh YouTube videos** (one per class) with zero
+ `video_id` overlap with training:
+
+ | Set | Frames | Accuracy | Errors |
+ |---|---|---|---|
+ | 150 / class evenly-spaced subsample | 450 | **100 %** | 0 |
+ | Full extraction | 1118 | **99.91 %** | 1 |
+
+ The single error is a Valorant **round-end "WON" overlay** misread as
+ CS2 with confidence 0.89. Valorant's orange/red post-round Combat
+ Report panel visually mimics CS2's MVP card, and spectator / round-end
+ frames are under-represented in training.
+
+ Eval tool: [`cli/gamed_classification_eval.py`](https://github.com/bashiryounis/gamed/blob/main/cli/gamed_classification_eval.py).
+ Eval set: companion dataset
+ [`ybashir/gamed-game-classification-dataset`](https://huggingface.co/datasets/ybashir/gamed-game-classification-dataset).
+
+ ## Latency
+
+ Reported by `detector_comparison.json` on the training split:
+
+ | Percentile | Custom CNN | MobileNetV3-Small |
+ |---|---|---|
+ | p50 | 5.96 ms | 9.73 ms |
+ | p95 | 9.11 ms | 13.67 ms |
+ | p99 | 10.41 ms | 16.88 ms |
+
+ The held-out eval CLI measures **7.83 ms mean** end-to-end (cv2 imread +
+ preprocessing + forward pass) on CPU.
+
+ In production (`coach-api` service, CPU container) the per-frame
+ `t_classify_us` stamp averages ~21 ms; the gap is service
+ overhead (frame ingest, payload decode, logging).
+
+ The measured p95 values, and the ~21 ms production mean, are all well
+ inside the brief's 30 ms p95 budget.
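For readers who want to reproduce figures like the p50/p95/p99 rows above on their own hardware, the sketch below times a callable per input and summarizes the samples with the standard library. How `detector_comparison.json` was actually produced (warm-up frames, interpolation method) is not stated here, so treat the helper names `time_calls` and `latency_percentiles` and the "inclusive" quantile method as assumptions.

```python
import statistics
import time


def latency_percentiles(samples_ms):
    """Return mean, p50, p95 and p99 of per-frame latencies in milliseconds."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples")
    # n=100 yields 99 cut points; index k-1 holds the k-th percentile.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {
        "mean": statistics.fmean(samples_ms),
        "p50": cuts[49],
        "p95": cuts[94],
        "p99": cuts[98],
    }


def time_calls(fn, inputs):
    """Time fn(x) for each input with a monotonic clock; returns milliseconds."""
    samples = []
    for x in inputs:
        start = time.perf_counter()
        fn(x)
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


# Stand-in workload; replace the lambda with the real classify-one-frame call.
measured = time_calls(lambda n: sum(range(n)), [10_000] * 200)
stats = latency_percentiles(measured)
print(sorted(stats))
```

A budget check then reduces to comparing `stats["p95"]` against the 30 ms target.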
+
+ ## Intended use
+
+ Real-time game detection for a coaching service that needs to route
+ frames to per-game event extractors. Designed for CPU inference from
+ 1–4 fps (VLM path) up to 30 fps (cheap CV path).
+
+ ## Limitations
+
+ - Trained on YouTube "no-commentary" gameplay videos at 1080p. Frames
+ with heavy streamer overlays, ultrawide aspect ratios, or HDR have
+ not been evaluated.
+ - Three classes only. Adding a fourth (e.g. LoL, Apex) requires
+ retraining the head.
+ - Can confidently misclassify round-end / spectator / replay frames
+ (see "Honest eval" above). Mitigation in production: the pipeline gates
+ downstream detectors on confidence and frame-class persistence, so a
+ single misread doesn't propagate.
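The confidence-and-persistence gating mentioned in the last bullet could look roughly like the sketch below: the routed game only changes after several consecutive confident frames agree. This is an illustrative guess at the idea, not the production logic; the class name `GameGate`, the 0.9 threshold, and the 3-frame streak are all hypothetical values.

```python
class GameGate:
    """Switch the active game only after `persist` consecutive confident frames agree."""

    def __init__(self, min_confidence=0.9, persist=3):
        self.min_confidence = min_confidence  # hypothetical threshold
        self.persist = persist                # hypothetical streak length
        self.active = None        # game currently routed downstream
        self._candidate = None    # label that is building up a streak
        self._streak = 0

    def update(self, label, confidence):
        """Feed one classified frame; return the game to route to (or None)."""
        if confidence < self.min_confidence:
            # Low-confidence frames never start or extend a streak.
            self._candidate, self._streak = None, 0
            return self.active
        if label == self._candidate:
            self._streak += 1
        else:
            self._candidate, self._streak = label, 1
        if self._streak >= self.persist:
            self.active = label
        return self.active


# Replays the held-out failure: one Valorant frame read as cs2 at 0.89.
gate = GameGate()
stream = [("valorant", 0.99)] * 3 + [("cs2", 0.89)] + [("valorant", 0.98)] * 2
routed = [gate.update(label, conf) for label, conf in stream]
print(routed)
```

In this replay the 0.89 misread is dropped by the confidence threshold; a more confident single misread would still fail the persistence requirement.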
+
+ ## Citation / source
+
+ Repo: <https://github.com/bashiryounis/gamed>