Game classifier — MobileNetV3-Small (CS2 / Dota 2 / Valorant)

3-class image classifier used as the §1.1 hot-path detector in coach-api, a real-time AI coaching service for competitive PC games. Decides which game is on screen so the service routes the frame to the correct per-game event-extraction pipeline.

Classes

cs2, dota2, valorant — in that order along the logits axis.

Architecture

torchvision.models.mobilenet_v3_small(weights=None)
classifier[3] replaced with nn.Linear(in_features, 3)
Input: 224×224 RGB, ImageNet mean/std normalization
Output: 3 logits, one per class; softmax over them gives the class probabilities
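As a minimal sketch of how a 3-element logits vector maps back to a class name and a confidence, assuming the class order listed above: the `softmax` and `decode` helpers are illustrative names, not part of the coach-api code, and the example logits are made up.

```python
import math

CLASSES = ("cs2", "dota2", "valorant")  # order along the logits axis


def softmax(logits):
    """Numerically stable softmax over a flat sequence of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits):
    """Return (class_name, confidence) for one logits vector (hypothetical helper)."""
    if len(logits) != len(CLASSES):
        raise ValueError(f"expected {len(CLASSES)} logits, got {len(logits)}")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[best], probs[best]


# Illustrative logits only, not real model output.
label, confidence = decode([0.2, -1.3, 4.1])
```

With a real model, the logits would come from `model(batch)[0].tolist()` under `torch.no_grad()`.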

Files

File	Purpose
`game_classifier_mobilenet_v3_small.pt`	Checkpoint. Dict with `state_dict`, `imagenet_mean`, `imagenet_std`.
`detector_comparison.json`	Custom-CNN vs MobileNetV3 head-to-head: per-class P/R/F1, latency p50/p95/p99, training time, size.
`confusion_matrices.png`	Side-by-side confusion matrices for both candidate detectors on the training split.

Loading

import torch
from torch import nn
from torchvision import models

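CLASSES = ("cs2", "dota2", "valorant")  # order matches the logits axis
# weights_only=True restricts unpickling to tensors and plain Python containers.
ckpt = torch.load("game_classifier_mobilenet_v3_small.pt", map_location="cpu", weights_only=True)

# Rebuild the architecture without pretrained weights, then swap in the 3-class head.
model = models.mobilenet_v3_small(weights=None)
model.classifier[3] = nn.Linear(model.classifier[3].in_features, len(CLASSES))
model.load_state_dict(ckpt["state_dict"])
model.eval()  # inference mode: BatchNorm uses running stats, dropout is disabled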

Preprocessing: BGR → RGB → resize 224×224 → transforms.ToTensor() → transforms.Normalize(ckpt["imagenet_mean"], ckpt["imagenet_std"]). For an exact match with production preprocessing, see api/src/coach_api/services/detectors/game_classifier.py.
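The steps after the resize can be sketched in plain NumPy, which makes the channel order and scaling explicit. This is an approximation for illustration, not the production code: `to_model_input` is a hypothetical helper, it assumes the frame has already been resized to 224×224, and the default mean/std are the standard ImageNet values (the copies stored in the checkpoint should take precedence).

```python
import numpy as np

# Standard ImageNet statistics. The checkpoint stores its own copies under
# ckpt["imagenet_mean"] / ckpt["imagenet_std"]; prefer those when available.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def to_model_input(frame_bgr, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Convert a 224x224 uint8 BGR frame (H, W, 3) to a float32 (1, 3, 224, 224) array."""
    if frame_bgr.shape != (224, 224, 3) or frame_bgr.dtype != np.uint8:
        raise ValueError("expected a 224x224x3 uint8 frame; resize before calling")
    rgb = frame_bgr[:, :, ::-1]                    # BGR -> RGB
    scaled = rgb.astype(np.float32) / 255.0        # same [0, 1] scaling as ToTensor()
    mean_arr = np.asarray(mean, dtype=np.float32)  # broadcast over the channel axis
    std_arr = np.asarray(std, dtype=np.float32)
    normalized = (scaled - mean_arr) / std_arr     # same arithmetic as Normalize()
    chw = np.transpose(normalized, (2, 0, 1))      # HWC -> CHW
    return np.ascontiguousarray(chw[np.newaxis, ...])  # add the batch dimension


# An all-black frame, used here only to show the output shape.
batch = to_model_input(np.zeros((224, 224, 3), dtype=np.uint8))
```

The result can be handed to the model with `torch.from_numpy(batch)`. Resize interpolation may differ between libraries, so the production module remains the reference for exact parity.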

Training

Data: 6450 frames extracted at 0.4 fps from 10 YouTube gameplay videos (3 cs2 / 3 dota2 / 4 valorant). Frames are indexed with the same dataset/manifest.csv schema that is published alongside the dataset repo.
Split: frame-level random_split(0.70 / 0.15 / 0.15). Same videos appear in train and val — useful for hyperparameter search, not a fair generalization estimate. See "Honest eval" below.
Loss: cross-entropy
Optimizer / schedule: see the training notebook notebook/Game_Classifier_…ipynb.
Random seed: 42
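For contrast with the frame-level split described above, a video-level hold-out keeps every frame of a given video on one side of the split. The sketch below is not how this model was trained. It assumes rows shaped as `(video_id, label, frame_path)` tuples, which is a hypothetical layout and not necessarily the manifest.csv schema.

```python
import random
from collections import defaultdict


def hold_out_one_video_per_class(rows, seed=42):
    """Split a list of (video_id, label, frame_path) rows into (train, held_out).

    One whole video per class is held out, so no video contributes frames
    to both sides of the split.
    """
    videos_by_label = defaultdict(set)
    for video_id, label, _ in rows:
        videos_by_label[label].add(video_id)

    rng = random.Random(seed)
    # Sort before choosing so the selection is reproducible for a given seed.
    held_out_videos = {
        rng.choice(sorted(video_ids))
        for _, video_ids in sorted(videos_by_label.items())
    }
    train = [row for row in rows if row[0] not in held_out_videos]
    held_out = [row for row in rows if row[0] in held_out_videos]
    return train, held_out


# Synthetic manifest mirroring the 3 / 3 / 4 video counts, 5 frames per video.
rows = [
    (f"{label}_v{v}", label, f"{label}_v{v}/frame_{i:04d}.jpg")
    for label, n_videos in (("cs2", 3), ("dota2", 3), ("valorant", 4))
    for v in range(n_videos)
    for i in range(5)
]
train, held_out = hold_out_one_video_per_class(rows)
```

With only three or four videos per class, holding one out removes a large share of the training data, which is one reason a separate set of fresh videos is a reasonable alternative.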

Training-split metrics (`detector_comparison.json`)

Model	Acc	Size	Mean latency (CPU)
Custom CNN	98.76 %	1.5 MB	6.3 ms
MobileNetV3-Small (winner)	99.28 %	5.8 MB	10.2 ms

MobileNetV3-Small was selected for its better F1 on the harder valorant class and its headroom against domain shift, despite its larger size and higher latency.

Honest eval — held-out videos (this is the number to cite)

The training split is video-leaky. To get a fair estimate we extracted frames from 3 fresh YouTube videos (one per class) with zero video_id overlap with training:

Set	Frames	Accuracy	Errors
150 / class evenly-spaced subsample	450	100 %	0
Full extraction	1118	99.91 %	1

The single error is a Valorant round-end "WON" overlay misread as CS2 with confidence 0.89. Valorant's orange/red post-round Combat Report panel visually mimics CS2's MVP card. Spectator / round-end frames are under-represented in training.
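The accuracy column in the table above follows directly from the frame and error counts. As a quick arithmetic check (`accuracy_pct` is just an illustrative helper):

```python
def accuracy_pct(frames, errors):
    """Accuracy in percent, given a total frame count and the number of misclassified frames."""
    return 100.0 * (frames - errors) / frames


print(f"{accuracy_pct(450, 0):.2f}")   # 100.00
print(f"{accuracy_pct(1118, 1):.2f}")  # 99.91
```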

Eval tool: cli/gamed_classification_eval.py. Eval set: companion dataset ybashir/gamed-game-classification-dataset.

Latency

Reported by detector_comparison.json on the training split:

Percentile	Custom CNN	MobileNetV3-Small
p50	5.96 ms	9.73 ms
p95	9.11 ms	13.67 ms
p99	10.41 ms	16.88 ms

The held-out eval CLI measures 7.83 ms mean end-to-end (cv2 imread + preprocessing + forward pass) on CPU.

In production (coach-api service, CPU container) the per-frame t_classify_us stamp averages ~21 ms; the gap relative to the offline figures is service overhead (frame ingest, payload decode, logging).

All of these figures are well inside the brief's 30 ms p95 budget, although the production figure is a mean rather than a p95.
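A rough way to collect comparable per-frame latency percentiles is sketched below. The nearest-rank percentile definition and the warm-up count are assumptions, and `measure_latency_ms` is a hypothetical helper. The method actually used to produce detector_comparison.json may differ.

```python
import time


def percentile(sorted_values, q):
    """Nearest-rank percentile for an integer q in 1..100 over an ascending list."""
    if not sorted_values:
        raise ValueError("no samples")
    # Integer ceiling of q * n / 100 avoids floating-point rounding surprises.
    rank = max(1, (q * len(sorted_values) + 99) // 100)
    return sorted_values[rank - 1]


def measure_latency_ms(fn, frames, warmup=5):
    """Time fn(frame) for each frame; return p50/p95/p99 and the mean in milliseconds."""
    for frame in frames[:warmup]:
        fn(frame)  # warm-up calls are executed but not timed
    samples = []
    for frame in frames:
        start = time.perf_counter()
        fn(frame)
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    return {
        "p50": percentile(samples, 50),
        "p95": percentile(samples, 95),
        "p99": percentile(samples, 99),
        "mean": sum(samples) / len(samples),
    }


# Stand-in workload; in practice fn would wrap preprocessing plus the forward pass.
stats = measure_latency_ms(lambda frame: sum(frame), [[1, 2, 3]] * 50)
```

Timing the whole call (decode, preprocessing, forward pass) rather than the forward pass alone gives numbers closer to what the service observes.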

Intended use

Real-time game detection for a coaching service that needs to route frames to per-game event extractors. Designed for CPU inference at rates from 1–4 fps (VLM path) up to 30 fps (cheap CV path).

Limitations

Trained on YouTube "no-commentary" gameplay videos at 1080p. Frames with heavy streamer overlays, ultrawide aspect ratios, or HDR have not been evaluated.
Three classes only. Adding a fourth (e.g. LoL, Apex) requires retraining the head.
Can confidently misclassify round-end / spectator / replay frames; see "Honest eval" above. Mitigation in production: the pipeline gates downstream detectors on confidence and frame-class persistence, so a single misread does not propagate.
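The confidence-and-persistence gating mentioned in the last item could look roughly like the sketch below. This is an illustration of the idea only: the `RoutingGate` class, the 0.9 threshold, and the 3-frame window are hypothetical and are not taken from the production pipeline.

```python
from collections import deque


class RoutingGate:
    """Switch the routed game only after `persistence` consecutive confident frames agree."""

    def __init__(self, min_confidence=0.9, persistence=3):
        self.min_confidence = min_confidence
        self.recent = deque(maxlen=persistence)  # most recent votes, oldest first
        self.current = None                      # game currently routed to, if any

    def update(self, label, confidence):
        """Feed one classifier result; return the game to route to (or None)."""
        # A low-confidence frame counts as "no vote" and breaks any streak.
        vote = label if confidence >= self.min_confidence else None
        self.recent.append(vote)
        window_full = len(self.recent) == self.recent.maxlen
        if window_full and self.recent[0] is not None and len(set(self.recent)) == 1:
            self.current = self.recent[0]
        return self.current


gate = RoutingGate()
# A single confident cs2 misread in the middle of a Valorant stream.
stream = [("valorant", 0.99), ("valorant", 0.98), ("valorant", 0.99),
          ("cs2", 0.95), ("valorant", 0.99)]
routed = [gate.update(label, conf) for label, conf in stream]
# routed == [None, None, "valorant", "valorant", "valorant"]
```

The trade-off is switching delay: a genuine change of game is only picked up after `persistence` agreeing frames, which at low frame rates can take a few seconds.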

Citation / source

Repo: https://github.com/bashiryounis/gamed
