Game classifier: MobileNetV3-Small (CS2 / Dota 2 / Valorant)

3-class image classifier used as the §1.1 hot-path detector in coach-api, a real-time AI coaching service for competitive PC games. It decides which game is on screen so that the service can route each frame to the correct per-game event-extraction pipeline.

Classes

cs2, dota2, valorant, in that order along the logits axis.

Architecture

  • torchvision.models.mobilenet_v3_small(weights=None)
  • classifier[3] replaced with nn.Linear(in_features, 3)
  • Input: 224×224 RGB, ImageNet mean/std normalization
  • Output: softmax over 3 classes
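As a minimal sketch of how the class order and the softmax output fit together, the snippet below maps one frame's raw logits to a label and a confidence. The helper names (`softmax`, `predict`) and the logit values are made up for illustration and do not come from the coach-api code.

```python
import math

CLASSES = ("cs2", "dota2", "valorant")  # order along the logits axis


def softmax(logits):
    """Numerically stable softmax over a flat sequence of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits):
    """Return (label, confidence) for one frame's raw logits."""
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[idx], probs[idx]


# Made-up logits, for illustration only.
label, conf = predict([0.2, -1.0, 3.1])
print(label, round(conf, 3))
```

The confidence returned here is the value a downstream gate would compare against a threshold.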

Files

| File | Purpose |
|---|---|
| game_classifier_mobilenet_v3_small.pt | Checkpoint. Dict with state_dict, imagenet_mean, imagenet_std. |
| detector_comparison.json | Custom-CNN vs MobileNetV3 head-to-head: per-class P/R/F1, latency p50/p95/p99, training time, size. |
| confusion_matrices.png | Side-by-side confusion matrices for both candidate detectors on the training split. |

Loading

import torch
from torch import nn
from torchvision import models

CLASSES = ("cs2", "dota2", "valorant")
ckpt = torch.load("game_classifier_mobilenet_v3_small.pt", map_location="cpu", weights_only=True)

model = models.mobilenet_v3_small(weights=None)
model.classifier[3] = nn.Linear(model.classifier[3].in_features, len(CLASSES))
model.load_state_dict(ckpt["state_dict"])
model.eval()

Preprocessing: BGR → RGB → resize 224×224 → transforms.ToTensor() → transforms.Normalize(ckpt["imagenet_mean"], ckpt["imagenet_std"]). See api/src/coach_api/services/detectors/game_classifier.py for the exact preprocessing used in production.
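The numeric effect of the preprocessing chain can be sketched with NumPy alone. This is an illustration, not the production code. It assumes the frame has already been resized to 224×224 (the interpolation mode is not stated here), and it assumes the checkpoint's imagenet_mean / imagenet_std equal the standard ImageNet statistics. The function name `preprocess_resized_bgr` is hypothetical.

```python
import numpy as np

# Standard ImageNet statistics; ASSUMED to equal ckpt["imagenet_mean"] / ckpt["imagenet_std"].
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def preprocess_resized_bgr(frame_bgr, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Turn a 224x224 uint8 BGR frame (H, W, 3) into a float32 (1, 3, 224, 224) batch."""
    if frame_bgr.shape != (224, 224, 3) or frame_bgr.dtype != np.uint8:
        raise ValueError("expected a uint8 array of shape (224, 224, 3)")
    rgb = frame_bgr[:, :, ::-1]  # BGR -> RGB
    # Same effect as ToTensor: HWC in [0, 255] -> CHW in [0, 1].
    chw = rgb.astype(np.float32).transpose(2, 0, 1) / 255.0
    mean_arr = np.asarray(mean, dtype=np.float32).reshape(3, 1, 1)
    std_arr = np.asarray(std, dtype=np.float32).reshape(3, 1, 1)
    # Same effect as Normalize, plus a leading batch axis.
    return ((chw - mean_arr) / std_arr)[np.newaxis]


# Synthetic pure-blue BGR frame standing in for a resized screenshot.
frame = np.zeros((224, 224, 3), dtype=np.uint8)
frame[:, :, 0] = 255
batch = preprocess_resized_bgr(frame)
print(batch.shape, batch.dtype)
```

Wrapping the result with torch.from_numpy(batch) would give a tensor in the layout the loaded model expects.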

Training

  • Data: 6450 frames extracted at 0.4 fps from 10 YouTube gameplay videos (3 cs2 / 3 dota2 / 4 valorant). Uses the same dataset/manifest.csv schema that is published alongside the dataset repo.
  • Split: frame-level random_split(0.70 / 0.15 / 0.15). The same videos appear in train and val, so these metrics are useful for hyperparameter search but are not a fair generalization estimate. See "Honest eval" below.
  • Loss: cross-entropy
  • Optimizer / schedule: see the training notebook notebook/Game_Classifier_…ipynb.
  • Random seed: 42
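For contrast with the frame-level split described in the list above, here is a sketch of a video-level split that keeps every frame of a video in a single partition. This is not how the checkpoint was trained. The function name, the (frame_path, video_id) pair layout, and the demo ids are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def split_by_video(frames, val_fraction=0.15, test_fraction=0.15, seed=42):
    """Assign whole videos to train/val/test so no video_id spans two splits.

    `frames` is an iterable of (frame_path, video_id) pairs.
    Returns a dict mapping split name to a list of those pairs.
    """
    by_video = defaultdict(list)
    for path, video_id in frames:
        by_video[video_id].append((path, video_id))

    video_ids = sorted(by_video)  # sort first so the shuffle is reproducible
    random.Random(seed).shuffle(video_ids)

    n_val = max(1, round(len(video_ids) * val_fraction))
    n_test = max(1, round(len(video_ids) * test_fraction))
    groups = {
        "val": video_ids[:n_val],
        "test": video_ids[n_val:n_val + n_test],
        "train": video_ids[n_val + n_test:],
    }
    return {name: [row for vid in ids for row in by_video[vid]]
            for name, ids in groups.items()}


# Hypothetical manifest: 10 videos with 5 frames each.
demo = [(f"v{v}/frame_{i}.jpg", f"v{v}") for v in range(10) for i in range(5)]
splits = split_by_video(demo)
print({name: len(rows) for name, rows in splits.items()})
```

With only 10 source videos, a split like this can leave a class out of val or test, so a real version would likely also stratify by class.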

Training-split metrics (detector_comparison.json)

| Model | Accuracy | Size | Mean latency (CPU) |
|---|---|---|---|
| Custom CNN | 98.76 % | 1.5 MB | 6.3 ms |
| MobileNetV3-Small (winner) | 99.28 % | 5.8 MB | 10.2 ms |

MobileNetV3-Small was selected for its better F1 on the harder valorant class and its headroom against domain shift, despite being larger (5.8 MB vs 1.5 MB) and slower (10.2 ms vs 6.3 ms mean CPU latency).

Honest eval: held-out videos (this is the number to cite)

The training split is video-leaky. To get a fair estimate we extracted frames from 3 fresh YouTube videos (one per class) with zero video_id overlap with training:

| Set | Frames | Accuracy | Errors |
|---|---|---|---|
| 150 / class evenly-spaced subsample | 450 | 100 % | 0 |
| Full extraction | 1118 | 99.91 % | 1 |

The single error is a Valorant round-end "WON" overlay misread as CS2 with confidence 0.89. Valorant's orange/red post-round Combat Report panel visually mimics CS2's MVP card. Spectator / round-end frames are under-represented in training.
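The held-out accuracies follow directly from the frame and error counts, and the "zero video_id overlap" condition is a set intersection. A small sketch of both checks, with hypothetical helper names and placeholder ids rather than the real YouTube video_ids:

```python
def accuracy_pct(n_frames, n_errors):
    """Accuracy in percent from a frame count and an error count."""
    if n_frames <= 0 or not 0 <= n_errors <= n_frames:
        raise ValueError("invalid counts")
    return 100.0 * (n_frames - n_errors) / n_frames


def shared_videos(train_video_ids, eval_video_ids):
    """Return video_ids present in both sets; an empty set means no leakage."""
    return set(train_video_ids) & set(eval_video_ids)


print(f"{accuracy_pct(450, 0):.2f} %")   # subsample row
print(f"{accuracy_pct(1118, 1):.2f} %")  # full-extraction row: 1117 / 1118

# Placeholder ids, not the real YouTube video_ids.
leak = shared_videos({"train_a", "train_b"}, {"eval_x"})
print("overlap:", leak)
```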

Eval tool: cli/gamed_classification_eval.py. Eval set: companion dataset ybashir/gamed-game-classification-dataset.

Latency

Reported by detector_comparison.json on the training split:

| Percentile | Custom CNN | MobileNetV3-Small |
|---|---|---|
| p50 | 5.96 ms | 9.73 ms |
| p95 | 9.11 ms | 13.67 ms |
| p99 | 10.41 ms | 16.88 ms |

The held-out eval CLI measures 7.83 ms mean end-to-end (cv2 imread + preprocessing + forward pass) on CPU.

In production (coach-api service, CPU container) the per-frame t_classify_us stamp averages ~21 ms; the difference is service overhead (frame ingest, payload decode, logging).

All of these figures are inside the brief's 30 ms p95 budget, although the production number is a mean rather than a p95.
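To compare a set of per-frame timings against a p95 budget, the percentiles have to come from the raw samples. The sketch below uses the nearest-rank definition. Which percentile method detector_comparison.json uses is not stated, so treat this as one reasonable choice. The function names and sample values are illustrative.

```python
import math


def percentile(samples_ms, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples <= it."""
    if not samples_ms:
        raise ValueError("no samples")
    ordered = sorted(samples_ms)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(samples_ms, budget_p95_ms=30.0):
    """Mean, p50/p95/p99, and whether p95 fits the budget."""
    p50, p95, p99 = (percentile(samples_ms, p) for p in (50, 95, 99))
    return {
        "mean": sum(samples_ms) / len(samples_ms),
        "p50": p50,
        "p95": p95,
        "p99": p99,
        "within_budget": p95 <= budget_p95_ms,
    }


# Synthetic timings (0.2 ms .. 20.0 ms), not measurements from this model.
samples = [i / 5 for i in range(1, 101)]
report = summarize(samples)
print(report)
```

A mean under the budget does not by itself show that p95 is under it, which is why the check is made on the percentile.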

Intended use

Real-time game detection for a coaching service that needs to route frames to per-game event extractors. Designed for CPU inference at rates from 1–4 fps (VLM path) up to 30 fps (cheap CV path).

Limitations

  • Trained on YouTube "no-commentary" gameplay videos at 1080p. Frames with heavy streamer overlays, ultrawide aspect ratios, or HDR have not been evaluated.
  • Three classes only. Adding a fourth (e.g. LoL, Apex) requires retraining the head.
  • Can confidently misclassify round-end / spectator / replay frames (see "Honest eval" above). Mitigation in production: the pipeline gates downstream detectors on confidence and frame-class persistence, so a single misread doesn't propagate.
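The mitigation in the last bullet (gating on confidence and on frame-class persistence) could look roughly like the sketch below. The class name `GameGate`, the threshold, and the window sizes are hypothetical; the production values and logic are not given here.

```python
from collections import deque


class GameGate:
    """Switch the active game only after enough recent confident votes agree."""

    def __init__(self, min_confidence=0.9, window=5, min_votes=4):
        if 2 * min_votes <= window:
            raise ValueError("min_votes must be a strict majority of the window")
        self.min_confidence = min_confidence
        self.min_votes = min_votes
        self._recent = deque(maxlen=window)
        self.current = None  # no game accepted yet

    def update(self, label, confidence):
        """Record one frame's prediction and return the currently accepted game."""
        # Low-confidence frames count as an abstention rather than a vote.
        vote = label if confidence >= self.min_confidence else None
        self._recent.append(vote)
        for candidate in set(self._recent) - {None}:
            if self._recent.count(candidate) >= self.min_votes:
                self.current = candidate
        return self.current


gate = GameGate()
for _ in range(5):
    gate.update("valorant", 0.99)
# One misread like the round-end frame from the held-out eval.
after_misread = gate.update("cs2", 0.89)
print(after_misread)
```

Because a switch needs a strict majority of the window, an isolated misread leaves the accepted game unchanged even when its confidence clears the threshold.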

Citation / source

Repo: https://github.com/bashiryounis/gamed
