BGMISentiment — gold_v3 (ONNX)

A 3-class sentiment classifier for BGMI / Indian-esports YouTube live chat: fast-moving, code-mixed (Hindi + English + Hinglish) gaming chat. Fine-tuned from distilbert-base-multilingual-cased and exported to ONNX (fp32 + INT8) for cheap CPU serving.

Classes: 0 negative · 1 neutral · 2 positive. Signed score: signed = P(pos) − P(neg), in [−1, 1].
Trained on: ~208k silver rows (random 500/video × 416 streams + slang/emoji prior), combined with a human-reviewed gold set, 70/15/15 split, with gold-distribution class weights [neg 1.2, neu 1.5, pos 0.5].
Gold eval (n=666), calibrated (NEU_GATE=0.55): accuracy 0.84, macro-F1 0.75, neutral recall 0.71, positive recall 0.96.
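
The signed score defined above can be computed directly from class probabilities. A minimal sketch, assuming `probs` holds softmax outputs in class order (negative, neutral, positive); the helper name `signed_score` is illustrative and not part of this repo:

```python
import numpy as np

def signed_score(probs):
    """Return P(pos) - P(neg) for each row of class probabilities.

    Assumes the last axis is ordered (negative, neutral, positive),
    matching the class ids 0 / 1 / 2 listed in this card.
    """
    probs = np.asarray(probs, dtype=float)
    return probs[..., 2] - probs[..., 0]

# Hand-made probabilities for illustration only (not model output).
probs = np.array([
    [0.75, 0.125, 0.125],  # mostly negative
    [0.125, 0.125, 0.75],  # mostly positive
    [0.25, 0.5, 0.25],     # balanced, so the score is zero
])
scores = signed_score(probs)
print(scores)
```

Values near -1 indicate confident negative messages, values near +1 confident positive ones, and neutral or evenly split messages land near 0.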

Files

Path	Size	Use
`model.onnx`	541 MB	fp32 ONNX (self-contained)
`onnx_int8/model_quantized.onnx`	136 MB	INT8 — the deployed artifact (p50 ~7 ms CPU)
`tokenizer.json`, `vocab.txt`, `config.json`	—	WordPiece tokenizer + config

Usage (ONNX Runtime)

import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("<this-repo>")
sess = ort.InferenceSession("onnx_int8/model_quantized.onnx", providers=["CPUExecutionProvider"])

enc = tok(["soul clutch insane", "godlike choked again"], padding=True, truncation=True,
          max_length=64, return_tensors="np")
# Feed only the tensors the graph declares as inputs; the tokenizer may return extra keys.
input_names = {i.name for i in sess.get_inputs()}
logits = sess.run(None, {k: v for k, v in enc.items() if k in input_names})[0]

# Numerically stable softmax over the class axis.
z = logits - logits.max(-1, keepdims=True)
p = np.exp(z) / np.exp(z).sum(-1, keepdims=True)
print(p.argmax(-1))  # raw argmax, no neutral gate: 0 neg / 1 neu / 2 pos

Notes & limitations

Tuned for short, slangy, romanized-Hindi esports chat; not a general sentiment model.
Domain slang is blended at inference (e.g. "demon"/"goat" = praise). Serving applies a calibrated neutral gate; raw argmax over-predicts neutral.
Dataset: companion BGMI Live-Chat Sentiment dataset on Kaggle.
Source data = public YouTube live chat; non-commercial / research use.
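
The card quotes NEU_GATE=0.55 but does not spell out the gating rule. One plausible reading, sketched here as an assumption rather than the deployed logic, is that a message is labelled neutral only when P(neu) reaches the gate, and otherwise takes the more likely of negative and positive. The function name `gated_labels` and the tie-break toward positive are hypothetical:

```python
import numpy as np

NEU_GATE = 0.55  # threshold quoted in this card; the rule below is an assumption

def gated_labels(probs, neu_gate=NEU_GATE):
    """Assign labels 0 neg / 1 neu / 2 pos with a minimum-probability gate on neutral."""
    probs = np.asarray(probs, dtype=float)
    # Best non-neutral class per row (ties go to positive, an arbitrary choice).
    polar = np.where(probs[:, 2] >= probs[:, 0], 2, 0)
    # Keep neutral only when its probability clears the gate.
    return np.where(probs[:, 1] >= neu_gate, 1, polar)

# Hand-made probabilities for illustration only (not model output).
probs = np.array([
    [0.10, 0.60, 0.30],  # neutral clears the gate
    [0.15, 0.45, 0.40],  # neutral wins raw argmax but falls below the gate
    [0.50, 0.40, 0.10],  # negative under both rules
])
raw = probs.argmax(-1)
gated = gated_labels(probs)
print(raw, gated)
```

Under this reading the gate only ever moves predictions away from neutral, which matches the note that raw argmax over-predicts that class. The reported gold metrics were produced with the author's own calibration, which may differ from this sketch.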
