BGMISentiment - gold_v3 (ONNX)
3-class sentiment for BGMI / Indian-esports YouTube live chat: fast, code-mixed
(Hindi + English + Hinglish) gaming chat. Fine-tuned from
distilbert-base-multilingual-cased, exported to ONNX (fp32 + INT8) for cheap CPU
serving.
- Classes: 0 negative · 1 neutral · 2 positive. signed = P(pos) − P(neg).
- Trained on: ~208k silver rows (random 500/video × 416 streams + slang/emoji prior), combined with a human-reviewed gold set, 70/15/15 split, with gold-distribution class weights [neg 1.2, neu 1.5, pos 0.5].
- Gold eval (n=666), calibrated (NEU_GATE=0.55): accuracy 0.84, macro-F1 0.75, neutral recall 0.71, positive recall 0.96.
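The signed score defined in the class list can be computed directly from the class probabilities. This is a minimal sketch, assuming `probs` holds softmax outputs in the class order 0 negative, 1 neutral, 2 positive. The helper name `signed_score` and the example probabilities are illustrative and not part of the released code.

```python
import numpy as np

NEG, NEU, POS = 0, 1, 2  # class indices as listed in the model card

def signed_score(probs: np.ndarray) -> np.ndarray:
    """Return P(pos) - P(neg) per row; values lie in [-1, 1]."""
    probs = np.asarray(probs, dtype=np.float64)
    return probs[..., POS] - probs[..., NEG]

# Hypothetical probabilities for two chat messages (not real model outputs).
example = np.array([[0.10, 0.20, 0.70],
                    [0.60, 0.30, 0.10]])
print(signed_score(example))
```

A score near +1 means confidently positive, near −1 confidently negative, and a largely neutral message lands near 0, which makes the value convenient to average over a time window of chat.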
Files
| Path | Size | Use |
|---|---|---|
| model.onnx | 541 MB | fp32 ONNX (self-contained) |
| onnx_int8/model_quantized.onnx | 136 MB | INT8, the deployed artifact (p50 ~7 ms CPU) |
| tokenizer.json, vocab.txt, *config*.json | - | WordPiece tokenizer + config |
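The table quotes a median (p50) CPU latency for the INT8 model. The card does not state the measurement setup (batch size, sequence length, hardware), so the following is only a sketch of one way to measure a p50 on your own machine. The helper `p50_latency_ms` and its `warmup`/`runs` defaults are hypothetical, and the workload shown is a stand-in rather than the model.

```python
import statistics
import time
from typing import Callable

def p50_latency_ms(fn: Callable[[], object], warmup: int = 5, runs: int = 50) -> float:
    """Median wall-clock time of fn() in milliseconds."""
    for _ in range(warmup):  # warm caches and lazy initialisation first
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)

# Stand-in workload. With the real model, pass something like
# `lambda: sess.run(None, feed)` for a prepared ONNX Runtime session and feed dict.
print(f"p50 = {p50_latency_ms(lambda: sum(range(10_000))):.3f} ms")
```

Using the median rather than the mean keeps occasional scheduler stalls from dominating the figure.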
Usage (ONNX Runtime)
import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("<this-repo>")
sess = ort.InferenceSession("onnx_int8/model_quantized.onnx", providers=["CPUExecutionProvider"])
enc = tok(["soul clutch insane", "godlike choked again"], padding=True, truncation=True,
          max_length=64, return_tensors="np")
# Feed only the inputs that the exported graph declares.
input_names = {i.name for i in sess.get_inputs()}
logits = sess.run(None, {k: v for k, v in enc.items() if k in input_names})[0]
# Numerically stable softmax over the class axis.
z = logits - logits.max(-1, keepdims=True)
p = np.exp(z) / np.exp(z).sum(-1, keepdims=True)
print(p.argmax(-1))  # 0 neg / 1 neu / 2 pos
Notes & limitations
- Tuned for short, slangy, romanized-Hindi esports chat; not a general sentiment model.
- Domain slang is blended at inference (e.g. "demon"/"goat" = praise). Serving applies a calibrated neutral gate; raw argmax over-predicts neutral.
- Dataset: companion BGMI Live-Chat Sentiment dataset on Kaggle.
- Source data = public YouTube live chat; non-commercial / research use.
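The notes mention that serving applies a calibrated neutral gate, and the evaluation quotes NEU_GATE=0.55, but the card does not spell out the exact rule. The sketch below assumes one plausible reading: keep the neutral label only when P(neu) reaches the gate, and otherwise fall back to the stronger of negative and positive. The function `gated_predict` and the example probabilities are hypothetical. The slang blending step is not modelled here.

```python
import numpy as np

NEG, NEU, POS = 0, 1, 2
NEU_GATE = 0.55  # threshold value quoted in the model card

def gated_predict(probs: np.ndarray, neu_gate: float = NEU_GATE) -> np.ndarray:
    """ASSUMED gate rule: keep neutral only when P(neu) >= neu_gate,
    otherwise pick the stronger of negative/positive."""
    probs = np.asarray(probs, dtype=np.float64)
    # Best non-neutral class per row: 0 (neg) or 2 (pos); ties go to pos.
    polar = np.where(probs[..., POS] >= probs[..., NEG], POS, NEG)
    return np.where(probs[..., NEU] >= neu_gate, NEU, polar)

# Hypothetical probabilities (not real model outputs).
example = np.array([[0.05, 0.50, 0.45],   # raw argmax: neutral; gated: positive
                    [0.10, 0.80, 0.10],   # confidently neutral, stays neutral
                    [0.70, 0.20, 0.10]])  # negative either way
print(example.argmax(-1), gated_predict(example))
```

Under this assumed rule, borderline-neutral messages are pushed toward a polar class, which is consistent with the note that raw argmax over-predicts neutral.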