Reddit mood classifier

A 3-class sentiment classifier for Reddit comments, fine-tuned from cardiffnlp/twitter-roberta-base-sentiment-latest.

Output classes: negative / neutral / positive.

Trained on the acidtib/reddit-mood dataset. Evaluate on your own corpus before relying on it outside the training domain.

Labels

Label      Numeric score   Meaning
negative   25              Anywhere on the negative spectrum: complaints, sarcasm, disappointment, balance gripes, bug-report annoyance, scorched-earth rage, personal attacks on devs, quit threats
neutral    60              Factual statements, banter, parody/hyperbole, in-domain references without strong real-world emotion
positive   90              Genuine positivity, hype, love, excitement

The numeric scores are arbitrary anchors that let you average labels into a single 0-100 mood score per group of comments. Pick your own mapping if these don't fit.
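That averaging can be sketched as follows (a minimal example; the helper name `mood_score` is ours, not part of this repo):

```python
# Anchor values from the label table above.
SCORE = {"negative": 25, "neutral": 60, "positive": 90}

def mood_score(labels):
    """Map predicted labels to anchors and average into a 0-100 mood score."""
    return sum(SCORE[label] for label in labels) / len(labels)

print(mood_score(["negative", "neutral", "positive", "positive"]))  # 66.25
```

Swap in your own `SCORE` mapping if these anchors don't fit your use case.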

Usage

transformers.js (Node / browser)

import { pipeline } from "@huggingface/transformers";

const classify = await pipeline(
  "text-classification",
  "acidtib/reddit-mood-classifier",
  { dtype: "q8" } // load model_quantized.onnx (int8, ~120MB, CPU-friendly)
);

const out = await classify("they nerfed it again, it's over");
// [{ label: "negative", score: 0.81 }]

Python (transformers + onnxruntime)

from transformers import AutoConfig, AutoTokenizer
import numpy as np
import onnxruntime as ort

repo = "acidtib/reddit-mood-classifier"
tokenizer = AutoTokenizer.from_pretrained(repo)
id2label = AutoConfig.from_pretrained(repo).id2label
session = ort.InferenceSession(
    "onnx/model_quantized.onnx",
    providers=["CPUExecutionProvider"],
)

inputs = tokenizer("they nerfed it again, it's over", return_tensors="np")
logits = session.run(None, dict(inputs))[0][0]  # raw logits, shape (3,)
probs = np.exp(logits - logits.max())
probs /= probs.sum()  # softmax for confidence
print(id2label[int(probs.argmax())], float(probs.max()))

Files

config.json                       HF model config (id2label, label2id)
tokenizer.json + vocab.json + ... HF tokenizer files (RoBERTa BPE)
onnx/model.onnx                   full-precision ONNX (~500MB)
onnx/model_quantized.onnx         int8 dynamic quantized ONNX (~120MB) -
                                  this is what production inference loads
ort_config.json                   ONNX Runtime quantization metadata

Evaluation

Held-out test set: 962 rows (the 10% test split of the 9,612-row corpus), never seen by the trainer. Evaluated at 2026-05-04T03:53:59.880132+00:00.

Macro-F1: 0.7259.

Label      Test F1
negative   0.672
neutral    0.836
positive   0.669
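Macro-F1 is the unweighted mean of the per-label F1 scores, so the table roughly reproduces the headline number (the small difference comes from per-label rounding):

```python
per_label_f1 = {"negative": 0.672, "neutral": 0.836, "positive": 0.669}

# Macro-F1 = unweighted mean over classes.
macro_f1 = sum(per_label_f1.values()) / len(per_label_f1)
print(round(macro_f1, 4))  # 0.7257
```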

Metrics are recomputed from the actually-quantized ONNX file (the one in this repo), not the unquantized PyTorch checkpoint - so the numbers reflect what production inference will see.

Training

  • Base: cardiffnlp/twitter-roberta-base-sentiment-latest (RoBERTa-base, 124M params)
  • Head: warm-started from the base model's existing 3-class sentiment head (label names + id order match)
  • Loss: Class-weighted cross-entropy with sqrt-inverse-frequency weights and label smoothing 0.1
  • Optimizer: AdamW with layer-wise LR decay (0.9), lr=2e-5, weight_decay=0.01
  • Schedule: Up to 4 epochs with EarlyStoppingCallback(patience=2) on val macro-F1
  • Split: Stratified 80/10/10 train/val/test, seed=42
  • Quantization: int8 dynamic (AVX2 CPU), via optimum.onnxruntime
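The sqrt-inverse-frequency weighting above can be sketched like this (the mean-1 normalization is our assumption; the training script's exact normalization isn't documented here):

```python
import math
from collections import Counter

def sqrt_inv_freq_weights(labels):
    """Per-class weights proportional to 1/sqrt(count), normalized to mean 1."""
    counts = Counter(labels)
    raw = {cls: 1.0 / math.sqrt(n) for cls, n in counts.items()}
    mean = sum(raw.values()) / len(raw)
    return {cls: w / mean for cls, w in raw.items()}

# A class 4x rarer gets a sqrt(4) = 2x larger weight:
labels = ["negative"] * 400 + ["neutral"] * 400 + ["positive"] * 100
print(sqrt_inv_freq_weights(labels))
```

Taking the square root softens the upweighting relative to plain inverse frequency, which helps when minority classes are noisy.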

Limitations

  • Labels reflect English-language Reddit conversation conventions. Sarcasm, in-domain aggression, and parody are inherently ambiguous and contribute most of the model's errors.
  • Out-of-domain performance is unevaluated - run your own holdout before depending on the model in a different community or language.