TextThreat DistilBERT Jigsaw Classifier

This model is part of TextThreat: AI-Powered Detection of Digital Well-Being Risks with Cybersecurity Analytics, an MSc thesis proof-of-concept by Abdul Muksith Rizvi at the University of Doha for Science and Technology.

TextThreat detects digital well-being risk signals in social-media text and exports schema-valid cybersecurity-style events for SIEM analytics, with a Splunk-first hosted demo path.

Model Details

  • Model type: DistilBERT sequence classifier
  • Base model: distilbert-base-uncased
  • Task: Jigsaw Toxic Comment multi-label classification
  • Labels: toxic, severe_toxic, obscene, threat, insult, identity_hate
  • Problem type: multi-label classification
  • Training method: LoRA/PEFT fine-tuning, exported as a merged full Hugging Face model
  • Repository: https://github.com/abdulmuksith3/textthreat-poc
  • Thesis system: TextThreat proof-of-concept for harm detection, SIEM-ready event export, Splunk/OpenSearch analytics, SOAR-lite alerting, latency, calibration, privacy perturbation, and fairness audit artifacts

Intended Use

This model is intended for the TextThreat proof-of-concept pipeline:

social media comment
-> DistilBERT harm scoring
-> TextThreat JSON event schema
-> NDJSON / Splunk HEC export
-> SIEM dashboard and optional SOAR-lite alert

The model can be used to generate per-label harm probabilities for research demonstrations and thesis artifact reproduction.
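
The export stages of the pipeline above can be sketched as follows. This is a minimal illustration, not the repository's implementation: the event field names, the `severity` rule, the 0.5 threshold, the `textthreat:event` sourcetype, and the example scores are all hypothetical, and the real TextThreat JSON event schema lives in the companion repository. The only external convention assumed is that Splunk HEC's event endpoint accepts a JSON envelope with an `event` key and an optional `sourcetype` key.

```python
import json
from datetime import datetime, timezone

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def build_event(text_id, scores, threshold=0.5):
    """Wrap per-label probabilities in a hypothetical event dictionary."""
    flagged = [label for label in LABELS if scores[label] >= threshold]
    # Hypothetical severity rule: any flagged threat is "high".
    if "threat" in flagged:
        severity = "high"
    elif flagged:
        severity = "medium"
    else:
        severity = "low"
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "source": "textthreat-demo",  # hypothetical value
        "text_id": text_id,           # an identifier instead of the raw text
        "scores": {label: round(scores[label], 4) for label in LABELS},
        "flagged_labels": flagged,
        "severity": severity,
    }


def to_ndjson(events):
    """Serialize events as NDJSON: one compact JSON object per line."""
    return "\n".join(json.dumps(e, separators=(",", ":")) for e in events)


def to_hec_payload(event, sourcetype="textthreat:event"):
    """Wrap an event in the JSON envelope used by Splunk HEC's event endpoint."""
    return json.dumps({"event": event, "sourcetype": sourcetype})


# Made-up scores for illustration only; these are not model outputs.
example_scores = {
    "toxic": 0.97, "severe_toxic": 0.12, "obscene": 0.08,
    "threat": 0.91, "insult": 0.20, "identity_hate": 0.01,
}
events = [build_event("comment-001", example_scores)]
ndjson = to_ndjson(events)
print(ndjson)
```

In a real deployment the HEC payload would be sent with an HTTP POST and a collector token; that step is omitted here because the endpoint and token are deployment-specific.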

Out-of-Scope Use

This model is not intended for autonomous moderation, clinical risk assessment, law-enforcement decision-making, employment screening, or other high-stakes decisions. It should not be used as the only basis for action against a person.

Evaluation

The currently uploaded artifact corresponds to the thesis proof-of-concept training run. Its metrics are stored in the repository under experiments/results/distilbert_metrics.json.

Metric                        Value
Micro F1                      0.7363
Macro F1                      0.4206
Macro ROC-AUC                 0.9780
Macro PR-AUC                  0.5324
Expected Calibration Error    0.0032
Eval loss                     0.0471
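
For readers comparing the two F1 rows, the sketch below shows how micro and macro averaging differ on multi-label predictions. Micro F1 pools true positives, false positives, and false negatives across all labels, while macro F1 averages the per-label F1 scores so that a rare label counts as much as a frequent one. The toy arrays are invented for illustration and are unrelated to the thesis data; the thesis evaluation code may compute these metrics differently (for example with scikit-learn).

```python
def f1(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(y_true, y_pred):
    """Return (micro_f1, macro_f1) for binary multi-label indicator rows."""
    n_labels = len(y_true[0])
    counts = []
    for j in range(n_labels):
        pairs = [(t[j], p[j]) for t, p in zip(y_true, y_pred)]
        tp = sum(1 for t, p in pairs if t == 1 and p == 1)
        fp = sum(1 for t, p in pairs if t == 0 and p == 1)
        fn = sum(1 for t, p in pairs if t == 1 and p == 0)
        counts.append((tp, fp, fn))
    # Macro: average the per-label F1 scores.
    macro = sum(f1(*c) for c in counts) / n_labels
    # Micro: pool the counts over all labels, then compute one F1.
    micro = f1(
        sum(c[0] for c in counts),
        sum(c[1] for c in counts),
        sum(c[2] for c in counts),
    )
    return micro, macro


# Toy data: label 0 is frequent and predicted well, label 1 is rare and missed.
y_true = [[1, 0], [1, 0], [1, 0], [0, 1], [0, 0]]
y_pred = [[1, 0], [1, 0], [1, 0], [0, 0], [0, 0]]
micro, macro = micro_macro_f1(y_true, y_pred)
print(f"micro={micro:.4f} macro={macro:.4f}")
```

In this toy case the single missed rare label pulls macro F1 down to 0.5 while micro F1 stays near 0.86, which is the general mechanism by which the two averages can diverge.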

Limitations

  • The model is trained for thesis proof-of-concept evidence, not as a production moderation service.
  • Toxicity labels come from the Jigsaw Toxic Comment task and may not cover all digital well-being risks.
  • Short, adversarial, sarcastic, reclaimed, or context-dependent text can be misclassified.
  • The live TextThreat demo applies a transparent lexical safety overlay for explicit threats, self-harm terms, and profanity, so that obvious safety-critical cases in the demo are not missed by the quickly trained model.
  • Fairness, calibration, and privacy experiments are represented in the companion repository and should be reviewed before deployment-style use.
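
A lexical overlay of the kind mentioned in the limitations above might look like the following sketch. It is an assumption about the general technique, not the demo's actual rules: the patterns, the `self_harm` key (which is not one of the six Jigsaw labels), and the 0.9 score floor are all hypothetical. The idea is that a matching pattern raises the corresponding score to at least the floor and records which rule fired, so the adjustment stays auditable.

```python
import re

# Hypothetical rules: label -> regular expressions matched on lowercased text.
OVERLAY_RULES = {
    "threat": [r"\bkill you\b", r"\bhurt you\b"],
    "self_harm": [r"\bkill myself\b", r"\bend my life\b"],  # not a Jigsaw label
}


def apply_overlay(text, scores, floor=0.9):
    """Raise scores to at least `floor` for labels whose patterns match.

    Returns the adjusted scores and the list of (label, pattern) rules that
    fired; the input dictionary is left unmodified.
    """
    adjusted = dict(scores)
    matched = []
    lowered = text.lower()
    for label, patterns in OVERLAY_RULES.items():
        for pattern in patterns:
            if re.search(pattern, lowered):
                adjusted[label] = max(adjusted.get(label, 0.0), floor)
                matched.append((label, pattern))
                break  # one matching rule per label is enough
    return adjusted, matched


# Made-up model scores for illustration only.
model_scores = {"toxic": 0.80, "threat": 0.30}
adjusted, matched = apply_overlay("I will kill you", model_scores)
print(adjusted, matched)
```

Keeping the matched rules alongside the adjusted scores lets a downstream event distinguish model-derived scores from rule-derived ones.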

Example

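import inspect

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "abdulmuksith/textthreat-distilbert-jigsaw"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()  # from_pretrained already returns eval mode; kept explicit

text = "I will kill you"
inputs = tokenizer(text, return_tensors="pt", truncation=True)
# Keep only the tensors that forward() accepts (e.g. drop token_type_ids if present).
# inspect.signature also works when forward() is wrapped by a decorator.
accepted = inspect.signature(model.forward).parameters
inputs = {k: v for k, v in inputs.items() if k in accepted}

with torch.no_grad():
    logits = model(**inputs).logits[0]

# Multi-label head: apply an independent sigmoid per label, not a softmax.
scores = torch.sigmoid(logits)
print({model.config.id2label[i]: float(scores[i]) for i in range(len(scores))})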

Citation

If referencing this model, cite the thesis project:

Rizvi, A. M. (2026). TextThreat: AI-Powered Detection of Digital Well-Being Risks with Cybersecurity Analytics. MSc thesis, University of Doha for Science and Technology.