X-Box Compulsion & Toxicity Index Classifier
Bayesian temporal phenotyping + 12-head text classification pipeline for detecting compulsive social media usage patterns and computing the Toxicity Index (TI) for political Twitter/X accounts.
Architecture
Temporal Model: Calibrated logistic regression on 5 compulsion signatures (burstiness, time-of-day entropy, Hawkes self-excitation, night intensity, weekend ratio).
Text Classification: 12 heads producing the per-tweet Toxicity Index.
Toxicity Index: TI = mean of 8 binary negative-behavior flags per tweet, bounded [0,1]. TI=0 means a clean informational tweet; TI=1 means every negative flag is active.
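Four of the five temporal signatures named above can be computed from post timestamps alone. The following is a minimal sketch, not the pipeline's actual code: the function names are hypothetical, timestamps are assumed to be timezone-aware UTC `datetime` objects, and the weekend ratio assumes the archive spans whole weeks. The Hawkes branching ratio is omitted because it requires fitting a point-process model.

```python
import math
from collections import Counter
from datetime import datetime, timezone
from statistics import mean, pstdev


def burstiness(timestamps):
    """Goh-Barabasi B = (sigma - mu) / (sigma + mu) over inter-event times.

    B is -1 for perfectly regular posting and approaches +1 for very bursty posting.
    Needs at least two distinct timestamps, otherwise sigma + mu is zero.
    """
    ts = sorted(timestamps)
    gaps = [(b - a).total_seconds() for a, b in zip(ts, ts[1:])]
    mu, sigma = mean(gaps), pstdev(gaps)
    return (sigma - mu) / (sigma + mu)


def hour_entropy_bits(timestamps):
    """Shannon entropy (bits) of the hour-of-day posting distribution."""
    counts = Counter(t.hour for t in timestamps)
    n = len(timestamps)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def night_share(timestamps):
    """Fraction of posts whose UTC hour falls in 00:00-05:59."""
    return sum(t.hour < 6 for t in timestamps) / len(timestamps)


def weekend_ratio(timestamps):
    """Per-day posting rate on Sat/Sun divided by the per-day rate on Mon-Fri.

    Assumption: the archive covers whole weeks, so counts are normalised by
    2 weekend days and 5 weekdays. Raises ZeroDivisionError with no weekday posts.
    """
    weekend = sum(t.weekday() >= 5 for t in timestamps)
    weekday = len(timestamps) - weekend
    return (weekend / 2) / (weekday / 5)


# Illustrative timestamps only, not data from the cohort.
posts = [
    datetime(2025, 3, 1, 2, 15, tzinfo=timezone.utc),  # Saturday, night
    datetime(2025, 3, 1, 2, 20, tzinfo=timezone.utc),  # Saturday, night
    datetime(2025, 3, 3, 14, 0, tzinfo=timezone.utc),  # Monday
    datetime(2025, 3, 5, 9, 30, tzinfo=timezone.utc),  # Wednesday
]
print(burstiness(posts), hour_entropy_bits(posts), night_share(posts), weekend_ratio(posts))
```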
Validation
Compulsion Model (n=32, independent ground truth):
- Spearman r = 0.912 (permutation p=0.001, bootstrap 95% CI [0.845, 0.965])
- AUC = 0.933 (permutation p=0.003, bootstrap 95% CI [0.928, 1.000])
- Repeated 5-fold (x20): AUC = 0.953 +/- 0.076
- Brier score: 0.101
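For readers unfamiliar with the two headline metrics, here is a small sketch of how a Brier score and an AUC are computed from binary ground truth and predicted probabilities. The helper names and the toy inputs are illustrative assumptions; this is not the validation script that produced the numbers above.

```python
def brier_score(y_true, y_prob):
    """Mean squared error between predicted probabilities and 0/1 outcomes (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def roc_auc(y_true, y_score):
    """Probability that a random positive outranks a random negative; ties count as 0.5."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels and scores, purely illustrative.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(brier_score(labels, scores), roc_auc(labels, scores))
```

The pairwise AUC above is quadratic in the number of accounts, which is fine for a cohort of 32 but would be replaced by a rank-based formula on larger samples.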
Text Classification Label Reliability (test-retest, n=75):
- Ragebait: Pearson r=0.889, Cohen kappa=0.479
- Tribal signal: Pearson r=0.862, Cohen kappa=0.730
- Performative outrage: Pearson r=0.777, Cohen kappa=0.525
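Cohen's kappa corrects raw agreement between two labeling runs for the agreement expected by chance. The sketch below shows the calculation for two runs of binary labels; the function name and the example labels are assumptions for illustration, not the project's reliability data.

```python
def cohen_kappa(run_a, run_b):
    """Cohen's kappa: (observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(run_a)
    labels = set(run_a) | set(run_b)
    p_observed = sum(a == b for a, b in zip(run_a, run_b)) / n
    # Chance agreement from each run's marginal label frequencies.
    p_chance = sum((run_a.count(l) / n) * (run_b.count(l) / n) for l in labels)
    return (p_observed - p_chance) / (1 - p_chance)


# Two hypothetical labeling passes over the same ten tweets.
first_pass = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
second_pass = [1, 0, 0, 0, 1, 0, 1, 1, 0, 0]
print(cohen_kappa(first_pass, second_pass))
```

In this toy case the two passes agree on 8 of 10 tweets, yet kappa is noticeably lower than 0.8 because a good share of that agreement is expected by chance.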
Per-Class Performance (12 Classification Heads)
Off-the-Shelf (CardiffNLP Twitter-RoBERTa, ~125M params each)
| Head | Model ID | Classes | Training Data |
|---|---|---|---|
| Sentiment | cardiffnlp/twitter-roberta-base-sentiment-latest | negative, neutral, positive | TweetEval benchmark |
| Emotion | cardiffnlp/twitter-roberta-base-emotion | anger, joy, optimism, sadness | TweetEval |
| Offensive | cardiffnlp/twitter-roberta-base-offensive | not-offensive, offensive | TweetEval |
| Irony | cardiffnlp/twitter-roberta-base-irony | non-irony, irony | TweetEval |
| Hate | cardiffnlp/twitter-roberta-base-hate-multiclass-latest | not-hate, + 6 subtypes | 13 hate-speech datasets |
| Toxicity | s-nlp/roberta_toxicity_classifier | neutral, toxic | 3 Jigsaw competitions (AUC 0.98) |
CardiffNLP models are pre-trained on 124M tweets. See the TweetEval benchmark (Barbieri et al., 2020) for per-class F1/P/R on the standard evaluation sets.
Custom-Trained (SetFit, all-mpnet-base-v2 backbone, ~109M params each)
Trained on 4,121 LLM-labeled tweets from 14 accounts (7 Democrat, 7 Republican). Evaluated on a 20% held-out test set.
| Head | F1 | Precision | Recall | Training Examples | Description |
|---|---|---|---|---|---|
| Ragebait | 0.800 | 0.82 | 0.78 | 300 (150+150) | Content designed to provoke outrage |
| Tribal signal | 0.825 | 0.84 | 0.81 | 400 (200+200) | Us-vs-them, in-group/out-group framing |
| Performative outrage | 0.850 | 0.87 | 0.83 | 400 (200+200) | Theatrical outrage vs genuine concern |
| Epistemic manipulation | 0.800 | 0.81 | 0.79 | 300 (150+150) | Cherry-picking, straw-manning, false equiv. |
| Engagement bait | 0.800 | 0.83 | 0.77 | 400 (200+200) | Polls, CTAs, rhetorical questions |
| Agency language | 0.838 | 0.85 | 0.83 | 400 (200+200) | Active/agentic (1) vs passive/victimhood (0) |
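As a reading aid for the table, F1 is the harmonic mean of precision and recall. The snippet below is a small sketch with a hypothetical helper name; it recomputes F1 from the rounded precision and recall shown above, so the result can differ from the reported F1 in the third decimal.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the table above.
reported = {
    "Ragebait": (0.82, 0.78),
    "Tribal signal": (0.84, 0.81),
    "Performative outrage": (0.87, 0.83),
}
for head, (p, r) in reported.items():
    print(f"{head}: F1 = {f1_score(p, r):.3f}")
```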
Toxicity Index Components
The per-tweet Toxicity Index is computed as:
    TI = mean(flag_offensive, flag_toxic, flag_negative_sentiment,
              flag_anger, flag_irony, flag_ragebait, flag_tribal,
              flag_performative)
Each flag is binary (0 or 1), set by thresholding the output of the corresponding classifier. The account-level score, TI_senator, is the mean of the per-tweet TI across all tweets in the account's archive.
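The aggregation can be sketched in a few lines. This assumes the eight flags have already been binarized upstream; the dictionary keys and function names are hypothetical and the classifier thresholds are not specified here.

```python
from statistics import mean

# The eight negative-behavior flags that enter the Toxicity Index.
FLAGS = (
    "offensive", "toxic", "negative_sentiment", "anger",
    "irony", "ragebait", "tribal", "performative",
)


def tweet_ti(flags):
    """Per-tweet TI: mean of the eight binary flags, so the result lies in [0, 1]."""
    return mean(int(bool(flags[name])) for name in FLAGS)


def account_ti(tweets):
    """Account-level TI: mean of the per-tweet TI over the whole archive."""
    return mean(tweet_ti(flags) for flags in tweets)


# Hypothetical flag vectors for two tweets.
clean = dict.fromkeys(FLAGS, 0)
heated = {**clean, "anger": 1, "tribal": 1}
print(tweet_ti(clean), tweet_ti(heated), account_ti([clean, heated]))
```

Because every flag carries equal weight, a tweet with two active flags scores 2/8 regardless of which two they are.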
Compulsion Signature Features
| Feature | Coefficient | Description |
|---|---|---|
| Time-of-day entropy | +1.258 | Shannon entropy of hourly posting distribution (bits) |
| Hawkes n* | +0.922 | Self-excitation branching ratio |
| Burstiness B | +0.837 | Goh-Barabasi inter-event time parameter |
| Night intensity | +0.584 | Share of posts 00:00-05:59 UTC |
| Weekend ratio | +0.204 | Weekend/weekday posting rate ratio |
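To show how these coefficients combine into a probability, here is a hedged sketch of the logistic scoring step. The intercept is not listed in this README, so the default of 0.0 is a placeholder assumption; the feature scaling (for example z-scoring) and the calibration step are likewise unspecified and omitted. The feature keys and function name are hypothetical.

```python
import math

# Coefficients copied from the table above.
COEFFICIENTS = {
    "tod_entropy": 1.258,
    "hawkes_n": 0.922,
    "burstiness": 0.837,
    "night_intensity": 0.584,
    "weekend_ratio": 0.204,
}


def compulsion_probability(features, intercept=0.0):
    """Logistic model: sigmoid(intercept + sum of coefficient * feature).

    `intercept` is a placeholder; the fitted value is not given in this README.
    `features` is assumed to be on the same scale used when the model was fitted.
    """
    z = intercept + sum(COEFFICIENTS[k] * features[k] for k in COEFFICIENTS)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical feature vector, all features at their reference value of zero.
baseline = dict.fromkeys(COEFFICIENTS, 0.0)
print(compulsion_probability(baseline))
```

All five coefficients are positive, so raising any single signature while holding the others fixed raises the predicted probability.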
Theoretical Framework
Inspired by Recovery Viability Theory (Kepner, White, & O'Neill, 2026):
- Logit-bounded state space for natural [0,1] constraints
- Cusp catastrophe dynamics for sudden behavioral transitions
- Critical slowing down as early warning signals
Files
- bayesian_model_results.json - Fitted model parameters
- calibrated_model_v2.json - V2 validation with independent ground truth
- cohort_v2_results.csv - 32-account ground truth cohort
- cohort_signatures.csv - Ground truth compulsion signatures
- setfit_*/ - Trained SetFit classifier checkpoints (6 models)
- xbox/ - Pipeline source code
Citation
O'Neill, J., Cabanillas, J., Brooks, J., et al. (2026). Detecting Compulsive Social Media Usage Patterns in US Congressional Accounts: A Bayesian Temporal Phenotyping Approach. Manuscript in preparation for International Journal of Drug Policy.
Ethics
This methodology cannot and should not be used for clinical diagnosis. The Toxicity Index and compulsion probability are research instruments, not clinical assessments.