jimnoneill/xbox-compulsion-classifier

@@ -6,43 +6,115 @@ tags:
 - political-tweets
 - bayesian-classifier
 - digital-phenotyping
 pipeline_tag: text-classification
 ---
-# X-Box Compulsion Classifier
-Bayesian classifier for detecting compulsive social media usage patterns
-in political Twitter/X accounts.
 ## Architecture
-- **12 classification heads**: 6 CardiffNLP (sentiment, emotion, offensive, irony, hate, toxicity) + 6 custom SetFit (ragebait, tribal signal, performative outrage, epistemic manipulation, engagement bait, agency language)
-- **Compulsion signatures**: Burstiness (Goh-Barabasi), time-of-day entropy, Hawkes self-excitation, night intensity, weekend ratio
-- **Bayesian posterior**: Calibrated P(compulsive | features) with 95% credible intervals
-- **Disorder baseline**: DSM-5-adjacent criteria mapping with clinical thresholds
 ## Validation
-- LOO cross-validation: F1=1.000, AUC=1.000 on 16-account ground truth cohort
-- Ground truth: 8 known-compulsive accounts (Trump Android, Mike Lee, Cruz, Hawley, Blackburn, Rubio, Murphy) + 8 known-strategic accounts (Feinstein, Risch, Tester, etc.)
-## Feature Importance
-| Feature | Mean |LLR| |
-|---------|---------|
-| Night intensity (00-05 UTC) | 28.1 |
-| Time-of-day entropy | 8.0 |
-| Burstiness B parameter | 4.8 |
-| Hawkes self-excitation n* | 4.6 |
-| Weekend ratio | 0.05 |
 ## Theoretical Framework
-Inspired by Recovery Viability Theory (Kepner, White, O'Neill):
-- Logit-bounded state space
 - Cusp catastrophe dynamics for sudden behavioral transitions
 - Critical slowing down as early warning signals
 ## Citation
-Research by O'Neill Lab. Not for clinical diagnosis.

 - political-tweets
 - bayesian-classifier
 - digital-phenotyping
+- toxicity-index
 pipeline_tag: text-classification
 ---
+# X-Box Compulsion & Toxicity Index Classifier
+Bayesian temporal phenotyping + 12-head text classification pipeline for detecting
+compulsive social media usage patterns and computing the Toxicity Index (TI) for
+political Twitter/X accounts.
 ## Architecture
+**Temporal Model**: Calibrated logistic regression on 5 compulsion signatures
+(burstiness, time-of-day entropy, Hawkes self-excitation, night intensity, weekend ratio).
+**Text Classification**: 12 heads, 8 of which supply the binary flags behind the per-tweet Toxicity Index.
+**Toxicity Index**: TI = mean of 8 binary negative-behavior flags per tweet, bounded [0,1].
+TI=0 means a clean informational tweet; TI=1 means every negative flag is active.
 ## Validation
+**Compulsion Model** (n=32, independent ground truth):
+- Spearman r = 0.912 (permutation p=0.001, bootstrap 95% CI [0.845, 0.965])
+- AUC = 0.933 (permutation p=0.003, bootstrap 95% CI [0.928, 1.000])
+- Repeated 5-fold cross-validation (20 repeats): AUC = 0.953 +/- 0.076
+- Brier score: 0.101
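For readers unfamiliar with the two headline metrics above, the following is a minimal sketch of how a Brier score and a rank-based AUC can be computed from binary labels and predicted probabilities. The function names and the toy data are illustrative assumptions; this is not the repository's evaluation code, and the permutation and bootstrap procedures are not reproduced here.

```python
def brier_score(y_true, p_pred):
    """Mean squared gap between predicted probability and the 0/1 outcome."""
    if len(y_true) != len(p_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """Chance that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical, not drawn from the 32-account cohort).
labels = [1, 1, 1, 0, 0, 0]
probs = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]

print(f"Brier = {brier_score(labels, probs):.3f}, AUC = {roc_auc(labels, probs):.3f}")
```

A lower Brier score indicates better-calibrated probabilities, while AUC only measures ranking, which is why the two are reported together.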
+**Text Classification Label Reliability** (test-retest, n=75):
+- Ragebait: Pearson r=0.889, Cohen's kappa=0.479
+- Tribal signal: Pearson r=0.862, Cohen's kappa=0.730
+- Performative outrage: Pearson r=0.777, Cohen's kappa=0.525
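Cohen's kappa corrects raw agreement between two labeling passes for the agreement expected by chance. The sketch below shows the standard calculation on categorical labels; the function name and the two toy label lists are assumptions for illustration and do not come from the n=75 test-retest sample.

```python
def cohens_kappa(first_pass, second_pass):
    """Chance-corrected agreement between two label sequences."""
    n = len(first_pass)
    if n == 0 or n != len(second_pass):
        raise ValueError("inputs must be non-empty and of equal length")
    categories = set(first_pass) | set(second_pass)
    # Observed agreement: fraction of items labeled identically in both passes.
    p_observed = sum(a == b for a, b in zip(first_pass, second_pass)) / n
    # Expected agreement if the two passes were independent.
    p_expected = sum(
        (first_pass.count(c) / n) * (second_pass.count(c) / n) for c in categories
    )
    if p_expected == 1.0:
        return 1.0  # both passes used a single identical label throughout
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical binary labels from two labeling runs over the same 8 tweets.
run_a = [1, 1, 0, 0, 1, 0, 1, 0]
run_b = [1, 0, 0, 0, 1, 0, 1, 1]
kappa = cohens_kappa(run_a, run_b)
```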
+## Per-Class Performance (12 Classification Heads)
+### Off-the-Shelf (5 CardiffNLP Twitter-RoBERTa heads + 1 s-nlp RoBERTa head, ~125M params each)
+| Head | Model ID | Classes | Training Data |
+|------|----------|---------|--------------|
+| Sentiment | cardiffnlp/twitter-roberta-base-sentiment-latest | negative, neutral, positive | TweetEval benchmark |
+| Emotion | cardiffnlp/twitter-roberta-base-emotion | anger, joy, optimism, sadness | TweetEval |
+| Offensive | cardiffnlp/twitter-roberta-base-offensive | not-offensive, offensive | TweetEval |
+| Irony | cardiffnlp/twitter-roberta-base-irony | non-irony, irony | TweetEval |
+| Hate | cardiffnlp/twitter-roberta-base-hate-multiclass-latest | not-hate + 6 subtypes | 13 hate-speech datasets |
+| Toxicity | s-nlp/roberta_toxicity_classifier | neutral, toxic | 3 Jigsaw competitions (AUC 0.98) |
+CardiffNLP models are pre-trained on 124M tweets. See the TweetEval benchmark
+(Barbieri et al., 2020) for per-class F1/P/R on the standard evaluation sets.
+### Custom-Trained (SetFit, all-mpnet-base-v2 backbone, ~109M params each)
+Trained on 4,121 LLM-labeled tweets from 14 accounts (7 Democrat, 7 Republican).
+Evaluated on a 20% held-out test set.
+| Head | F1 | Precision | Recall | Training Examples | Description |
+|------|----|-----------|--------|-------------------|-------------|
+| Ragebait | 0.800 | 0.82 | 0.78 | 300 (150+150) | Content designed to provoke outrage |
+| Tribal signal | 0.825 | 0.84 | 0.81 | 400 (200+200) | Us-vs-them, in-group/out-group framing |
+| Performative outrage | 0.850 | 0.87 | 0.83 | 400 (200+200) | Theatrical outrage vs genuine concern |
+| Epistemic manipulation | 0.800 | 0.81 | 0.79 | 300 (150+150) | Cherry-picking, straw-manning, false equivalence |
+| Engagement bait | 0.800 | 0.83 | 0.77 | 400 (200+200) | Polls, CTAs, rhetorical questions |
+| Agency language | 0.838 | 0.85 | 0.83 | 400 (200+200) | Active/agentic (1) vs passive/victimhood (0) |
+### Toxicity Index Components
+The per-tweet Toxicity Index is computed as:
+```
+TI = mean(flag_offensive, flag_toxic, flag_negative_sentiment,
+          flag_anger, flag_irony, flag_ragebait, flag_tribal,
+          flag_performative)
+```
+where each flag is binary (0 or 1), set by applying the corresponding classifier's threshold.
+The account-level score, TI_senator, is the mean of per-tweet TI across all tweets in that account's archive.
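The Toxicity Index arithmetic can be sketched in a few lines of Python. The flag names follow the formula above; the `binarize` helper, its 0.5 default threshold, and the example tweets are assumptions for illustration, since the per-classifier thresholds are not stated here.

```python
from statistics import mean

FLAG_NAMES = (
    "flag_offensive", "flag_toxic", "flag_negative_sentiment", "flag_anger",
    "flag_irony", "flag_ragebait", "flag_tribal", "flag_performative",
)


def binarize(scores, threshold=0.5):
    """Turn classifier scores into 0/1 flags (threshold value is an assumption)."""
    return {name: int(scores[name] >= threshold) for name in FLAG_NAMES}


def tweet_ti(flags):
    """Per-tweet TI: mean of the 8 binary flags, so the result lies in [0, 1]."""
    return mean(flags[name] for name in FLAG_NAMES)


def account_ti(tweets):
    """Account-level TI: mean of per-tweet TI over the whole archive."""
    return mean(tweet_ti(flags) for flags in tweets)


# Hypothetical archive of two tweets: one clean, one with 4 of 8 flags active.
clean = {name: 0 for name in FLAG_NAMES}
heated = {name: int(i < 4) for i, name in enumerate(FLAG_NAMES)}
archive_score = account_ti([clean, heated])
```

Because every flag carries equal weight, a tweet with any four active flags scores 0.5 regardless of which four they are.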
+## Compulsion Signature Features
+| Feature | Coefficient | Description |
+|---------|------------|-------------|
+| Time-of-day entropy | +1.258 | Shannon entropy of hourly posting distribution (bits) |
+| Hawkes n* | +0.922 | Self-excitation branching ratio |
+| Burstiness B | +0.837 | Goh-Barabasi inter-event time parameter |
+| Night intensity | +0.584 | Share of posts 00:00-05:59 UTC |
+| Weekend ratio | +0.204 | Weekend/weekday posting rate ratio |
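As a rough illustration of how three of these signatures can be derived from post timestamps and combined with the coefficients in the table, consider the sketch below. It assumes the standard Goh-Barabasi definition B = (sigma - mu) / (sigma + mu) over inter-event times, treats inputs to the logistic step as already standardized, and uses a placeholder intercept of 0.0 because the fitted intercept and scaling are not given here. Hawkes n* and weekend ratio are taken as precomputed inputs. All function names are hypothetical.

```python
import math
from collections import Counter
from datetime import datetime, timezone
from statistics import mean, pstdev

# Coefficients copied from the table above; keys are illustrative names.
COEFFICIENTS = {
    "tod_entropy": 1.258,
    "hawkes_n": 0.922,
    "burstiness": 0.837,
    "night_intensity": 0.584,
    "weekend_ratio": 0.204,
}


def tod_entropy_bits(times):
    """Shannon entropy (bits) of the hourly posting distribution."""
    counts = Counter(t.hour for t in times)
    n = len(times)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def burstiness(times):
    """Goh-Barabasi B: -1 for perfectly regular posting, toward +1 for bursty."""
    ordered = sorted(times)
    gaps = [(b - a).total_seconds() for a, b in zip(ordered, ordered[1:])]
    if len(gaps) < 2:
        raise ValueError("need at least three timestamps")
    mu, sigma = mean(gaps), pstdev(gaps)
    return (sigma - mu) / (sigma + mu) if (sigma + mu) > 0 else 0.0


def night_intensity(times):
    """Share of posts made between 00:00 and 05:59 UTC."""
    return sum(t.hour < 6 for t in times) / len(times)


def compulsion_probability(features, intercept=0.0):
    """Logistic link over the five signatures (intercept is a placeholder)."""
    logit = intercept + sum(w * features[k] for k, w in COEFFICIENTS.items())
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical account posting exactly once an hour from 00:00 to 03:00 UTC.
posts = [datetime(2024, 1, 1, h, 0, tzinfo=timezone.utc) for h in range(4)]
```

With these evenly spaced posts, burstiness is -1 (no variation in gaps), which matches the intuition that a positive coefficient on B rewards irregular, clustered posting.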
 ## Theoretical Framework
+Inspired by Recovery Viability Theory (Kepner, White, & O'Neill, 2026):
+- Logit-bounded state space for natural [0,1] constraints
 - Cusp catastrophe dynamics for sudden behavioral transitions
 - Critical slowing down as early warning signals
+## Files
+- `bayesian_model_results.json` - Fitted model parameters
+- `calibrated_model_v2.json` - V2 validation with independent ground truth
+- `cohort_v2_results.csv` - 32-account ground truth cohort
+- `cohort_signatures.csv` - Ground truth compulsion signatures
+- `setfit_*/` - Trained SetFit classifier checkpoints (6 models)
+- `xbox/` - Pipeline source code
 ## Citation
+O'Neill, J., Brookes, J., et al. (2026). Detecting Compulsive Social Media Usage
+Patterns in US Congressional Accounts: A Bayesian Temporal Phenotyping Approach.
+Manuscript in preparation for International Journal of Drug Policy.
+## Ethics
+This methodology cannot and should not be used for clinical diagnosis.
+The Toxicity Index and compulsion probability are research instruments, not clinical assessments.