same_graph_test_bothmasked.json & same_graph_train_bothmasked.json — Full combined analysis
The dataset is class-imbalanced at roughly 1:2 (IP:OP; 2,609 vs 5,019 notes). The ratio is similar in the train and test splits (33.6% vs 35.3% IP), which is consistent with a stratified partition.
| Metric | IP (n=2,609) | OP (n=5,019) |
|---|---|---|
| Mean words | 438.1 | 330.3 |
| Median words | 355 | 292 |
| Std words | 318.0 | 203.3 |
| Mean sentences | 58.5 | 43.0 |
| Mean chars | 3,053 | 2,281 |
| Max words (note) | 2,609 | 2,501 |
| Avg severity score | 8.03 | 6.27 |
| Score ≥ 10 (%) | 33.4% | 19.1% |
| Score ≥ 15 (%) | 12.0% | 4.4% |
| Avg Treatment_decision tokens | 5.42 | 3.16 |
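
The class counts in the table header (n=2,609 IP, n=5,019 OP) can be turned into loss weights if the imbalance needs to be compensated during training. The snippet below is a minimal sketch using the common inverse-frequency ("balanced") convention; the weighting scheme is an assumption for illustration, not something the analysis prescribes.

```python
# Class counts taken from the IP/OP table header.
counts = {"IP": 2609, "OP": 5019}
total = sum(counts.values())

# Inverse-frequency ("balanced") weights: total / (n_classes * class_count).
# The minority class (IP) receives the larger weight.
weights = {label: total / (len(counts) * n) for label, n in counts.items()}

for label, n in counts.items():
    print(f"{label}: share={n / total:.3f}, weight={weights[label]:.3f}")
```
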
| Metric | Train | Test |
|---|---|---|
| Nodes | 4,988 | 2,640 |
| Edges | 120,492 | 69,600 |
| IP % | 33.6% | 35.3% |
| OP % | 66.4% | 64.7% |
| Mean words | 338.5 | 421.3 |
| Median words | 291 | 348 |
| Mean sentences | 44.1 | 56.2 |
| Mean chars | 2,325 | 2,961 |
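Test set notes are notably longer on average (421 vs 339 words, about 24% more), even though the class balance is nearly identical in both splits. This may reflect more complex multi-visit collations, and the length shift between train and test could affect model behaviour at inference time.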
IP notes are consistently longer, denser, and higher-severity than OP notes. This is clinically expected: inpatient cases involve more events (admission, detox, stabilisation), multiple clinicians, and higher acuity — all reflected in longer collated notes.
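
To put a number on "longer", the word-count gap can be expressed as a standardised effect size computed from the summary statistics in the IP/OP table. This is a rough sketch only: word counts are right-skewed (the means exceed the medians), so Cohen's d and Welch's t should be read as approximate here.

```python
import math

# Word-count summary statistics from the IP/OP table.
n_ip, mean_ip, sd_ip = 2609, 438.1, 318.0
n_op, mean_op, sd_op = 5019, 330.3, 203.3

# Cohen's d using the pooled standard deviation.
pooled_var = ((n_ip - 1) * sd_ip**2 + (n_op - 1) * sd_op**2) / (n_ip + n_op - 2)
cohens_d = (mean_ip - mean_op) / math.sqrt(pooled_var)

# Welch's t statistic, which does not assume equal variances.
welch_t = (mean_ip - mean_op) / math.sqrt(sd_ip**2 / n_ip + sd_op**2 / n_op)

print(f"Cohen's d = {cohens_d:.2f}, Welch t = {welch_t:.1f}")
```

With these inputs d comes out at roughly 0.43, a small-to-moderate effect: the length difference is real, but the two distributions still overlap substantially.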
Ratio > 1.0 means more prevalent in IP. Features sorted by ratio descending.
Paranoia (1.78×), memory issues (1.67×), hallucinations (1.57×), nausea/vomiting (1.44×), and irritability (1.30×) are the strongest IP-associated symptom patterns. These reflect acute psychiatric and neurological complications requiring inpatient management.
An estimated 10–13% of notes may constitute "borderline" cases — IP notes with minimal clinical documentation or OP notes with complex, high-severity presentations. These represent real-world label ambiguity and will be the hardest cases for any classifier.
At severity score = 8 (IP median), both classes overlap heavily:
| Score threshold | IP above (%) | OP above (%) |
|---|---|---|
| ≥ 5 | 71.9% | 61.3% |
| ≥ 10 | 33.4% | 19.1% |
| ≥ 15 | 12.0% | 4.4% |
| ≥ 20 | 3.6% | 0.7% |
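
The overlap can be made concrete by treating each row of the table as a one-feature classifier ("predict IP if the score is at or above the threshold") and deriving its accuracy from the class sizes. This is a back-of-the-envelope sketch based on the rounded percentages above, not a measured result.

```python
n_ip, n_op = 2609, 5019

# Share of each class at or above each severity threshold (from the table).
above = {5: (0.719, 0.613), 10: (0.334, 0.191), 15: (0.120, 0.044), 20: (0.036, 0.007)}

def threshold_metrics(ip_above, op_above):
    """Accuracy, precision and recall of 'predict IP if score >= threshold'."""
    tp = ip_above * n_ip   # IP notes correctly flagged
    fp = op_above * n_op   # OP notes wrongly flagged
    tn = n_op - fp         # OP notes correctly left as OP
    accuracy = (tp + tn) / (n_ip + n_op)
    precision = tp / (tp + fp)
    return accuracy, precision, ip_above

results = {t: threshold_metrics(*rates) for t, rates in above.items()}
majority_baseline = n_op / (n_ip + n_op)  # always predict OP

for t, (acc, prec, rec) in results.items():
    print(f">= {t}: accuracy={acc:.3f}, precision={prec:.3f}, recall={rec:.3f}")
print(f"majority baseline: {majority_baseline:.3f}")
```

On these figures the best of the four thresholds lands within roughly two percentage points of the majority baseline, which illustrates how little the severity score separates the classes on its own.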
| Substance | All | IP% | OP% | IP ratio |
|---|---|---|---|
| Alcohol | 7,581 | 99.0% | 99.6% | 0.99× |
| Tobacco/nicotine | 6,467 | 83.7% | 85.3% | 0.98× |
| Benzodiazepines | 824 | 13.9% | 9.2% | 1.51× |
| Cannabis | 623 | 13.3% | 5.5% | 2.42× |
| Stimulants | 290 | 6.1% | 2.6% | 2.35× |
| Opioids | 205 | 5.9% | 1.0% | 5.90× |
| Sedatives | 89 | 2.1% | 0.7% | 3.00× |
| Inhalants | 54 | 1.3% | 0.4% | 3.25× |
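Opioid mention shows the largest IP enrichment of any single substance (5.9× more common in IP notes). Inhalants (3.3×), sedatives (3.0×), cannabis (2.4×), and stimulants (2.4×) are also clearly IP-enriched, and benzodiazepines moderately so (1.5×), although the inhalant and sedative ratios rest on small counts (54 and 89 notes). Alcohol and tobacco appear in the large majority of notes in both classes and therefore carry almost no discriminative signal.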
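
The "IP ratio" column is simply the IP prevalence divided by the OP prevalence. A short sketch that recomputes and ranks the ratios from the table values follows; because the inputs are rounded percentages, the last digit can differ slightly from ratios computed on raw counts.

```python
# (IP %, OP %) per substance, copied from the table above.
prevalence = {
    "Alcohol": (99.0, 99.6),
    "Tobacco/nicotine": (83.7, 85.3),
    "Benzodiazepines": (13.9, 9.2),
    "Cannabis": (13.3, 5.5),
    "Stimulants": (6.1, 2.6),
    "Opioids": (5.9, 1.0),
    "Sedatives": (2.1, 0.7),
    "Inhalants": (1.3, 0.4),
}

# Prevalence ratio: values above 1.0 mean the substance is more common in IP notes.
ratios = {name: ip / op for name, (ip, op) in prevalence.items()}
ranked = sorted(ratios.items(), key=lambda kv: kv[1], reverse=True)

for name, ratio in ranked:
    print(f"{name:<18} {ratio:.2f}x")
```
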
| Metric | IP (n=495) | OP (n=842) |
|---|---|---|
| Mean duration | 93.2 months | 126.2 months |
| Median duration | 48 months (4y) | 96 months (8y) |
OP patients show longer documented durations of use (median 8y vs 4y for IP). This likely reflects that OP notes accumulate more longitudinal history, while IP notes focus on acute presentation. Duration alone is not a reliable IP predictor.
| Pair | Count |
|---|---|
| Alcohol + Tobacco | 6,462 |
| Alcohol + Benzodiazepines | 819 |
| Benzodiazepines + Tobacco | 716 |
| Alcohol + Cannabis | 622 |
| Cannabis + Tobacco | 605 |
| Alcohol + Stimulants | 289 |
| Stimulants + Tobacco | 262 |
| Alcohol + Opioids | 204 |
| Opioids + Tobacco | 192 |
| Cannabis + Stimulants | 136 |
IP notes show a markedly higher rate of co-use of four or more substances: 8.5% of IP notes versus 2.7% of OP notes mention 4+ substances, roughly a threefold difference. This polysubstance pattern is a strong indicator of case complexity and inpatient admission.
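
Pair counts and the 4+ substance share of the kind reported above can be derived from per-note substance sets. The sketch below runs on a hypothetical toy input (the `notes` list is invented for illustration and is not taken from the dataset).

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy input: one set of mentioned substances per note.
notes = [
    {"Alcohol", "Tobacco"},
    {"Alcohol", "Tobacco", "Cannabis", "Opioids"},
    {"Alcohol", "Benzodiazepines"},
    {"Alcohol", "Tobacco", "Benzodiazepines"},
]

pair_counts = Counter()
for substances in notes:
    # Sort first so each unordered pair is always stored under the same key.
    for pair in combinations(sorted(substances), 2):
        pair_counts[pair] += 1

# Share of notes mentioning four or more distinct substances.
poly_share = sum(len(s) >= 4 for s in notes) / len(notes)

print(pair_counts.most_common(3))
print(f"4+ substances: {poly_share:.1%}")
```
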
| Symptom pair | Count |
|---|---|
| Craving + Withdrawal | 5,459 |
| Tremors + Withdrawal | 4,001 |
| Craving + Tremors | 3,758 |
| Sleep disturbance + Withdrawal | 2,687 |
| Seizures + Withdrawal | 2,677 |
| Craving + Sleep disturbance | 2,598 |
| Craving + Seizures | 2,407 |
| Sleep disturbance + Tremors | 2,094 |
| Seizures + Tremors | 1,891 |
| Irritability + Withdrawal | 1,829 |
| Craving + Irritability | 1,784 |
| Anxiety + Withdrawal | 1,561 |
| Anxiety + Craving | 1,534 |
| Depression + Withdrawal | 1,282 |
Most predictive (IP-enriched): Paranoia (1.78×), memory/blackout issues (1.67×), auditory/visual hallucinations (1.57×), and nausea/vomiting (1.44×) are the strongest individual symptom predictors of inpatient admission.
Near-universal (non-discriminating): Withdrawal (85%), craving (79%), and tremors (58%) are so prevalent across both classes that they add little discriminative signal on their own. Their combinations matter more.
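
One way to check whether a frequent symptom pair is more than a by-product of two very common symptoms is its lift: the observed pair count divided by the count expected if the two symptoms were independent. The sketch below assumes the quoted prevalences (85%, 79%, 58%) are overall rates across all 7,628 notes; since they are rounded, the resulting lifts are approximate.

```python
n_notes = 2609 + 5019  # IP + OP

# Assumed overall prevalences, taken from the rounded figures in the text.
prevalence = {"Withdrawal": 0.85, "Craving": 0.79, "Tremors": 0.58}

# Observed co-occurrence counts from the symptom-pair table.
observed = {
    ("Craving", "Withdrawal"): 5459,
    ("Tremors", "Withdrawal"): 4001,
    ("Craving", "Tremors"): 3758,
}

lift = {}
for (a, b), count in observed.items():
    expected = prevalence[a] * prevalence[b] * n_notes  # count under independence
    lift[(a, b)] = count / expected

for pair, value in lift.items():
    print(f"{' + '.join(pair)}: lift = {value:.2f}")
```

A lift close to 1 indicates that the pair count mostly reflects how common each symptom is individually. Lift measures association between symptoms only; whether a combination separates IP from OP would need per-class pair counts, which are not reported here.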
| Pattern | IP | OP |
|---|---|---|
| Notes with 0 relapses | 47.7% | 53.0% |
| Notes with 5+ relapses | 22.0% | 13.1% |
| Avg abstinence interval | 6.1 days | 6.0 days |
| Abstinence mentions (n) | 1,195 | 1,404 |
| Full sequence notes | ~13% | ~13% |
IP notes are about 68% more likely than OP notes to mention five or more relapses (22.0% vs 13.1%), consistent with more severe, cyclical SUD patterns requiring inpatient intervention.
The abstinence → relapse → detox → follow-up cycle is the dominant clinical trajectory in the dataset. Notes encoding the full cycle tend to be much longer (averaging over 500 words) and typically come from complex multi-visit cases; they make up about 13% of notes in both classes.
| Feature | IP% | OP% | Ratio |
|---|---|---|---|
| Social withdrawal | 4.2% | 2.1% | 2.00× |
| Delusional thinking | 12.0% | 7.2% | 1.68× |
| Socio-occupational dysfunction | 49.2% | 40.4% | 1.22× |
| Violence/aggression | 10.3% | 8.6% | 1.20× |
| Legal issues | 6.1% | 5.2% | 1.17× |
| Family discord | 32.3% | 34.2% | 0.94× |
| Use despite harm | 20.4% | 22.1% | 0.92× |
| Loss of control | 42.6% | 48.6% | 0.88× |
| Tolerance | 53.4% | 61.3% | 0.87× |
Interestingly, standard dependence markers like "loss of control" and "tolerance" are more common in OP notes — possibly because OP clinicians document them more thoroughly in structured assessments, while IP notes focus on acute management.
Based on feature ratios alone, a rule-based classifier would likely achieve only moderate performance. The strongest combination of predictors is opioid mention + hallucinations + paranoia + high relapse count. Lexical features alone are estimated to reach 65–72% accuracy, which is at most a few points above the 65.8% obtained by always predicting OP; the graph structure (similarity edges) is expected to provide the key additional signal for GNN-based models.
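
A rule-based baseline of the kind described could look like the sketch below. Everything in it is illustrative: the `NoteFeatures` fields, the equal weighting of the four signals, the relapse threshold of 5 (borrowed from the "5+ relapses" row), and the decision cut-off `min_score` are assumptions, not values fitted to the data.

```python
from dataclasses import dataclass

@dataclass
class NoteFeatures:
    # Hypothetical per-note features extracted upstream.
    opioid_mention: bool
    hallucinations: bool
    paranoia: bool
    relapse_count: int

def rule_score(f: NoteFeatures, relapse_threshold: int = 5) -> int:
    """Count how many of the four IP-enriched signals are present."""
    return (
        int(f.opioid_mention)
        + int(f.hallucinations)
        + int(f.paranoia)
        + int(f.relapse_count >= relapse_threshold)
    )

def predict(f: NoteFeatures, min_score: int = 2) -> str:
    """Predict IP when at least `min_score` signals fire, otherwise OP."""
    return "IP" if rule_score(f) >= min_score else "OP"

# Reference point: accuracy of always predicting the majority class (OP).
majority_baseline = 5019 / (2609 + 5019)

print(predict(NoteFeatures(True, False, True, 0)))
print(f"majority baseline: {majority_baseline:.3f}")
```

Any such rule set should be compared against the majority baseline computed at the end, since accuracy alone flatters classifiers on a 1:2 class split.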
| Metric | IP | OP |
|---|---|---|
| Avg masked tokens / note | 5.50 | 4.03 |
| Total masked tokens | 14,360 | 20,226 |
| Notes with any masking | 93.4% | 92.1% |
| Treatment_decision tokens / note | 5.42 | 3.16 |
| Notes with Treatment_decision | 90.8% | 81.9% |
Person, address, company, and date masking successfully removes patient and clinician identifiers, so a model cannot memorise specific individuals or institutions as proxies for the IP/OP label.
Leakage risk: Treatment_decision tokens (e.g. Treatment_decision1–10) are 71% more frequent in IP notes on average (5.42 vs 3.16 per note). A model can trivially exploit this count as a proxy for admission severity.
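Covered (low leakage risk): person, address, company, and date entities.
Not fully masked (residual leakage risk): Treatment_decision placeholder counts, note length, and admission-specific vocabulary (discharge, ward, detox unit).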
The masking strategy is robust for PII/PHI removal. However, structural leakage remains: IP notes are longer, contain more Treatment_decision placeholders, and use admission-specific vocabulary (discharge, ward, detox unit). A model can learn these structural cues even without entity names. To reduce this leakage, Treatment_decision tokens should be masked uniformly, and note length normalisation should be considered.
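
The two mitigations can be prototyped in a few lines. The sketch below is one possible reading of "uniform masking": indexed placeholders are replaced by a single generic token and adjacent repeats are collapsed, followed by a crude length cap. The token pattern is inferred from the "Treatment_decision1–10" example, and the `[TD]` token and `max_words` limit are hypothetical choices.

```python
import re

# Assumed placeholder format: "Treatment_decision" followed by an optional index.
TD_PATTERN = re.compile(r"\bTreatment_decision\d*\b")
# Runs of two or more generic tokens separated only by whitespace.
TD_RUN = re.compile(r"\[TD\](?:\s+\[TD\])+")

def neutralise(note: str, max_words: int = 300) -> str:
    """Replace indexed placeholders with one generic token and cap note length."""
    text = TD_PATTERN.sub("[TD]", note)   # drop the index, which hints at the count
    text = TD_RUN.sub("[TD]", text)       # collapse adjacent repeats into one token
    words = text.split()
    return " ".join(words[:max_words])    # simple truncation as length normalisation

example = "Advised Treatment_decision1 Treatment_decision2 and review. Later Treatment_decision10."
print(neutralise(example))
```

Placeholders spread across a note still leak their count under this scheme, so a stricter variant might drop them entirely; truncation likewise discards content and is only the simplest form of length normalisation.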
| Edge type | Count | % |
|---|---|---|
| OP — OP | 105,847 | 55.7% |
| IP — OP (cross) | 71,141 | 37.4% |
| IP — IP | 13,104 | 6.9% |
37.4% of edges are cross-label (IP ↔ OP), so only 62.6% of edges connect nodes with the same label. This high cross-label similarity is expected given that all patients are SUD cases with overlapping symptom language, but the resulting low homophily makes neighbourhood aggregation less informative for GNN classifiers.
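
The edge counts can be summarised as an edge homophily ratio and compared with what random wiring would produce. The sketch below assumes a simple baseline in which both endpoints of an edge are drawn independently according to the overall class proportions; it ignores node degree and the train/test separation, so it is only a rough reference point.

```python
# Edge counts by endpoint labels, from the table above.
edges = {"OP-OP": 105847, "IP-OP": 71141, "IP-IP": 13104}
total_edges = sum(edges.values())

# Edge homophily: fraction of edges whose endpoints share a label.
edge_homophily = (edges["OP-OP"] + edges["IP-IP"]) / total_edges

# Baseline under random mixing with the observed class proportions.
p_ip = 2609 / (2609 + 5019)
p_op = 1 - p_ip
random_homophily = p_ip**2 + p_op**2

print(f"observed edge homophily: {edge_homophily:.3f}")
print(f"random-mixing baseline:  {random_homophily:.3f}")
```

On these numbers the observed homophily (about 0.63) sits only modestly above the random-mixing baseline (about 0.55), which supports the point that the similarity edges carry limited label information by themselves.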