# MentalBERT V5 — Source-Aware Multi-Task Classifier

**Architecture:** Dual-head MentalBERT (BertModel base + classification head + auxiliary source head)  
**Dataset:** V5 (6 sources, 8 classes, ~88k samples)  
**Test Accuracy:** 83.23%  |  **F1 Macro:** 0.8381
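
The card names the two heads but does not say how they are combined during training. The following is a minimal sketch, assuming both heads read the same dropout-regularized pooled vector and that the auxiliary source loss is added to the class loss with a fixed weight. `DualHeadSketch`, `joint_loss`, and `aux_weight=0.3` are hypothetical names and values, not taken from the released training code, and a random tensor stands in for BERT's pooled output.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualHeadSketch(nn.Module):
    """Two linear heads on one shared pooled representation (illustrative only)."""
    def __init__(self, hidden=768, n_classes=8, n_sources=6, p_drop=0.1):
        super().__init__()
        self.dropout  = nn.Dropout(p_drop)
        self.cls_head = nn.Linear(hidden, n_classes)   # main task: 8 classes
        self.src_head = nn.Linear(hidden, n_sources)   # auxiliary task: 6 sources

    def forward(self, pooled):
        h = self.dropout(pooled)
        return self.cls_head(h), self.src_head(h)

def joint_loss(cls_logits, src_logits, y_cls, y_src, aux_weight=0.3):
    # aux_weight is an assumed hyperparameter, not a value from the model card
    return F.cross_entropy(cls_logits, y_cls) + aux_weight * F.cross_entropy(src_logits, y_src)

torch.manual_seed(0)
heads  = DualHeadSketch().eval()
pooled = torch.randn(4, 768)                 # stand-in for out.pooler_output
cls_logits, src_logits = heads(pooled)
loss = joint_loss(cls_logits, src_logits,
                  torch.tensor([0, 1, 2, 3]), torch.tensor([0, 1, 2, 3]))
```

At inference only the classification head is needed, which is consistent with the load pattern in this card downloading `cls_head.pt` alone.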

## Load Pattern

```python
import torch
import torch.nn as nn
import joblib, json
from transformers import BertModel, BertTokenizerFast
from huggingface_hub import hf_hub_download

# 1. Load BertModel base and tokenizer
base = BertModel.from_pretrained('itsLu/mentalbert-v5-source-aware')
tok  = BertTokenizerFast.from_pretrained('itsLu/mentalbert-v5-source-aware')

# 2. Load config
config_path = hf_hub_download('itsLu/mentalbert-v5-source-aware', 'inference_config.json')
with open(config_path) as f:
    cfg = json.load(f)

# 3. Reconstruct classification head
cls_head = nn.Linear(768, cfg['n_classes'])
head_path = hf_hub_download('itsLu/mentalbert-v5-source-aware', 'cls_head.pt')
cls_head.load_state_dict(torch.load(head_path, map_location='cpu'))

# 4. Reconstruct wrapper model
class InferenceModel(nn.Module):
    def __init__(self, bert, head):
        super().__init__()
        self.bert    = bert
        self.dropout = nn.Dropout(0.1)
        self.head    = head
    def forward(self, input_ids, attention_mask):
        out    = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        pooled = out.pooler_output
        return self.head(self.dropout(pooled))

model = InferenceModel(base, cls_head).eval()

# 5. Inference
le_path = hf_hub_download('itsLu/mentalbert-v5-source-aware', 'label_encoder.joblib')
le = joblib.load(le_path)

def predict(text):
    enc   = tok(text, max_length=128, padding='max_length',
                truncation=True, return_tensors='pt')
    with torch.no_grad():
        logits = model(enc['input_ids'], enc['attention_mask'])
    probs = torch.softmax(logits, dim=1).squeeze().numpy()
    idx   = probs.argmax()
    return le.classes_[idx], float(probs[idx])

label, prob = predict("I can't stop thinking about how worthless I am.")
print(label, f'{prob:.2%}')
```

## Classes
- Anxiety
- Bipolar
- Depression
- Directed Aggression
- Normal
- Personality Disorder
- Stress
- Suicidal
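
For applications that want more than the single best label, the logits can be ranked over all eight classes. This is a small sketch, assuming the list above is in the same order as `le.classes_` (scikit-learn's `LabelEncoder` stores classes sorted, and the list is alphabetical, but this should be checked against the downloaded encoder). `CLASSES` and `top_k` are hypothetical helpers, not part of the repository.

```python
import math

# Assumed to match le.classes_ from label_encoder.joblib; verify before relying on it.
CLASSES = ["Anxiety", "Bipolar", "Depression", "Directed Aggression",
           "Normal", "Personality Disorder", "Stress", "Suicidal"]

def top_k(logits, k=3):
    """Return the k most probable (label, probability) pairs from raw logits."""
    m = max(logits)                                  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(zip(CLASSES, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Example with made-up logits: index 2 (Depression) and index 7 (Suicidal) dominate.
ranking = top_k([0.0, 0.0, 5.0, 0.0, 0.0, 0.0, 0.0, 3.0], k=2)
```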

## Source Reliability Weights
| Source | Reliability |
|--------|-------------|
| cssrs | 1.0 |
| olid | 1.0 |
| kaggle_bpd | 0.95 |
| huggingface | 0.7 |
| kaggle | 0.7 |
| swmh | 0.5 |
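
The card does not state how these weights enter training. One common way to use such values is as per-sample multipliers on the cross-entropy loss, so that examples from less reliable sources contribute less to the gradient. The sketch below illustrates that idea in plain Python under that assumption; `RELIABILITY`, `log_softmax`, and `weighted_ce` are hypothetical names, and normalizing by the sum of weights is a design choice made here, not something the card confirms.

```python
import math

# Values copied from the table above.
RELIABILITY = {"cssrs": 1.0, "olid": 1.0, "kaggle_bpd": 0.95,
               "huggingface": 0.7, "kaggle": 0.7, "swmh": 0.5}

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def weighted_ce(batch_logits, labels, sources):
    """Reliability-weighted mean cross-entropy over a batch (assumed scheme)."""
    total, weight_sum = 0.0, 0.0
    for logits, y, src in zip(batch_logits, labels, sources):
        w = RELIABILITY[src]
        total += w * -log_softmax(logits)[y]   # per-sample loss scaled by source weight
        weight_sum += w
    return total / weight_sum

# Two samples with identical logits: the correctly classified one comes from cssrs,
# the misclassified one from swmh, so its larger loss is counted at half weight.
loss = weighted_ce([[2.0, 0.0], [2.0, 0.0]], labels=[0, 1], sources=["cssrs", "swmh"])
```

With this weighting, a mislabeled or noisy `swmh` example moves the model half as much as a `cssrs` or `olid` example with the same per-sample loss.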