---
license: cc-by-nc-4.0
tags:
- mental-health
- social-media
- life-events
---


# 🧠 PsyEvent: Life Event Recognition System

## 📖 Model Overview

**What is PsyEvent?**

**PsyEvent** is a specialized NLP tool designed to extract and analyze major life events from unstructured social media text. Unlike general sentiment analysis, it focuses on identifying specific, objective occurrences (e.g., career, health) that significantly impact mental health trajectories (see Figure 1).

<div align="center">
    <img src="./assets/example_post.jpg" width="600" alt="User post example" />
    <em>Figure 1: User post example.</em>
</div>

This repository contains the **PsyEvent** models described in the paper **["Tracking Life's Ups and Downs: Mining Life Events from Social Media Posts for Mental Health Analysis"](https://aclanthology.org/2025.acl-long.345/)** (ACL 2025).

The system consists of two distinct models housed in this repository:
1.  **Life Events Detection (`LE_detection`)**: A multi-label classifier that identifies 12 categories of life events from social media posts.
2.  **Self-Status Determination (`Self-status_determination`)**: A binary classifier that determines whether the detected life event is currently being experienced by the user themselves (Self) or someone else.
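The two models are designed to work in sequence: first find which life events a post mentions, then check whether the author is the one experiencing them. The sketch below shows one plausible way to combine their outputs. It is an assumption, not the paper's documented procedure: it takes a dict of per-category probabilities from `LE_detection` and a single probability that the text concerns the author (*Self*) from `Self-status_determination`. The helper name `self_experienced_events`, the 0.5 thresholds, and the label strings are hypothetical; check the real label names in each subfolder's `config.id2label`.

```python
from typing import Dict, List


def self_experienced_events(
    event_probs: Dict[str, float],
    self_prob: float,
    event_threshold: float = 0.5,
    self_threshold: float = 0.5,
) -> List[str]:
    """Hypothetical two-stage combination of the PsyEvent models.

    event_probs: per-category sigmoid probabilities from LE_detection.
    self_prob: probability that the event is experienced by the author
               (assumed to be one float per input text).
    """
    # Stage 1: keep categories whose probability clears the event threshold.
    detected = [label for label, p in event_probs.items() if p > event_threshold]
    # Stage 2: drop everything if the text is judged to be about someone else.
    if self_prob <= self_threshold:
        return []
    return sorted(detected)


print(self_experienced_events({"Career": 0.93, "Health": 0.12}, self_prob=0.88))
# ['Career']
```

Gating at the text level keeps the sketch simple; if you run the models sentence by sentence, the same function applies to each sentence separately.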

### Architecture
Both models are based on **BERT-large** (340M parameters) with a custom classification head.

## 📂 Repository Structure

This repository uses **subfolders** to store the weights for each model. You must specify the `subfolder` argument when loading.

| Subfolder | Task Description | Type |
| :--- | :--- | :--- |
| `LE_detection/` | Detects **which** life events are present. | Multi-label Classification |
| `Self-status_determination/` | Detects **who** is experiencing the event. | Binary Classification |

Both models share the same architecture (`BERTDiseaseClassifier`) defined in `model.py`.

## 🚀 Quick Start (Copy & Run)

These models use a custom architecture (a BERT encoder plus a linear head applied to the raw `[CLS]` hidden state, without BERT's pooler layer), so **you must define or import the model class locally** before loading the weights.

### 1. Installation

```bash
pip install transformers torch huggingface_hub
```

### 2. Define the Model Architecture

You can download `model.py` from this repository or define the class directly in your code:
```python
import torch
from torch import nn
from transformers import AutoModel


class BERTDiseaseClassifier(nn.Module):
    """BERT encoder + dropout + linear head on the [CLS] hidden state."""

    def __init__(self, model_type, num_symps) -> None:
        super().__init__()
        self.model_type = model_type
        self.num_symps = num_symps  # number of output labels
        self.encoder = AutoModel.from_pretrained(model_type)
        self.dropout = nn.Dropout(self.encoder.config.hidden_dropout_prob)
        self.clf = nn.Linear(self.encoder.config.hidden_size, num_symps)

    def forward(self, input_ids=None, attention_mask=None, token_type_ids=None, **kwargs):
        # Pass inputs by keyword so they cannot be mapped to the wrong argument.
        outputs = self.encoder(
            input_ids=input_ids,
            attention_mask=attention_mask,
            token_type_ids=token_type_ids,
        )
        x = outputs.last_hidden_state[:, 0, :]  # hidden state of the [CLS] token (BERT's pooler is not used)
        x = self.dropout(x)
        logits = self.clf(x)  # raw scores, one per label
        return logits
```

### 3. Load the Models
Use the `subfolder` parameter to select which model to load.
```python
import torch
from transformers import AutoConfig, AutoTokenizer
from huggingface_hub import hf_hub_download
# from model import BERTDiseaseClassifier  # use this import instead if you downloaded model.py

repo_id = "shallowblueQAQ/PsyEvent-model"
subfolder = "LE_detection"
# subfolder = "Self-status_determination"

# 1. Load Config & Tokenizer
config = AutoConfig.from_pretrained(repo_id, subfolder=subfolder)
tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder=subfolder)

# 2. Initialize Model Architecture
# NOTE: If you are running offline, you can replace "bert-large-uncased" with your local path (e.g., "/path/to/bert-large-uncased").
model = BERTDiseaseClassifier(model_type="bert-large-uncased", num_symps=len(config.id2label))

# 3. Load Weights
weights_path = hf_hub_download(repo_id=repo_id, subfolder=subfolder, filename="pytorch_model.bin")
model.load_state_dict(torch.load(weights_path, map_location="cpu"))
model.eval()

# 4. Inference
text = "I lost my job yesterday and I feel terrible."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)

with torch.no_grad():
    logits = model(**inputs)
    probs = torch.sigmoid(logits)

# Display Predictions (Multi-label)
threshold = 0.5
for i, prob in enumerate(probs[0]):
    if prob > threshold:
        print(f"Detected: {config.id2label[i]} ({prob:.4f})")
```
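The thresholding loop at the end of the snippet can be wrapped in a small, framework-free helper that also sorts hits by confidence and optionally accepts per-category thresholds. This is a hypothetical convenience function: `decode_multilabel` and `label_thresholds` are illustrative names, and this card does not release any tuned per-category thresholds. With the real model, pass it `probs[0].tolist()` and `config.id2label` from the snippet above.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def decode_multilabel(
    probs: Sequence[float],
    id2label: Dict[int, str],
    threshold: float = 0.5,
    label_thresholds: Optional[Dict[str, float]] = None,
) -> List[Tuple[str, float]]:
    """Return (label, probability) pairs above threshold, most confident first.

    label_thresholds optionally overrides the global threshold for specific
    labels (hypothetical; no tuned values are provided with the model).
    """
    label_thresholds = label_thresholds or {}
    hits = []
    for idx, p in enumerate(probs):
        label = id2label[idx]
        if p > label_thresholds.get(label, threshold):
            hits.append((label, float(p)))
    # Sort by probability, highest first.
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Toy values for illustration; real label names come from config.id2label.
toy_id2label = {0: "Health", 1: "Career", 2: "Relocation"}
print(decode_multilabel([0.10, 0.91, 0.62], toy_id2label))
# [('Career', 0.91), ('Relocation', 0.62)]
```

A per-category override can be useful because detection quality varies by category (see the AUC table below), but any such threshold should be chosen on your own validation data.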

## 📊 Dataset & Categories
The model was trained on PsyEvent, a dataset of 7,965 annotated sentences derived from SMHD. It covers 12 major life event categories:

| Life Event Categories | Representative Examples (from paper Appendix D) |
| :--- | :--- |
| `πŸ₯ Health` | personal injury , accident or illness; became disabled; mental illenss. |
| `πŸ’° Financial` | loan; home purchase; car purchase; other major purchase. |
| `🏠 Relocation` | move to a different town/city; move out of parent's home; lost home / became homeless; major travel. |
| `πŸ’Ό Career`| started a new job; promotion; voluntary/involuntary job loss; retirement. |
| `πŸŽ“ Education`| begin or end school/college; change in school/college; left school (without graduating). |
| `πŸ’” Relationship Change`| began/ended serious romantic relationship; marriage; divorce; serious argument. |
| `πŸ•―οΈ Death`| death of spouse/child/parent/friend/pet. |
| `πŸ‘Ά New Birth`| gave birth / became a parent; adopted a child; became a grandparent. |
| `βš–οΈ Legal`| got arrested; lawsuit or legal action; went to jail or prison; released from jail or prison. |
| `🌈 Identity`| came out as LGBTQ+; gender transition; change in political/religious/spiritual beliefs. |
| `🌱 Lifestyle Change`| change in physical habits; new pet; joined the military; vacation. |
| `🌍 Societal`| natural disaster; war; major political event that had personal impact. |

## Performance (AUC)
Per-category AUC of the life event detection (`LE_detection`) model:

| Life Event Category | AUC (%) |
| :--- | :--- |
| Health | 92.1 |
| Financial | 95.7 |
| Relocation | 97.7 |
| Legal | 96.1 |
| Relationship Change | 95.0 |
| New Birth | 92.6 |
| Death | 99.7 |
| Career | 93.5 |
| Education | 99.2 |
| Lifestyle Change | 87.9 |
| Identity | 95.5 |
| Societal | 97.4 |
| Avg. | 95.2 |
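AUC here is computed per category (one-vs-rest), and the "Avg." row equals the unweighted (macro) mean of the twelve per-category values. The sketch below illustrates both under stated assumptions: `binary_auc` is a plain rank-based (Mann-Whitney) AUC with ties counted as one half, which is a standard definition but not necessarily the exact evaluation code or test split used in the paper; the dictionary simply copies the table above.

```python
from typing import Sequence


def binary_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC for one category: the probability that a random positive
    example scores higher than a random negative one (ties count 0.5).
    O(P * N) pairwise version, fine for small illustrative inputs."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Per-category AUC (%) copied from the table above.
reported_auc = {
    "Health": 92.1, "Financial": 95.7, "Relocation": 97.7, "Legal": 96.1,
    "Relationship Change": 95.0, "New Birth": 92.6, "Death": 99.7,
    "Career": 93.5, "Education": 99.2, "Lifestyle Change": 87.9,
    "Identity": 95.5, "Societal": 97.4,
}
macro_avg = sum(reported_auc.values()) / len(reported_auc)
print(f"{macro_avg:.1f}")  # 95.2, matching the "Avg." row

# Toy example: positives score 0.9 and 0.4, negatives 0.3 and 0.4.
print(binary_auc([1, 0, 1, 0], [0.9, 0.3, 0.4, 0.4]))  # 0.875
```

Because the average is unweighted, a weaker category such as Lifestyle Change (87.9) counts as much as the most frequent one, regardless of how many examples each has.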


## ⚠️ Ethical Considerations & Limitations

**1. No Clinical Diagnosis:**
This model is designed for **research purposes only**. It is not a clinical diagnostic tool and should not be used as a substitute for professional medical advice, diagnosis, or treatment.

**2. No Automated Decision Making:**
The model must **not** be used for automated decision-making in high-stakes scenarios, including but not limited to:
  - Employment screening or hiring decisions.
  - Insurance eligibility or claims processing.
  - Legal assessment or administrative decision-making regarding individuals.

**3. Bias & Errors:**
Like all models trained on social media data, this model may reflect biases present in the training corpus. It may generate false positives or misinterpret metaphorical language. Users should critically evaluate the model's outputs.

## Data Availability & Privacy Statement

This model was trained on **PsyEvent**, a subset of the **[SMHD (Self-reported Mental Health Diagnoses)](https://aclanthology.org/C18-1126/)** dataset. 

**Due to the strict Data Usage Agreement of SMHD, we are prohibited from publishing or sharing any portion of the original dataset (including our annotated subset).** Researchers interested in reproducing this work or using the data must apply for access directly from the original creators of [SMHD (Cohan et al., 2018)](https://ir.cs.georgetown.edu/resources/). We provide only the model weights and inference code here.



## Citation
If you use this model or dataset, please cite our paper:
```bibtex
@inproceedings{lv2025tracking,
  title={Tracking life’s ups and downs: Mining life events from social media posts for mental health analysis},
  author={Lv, Minghao and Chen, Siyuan and Jin, Haoan and Yuan, Minghao and Ju, Qianqian and Peng, Yujia and Zhu, Kenny and Wu, Mengyue},
  booktitle={Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  pages={6950--6965},
  year={2025}
}
```