driwnet committed on
Commit 3ddd7c4 · verified · 1 Parent(s): 0888ad1

Create README.md

Files changed (1)
  1. README.md +129 -0
README.md ADDED
@@ -0,0 +1,129 @@
+ ---
+ library_name: transformers
+ tags:
+ - spanish
+ - mental-health
+ - longformer
+ - domain-adaptation
+ - nlp
+ language:
+ - es
+ base_model:
+ - PlanTL-GOB-ES/longformer-base-4096-bne-es
+ ---
+
+ ## Model Description
+
+ Longformer-es-mental-base is the base-sized version of the Longformer-es-mental family, a Spanish domain-adapted language model designed for mental health text analysis on long user-generated content.
+ The model is intended for scenarios where relevant mental health signals are distributed across multiple messages, such as social media timelines, forum threads, or user message histories.
+
+ It is based on the Longformer architecture, which extends the standard Transformer attention mechanism to efficiently process long sequences.
+ The model supports input sequences of up to 4096 tokens, enabling it to capture long-range dependencies and temporal patterns that are particularly relevant for mental health screening tasks.
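
To make the efficiency claim concrete, the following back-of-the-envelope sketch compares how many query-key pairs full self-attention scores against a sliding-window pattern at the 4096-token limit. The window size of 512 is an assumption (it is the usual Longformer default and is not stated in this card), and the count ignores global-attention tokens and sequence-boundary effects.

```python
from typing import Optional


def attention_pairs(seq_len: int, window: Optional[int] = None) -> int:
    """Count query-key pairs scored per attention head.

    With window=None every token attends to every token (full attention).
    Otherwise each token attends to at most `window` neighbours.
    """
    if window is None:
        return seq_len * seq_len
    return seq_len * min(window, seq_len)


SEQ_LEN = 4096  # maximum input length stated in this card
WINDOW = 512    # assumption: typical Longformer attention window

full = attention_pairs(SEQ_LEN)
local = attention_pairs(SEQ_LEN, WINDOW)
print(f"full: {full:,}  windowed: {local:,}  ratio: {full / local:.0f}x")
```

Under these assumptions the windowed pattern scores one eighth as many pairs as full attention, and its cost grows linearly rather than quadratically with sequence length.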
+
+ Longformer-es-mental-base was obtained through domain-adaptive pre-training (DAP) on a large corpus of mental health–related texts translated into Spanish from Reddit communities focused on psychological support and mental health discussions.
+ This adaptation allows the model to better capture emotional expression, self-disclosure patterns, and discourse structures characteristic of mental health narratives in Spanish.
+
+ The model is released as a foundational model and does not include task-specific fine-tuning.
+
+ - Developed by: ELiRF group, VRAIN (Valencian Research Institute for Artificial Intelligence), Universitat Politècnica de València
+ - Funded by: Spanish Agencia Estatal de Investigación (AEI), MCIN/AEI, ERDF
+ - Shared by: ELiRF
+ - Model type: Transformer-based masked language model (Longformer)
+ - Language: Spanish
+ - License: Same as base model (PlanTL-GOB-ES models)
+ - Finetuned from model: PlanTL-GOB-ES/longformer-base-4096-bne-es
+
+ ## Uses
+
+ This model is intended for research purposes in the mental health NLP domain.
+
+ ### Direct Use
+
+ The model can be used directly as a language encoder or feature extractor for Spanish mental health–related texts when long input sequences are required and computational efficiency is a concern.
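
When the input is a user's message history rather than a single post, the messages first have to be combined into one sequence that fits the 4096-token limit. The sketch below is one possible way to do that, not a procedure described in this card: the helper `pack_history` is hypothetical, keeping the most recent messages is an assumed design choice, and whitespace splitting stands in for the real tokenizer's token count.

```python
from typing import Callable, List


def pack_history(
    messages: List[str],
    max_tokens: int = 4096,
    count_tokens: Callable[[str], int] = lambda text: len(text.split()),
    separator: str = " ",
) -> str:
    """Join the most recent messages that fit in max_tokens, oldest first.

    `messages` is assumed to be in chronological order. `count_tokens` is a
    whitespace approximation by default (an assumption for illustration).
    """
    kept: List[str] = []
    used = 0
    for message in reversed(messages):  # walk from newest to oldest
        cost = count_tokens(message)
        if used + cost > max_tokens:
            break  # stop at the first message that no longer fits
        kept.append(message)
        used += cost
    return separator.join(reversed(kept))  # restore chronological order


history = ["primer mensaje largo", "segundo mensaje", "último"]
print(pack_history(history, max_tokens=3))
```

For real inputs, a more faithful budget would pass something like `lambda text: len(tokenizer(text, add_special_tokens=False)["input_ids"])` as `count_tokens` and reserve a few tokens for the special tokens the tokenizer adds.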
+
+ ### Downstream Use
+
+ Longformer-es-mental-base is primarily intended to be fine-tuned for downstream tasks such as:
+
+ - Mental disorder detection
+ - Mental health screening
+ - User-level and context-level classification
+ - Early risk detection tasks involving long message histories
+ - Social media analysis related to psychological well-being
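
As a starting point for the classification tasks listed above, the checkpoint can be loaded with a sequence-classification head. This is a minimal sketch rather than the authors' fine-tuning setup: the two-label scheme and the helper `build_classifier` are hypothetical, and the classification head is randomly initialised, so it must be trained on labelled data before its predictions mean anything.

```python
# Hypothetical binary label set; a real task defines its own labels.
LABELS = ["control", "at_risk"]
ID2LABEL = dict(enumerate(LABELS))
LABEL2ID = {label: idx for idx, label in ID2LABEL.items()}


def build_classifier(model_name: str = "ELiRF/Longformer-es-mental-base"):
    """Load the tokenizer and the encoder with a new classification head."""
    # Imported here so the label mapping above is usable without transformers.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name,
        num_labels=len(LABELS),
        id2label=ID2LABEL,
        label2id=LABEL2ID,
    )
    return tokenizer, model
```

Calling `tokenizer, model = build_classifier()` downloads the weights from the Hub; the returned model can then be trained with any standard PyTorch or `Trainer` loop.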
+
+ ### Out-of-Scope Use
+
+ - Real-time intervention systems without human supervision
+ - Use on languages other than Spanish
+ - High-stakes decision-making affecting individuals’ health or safety
+
+ ## Bias, Risks, and Limitations
+
+ - Training data originates from social media platforms, which may introduce demographic, cultural, and linguistic biases.
+ - All texts were automatically translated into Spanish, potentially introducing translation artifacts or subtle semantic shifts.
+ - Mental health language is highly contextual and subjective; predictions may be unreliable when very limited evidence is available.
+ - The model does not provide explanations or clinical interpretations of its outputs.
+
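+ ## How to Get Started with the Model
+
+ ```python
+ import torch
+ from transformers import AutoTokenizer, AutoModel
+
+ # Load the domain-adapted checkpoint and its tokenizer from the Hub.
+ tokenizer = AutoTokenizer.from_pretrained("ELiRF/Longformer-es-mental-base")
+ model = AutoModel.from_pretrained("ELiRF/Longformer-es-mental-base")
+ model.eval()
+
+ # Spanish sample input ("Example text related to mental health.").
+ # Inputs longer than 4096 tokens are truncated.
+ inputs = tokenizer(
+     "Ejemplo de texto relacionado con salud mental.",
+     return_tensors="pt",
+     truncation=True,
+     max_length=4096,
+ )
+
+ # Inference only, so gradient tracking is disabled.
+ with torch.no_grad():
+     outputs = model(**inputs)
+
+ # Token-level representations: (batch_size, sequence_length, hidden_size)
+ embeddings = outputs.last_hidden_state
+ ```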
+
+ ## Training Details
+
+ ### Training Data
+
+ The model was domain-adapted using a merged corpus composed of:
+
+ - Reddit SuicideWatch and Mental Health Collection (SWMH)
+ - Reddit Mental Health Narratives (RMHN)
+
+ All texts were automatically translated into Spanish using neural machine translation.
+ The resulting dataset contains approximately 1.9 million posts from multiple mental health–related communities (e.g., depression, anxiety, suicidal ideation, loneliness), providing broad coverage of informal mental health discourse.
+
+ ### Training Procedure
+
+ The model was trained using domain-adaptive pre-training (DAP) with a masked language modeling objective.
+
+ - Training regime: fp16 mixed precision
+ - Number of epochs: 20
+ - Hardware: multiple NVIDIA A40 GPUs
+ - Training duration: approximately 4 days
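
The masked language modeling objective mentioned above can be illustrated with a small, dependency-free sketch of the input corruption step. This is not the authors' training code. The masking rate and the 80/10/10 replacement split are assumptions: the card does not state them, and the values used here are the BERT/RoBERTa convention. The function name and all token ids are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Set, Tuple

IGNORE_INDEX = -100  # label value conventionally skipped by the loss


def mask_tokens(
    input_ids: Sequence[int],
    mask_id: int,
    vocab_size: int,
    special_ids: Set[int],
    mlm_probability: float = 0.15,  # assumption: conventional masking rate
    rng: Optional[random.Random] = None,
) -> Tuple[List[int], List[int]]:
    """Return (corrupted_ids, labels) for one sequence."""
    rng = rng or random.Random()
    corrupted = list(input_ids)
    labels = [IGNORE_INDEX] * len(corrupted)
    for i, token in enumerate(input_ids):
        # Special tokens are never selected; others are selected at random.
        if token in special_ids or rng.random() >= mlm_probability:
            continue
        labels[i] = token  # the model must recover the original token here
        roll = rng.random()
        if roll < 0.8:
            corrupted[i] = mask_id                    # 80%: mask token
        elif roll < 0.9:
            corrupted[i] = rng.randrange(vocab_size)  # 10%: random token
        # remaining 10%: leave the token unchanged
    return corrupted, labels


ids = [0, 11, 12, 13, 14, 2]  # hypothetical ids; 0 and 2 act as special tokens
print(mask_tokens(ids, mask_id=4, vocab_size=100, special_ids={0, 2},
                  rng=random.Random(0)))
```

The loss is computed only at positions whose label is not `IGNORE_INDEX`, so the model is trained to reconstruct the selected tokens from their surrounding context.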
+
+ No task-specific fine-tuning is included in this checkpoint.
+
+ ## Evaluation
+
+ ### Results
+
+ When fine-tuned on Spanish mental health benchmarks, Longformer-es-mental-base shows competitive performance.
+
+ ## Technical Specifications
+
+ ### Model Architecture and Objective
+
+ - Architecture: Longformer
+ - Objective: Masked Language Modeling
+ - Model size: approximately 150M parameters (base version)
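
As a rough sanity check on the stated model size, the parameter count can be estimated from the architecture. Every dimension below is an assumption, since this card lists none of them: RoBERTa-base sizes (12 layers, hidden size 768, feed-forward size 3072), a vocabulary of about 50k entries, 4098 position embeddings, and Longformer's additional global query/key/value projections in each layer. The masked language modeling head is excluded.

```python
def estimate_longformer_params(
    vocab_size: int = 50_262,    # assumption: vocabulary of the base model
    hidden: int = 768,           # assumption: RoBERTa-base hidden size
    layers: int = 12,            # assumption: RoBERTa-base depth
    ffn: int = 3_072,            # assumption: feed-forward inner size
    max_positions: int = 4_098,  # assumption: 4096 positions plus offset
) -> int:
    """Estimate encoder parameters (embeddings, layers, pooler)."""

    def linear(n_in: int, n_out: int) -> int:
        return n_in * n_out + n_out  # weight matrix plus bias

    layer_norm = 2 * hidden  # scale and shift vectors
    embeddings = (
        vocab_size * hidden       # token embeddings
        + max_positions * hidden  # position embeddings
        + hidden                  # single token-type embedding
        + layer_norm
    )
    attention = (
        4 * linear(hidden, hidden)    # local query, key, value, output
        + 3 * linear(hidden, hidden)  # global query, key, value
        + layer_norm
    )
    feed_forward = linear(hidden, ffn) + linear(ffn, hidden) + layer_norm
    pooler = linear(hidden, hidden)
    return embeddings + layers * (attention + feed_forward) + pooler


total = estimate_longformer_params()
print(f"{total / 1e6:.1f}M parameters")
```

Under these assumptions the estimate lands just under 150M, consistent with the figure above. For the exact count, `sum(p.numel() for p in model.parameters())` on the loaded model is the authoritative source.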
+
+ ## Citation
+
+ This model is part of an ongoing research project.
+ The associated paper is currently under review and will be added to this model card once the publication process is completed.
+
+ ## Model Card Authors
+
+ ELiRF research group (VRAIN, Universitat Politècnica de València)