Update README.md
README.md
CHANGED
|
@@ -5,4 +5,44 @@ language:
|
|
| 5 |
- en
|
| 6 |
tags:
|
| 7 |
- depression
|
| 8 |
-
|
| 5 |
- en
|
| 6 |
tags:
|
| 7 |
- depression
|
| 8 |
+
- medical
|
| 9 |
+
base_model:
|
| 10 |
+
- rafalposwiata/deproberta-large-depression
|
| 11 |
+
pipeline_tag: text-classification
|
| 12 |
+
---
|
| 13 |
+
|
| 14 |
+
# MentalBERTa
|
| 15 |
+
|
| 16 |
+
This model, `MentalBERTa`, was developed by the DeepLearningBrasil team and secured the first position in the [DepSign-LT-EDI@RANLP-2023 shared task](https://arxiv.org/abs/2311.05047).
|
| 17 |
+
The objective of the task was to classify social media texts into three levels of depression: "not depressed", "moderately depressed", and "severely depressed".
|
| 18 |
+
The accompanying code is available on [GitHub](https://github.com/eduagarcia/depsign-2023-ranlp).
|
| 19 |
+
|
| 20 |
+
## Model Description
|
| 21 |
+
|
| 22 |
+
`MentalBERTa` is a `RoBERTa` large model [from rafalposwiata/deproberta-large-depression](https://huggingface.co/rafalposwiata/deproberta-large-depression), pre-trained on a curated Reddit dataset from mental health-related communities.
|
| 23 |
+
This pre-training allows for an enhanced understanding of nuanced mental health discourse.
|
| 24 |
+
|
| 25 |
+
The best performing version of the model was trained with Loss Sample Weights and a 50% head + 50% tail truncation method.
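
The head + tail truncation mentioned above can be illustrated with a minimal sketch. It assumes the input is already a list of token IDs and that special tokens are handled separately; the function name `head_tail_truncate` and its parameters are hypothetical and are not taken from the accompanying repository.

```python
def head_tail_truncate(token_ids, max_len, head_fraction=0.5):
    """Shorten a token sequence by keeping its beginning and its end.

    Hypothetical helper for illustration only. With head_fraction=0.5 it
    keeps the first half of the budget from the start of the text and the
    remaining half from the end, dropping the middle.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    # Sequences that already fit are returned unchanged (as a new list).
    if len(token_ids) <= max_len:
        return list(token_ids)
    head_len = int(max_len * head_fraction)
    tail_len = max_len - head_len
    head = list(token_ids[:head_len])
    # Slice from an explicit start index so tail_len == 0 yields an empty tail.
    tail = list(token_ids[len(token_ids) - tail_len:]) if tail_len > 0 else []
    return head + tail


# Example: a 10-token sequence cut to 4 tokens keeps 2 from each end.
example = head_tail_truncate(list(range(10)), max_len=4)
```

In a real pipeline the budget passed as `max_len` would likely need to leave room for the tokenizer's special tokens; that detail is left out of this sketch.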
|
| 26 |
+
|
| 27 |
+
## Training Data
|
| 28 |
+
|
| 29 |
+
The model was pre-trained on a custom dataset collected from mental health-related subreddits, which is available on Hugging Face at [dlb/mentalreddit](https://huggingface.co/datasets/dlb/mentalreddit).
|
| 30 |
+
The full pre-training dataset comprises 3.4 million comments from mental health-related subreddits and 3.2 million comments from other subreddits, occupying approximately 1.4 GB of text on disk.
|
| 31 |
+
|
| 32 |
+
### Citation
|
| 33 |
+
```bibtex
|
| 34 |
+
@inproceedings{garcia-etal-2023-deeplearningbrasil,
|
| 35 |
+
title = "{D}eep{L}earning{B}rasil@{LT}-{EDI}-2023: Exploring Deep Learning Techniques for Detecting Depression in Social Media Text",
|
| 36 |
+
author = "Garcia, Eduardo and
|
| 37 |
+
Gomes, Juliana and
|
| 38 |
+
Barbosa Junior, Adalberto and
|
| 39 |
+
Borges, Cardeque and
|
| 40 |
+
da Silva, N{\'a}dia",
|
| 41 |
+
booktitle = "Proceedings of the Third Workshop on Language Technology for Equality, Diversity and Inclusion",
|
| 42 |
+
month = sep,
|
| 43 |
+
year = "2023",
|
| 44 |
+
address = "Varna, Bulgaria",
|
| 45 |
+
publisher = "INCOMA Ltd., Shoumen, Bulgaria",
|
| 46 |
+
url = "https://aclanthology.org/2023.ltedi-1.42",
|
| 47 |
+
pages = "272--278",
|
| 48 |
+
}
|