This model was developed to support standardized, scalable mental health assessments in both clinical and low-resource settings.

---

## 🧠 Model Details

- **Base model**: `bert-base-german-cased`
- **Task**: Ordinal regression/classification (scores 0–6)
- **Language**: German 🇩🇪
- **Input**: Text (dialogue segment grouped by MADRS topic)
- **Output**: Predicted score for each MADRS item (rounded integer 0–6)
- **Training data**: Mix of real and synthetic clinician–patient interviews (MADRS-structured)
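
The **Output** entry can be illustrated with a small post-processing helper. This is a sketch, assuming (as the rounding and `.clamp(0, 6)` in the prediction code under *How to Use* suggest) that the model emits one continuous value per dialogue segment; `to_item_score` is a hypothetical name, not part of the model or of `transformers`.

```python
def to_item_score(raw, low=0, high=6):
    """Map one raw model output to an integer MADRS item score in [low, high]."""
    # Python's round() (like torch.round) rounds halves to the nearest even integer
    rounded = round(raw)
    # Clamp into the valid item range, mirroring .clamp(0, 6) in the prediction code
    return max(low, min(high, rounded))
```

Because of round-half-to-even, a raw value of 2.5 maps to 2 while 3.5 maps to 4; values outside the scale, such as -0.7 or 7.9, end up at 0 and 6.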

---

## 💡 Intended Use

This model is intended for research and development use only; it is not a certified medical device. Its goals are to:

- Explore AI-assisted symptom severity assessment
- Enable structured evaluation of individual MADRS items
- Support clinicians and researchers working in psychiatry and mental health

---

## 🚀 How to Use

### Load model and tokenizer:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "webersamantha/MADRS-BERT"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Switch to inference mode (disables dropout) and use a GPU when one is available
device = "cuda" if torch.cuda.is_available() else "cpu"
model.eval().to(device)
```
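
### 📝 Predict on a full structured interview:

Assume you have a conversation log in which every utterance is tagged with the MADRS topic it belongs to:

```python
# One entry per utterance; "Topic" must match an entry in `topics` exactly
conversation_log = [
    {"Speaker": "Interviewer", "Content": "Wie war Ihr Appetit?", "Topic": "Appetit"},  # "How was your appetite?"
    {"Speaker": "Patient", "Content": "Ich hatte guten Appetit.", "Topic": "Appetit"},  # "My appetite was good."
    {"Speaker": "Interviewer", "Content": "Wie war Ihr Schlaf?", "Topic": "Schlaf"},  # "How was your sleep?"
    {"Speaker": "Patient", "Content": "Ich konnte gut schlafen.", "Topic": "Schlaf"},  # "I was able to sleep well."
    # etc.
]

# German topic labels, in English: sadness, tension, sleep, appetite, concentration,
# lack of drive, lack of feeling, thoughts, suicide
topics = ["Traurigkeit", "Anspannung", "Schlaf", "Appetit", "Konzentration", "Antriebslosigkeit", "Gefühlslosigkeit", "Gedanken", "Suizid"]
```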
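
The prediction function below matches each entry's `Topic` against `topics` by exact string comparison, so a misspelled label is silently ignored and its topic comes back as `None`. A minimal pre-flight check, sketched with a hypothetical `check_topic_coverage` helper, lists labels in the log that are not in `topics` and topics that have no utterances at all:

```python
from collections import Counter


def check_topic_coverage(conversation_log, topics):
    """Return (unknown_labels, missing_topics) for a conversation log."""
    known = set(topics)
    # Number of utterances per topic label found in the log
    counts = Counter(entry["Topic"] for entry in conversation_log)
    # Labels that match no entry in `topics` (e.g. typos) are never scored
    unknown = sorted(label for label in counts if label not in known)
    # Topics without any utterances will be returned as None by the prediction step
    missing = [topic for topic in topics if counts[topic] == 0]
    return unknown, missing
```

For the four entries shown above, `check_topic_coverage(conversation_log, topics)` reports no unknown labels and lists every topic except `"Schlaf"` and `"Appetit"` as missing.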
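
Use the prediction function:

```python
def predict_scores_per_topic(conversation_log, topics, tokenizer, model):
    """Return {topic: predicted item score (int 0-6), or None if the topic has no dialogue}."""
    device = model.device
    predictions = {}
    for topic in topics:
        # Concatenate all utterances of this topic as "Speaker: Content" lines
        topic_dialogue = "\n".join(
            f"{entry['Speaker']}: {entry['Content']}" for entry in conversation_log if entry["Topic"] == topic
        )
        if not topic_dialogue:
            # No dialogue for this topic, so no score
            predictions[topic] = None
            continue
        # BERT accepts at most 512 tokens; longer dialogues are truncated
        inputs = tokenizer(topic_dialogue, truncation=True, padding="max_length", max_length=512, return_tensors="pt").to(device)
        with torch.no_grad():
            # One output value per segment: round to the nearest integer and clamp to 0-6
            score = torch.round(model(**inputs).logits).clamp(0, 6).item()
        predictions[topic] = int(score)
    return predictions


predictions = predict_scores_per_topic(conversation_log, topics, tokenizer, model)
print(predictions)
```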
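
The dictionary returned by `predict_scores_per_topic` can then be summarized. A minimal sketch with a hypothetical `summarize_predictions` helper: it separates scored topics from those left as `None` and adds up the scored items. Note that the standard MADRS total is the sum of 10 items scored 0–6 (range 0–60), whereas the `topics` list above has nine entries and missing topics are skipped, so this partial sum should not be read as a validated MADRS total.

```python
def summarize_predictions(predictions):
    """Split {topic: score or None} into scored items, missing topics, and their sum."""
    # Topics without dialogue were stored as None by the prediction step
    scored = {topic: score for topic, score in predictions.items() if score is not None}
    missing = [topic for topic, score in predictions.items() if score is None]
    # Sum over scored items only; this is NOT the standard 10-item MADRS total
    partial_sum = sum(scored.values())
    return scored, missing, partial_sum
```

Reporting `missing` alongside the sum makes it explicit how many items the partial sum is based on.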

---

## 🧹 Preprocessing Custom Data

To prepare your own data, load it from a JSONL file and tokenize it with the preprocessing below. `preprocess_function` expects one record per dialogue segment, with an `input` field whose first line is `Topic: <topic>` and an `output` field of the form `Score: <n>`. Data stored per utterance (e.g., with the fields `User ID`, `Speaker`, `Transcription`, `Topic`, `Score`) must first be grouped into that shape.

```python
from datasets import load_dataset

# Reuses `tokenizer` from the loading step above
dataset = load_dataset("json", data_files="your_data.jsonl", split="train")

def preprocess_function(examples):
    # "Score: 3" -> 3
    scores = [int(float(output.split(":")[1].strip())) for output in examples["output"]]
    # First line "Topic: Schlaf" -> "Schlaf"; "Unknown" if there is no topic line
    topics = [
        input_text.split("\n", 1)[0][len("Topic:"):].strip()
        if input_text.startswith("Topic:") else "Unknown"
        for input_text in examples["input"]
    ]
    encoded = tokenizer(examples["input"], truncation=True, padding="max_length", max_length=512)
    encoded["labels"] = scores
    encoded["Topic"] = topics
    return encoded

tokenized_dataset = dataset.map(preprocess_function, batched=True)
```
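
`preprocess_function` reads `input` and `output` fields rather than the per-utterance fields listed above. A minimal sketch of the grouping step, assuming one record per utterance with the fields `User ID`, `Speaker`, `Transcription`, `Topic`, and `Score`, and a single score per user and topic. The helper names `group_utterances` and `write_jsonl` are hypothetical, and the `Topic: <topic>` first line mirrors what the parser expects rather than a documented training format.

```python
import json
from collections import defaultdict


def group_utterances(records):
    """Group per-utterance records into one {"input", "output"} pair per (User ID, Topic)."""
    lines_by_key = defaultdict(list)
    score_by_key = {}
    for rec in records:
        key = (rec["User ID"], rec["Topic"])
        # Keep utterances in their original order within each user/topic pair
        lines_by_key[key].append(f"{rec['Speaker']}: {rec['Transcription']}")
        # Assumption: all utterances of a pair carry the same score (the last one wins)
        score_by_key[key] = rec["Score"]

    pairs = []
    for (user_id, topic), lines in lines_by_key.items():
        # First line "Topic: <topic>" is what preprocess_function parses
        text = "\n".join([f"Topic: {topic}"] + lines)
        pairs.append({"input": text, "output": f"Score: {score_by_key[(user_id, topic)]}"})
    return pairs


def write_jsonl(pairs, path):
    """Write one JSON object per line, keeping German umlauts readable."""
    with open(path, "w", encoding="utf-8") as f:
        for pair in pairs:
            f.write(json.dumps(pair, ensure_ascii=False) + "\n")
```

For example, `write_jsonl(group_utterances(records), "your_data.jsonl")` produces a file that the `load_dataset("json", ...)` call in the preprocessing snippet can read.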

---

## 🙏 Acknowledgements

Model trained and released by [Samantha Weber](https://github.com/webersamantha). The research was conducted as part of efforts to improve AI-driven mental health tools. Thanks to all clinicians and collaborators who contributed to the annotated MADRS dataset.

---

## 🧪 Citation

If you use this model, please cite:

> Weber, S. et al. (2025). "Using a Fine-tuned Large Language Model for Symptom-based Depression Evaluation." *Preprint*. https://doi.org/10.21203/rs.3.rs-6555767/v1