---
license: cc-by-nc-4.0
language:
- de
base_model:
- google-bert/bert-base-german-cased
pipeline_tag: text-classification
tags:
- depression
- mental-health
- MADRS
- clinical
- interview
---

# MADRS-BERT

**MADRS-BERT** is a fine-tuned `bert-base-german-cased` model that predicts depression severity scores (0–6) for the individual items of the [Montgomery–Åsberg Depression Rating Scale (MADRS)](https://en.wikipedia.org/wiki/MADRS). Each prediction is based on a transcribed, structured clinician–patient interview segment.

- **Publication**: [https://www.nature.com/articles/s41746-025-01982-8#Sec8](https://www.nature.com/articles/s41746-025-01982-8#Sec8)
- **Example dataset**: synthetic example data in the `data/` folder of the [MADRS-BERT repository](https://github.com/webersamantha/MADRS-BERT)
- **GitHub repo**: the code for data curation, fine-tuning, and evaluation is available at [https://github.com/webersamantha/MADRS-BERT](https://github.com/webersamantha/MADRS-BERT)

This model was developed to support standardized, scalable mental health assessments in both clinical and low-resource settings.

## Model Details

- **Base model**: `bert-base-german-cased`
- **Task**: Ordinal regression (scores 0–6)
- **Language**: German
- **Input**: Text (dialogue segments grouped by MADRS topic)
- **Output**: Predicted score for each MADRS item (rounded integer, 0–6)
- **Training data**: Mix of real and synthetic clinician–patient interviews (MADRS-structured)
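
Because the model is trained as an ordinal regression with a single continuous output, that output has to be mapped to a discrete MADRS item score. A minimal sketch of this mapping (the helper name is illustrative, not part of the released API):

```python
import torch

def logit_to_madrs_score(logit: torch.Tensor) -> int:
    # Round the continuous regression output, then clamp to the valid 0-6 item range.
    return int(torch.round(logit).clamp(0, 6).item())

print(logit_to_madrs_score(torch.tensor(4.3)))   # 4
print(logit_to_madrs_score(torch.tensor(7.2)))   # clamped to 6
print(logit_to_madrs_score(torch.tensor(-0.4)))  # clamped to 0
```

Clamping guards against regression outputs drifting outside the scale; the same round-and-clamp step appears in the inference code below.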

## Intended Use

This model is intended for research and development use. It is not a certified medical device. The goals are to:

- Explore AI-assisted symptom severity assessment
- Enable structured evaluation of individual MADRS items
- Support clinicians and researchers working in psychiatry and mental health

---

## 🚀 How to Use

### Preprocess the data file

Organize your data like the synthetic example data, with the columns Subject, Speaker, Transcription, Topic, and Score.
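
As an illustration, the expected layout can be sketched as a small synthetic DataFrame (the rows, topic label, and scores below are invented for demonstration):

```python
import pandas as pd

# Synthetic rows illustrating the expected columns; content is invented.
df = pd.DataFrame({
    "Subject": ["S01", "S01", "S01"],
    "Speaker": ["Clinician", "Patient", "Clinician"],
    "Transcription": [
        "Wie war Ihre Stimmung in der letzten Woche?",
        "Eher gedrückt, vor allem morgens.",
        "Danke, das hilft mir weiter.",
    ],
    "Topic": ["Reported sadness"] * 3,
    "Score": [2, 2, 2],
})
print(list(df.columns))
# Saving with df.to_excel("example_interview.xlsx", index=False) produces a file
# the loader below can read (writing .xlsx requires the openpyxl package).
```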

```python
import pandas as pd

def load_and_prepare_conversations(filepath):
    """Read an interview spreadsheet and build one dialogue string per MADRS topic."""
    df = pd.read_excel(filepath)
    conversations = []

    for topic in df['Topic'].unique():
        topic_df = df[df['Topic'] == topic]
        if topic_df.empty:
            continue

        # Concatenate the turns of this topic into a single "Speaker: text" dialogue.
        dialogue = "\n".join(
            f"{row['Speaker']}: {row['Transcription']}"
            for _, row in topic_df.iterrows()
            if pd.notnull(row['Transcription'])
        )

        conversations.append((topic, dialogue))
    return conversations
```

### Load model and tokenizer

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "webesama/MADRS-BERT"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval().to("cuda" if torch.cuda.is_available() else "cpu")
```

### Run inference on a full structured interview

Assuming you have prepared the per-topic conversation segments as above, predict a score for each MADRS item:

```python
def predict_madrs_scores(conversations, tokenizer, model):
    device = model.device
    predictions = {}

    for topic, dialogue in conversations:
        inputs = tokenizer(dialogue, truncation=True, padding="max_length", max_length=512, return_tensors="pt").to(device)
        with torch.no_grad():
            logits = model(**inputs).logits
        # Round the regression output and clamp it to the valid 0-6 item range.
        predictions[topic] = int(torch.round(logits).clamp(0, 6).item())

    return predictions

file_path = "example_interview.xlsx"
conversations = load_and_prepare_conversations(file_path)
scores = predict_madrs_scores(conversations, tokenizer, model)
print(scores)
```

---

## Acknowledgements

Model trained and released by [Samantha Weber](https://github.com/webersamantha) within the framework of the [Multicast project on predicting and treating suicidality](https://www.multicast.uzh.ch/en.html). This research is part of broader efforts to improve AI-driven mental health tools. Thanks to all clinicians and collaborators who contributed to the annotated MADRS dataset.

## Evaluation

The model was evaluated on a held-out clinical validation set and achieved strong performance under both strict scoring and a flexible criterion that tolerates a ±1 deviation from the clinician's rating. See the publication for full metrics.
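
The strict and ±1-tolerant criteria can be sketched as follows (the predictions and labels below are invented for illustration; see the publication for the actual evaluation):

```python
def accuracy(pred, true, tolerance=0):
    # Fraction of items whose predicted score is within `tolerance` of the clinician score.
    hits = sum(abs(p - t) <= tolerance for p, t in zip(pred, true))
    return hits / len(true)

# Hypothetical per-item predictions vs. clinician ratings.
pred = [2, 3, 5, 1, 4]
true = [2, 4, 5, 0, 4]
print(accuracy(pred, true))               # strict: 0.6
print(accuracy(pred, true, tolerance=1))  # within +/-1: 1.0
```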

## Citation

If you use this model, please cite:

> Weber, S., et al. (2025). "Using a Fine-tuned Large Language Model for Symptom-based Depression Evaluation." DOI: [10.1038/s41746-025-01982-8](https://doi.org/10.1038/s41746-025-01982-8)