---
tags:
- text-classification
- mental-health
- transformers
- pytorch
- huggingface
---

# mindBERT - Mental Health Text Classification



## Model Description

**mindBERT** is a fine-tuned **BERT-based** model for **mental health text classification**. It assigns a text to one of five categories: **stress, depression, bipolar disorder, personality disorder, or anxiety**. The model was trained on **real-world mental health discussions** from Reddit and reached 93.39% accuracy in its fifth and final training epoch (see the Training Summary table).

**Try the Interactive UI**: [Hugging Face Spaces](https://huggingface.co/spaces/DrSyedFaizan/mindBERT)

---

## Training and Evaluation

### **Training Loss & Learning Rate**



### **Training Summary**

| Epoch | Training Loss | Validation Loss | Accuracy |
|-------|---------------|-----------------|----------|
| 1     | 0.359400      | 0.285864        | 89.61%   |
| 2     | 0.210500      | 0.224632        | 92.03%   |
| 3     | 0.177800      | 0.217146        | 92.83%   |
| 4     | 0.089200      | 0.249640        | 93.23%   |
| 5     | 0.087600      | 0.282782        | 93.39%   |

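
The two columns used to judge the model do not agree on the best epoch: validation loss is lowest at epoch 3, while accuracy keeps rising through epoch 5. Because the training configuration in this card sets `load_best_model_at_end=True` with `metric_for_best_model="eval_loss"`, the checkpoint restored at the end of training would presumably be the epoch 3 one. The sketch below reads both choices off the table values. The `history` list and the `best_epoch` helper are illustrative names, not part of the original training code.

```python
# Per-epoch values copied from the Training Summary table.
history = [
    {"epoch": 1, "train_loss": 0.3594, "val_loss": 0.285864, "accuracy": 0.8961},
    {"epoch": 2, "train_loss": 0.2105, "val_loss": 0.224632, "accuracy": 0.9203},
    {"epoch": 3, "train_loss": 0.1778, "val_loss": 0.217146, "accuracy": 0.9283},
    {"epoch": 4, "train_loss": 0.0892, "val_loss": 0.249640, "accuracy": 0.9323},
    {"epoch": 5, "train_loss": 0.0876, "val_loss": 0.282782, "accuracy": 0.9339},
]


def best_epoch(history, metric, greater_is_better):
    """Return the row that is best under the given metric."""
    pick = max if greater_is_better else min
    return pick(history, key=lambda row: row[metric])


by_loss = best_epoch(history, "val_loss", greater_is_better=False)
by_acc = best_epoch(history, "accuracy", greater_is_better=True)

print(f"Lowest validation loss: epoch {by_loss['epoch']}")
print(f"Highest accuracy: epoch {by_acc['epoch']}")
```

Validation loss rising after epoch 3 while training loss keeps falling is commonly read as the onset of overfitting, which is one reason to select checkpoints on validation loss rather than on the last epoch.
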
### **Confusion Matrix**



### **Dataset Label Distribution**



### **Evaluation Metrics (Loss & Accuracy)**



### **Full Weights & Biases Evaluation**

[View Detailed W&B Logs](https://wandb.ai/drsyedfaizan1987-northeastern-university/huggingface/runs/f3w7nhbd?nw=nwuserdrsyedfaizan1987)

---

## How to Use

To use this model for inference:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub
model_name = "DrSyedFaizan/mindBERT"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

text = "I feel so anxious and stressed all the time."
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)

# Run a forward pass without tracking gradients
with torch.no_grad():
    logits = model(**inputs).logits
prediction = torch.argmax(logits, dim=1).item()

# Label order as listed in this card: index i corresponds to class id i
labels = ["Stress", "Depression", "Bipolar", "Personality Disorder", "Anxiety"]
print(f"Predicted Category: {labels[prediction]}")
```
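
`argmax` returns only the top class. To see how the model spreads its confidence across all five categories, the logits can be passed through a softmax. The sketch below does this in plain Python on a list of logits, for example `logits[0].tolist()` from the snippet above. The helper name `rank_labels` and the example logit values are made up for illustration and are not real model output.

```python
import math

labels = ["Stress", "Depression", "Bipolar", "Personality Disorder", "Anxiety"]


def rank_labels(logits, labels):
    """Convert raw logits to softmax probabilities, sorted from most to least likely."""
    if len(logits) != len(labels):
        raise ValueError("expected one logit per label")
    # Subtract the maximum logit before exponentiating for numerical stability
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)


# Hypothetical logits, standing in for `logits[0].tolist()`
example_logits = [1.2, -0.3, -1.5, -0.8, 2.1]
for label, prob in rank_labels(example_logits, labels):
    print(f"{label:<22}{prob:.3f}")
```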

---

## Training Parameters

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",             # Output directory
    evaluation_strategy="epoch",        # Evaluate once per epoch (`eval_strategy` in newer transformers releases)
    save_strategy="epoch",              # Save a checkpoint at each epoch
    learning_rate=2e-5,                 # Peak learning rate
    per_device_train_batch_size=16,     # Training batch size per device
    per_device_eval_batch_size=16,      # Evaluation batch size per device
    num_train_epochs=5,                 # Number of training epochs
    weight_decay=0.01,                  # Weight decay
    logging_steps=10,                   # Log every 10 steps
    lr_scheduler_type="linear",         # Linear learning rate schedule
    warmup_steps=500,                   # Warmup steps
    load_best_model_at_end=True,        # Reload the best checkpoint after training
    metric_for_best_model="eval_loss",  # "Best" means lowest evaluation loss
    save_total_limit=3,                 # Keep at most 3 checkpoints
    gradient_accumulation_steps=2,      # Accumulate 2 batches per update (effective batch size 32 per device)
    report_to="wandb",                  # Log to Weights & Biases
)
```
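
Two of these settings interact in ways that are easy to miss. With `gradient_accumulation_steps=2`, each optimizer update sees 16 × 2 = 32 examples per device. With `lr_scheduler_type="linear"` and `warmup_steps=500`, the learning rate ramps from 0 to 2e-5 over the first 500 optimizer steps and then decays linearly to 0. The sketch below reproduces that shape in plain Python rather than calling the library. The total number of optimizer steps depends on the dataset size and device count, which this card does not state, so `TOTAL_STEPS` and `num_devices=1` are placeholder assumptions.

```python
def effective_batch_size(per_device_batch_size, grad_accum_steps, num_devices=1):
    """Examples contributing to one optimizer update (num_devices=1 is an assumption)."""
    return per_device_batch_size * grad_accum_steps * num_devices


def linear_lr(step, total_steps, base_lr=2e-5, warmup_steps=500):
    """Linear warmup from 0 to base_lr, then linear decay to 0 at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


TOTAL_STEPS = 5000  # placeholder value: the real step count is not given in this card

print("effective batch size:", effective_batch_size(16, 2))
for step in (0, 250, 500, 2750, 5000):
    print(f"step {step:>4}: lr = {linear_lr(step, TOTAL_STEPS):.2e}")
```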

---

## Future Improvements

- Train on larger datasets such as **CLPsych** and **eRisk**.
- Expand the label set for broader **mental health classification**.
- Deploy as an **API** for real-world use cases.

**mindBERT - Advancing AI for Mental Health Research!**