---
language: ["en"]
license: mit
tags:
- mental-health
- bert
- text-classification
- nlp
- transformers
- explainable-ai
- shap
datasets:
- kaggle-sentiment-analysis-mental-health
metrics:
- accuracy
- f1
---

# 🧠 MindSense-BERT: AI-Powered Mental Health Detection

---
MindSense-BERT is a fine-tuned `bert-base-uncased` model that **classifies text into 7 mental health categories**, trained on real-world Reddit data.
It combines **BERT fine-tuning, class imbalance handling, and explainable AI (SHAP)** to make mental health text analysis more transparent and practical.
|
|
| --- |
|
|
## Model Overview


* **Model Type:** Transformer (BERT)
* **Base Model:** `bert-base-uncased`
* **Task:** Multi-class Text Classification
* **Classes (7):**
  * Normal
  * Depression
  * Anxiety
  * Bipolar
  * PTSD
  * Stress
  * Personality Disorder
* **Framework:** PyTorch + Hugging Face Transformers
* **Deployment:** Streamlit + Hugging Face Hub
|
|
| --- |
|
|
## 🎯 Problem Statement
|
|
Mental health conditions are commonly estimated to affect **1 in 4 people globally** at some point in their lives, but early detection is difficult due to stigma and lack of awareness.
|
|
| This model aims to: |
|
|
| * Analyze user-written text |
| * Detect potential mental health conditions |
| * Enable early awareness and intervention |
|
|
| --- |
|
|
## Dataset


* **Source:** Kaggle, *Sentiment Analysis for Mental Health*
| * **Size:** 53,000+ Reddit posts |
| * **Input:** `statement` (text) |
| * **Label:** `status` (mental health category) |
|
|
| ### Categories: |
|
|
| * Normal |
| * Depression |
| * Anxiety |
| * Bipolar |
| * PTSD |
| * Stress |
| * Personality Disorder |
|
|
| --- |
|
|
## ⚙️ Preprocessing
|
|
| * Removed URLs, special characters, duplicates |
| * Handled missing values |
| * Tokenized using BERT tokenizer |
| * Lowercasing (uncased model) |
| * Padding & truncation applied |
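
The cleaning steps above could look roughly like the sketch below. The regular expressions, the set of characters treated as "special", and the helper names `clean_text` and `clean_corpus` are assumptions for illustration; the card does not publish the exact rules. Tokenization, padding, and truncation are left to the BERT tokenizer.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Assumption: keep letters, digits, whitespace and basic punctuation; drop the rest.
SPECIAL_RE = re.compile(r"[^a-z0-9\s.,!?']")
WS_RE = re.compile(r"\s+")


def clean_text(text: str) -> str:
    """Lowercase, strip URLs and special characters, collapse whitespace."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = SPECIAL_RE.sub(" ", text)
    return WS_RE.sub(" ", text).strip()


def clean_corpus(rows):
    """Drop rows with missing values and exact duplicates (compared after cleaning).

    `rows` is an iterable of (statement, status) pairs.
    """
    seen, out = set(), []
    for statement, status in rows:
        if not statement or not status:
            continue  # missing value: skip the row
        cleaned = clean_text(statement)
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            out.append((cleaned, status))
    return out
```

Deduplicating after cleaning means two posts that differ only in casing or an appended link count as the same statement.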
|
|
| --- |
|
|
## 🔬 Training Methodology


### Phase 1: Baseline Models
|
|
| * TF-IDF (10k features, uni + bi-grams) |
| * Logistic Regression, Random Forest, SVM |
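
A baseline of this kind can be sketched with scikit-learn as follows. Only the TF-IDF settings (10k features, unigrams and bigrams) come from this card; `build_baseline`, `max_iter=1000`, and the four toy sentences are assumptions for illustration, and the real inputs would be the `statement` and `status` columns.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def build_baseline():
    """TF-IDF (10k features, uni + bi-grams) feeding a logistic regression classifier."""
    return make_pipeline(
        TfidfVectorizer(max_features=10_000, ngram_range=(1, 2)),
        LogisticRegression(max_iter=1000),  # assumed setting, not from the card
    )


# Toy data for illustration only.
texts = [
    "i feel hopeless and empty",
    "i feel fine today",
    "so hopeless lately",
    "today was fine",
]
labels = ["Depression", "Normal", "Depression", "Normal"]

baseline = build_baseline().fit(texts, labels)
prediction = baseline.predict(["hopeless and empty"])[0]
print(prediction)
```

Swapping `LogisticRegression` for `RandomForestClassifier` or `LinearSVC` in the same pipeline would give the other two baselines.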
|
|
### Phase 2: BERT Fine-tuning
|
|
| * Pretrained `bert-base-uncased` |
| * 3 epochs |
| * Learning rate: `1e-5` |
| * Batch size: `16` |
| * Trained on Google Colab T4 GPU |
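
For a rough sense of scale, the hyperparameters above can be collected in a small config object and used to estimate how many optimizer updates the run involves. The `FineTuneConfig` class, the 80/20 train split, and the absence of gradient accumulation are assumptions for illustration; the card states none of them.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Hyperparameters as listed in this card."""
    base_model: str = "bert-base-uncased"
    epochs: int = 3
    learning_rate: float = 1e-5
    batch_size: int = 16

    def optimizer_steps(self, n_train: int) -> int:
        """Total updates, assuming one update per batch (no gradient accumulation)."""
        return math.ceil(n_train / self.batch_size) * self.epochs


config = FineTuneConfig()
# Assumption: 80% of the ~53,000 posts are used for training.
n_train = 53_000 * 8 // 10
total_steps = config.optimizer_steps(n_train)
print(f"{n_train} training posts -> {total_steps} optimizer steps")
```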
|
|
| --- |
|
|
### Phase 3: Class Imbalance Handling


* Weak classes identified: Stress, Personality Disorder
* Techniques used:
  * Word swap
  * Random deletion
  * Key phrase duplication
* Applied custom class weights
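
The sketch below illustrates two of the augmentation techniques and one common way to derive class weights. It is not the training code behind this model: the function names, the deletion probability, and the inverse-frequency weighting formula are assumptions, and key phrase duplication is left out because the card does not say how key phrases were selected.

```python
import random
from collections import Counter


def word_swap(words, rng):
    """Swap two randomly chosen positions (no-op for fewer than two words)."""
    words = list(words)
    if len(words) >= 2:
        i, j = rng.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return words


def random_deletion(words, rng, p=0.1):
    """Drop each word with probability p, always keeping at least one word."""
    words = list(words)
    if not words:
        return []
    kept = [w for w in words if rng.random() >= p]
    return kept or [rng.choice(words)]


def class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * count).

    Rare classes get weights above 1, frequent classes below 1.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {label: n / (k * c) for label, c in counts.items()}


rng = random.Random(0)  # seeded so the augmentation is reproducible
words = "i feel so stressed today".split()
print(word_swap(words, rng))
print(random_deletion(words, rng, p=0.3))
print(class_weights(["Normal"] * 8 + ["Stress"] * 2))
```

Weights of this form are typically passed to the loss function (for example as the `weight` tensor of `torch.nn.CrossEntropyLoss`) so that mistakes on minority classes cost more.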
|
|
| --- |
|
|
## Model Performance
|
|
| ### Accuracy Comparison |
|
|
| | Model | Accuracy | |
| | ------------------- | ----------- | |
| | Logistic Regression | ~78% | |
| | Random Forest | ~74% | |
| | SVM | ~82% | |
| | BERT (initial) | ~83% | |
| **BERT (final)**    | **~87–89%** |
|
|
| --- |
|
|
| ### Per-Class F1 Score |
|
|
| | Category | F1 Score | |
| | -------------------- | -------- | |
| | Normal | 0.91 | |
| | Depression | 0.89 | |
| | Anxiety | 0.86 | |
| | Bipolar | 0.84 | |
| | PTSD | 0.87 | |
| | Stress | 0.82 | |
| | Personality Disorder | 0.80 | |
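
The table lists per-class scores only. As a small worked example, the unweighted (macro) average can be derived from those seven numbers; it comes to roughly 0.86. The values below are copied from the table, and nothing else is assumed.

```python
# Per-class F1 scores copied from the table above.
per_class_f1 = {
    "Normal": 0.91,
    "Depression": 0.89,
    "Anxiety": 0.86,
    "Bipolar": 0.84,
    "PTSD": 0.87,
    "Stress": 0.82,
    "Personality Disorder": 0.80,
}

# Macro-F1: plain mean over classes, so every class counts equally
# regardless of how many posts it has.
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)
weakest = min(per_class_f1, key=per_class_f1.get)

print(f"macro-F1 = {macro_f1:.3f}, weakest class = {weakest}")
```

The two lowest scores belong to Stress and Personality Disorder, the same classes targeted by the imbalance handling.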
|
|
| --- |
|
|
## Explainability (SHAP)


This model integrates **SHAP (SHapley Additive exPlanations)** for transparency:


* 🔴 Red words: push the prediction toward a class
* 🔵 Blue words: push the prediction away from it


This improves trust and interpretability, which is critical in healthcare AI.
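
To make the red/blue idea concrete, here is a toy computation of exact Shapley values over the words of a sentence. It does not use the `shap` library or the real model: `TOY_WEIGHTS` and `toy_score` are hypothetical stand-ins for the classifier's score for one class, and the example assumes the words in the sentence are distinct.

```python
from itertools import combinations
from math import factorial

# Hypothetical word weights standing in for the model's "Depression" score.
TOY_WEIGHTS = {"hopeless": 2.0, "empty": 1.0, "fine": -1.5}


def toy_score(words):
    """Toy stand-in for a model logit: sum of per-word weights (unknown words count 0)."""
    return sum(TOY_WEIGHTS.get(w, 0.0) for w in words)


def shapley_values(words, score):
    """Exact Shapley value per word: weighted average of its marginal
    contribution over all subsets of the other words."""
    n = len(words)
    values = {}
    for i, word in enumerate(words):
        others = [w for j, w in enumerate(words) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_word = score(list(subset) + [word])
                without_word = score(list(subset))
                total += weight * (with_word - without_word)
        values[word] = total
    return values


attributions = shapley_values(["i", "feel", "hopeless", "but", "fine"], toy_score)
for word, value in attributions.items():
    colour = "red" if value > 0 else "blue" if value < 0 else "neutral"
    print(f"{word:>9}: {value:+.2f} ({colour})")
```

A positive value corresponds to a red word and a negative value to a blue one. Exact enumeration grows exponentially with sentence length, which is why practical tools approximate these values instead.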
|
|
| --- |
|
|
## Usage


```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "maitry30/mindsense-bert"

# Load the fine-tuned tokenizer and classifier from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

text = "I feel completely hopeless and empty."

inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
with torch.no_grad():  # inference only, no gradients needed
    outputs = model(**inputs)

# Index of the highest-scoring class
prediction = torch.argmax(outputs.logits, dim=1).item()
# Shows the class name if id2label was saved with the model, otherwise a generic LABEL_<n>
print(prediction, model.config.id2label[prediction])
```
|
|
| --- |
|
|
## 🧪 Example Predictions
|
|
| | Input Text | Predicted Class | |
| | --------------------------- | --------------- | |
| | "I feel hopeless and empty" | Depression | |
| | "I keep having nightmares" | PTSD | |
| | "My mood swings a lot" | Bipolar | |
| | "I feel fine today" | Normal | |
|
|
| --- |
|
|
## Tech Stack
|
|
| * Python 3.10 |
| * PyTorch |
| * Hugging Face Transformers |
| * Scikit-learn |
| * SHAP |
| * Pandas, NumPy |
| * Matplotlib, Seaborn |
| * Streamlit (UI) |
| * Hugging Face Hub (model hosting) |
|
|
| --- |
|
|
## ⚠️ Limitations
|
|
| * Limited to English text |
| * May struggle with sarcasm or slang |
| * Depends on dataset quality |
| * Not suitable for real clinical diagnosis |
|
|
| --- |
|
|
## ⚖️ Ethical Considerations
|
|
| * This model is for **educational purposes only** |
| * Not a replacement for mental health professionals |
| * Should not be used for medical decisions |
| * Predictions require human interpretation |
|
|
| --- |
|
|
## Safety Notice


* ❌ Not a diagnostic tool
* ❌ Not for emergency use
* ✅ Can assist awareness and research
|
|
| --- |
|
|
## Future Work
|
|
| * Multilingual support (Hindi + regional languages) |
| * Voice-based mental health detection |
| * Multi-modal AI (text + physiological signals) |
| * Explainable AI improvements |
| * Production deployment (AWS/Azure) |
|
|
| --- |
|
|
## 👤 Author
|
|
| **Maitry** |
|
|
| * GitHub: https://github.com/Maitry09/mindsense-mental-health |
| * Hugging Face: https://huggingface.co/maitry30 |
| * Live App: https://mindsense.streamlit.app |
|
|
| --- |
|
|
## License
|
|
| MIT License |
|
|
| --- |
|
|
> ⭐ If you found this model useful, consider giving the project a star!
|
|