	---
	title: Hackthon2
	emoji: 🧠
	colorFrom: blue
	colorTo: purple
	sdk: streamlit
	app_file: app.py
	pinned: false
	---
	# MindSignal: Mental Health Support Triage Assistant

	MindSignal is a hackathon prototype that classifies short mental-health-related messages into three triage labels:

	- `informational`
	- `emotional_support_needed`
	- `escalation_required`

	## Problem Statement

	Support teams, community moderators, and wellness platforms often receive short messages that vary widely in urgency. MindSignal explores whether a lightweight NLP assistant can help triage these messages into informational requests, emotional support needs, and possible escalation cases.

	This project is a prototype only. It is not a medical diagnosis tool, therapist, crisis service, or emergency system.

	## Model Used

	The project uses the pre-trained Hugging Face model `distilbert-base-uncased`.

	DistilBERT is a smaller, faster distilled version of BERT. Its pre-training already encodes a broad range of English language structure, which makes it a good starting point for transfer learning on a small hackathon dataset.

	## Transfer Learning Approach

	`train.py` loads the pre-trained DistilBERT weights and adds a 3-class sequence classification head:

	- class 0: `informational`
	- class 1: `emotional_support_needed`
	- class 2: `escalation_required`
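
A minimal sketch of how the label mapping and classification head could be set up, assuming the `transformers` library's `AutoModelForSequenceClassification` is used. The helper name `build_model` and the constant names are illustrative and may not match what `train.py` actually does.

```python
# Label order follows the class indices listed above.
LABELS = ["informational", "emotional_support_needed", "escalation_required"]
ID2LABEL = {i: label for i, label in enumerate(LABELS)}
LABEL2ID = {label: i for i, label in ID2LABEL.items()}


def build_model(checkpoint: str = "distilbert-base-uncased"):
    """Load pre-trained weights and attach a freshly initialized 3-class head.

    Hypothetical helper: train.py may structure this differently.
    """
    # Imported lazily so the label maps can be reused without transformers installed.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(
        checkpoint,
        num_labels=len(LABELS),
        id2label=ID2LABEL,
        label2id=LABEL2ID,
    )
    return tokenizer, model
```

Passing `id2label` and `label2id` stores the readable label names in the saved model config, so downstream code does not need to hard-code the index order.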

	The model is then fine-tuned on rows where `split == "train"` in:

	```text
	data/mental_health_triage_synthetic_dataset.csv
	```
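
Selecting the training rows might look like the sketch below. Only the `split` column is named in this README; the `text` and `label` column names, the sample rows, and the `load_split` helper are assumptions made for illustration.

```python
import csv
import io

# Tiny in-memory stand-in for the CSV. The "text" and "label" column names
# are assumptions; only "split" is named in this README.
SAMPLE_CSV = """text,label,split
What are your opening hours?,informational,train
I feel so alone lately,emotional_support_needed,train
I do not feel okay tonight,escalation_required,test
idk how to cope rn,emotional_support_needed,stress_test
"""


def load_split(csv_file, split_name):
    """Return the rows whose `split` column equals split_name."""
    reader = csv.DictReader(csv_file)
    return [row for row in reader if row["split"] == split_name]


train_rows = load_split(io.StringIO(SAMPLE_CSV), "train")
print(len(train_rows))
```

With the real file, the same helper would be called on `open("data/mental_health_triage_synthetic_dataset.csv", newline="", encoding="utf-8")`.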

	The fine-tuned model is saved to:

	```text
	models/mindsignal-distilbert/
	```

	## Data Augmentation

	The training pipeline keeps the original examples and adds lightweight augmented copies using:

	- random lowercase conversion
	- small typo/noise injection
	- emoji insertion
	- slang phrase insertion
	- abbreviation examples such as `rn`, `ngl`, and `idk`

	These augmentations are intentionally simple and are used only on the training split.
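
The augmentation steps listed above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual pipeline: the emoji, slang, and abbreviation lists, the function names, and the choice of an adjacent-character swap as the "typo" are all hypothetical.

```python
import random

# Illustrative vocab; the real lists used by the pipeline are not shown in this README.
EMOJIS = ["😔", "😭", "💔"]
SLANG = ["ngl", "tbh", "lowkey"]
ABBREVIATIONS = {"right now": "rn", "i don't know": "idk", "not gonna lie": "ngl"}


def random_lowercase(text, rng):
    """Lowercase the whole message about half of the time."""
    return text.lower() if rng.random() < 0.5 else text


def inject_typo(text, rng):
    """Swap two adjacent characters at a random position."""
    if len(text) < 2:
        return text
    i = rng.randrange(len(text) - 1)
    return text[:i] + text[i + 1] + text[i] + text[i + 2:]


def insert_emoji(text, rng):
    return f"{text} {rng.choice(EMOJIS)}"


def insert_slang(text, rng):
    return f"{rng.choice(SLANG)} {text}"


def abbreviate(text, rng):
    """Lowercase the text and shorten known phrases (rng unused, kept for a shared signature)."""
    lowered = text.lower()
    for phrase, short in ABBREVIATIONS.items():
        lowered = lowered.replace(phrase, short)
    return lowered


AUGMENTERS = [random_lowercase, inject_typo, insert_emoji, insert_slang, abbreviate]


def augment_dataset(rows, copies_per_row=1, seed=0):
    """Keep every original (text, label) pair and append augmented copies."""
    rng = random.Random(seed)
    augmented = list(rows)
    for text, label in rows:
        for _ in range(copies_per_row):
            augmenter = rng.choice(AUGMENTERS)
            augmented.append((augmenter(text, rng), label))
    return augmented
```

Labels are copied unchanged, and seeding the generator keeps the augmented training set reproducible between runs.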

	## Safety Override

	Before model prediction, MindSignal checks for high-risk phrases such as:

	- `kill myself`
	- `end my life`
	- `not worth living`
	- `hurt myself`
	- `suicide`
	- `not safe`
	- `better off without me`
	- `can't keep myself safe`

	If one of these phrases is found, the prediction is immediately set to:

	```text
	escalation_required
	```

	This rule-based override is used in both `evaluate.py` and `app.py`.
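
A minimal sketch of such an override, assuming simple case-insensitive substring matching. The function names, the apostrophe and whitespace normalization, and the fixed confidence of `1.0` for overridden predictions are assumptions, not details confirmed by the source.

```python
HIGH_RISK_PHRASES = (
    "kill myself",
    "end my life",
    "not worth living",
    "hurt myself",
    "suicide",
    "not safe",
    "better off without me",
    "can't keep myself safe",
)


def normalize(text):
    """Lowercase, straighten curly apostrophes, and collapse whitespace."""
    return " ".join(text.lower().replace("\u2019", "'").split())


def contains_high_risk_phrase(text):
    normalized = normalize(text)
    return any(phrase in normalized for phrase in HIGH_RISK_PHRASES)


def triage(text, model_predict):
    """Apply the rule-based override first, then fall back to the model.

    `model_predict` is a hypothetical callable returning (label, confidence).
    """
    if contains_high_risk_phrase(text):
        # Assumed convention: overridden predictions report full confidence.
        return "escalation_required", 1.0
    return model_predict(text)
```

Because the check runs before the model, a listed phrase always escalates regardless of the classifier's output. Substring matching is deliberately broad, so it favors false positives over missed escalations.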

	## Evaluation Metrics

	`evaluate.py` evaluates the test split and stress-test split separately.

	It reports:

	- accuracy
	- macro F1
	- per-class precision, recall, and F1
	- `escalation_required` recall
	- confusion matrix
	- stress-test accuracy

	Outputs are saved to:

	```text
	results/evaluation_report.txt
	results/confusion_matrix.png
	```
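
To make the reported metrics concrete, here is a dependency-free sketch of how they are defined. `evaluate.py` may instead rely on scikit-learn functions such as `classification_report`, `f1_score(average="macro")`, and `confusion_matrix`; the function names below are illustrative.

```python
from collections import Counter

LABELS = ["informational", "emotional_support_needed", "escalation_required"]


def confusion_matrix(y_true, y_pred, labels=LABELS):
    """Rows are true labels, columns are predicted labels."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def per_class_metrics(y_true, y_pred, labels=LABELS):
    matrix = confusion_matrix(y_true, y_pred, labels)
    metrics = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        predicted = sum(row[i] for row in matrix)  # column total
        actual = sum(matrix[i])  # row total
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


def summarize(y_true, y_pred, labels=LABELS):
    """Accuracy, macro F1 (unweighted mean of per-class F1), and escalation recall."""
    metrics = per_class_metrics(y_true, y_pred, labels)
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return {
        "accuracy": correct / len(y_true),
        "macro_f1": sum(m["f1"] for m in metrics.values()) / len(labels),
        "escalation_recall": metrics["escalation_required"]["recall"],
        "per_class": metrics,
    }
```

Running `summarize` once on the test split and once on the stress-test split gives the two separate reports. `escalation_required` recall is tracked on its own because a missed escalation is the costliest error.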

	## Preliminary Results

	After training, run evaluation to generate preliminary results:

	```bash
	python evaluate.py
	```

	The generated report contains the current model's metrics on the `test` and `stress_test` splits. Because this repository includes only a very small synthetic starter dataset, the results are useful only as a smoke test. Replace the CSV with a larger, reviewed dataset to get meaningful performance estimates.

	## Project Structure

	```text
	.
	+-- app.py
	+-- evaluate.py
	+-- train.py
	+-- mindsignal_utils.py
	+-- requirements.txt
	+-- README.md
	+-- data/
	|   +-- mental_health_triage_synthetic_dataset.csv
	+-- models/
	|   +-- mindsignal-distilbert/
	+-- results/
	    +-- evaluation_report.txt
	    +-- confusion_matrix.png
	```

	## Setup

	Use Python 3.11.

	```bash
	python3.11 -m venv .venv
	source .venv/bin/activate
	pip install -r requirements.txt
	```

	## Run Training

	```bash
	python train.py
	```

	This fine-tunes DistilBERT and saves the trained model in `models/mindsignal-distilbert/`.

	## Run Evaluation

	```bash
	python evaluate.py
	```

	This writes `results/evaluation_report.txt` and `results/confusion_matrix.png`.

	## Run the Streamlit App

	```bash
	streamlit run app.py
	```

	The app provides:

	- a text area for a user message
	- a classify button
	- predicted label
	- confidence score
	- safety warning when `escalation_required` is predicted
	- a clear medical and emergency-service disclaimer
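
The sketch below shows one way the label and confidence score could be derived and displayed. It assumes the confidence is the top softmax probability over the model's three logits, which the source does not confirm; `summarize_prediction`, `render_app`, and the UI strings are hypothetical.

```python
import math

LABELS = ["informational", "emotional_support_needed", "escalation_required"]


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def summarize_prediction(logits):
    """Return (label, confidence), where confidence is the top softmax probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


def render_app(classify):
    """Hypothetical Streamlit layout; `classify` maps text to (label, confidence)."""
    import streamlit as st  # imported lazily so the helpers above work without it

    st.title("MindSignal")
    st.caption("Prototype only. Not a medical, crisis, or emergency service.")
    message = st.text_area("Message")
    if st.button("Classify") and message.strip():
        label, confidence = classify(message)
        st.write(f"Predicted label: {label}")
        st.write(f"Confidence: {confidence:.2f}")
        if label == "escalation_required":
            st.warning("This message may need urgent human review.")
```

A softmax probability is not a calibrated estimate of correctness, so the displayed score should be read as a relative signal only.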

	## Limitations

	- The included dataset is synthetic and small.
	- The classifier should not be used as a standalone safety system.
	- High-risk language can be indirect, sarcastic, multilingual, or misspelled.
	- False negatives are especially serious for escalation cases.
	- Human review and professional crisis workflows are required for real deployments.
	- More diverse data, expert labeling, calibration, fairness testing, and red-team evaluation are needed before real-world use.

	## Attribution

	This project was developed with assistance from OpenAI ChatGPT for software engineering support, code generation, testing, and documentation. All generated content was reviewed, modified, and validated by the author.

	Additional references:
	- PyTorch Documentation: https://pytorch.org/docs/stable/
	- scikit-learn Documentation: https://scikit-learn.org/stable/