---
language:
- en
license: apache-2.0
base_model: distilbert/distilbert-base-uncased
tags:
- text-classification
- guardrail
- safety
- cbt
- mental-health
- academic-stress
pipeline_tag: text-classification
---

# ReframeBot-Guardrail-DistilBERT

A 3-class DistilBERT classifier for routing ReframeBot user turns:

| Label | Meaning |
|---|---|
| `TASK_1` | CBT / academic stress |
| `TASK_2` | Crisis / self-harm signal |
| `TASK_3` | Out-of-scope |

This version was retrained on `data/guardrail_dataset_clean.jsonl`, which merges the original guardrail data with curated hard cases for CBT/Crisis boundaries, Vietnamese text, pills/overdose language, and OOS work/mental health informational prompts.

## Current System Threshold

The ReframeBot runtime uses the classifier's full probability vector and routes to `TASK_2` when:

```text
P(TASK_2) >= 0.10
```

after academic-context/follow-up overrides and after the regex + semantic crisis detector has already run.

## Evaluation

Hard out-of-domain eval set (`data/evaluation_test_data.json`, 60 samples):

| Mode | Accuracy | TASK_2 Precision | TASK_2 Recall | TASK_2 F1 |
|---|---:|---:|---:|---:|
| Argmax only | 0.9667 | 1.0000 | 0.9048 | 0.9500 |
| Tuned `P(TASK_2) >= 0.10` | 0.9833 | 0.9545 | 1.0000 | 0.9767 |

Threshold sweep artifact in the project repo:

- `reports/guardrail_threshold_sweep.csv`
- `reports/guardrail_threshold_sweep.png`

## Usage

```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="Nhatminh1234/ReframeBot-Guardrail-DistilBERT",
    revision="v2-guardrail-clean",
)

classifier("I'm stressed about my final exam")
```

For full class probabilities:

```python
classifier("I bought pills to overdose", top_k=None)
```

## Safety Note

This classifier is a routing component, not a standalone crisis intervention system. ReframeBot also uses regex + semantic crisis detection and crisis response handling around this model.
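
## Example: Applying the Threshold

The rule from "Current System Threshold" can be approximated outside the ReframeBot runtime with a few lines of Python. This is a minimal sketch, not the project's actual routing code. The helper name `route_turn` is hypothetical, and it assumes the pipeline returns one `{"label", "score"}` dict per class with the labels `TASK_1`, `TASK_2`, and `TASK_3` (the exact nesting of the `top_k=None` output can vary across `transformers` versions). It does not reproduce the academic-context/follow-up overrides or the regex + semantic crisis detector that run before the threshold in the real system.

```python
from typing import Dict, List, Union

CRISIS_LABEL = "TASK_2"
CRISIS_THRESHOLD = 0.10  # value stated in this card


def route_turn(
    scores: List[Dict[str, Union[str, float]]],
    threshold: float = CRISIS_THRESHOLD,
) -> str:
    """Pick a route from per-class scores (hypothetical helper).

    `scores` is assumed to look like the output of
    `classifier(text, top_k=None)`: one dict per class with
    "label" and "score" keys.
    """
    probs = {item["label"]: float(item["score"]) for item in scores}

    # Crisis routing takes priority whenever P(TASK_2) reaches the threshold,
    # even if another class has the highest probability.
    if probs.get(CRISIS_LABEL, 0.0) >= threshold:
        return CRISIS_LABEL

    # Otherwise fall back to the most probable class (argmax).
    return max(probs, key=probs.get)
```

With the `classifier` from the Usage section, this could be called as `route_turn(classifier(text, top_k=None))`. Setting the threshold well below 0.5 trades some `TASK_2` precision for recall, which matches the direction of the two rows in the evaluation table.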