---
language:
- en
license: apache-2.0
base_model: distilbert/distilbert-base-uncased
tags:
- text-classification
- guardrail
- safety
- cbt
- mental-health
- academic-stress
pipeline_tag: text-classification
---
# ReframeBot-Guardrail-DistilBERT
A 3-class DistilBERT classifier for routing ReframeBot user turns:
| Label | Meaning |
|---|---|
| `TASK_1` | CBT / academic stress |
| `TASK_2` | Crisis / self-harm signal |
| `TASK_3` | Out-of-scope |
This version was retrained on `data/guardrail_dataset_clean.jsonl`, which
merges the original guardrail data with curated hard cases: CBT/Crisis
boundary examples, Vietnamese text, pills/overdose language, and
out-of-scope (OOS) informational prompts about work and mental health.
## Current System Threshold
The ReframeBot runtime uses the classifier's full probability vector and
routes to `TASK_2` when:
```text
P(TASK_2) >= 0.10
```
This check is applied after the academic-context and follow-up overrides, and
after the regex + semantic crisis detector has already run.
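The threshold rule above can be sketched as a small routing function. This is a minimal illustration and not the ReframeBot runtime code: the name `route`, the dictionary input format, and the argmax fallback for turns below the threshold are assumptions, and the overrides and the regex + semantic detector are deliberately left out.

```python
from typing import Mapping

CRISIS_LABEL = "TASK_2"
CRISIS_THRESHOLD = 0.10  # threshold value stated in this card


def route(probs: Mapping[str, float], threshold: float = CRISIS_THRESHOLD) -> str:
    """Pick a routing label from a full class-probability vector.

    Hypothetical helper: `probs` maps each label name to its probability.
    """
    # Route to the crisis path whenever P(TASK_2) reaches the threshold,
    # even if another class has the highest probability.
    if probs.get(CRISIS_LABEL, 0.0) >= threshold:
        return CRISIS_LABEL
    # Assumption: below the threshold, fall back to the most probable class.
    return max(probs, key=lambda label: probs[label])
```

With this rule, a turn scored as `{"TASK_1": 0.85, "TASK_2": 0.12, "TASK_3": 0.03}` is routed to `TASK_2` even though argmax alone would pick `TASK_1`, which is the recall-over-precision trade-off the threshold is meant to make.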
## Evaluation
Hard out-of-domain eval set (`data/evaluation_test_data.json`, 60 samples):
| Mode | Accuracy | TASK_2 Precision | TASK_2 Recall | TASK_2 F1 |
|---|---:|---:|---:|---:|
| Argmax only | 0.9667 | 1.0000 | 0.9048 | 0.9500 |
| Tuned `P(TASK_2) >= 0.10` | 0.9833 | 0.9545 | 1.0000 | 0.9767 |
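
As a sanity check on the table, the `TASK_2` metrics can be recomputed from confusion counts. The counts below are not stated in this card; they are inferred from the reported rates and the 60-sample set size (19 true positives and 2 misses for argmax, 21 true positives and 1 false positive for the tuned threshold), so treat them as an assumption. The helper name `precision_recall_f1` is also hypothetical.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 for one class from raw counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Inferred counts (assumption, see the note above).
argmax_metrics = precision_recall_f1(tp=19, fp=0, fn=2)
tuned_metrics = precision_recall_f1(tp=21, fp=1, fn=0)

print([round(m, 4) for m in argmax_metrics])
print([round(m, 4) for m in tuned_metrics])
```

Under these inferred counts, lowering the threshold recovers both missed crisis turns at the cost of one false positive, which matches the accuracy change from 58/60 to 59/60.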
Threshold sweep artifacts in the project repo:
- `reports/guardrail_threshold_sweep.csv`
- `reports/guardrail_threshold_sweep.png`
## Usage
```python
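from transformers import pipeline

# Load the classifier pinned to the v2-guardrail-clean revision.
classifier = pipeline(
    "text-classification",
    model="Nhatminh1234/ReframeBot-Guardrail-DistilBERT",
    revision="v2-guardrail-clean",
)

# By default, only the top label and its score are returned.
classifier("I'm stressed about my final exam")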
```
For full class probabilities:
```python
classifier("I bought pills to overdose", top_k=None)
```
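To read `P(TASK_2)` out of the full-probability output, the list of label/score entries can be turned into a dictionary. This is a minimal sketch: `to_prob_dict` is a hypothetical helper, it assumes the model's label names are the `TASK_1`/`TASK_2`/`TASK_3` strings from the table above, and it tolerates the extra outer list that some `transformers` versions wrap around single-input results.

```python
from typing import Any, Dict


def to_prob_dict(output: Any) -> Dict[str, float]:
    """Convert pipeline output from top_k=None into {label: probability}."""
    # Some versions return [[{...}, {...}, {...}]] for a single input;
    # unwrap the outer list if present.
    if output and isinstance(output[0], list):
        output = output[0]
    return {item["label"]: float(item["score"]) for item in output}


# Example with a hand-written stand-in for real pipeline output
# (illustrative scores, not produced by the model).
example_output = [
    {"label": "TASK_2", "score": 0.91},
    {"label": "TASK_1", "score": 0.07},
    {"label": "TASK_3", "score": 0.02},
]
probs = to_prob_dict(example_output)
crisis_probability = probs.get("TASK_2", 0.0)
```

In real use, `example_output` would be replaced by the result of `classifier(text, top_k=None)`, and `crisis_probability` is the value compared against the runtime threshold.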
## Safety Note
This classifier is a routing component, not a standalone crisis intervention
system. ReframeBot also uses regex + semantic crisis detection and crisis
response handling around this model.