---
language:
- en
license: apache-2.0
base_model: distilbert/distilbert-base-uncased
tags:
- text-classification
- guardrail
- safety
- cbt
- mental-health
- academic-stress
pipeline_tag: text-classification
---

# ReframeBot-Guardrail-DistilBERT

A 3-class DistilBERT classifier for routing ReframeBot user turns:

| Label | Meaning |
|---|---|
| `TASK_1` | CBT / academic stress |
| `TASK_2` | Crisis / self-harm signal |
| `TASK_3` | Out-of-scope |

This version was retrained on `data/guardrail_dataset_clean.jsonl`, which
merges the original guardrail data with curated hard cases covering the
CBT/crisis boundary, Vietnamese text, pills/overdose language, and
out-of-scope (OOS) informational prompts about work and mental health.

## Current System Threshold

The ReframeBot runtime uses the classifier's full probability vector and
routes to `TASK_2` when:

```text
P(TASK_2) >= 0.10
```

This check is applied after the academic-context and follow-up overrides, and
after the regex + semantic crisis detector has already run.
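A minimal sketch of the threshold rule follows. It assumes the probability vector is available as a dict keyed by label name, and that a turn not routed to `TASK_2` falls back to the plain argmax. That fallback is an assumption, because the card only specifies the `TASK_2` condition. `route_turn` and `TASK2_THRESHOLD` are hypothetical names, and the sketch does not model the overrides or the crisis detector that run first.

```python
from typing import Mapping

# Hypothetical constant mirroring the documented runtime threshold.
TASK2_THRESHOLD = 0.10


def route_turn(probs: Mapping[str, float], threshold: float = TASK2_THRESHOLD) -> str:
    """Route one user turn from the classifier's full probability vector.

    Assumption: if P(TASK_2) is below the threshold, fall back to argmax.
    """
    # Crisis routing wins as soon as P(TASK_2) reaches the (low) threshold.
    if probs.get("TASK_2", 0.0) >= threshold:
        return "TASK_2"
    # Otherwise take the most probable label (assumed fallback).
    return max(probs, key=probs.get)


print(route_turn({"TASK_1": 0.85, "TASK_2": 0.12, "TASK_3": 0.03}))  # TASK_2
print(route_turn({"TASK_1": 0.20, "TASK_2": 0.05, "TASK_3": 0.75}))  # TASK_3
```

With three classes, the top probability is always at least 1/3. A label below 0.10 can therefore never be the argmax. The low threshold only ever turns a `TASK_1` or `TASK_3` prediction into `TASK_2`, which trades some precision for recall.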

## Evaluation

Hard out-of-domain eval set (`data/evaluation_test_data.json`, 60 samples):

| Mode | Accuracy | TASK_2 Precision | TASK_2 Recall | TASK_2 F1 |
|---|---:|---:|---:|---:|
| Argmax only | 0.9667 | 1.0000 | 0.9048 | 0.9500 |
| Tuned `P(TASK_2) >= 0.10` | 0.9833 | 0.9545 | 1.0000 | 0.9767 |
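As an arithmetic check on the table, the sketch below recomputes the metrics from confusion counts. The counts were back-derived from the reported figures rather than read from the evaluation logs, so treat them as an inference. On 60 samples they imply 21 `TASK_2` items. Argmax misses 2 of them with no false alarms, and the tuned threshold catches all 21 with 1 false alarm. In both modes, the accuracy values imply that every error involves `TASK_2`. `task2_metrics` is a hypothetical helper.

```python
def task2_metrics(tp: int, fp: int, fn: int, correct: int, total: int) -> dict[str, float]:
    """Accuracy plus TASK_2 precision/recall/F1 from confusion counts, rounded to 4 places."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": round(correct / total, 4),
        "precision": round(precision, 4),
        "recall": round(recall, 4),
        "f1": round(f1, 4),
    }


# Counts back-derived from the table (an inference, not logged values).
print(task2_metrics(tp=19, fp=0, fn=2, correct=58, total=60))  # argmax only
print(task2_metrics(tp=21, fp=1, fn=0, correct=59, total=60))  # P(TASK_2) >= 0.10
```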

Threshold sweep artifacts in the project repo:

- `reports/guardrail_threshold_sweep.csv`
- `reports/guardrail_threshold_sweep.png`
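The sweep artifacts are not reproduced here. A sweep of this kind can be sketched as a one-vs-rest decision: for each candidate threshold `t`, predict `TASK_2` whenever `P(TASK_2) >= t`, then score precision, recall, and F1 for the crisis class. For thresholds up to 1/3, this matches the runtime rule with an argmax fallback. `sweep_task2_threshold` is a hypothetical helper, not the project's script, and the toy inputs below are made up for illustration.

```python
from typing import List, Sequence, Tuple


def sweep_task2_threshold(
    p_task2: Sequence[float],
    is_task2: Sequence[bool],
    thresholds: Sequence[float],
) -> List[Tuple[float, float, float, float]]:
    """Return (threshold, precision, recall, f1) rows for the TASK_2 class."""
    rows = []
    for t in thresholds:
        # One-vs-rest prediction: TASK_2 iff its probability reaches t.
        preds = [p >= t for p in p_task2]
        tp = sum(1 for p, g in zip(preds, is_task2) if p and g)
        fp = sum(1 for p, g in zip(preds, is_task2) if p and not g)
        fn = sum(1 for p, g in zip(preds, is_task2) if g and not p)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        rows.append((t, precision, recall, f1))
    return rows


# Toy data: P(TASK_2) per sample and whether the gold label is TASK_2.
for row in sweep_task2_threshold([0.92, 0.15, 0.04, 0.60], [True, True, False, False], [0.10, 0.50]):
    print(row)
```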

## Usage

```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="Nhatminh1234/ReframeBot-Guardrail-DistilBERT",
    revision="v2-guardrail-clean",
)

classifier("I'm stressed about my final exam")
```

For full class probabilities:

```python
classifier("I bought pills to overdose", top_k=None)
```
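To apply the runtime rule to this output, you can collect the per-label scores into a dict and check them against the 0.10 threshold. This is a sketch with several assumptions. It assumes the model's label names are `TASK_1`/`TASK_2`/`TASK_3` as in the table above, and that unrouted turns fall back to the top-scoring label. `route_from_pipeline` is a hypothetical helper. It also unwraps a one-element nested list, the shape returned when a single text is passed inside a list.

```python
def route_from_pipeline(scores, threshold: float = 0.10) -> str:
    """Apply the P(TASK_2) >= threshold rule to text-classification pipeline output.

    scores: list of {"label": str, "score": float} dicts, as returned with top_k=None.
    Assumptions: labels are "TASK_1"/"TASK_2"/"TASK_3"; below the threshold,
    fall back to the top-scoring label.
    """
    # Unwrap the nested list produced when one text is passed inside a list.
    if scores and isinstance(scores[0], list):
        if len(scores) != 1:
            raise ValueError("expected scores for exactly one text")
        scores = scores[0]
    probs = {item["label"]: item["score"] for item in scores}
    if probs.get("TASK_2", 0.0) >= threshold:
        return "TASK_2"
    return max(probs, key=probs.get)


# Usage with the pipeline above (requires transformers and network access):
# route_from_pipeline(classifier("I bought pills to overdose", top_k=None))
```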

## Safety Note

This classifier is a routing component, not a standalone crisis intervention
system. ReframeBot also uses regex + semantic crisis detection and crisis
response handling around this model.