---
license: mit
base_model:
- agentlans/multilingual-e5-small-aligned-v2
language:
- en
- zh
- fr
- pt
- es
- ja
- tr
- ru
- ar
- ko
- th
- it
- de
- vi
- ms
- id
- fil
- hi
- pl
- cs
- nl
- km
- my
- fa
- gu
- ur
- te
- mr
- he
- bn
- ta
- uk
- bo
- kk
- mn
- ug
- yue
datasets:
- agentlans/refusal-classifier-data
pipeline_tag: text-classification
tags:
- text-classification
- multilingual
- refusal-detection
- alignment
- conversation-analysis
- fine-tuned-model
- ethics
- ai-safety
- e5
- transformer
- huggingface
- research
---
# Multilingual Refusal Classifier
This model detects **assistant refusals** in multilingual AI conversations.
It identifies when a model declines to answer a user prompt (for example, for safety, capability, or policy reasons) versus when it provides a substantive response.
The model is a fine-tuned version of [agentlans/multilingual-e5-small-aligned-v2](https://huggingface.co/agentlans/multilingual-e5-small-aligned-v2),
trained on the [agentlans/refusal-classifier-data](https://huggingface.co/datasets/agentlans/refusal-classifier-data) dataset.
**Evaluation results:**
- **Loss:** 0.2665
- **Accuracy:** 0.9153
- **Training tokens:** 5,347,200
## Usage
This classifier accepts input in conversation-like text formats using structured role tokens.
For long texts, replace the omitted middle portion with the `<|...|>` ellipsis placeholder.
**Supported input formats:**
- `<|system|>System prompt<|user|>User message<|assistant|>Response<|user|>Next user message<|assistant|>Next response...`
- `<|user|>User message<|assistant|>Response<|user|>Next user message<|assistant|>Next response...`
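As a minimal sketch of how such a string might be assembled, the helper below joins a list of chat messages using the role tokens from the formats above. The `format_conversation` name and the `{"role": ..., "content": ...}` message schema are assumptions made for illustration; they are not part of the model or its tokenizer.

```python
from typing import Dict, List

# Role tokens taken from the supported input formats listed above.
ROLE_TOKENS = {
    "system": "<|system|>",
    "user": "<|user|>",
    "assistant": "<|assistant|>",
}


def format_conversation(messages: List[Dict[str, str]]) -> str:
    """Join chat messages into a single role-token string.

    `messages` is assumed to be a list of {"role": ..., "content": ...} dicts;
    this schema is a convention chosen for the sketch, not a model requirement.
    """
    parts = []
    for message in messages:
        role = message["role"]
        if role not in ROLE_TOKENS:
            raise ValueError(f"Unsupported role: {role!r}")
        # Each turn is the role token immediately followed by its text.
        parts.append(ROLE_TOKENS[role] + message["content"])
    return "".join(parts)


example = format_conversation([
    {"role": "user", "content": "What is 2 + 2?"},
    {"role": "assistant", "content": "2 + 2 equals 4."},
])
print(example)
# <|user|>What is 2 + 2?<|assistant|>2 + 2 equals 4.
```

The resulting string can be passed to the classifier in the same way as a hand-written one.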
**Example:**
```python
from transformers import pipeline

# Load the refusal classifier from the Hugging Face Hub
classifier = pipeline(
    task="text-classification",
    model="agentlans/multilingual-e5-small-refusal-classifier",
)

# The assistant response is shortened with the <|...|> placeholder
text = (
    "<|user|>Mr. Loyd wants to fence his square-shaped land of 150 sqft each side. "
    "If a pole is laid every certain distance, he needs 30 poles. "
    "What is the distance between each pole in feet?"
    "<|assistant|>If Mr. Loyd's land is square-shaped and each side is 150 sqft, then<|...|>"
    "ce between poles ≈ 20.69 sqft\n\nTherefore, the distance between each pole is approximately 20.69 feet."
)

print(classifier(text))
# [{'label': 'Non-refusal', 'score': 0.9906}]
```
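The example above shortens a long assistant response with the `<|...|>` placeholder. One way to do this automatically is sketched below. The `elide_middle` helper and its character budget are assumptions for illustration: the model's limit is measured in tokens, so a character count is only a rough stand-in, and the default of 1000 characters is arbitrary.

```python
ELLIPSIS_TOKEN = "<|...|>"


def elide_middle(text: str, max_chars: int = 1000) -> str:
    """Keep the head and tail of `text`, replacing the middle with the placeholder.

    `max_chars` is a hypothetical character budget, not a token count.
    """
    if len(text) <= max_chars:
        return text  # short enough, nothing to omit
    keep = max_chars - len(ELLIPSIS_TOKEN)
    if keep < 2:
        raise ValueError("max_chars is too small to hold the placeholder")
    head = keep // 2 + keep % 2  # the head gets the extra character if keep is odd
    tail = keep // 2
    return text[:head] + ELLIPSIS_TOKEN + text[-tail:]


print(elide_middle("0123456789" * 3, max_chars=17))
# 01234<|...|>56789
```

In this sketch the helper is meant to be applied to the content of a single message before the role tokens are added, so that the role tokens themselves are never cut.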
## Evaluation Results
The classifier was tested on ten examples translated from the [NousResearch/Minos-v1](https://huggingface.co/NousResearch/Minos-v1) model page.
Full examples are available in [Examples.md](Examples.md).
- 🚫: The model predicted a **refusal to answer**.
- ◯: The model predicted a **valid response**.

| Example | English | French | Spanish | Chinese | Russian | Arabic |
|----------|:--------:|:-------:|:---------:|:---------:|:----------:|:--------:|
| 1 | 🚫 | 🚫 | 🚫 | 🚫 | 🚫 | 🚫 |
| 2 | 🚫 | 🚫 | 🚫 | 🚫 | 🚫 | 🚫 |
| 3 | 🚫 | 🚫 | 🚫 | 🚫 | 🚫 | 🚫 |
| 4 | 🚫 | 🚫 | 🚫 | 🚫 | 🚫 | 🚫 |
| 5 | 🚫 | 🚫 | 🚫 | 🚫 | 🚫 | 🚫 |
| 6 | ◯ | ◯ | ◯ | ◯ | ◯ | ◯ |
| 7 | ◯ | ◯ | ◯ | ◯ | ◯ | ◯ |
| 8 | ◯ | ◯ | ◯ | ◯ | ◯ | ◯ |
| 9 | ◯ | 🚫 | ◯ | ◯ | 🚫 | 🚫 |
| 10 | ◯ | ◯ | ◯ | ◯ | ◯ | ◯ |
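The classifier labels nine of the ten examples identically in all six languages. Example 9 is the exception: it is flagged as a refusal in French, Russian, and Arabic but as a valid response in English, Spanish, and Chinese, which suggests that some false positives remain, especially where the phrasing is ambiguous.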
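The cross-lingual agreement in the table above can be tallied with a few lines of Python. The sketch below transcribes the table by hand (`True` for a predicted refusal, `False` for a valid response); the variable names are chosen for illustration only.

```python
# Predictions transcribed from the table above: True = refusal, False = valid response.
LANGUAGES = ["English", "French", "Spanish", "Chinese", "Russian", "Arabic"]

predictions = {example: [True] * 6 for example in range(1, 6)}
predictions.update({example: [False] * 6 for example in (6, 7, 8, 10)})
predictions[9] = [False, True, False, False, True, True]

# An example is unanimous when every language receives the same label.
unanimous = [ex for ex, row in sorted(predictions.items()) if len(set(row)) == 1]

# For the remaining examples, list the languages flagged as refusals.
split = {
    ex: [lang for lang, is_refusal in zip(LANGUAGES, row) if is_refusal]
    for ex, row in sorted(predictions.items())
    if len(set(row)) > 1
}

print(f"{len(unanimous)} of {len(predictions)} examples are labelled identically in every language")
# 9 of 10 examples are labelled identically in every language
print(split)
# {9: ['French', 'Russian', 'Arabic']}
```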
## Limitations
- **Input length:** 512-token maximum
- **False positives/negatives:** Occur occasionally, as they do with the Minos classifier
- **Low-resource languages:** May yield inconsistent predictions
- **Cultural variation:** Expressions of refusal differ linguistically, which can affect accuracy
## Training Details
### Hyperparameters
- **Learning rate:** 5e-5
- **Train batch size:** 8
- **Eval batch size:** 8
- **Seed:** 42
- **Optimizer:** `ADAMW_TORCH_FUSED` (`betas=(0.9, 0.999)`, `epsilon=1e-8`)
- **Scheduler:** Linear
- **Epochs:** 5
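For readers who want to set up a comparable run, the values above could be collected as keyword arguments roughly as follows. This is a sketch, not the author's training script: the key names follow the Hugging Face `TrainingArguments` parameters, and mapping the listed batch sizes to per-device values is an assumption.

```python
# Hyperparameters from the list above, keyed by TrainingArguments parameter names.
# Treating the batch sizes as per-device values is an assumption of this sketch.
training_kwargs = {
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 8,
    "seed": 42,
    "optim": "adamw_torch_fused",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 5,
}

for name, value in training_kwargs.items():
    print(f"{name} = {value}")
```

The dictionary can then be unpacked into `transformers.TrainingArguments` together with an `output_dir` of your choice.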
### Framework Versions
- Transformers 5.0.0.dev0
- PyTorch 2.9.1+cu128
- Datasets 4.4.1
- Tokenizers 0.22.1
## Intended Use
This model is designed for:
- Identifying **AI refusals** during conversation analysis.
- Supporting **evaluation pipelines** for alignment and compliance studies.
- Helping developers monitor **cross-lingual consistency** in model responses.
It is **not** intended for moderation or real-time deployment in production systems without human oversight.
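For the evaluation-pipeline use case, a small aggregation step might look like the sketch below. It computes a refusal rate from classifier outputs and collects low-confidence predictions for human review. The `summarize` helper, the `review_below` threshold, and the sample predictions are all hypothetical; only the `Non-refusal` label comes from the usage example, so any other label is treated as a refusal.

```python
from typing import Dict, List, Union

NON_REFUSAL_LABEL = "Non-refusal"  # label shown in the usage example output

Prediction = Dict[str, Union[str, float]]


def summarize(predictions: List[Prediction], review_below: float = 0.8) -> dict:
    """Return the share of refusals and the indices of low-confidence predictions.

    `review_below` is an arbitrary illustrative threshold, not a tuned value.
    """
    # Any label other than the known non-refusal label counts as a refusal.
    refusals = [p for p in predictions if p["label"] != NON_REFUSAL_LABEL]
    needs_review = [i for i, p in enumerate(predictions) if p["score"] < review_below]
    rate = len(refusals) / len(predictions) if predictions else 0.0
    return {"refusal_rate": rate, "needs_review": needs_review}


# Made-up predictions for illustration; these are not real model outputs,
# and the "Refusal" label name is an assumption.
sample = [
    {"label": "Non-refusal", "score": 0.99},
    {"label": "Refusal", "score": 0.95},
    {"label": "Non-refusal", "score": 0.62},
    {"label": "Refusal", "score": 0.71},
]
print(summarize(sample))
# {'refusal_rate': 0.5, 'needs_review': [2, 3]}
```

Routing the `needs_review` indices to a person is one way to keep human oversight in the loop.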