---
license: mit
base_model:
- agentlans/multilingual-e5-small-aligned-v2
language:
- en
- zh
- fr
- pt
- es
- ja
- tr
- ru
- ar
- ko
- th
- it
- de
- vi
- ms
- id
- fil
- hi
- pl
- cs
- nl
- km
- my
- fa
- gu
- ur
- te
- mr
- he
- bn
- ta
- uk
- bo
- kk
- mn
- ug
- yue
datasets:
- agentlans/refusal-classifier-data
pipeline_tag: text-classification
tags:
  - text-classification
  - multilingual
  - refusal-detection
  - alignment
  - conversation-analysis
  - fine-tuned-model
  - ethics
  - ai-safety
  - e5
  - transformer
  - huggingface
  - research
---

# Multilingual Refusal Classifier

This model detects **assistant refusals** in multilingual AI conversations.
It identifies when a model declines to answer a user prompt (for example, for safety, capability, or policy reasons) versus when it provides a substantive response.

The model is a fine-tuned version of [agentlans/multilingual-e5-small-aligned-v2](https://huggingface.co/agentlans/multilingual-e5-small-aligned-v2), 
trained on the [agentlans/refusal-classifier-data](https://huggingface.co/datasets/agentlans/refusal-classifier-data) dataset.

**Training summary:**
- **Evaluation loss:** 0.2665  
- **Evaluation accuracy:** 0.9153  
- **Training tokens:** 5,347,200  

## Usage

The classifier expects a conversation serialized as a single string, with a role token in front of each turn.  
For long texts, replace the omitted middle portion with the placeholder `<|...|>`.

**Supported input formats:**
- `<|system|>System prompt<|user|>User message<|assistant|>Response<|user|>Next user message<|assistant|>Next response...`
- `<|user|>User message<|assistant|>Response<|user|>Next user message<|assistant|>Next response...`
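If your conversations are stored as lists of role/content pairs, a small helper can produce the formats above. This is a minimal sketch: `format_conversation` and the `{"role": ..., "content": ...}` message layout are illustrative assumptions, not part of the model's API. It concatenates turns with no separator, matching the format strings shown above.

```python
# Role tokens as they appear in the supported input formats.
ROLE_TOKENS = {
    "system": "<|system|>",
    "user": "<|user|>",
    "assistant": "<|assistant|>",
}


def format_conversation(messages):
    """Join chat messages into one role-token string (hypothetical helper)."""
    parts = []
    for message in messages:
        role = message["role"]
        if role not in ROLE_TOKENS:
            raise ValueError(f"Unsupported role: {role!r}")
        # Each turn is the role token immediately followed by its text.
        parts.append(ROLE_TOKENS[role] + message["content"])
    return "".join(parts)


messages = [
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
]
text = format_conversation(messages)
print(text)
# <|user|>What is the capital of France?<|assistant|>The capital of France is Paris.
```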

**Example:**

```python
from transformers import pipeline

classifier = pipeline(
    task="text-classification",
    model="agentlans/multilingual-e5-small-refusal-classifier"
)

text = (
    "<|user|>Mr. Loyd wants to fence his square-shaped land of 150 sqft each side. "
    "If a pole is laid every certain distance, he needs 30 poles. "
    "What is the distance between each pole in feet?"
    "<|assistant|>If Mr. Loyd's land is square-shaped and each side is 150 sqft, then<|...|>"
    "ce between poles ≈ 20.69 sqft\n\nTherefore, the distance between each pole is approximately 20.69 feet."
)

print(classifier(text))
# [{'label': 'Non-refusal', 'score': 0.9906}]
```
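The example above shortens the assistant response with `<|...|>`. One way to do this programmatically is sketched below. The helper name `elide_middle` and the character budget are assumptions for illustration: the model's limit is counted in tokens, not characters, so the budget is only a rough proxy and the default value is arbitrary.

```python
PLACEHOLDER = "<|...|>"


def elide_middle(text, max_chars=1000):
    """Keep the start and end of `text`, replacing the middle with the placeholder.

    Hypothetical helper; `max_chars` is a character budget, not a token count.
    """
    if len(text) <= max_chars:
        return text
    keep = max_chars - len(PLACEHOLDER)
    if keep < 2:
        raise ValueError("max_chars is too small to keep any text around the placeholder")
    head = (keep + 1) // 2  # characters kept from the start
    tail = keep - head      # characters kept from the end
    return text[:head] + PLACEHOLDER + text[len(text) - tail:]


long_response = "a" * 600 + "b" * 600
elided = elide_middle(long_response, max_chars=107)
print(len(elided))
# 107
```

Applying the helper to each message's content before adding role tokens, rather than to the fully formatted string, avoids cutting out a role token by accident.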

## Evaluation Results

The classifier was tested on ten examples translated from the [NousResearch/Minos-v1](https://huggingface.co/NousResearch/Minos-v1) model page.
Full examples are available in [Examples.md](Examples.md).

- 🚫 — The model predicted a **refusal to answer**.  
- ◯ — The model predicted a **valid response**.

| Example | English | French | Spanish | Chinese | Russian | Arabic |
|----------|:--------:|:-------:|:---------:|:---------:|:----------:|:--------:|
| 1        | 🚫 | 🚫 | 🚫 | 🚫 | 🚫 | 🚫 |
| 2        | 🚫 | 🚫 | 🚫 | 🚫 | 🚫 | 🚫 |
| 3        | 🚫 | 🚫 | 🚫 | 🚫 | 🚫 | 🚫 |
| 4        | 🚫 | 🚫 | 🚫 | 🚫 | 🚫 | 🚫 |
| 5        | 🚫 | 🚫 | 🚫 | 🚫 | 🚫 | 🚫 |
| 6        | ◯ | ◯ | ◯ | ◯ | ◯ | ◯ |
| 7        | ◯ | ◯ | ◯ | ◯ | ◯ | ◯ |
| 8        | ◯ | ◯ | ◯ | ◯ | ◯ | ◯ |
| 9        | ◯ | 🚫 | ◯ | ◯ | 🚫 | 🚫 |
| 10       | ◯ | ◯ | ◯ | ◯ | ◯ | ◯ |

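Predictions agree across all six languages on nine of the ten examples. On example 9, the French, Russian, and Arabic versions are flagged as refusals while the English, Spanish, and Chinese versions are not. Some false positives therefore remain, especially in contexts with ambiguous phrasing.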
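The cross-lingual agreement in the table can be checked with a few lines of Python. The sketch below transcribes the table by hand (`True` for 🚫, `False` for ◯) and reports which examples have mixed predictions and how often each language matches the English prediction. The variable names are illustrative only.

```python
LANGUAGES = ["English", "French", "Spanish", "Chinese", "Russian", "Arabic"]

# One row per example, in the column order above.
# True = predicted refusal, False = predicted valid response.
PREDICTIONS = {
    1: [True, True, True, True, True, True],
    2: [True, True, True, True, True, True],
    3: [True, True, True, True, True, True],
    4: [True, True, True, True, True, True],
    5: [True, True, True, True, True, True],
    6: [False, False, False, False, False, False],
    7: [False, False, False, False, False, False],
    8: [False, False, False, False, False, False],
    9: [False, True, False, False, True, True],
    10: [False, False, False, False, False, False],
}

# Examples where the languages do not all receive the same prediction.
disagreements = [ex for ex, row in PREDICTIONS.items() if len(set(row)) > 1]

# Share of examples where each language matches the English prediction.
agreement_with_english = {
    lang: sum(row[i] == row[0] for row in PREDICTIONS.values()) / len(PREDICTIONS)
    for i, lang in enumerate(LANGUAGES)
    if i > 0
}

print(disagreements)
# [9]
print(agreement_with_english)
# {'French': 0.9, 'Spanish': 1.0, 'Chinese': 1.0, 'Russian': 0.9, 'Arabic': 0.9}
```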

## Limitations

- **Input length:** 512-token maximum  
- **False positives/negatives:** Occur occasionally, as they do with the Minos classifier  
- **Low-resource languages:** May yield inconsistent predictions  
- **Cultural variation:** Refusals are phrased differently across languages and cultures, which can affect accuracy  

## Training Details

### Hyperparameters
- **Learning rate:** 5e-5  
- **Train batch size:** 8  
- **Eval batch size:** 8  
- **Seed:** 42  
- **Optimizer:** `ADAMW_TORCH_FUSED` (`betas=(0.9, 0.999)`, `epsilon=1e-8`)  
- **Scheduler:** Linear  
- **Epochs:** 5  
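For readers who want to reproduce a similar run, the values above map onto keyword arguments of `transformers.TrainingArguments`. This is a sketch under assumptions: the card does not include the training script, so it is assumed here that the Hugging Face `Trainer` was used and that the batch sizes are per device. `output_dir` in the comment is a placeholder.

```python
# Hyperparameters from the list above, expressed as TrainingArguments kwargs.
# Assumption: batch sizes are per-device values.
training_kwargs = {
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 8,
    "seed": 42,
    "optim": "adamw_torch_fused",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 5,
}

# Possible usage (requires the transformers library):
# from transformers import TrainingArguments
# args = TrainingArguments(output_dir="out", **training_kwargs)

for name, value in training_kwargs.items():
    print(f"{name} = {value}")
```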

### Framework Versions
- Transformers 5.0.0.dev0  
- PyTorch 2.9.1+cu128  
- Datasets 4.4.1  
- Tokenizers 0.22.1  

## Intended Use

This model is designed for:
- Identifying **AI refusals** during conversation analysis.  
- Supporting **evaluation pipelines** for alignment and compliance studies.  
- Helping developers monitor **cross-lingual consistency** in model responses.  

It is **not** intended for moderation or real-time deployment in production systems without human oversight.
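
For the cross-lingual monitoring use case, one simple summary statistic is the refusal rate per language. The sketch below works on `(language, label)` pairs collected from the classifier's output. The function name and record layout are illustrative assumptions. Because only the `Non-refusal` label is shown on this card, any other label is counted as a refusal, and the `"Refusal"` strings in the sample data are placeholders.

```python
from collections import defaultdict


def refusal_rates(records):
    """Return the share of refusals per language (hypothetical helper).

    `records` is an iterable of (language, label) pairs. Any label other
    than "Non-refusal" is counted as a refusal (an assumption).
    """
    counts = defaultdict(lambda: [0, 0])  # language -> [refusals, total]
    for language, label in records:
        counts[language][0] += label != "Non-refusal"
        counts[language][1] += 1
    return {lang: refusals / total for lang, (refusals, total) in counts.items()}


# Sample data; "Refusal" is a placeholder label name.
records = [
    ("en", "Non-refusal"),
    ("en", "Refusal"),
    ("fr", "Refusal"),
    ("fr", "Refusal"),
]
rates = refusal_rates(records)
print(rates)
# {'en': 0.5, 'fr': 1.0}
```

A large gap between languages on translated versions of the same prompts is a signal to review those conversations by hand, not a verdict on its own.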