Update README.md


Files changed (1)

README.md +79 -46


@@ -45,69 +45,102 @@ datasets:
 pipeline_tag: text-classification
 ---
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
-# multilingual-e5-small-aligned-v2-conversation-refusal
-This model is a fine-tuned version of [agentlans/multilingual-e5-small-aligned-v2] on the [agentlans/refusal-classifier-data] dataset.
-It achieves the following results on the evaluation set:
-- Loss: 0.2665
-- Accuracy: 0.9153
-- Num Input Tokens Seen: 5347200
-## Model description
-More information needed
-## Intended uses & limitations
-More information needed
-## Results
-Classifier results on the [NousResearch/Minos-v1](https://huggingface.co/NousResearch/Minos-v1) model's 10 examples translated into various languages.
-See the translated examples [here](Examples.md).
-Refusals and non-refusals are accurately classified and consistent across languages (although with some false positives).
-- 🚫 means the classifier determined that the assistant **refused to answer** the user’s prompt.
-- ◯ means the classifier determined that the assistant **provided an answer** to the user’s prompt.
-| Text | English | French | Spanish | Chinese | Russian | Arabic |
-|--------|:---------:|:--------:|:---------:|:---------:|:---------:|:--------:|
-| 1      | 🚫      | 🚫     | 🚫      | 🚫      | 🚫      | 🚫     |
-| 2      | 🚫      | 🚫     | 🚫      | 🚫      | 🚫      | 🚫     |
-| 3      | 🚫      | 🚫     | 🚫      | 🚫      | 🚫      | 🚫     |
-| 4      | 🚫      | 🚫     | 🚫      | 🚫      | 🚫      | 🚫     |
-| 5      | 🚫      | 🚫     | 🚫      | 🚫      | 🚫      | 🚫     |
-| 6      | ◯      | ◯     | ◯      | ◯      | ◯      | ◯     |
-| 7      | ◯      | ◯     | ◯      | ◯      | ◯      | ◯     |
-| 8      | ◯      | ◯     | ◯      | ◯      | ◯      | ◯     |
-| 9      | ◯      | 🚫     | ◯      | ◯      | 🚫      | 🚫     |
-| 10     | ◯      | ◯     | ◯      | ◯      | ◯      | ◯     |
-## Training procedure
-### Training hyperparameters
-The following hyperparameters were used during training:
-- learning_rate: 5e-05
-- train_batch_size: 8
-- eval_batch_size: 8
-- seed: 42
-- optimizer: Use OptimizerNames.ADAMW_TORCH_FUSED with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
-- lr_scheduler_type: linear
-- num_epochs: 5.0
-### Training results
-### Framework versions
-- Transformers 5.0.0.dev0
-- Pytorch 2.9.1+cu128
-- Datasets 4.4.1
-- Tokenizers 0.22.1
+# Multilingual Refusal Classifier
+This model detects **assistant refusals** in multilingual AI conversations.
+It identifies when a model declines to answer a user prompt (for example, for safety, capability, or policy reasons) versus when it provides a substantive response.
+The model is a fine-tuned version of [agentlans/multilingual-e5-small-aligned-v2](https://huggingface.co/agentlans/multilingual-e5-small-aligned-v2),
+trained on the [agentlans/refusal-classifier-data](https://huggingface.co/datasets/agentlans/refusal-classifier-data) dataset.
+**Evaluation results:**
+- **Loss:** 0.2665
+- **Accuracy:** 0.9153
+- **Input tokens seen during training:** 5,347,200
+## Usage
+This classifier accepts conversation-formatted text in which each turn is prefixed by a role token (`<|system|>`, `<|user|>`, or `<|assistant|>`).
+For long texts, replace the omitted middle portion with the `<|...|>` placeholder, keeping the beginning and end of the text.
+**Supported input formats:**
+- `<|system|>System prompt<|user|>User message<|assistant|>Response<|user|>Next user message<|assistant|>Next response...`
+- `<|user|>User message<|assistant|>Response<|user|>Next user message<|assistant|>Next response...`
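The formats above can be assembled from a list of messages. The following is a minimal sketch, assuming messages are `(role, text)` pairs; the helpers `elide_middle` and `format_conversation` and the per-message character budget are hypothetical and not part of the model repository. Because the model's real limit is 512 *tokens*, a character budget is only a rough proxy.

```python
# Hypothetical helpers for building classifier input; not part of the model repo.
PLACEHOLDER = "<|...|>"
ALLOWED_ROLES = {"system", "user", "assistant"}


def elide_middle(text: str, max_chars: int) -> str:
    """Keep the head and tail of `text`, replacing the middle with the placeholder.

    Assumption: `max_chars` is a character budget; it does not measure tokens.
    """
    if len(text) <= max_chars:
        return text
    keep = max(max_chars - len(PLACEHOLDER), 0)
    head = keep // 2
    tail = keep - head
    # Slice from len(text) - tail so that tail == 0 yields "" (text[-0:] would not).
    return text[:head] + PLACEHOLDER + text[len(text) - tail:]


def format_conversation(messages, max_chars_per_message: int = 1000) -> str:
    """Join (role, text) pairs into the <|role|>text format shown above."""
    parts = []
    for role, text in messages:
        if role not in ALLOWED_ROLES:
            raise ValueError(f"unsupported role: {role!r}")
        parts.append(f"<|{role}|>{elide_middle(text, max_chars_per_message)}")
    return "".join(parts)


if __name__ == "__main__":
    print(format_conversation([("user", "Hi"), ("assistant", "Hello! How can I help?")]))
    # <|user|>Hi<|assistant|>Hello! How can I help?
```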
+**Example:**
+```python
+from transformers import pipeline
+classifier = pipeline(
+    task="text-classification",
+    model="agentlans/multilingual-e5-small-refusal-classifier"
+)
+text = (
+    "<|user|>Mr. Loyd wants to fence his square-shaped land of 150 sqft each side. "
+    "If a pole is laid every certain distance, he needs 30 poles. "
+    "What is the distance between each pole in feet?"
+    "<|assistant|>If Mr. Loyd's land is square-shaped and each side is 150 sqft, then<|...|>"
+    "ce between poles ≈ 20.69 sqft\n\nTherefore, the distance between each pole is approximately 20.69 feet."
+)
+print(classifier(text))
+# [{'label': 'Non-refusal', 'score': 0.9906}]
+```
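The pipeline returns one `{'label': ..., 'score': ...}` dict per input, and passing a list of strings yields a list of such dicts. The sketch below maps those predictions to the 🚫/◯ notation used in the evaluation table. `to_symbol` and `summarize` are hypothetical helpers, and treating every label other than `Non-refusal` as a refusal is an assumption, since only the `Non-refusal` label name appears in the example output.

```python
from collections import Counter

NON_REFUSAL_LABEL = "Non-refusal"  # label name shown in the example output above


def to_symbol(prediction: dict) -> str:
    """Map one pipeline prediction to the table notation (🚫 refusal, ◯ answer)."""
    # Assumption: any label other than "Non-refusal" denotes a refusal.
    return "◯" if prediction["label"] == NON_REFUSAL_LABEL else "🚫"


def summarize(predictions: list) -> Counter:
    """Count predicted refusals and answers in a batch of pipeline outputs."""
    return Counter(to_symbol(p) for p in predictions)


if __name__ == "__main__":
    # Illustrative inputs; in practice use predictions = classifier([text1, text2, ...]).
    preds = [
        {"label": "Non-refusal", "score": 0.99},
        {"label": "Refusal", "score": 0.95},  # "Refusal" label name is assumed
    ]
    print([to_symbol(p) for p in preds])  # ['◯', '🚫']
    print(summarize(preds))
```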
+## Evaluation Results
+The classifier was tested on the ten example conversations provided with the [NousResearch/Minos-v1](https://huggingface.co/NousResearch/Minos-v1) model, translated into French, Spanish, Chinese, Russian, and Arabic.
+Full examples are available in [Examples.md](Examples.md).
+- 🚫: The classifier predicted that the assistant **refused to answer** the user's prompt.
+- ◯: The classifier predicted that the assistant **provided an answer** to the user's prompt.
+| Example | English | French | Spanish | Chinese | Russian | Arabic |
+|----------|:--------:|:-------:|:---------:|:---------:|:----------:|:--------:|
+| 1        | 🚫 | 🚫 | 🚫 | 🚫 | 🚫 | 🚫 |
+| 2        | 🚫 | 🚫 | 🚫 | 🚫 | 🚫 | 🚫 |
+| 3        | 🚫 | 🚫 | 🚫 | 🚫 | 🚫 | 🚫 |
+| 4        | 🚫 | 🚫 | 🚫 | 🚫 | 🚫 | 🚫 |
+| 5        | 🚫 | 🚫 | 🚫 | 🚫 | 🚫 | 🚫 |
+| 6        | ◯ | ◯ | ◯ | ◯ | ◯ | ◯ |
+| 7        | ◯ | ◯ | ◯ | ◯ | ◯ | ◯ |
+| 8        | ◯ | ◯ | ◯ | ◯ | ◯ | ◯ |
+| 9        | ◯ | 🚫 | ◯ | ◯ | 🚫 | 🚫 |
+| 10       | ◯ | ◯ | ◯ | ◯ | ◯ | ◯ |
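+Predictions agree across all six languages for nine of the ten examples. The exception is example 9, which is classified as a non-refusal in English, Spanish, and Chinese but as a refusal in French, Russian, and Arabic. These false positives may stem from ambiguous phrasing in the translations.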
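The cross-lingual consistency described above can be quantified directly from the table. This sketch transcribes the table's predictions and computes which examples are labeled identically in every language, plus each language's agreement with English. Using English as the reference is an assumption; the table itself does not list ground-truth labels.

```python
# Predictions transcribed from the evaluation table: "R" = 🚫 (refusal), "A" = ◯ (answer).
LANGUAGES = ["English", "French", "Spanish", "Chinese", "Russian", "Arabic"]
TABLE = {
    1: "RRRRRR",
    2: "RRRRRR",
    3: "RRRRRR",
    4: "RRRRRR",
    5: "RRRRRR",
    6: "AAAAAA",
    7: "AAAAAA",
    8: "AAAAAA",
    9: "ARAARR",
    10: "AAAAAA",
}


def consistent_examples(table: dict) -> list:
    """IDs of examples that receive the same label in every language."""
    return [ex for ex, row in table.items() if len(set(row)) == 1]


def agreement_with_reference(table: dict, ref: int = 0) -> dict:
    """Per-language fraction of examples matching the reference column (default: English)."""
    n = len(table)
    return {
        lang: sum(row[i] == row[ref] for row in table.values()) / n
        for i, lang in enumerate(LANGUAGES)
    }


if __name__ == "__main__":
    print(consistent_examples(TABLE))  # [1, 2, 3, 4, 5, 6, 7, 8, 10]
    print(agreement_with_reference(TABLE))
```

French, Russian, and Arabic each agree with English on 9 of 10 examples; Spanish and Chinese agree on all 10.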
+## Limitations
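+- **Input length:** 512 tokens maximum; longer inputs must be shortened (for example, with the `<|...|>` placeholder) or truncated
+- **False positives/negatives:** Occasional misclassifications occur, comparable to those of the Minos classifier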
+- **Low-resource languages:** May yield inconsistent predictions
+- **Cultural variation:** Expressions of refusal differ linguistically, which can affect accuracy
+## Training Details
+### Hyperparameters
+- **Learning rate:** 5e-5
+- **Train batch size:** 8
+- **Eval batch size:** 8
+- **Seed:** 42
+- **Optimizer:** `ADAMW_TORCH_FUSED` (`betas=(0.9, 0.999)`, `epsilon=1e-8`)
+- **Scheduler:** Linear
+- **Epochs:** 5
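The hyperparameters above can be expressed as keyword arguments for Hugging Face `TrainingArguments`. This is a sketch, not the author's training script: it assumes a single device (so the batch sizes map to the `per_device_*` arguments), zero warmup steps (warmup is not listed above), and a hypothetical `output_dir`. The `linear_lr` helper mirrors the no-warmup case of the linear schedule, in which the learning rate decays from its initial value to zero over training.

```python
# Sketch of the listed hyperparameters as TrainingArguments keyword arguments.
HPARAMS = {
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 8,  # assumes a single device
    "per_device_eval_batch_size": 8,
    "seed": 42,
    "optim": "adamw_torch_fused",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 5,
}


def linear_lr(step: int, total_steps: int, base_lr: float = HPARAMS["learning_rate"]) -> float:
    """Learning rate at `step` under linear decay to zero, assuming no warmup."""
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps))


def build_training_args(output_dir: str = "refusal-classifier-out"):  # hypothetical path
    """Construct TrainingArguments; requires `transformers` to be installed."""
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **HPARAMS)
```

For example, halfway through training (`linear_lr(50, 100)`) the learning rate is 2.5e-5.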
+### Framework Versions
+- Transformers 5.0.0.dev0
+- PyTorch 2.9.1+cu128
+- Datasets 4.4.1
+- Tokenizers 0.22.1
+## Intended Use
+This model is designed for:
+- Identifying **AI refusals** during conversation analysis.
+- Supporting **evaluation pipelines** for alignment and compliance studies.
+- Helping developers monitor **cross-lingual consistency** in model responses.
+It is **not** intended for moderation or real-time deployment in production systems without human oversight.