agentlans committed on
Commit fcab8ef · verified · 1 Parent(s): 773b828

Update README.md

  1. README.md +79 -46
README.md CHANGED
@@ -45,69 +45,102 @@ datasets:
  pipeline_tag: text-classification
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->

- # multilingual-e5-small-aligned-v2-conversation-refusal
 
- This model is a fine-tuned version of [agentlans/multilingual-e5-small-aligned-v2] on the [agentlans/refusal-classifier-data] dataset.
- It achieves the following results on the evaluation set:
- - Loss: 0.2665
- - Accuracy: 0.9153
- - Num Input Tokens Seen: 5347200

- ## Model description

- More information needed

- ## Intended uses & limitations

- More information needed
 
- ## Results

- Classifier results on the [NousResearch/Minos-v1](https://huggingface.co/NousResearch/Minos-v1) model's 10 examples translated into various languages.
- See the translated examples [here](Examples.md).

- Refusals and non-refusals are accurately classified and consistent across languages (although with some false positives).

- - 🚫 means the classifier determined that the assistant **refused to answer** the user’s prompt.
- - ◯ means the classifier determined that the assistant **provided an answer** to the user’s prompt.

- | Text | English | French | Spanish | Chinese | Russian | Arabic |
- |--------|:---------:|:--------:|:---------:|:---------:|:---------:|:--------:|
- | 1 | 🚫 | 🚫 | 🚫 | 🚫 | 🚫 | 🚫 |
- | 2 | 🚫 | 🚫 | 🚫 | 🚫 | 🚫 | 🚫 |
- | 3 | 🚫 | 🚫 | 🚫 | 🚫 | 🚫 | 🚫 |
- | 4 | 🚫 | 🚫 | 🚫 | 🚫 | 🚫 | 🚫 |
- | 5 | 🚫 | 🚫 | 🚫 | 🚫 | 🚫 | 🚫 |
- | 6 | ◯ | ◯ | ◯ | ◯ | ◯ | ◯ |
- | 7 | ◯ | ◯ | ◯ | ◯ | ◯ | ◯ |
- | 8 | ◯ | ◯ | ◯ | ◯ | ◯ | ◯ |
- | 9 | ◯ | 🚫 | ◯ | ◯ | 🚫 | 🚫 |
- | 10 | ◯ | ◯ | ◯ | ◯ | ◯ | ◯ |
 
- ## Training procedure

- ### Training hyperparameters

- The following hyperparameters were used during training:
- - learning_rate: 5e-05
- - train_batch_size: 8
- - eval_batch_size: 8
- - seed: 42
- - optimizer: Use OptimizerNames.ADAMW_TORCH_FUSED with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- - lr_scheduler_type: linear
- - num_epochs: 5.0

- ### Training results

- ### Framework versions

- - Transformers 5.0.0.dev0
- - Pytorch 2.9.1+cu128
- - Datasets 4.4.1
- - Tokenizers 0.22.1
 
+ # Multilingual Refusal Classifier

+ This model detects **assistant refusals** in multilingual AI conversations.
+ It identifies when a model declines to answer a user prompt (for example, for safety, capability, or policy reasons) versus when it provides a substantive response.

+ The model is a fine-tuned version of [agentlans/multilingual-e5-small-aligned-v2](https://huggingface.co/agentlans/multilingual-e5-small-aligned-v2),
+ trained on the [agentlans/refusal-classifier-data](https://huggingface.co/datasets/agentlans/refusal-classifier-data) dataset.

+ **Evaluation results:**
+ - **Loss:** 0.2665
+ - **Accuracy:** 0.9153
+ - **Training tokens:** 5,347,200
 
+ ## Usage

+ This classifier accepts conversation-like text in which each turn is introduced by a role token.
+ For long texts, replace the omitted middle portion with the `<|...|>` ellipsis placeholder.

+ **Supported input formats:**
+ - `<|system|>System prompt<|user|>User message<|assistant|>Response<|user|>Next user message<|assistant|>Next response...`
+ - `<|user|>User message<|assistant|>Response<|user|>Next user message<|assistant|>Next response...`

+ **Example:**
 
+ ```python
+ from transformers import pipeline

+ classifier = pipeline(
+     task="text-classification",
+     model="agentlans/multilingual-e5-small-refusal-classifier"
+ )

+ text = (
+     "<|user|>Mr. Loyd wants to fence his square-shaped land of 150 sqft each side. "
+     "If a pole is laid every certain distance, he needs 30 poles. "
+     "What is the distance between each pole in feet?"
+     "<|assistant|>If Mr. Loyd's land is square-shaped and each side is 150 sqft, then<|...|>"
+     "ce between poles ≈ 20.69 sqft\n\nTherefore, the distance between each pole is approximately 20.69 feet."
+ )

+ print(classifier(text))
+ # [{'label': 'Non-refusal', 'score': 0.9906}]
+ ```
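The input formats described in the Usage section can be assembled programmatically. The sketch below is only an illustration and is not part of the commit: `format_conversation` and `elide_middle` are hypothetical helper names, and `max_chars` is an assumed character budget standing in for the model's 512-token limit, which it does not measure.

```python
ELLIPSIS = "<|...|>"  # placeholder named in the Usage section
ROLES = ("system", "user", "assistant")


def format_conversation(messages):
    """Join (role, text) pairs into the role-token format, e.g. <|user|>Hi<|assistant|>Hello."""
    parts = []
    for role, text in messages:
        if role not in ROLES:
            raise ValueError(f"unsupported role: {role!r}")
        parts.append(f"<|{role}|>{text}")
    return "".join(parts)


def elide_middle(text, max_chars=400):
    """Keep the head and tail of a long text and replace the middle with <|...|>.

    max_chars is a hypothetical character budget (an assumption of this sketch),
    not a token count.
    """
    if max_chars <= len(ELLIPSIS):
        raise ValueError("max_chars must exceed the placeholder length")
    if len(text) <= max_chars:
        return text
    keep = max_chars - len(ELLIPSIS)
    head = keep - keep // 2  # head gets the extra character when keep is odd
    tail = keep // 2
    return text[:head] + ELLIPSIS + text[len(text) - tail:]


conversation = format_conversation([
    ("user", "What is 2 + 2?"),
    ("assistant", elide_middle("2 + 2 equals 4. " * 100)),
])
```

The resulting `conversation` string can then be passed to the `classifier` pipeline from the example above. A tokenizer-based length check would be more faithful to the 512-token limit than a character count.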
 
 
 
 
 
 
 
 
 
+ ## Evaluation Results

+ The classifier was tested on the ten examples from the [NousResearch/Minos-v1](https://huggingface.co/NousResearch/Minos-v1) model, translated into multiple languages.
+ Full examples are available in [Examples.md](Examples.md).

+ - 🚫: The model predicted a **refusal to answer**.
+ - ◯: The model predicted a **valid response**.

+ | Example | English | French | Spanish | Chinese | Russian | Arabic |
+ |----------|:--------:|:-------:|:---------:|:---------:|:----------:|:--------:|
+ | 1 | 🚫 | 🚫 | 🚫 | 🚫 | 🚫 | 🚫 |
+ | 2 | 🚫 | 🚫 | 🚫 | 🚫 | 🚫 | 🚫 |
+ | 3 | 🚫 | 🚫 | 🚫 | 🚫 | 🚫 | 🚫 |
+ | 4 | 🚫 | 🚫 | 🚫 | 🚫 | 🚫 | 🚫 |
+ | 5 | 🚫 | 🚫 | 🚫 | 🚫 | 🚫 | 🚫 |
+ | 6 | ◯ | ◯ | ◯ | ◯ | ◯ | ◯ |
+ | 7 | ◯ | ◯ | ◯ | ◯ | ◯ | ◯ |
+ | 8 | ◯ | ◯ | ◯ | ◯ | ◯ | ◯ |
+ | 9 | ◯ | 🚫 | ◯ | ◯ | 🚫 | 🚫 |
+ | 10 | ◯ | ◯ | ◯ | ◯ | ◯ | ◯ |

+ Predictions agree across all six languages for nine of the ten examples; example 9 is flagged as a refusal in French, Russian, and Arabic only. Some false positives remain, especially in contexts with ambiguous phrasing.
 
+ ## Limitations

+ - **Input length:** 512-token maximum
+ - **False positives/negatives:** Occur occasionally, as with the Minos classifier
+ - **Low-resource languages:** May yield inconsistent predictions
+ - **Cultural variation:** Expressions of refusal differ across languages and cultures, which can affect accuracy
 
+ ## Training Details

+ ### Hyperparameters
+ - **Learning rate:** 5e-5
+ - **Train batch size:** 8
+ - **Eval batch size:** 8
+ - **Seed:** 42
+ - **Optimizer:** `ADAMW_TORCH_FUSED` (`betas=(0.9, 0.999)`, `epsilon=1e-8`)
+ - **Scheduler:** Linear
+ - **Epochs:** 5
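For readers who want to reproduce a similar run, the hyperparameters listed above map roughly onto keyword arguments of the Transformers `TrainingArguments` class. This is a sketch, not the author's training script: the dictionary name `training_kwargs` is hypothetical, and treating the batch sizes as per-device values assumes single-device training, which the commit does not state.

```python
# Hyperparameters from the list above, expressed as TrainingArguments keyword names.
# Assumption: batch sizes are per-device (single-device training).
training_kwargs = {
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 8,
    "seed": 42,
    "optim": "adamw_torch_fused",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 5,
}

# Possible usage (the output_dir value is a placeholder, not taken from the commit):
# from transformers import TrainingArguments
# args = TrainingArguments(output_dir="refusal-classifier", **training_kwargs)
```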
+
+ ### Framework Versions
+ - Transformers 5.0.0.dev0
+ - PyTorch 2.9.1+cu128
+ - Datasets 4.4.1
+ - Tokenizers 0.22.1
+
+ ## Intended Use
+
+ This model is designed for:
+ - Identifying **AI refusals** during conversation analysis.
+ - Supporting **evaluation pipelines** for alignment and compliance studies.
+ - Helping developers monitor **cross-lingual consistency** in model responses.
+
+ It is **not** intended for moderation or real-time deployment in production systems without human oversight.
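As one way to monitor cross-lingual consistency, the predictions in the Evaluation Results table can be compared against the English column. The sketch below hard-codes the table values (True for a predicted refusal, False for a predicted answer); the function name `disagreements` and the choice of English as the reference language are assumptions made for illustration.

```python
LANGUAGES = ["English", "French", "Spanish", "Chinese", "Russian", "Arabic"]

# Predictions copied from the Evaluation Results table:
# True = predicted refusal, False = predicted answer.
predictions = {
    1: [True] * 6,
    2: [True] * 6,
    3: [True] * 6,
    4: [True] * 6,
    5: [True] * 6,
    6: [False] * 6,
    7: [False] * 6,
    8: [False] * 6,
    9: [False, True, False, False, True, True],
    10: [False] * 6,
}


def disagreements(preds):
    """Map each inconsistent example to the languages that differ from English."""
    result = {}
    for example, labels in preds.items():
        differing = [
            lang for lang, label in zip(LANGUAGES, labels) if label != labels[0]
        ]
        if differing:
            result[example] = differing
    return result


inconsistent = disagreements(predictions)
consistency_rate = 1 - len(inconsistent) / len(predictions)

print(inconsistent)      # {9: ['French', 'Russian', 'Arabic']}
print(consistency_rate)  # 0.9
```

In practice the boolean lists would come from running the classifier on each translated example rather than from a hard-coded table.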