This model is a fine-tuned version of RoBERTa-large [1], trained on 2,450 LLM responses from Chatbot Arena [2]. It classifies whether an LLM refuses to respond to a prompt or answers with a disclaimer, and it distinguishes the refusal/disclaimer reason between (i) ethical concerns and (ii) a lack of technical capability, information, or context.

The model assigns one of five possible labels:

- 0 (**Normal**): No refusal or disclaimer; the model provides a standard, straightforward answer.
- 1 (**Refusal Unethical**): The model refuses to answer for ethical reasons, such as legal, moral, inappropriateness, or safety-related concerns.
- 2 (**Disclaimer Unethical**): The model cites ethical concerns but still attempts to carry out the task or answer the question in the prompt.
- 3 (**Refusal Capability**): The model refuses to answer due to its own limitations, a lack of information, or an inability to provide an adequate response.
- 4 (**Disclaimer Capability**): The model signals its limitations but attempts to provide an answer within its capacity.
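
A minimal usage sketch for the five-way classification above. Two assumptions not stated in this card: that the checkpoint is published as a standard Hugging Face sequence-classification model, and that `"path/to/this-model"` stands in for the actual (unspecified) repository id.

```python
# id -> label mapping, taken from the label list above.
LABELS = {
    0: "Normal",
    1: "Refusal Unethical",
    2: "Disclaimer Unethical",
    3: "Refusal Capability",
    4: "Disclaimer Capability",
}


def classify_response(response: str, model_id: str = "path/to/this-model") -> str:
    """Return the predicted refusal/disclaimer label for one LLM response."""
    # Imports live inside the function so the label mapping above can be
    # reused without torch/transformers installed.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    inputs = tokenizer(response, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return LABELS[int(logits.argmax(dim=-1))]


# Example call (downloads the checkpoint on first use):
# classify_response("I'm sorry, but I can't help with that request.")
```

Note that the input is the LLM *response* being judged, not the user prompt; the label names and ids mirror the list above.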

References

[1] Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., ... & Stoyanov, V. (2019). RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692.

[2] Chiang, W. L., Zheng, L., Sheng, Y., Angelopoulos, A. N., Li, T., Li, D., ... & Stoica, I. (2024). Chatbot Arena: An open platform for evaluating LLMs by human preference. arXiv preprint arXiv:2403.04132.