Update README.md
README.md
@@ -12,8 +12,7 @@ The model assigns one of five possible labels:
 3 (**Refusal Capability**): The model refuses to answer due to its own limitations, lack of information, or lack of ability to provide an adequate response. <br />
 4 (**Disclaimer Capability**): The model signals its limitations but attempts to provide an answer within its capacity. <br />
 
-Please cite: <br />
-Pasch, S. (2025). LLM Content Moderation and User Satisfaction: Evidence from Response Refusals in Chatbot Arena. arXiv preprint arXiv:2501.03266.
+Please cite: <br /> LLM Content Moderation and User Satisfaction: Evidence from Response Refusals in Chatbot Arena (under review).
 
 References <br />
 [1] Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; ... & Stoyanov, V. (2019). RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692. <br />
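For readers wiring the classifier's integer predictions back to the README's label scheme, a minimal decoding sketch follows. It is grounded only in the two labels quoted in the diff above; the helper name `describe` and the handling of label ids not shown in this excerpt are assumptions, not part of the model's documented API.

```python
# Sketch: map predicted label ids to the names quoted in the README excerpt.
# Only labels 3 and 4 appear above; the other ids are left unmapped rather
# than invented.
LABELS = {
    3: "Refusal Capability",
    4: "Disclaimer Capability",
}

def describe(label_id: int) -> str:
    """Return the README's name for a predicted label id, where known."""
    return LABELS.get(label_id, f"label {label_id} (not listed in this excerpt)")

print(describe(3))
print(describe(4))
```

A downstream consumer would fill in the remaining three labels from the full README before relying on this mapping.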