Artifact Appendix
Paper title: An Analysis of Chinese Censorship Bias in LLMs
Artifacts HotCRP Id: #2
Requested Badge: Available
Description
This artifact includes CensorshipDetector, a text-classification model that classifies a given text as more or less similar to known sanitized content (i.e., content that remains after state censorship, including alterations, deletions, and self-imposed censorship). It also includes a large-scale dataset of Baidu Baike articles used to fine-tune CensorshipDetector and a dataset of Chinese-language news articles used to evaluate it.
Security/Privacy Issues and Ethical Concerns (All badges)
None
Environment
The following describes how to access our artifact and all related and necessary data and software components, followed by how to set everything up and how to verify that the setup is correct.
Accessibility (All badges)
CensorshipDetector and its associated datasets are available on Hugging Face.
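As a rough illustration of how artifacts hosted on Hugging Face are typically retrieved, the sketch below shows one possible way to load the model and a dataset. The repository IDs are hypothetical placeholders, since this appendix does not state them, and the helper names (hub_url, load_detector, load_articles) are introduced here for illustration only. The sketch also assumes that the published checkpoint is compatible with the transformers library and that the datasets load through the datasets library; neither is confirmed here.

```python
# Hypothetical placeholders: the appendix does not give the repository IDs.
MODEL_REPO_ID = "<org>/<censorship-detector-model>"
DATASET_REPO_ID = "<org>/<baidu-baike-dataset>"


def hub_url(repo_id: str, repo_type: str = "model") -> str:
    """Return the Hugging Face Hub page URL for a model or dataset repo."""
    if repo_type == "model":
        return f"https://huggingface.co/{repo_id}"
    if repo_type == "dataset":
        return f"https://huggingface.co/datasets/{repo_id}"
    raise ValueError(f"unsupported repo_type: {repo_type!r}")


def load_detector(repo_id: str = MODEL_REPO_ID):
    """Load the classifier as a text-classification pipeline.

    Assumption: the checkpoint works with `transformers`. The import is
    lazy so this module can be read without the library installed.
    """
    from transformers import pipeline

    return pipeline("text-classification", model=repo_id)


def load_articles(repo_id: str = DATASET_REPO_ID):
    """Download a dataset from the Hub (assumes `datasets` compatibility)."""
    from datasets import load_dataset

    return load_dataset(repo_id)
```

Once the placeholders are replaced with the actual repository IDs, calling load_detector() and passing a Chinese-language text to the returned pipeline would yield a label and score, and hub_url can be used to check that each repository page is publicly reachable.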