Artifact Appendix

Paper title: An Analysis of Chinese Censorship Bias in LLMs

Artifacts HotCRP Id: #2

Requested Badge: Available

Description

This artifact includes CensorshipDetector, a text-classification model that classifies a given text as more or less similar to known sanitized content (i.e., content that remains after state censorship, including alterations, deletions, and self-imposed censorship). It also includes a large-scale dataset of Baidu Baike articles used to fine-tune CensorshipDetector and a dataset of Chinese-language news articles used to evaluate it.

Security/Privacy Issues and Ethical Concerns (All badges)

None

Environment

The following describes how to access our artifact and all related and necessary data and software components, then how to set everything up and verify that the setup is correct.

Accessibility (All badges)

CensorshipDetector and the associated datasets are available on Hugging Face.