---
license: cc-by-nc-4.0
language:
- zh
base_model:
- Qwen/Qwen2.5-1.5B-Instruct
pipeline_tag: text-classification
datasets:
- zjunlp/ChineseHarm-bench
---

## 🚀Installation

1. Clone the repositories:

```bash
git clone https://github.com/zjunlp/ChineseHarm-bench
cd ChineseHarm-bench
git clone https://github.com/hiyouga/LLaMA-Factory
```

2. Install dependencies:

```bash
cd LLaMA-Factory
pip install -e ".[torch,metrics]"
```
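
The inference commands in this README use relative paths such as `../infer/single_infer.py` and `../benchmark/bench.json`, which suggests they are meant to be run from inside `LLaMA-Factory`. The following is a minimal sanity check under that assumption; `check_layout` is a hypothetical helper, not part of the repository:

```bash
# Hypothetical helper: verify that the files used by the inference commands exist.
# Assumes the current directory is ChineseHarm-bench/LLaMA-Factory,
# so the repository root is "..".
check_layout() {
  local root="${1:-..}"   # repository root, relative to the current directory
  local missing=0
  local path
  for path in infer/single_infer.py infer/batch_infer.py benchmark/bench.json; do
    if [ ! -f "$root/$path" ]; then
      echo "missing: $root/$path" >&2
      missing=1
    fi
  done
  return "$missing"
}

if check_layout ..; then
  echo "layout looks OK"
else
  echo "run this from inside ChineseHarm-bench/LLaMA-Factory" >&2
fi
```

If any file is reported missing, the most likely cause is that the current directory is not `LLaMA-Factory`.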

## 📚Inference

We release the following variants of our harmful content detection model:

🔹 Single Inference (Example)

Run single-input inference using the ChineseGuard-1.5B model:

```bash
SCRIPT_PATH="../infer/single_infer.py"
model_name="zjunlp/ChineseGuard-1.5B"
# Example input: an ad offering to send SMS messages on others' behalf, no deposit required
text="代发短信,有想做的联系我,无押金"

python "$SCRIPT_PATH" \
    --model_name "$model_name" \
    --text "$text"
```
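
If the input text contains spaces, an unquoted `$text` is split by the shell into several arguments, so only the first word would reach `--text`. Passing `"$text"` keeps it as one argument. A small illustration, where `count_args` is a hypothetical helper that only counts its arguments and the example text has spaces added for demonstration:

```bash
# Hypothetical helper: print how many arguments were received.
count_args() { echo "$#"; }

# Same example text as above, with spaces added after the commas.
text="代发短信, 有想做的联系我, 无押金"

count_args --text $text      # unquoted: split on whitespace, prints 4
count_args --text "$text"    # quoted: one argument after --text, prints 2
```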

🔸 Batch Inference (Multi-NPU or Multi-GPU)

To run inference on the entire ChineseHarm-Bench using ChineseGuard-1.5B and 8 NPUs:

```bash
SCRIPT_PATH="../infer/batch_infer.py"
model_name="zjunlp/ChineseGuard-1.5B"
file_name="../benchmark/bench.json"
output_file="../benchmark/bench_ChineseGuard-1.5B.json"

python "$SCRIPT_PATH" \
    --model_name "$model_name" \
    --file_name "$file_name" \
    --output_file "$output_file" \
    --num_npus 8
```
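
To keep the output filename in sync with the model being evaluated, the filename can be derived from `model_name` with shell parameter expansion instead of being typed twice. This is a sketch; the `model_short` variable is introduced here for illustration only:

```bash
model_name="zjunlp/ChineseGuard-1.5B"

# ${model_name##*/} removes everything up to and including the last "/",
# leaving only the model's short name.
model_short="${model_name##*/}"
output_file="../benchmark/bench_${model_short}.json"

echo "$output_file"   # prints ../benchmark/bench_ChineseGuard-1.5B.json
```

The resulting `output_file` can then be passed to `--output_file` as in the command above.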

> For more configuration options (e.g., batch size, device selection, custom prompt templates), please refer to `single_infer.py` and `batch_infer.py`.
>
> **Note:** The inference scripts support both NPU and GPU devices.

## 🚩Citation

Please cite our repository if you use ChineseGuard in your work. Thanks!

```bibtex
@misc{liu2025chineseharmbenchchineseharmfulcontent,
      title={ChineseHarm-Bench: A Chinese Harmful Content Detection Benchmark},
      author={Kangwei Liu and Siyuan Cheng and Bozhong Tian and Xiaozhuan Liang and Yuyang Yin and Meng Han and Ningyu Zhang and Bryan Hooi and Xi Chen and Shumin Deng},
      year={2025},
      eprint={2506.10960},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2506.10960},
}
```