twnlp nielsr HF Staff committed on
Commit e6d757f · 1 Parent(s): 36ce215

Improve model card: add metadata, paper link and library tags (#1)


- Improve model card: add metadata, paper link and library tags (ea6dee36066e6108c4872950acd3b945a6eac919)


Co-authored-by: Niels Rogge <nielsr@users.noreply.huggingface.co>

Files changed (1)
  1. README.md +56 -28
README.md CHANGED
@@ -1,38 +1,45 @@
  ---
  license: apache-2.0
  ---
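- A comprehensive platform for Chinese text correction that brings together academic research, model training, model evaluation, and inference deployment, covering two core directions: spelling correction and grammatical error correction.
 
- 🔥 Project: https://github.com/TW-NLP/ChineseErrorCorrector
- ## Model List
 
- | Model Name | Correction Type | Description |
- |:--------------------------------------------------------------------------------------------|:------|:-------------------------------------------|
- | [twnlp/ChineseErrorCorrector3-4B](https://huggingface.co/twnlp/ChineseErrorCorrector3-4B) | Grammar + Spelling | Full-parameter training on 2 million correction samples; suitable for both grammatical and spelling correction. Best results; recommended. |
- | [twnlp/ChineseErrorCorrector2-7B](https://huggingface.co/twnlp/ChineseErrorCorrector2-7B) | Grammar + Spelling | Multiple rounds of iterative training on 2 million correction samples; suitable for both grammatical and spelling correction, with good results. |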
- ## Model Evaluation (NaCGEC Data)
- | Model Name | Model Link | Base Model | Avg | SIGHAN-2015 | EC-LAW | MCSC | GPU | QPS |
- |:------------------|:------------------------------------------------------------------------------------------------------------------------|:-------------------------------|:-----------|:------------|:-------|:-------|:--------|:--------|
- | Kenlm-CSC | [shibing624/chinese-kenlm-klm](https://huggingface.co/shibing624/chinese-kenlm-klm) | kenlm | 0.3409 | 0.3147 | 0.3763 | 0.3317 | CPU | 9 |
- | Mengzi-T5-CSC | [shibing624/mengzi-t5-base-chinese-correction](https://huggingface.co/shibing624/mengzi-t5-base-chinese-correction) | mengzi-t5-base | 0.3984 | 0.7758 | 0.3156 | 0.1039 | GPU | 214 |
- | ERNIE-CSC | [PaddleNLP/ernie-csc](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/legacy/examples/text_correction/ernie-csc) | PaddlePaddle/ernie-1.0-base-zh | 0.4353 | 0.8383 | 0.3357 | 0.1318 | GPU | 114 |
- | MacBERT-CSC | [shibing624/macbert4csc-base-chinese](https://huggingface.co/shibing624/macbert4csc-base-chinese) | hfl/chinese-macbert-base | 0.3993 | 0.8314 | 0.1610 | 0.2055 | GPU | **224** |
- | ChatGLM3-6B-CSC | [shibing624/chatglm3-6b-csc-chinese-lora](https://huggingface.co/shibing624/chatglm3-6b-csc-chinese-lora) | THUDM/chatglm3-6b | 0.4538 | 0.6572 | 0.4369 | 0.2672 | GPU | 3 |
- | Qwen2.5-1.5B-CTC | [shibing624/chinese-text-correction-1.5b](https://huggingface.co/shibing624/chinese-text-correction-1.5b) | Qwen/Qwen2.5-1.5B-Instruct | 0.6802 | 0.3032 | 0.7846 | 0.9529 | GPU | 6 |
- | Qwen2.5-7B-CTC | [shibing624/chinese-text-correction-7b](https://huggingface.co/shibing624/chinese-text-correction-7b) | Qwen/Qwen2.5-7B-Instruct | 0.8225 | 0.4917 | 0.9798 | 0.9959 | GPU | 3 |
- | **Qwen3-4B-CTC(Our)** | [twnlp/ChineseErrorCorrector3-4B](https://huggingface.co/twnlp/ChineseErrorCorrector3-4B) | Qwen/Qwen3-4B | **0.8521** | 0.6340 | 0.9360 | 0.9864 | GPU | 5 |
 
- Without [ChineseErrorCorrector](https://github.com/TW-NLP/ChineseErrorCorrector), you can use the model like this:
 
- First, you pass your input through the transformer model, then you get the generated sentence.
 
- Install package:
- ```
- pip install transformers
- ```
 
  ```python
  from transformers import AutoModelForCausalLM, AutoTokenizer
@@ -69,10 +76,31 @@ generated_ids = [
 
  response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
  print(response)
-
  ```
 
- output:
- ```shell
- 对待每一项工作都要一丝不苟。
  ```
 
  ---
  license: apache-2.0
+ library_name: transformers
+ pipeline_tag: text-generation
+ language:
+ - zh
+ base_model: Qwen/Qwen3-4B
+ tags:
+ - text-correction
+ - cgec
+ - csc
  ---
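The metadata block added above is plain YAML front matter between two `---` lines. As a rough illustration of what those keys hold, here is a minimal sketch that reads the block with the standard library only. The helper name `parse_front_matter` is hypothetical, and the parser assumes flat keys with scalar values or simple `- item` lists, which is all this card uses; a real YAML library would be the safer choice for anything more complex.

```python
def parse_front_matter(text):
    """Parse a flat front matter block: scalar values and simple '- item' lists only."""
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    current_list_key = None  # key whose list items we are currently collecting
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":  # closing delimiter ends the block
            break
        if not stripped:
            continue
        if stripped.startswith("- ") and current_list_key is not None:
            meta[current_list_key].append(stripped[2:].strip())
            continue
        key, _, value = stripped.partition(":")
        key, value = key.strip(), value.strip()
        if value:
            meta[key] = value
            current_list_key = None
        else:
            # A key with no inline value starts a list in this simplified format.
            meta[key] = []
            current_list_key = key
    return meta


# Front matter as it appears on the new side of this commit.
CARD = """---
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
language:
- zh
base_model: Qwen/Qwen3-4B
tags:
- text-correction
- cgec
- csc
---
"""

metadata = parse_front_matter(CARD)
for key, value in metadata.items():
    print(f"{key}: {value}")
```

The sketch only shows the shape of the data: `language` and `tags` come back as lists, while `license`, `library_name`, `pipeline_tag`, and `base_model` are single strings.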
 
+ # ChineseErrorCorrector3-4B
 
+ [**🇨🇳中文**](https://github.com/TW-NLP/ChineseErrorCorrector/blob/main/README.md) | [**English**](https://github.com/TW-NLP/ChineseErrorCorrector/blob/main/README_EN.md)
 
+ ChineseErrorCorrector3-4B is part of a comprehensive platform for Chinese text correction, integrating academic research, model training, evaluation, and inference. It covers two core directions: Spelling Correction (CSC) and Grammatical Error Correction (CGEC).
 
+ The methodology behind this line of models is presented in the paper [CSRP: Chain-of-Thought Reasoning for Chinese Text Correction via Reinforcement Learning with Efficiency-Aware Rewards](https://huggingface.co/papers/2606.00020).
 
+ - **Project Page:** [ChineseErrorCorrector GitHub](https://github.com/TW-NLP/ChineseErrorCorrector)
+ - **Paper:** [arXiv:2606.00020](https://arxiv.org/abs/2606.00020)
 
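+ ## Model Description
 
+ A comprehensive platform for Chinese text correction that brings together academic research, model training, model evaluation, and inference deployment, covering two core directions: spelling correction and grammatical error correction.
 
+ - **twnlp/ChineseErrorCorrector3-4B**: Full-parameter training on 2 million correction samples; suitable for both grammatical and spelling correction. Best results; recommended.
+
+ ## Model Evaluation (NaCGEC Data)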
+
+ | Model Name | Base Model | Avg | SIGHAN-2015 | EC-LAW | MCSC | GPU | QPS |
+ |:---|:---|:---|:---|:---|:---|:---|:---|
+ | ChatGLM3-6B-CSC | THUDM/chatglm3-6b | 0.4538 | 0.6572 | 0.4369 | 0.2672 | GPU | 3 |
+ | Qwen2.5-1.5B-CTC | Qwen/Qwen2.5-1.5B-Instruct | 0.6802 | 0.3032 | 0.7846 | 0.9529 | GPU | 6 |
+ | Qwen2.5-7B-CTC | Qwen/Qwen2.5-7B-Instruct | 0.8225 | 0.4917 | 0.9798 | 0.9959 | GPU | 3 |
+ | **Qwen3-4B-CTC(Our)** | Qwen/Qwen3-4B | **0.8521** | 0.6340 | 0.9360 | 0.9864 | GPU | 5 |
+
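The commit does not say how the `Avg` column is derived, but the figures are consistent with an unweighted mean of the SIGHAN-2015, EC-LAW, and MCSC scores. The sketch below checks that reading against the rows kept in the new table; the variable and function names are illustrative, and the "unweighted mean" interpretation is an assumption rather than something stated by the authors.

```python
# Scores copied from the evaluation table: (SIGHAN-2015, EC-LAW, MCSC).
benchmark_scores = {
    "ChatGLM3-6B-CSC": (0.6572, 0.4369, 0.2672),
    "Qwen2.5-1.5B-CTC": (0.3032, 0.7846, 0.9529),
    "Qwen2.5-7B-CTC": (0.4917, 0.9798, 0.9959),
    "Qwen3-4B-CTC": (0.6340, 0.9360, 0.9864),
}

# Avg column as reported in the same table.
reported_avg = {
    "ChatGLM3-6B-CSC": 0.4538,
    "Qwen2.5-1.5B-CTC": 0.6802,
    "Qwen2.5-7B-CTC": 0.8225,
    "Qwen3-4B-CTC": 0.8521,
}


def mean_score(values):
    """Unweighted arithmetic mean, rounded to four decimals like the table."""
    return round(sum(values) / len(values), 4)


computed_avg = {name: mean_score(v) for name, v in benchmark_scores.items()}

for name, value in computed_avg.items():
    print(f"{name}: computed {value:.4f}, reported {reported_avg[name]:.4f}")
```

Because the average weights all three benchmarks equally, a model can lead on `Avg` without leading on every individual column, as the Qwen3-4B row shows.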
+ ## Sample Usage
+
+ You can use the model with the `transformers` library as follows:
 
  ```python
  from transformers import AutoModelForCausalLM, AutoTokenizer
 
 
  response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
  print(response)
+ # Output: 对待每一项工作都要一丝不苟。
  ```
 
+ ## Citation
+
+ If you find this work helpful, please cite:
+
+ ```bibtex
+ @misc{tian2026csrpchainofthoughtreasoningchinese,
+ title={CSRP: Chain-of-Thought Reasoning for Chinese Text Correction via Reinforcement Learning with Efficiency-Aware Rewards},
+ author={Wei Tian and Yuhao Zhou and Man Lan},
+ year={2026},
+ eprint={2606.00020},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL},
+ url={https://arxiv.org/abs/2606.00020},
+ }
+
+ @misc{tian2025chineseerrorcorrector34bstateoftheartchinesespelling,
+ title={ChineseErrorCorrector3-4B: State-of-the-Art Chinese Spelling and Grammar Corrector},
+ author={Wei Tian and Yuhao Zhou},
+ year={2025},
+ eprint={2511.17562},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL},
+ url={https://arxiv.org/abs/2511.17562},
+ }
  ```