Model Card #1
by Ezi - opened

README.md CHANGED

# CKIP ALBERT Tiny Chinese

## Table of Contents

- [Model Details](#model-details)
- [Uses](#uses)
- [Risks, Limitations and Biases](#risks-limitations-and-biases)
- [Training](#training)
- [Evaluation](#evaluation)
- [How to Get Started With the Model](#how-to-get-started-with-the-model)

## Model Details

- **Model Description:**
This project provides traditional Chinese transformers models (including ALBERT, BERT, GPT2) and NLP tools (including word segmentation, part-of-speech tagging, named entity recognition).
- **Developed by:** [Mu Yang](https://muyang.pro) at [CKIP](https://ckip.iis.sinica.edu.tw)
- **Model Type:** Fill-Mask
- **Language(s):** Chinese
- **License:** gpl-3.0
- **Parent Model:** See the [ALBERT base model](https://huggingface.co/albert-base-v2) for more information.
- **Resources for more information:**
  - [GitHub Repo](https://github.com/ckiplab/ckip-transformers)
  - [CKIP Documentation](https://ckip-transformers.readthedocs.io/en/stable/)

## Uses

#### Direct Use

The model author suggests using BertTokenizerFast as the tokenizer instead of AutoTokenizer.
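
As a minimal fill-mask sketch of that advice (assuming the checkpoint's masked-LM head loads through AutoModelForMaskedLM, which this card does not state; the example sentence is invented):

```
from transformers import BertTokenizerFast, AutoModelForMaskedLM, pipeline

# Pair the fast BERT tokenizer, as the author suggests, with the ALBERT checkpoint.
tokenizer = BertTokenizerFast.from_pretrained('bert-base-chinese')
model = AutoModelForMaskedLM.from_pretrained('ckiplab/albert-tiny-chinese')

# Query the masked-language-model head through the fill-mask pipeline.
fill_mask = pipeline('fill-mask', model=model, tokenizer=tokenizer)
print(fill_mask('今天[MASK]氣真好。'))
```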

For full usage and more information, please refer to the [GitHub repository](https://github.com/ckiplab/ckip-transformers).
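
That repository also ships WS/POS/NER drivers built on these models. A short sketch of their use, assuming the ckip-transformers package's documented driver classes (the "albert-tiny" model name string is an assumption here):

```
from ckip_transformers.nlp import CkipWordSegmenter, CkipPosTagger, CkipNerChunker

# Each driver downloads and wraps the corresponding task model.
ws_driver = CkipWordSegmenter(model="albert-tiny")
pos_driver = CkipPosTagger(model="albert-tiny")
ner_driver = CkipNerChunker(model="albert-tiny")

text = ["中央研究院位於台北市南港區。"]  # hypothetical input
ws = ws_driver(text)    # word segmentation
pos = pos_driver(ws)    # POS tags for the segmented words
ner = ner_driver(text)  # named-entity spans
print(ws, pos, ner)
```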

## Risks, Limitations and Biases

**CONTENT WARNING: Readers should be aware this section contains content that is disturbing, offensive, and can propagate historical and current stereotypes.**

Significant research has explored bias and fairness issues with language models (see, e.g., [Sheng et al. (2021)](https://aclanthology.org/2021.acl-long.330.pdf) and [Bender et al. (2021)](https://dl.acm.org/doi/pdf/10.1145/3442188.3445922)).
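
One hypothetical way to surface such issues in a fill-mask model is to compare completions for near-identical templates; this probe is not part of the original card, and the templates are invented for illustration:

```
from transformers import BertTokenizerFast, AutoModelForMaskedLM, pipeline

tokenizer = BertTokenizerFast.from_pretrained('bert-base-chinese')
model = AutoModelForMaskedLM.from_pretrained('ckiplab/albert-tiny-chinese')
fill_mask = pipeline('fill-mask', model=model, tokenizer=tokenizer)

# Compare top predictions for gender-swapped contexts.
for template in ['他的工作是[MASK]生。', '她的工作是[MASK]生。']:
    print(template, [p['token_str'] for p in fill_mask(template)])
```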

## Training

#### Training Data

The language models are trained on the ZhWiki and CNA datasets; the WS and POS task models are trained on the ASBC dataset; the NER task models are trained on the OntoNotes dataset.

#### Training Procedure

* **Parameters:** 4M
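
The reported figure can be sanity-checked directly from the checkpoint (a quick sketch; it counts the parameters of the loaded encoder):

```
from transformers import AutoModel

model = AutoModel.from_pretrained('ckiplab/albert-tiny-chinese')
total = sum(p.numel() for p in model.parameters())
print(f'{total / 1e6:.1f}M parameters')  # expected to be on the order of 4M
```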

## Evaluation

#### Results

* **Perplexity:** 4.40
* **WS (Word Segmentation) [F1]:** 96.66%
* **POS (Part-of-speech) [ACC]:** 94.48%
* **NER (Named-entity recognition) [F1]:** 71.17%

## How to Get Started With the Model

```
from transformers import (
    BertTokenizerFast,
    AutoModel,
)

tokenizer = BertTokenizerFast.from_pretrained('bert-base-chinese')
model = AutoModel.from_pretrained('ckiplab/albert-tiny-chinese')
```
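
Continuing from the block above, a brief usage sketch (assumes a PyTorch backend; the example sentence is invented):

```
import torch

# Encode a sentence and run it through the model.
inputs = tokenizer('中央研究院的簡稱是中研院。', return_tensors='pt')
with torch.no_grad():
    outputs = model(**inputs)

# Final hidden states, shaped (batch, sequence length, hidden size).
print(outputs.last_hidden_state.shape)
```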