ckip-joint/bloom-1b1-zh

@@ -2,18 +2,17 @@
 license: bigscience-bloom-rail-1.0
 language:
 - zh
-- en
 pipeline_tag: text-generation
 widget:
- - text: "四月的某一天，天氣晴朗寒冷，"
- - text: "問：台灣最高的建築物是？答："
 ---
 <h1 style='text-align: center '>BLOOM-zh</h1>
 <h2 style='text-align: center '><em>Traditional Chinese-enhanced BLOOM language model</em> </h2>
 <h3 style='text-align: center '>Model Card</h3>
-Version 1.0 / 20.Feb.2023
 This model is a joint collaboration between CKIP lab at Academia Sinica ([link](https://ckip.iis.sinica.edu.tw/)), MediaTek Research ([連結](https://www.mtkresearch.com/), [连结](https://www.mtkresearch.com/zh-hans/), [link](https://www.mtkresearch.com/en/)), and National Academy for Educational Research ([link](https://www.naer.edu.tw/)).
@@ -33,10 +32,10 @@ BLOOM-zh is trained extendedly on large amount of Traditional Chinese text data.
 * **Developed by:** MediaTek Research
 * **Model Type:** Transformer-based Language Model
-* **Version:** 1.0.0
 * **Languages:** Multiple; see [training data](#training-data)
 * **License:** MEDIATEK RESEARCH License ([link](https://huggingface.co/ckip-joint/bloom-1b1-zh/blob/main/LICENSE_MR.md)) and RAIL License v1.0 ([link](https://huggingface.co/spaces/bigscience/license))
-* **Release Date Estimate:** Wednesday, 22.February.2023
 * **Send Questions to:** info@mtkresearch.com
 * **Paper:** [https://arxiv.org/abs/2303.04715](https://arxiv.org/abs/2303.04715)
 * **Cite as:** MediaTek Research: Traditional Chinese-enhanced BLOOM language model. International, February 2023.
@@ -65,7 +64,7 @@ For the uses of the model, please refer to [BLOOM](https://huggingface.co/bigsci
 ## Training Data
 *This section provides a high-level overview of the training data. It is relevant for anyone who wants to know the basics of what the model is learning.*
-We trained the 1B1-parameter model on a total of 6 billion tokens of mostly high-quality Traditional Chinese text. Details are provided in the [paper](https://arxiv.org/abs/2303.04715).
 ## Risks and Limitations
 *This section identifies foreseeable harms and misunderstandings.*
@@ -75,9 +74,9 @@ For risks and limitations, please refer to [BLOOM](https://huggingface.co/bigsci
 ### Factors
 *This section lists some different aspects of BLOOM models. Its focus is on those aspects that are likely to give rise to high variance in model behavior.*
-- The model is trained on Traditional Chinese and English. However, the pretrained weights capture more than 40 different languages.
-- The model is trained on web crawled data, news articles, novels, knowledge sources (encyclopedia, education sector) and instructions
 ## Recommendations
@@ -90,5 +89,5 @@ For recommendations, please refer to [BLOOM](https://huggingface.co/bigscience/b
 ## Model Card Authors
 *Ordered roughly chronologically and by amount of time spent.*
-Philipp Ennen, Po-Chun Hsu, Chan-Jan Hsu, Chang-Le Liu, Yen-Chen Wu, Yin-Hsiang Liao, Chin-Tung Lin, Da-Shan Shiu, Wei-Yun Ma
-<!-- # Bloom_eval -->

 license: bigscience-bloom-rail-1.0
 language:
 - zh
 pipeline_tag: text-generation
 widget:
+- text: 四月的某一天，天氣晴朗寒冷，
+- text: 問：台灣最高的建築物是？答：
 ---
 <h1 style='text-align: center '>BLOOM-zh</h1>
 <h2 style='text-align: center '><em>Traditional Chinese-enhanced BLOOM language model</em> </h2>
 <h3 style='text-align: center '>Model Card</h3>
+Version 2.0 / 10.April.2023
 This model is a joint collaboration between CKIP lab at Academia Sinica ([link](https://ckip.iis.sinica.edu.tw/)), MediaTek Research ([連結](https://www.mtkresearch.com/), [连结](https://www.mtkresearch.com/zh-hans/), [link](https://www.mtkresearch.com/en/)), and National Academy for Educational Research ([link](https://www.naer.edu.tw/)).
 * **Developed by:** MediaTek Research
 * **Model Type:** Transformer-based Language Model
+* **Version:** 2.0.0
 * **Languages:** Multiple; see [training data](#training-data)
 * **License:** MEDIATEK RESEARCH License ([link](https://huggingface.co/ckip-joint/bloom-1b1-zh/blob/main/LICENSE_MR.md)) and RAIL License v1.0 ([link](https://huggingface.co/spaces/bigscience/license))
+* **Release Date Estimate:** Monday, 10.April.2023
 * **Send Questions to:** info@mtkresearch.com
 * **Paper:** [https://arxiv.org/abs/2303.04715](https://arxiv.org/abs/2303.04715)
 * **Cite as:** MediaTek Research: Traditional Chinese-enhanced BLOOM language model. International, February 2023.
 ## Training Data
 *This section provides a high-level overview of the training data. It is relevant for anyone who wants to know the basics of what the model is learning.*
+We trained the 1B1-parameter model on a total of 11.5 billion tokens of mostly high-quality Traditional Chinese text. Details are provided in the [paper](https://arxiv.org/abs/2303.04715).
 ## Risks and Limitations
 *This section identifies foreseeable harms and misunderstandings.*
 ### Factors
 *This section lists some different aspects of BLOOM models. Its focus is on those aspects that are likely to give rise to high variance in model behavior.*
+- The model is trained on Traditional Chinese. However, the pretrained weights capture more than 40 different languages.
+- The model is trained on web crawled data, news articles, novels, knowledge sources (encyclopedia, education sector) and instructions.
 ## Recommendations
 ## Model Card Authors
 *Ordered roughly chronologically and by amount of time spent.*
+Philipp Ennen, Po-Chun Hsu, Chan-Jan Hsu, Chang-Le Liu, Yen-Chen Wu, Yin-Hsiang Liao, Chin-Tung Lin, Chi-Ming Chung, Yi-Chang Chen, Da-Shan Shiu, Wei-Yun Ma
+<!-- # Bloom_eval -->

config.json CHANGED

@@ -1,7 +1,7 @@
 {
   "apply_residual_connection_post_layernorm": false,
   "architectures": [
-    "BloomForCausalLM"
   ],
   "attention_dropout": 0.0,
   "attention_softmax_in_fp32": true,
@@ -27,4 +27,4 @@
   "unk_token_id": 0,
   "use_cache": true,
   "vocab_size": 250880
-}

 {
   "apply_residual_connection_post_layernorm": false,
   "architectures": [
+    "BloomModel"
   ],
   "attention_dropout": 0.0,
   "attention_softmax_in_fp32": true,
   "unk_token_id": 0,
   "use_cache": true,
   "vocab_size": 250880
+}
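The config.json change swaps `BloomForCausalLM` for `BloomModel` in `architectures`. In the `transformers` library, `BloomModel` is the bare transformer without a language-modeling head. Text generation (the card's `pipeline_tag`) uses `BloomForCausalLM`.

The code below is a minimal sketch, not the authors' tooling. `has_causal_lm_head` and `load_for_generation` are hypothetical helpers. The loader assumes that config.json also sets `"model_type": "bloom"` (not visible in this diff) and that the repo ships tokenizer files. `AutoModelForCausalLM` resolves the model class from `model_type`, so it requests the causal-LM head even when `architectures` names the bare model.

```python
import json

# "architectures" values copied from the old and new sides of the config.json diff.
OLD_CONFIG = json.loads('{"architectures": ["BloomForCausalLM"], "vocab_size": 250880}')
NEW_CONFIG = json.loads('{"architectures": ["BloomModel"], "vocab_size": 250880}')


def has_causal_lm_head(config: dict) -> bool:
    """Hypothetical check: True if any declared architecture is a *ForCausalLM class."""
    return any(name.endswith("ForCausalLM") for name in config.get("architectures", []))


def load_for_generation(
    repo_id: str = "ckip-joint/bloom-1b1-zh",
    # Widget prompt from the card: "Q: What is the tallest building in Taiwan? A:"
    prompt: str = "問：台灣最高的建築物是？答：",
) -> str:
    """Hypothetical sketch: load with a causal-LM head regardless of "architectures".

    Requires `transformers` and `torch`; they are imported lazily so that the
    config check above runs without them.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    # AutoModelForCausalLM maps config.model_type (assumed "bloom") to BloomForCausalLM.
    model = AutoModelForCausalLM.from_pretrained(repo_id)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=20)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


if __name__ == "__main__":
    for label, cfg in (("old", OLD_CONFIG), ("new", NEW_CONFIG)):
        print(label, cfg["architectures"], "causal-LM head declared:", has_causal_lm_head(cfg))
```

The diff does not show whether the new checkpoint loads into the causal-LM class without missing or unexpected weights. Check the warnings printed by `from_pretrained` to find out.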

pytorch_model.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:318c0b0e0f726b156e766691890a2ac5fc410f895b380defb53a8ba259c4af59
-size 2130720033

 version https://git-lfs.github.com/spec/v1
+oid sha256:24b882f6f6f1ac9d166797bedc845217245d29885c4e758dd5a3fb9b22e931ef
+size 4261358455
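The pytorch_model.bin hunk changes a Git LFS pointer, not the weights themselves. The repo stores only the `version`, `oid` (SHA-256) and `size` lines; the roughly 2.1 GB and 4.3 GB payloads live in LFS storage.

The code below is a minimal sketch that parses such a pointer and checks a downloaded file against it. It also makes a rough bytes-per-parameter estimate. The helper names are hypothetical. `bloom_param_count` assumes the bloom-1b1 shape: hidden size 1536, 24 layers, and an LM head tied to the word embeddings. Only `vocab_size` 250880 is confirmed by the config diff.

```python
import hashlib

# Pointer texts copied from the old and new sides of the diff.
OLD_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:318c0b0e0f726b156e766691890a2ac5fc410f895b380defb53a8ba259c4af59
size 2130720033
"""
NEW_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:24b882f6f6f1ac9d166797bedc845217245d29885c4e758dd5a3fb9b22e931ef
size 4261358455
"""


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"version": fields["version"], "algo": algo, "oid": digest, "size": int(fields["size"])}


def matches_pointer(path: str, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Stream a local file and compare its size and SHA-256 digest with the pointer."""
    digest = hashlib.sha256()
    size = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and digest.hexdigest() == pointer["oid"]


def bloom_param_count(vocab_size: int = 250880, hidden: int = 1536, n_layer: int = 24) -> int:
    """BLOOM parameter count; hidden=1536 and n_layer=24 are assumed (bloom-1b1 shape)."""
    embed = vocab_size * hidden + 2 * hidden        # word embeddings + embedding LayerNorm
    per_layer = (
        2 * hidden                                  # input LayerNorm (weight + bias)
        + hidden * 3 * hidden + 3 * hidden          # fused query/key/value projection
        + hidden * hidden + hidden                  # attention output projection
        + 2 * hidden                                # post-attention LayerNorm
        + hidden * 4 * hidden + 4 * hidden          # MLP up-projection
        + 4 * hidden * hidden + hidden              # MLP down-projection
    )
    return embed + n_layer * per_layer + 2 * hidden  # + final LayerNorm; LM head is tied


if __name__ == "__main__":
    n_params = bloom_param_count()
    for label, text in (("old", OLD_POINTER), ("new", NEW_POINTER)):
        ptr = parse_lfs_pointer(text)
        print(label, ptr["size"], "bytes,", round(ptr["size"] / n_params, 3), "bytes/param")
```

Under those assumptions the model has 1,065,314,304 parameters. The two file sizes then work out to roughly 2.0 and 4.0 bytes per parameter. This is consistent with the checkpoint moving from a 16-bit dtype (fp16 or bf16) to fp32, although the pointer alone cannot confirm the dtype.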