Update README.md
README.md
CHANGED
@@ -13,4 +13,6 @@ library_name: transformers
 This tokenizer is part of the experiments in the published paper at the BabyLM workshop in CoNLL 2023.
 The paper titled "Increasing The Performance of Cognitively Inspired Data-Efficient Language Models via Implicit Structure Building" (https://aclanthology.org/2023.conll-babylm.29/)
 
-<strong>omarmomen/babylm_tokenizer_32k</strong> is a RobertaTokenizer that is pretrained on the BabyLM 10M dataset (cased) with 32K tokens.
+<strong>omarmomen/babylm_tokenizer_32k</strong> is a RobertaTokenizer that is pretrained on the BabyLM 10M dataset (cased) with 32K tokens.
+
+https://arxiv.org/abs/2310.20589