Update README.md
README.md CHANGED
@@ -7,7 +7,9 @@ tags:
 - less-is-better
 - supra-word
 - cognitively-inspired
-license:
+license: gpl-3.0
+datasets:
+- MLZoo/edu-fineweb-10B
 ---
 
 # LiB Tokenizer
@@ -99,4 +101,4 @@ the original work:
 ## Links
 
 - [tokenizers fork (Rust implementation)](https://github.com/antalvdb/tokenizers/tree/lib-model)
-- [LiB repository (training scripts)](https://github.com/antalvdb/LiB/tree/feature/hf-compatible-tokenizer)
+- [LiB repository (training scripts)](https://github.com/antalvdb/LiB/tree/feature/hf-compatible-tokenizer)