Upload folder using huggingface_hub
README.md CHANGED
@@ -2,20 +2,20 @@
 library_name: transformers
 license: apache-2.0
 language:
-- en
+- en
 tags:
-- tokenizer
-- bpe
-- byte-level
-- chatml
-- tool-use
-- code
-- python
+- tokenizer
+- bpe
+- byte-level
+- chatml
+- tool-use
+- code
+- python
 pipeline_tag: text-generation
 datasets:
--
--
--
+- nvidia/Nemotron-CC-HQ
+- HuggingFaceTB/smoltalk
+- sahil2801/CodeAlpaca-20k
 ---
 
 # Daisy Tokenizer
@@ -161,8 +161,9 @@ Benchmarked against common tokenizers on Python code, prose, and instruction dat
 
 - **General text**: lehduong/nemotron-cc-hq (~60%)
 - **Python code**: HuggingFaceTB/smoltalk, self-oss-instruct (~25%)
-- **Instructions**: HuggingFaceTB/
+- **Instructions**: HuggingFaceTB/OpenHermes-2.5-H4, OpenHermes (~15%)
 
 ## License
 
-Apache 2.0
+Apache 2.0
+
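
The approximate training mix listed in the README (~60% general text, ~25% Python code, ~15% instructions) could be turned into per-category document budgets along the lines below. This is only a minimal sketch, not the author's actual pipeline: `MIX_PERCENT`, its category keys, and `document_budget` are hypothetical names introduced for illustration, and the percentages are the rounded figures from the README.

```python
# Rounded mix from the README; the keys are illustrative labels, not dataset IDs.
MIX_PERCENT = {
    "general_text": 60,
    "python_code": 25,
    "instructions": 15,
}


def document_budget(total_docs: int) -> dict[str, int]:
    """Split a document budget across categories by integer percentage.

    Any remainder left over from integer division is assigned to the
    category with the largest share, so the budgets always sum to total_docs.
    """
    if sum(MIX_PERCENT.values()) != 100:
        raise ValueError("mix percentages must sum to 100")

    budget = {name: total_docs * pct // 100 for name, pct in MIX_PERCENT.items()}
    remainder = total_docs - sum(budget.values())
    largest = max(MIX_PERCENT, key=MIX_PERCENT.get)
    budget[largest] += remainder
    return budget


if __name__ == "__main__":
    print(document_budget(1000))
```

Using integer percentages avoids floating-point rounding when the budget is small, at the cost of not expressing fractional shares.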