Update README.md
README.md
CHANGED
@@ -4,5 +4,5 @@ tags: []
 ---
 
 Slightly modified version of `cl100k_base` that supports Dolma 1.x special tokens
-(`|||
+(`|||PHONE_NUMBER|||`, `|||EMAIL_ADDRESS|||`, `|||IP_ADDRESS|||`) as well as adds
 extra tokens to fill gaps in tiktoken `cl100k_base` version.
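
To illustrate what it means for the three Dolma 1.x special tokens named in the README to be handled as atomic units, here is a minimal sketch in plain Python. It is not the tokenizer's actual implementation: `split_on_special_tokens` is a hypothetical helper written for this example, and it only separates the special tokens from the surrounding text, which a real tokenizer would then go on to encode.

```python
import re

# The three Dolma 1.x special tokens listed in the README.
SPECIAL_TOKENS = [
    "|||PHONE_NUMBER|||",
    "|||EMAIL_ADDRESS|||",
    "|||IP_ADDRESS|||",
]

# A capturing group makes re.split keep the matched tokens in its output.
_PATTERN = re.compile("(" + "|".join(re.escape(t) for t in SPECIAL_TOKENS) + ")")


def split_on_special_tokens(text):
    """Hypothetical helper: split text into (piece, is_special) pairs.

    Special tokens are returned whole, never broken into smaller pieces.
    """
    pieces = [p for p in _PATTERN.split(text) if p]  # drop empty strings
    return [(p, p in SPECIAL_TOKENS) for p in pieces]


if __name__ == "__main__":
    sample = "Call |||PHONE_NUMBER||| or mail |||EMAIL_ADDRESS|||."
    for piece, is_special in split_on_special_tokens(sample):
        print(repr(piece), "special" if is_special else "text")
```

In this sketch, ordinary text and special tokens come back as separate pieces, so a downstream encoder could map each special token to a single ID instead of splitting it on the `|` characters.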