gsaltintas committed on
Commit
088cbaa
·
verified ·
1 Parent(s): 42960a4

Upload folder using huggingface_hub

Files changed (1)
  1. README.md +1 -2
README.md CHANGED
@@ -25,14 +25,13 @@ A **Byte-Level BPE** tokenizer trained on **dan_Latn** data from Fineweb-2-HQ.
 | Normalizer | NFC |
 | Special Tokens | `<s>`, `</s>`, `<pad>`, `<unk>` |
 | Training Shards | 2 |
-| Data Source | `/scratch/gsa/data/flexitok//dan_Latn/` |
 
 ## Usage
 
 ```python
 from transformers import AutoTokenizer
 
-tokenizer = AutoTokenizer.from_pretrained("<repo_id>")
+tokenizer = AutoTokenizer.from_pretrained("flexitok/-bpe_dan_Latn_128000")
 tokens = tokenizer.encode("Hello, world!")
 ```
 
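
The README table in this diff lists NFC as the normalizer. As a rough illustration of what that implies for Danish text, the sketch below uses Python's standard `unicodedata` module (an assumption for demonstration purposes, not the tokenizer's own normalization code) to show a decomposed `a` plus combining ring collapsing into the single code point `å`, which changes the UTF-8 bytes a byte-level BPE model would see:

```python
import unicodedata

# "å" written as 'a' followed by U+030A COMBINING RING ABOVE (decomposed form)
decomposed = "a\u030a"

# NFC composes the pair into the single code point U+00E5
composed = unicodedata.normalize("NFC", decomposed)

# Code point counts before and after normalization
print(len(decomposed), len(composed))

# UTF-8 byte sequences differ, so byte-level merges would differ too
print(decomposed.encode("utf-8"), composed.encode("utf-8"))
```

Normalizing before tokenization means visually identical strings map to the same byte sequence, regardless of how the input was originally encoded.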