gsaltintas committed on
Commit 951f117 · verified · 1 Parent(s): 9c457a5

Upload folder using huggingface_hub

Files changed (1)
  1. README.md +1 -2
README.md CHANGED
@@ -25,14 +25,13 @@ A **UnigramLM** tokenizer trained on **arb_Arab** data from Fineweb-2-HQ.
  | Normalizer | NFC |
  | Special Tokens | `<s>`, `</s>`, `<pad>`, `<unk>` |
  | Training Shards | 2 |
- | Data Source | `/scratch/gsa/data/flexitok//arb_Arab/` |
 
  ## Usage
 
  ```python
  from transformers import AutoTokenizer
 
- tokenizer = AutoTokenizer.from_pretrained("<repo_id>")
+ tokenizer = AutoTokenizer.from_pretrained("flexitok/-unigram_arb_Arab_32000")
  tokens = tokenizer.encode("Hello, world!")
  ```