timpal0l committed
Commit ac1ce77 · verified · 1 parent: 74b8574

Update README.md
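
This commit renames the tokenizer from 262k to 256k, while the vocabulary size stated in the README (262,144 tokens) stays the same. One plausible reading, which is an assumption and not something the commit states, is that the label moves from decimal thousands to units of 1,024, since 262,144 = 2^18 = 256 × 1,024. A minimal sketch of that arithmetic (the helper names `decimal_k` and `binary_k` are hypothetical, introduced only for illustration):

```python
VOCAB_SIZE = 262_144  # vocabulary size stated in the README


def decimal_k(n: int) -> int:
    """Size in decimal thousands (k = 1,000), rounded to the nearest integer."""
    return round(n / 1000)


def binary_k(n: int) -> int:
    """Size in units of 1,024, rounded to the nearest integer."""
    return round(n / 1024)


# The same vocabulary size yields both labels, depending on the unit.
print(f"decimal label: {decimal_k(VOCAB_SIZE)}k")
print(f"binary label:  {binary_k(VOCAB_SIZE)}k")
print(f"power of two:  {VOCAB_SIZE == 2**18}")
```

Under this reading, both names describe the same 262,144-entry vocabulary; only the repository ID passed to `AutoTokenizer.from_pretrained` changes.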

Files changed (1): README.md (+2 -2)
@@ -47,7 +47,7 @@ tags:
  license: apache-2.0
  ---
 
- # OpenEuroLLM Tokenizer (262k)
+ # OpenEuroLLM Tokenizer (256k)
 
  A **262,144-token SentencePiece BPE tokenizer** designed for efficient tokenization across all EU official languages and additional European languages. Trained on 173 GB of curated multilingual text from the OpenEuroLLM data catalogue on LUMI HPC.
 
@@ -63,7 +63,7 @@ A **262,144-token SentencePiece BPE tokenizer** designed for efficient tokenizat
  ```python
  from transformers import AutoTokenizer
 
- tok = AutoTokenizer.from_pretrained("openeurollm/tokenizer-262k")
+ tok = AutoTokenizer.from_pretrained("openeurollm/tokenizer-256k")
 
  text = "Hello world! Bonjour le monde. Hej världen!"
  ids = tok(text)["input_ids"]