jonasaise committed on
Commit e582746 · verified · 1 Parent(s): a5e1957

Update languages and fix

Files changed (1)
  1. README.md +43 -11
README.md CHANGED
@@ -1,16 +1,48 @@
 ---
 license: apache-2.0
 language:
-- en
-- sv
-- fr
-- de
-- fi
-- es
-- it
-- nl
-- pl
-# Add other key languages here
+- als
+- bos
+- bul
+- cat
+- ces
+- dan
+- deu
+- ekk
+- ell
+- eng
+- est
+- eus
+- fin
+- fra
+- gle
+- glg
+- hrv
+- hun
+- isl
+- ita
+- kat
+- lav
+- lit
+- ltg
+- lvs
+- mkd
+- mlt
+- nld
+- nno
+- nob
+- nor
+- pol
+- por
+- ron
+- slk
+- slv
+- spa
+- sqi
+- srp
+- swe
+- tur
+- ukr
 ---
 
 # OpenEuroLLM Tokenizer v2 (oellm-262k-v2)
@@ -64,10 +96,10 @@ print(f"Encoded token IDs: {encoded_ids}")
 
 decoded_text = tokenizer.decode(encoded_ids)
 print(f"Decoded text: {decoded_text}")
-```
 
 # The tokenizer automatically adds a BOS token
 # >>> Decoded text: <s> Hej, detta är ett test av den nya OpenEuroLLM-tokeniseraren.
+```
 
 ## Intended Use and Limitations
 This tokenizer is intended to be used for pre-training and fine-tuning large language models