arthu1 committed on
Commit d151ff5 · verified · 1 Parent(s): c0efa62

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +22 -0
README.md ADDED
@@ -0,0 +1,22 @@
+ ---
+ license: apache-2.0
+ tags:
+ - sentencepiece
+ - tokenizer
+ ---
+
+ # North Tokenizer
+
+ Shared SentencePiece tokenizer for the North Star model family:
+ - [North Air 1](https://huggingface.co/arthu1/north-air-1) — 124M, fast
+ - [North Star 1](https://huggingface.co/arthu1/north-star-1) — 198M, balanced
+ - [Wind Arc 1.5](https://huggingface.co/arthu1/wind-arc-1.5) — 198M, deep reasoning
+
+ **Vocabulary size:** 32,000
+ **Format:** SentencePiece (`.model` file)
+
+ ```python
+ import sentencepiece as spm
+ sp = spm.SentencePieceProcessor(model_file="tokenizer.model")
+ ids = sp.encode("Hello world", out_type=int)
+ ```
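
The README snippet stops at encoding. As a rough sketch of how a reader might sanity-check the tokenizer after loading it, the helper below encodes a string, decodes it back, and confirms that every id falls inside the reported vocabulary. The function name `inspect_tokenizer` is hypothetical and not part of this commit. It only assumes an object exposing the `encode`, `decode`, and `get_piece_size` methods of `sentencepiece.SentencePieceProcessor`.

```python
from typing import Any, Dict


def inspect_tokenizer(sp: Any, text: str) -> Dict[str, Any]:
    """Encode `text`, decode it back, and report basic statistics.

    `sp` is expected to behave like sentencepiece.SentencePieceProcessor.
    This helper is a hypothetical illustration, not part of the repository.
    """
    ids = sp.encode(text, out_type=int)     # integer token ids
    pieces = sp.encode(text, out_type=str)  # subword pieces as strings
    vocab_size = sp.get_piece_size()        # number of pieces in the model
    return {
        "ids": ids,
        "pieces": pieces,
        "decoded": sp.decode(ids),
        "vocab_size": vocab_size,
        # Every id should index into the vocabulary.
        "ids_in_range": all(0 <= i < vocab_size for i in ids),
    }
```

With the real model, this could be called as `inspect_tokenizer(spm.SentencePieceProcessor(model_file="tokenizer.model"), "Hello world")`, assuming `tokenizer.model` has already been downloaded to the working directory. If the README's stated vocabulary size holds, `vocab_size` should come back as 32000.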