---
license: apache-2.0
tags:
- sentencepiece
- tokenizer
---
# North Tokenizer

Shared SentencePiece tokenizer for the North Star model family:

- [North Air 1](https://huggingface.co/arthu1/north-air-1) — 124M, fast
- [North Star 1](https://huggingface.co/arthu1/north-star-1) — 198M, balanced
- [Wind Arc 1.5](https://huggingface.co/arthu1/wind-arc-1.5) — 198M, deep reasoning

**Vocabulary size:** 32,000
**Format:** SentencePiece (`.model` file)
```python
import sentencepiece as spm

# Load the tokenizer shipped with this repo and encode text to token ids
sp = spm.SentencePieceProcessor(model_file="tokenizer.model")
ids = sp.encode("Hello world", out_type=int)
```