---
license: apache-2.0
tags:
- sentencepiece
- tokenizer
---

# North Tokenizer

Shared SentencePiece tokenizer for the North Star model family:

- [North Air 1](https://huggingface.co/arthu1/north-air-1): 124M, fast
- [North Star 1](https://huggingface.co/arthu1/north-star-1): 198M, balanced
- [Wind Arc 1.5](https://huggingface.co/arthu1/wind-arc-1.5): 198M, deep reasoning

**Vocabulary size:** 32,000

**Format:** SentencePiece (`.model` file)

```python
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="tokenizer.model")
ids = sp.encode("Hello world", out_type=int)
```
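
A slightly fuller sketch of the same usage, assuming `tokenizer.model` has already been downloaded to the working directory. The helper names `load_tokenizer` and `inspect_tokenizer` are illustrative and not part of this repository; only `SentencePieceProcessor`, `encode`, `decode`, and `get_piece_size` come from the `sentencepiece` package.

```python
EXPECTED_VOCAB_SIZE = 32_000  # vocabulary size stated in this README


def load_tokenizer(model_file="tokenizer.model"):
    """Load the SentencePiece model.

    The default path is an assumption; point it at wherever the file lives.
    """
    # Imported lazily so inspect_tokenizer can be used without the package.
    import sentencepiece as spm

    return spm.SentencePieceProcessor(model_file=model_file)


def inspect_tokenizer(sp, text="Hello world"):
    """Encode text, decode it back, and check the vocabulary size.

    `sp` is any object exposing encode, decode, and get_piece_size,
    such as a SentencePieceProcessor.
    """
    ids = sp.encode(text, out_type=int)  # list of integer token IDs
    return {
        "vocab_ok": sp.get_piece_size() == EXPECTED_VOCAB_SIZE,
        "ids": ids,
        "decoded": sp.decode(ids),  # may differ from text after normalization
    }
```

Calling `inspect_tokenizer(load_tokenizer())` returns the token IDs, the decoded string, and whether the loaded model reports the expected 32,000 pieces. To see the subword pieces instead of IDs, `sp.encode(text, out_type=str)` can be used.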