arthu1
/

north-tokenizer

arthu1 committed 28 days ago

Commit

d151ff5

verified

1 Parent(s): c0efa62

Upload README.md with huggingface_hub

Files changed (1)

README.md ADDED

+---
+license: apache-2.0
+tags:
+- sentencepiece
+- tokenizer
+---
+# North Tokenizer
+
+Shared SentencePiece tokenizer for the North Star model family:
+
+- [North Air 1](https://huggingface.co/arthu1/north-air-1): 124M, fast
+- [North Star 1](https://huggingface.co/arthu1/north-star-1): 198M, balanced
+- [Wind Arc 1.5](https://huggingface.co/arthu1/wind-arc-1.5): 198M, deep reasoning
+
+**Vocabulary size:** 32,000
+
+**Format:** SentencePiece (`.model` file)
+
+```python
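+import sentencepiece as spm
+
+# Load the shared tokenizer model from a local file.
+sp = spm.SentencePieceProcessor(model_file="tokenizer.model")
+# Encode text into a list of integer token ids.
+ids = sp.encode("Hello world", out_type=int)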
+```
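
As a follow-up to the README snippet above, here is a minimal sketch of a sanity check one might run after loading the tokenizer. The helper names `load_tokenizer`, `round_trip`, and `check_vocab_size` are hypothetical and not part of the repository. The sketch assumes `tokenizer.model` has already been downloaded to the working directory, and that the loaded model reports the 32,000 vocabulary size stated in the README.

```python
def load_tokenizer(model_path="tokenizer.model"):
    """Load a SentencePiece model from a local path (path is an assumption)."""
    # Deferred import so the helpers below work with any compatible object.
    import sentencepiece as spm

    return spm.SentencePieceProcessor(model_file=model_path)


def round_trip(sp, text):
    """Encode text to integer ids, then decode the ids back to a string."""
    ids = sp.encode(text, out_type=int)
    return ids, sp.decode(ids)


def check_vocab_size(sp, expected=32000):
    """Return True if the model's piece count matches the expected size."""
    return sp.get_piece_size() == expected
```

With `sentencepiece` installed and the model file present, `sp = load_tokenizer()` followed by `round_trip(sp, "Hello world")` should return the token ids and the decoded string. SentencePiece applies text normalization, so the decoded text may not be byte-identical to the input for every string.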