---
license: apache-2.0
tags:
- sentencepiece
- tokenizer
---
# North Tokenizer

Shared SentencePiece tokenizer for the North Star model family:

- [North Air 1](https://huggingface.co/arthu1/north-air-1) — 124M, fast
- [North Star 1](https://huggingface.co/arthu1/north-star-1) — 198M, balanced
- [Wind Arc 1.5](https://huggingface.co/arthu1/wind-arc-1.5) — 198M, deep reasoning

**Vocabulary size:** 32,000
**Format:** SentencePiece (`.model` file)
```python
import sentencepiece as spm

# Load the tokenizer shipped with this repo and encode text to token ids
sp = spm.SentencePieceProcessor(model_file="tokenizer.model")
ids = sp.encode("Hello world", out_type=int)
```