---
tags:
- tokie
library_name: tokie
---
<p align="center">
<img src="tokie-banner.png" alt="tokie" width="600">
</p>

# voyage-code-2

Pre-built [tokie](https://github.com/chonkie-inc/tokie) tokenizer for [voyageai/voyage-code-2](https://huggingface.co/voyageai/voyage-code-2).
## Quick Start (Python)

```bash
pip install tokie
```

```python
import tokie

# Load the pre-built tokenizer by its repo id
tokenizer = tokie.Tokenizer.from_pretrained("tokiers/voyage-code-2")

encoding = tokenizer.encode("Hello, world!")
print(encoding.ids)             # token ids
print(encoding.attention_mask)  # attention mask for those ids
```
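
voyage-code-2 is an embedding model for code, so a common next step is estimating how many tokens a snippet uses before sending it. The sketch below is one possible way to do that using only the `encode(...).ids` call from the quick start. `count_tokens` and `pack_lines` are hypothetical helper names, not part of tokie, and the per-line sum is only an approximation, since joining lines or adding special tokens may change the real total.

```python
from typing import Iterable, List


def count_tokens(tokenizer, text: str) -> int:
    """Return how many token ids `tokenizer.encode(text)` produces."""
    return len(tokenizer.encode(text).ids)


def pack_lines(tokenizer, lines: Iterable[str], max_tokens: int) -> List[str]:
    """Greedily group lines into chunks by summed per-line token counts.

    Hypothetical helper (not a tokie API). The sum is only an estimate of the
    chunk's real token count, and a single line longer than `max_tokens`
    still becomes its own oversized chunk.
    """
    chunks: List[str] = []
    current: List[str] = []
    used = 0
    for line in lines:
        n = count_tokens(tokenizer, line)
        # Start a new chunk when adding this line would exceed the budget.
        if current and used + n > max_tokens:
            chunks.append("\n".join(current))
            current, used = [], 0
        current.append(line)
        used += n
    if current:
        chunks.append("\n".join(current))
    return chunks
```

With the `tokenizer` from the quick start, `pack_lines(tokenizer, source.splitlines(), 512)` would return newline-joined chunks. The budget of 512 is an arbitrary illustration, not a documented limit of the model.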
## Quick Start (Rust)

```toml
[dependencies]
tokie = { version = "0.0.4", features = ["hf"] }
```

```rust
use tokie::Tokenizer;

fn main() {
    // Load the pre-built tokenizer by its repo id
    let tokenizer = Tokenizer::from_pretrained("tokiers/voyage-code-2").unwrap();
    let encoding = tokenizer.encode("Hello, world!", true);
    println!("{:?}", encoding.ids);
}
```
## Files

- `tokenizer.tkz`: tokie binary format (~10x smaller, loads in ~5ms)
- `tokenizer.json`: original HuggingFace tokenizer (if available)
## About tokie

**50x faster tokenization, 10x smaller model files, 100% accurate.**

tokie is a drop-in replacement for HuggingFace tokenizers, built in Rust. See [GitHub](https://github.com/chonkie-inc/tokie) for benchmarks and documentation.
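
One way to spot-check the drop-in claim on your own inputs is to compare token ids against a reference tokenizer. This is a minimal sketch, assuming both objects expose `encode(text).ids` and use comparable defaults for special tokens. `find_mismatches` is a hypothetical helper name, not part of tokie or of the HuggingFace `tokenizers` library.

```python
from typing import Iterable, List, Tuple


def find_mismatches(
    tokie_tokenizer, reference_tokenizer, texts: Iterable[str]
) -> List[Tuple[str, list, list]]:
    """Return (text, tokie_ids, reference_ids) for every text whose ids differ.

    Hypothetical helper: it assumes both tokenizers expose `encode(text).ids`.
    """
    mismatches = []
    for text in texts:
        tokie_ids = list(tokie_tokenizer.encode(text).ids)
        reference_ids = list(reference_tokenizer.encode(text).ids)
        if tokie_ids != reference_ids:
            mismatches.append((text, tokie_ids, reference_ids))
    return mismatches
```

The reference could be, for example, a tokenizer loaded with `tokenizers.Tokenizer.from_file("tokenizer.json")` when that file is available. An empty result means the two agreed on every sample text, which is evidence for those samples only.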
## License

MIT OR Apache-2.0 (tokie library). The original model files remain under the license of [voyageai/voyage-code-2](https://huggingface.co/voyageai/voyage-code-2).