---
license: apache-2.0
tags:
- tokenizers
- BPE
- sentencepiece
- code-generation
---

# cofos_tokenizer

Specialized SentencePiece BPE tokenizer for the **cofos** programming and logic language model.

## Configuration

- Vocabulary size: **16384**
- Model type: BPE
- Byte fallback: enabled
- Digit splitting: enabled (digits 0-9 are guaranteed atomic)
- Whitespace normalization: disabled (`identity` rule), so indentation is preserved

## Special atomic tokens

Keywords (`def`, `class`, `fn`, `struct`, `impl`, `return`, `async`, …), operators (`==`, `!=`, `=>`, `->`, `::`, `///`, …) and structural tags (``, ``, ``, …) are all guaranteed single tokens.

## Usage

```python
import sentencepiece as spm

sp = spm.SentencePieceProcessor()
sp.Load("cofos_tokenizer.model")
print(sp.EncodeAsPieces("def hello():\n return 42"))
```
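
The guarantees listed under Configuration and Special atomic tokens can be spot-checked on the pieces returned by `EncodeAsPieces`. The sketch below is one possible way to do that. The helper names `multi_digit_pieces` and `is_single_piece` are hypothetical and not part of this repository or of SentencePiece, and the piece lists in the demo are hand-written illustrations, not real output of this model.

```python
from typing import Iterable, List

# SentencePiece marks word-initial whitespace with U+2581.
SPACE_MARK = "\u2581"
ASCII_DIGITS = "0123456789"


def multi_digit_pieces(pieces: Iterable[str]) -> List[str]:
    """Return every piece containing more than one ASCII digit.

    With digit splitting enabled, this list should be empty.
    """
    return [p for p in pieces if sum(ch in ASCII_DIGITS for ch in p) > 1]


def is_single_piece(pieces: List[str], token: str) -> bool:
    """Check that `token` was encoded as exactly one piece.

    Leading whitespace markers are stripped first, and a piece that
    consists only of the marker is ignored.
    """
    stripped = [p.lstrip(SPACE_MARK) for p in pieces]
    stripped = [p for p in stripped if p]
    return stripped == [token]


if __name__ == "__main__":
    # Hand-written piece lists, for illustration only.
    print(multi_digit_pieces([SPACE_MARK + "return", SPACE_MARK, "4", "2"]))
    print(is_single_piece([SPACE_MARK, "def"], "def"))

    # With the real model loaded as in the Usage section:
    #   pieces = sp.EncodeAsPieces("return 42")
    #   assert multi_digit_pieces(pieces) == []
    #   assert is_single_piece(sp.EncodeAsPieces("def"), "def")
```

Keeping the checks as plain functions over lists of strings means they can run in a test suite without loading the model file, and then be pointed at real `EncodeAsPieces` output once the model is available.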