# Khmer-English GPT-2 Tokenizer

* **Vocab size:** 50,257
* **Algorithm:** Byte-Level BPE (byte_fallback)
* **Special tokens:** `<|endoftext|>`, `<|bos|>`, `<|pad|>`, `<|unk|>`
* **Trained on:** `metythorn/khmer-english-corpus`
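
As a rough illustration of what "byte-level" means for mixed Khmer and English text, the sketch below shows the raw UTF-8 bytes that a byte-level BPE tokenizer starts from before any merges are applied. It uses only the Python standard library and does not load this tokenizer or reproduce its learned merges; the helper name `utf8_bytes` and the sample strings are assumptions made for the example.

```python
from typing import List


def utf8_bytes(text: str) -> List[int]:
    """Return the UTF-8 byte values of `text` (hypothetical helper).

    Byte-level BPE treats these byte values as its base symbols and
    learns merges on top of them.
    """
    return list(text.encode("utf-8"))


# The word "Khmer" written in Khmer script, spelled out as code points:
# U+1781, U+17D2, U+1798, U+17C2, U+179A
khmer = "\u1781\u17d2\u1798\u17c2\u179a"
english = "Khmer"

khmer_bytes = utf8_bytes(khmer)
english_bytes = utf8_bytes(english)

# Each Khmer code point occupies 3 bytes in UTF-8; ASCII letters occupy 1.
print(len(khmer), len(khmer_bytes))      # 5 code points -> 15 bytes
print(len(english), len(english_bytes))  # 5 code points -> 5 bytes

# Every value is a plain byte, so any input string can be represented
# from a base alphabet of 256 symbols, and the bytes decode back losslessly.
assert all(0 <= b <= 255 for b in khmer_bytes)
assert bytes(khmer_bytes).decode("utf-8") == khmer
```

Because Khmer characters expand to three bytes each, the learned merges are what keep Khmer sequences short; how well they do so depends on the training corpus and is not shown here.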