# Khmer‑English GPT‑2 Tokenizer
* **Vocab size:** 50,257
* **Algorithm:** Byte‑Level BPE (byte_fallback)
* **Special tokens:** `<|endoftext|>`, `<|bos|>`, `<|pad|>`, `<|unk|>`
* **Trained on:** `metythorn/khmer‑english‑corpus`
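
As a rough illustration of what "byte-level" means for mixed Khmer and English text, the sketch below shows the raw UTF-8 byte view that a byte-level BPE tokenizer starts from before any merges are applied. It uses only the Python standard library and does not load this tokenizer; the helper names `utf8_byte_ids` and `from_byte_ids` are hypothetical and exist only for this example.

```python
# Illustrative only: this is the byte view that byte-level BPE starts from,
# not this tokenizer's actual merge table or vocabulary.

def utf8_byte_ids(text: str) -> list[int]:
    """Return the UTF-8 byte values (0-255) of `text`."""
    return list(text.encode("utf-8"))


def from_byte_ids(ids: list[int]) -> str:
    """Rebuild the original string from its UTF-8 byte values."""
    return bytes(ids).decode("utf-8")


# The word "Khmer" written in Khmer script (5 code points), given as escapes
# so the example does not depend on the file's encoding.
khmer = "\u1781\u17d2\u1798\u17c2\u179a"
english = "Khmer"

khmer_ids = utf8_byte_ids(khmer)
english_ids = utf8_byte_ids(english)

# Each Khmer code point takes 3 bytes in UTF-8; each ASCII letter takes 1.
print(len(khmer), len(khmer_ids))      # 5 15
print(len(english), len(english_ids))  # 5 5

# Every string round-trips through its bytes, so no input is unrepresentable.
assert from_byte_ids(khmer_ids) == khmer
```

Because Khmer characters cost three bytes each before merging, the learned merges matter more for Khmer than for English: they are what collapses those byte runs into fewer tokens.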