khopilot/khmer-tokenizer-v7

Tags: Feature Extraction, graph-regularization, semantic-embeddings


File: khmer-tokenizer-v7/tokenizer_config.json (410 bytes)

Last commit: b785edd (verified), "Upload tokenizer_config.json with huggingface_hub", 5 months ago

{
  "model_type": "sentencepiece",
  "tokenizer_class": "PreTrainedTokenizerFast",
  "vocab_size": 8000,
  "model_max_length": 512,
  "bos_token": "<s>",
  "eos_token": "</s>",
  "unk_token": "<unk>",
  "pad_token": "<pad>",
  "sp_model_kwargs": {},
  "add_bos_token": false,
  "add_eos_token": false,
  "clean_up_tokenization_spaces": true,
  "legacy": true,
  "name_or_path": "khopilot/khmer-tokenizer-v7"
}
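The config above sets how token ids are framed before they reach a model: no <s> or </s> is added automatically (add_bos_token and add_eos_token are false), sequences are capped at 512 tokens, and <pad> fills out shorter sequences in a batch. The following is a minimal, standard-library-only Python sketch of what those settings imply. It is not the transformers API. In transformers, padding and truncation happen only when requested, for example with padding=True and truncation=True when calling the tokenizer. Assumptions: build_input_ids and pad_batch are hypothetical helpers, the ids in HYPOTHETICAL_SPECIAL_IDS are invented for illustration (the real ids come from the SentencePiece model, not from this file), and the SentencePiece segmentation step itself is not reproduced.

```python
import json

# tokenizer_config.json from khopilot/khmer-tokenizer-v7, embedded verbatim
# so the sketch runs offline.
CONFIG_JSON = """
{
  "model_type": "sentencepiece",
  "tokenizer_class": "PreTrainedTokenizerFast",
  "vocab_size": 8000,
  "model_max_length": 512,
  "bos_token": "<s>",
  "eos_token": "</s>",
  "unk_token": "<unk>",
  "pad_token": "<pad>",
  "sp_model_kwargs": {},
  "add_bos_token": false,
  "add_eos_token": false,
  "clean_up_tokenization_spaces": true,
  "legacy": true,
  "name_or_path": "khopilot/khmer-tokenizer-v7"
}
"""

# HYPOTHETICAL special-token ids, invented for this example only. The real ids
# are defined by the SentencePiece model, not by tokenizer_config.json.
HYPOTHETICAL_SPECIAL_IDS = {"<pad>": 0, "<unk>": 1, "<s>": 2, "</s>": 3}

REQUIRED_FIELDS = ("vocab_size", "model_max_length", "add_bos_token",
                   "add_eos_token", "bos_token", "eos_token", "pad_token")


def load_config(text):
    """Parse the config and check that the fields used below are present."""
    config = json.loads(text)
    for key in REQUIRED_FIELDS:
        if key not in config:
            raise KeyError(f"missing config field: {key}")
    return config


def build_input_ids(piece_ids, config, special_ids, truncation=True):
    """Hypothetical helper: frame SentencePiece ids as the config's flags describe."""
    ids = list(piece_ids)
    for i in ids:
        if not 0 <= i < config["vocab_size"]:
            raise ValueError(f"id {i} outside vocabulary of size {config['vocab_size']}")
    if config["add_bos_token"]:  # false in this config: no <s> is prepended
        ids.insert(0, special_ids[config["bos_token"]])
    if config["add_eos_token"]:  # false in this config: no </s> is appended
        ids.append(special_ids[config["eos_token"]])
    if truncation:
        # Naive cut at model_max_length (512); does not reserve room for special tokens.
        ids = ids[: config["model_max_length"]]
    return ids


def pad_batch(batch, config, special_ids):
    """Hypothetical helper: right-pad a batch with <pad> and build attention masks."""
    pad_id = special_ids[config["pad_token"]]
    width = max(len(seq) for seq in batch)
    input_ids = [seq + [pad_id] * (width - len(seq)) for seq in batch]
    attention_mask = [[1] * len(seq) + [0] * (width - len(seq)) for seq in batch]
    return input_ids, attention_mask


if __name__ == "__main__":
    config = load_config(CONFIG_JSON)
    a = build_input_ids([10, 11, 12], config, HYPOTHETICAL_SPECIAL_IDS)
    b = build_input_ids([20], config, HYPOTHETICAL_SPECIAL_IDS)
    print(pad_batch([a, b], config, HYPOTHETICAL_SPECIAL_IDS))
    # ([[10, 11, 12], [20, 0, 0]], [[1, 1, 1], [1, 0, 0]])
```

Because both add_*_token flags are false, any model trained with this tokenizer that expects <s> or </s> markers would need them inserted explicitly by the caller; the sketch makes that visible by returning the input ids unchanged apart from truncation.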