sinlib / config.json
Ransaka: Added tokenizer with large Sinhala corpus
ddf49e2 verified
{
  "unknown_token": "<|unk|>",
  "pad_token": "<|pad|>",
  "unknown_token_id": 576,
  "pad_token_id": 577,
  "max_length": 10,
  "end_of_text_token": "<|end_of_text|>",
  "end_of_text_token_id": 578
}
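
A minimal sketch of how the fields above might be consumed, assuming the tokenizer truncates each sequence, appends the end-of-text ID, and pads up to `max_length`. How sinlib itself applies these values is not shown in this file, so the helper name `fit_to_length` and its truncate-then-pad behavior are hypothetical and only illustrate the roles of the three special-token IDs.

```python
import json

# The config from this file, embedded so the example is self-contained.
CONFIG_JSON = """
{
  "unknown_token": "<|unk|>",
  "pad_token": "<|pad|>",
  "unknown_token_id": 576,
  "pad_token_id": 577,
  "max_length": 10,
  "end_of_text_token": "<|end_of_text|>",
  "end_of_text_token_id": 578
}
"""

config = json.loads(CONFIG_JSON)


def fit_to_length(token_ids, config):
    """Hypothetical helper (not part of sinlib's documented API).

    Truncates to leave room for one end-of-text ID, appends it,
    then right-pads with the pad ID up to max_length.
    """
    max_length = config["max_length"]
    eot_id = config["end_of_text_token_id"]
    pad_id = config["pad_token_id"]

    ids = list(token_ids)[: max_length - 1] + [eot_id]
    ids += [pad_id] * (max_length - len(ids))
    return ids


# A short sequence is padded; a long one is truncated.
short = fit_to_length([5, 17, 42], config)
long = fit_to_length(range(100, 120), config)
print(short)
print(long)
```

Under these assumptions every returned sequence has exactly `max_length` entries, which is what batching into a fixed-size array would require.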