pashto-tokenizer / tokenizer_report.json
{
  "vocab_size": 32000,
  "min_frequency": 2,
  "training_sentences": 24085371,
  "pashto_bpe_fertility": 1.2516,
  "xlmr_fertility": 1.5018,
  "token_reduction_pct": 16.67,
  "n_samples": 10000
}
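
The report does not define its fields, so the following is only a sketch of how `token_reduction_pct` appears to relate to the two fertility values. It assumes that fertility means average tokens per word measured over the `n_samples` evaluation sentences, and that the reduction is the relative drop of `pashto_bpe_fertility` against `xlmr_fertility`. The helper name `token_reduction_pct` is hypothetical and not taken from the repository.

```python
import json

# Values copied verbatim from tokenizer_report.json.
REPORT = json.loads("""
{
  "vocab_size": 32000,
  "min_frequency": 2,
  "training_sentences": 24085371,
  "pashto_bpe_fertility": 1.2516,
  "xlmr_fertility": 1.5018,
  "token_reduction_pct": 16.67,
  "n_samples": 10000
}
""")


def token_reduction_pct(new_fertility: float, baseline_fertility: float) -> float:
    """Relative drop in tokens per word versus a baseline, in percent.

    Assumption: this is the formula behind the reported field; the
    report itself does not state how the figure was computed.
    """
    return (baseline_fertility - new_fertility) / baseline_fertility * 100.0


recomputed = token_reduction_pct(
    REPORT["pashto_bpe_fertility"], REPORT["xlmr_fertility"]
)
print(f"recomputed: {recomputed:.2f}%  reported: {REPORT['token_reduction_pct']:.2f}%")
```

Recomputing from the rounded fertilities gives a value within a few hundredths of a percentage point of the reported figure. A small gap like this would be expected if the original script worked from unrounded fertilities or raw token counts, though the report does not say which.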