[LentaBPE-v15k-f2]

πŸ—ƒοΈ Corpus

78k+ words from lenta.ru (2025)

βš™οΈ Parameters

  • Algorithm: BPE
  • Vocabulary size: 15,000
  • Min frequency: 2
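
To illustrate how the vocabulary size and minimum frequency interact, here is a minimal pure-Python sketch of BPE training. It is not the pipeline used to build this tokenizer (the card does not describe one); `train_bpe` and the toy corpus are hypothetical. Merging stops when the vocabulary reaches `vocab_size` or when the most frequent remaining pair occurs fewer than `min_frequency` times.

```python
from collections import Counter

def train_bpe(words, vocab_size, min_frequency=2):
    """Toy BPE trainer (illustrative only). Returns (vocab, merges)."""
    word_counts = Counter(words)
    # Every distinct word starts as a sequence of single characters.
    splits = {w: tuple(w) for w in word_counts}
    vocab = sorted({ch for w in word_counts for ch in w})
    merges = []
    while len(vocab) < vocab_size:
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for w, symbols in splits.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += word_counts[w]
        if not pairs:
            break
        # Most frequent pair; ties are broken by the pair itself.
        (a, b), freq = max(pairs.items(), key=lambda kv: (kv[1], kv[0]))
        if freq < min_frequency:
            break  # no remaining pair is frequent enough to merge
        merges.append((a, b))
        if a + b not in vocab:
            vocab.append(a + b)
        # Apply the new merge to every word.
        for w, symbols in splits.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and symbols[i] == a and symbols[i + 1] == b:
                    out.append(a + b)
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            splits[w] = tuple(out)
    return vocab, merges

# Hypothetical toy corpus; the real corpus is the lenta.ru text.
corpus = "low low lower lowest new newer newest".split()
vocab, merges = train_bpe(corpus, vocab_size=15, min_frequency=2)
print(merges)
```

With a corpus as small as 78k+ words, a minimum frequency of 2 may stop merging before the 15,000-entry target is reached, so the effective vocabulary can be smaller than the configured size.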

📊 Metrics

  • OOV rate: 22.21%
  • Reconstruction accuracy: 90%
  • Compression ratio: 6.231
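
The card does not define these metrics, so the sketch below shows one plausible reading of each, assuming OOV rate is the share of words missing from the vocabulary as whole entries, reconstruction accuracy is the share of texts that survive an encode/decode round trip unchanged, and compression ratio is characters per token. All function names are hypothetical, and the whitespace tokenizer is only a stand-in for the real one.

```python
def oov_rate(words, vocab):
    """Share of words that are not a single vocabulary entry (assumed definition)."""
    words = list(words)
    return sum(w not in vocab for w in words) / len(words)

def reconstruction_accuracy(texts, encode, decode):
    """Share of texts for which decode(encode(text)) == text (assumed definition)."""
    texts = list(texts)
    return sum(decode(encode(t)) == t for t in texts) / len(texts)

def compression_ratio(texts, encode):
    """Average number of characters per token (assumed definition)."""
    texts = list(texts)
    chars = sum(len(t) for t in texts)
    tokens = sum(len(encode(t)) for t in texts)
    return chars / tokens

# Stand-in tokenizer: split on whitespace, join with single spaces.
encode = str.split
decode = " ".join

texts = ["a b", "a  b"]  # the second text has a double space
print(oov_rate(["a", "b", "c", "d"], {"a"}))
print(reconstruction_accuracy(texts, encode, decode))
print(compression_ratio(texts, encode))
```

If the reported figures follow other definitions (for example, tokens per word for compression), the numbers above would not be directly comparable.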

💻 Use case

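from transformers import AutoTokenizer
# Load the tokenizer from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("theformatisvalid/LentaBPE-v15k-f2")
# Split a Russian sentence ("Hi, how are you?") into subword tokens
tokens = tokenizer.tokenize("Привет, как дела?")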