[LentaBPE-v15k-f2]
Corpus
78k+ words from lenta.ru (2025)
Parameters
- Algorithm: BPE
- Vocabulary size: 15,000
- Min frequency: 2
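
The card lists the settings but not the training code. As a rough illustration of what the vocabulary size and minimum frequency control in BPE, here is a toy pure-Python trainer. The function name `train_bpe` and the sample words are hypothetical, and real BPE libraries differ in pre-tokenization, tie-breaking, and special-token handling, so this is a sketch of the idea rather than how this tokenizer was built.

```python
from collections import Counter


def train_bpe(words, vocab_size, min_frequency=2):
    """Toy character-level BPE: returns (merges, vocab). Illustrative only."""
    # Represent every word as a tuple of symbols, with its corpus count.
    word_counts = Counter(tuple(w) for w in words)
    vocab = {symbol for word in word_counts for symbol in word}
    merges = []

    while len(vocab) < vocab_size:
        # Count adjacent symbol pairs, weighted by how often each word occurs.
        pair_counts = Counter()
        for word, count in word_counts.items():
            for pair in zip(word, word[1:]):
                pair_counts[pair] += count
        if not pair_counts:
            break

        # Pick the most frequent pair; ties are broken by the pair itself.
        best, freq = max(pair_counts.items(), key=lambda kv: (kv[1], kv[0]))
        if freq < min_frequency:
            break  # no remaining pair is frequent enough to merge

        merges.append(best)
        vocab.add(best[0] + best[1])

        # Rewrite every word with the chosen pair fused into one symbol.
        new_counts = Counter()
        for word, count in word_counts.items():
            merged, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    merged.append(word[i] + word[i + 1])
                    i += 2
                else:
                    merged.append(word[i])
                    i += 1
            new_counts[tuple(merged)] += count
        word_counts = new_counts

    return merges, vocab


merges, vocab = train_bpe(["low", "low", "lower", "new"], vocab_size=15000)
print(merges)  # [('o', 'w'), ('l', 'ow')]
```

On this tiny sample training stops after two merges, because every remaining pair occurs only once, which is below the minimum frequency of 2. The vocabulary size acts only as an upper bound.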
Metrics
- OOV rate: 22.21%
- Reconstruction accuracy: 90%
- Compression ratio: 6.231
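
The card does not say how these metrics were computed. One plausible reading, sketched below, treats the OOV rate as the share of tokens mapped to an unknown token, reconstruction accuracy as the share of texts that survive an encode/decode round trip unchanged, and compression ratio as characters per token. These definitions, the function name `evaluate_tokenizer`, the `[UNK]` token, and the toy whitespace tokenizer are all assumptions for illustration, and the author's definitions may differ.

```python
def evaluate_tokenizer(texts, encode, decode, unk_token="[UNK]"):
    """Compute three tokenizer metrics under the assumed definitions."""
    total_tokens = unk_tokens = total_chars = exact_matches = 0
    for text in texts:
        tokens = encode(text)
        total_tokens += len(tokens)
        unk_tokens += sum(1 for token in tokens if token == unk_token)
        total_chars += len(text)
        exact_matches += int(decode(tokens) == text)

    return {
        # Share of produced tokens that are the unknown token.
        "oov_rate": unk_tokens / total_tokens if total_tokens else 0.0,
        # Share of texts recovered exactly after encode -> decode.
        "reconstruction_accuracy": exact_matches / len(texts) if texts else 0.0,
        # Average number of characters represented by one token.
        "compression_ratio": total_chars / total_tokens if total_tokens else 0.0,
    }


# Toy whitespace tokenizer with a two-word vocabulary (hypothetical).
toy_vocab = {"привет", "как"}


def toy_encode(text):
    return [word if word in toy_vocab else "[UNK]" for word in text.split()]


def toy_decode(tokens):
    return " ".join(tokens)


metrics = evaluate_tokenizer(["привет как", "как дела"], toy_encode, toy_decode)
print(metrics)
```

To apply the same function to a real tokenizer, `encode` and `decode` would be thin wrappers around that tokenizer's own methods, and `unk_token` would need to match its configured unknown token.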
Use case
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("theformatisvalid/LentaBPE-v15k-f2")
tokens = tokenizer.tokenize("Привет, как дела?")