LentaBPE-v15k-f2


🗃️ Corpus

78k+ words from lenta.ru (2025)

⚙️ Parameters

Algorithm: BPE
Vocabulary size: 15,000
Min frequency: 2
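
To illustrate what the two numeric parameters control, here is a toy pure-Python BPE trainer. It is a sketch for explanation only, not the code used to train this model (the card does not say which library was used); the function name `train_bpe` and the sample word list are hypothetical. Training stops when the vocabulary reaches `vocab_size` or when the most frequent remaining pair occurs fewer than `min_frequency` times.

```python
from collections import Counter


def train_bpe(words, vocab_size, min_frequency):
    """Toy BPE trainer (illustrative only). Returns (vocab, merges)."""
    # Represent each distinct word as a tuple of single-character symbols.
    corpus = {tuple(w): c for w, c in Counter(words).items()}
    vocab = sorted({ch for symbols in corpus for ch in symbols})
    merges = []

    while len(vocab) < vocab_size:
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, count in corpus.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += count
        if not pairs:
            break

        # Pick the most frequent pair; ties are broken by the pair itself.
        best, freq = max(pairs.items(), key=lambda kv: (kv[1], kv[0]))
        if freq < min_frequency:
            break  # no remaining pair is frequent enough to merge

        merges.append(best)
        vocab.append(best[0] + best[1])

        # Apply the merge to every word in the corpus.
        new_corpus = {}
        for symbols, count in corpus.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            key = tuple(merged)
            new_corpus[key] = new_corpus.get(key, 0) + count
        corpus = new_corpus

    return vocab, merges


# Hypothetical sample data, using the same parameter values as this card.
words = ["лента", "лента", "лето", "новости", "новости"]
vocab, merges = train_bpe(words, vocab_size=15000, min_frequency=2)
print(merges[0])  # the most frequent pair, ('л', 'е'), is merged first
```

With min frequency 2, words seen at least twice ("лента", "новости") end up as single tokens in this toy run, while "лето", seen once, stays split into smaller pieces.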

📊 Metrics

OOV rate: 22.21%
Reconstruction accuracy: 90%
Compression ratio: 6.231
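
The card does not define these metrics, so the sketch below shows one plausible way to compute them under stated assumptions: OOV rate as the share of produced tokens equal to an unknown-token marker, reconstruction accuracy as the share of texts that survive an encode/decode round trip unchanged, and compression ratio as characters per token. The function names, the `[UNK]` marker, and the toy whitespace tokenizer are all hypothetical and only serve to make the example runnable.

```python
def oov_rate(texts, encode, unk_token="[UNK]"):
    """Assumed definition: fraction of produced tokens that are the unknown token."""
    tokens = [tok for text in texts for tok in encode(text)]
    if not tokens:
        return 0.0
    return sum(tok == unk_token for tok in tokens) / len(tokens)


def reconstruction_accuracy(texts, encode, decode):
    """Assumed definition: fraction of texts recovered exactly by decode(encode(text))."""
    texts = list(texts)
    if not texts:
        return 0.0
    return sum(decode(encode(text)) == text for text in texts) / len(texts)


def compression_ratio(texts, encode):
    """Assumed definition: total characters divided by total tokens."""
    texts = list(texts)
    n_chars = sum(len(text) for text in texts)
    n_tokens = sum(len(encode(text)) for text in texts)
    return n_chars / n_tokens if n_tokens else 0.0


# Toy whitespace tokenizer standing in for a real one (hypothetical).
KNOWN = {"привет", "как", "дела"}


def toy_encode(text):
    return [w if w in KNOWN else "[UNK]" for w in text.split()]


def toy_decode(tokens):
    return " ".join(tokens)


sample = ["привет как дела", "привет мир"]
oov = oov_rate(sample, toy_encode)
acc = reconstruction_accuracy(sample, toy_encode, toy_decode)
ratio = compression_ratio(sample, toy_encode)
print(oov, acc, ratio)
```

To apply the same functions to a real tokenizer, pass its own encode and decode callables in place of the toy ones; the figures reported above may have been computed with different definitions.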

💻 Usage

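from transformers import AutoTokenizer

# Load the tokenizer from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("theformatisvalid/LentaBPE-v15k-f2")
# Split a Russian sentence ("Hi, how are you?") into subword tokens
tokens = tokenizer.tokenize("Привет, как дела?")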
