Tokenizer doesn't distinguish dash and hyphen
#10
by nshmyrevgmail - opened
он шутит - сказал человек - амфибия.
['[CLS]', '-', 'он', 'шутит', '-', 'сказал', 'человек', '-', 'амфи', '##бия', '.', '[SEP]']
он шутит - сказал человек-амфибия.
['[CLS]', '-', 'он', 'шутит', '-', 'сказал', 'человек', '-', 'амфи', '##бия', '.', '[SEP]']
While this is a common issue, it is a bigger problem for Russian, where the hyphen is used much more actively than in English.
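One possible workaround, sketched here only as an illustration, is to mark the difference before the text reaches the tokenizer: a "-" with whitespace on both sides is treated as a dash, while a "-" directly between two word characters is kept as a hyphen. The helper names `mark_dashes` and `split_keeping_hyphens` are hypothetical and not part of Transformers or of this model. Whether the ruBert-base vocabulary has a separate entry for the em dash is an assumption that has not been checked here.

```python
import re
from typing import List

# Typographic em dash, written as an escape so it is easy to spot in the source.
EM_DASH = "\u2014"

# A "-" that has whitespace (or a string boundary) on both sides is a dash.
# A "-" glued to word characters, as in "человек-амфибия", does not match.
_SPACED_DASH = re.compile(r"(?<!\S)-(?!\S)")


def mark_dashes(text: str) -> str:
    """Replace free-standing '-' with an em dash; leave in-word hyphens alone."""
    return _SPACED_DASH.sub(EM_DASH, text)


def split_keeping_hyphens(text: str) -> List[str]:
    """Hypothetical pre-tokenizer: split on whitespace and punctuation,
    but keep hyphenated compounds as a single unit."""
    return re.findall(r"\w+(?:-\w+)*|[^\w\s]", mark_dashes(text))


if __name__ == "__main__":
    # The two sentences from the report now produce different splits.
    print(split_keeping_hyphens("он шутит - сказал человек - амфибия."))
    print(split_keeping_hyphens("он шутит - сказал человек-амфибия."))
```

With this pre-processing, the first sentence keeps "человек" and "амфибия" as separate words around a dash, while the second keeps "человек-амфибия" as one hyphenated compound. The model's own tokenizer may still split on punctuation afterwards, so this only shows where the distinction could be preserved, not a fix inside the tokenizer itself.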