How to use markusiko/rubert-base-punctuation with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("token-classification", model="markusiko/rubert-base-punctuation")

# Load model directly
from transformers import AutoTokenizer, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("markusiko/rubert-base-punctuation")
model = AutoModelForTokenClassification.from_pretrained("markusiko/rubert-base-punctuation")
```
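The model works on subword tokens, while the card describes one prediction per word, so token-level labels have to be collapsed to word-level labels. Below is a minimal sketch of that step. It assumes a fast tokenizer (so `BatchEncoding.word_ids()` is available) and assumes, as a common fine-tuning convention that this card does not confirm, that the first subtoken of each word carries that word's label. The function name `word_level_labels` is hypothetical.

```python
from typing import List, Optional


def word_level_labels(word_ids: List[Optional[int]], token_labels: List[str]) -> List[str]:
    """Collapse per-token labels into one label per word.

    word_ids: as returned by BatchEncoding.word_ids(); None marks special
    tokens such as [CLS]/[SEP], otherwise it is the index of the source word.
    token_labels: the predicted label for each token (same length as word_ids).
    Assumption: the first subtoken of each word holds that word's label.
    """
    if len(word_ids) != len(token_labels):
        raise ValueError("word_ids and token_labels must have the same length")
    labels: List[str] = []
    previous: Optional[int] = None
    for word_id, label in zip(word_ids, token_labels):
        # Skip special tokens and continuation subtokens of the same word.
        if word_id is None or word_id == previous:
            continue
        labels.append(label)
        previous = word_id
    return labels


# Toy input: [CLS], word 0 split into two subtokens, word 1, word 2 split in two, [SEP].
print(word_level_labels([None, 0, 0, 1, 2, 2, None], ["O", ",", "O", "O", ".", "O", "O"]))
# [',', 'O', '.']
```

In real use, `word_ids` would come from `tokenizer(words, is_split_into_words=True).word_ids()` and `token_labels` from the argmax of the model logits mapped through `model.config.id2label`; the exact label strings in `id2label` are not documented on this card and should be checked.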
ruBert-base for Punctuation Correction
The model is based on ruBert-base and has been fine-tuned to place punctuation marks in Russian sentences: for each word, it predicts the punctuation mark (if any) that should follow it.
Additional details about the model:
- Fine-tuning data: a diverse dataset of over 20,000 paragraphs from Russian literary works.
- Supported classes: for each word, the model predicts one of the following marks: `?`, `!`, `.`, `,`, `:`, `...`, or a plain space (class `O`, meaning no punctuation).
- Input format: for best results, provide input text without punctuation marks. The model does not take letter case into account.
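Once each word has a label, producing the punctuated text is a matter of appending the predicted mark to its word. The sketch below assumes the labels are exactly the strings listed under the supported classes (`?`, `!`, `.`, `,`, `:`, `...`, and `O`); that is an assumption, so check `model.config.id2label` for the strings the model actually emits. The name `apply_punctuation` is hypothetical.

```python
from typing import List

# Assumed label strings, taken from the card's list of supported classes.
PUNCTUATION_LABELS = {"?", "!", ".", ",", ":", "..."}
NO_PUNCTUATION = "O"


def apply_punctuation(words: List[str], labels: List[str]) -> str:
    """Append each word's predicted mark right after it and join words with spaces."""
    if len(words) != len(labels):
        raise ValueError("expected exactly one label per word")
    pieces: List[str] = []
    for word, label in zip(words, labels):
        if label == NO_PUNCTUATION:
            pieces.append(word)  # class O: only a space follows the word
        elif label in PUNCTUATION_LABELS:
            pieces.append(word + label)
        else:
            raise ValueError(f"unexpected label: {label!r}")
    return " ".join(pieces)


print(apply_punctuation(["привет", "как", "дела"], [",", "O", "?"]))
# привет, как дела?
```

Capitalization is left untouched, in line with the note that the model does not handle letter case.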
Usage Guidelines
To use the model effectively, follow these guidelines:
- Input text: feed the model text with punctuation marks removed.
- Letter case: the model does not take changes in letter case into account.
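Following the input guideline, existing punctuation can be stripped before inference. The sketch below is one possible preprocessing step, not part of the model's published code: the set of characters removed is an assumption (the card's output classes, the single-character ellipsis, and a few other common marks), and hyphens are deliberately kept so words like "кто-то" stay intact. Letter case is left as-is. The name `strip_punctuation` is hypothetical.

```python
import re
from typing import List

# Assumed set of marks to strip: ? ! . , : ; the ellipsis character U+2026,
# double quotes, guillemets, parentheses, and the en/em dashes U+2013/U+2014.
# Hyphens are not included, so hyphenated words are preserved.
_PUNCTUATION = re.compile(r"[?!.,:;\u2026\"\u00ab\u00bb()\u2013\u2014]")


def strip_punctuation(text: str) -> List[str]:
    """Remove punctuation marks and return the remaining words (case untouched)."""
    return _PUNCTUATION.sub(" ", text).split()


print(strip_punctuation("Привет, как дела? Всё хорошо..."))
# ['Привет', 'как', 'дела', 'Всё', 'хорошо']
```

The resulting word list can be passed to the tokenizer with `is_split_into_words=True`, so each prediction can be mapped back to a word.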
Authors
- Mark Stolyarov