How to use softcatala/fullstop-catalan-punctuation-prediction with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("token-classification", model="softcatala/fullstop-catalan-punctuation-prediction")

# Load model directly
from transformers import AutoTokenizer, AutoModelForTokenClassification
tokenizer = AutoTokenizer.from_pretrained("softcatala/fullstop-catalan-punctuation-prediction")
model = AutoModelForTokenClassification.from_pretrained("softcatala/fullstop-catalan-punctuation-prediction")

This model predicts punctuation for Catalan text.
The model restores the following punctuation markers: "." "," "?" "-" ":"
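
The token-classification pipeline returns one prediction per sub-word token, so the labels still have to be mapped back onto words. The sketch below shows one possible way to do that. It rests on assumptions that are not stated in this card: that the label `0` means "no punctuation", that each label describes the mark that follows a word, and that the last sub-token of a word carries the relevant label. The helper names `word_labels` and `restore_punctuation` are hypothetical.

```python
from typing import Dict, List, Tuple

NO_PUNCT = "0"  # assumption: this label means "no punctuation after the word"


def word_labels(text: str, predictions: List[Dict]) -> List[Tuple[str, str]]:
    """Pair each whitespace-separated word with the label of its last sub-token.

    `predictions` is expected to look like the pipeline output: a list of
    dicts in token order, each with "entity", "start" and "end" keys.
    """
    pairs = []
    pos = 0
    for word in text.split():
        start = text.index(word, pos)  # character span of this word
        end = start + len(word)
        pos = end
        label = NO_PUNCT
        for pred in predictions:
            # Later tokens overwrite earlier ones, so the last sub-token wins.
            if start <= pred["start"] and pred["end"] <= end:
                label = pred["entity"]
        pairs.append((word, label))
    return pairs


def restore_punctuation(text: str, predictions: List[Dict]) -> str:
    """Append the predicted mark (if any) to each word and rejoin the text."""
    words = []
    for word, label in word_labels(text, predictions):
        words.append(word if label == NO_PUNCT else word + label)
    return " ".join(words)
```

With the `pipe` object from the snippet above, this could be called as `restore_punctuation(text, pipe(text))` on unpunctuated input. Capitalization after a restored full stop is not handled here.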
Based on the work at https://github.com/oliverguhr/fullstop-deep-punctuation-prediction
Performance differs across the individual punctuation markers, because hyphens and colons are in many cases optional and can be replaced by either a comma or a full stop. The model achieves the following F1 scores for Catalan:
| Label | F1 (CA) |
|---|---|
| 0 | 0.99 |
| . | 0.93 |
| , | 0.82 |
| ? | 0.76 |
| - | 0.89 |
| : | 0.64 |
| macro average | 0.84 |
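
As a quick check of the last row, the snippet below recomputes the macro average from the per-label scores in the table. It assumes the usual definition of macro F1, the unweighted mean of the per-label F1 scores. The card does not say how the figure was computed.

```python
# Per-label F1 scores copied from the table above.
f1_scores = {
    "0": 0.99,
    ".": 0.93,
    ",": 0.82,
    "?": 0.76,
    "-": 0.89,
    ":": 0.64,
}

# Macro average: every label counts equally, regardless of its frequency.
macro_f1 = sum(f1_scores.values()) / len(f1_scores)
print(round(macro_f1, 2))
```

Because every label has equal weight, the low score for `:` pulls the average down as much as a frequent label would.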
Jordi Mas jmas@softcatala.org