How to use lmvasque/readability-es-benchmark-mbert-es-paragraphs-2class with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-classification", model="lmvasque/readability-es-benchmark-mbert-es-paragraphs-2class")

# Load model directly
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("lmvasque/readability-es-benchmark-mbert-es-paragraphs-2class")
model = AutoModelForSequenceClassification.from_pretrained("lmvasque/readability-es-benchmark-mbert-es-paragraphs-2class")
```
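Because this checkpoint was fine-tuned on paragraph-level data, a longer document can be scored one paragraph at a time. The sketch below assumes paragraphs are separated by blank lines; `split_paragraphs` and `score_paragraphs` are hypothetical helper names, and `classify` is any callable that behaves like the `pipe` object created above. The label strings come from the model's config (check `model.config.id2label`); this card does not state which label corresponds to simple or complex, so verify that mapping before relying on it.

```python
import re
from typing import Callable, Dict, List


def split_paragraphs(text: str) -> List[str]:
    """Split text on blank lines (assumed paragraph delimiter) and drop empty chunks."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def score_paragraphs(text: str, classify: Callable[..., List[Dict]]) -> List[Dict]:
    """Run a text-classification callable (e.g. a transformers pipeline) on each paragraph."""
    paragraphs = split_paragraphs(text)
    if not paragraphs:
        return []
    # truncation=True guards against paragraphs longer than the model's maximum input length.
    predictions = classify(paragraphs, truncation=True)
    return [
        {"paragraph": p, "label": pred["label"], "score": pred["score"]}
        for p, pred in zip(paragraphs, predictions)
    ]


# Example usage with the pipeline from the snippet above:
# results = score_paragraphs(documento, pipe)
```

Passing the whole list in one call lets the pipeline handle batching; the helper only pairs each paragraph with its prediction.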
Readability benchmark (ES): mbert-es-paragraphs-2class
This model is part of a series of models from the paper "A Benchmark for Neural Readability Assessment of Texts in Spanish". You can find more details about the project in our GitHub repository.
Models
Our models were fine-tuned in multiple settings, including readability assessment with 2 classes (simple/complex) and 3 classes (basic/intermediate/advanced) on sentence-level and paragraph-level datasets. You can find more details in our paper. The available models are listed below (the current model is in bold):
| Model | Granularity | # classes |
|---|---|---|
| BERTIN (ES) | paragraphs | 2 |
| BERTIN (ES) | paragraphs | 3 |
| **mBERT (ES)** | **paragraphs** | **2** |
| mBERT (ES) | paragraphs | 3 |
| mBERT (EN+ES) | paragraphs | 3 |
| BERTIN (ES) | sentences | 2 |
| BERTIN (ES) | sentences | 3 |
| mBERT (ES) | sentences | 2 |
| mBERT (ES) | sentences | 3 |
| mBERT (EN+ES) | sentences | 3 |
For the zero-shot setting, marked "(Zero)" in the results, we used the original BERTIN and mBERT models with no further training.
Results
These are our results for all readability models across the different settings. A dash (-) marks settings for which no result is reported. Select a model based on the granularity, number of classes, and performance you need:
| Granularity | Model | F1 Score (2-class) | Precision (2-class) | Recall (2-class) | F1 Score (3-class) | Precision (3-class) | Recall (3-class) |
|---|---|---|---|---|---|---|---|
| Paragraph | Baseline (TF-IDF+LR) | 0.829 | 0.832 | 0.827 | 0.556 | 0.563 | 0.550 |
| Paragraph | BERTIN (Zero) | 0.308 | 0.222 | 0.500 | 0.227 | 0.284 | 0.338 |
| Paragraph | BERTIN (ES) | 0.924 | 0.923 | 0.925 | 0.772 | 0.776 | 0.768 |
| Paragraph | mBERT (Zero) | 0.308 | 0.222 | 0.500 | 0.253 | 0.312 | 0.368 |
| Paragraph | mBERT (EN) | - | - | - | 0.505 | 0.560 | 0.552 |
| Paragraph | mBERT (ES) | 0.933 | 0.932 | 0.936 | 0.776 | 0.777 | 0.778 |
| Paragraph | mBERT (EN+ES) | - | - | - | 0.779 | 0.783 | 0.779 |
| Sentence | Baseline (TF-IDF+LR) | 0.811 | 0.814 | 0.808 | 0.525 | 0.531 | 0.521 |
| Sentence | BERTIN (Zero) | 0.367 | 0.290 | 0.500 | 0.188 | 0.232 | 0.335 |
| Sentence | BERTIN (ES) | 0.900 | 0.900 | 0.900 | 0.699 | 0.701 | 0.698 |
| Sentence | mBERT (Zero) | 0.367 | 0.290 | 0.500 | 0.278 | 0.329 | 0.351 |
| Sentence | mBERT (EN) | - | - | - | 0.521 | 0.565 | 0.539 |
| Sentence | mBERT (ES) | 0.893 | 0.891 | 0.896 | 0.688 | 0.686 | 0.691 |
| Sentence | mBERT (EN+ES) | - | - | - | 0.679 | 0.676 | 0.682 |
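As a rough aid for choosing among the rows above, the F1 scores can be ranked programmatically. This is a minimal sketch, not part of the benchmark code: `F1_SCORES` and `best_model` are hypothetical names, only F1 is considered, and entries shown as "-" in the table are simply left out.

```python
from typing import Dict, List, Tuple

# F1 scores copied from the results table: (granularity, model) -> {number of classes: F1}.
# Settings marked "-" in the table are omitted.
F1_SCORES: Dict[Tuple[str, str], Dict[int, float]] = {
    ("paragraph", "Baseline (TF-IDF+LR)"): {2: 0.829, 3: 0.556},
    ("paragraph", "BERTIN (Zero)"): {2: 0.308, 3: 0.227},
    ("paragraph", "BERTIN (ES)"): {2: 0.924, 3: 0.772},
    ("paragraph", "mBERT (Zero)"): {2: 0.308, 3: 0.253},
    ("paragraph", "mBERT (EN)"): {3: 0.505},
    ("paragraph", "mBERT (ES)"): {2: 0.933, 3: 0.776},
    ("paragraph", "mBERT (EN+ES)"): {3: 0.779},
    ("sentence", "Baseline (TF-IDF+LR)"): {2: 0.811, 3: 0.525},
    ("sentence", "BERTIN (Zero)"): {2: 0.367, 3: 0.188},
    ("sentence", "BERTIN (ES)"): {2: 0.900, 3: 0.699},
    ("sentence", "mBERT (Zero)"): {2: 0.367, 3: 0.278},
    ("sentence", "mBERT (EN)"): {3: 0.521},
    ("sentence", "mBERT (ES)"): {2: 0.893, 3: 0.688},
    ("sentence", "mBERT (EN+ES)"): {3: 0.679},
}


def best_model(granularity: str, n_classes: int) -> Tuple[str, float]:
    """Return the (model, F1) pair with the highest F1 for a granularity and class count."""
    candidates: List[Tuple[str, float]] = [
        (model, scores[n_classes])
        for (gran, model), scores in F1_SCORES.items()
        if gran == granularity and n_classes in scores
    ]
    if not candidates:
        raise ValueError(f"no results for {granularity!r} with {n_classes} classes")
    return max(candidates, key=lambda c: c[1])


print(best_model("paragraph", 2))
```

Some gaps are small (0.933 for mBERT (ES) versus 0.924 for BERTIN (ES) on 2-class paragraphs), and the table reports no variance, so such near-ties are worth treating with caution.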
Citation
If you use our results and scripts in your research, please cite our work: "A Benchmark for Neural Readability Assessment of Texts in Spanish" (to be published)
```bibtex
@inproceedings{vasquez-rodriguez-etal-2022-benchmarking,
    title = "A Benchmark for Neural Readability Assessment of Texts in Spanish",
    author = "V{\'a}squez-Rodr{\'\i}guez, Laura and
      Cuenca-Jim{\'e}nez, Pedro-Manuel and
      Morales-Esquivel, Sergio Esteban and
      Alva-Manchego, Fernando",
    booktitle = "Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022), EMNLP 2022",
    month = dec,
    year = "2022",
}
```