---
library_name: transformers
tags:
- readability
license: mit
base_model:
- aubmindlab/bert-base-arabertv2
pipeline_tag: text-classification
---

# AraBERTv2+D3Tok+CE Readability Model

## Model description

**AraBERTv2+D3Tok+CE** is a readability assessment model built by fine-tuning the **AraBERTv2** model with cross-entropy loss (**CE**). For fine-tuning, we used the **D3Tok** input variant from [BAREC-Corpus-v1.0](https://huggingface.co/datasets/CAMeL-Lab/BAREC-Corpus-v1.0). Our fine-tuning procedure and hyperparameters are described in our paper *"[A Large and Balanced Corpus for Fine-grained Arabic Readability Assessment](https://arxiv.org/abs/2502.13520)"*.

## Intended uses

You can use the AraBERTv2+D3Tok+CE model with the `transformers` text-classification pipeline. The model expects input in the D3Tok variant, so preprocess your text first using the preprocessing step in [barec_analyzer](https://github.com/CAMeL-Lab/barec_analyzer/tree/main).

## How to use

To use the model, pass the pipeline sentences that have already been preprocessed into D3Tok, one per line:

```python
from transformers import pipeline

# Load the fine-tuned readability classifier from the Hugging Face Hub.
readability = pipeline("text-classification", model="CAMeL-Lab/readability-arabertv2-d3tok-CE")

# Read one D3Tok-preprocessed sentence per line, skipping empty lines.
with open("/PATH/TO/preprocessed_d3tok", "r", encoding="utf-8") as f:
    sentences = [line for line in f.read().split("\n") if line.strip()]

# Run the pipeline once over all sentences, then drop the 6-character
# label prefix and add 1 to turn each class index into a readability level.
predictions = readability(sentences)
readability_levels = [int(p["label"][6:]) + 1 for p in predictions]
```

## Citation

```bibtex
@inproceedings{elmadani-etal-2025-readability,
    title = "A Large and Balanced Corpus for Fine-grained Arabic Readability Assessment",
    author = "Elmadani, Khalid N. and Habash, Nizar and Taha-Thomure, Hanada",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2025",
    year = "2025",
    address = "Vienna, Austria",
    publisher = "Association for Computational Linguistics"
}
```
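
## Note on label parsing

The How to use snippet turns each predicted label string into a numeric level by dropping its first six characters and adding 1. The sketch below isolates that step in a small helper so that an unexpected label format raises an error instead of failing silently. It assumes the labels follow the default `LABEL_<k>` naming, which is inferred from the six-character slice rather than stated in this card; the model's `config.id2label` mapping would confirm it. The names `label_to_level` and `levels_from_predictions` are hypothetical, and the example predictions are hand-written stand-ins, not real model output.

```python
def label_to_level(label: str, prefix: str = "LABEL_") -> int:
    """Convert a label such as 'LABEL_0' into a 1-based level (here 1).

    Assumption: labels use the default 'LABEL_<k>' naming.
    """
    if not label.startswith(prefix):
        raise ValueError(f"unexpected label format: {label!r}")
    return int(label[len(prefix):]) + 1


def levels_from_predictions(predictions):
    """Map pipeline-style outputs [{'label': ..., 'score': ...}, ...] to levels."""
    return [label_to_level(p["label"]) for p in predictions]


# Hand-written stand-in for pipeline output; the scores are made up.
example_predictions = [
    {"label": "LABEL_0", "score": 0.9},
    {"label": "LABEL_11", "score": 0.7},
]
example_levels = levels_from_predictions(example_predictions)
print(example_levels)  # [1, 12]
```

With a real pipeline, `levels_from_predictions(readability(sentences))` would replace the hand-written list.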