---
library_name: transformers
tags:
- readability
license: mit
base_model:
- aubmindlab/bert-base-arabertv2
pipeline_tag: text-classification
---

# AraBERTv2+D3Tok+Reg Readability Model

## Model description

**AraBERTv2+D3Tok+Reg** is a readability assessment model built by fine-tuning **AraBERTv2** with Mean Squared Error loss (**Reg**). For fine-tuning, we used the **D3Tok** input variant from [BAREC-Corpus-v1.0](https://huggingface.co/datasets/CAMeL-Lab/BAREC-Corpus-v1.0). The fine-tuning procedure and hyperparameters are described in our paper *"[A Large and Balanced Corpus for Fine-grained Arabic Readability Assessment](https://arxiv.org/abs/2502.13520)."*

## Intended uses

You can use the AraBERTv2+D3Tok+Reg model as part of the transformers pipeline. Input text must first be preprocessed into the D3Tok input variant using the preprocessing step in [barec_analyzer](https://github.com/CAMeL-Lab/barec_analyzer/tree/main).

## How to use

To use the model:

```python
from transformers import pipeline

readability = pipeline("text-classification", model="CAMeL-Lab/readability-arabertv2-d3tok-reg")

# Read the D3Tok-preprocessed input; the file is split on newlines into sentences.
with open("/PATH/TO/preprocessed_d3tok", "r", encoding="utf-8") as f:
    sentences = f.read().split("\n")

# function_to_apply="none" returns the raw regression output instead of a sigmoid/softmax score.
results = readability(sentences, function_to_apply="none")

# Convert each raw score to an integer readability level, with a floor of 1.
readability_levels = [max(round(result['score']+0.5),1) for result in results]
```

## Citation

```bibtex
@inproceedings{elmadani-etal-2025-readability,
    title = "A Large and Balanced Corpus for Fine-grained Arabic Readability Assessment",
    author = "Elmadani, Khalid N. and
      Habash, Nizar and
      Taha-Thomure, Hanada",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2025",
    year = "2025",
    address = "Vienna, Austria",
    publisher = "Association for Computational Linguistics"
}
```
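
## Score-to-level conversion (sketch)

The last line of the usage snippet turns each raw regression score into an integer level with `max(round(score + 0.5), 1)`. The sketch below isolates that expression so it can be inspected without loading the model. The helper name `score_to_level` and the example scores are hypothetical and chosen only for illustration; they are not model outputs.

```python
def score_to_level(score: float) -> int:
    """Convert a raw regression score to a 1-based integer level.

    Mirrors the expression used in the usage snippet:
    max(round(score + 0.5), 1).
    """
    return max(round(score + 0.5), 1)


# Hypothetical scores standing in for result["score"] values from the pipeline.
example_scores = [-1.0, 0.2, 4.3, 7.9]
example_levels = [score_to_level(s) for s in example_scores]
print(example_levels)  # [1, 1, 5, 8]
```

Negative or very small scores are clamped to level 1. Note that Python's built-in `round` rounds exact halves to the nearest even integer, so a score that is exactly an integer (for example 2.0, which becomes 2.5 before rounding) can resolve to either adjacent level.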