---
language: ar
license: apache-2.0
tags:
- arabic
- regression
- arabertv02
- scoring
- education
datasets:
- AraScore
metrics:
- mse
- rmse
- mae
- r2
pipeline_tag: text-classification
library_name: transformers
---

# Arabic Text Scoring Regression Model

## Model Description

This model is fine-tuned from [AraBERTv02](https://huggingface.co/aubmindlab/bert-base-arabertv02) for scoring Arabic text answers. It predicts a continuous score for a given Arabic text response.

## Training Data

The model was trained on the AraScore dataset, which contains Arabic text answers paired with reference scores.

## Metrics

Model performance is evaluated with the following regression metrics:

- MSE (Mean Squared Error)
- RMSE (Root Mean Squared Error)
- MAE (Mean Absolute Error)
- R² (coefficient of determination)

## Usage

```python
import re

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load model and tokenizer
model_name = "kenzykhaled/arabic-answer-scoring"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Move the model to GPU if available and switch to inference mode
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()


def preprocess_arabic_text(text):
    """Normalize an Arabic string before tokenization."""
    if not isinstance(text, str):
        return ""
    # Remove diacritics (tashkeel)
    text = re.sub(r'[ً-ٰٟ]', '', text)
    # Normalize Alif forms
    text = re.sub('[إأآا]', 'ا', text)
    # Normalize Yaa
    text = re.sub('ى', 'ي', text)
    # Normalize Taa Marbouta
    text = re.sub('ة', 'ه', text)
    # Remove non-Arabic characters except whitespace
    text = re.sub(r'[^؀-ۿ\s]', '', text)
    # Collapse repeated whitespace
    text = re.sub(r'\s+', ' ', text).strip()
    return text


def predict_score(text):
    """Return the model's continuous score for an Arabic answer."""
    processed_text = preprocess_arabic_text(text)
    inputs = tokenizer(processed_text, return_tensors="pt",
                       padding=True, truncation=True, max_length=256)
    inputs = {k: v.to(device) for k, v in inputs.items()}
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.logits.item()


# Example usage ("This is a model answer in Arabic.")
sample_text = "هذه إجابة نموذجية باللغة العربية."
score = predict_score(sample_text)
print(f"Predicted score: {score}")
```

## Limitations

- The model is optimized for educational answer scoring and may not generalize to other types of text.
- The model works best with text similar to its training data.

## Citation

If you use this model, please cite:

```
@misc{arabic-scoring-model,
  author = {Your Name},
  title = {Arabic Text Answer Scoring Model},
  year = {2025},
  publisher = {Hugging Face}
}
```
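As a quick sanity check, the text normalization used before tokenization can be exercised without downloading the model. A minimal standalone sketch repeating the same regex rules (the sample string here is illustrative):

```python
import re


def preprocess_arabic_text(text):
    """Apply the card's normalization rules to an Arabic string."""
    if not isinstance(text, str):
        return ""
    text = re.sub(r'[ً-ٰٟ]', '', text)       # strip diacritics (tashkeel)
    text = re.sub('[إأآا]', 'ا', text)        # unify Alif variants
    text = re.sub('ى', 'ي', text)             # unify Yaa
    text = re.sub('ة', 'ه', text)             # Taa Marbouta -> Haa
    text = re.sub(r'[^؀-ۿ\s]', '', text)      # drop non-Arabic characters
    return re.sub(r'\s+', ' ', text).strip()  # collapse whitespace


# "A good answer!" with tanwin and punctuation normalizes cleanly:
print(preprocess_arabic_text("إجابةٌ جيدة!"))  # -> "اجابه جيده"
```

Because the model was trained on text normalized this way, skipping this step at inference time can degrade scores: raw input with diacritics or mixed Latin characters would produce tokens the model rarely saw during fine-tuning.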