---
language: ar
license: apache-2.0
tags:
- arabic
- regression
- arabertv02
- scoring
- education
datasets:
- AraScore
metrics:
- mse
- rmse
- mae
- r2
pipeline_tag: text-classification
library_name: transformers
---

# Arabic Text Scoring Regression Model

## Model Description

This model is fine-tuned from [AraBERTv02](https://huggingface.co/aubmindlab/bert-base-arabertv02) to score Arabic text answers. Given an Arabic text response, it predicts a continuous score.

## Training Data

The model was trained on the AraScore dataset, which contains Arabic text answers with corresponding scores.

## Metrics

The model is evaluated with the following regression metrics:
- MSE (Mean Squared Error)
- RMSE (Root Mean Squared Error)
- MAE (Mean Absolute Error)
- R² (R-squared)
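
For readers who want to reproduce these metrics on their own held-out data, the four quantities can be computed as in the minimal sketch below. The function name `regression_metrics` and the sample scores are hypothetical and used only for illustration; they are not results from this model.

```python
import math

def regression_metrics(y_true, y_pred):
    """Compute MSE, RMSE, MAE and R² for two equal-length lists of scores."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    # R² is undefined when all reference scores are identical
    r2 = 1 - (mse * n) / ss_tot if ss_tot > 0 else float("nan")
    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae, "r2": r2}

# Hypothetical reference and predicted scores, for illustration only
reference = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.5]
print(regression_metrics(reference, predicted))
```

RMSE and MAE are in the same units as the scores, which usually makes them easier to interpret than MSE when judging how far a predicted score tends to be from the reference.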

## Usage

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch
import re

# Load model and tokenizer
model_name = "kenzykhaled/arabic-answer-scoring"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Function to preprocess Arabic text
def preprocess_arabic_text(text):
    if not isinstance(text, str):
        return ""

    # Remove diacritics (تشكيل): U+064B to U+065F, plus superscript alif U+0670
    text = re.sub(r'[\u064B-\u065F\u0670]', '', text)

    # Normalize Arabic letters
    text = re.sub('[إأآا]', 'ا', text)  # Normalize Alif forms
    text = re.sub('ى', 'ي', text)  # Normalize Yaa
    text = re.sub('ة', 'ه', text)  # Normalize Taa Marbouta

    # Remove non-Arabic characters (outside U+0600 to U+06FF) except whitespace
    text = re.sub(r'[^\u0600-\u06FF\s]', '', text)

    # Remove extra spaces
    text = re.sub(r'\s+', ' ', text).strip()

    return text

# Define prediction function
def predict_score(text):
    # Preprocess and tokenize
    processed_text = preprocess_arabic_text(text)
    inputs = tokenizer(processed_text, return_tensors="pt", padding=True, truncation=True, max_length=256)

    # Move to appropriate device (GPU if available)
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model.to(device)
    inputs = {k: v.to(device) for k, v in inputs.items()}

    # Predict
    model.eval()
    with torch.no_grad():
        outputs = model(**inputs)
        score = outputs.logits.item()

    return score

# Example usage
sample_text = "هذه إجابة نموذجية باللغة العربية."
score = predict_score(sample_text)
print(f"Predicted score: {score}")
```
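
To see what each preprocessing step does before text reaches the tokenizer, the substitutions from `preprocess_arabic_text` can be applied one at a time. The sketch below is illustrative only: `STEPS` and `trace_preprocessing` are hypothetical names, and the letter patterns are written as Unicode escapes so the character classes stay unambiguous. It needs only the standard library, so it runs without downloading the model.

```python
import re

# Same substitutions as preprocess_arabic_text, in the same order.
# Each entry is (step name, pattern, replacement).
STEPS = [
    ("remove diacritics", r"[\u064B-\u065F\u0670]", ""),
    ("normalize alif", "[\u0625\u0623\u0622\u0627]", "\u0627"),  # alif variants -> bare alif
    ("normalize yaa", "\u0649", "\u064A"),                        # alif maqsura -> yaa
    ("normalize taa marbouta", "\u0629", "\u0647"),               # taa marbouta -> haa
    ("drop non-Arabic characters", r"[^\u0600-\u06FF\s]", ""),
    ("collapse whitespace", r"\s+", " "),
]

def trace_preprocessing(text):
    """Return (step name, text after that step) pairs, ending with the final strip."""
    stages = []
    for name, pattern, replacement in STEPS:
        text = re.sub(pattern, replacement, text)
        stages.append((name, text))
    stages.append(("strip", text.strip()))
    return stages

# Print every intermediate form of a sample answer
for name, value in trace_preprocessing("الإِجَابَةُ صحيحة (100%)"):
    print(f"{name}: {value!r}")
```

Note that the "drop non-Arabic characters" step also removes Latin letters, ASCII digits, and ASCII punctuation, so answers that rely on them (for example, numeric answers written with Western digits) lose that content before scoring.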

## Limitations

- The model is optimized for educational answer scoring and may not perform well on other types of text.
- The model works best with text similar to that in the training data.
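
Because the preprocessing in the Usage section discards every character outside the Arabic Unicode block (U+0600 to U+06FF), input that is mostly non-Arabic reaches the model nearly empty. One way to catch such input before scoring is a simple character-ratio check. This is a heuristic sketch, not part of the model: `arabic_ratio`, `looks_scorable`, and the `min_ratio` default of 0.8 are assumptions chosen for illustration.

```python
import re

# Characters in the Arabic block, the same range the Usage preprocessing keeps
ARABIC_CHAR = re.compile(r"[\u0600-\u06FF]")

def arabic_ratio(text):
    """Fraction of non-whitespace characters that fall in U+0600 to U+06FF."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(1 for c in chars if ARABIC_CHAR.match(c)) / len(chars)

def looks_scorable(text, min_ratio=0.8):
    """Heuristic gate; min_ratio is an illustrative threshold, not a tuned value."""
    return isinstance(text, str) and arabic_ratio(text) >= min_ratio

print(looks_scorable("هذه إجابة نموذجية"))
print(looks_scorable("This answer is in English"))
```

A caller could skip or flag inputs for which `looks_scorable` returns `False` instead of passing them to the model.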

## Citation

If you use this model, please cite:
```
@misc{arabic-scoring-model,
  author = {Your Name},
  title = {Arabic Text Answer Scoring Model},
  year = {2025},
  publisher = {Hugging Face}
}
```