CAMeLBERT-MSA-POS-MSA-Lemma-Clustering

CAMeLBERT-MSA-POS-MSA-Lemma-Clustering is a Modern Standard Arabic (MSA) lemmatization model. It was built by fine-tuning the CAMeLBERT-MSA-POS-MSA model on the Penn Arabic Treebank (PATB) training set. The model treats lemmatization as a classification task in which each lemma is represented as a unique class within a clustered lemma vocabulary. The fine-tuning procedure, hyperparameters, and detailed methodology are described in our paper “Lemmatization as a Classification Task: Results from Arabic across Multiple Genres”.
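To illustrate what treating lemmatization as classification can look like, here is a minimal, library-free sketch. It is not the model's actual code or API: the class inventory `ID2LEMMA`, the helper `predict_lemmas`, and the score values are all hypothetical, and the clustering step described in the paper is not reproduced. The sketch only shows the final decision step, where each token receives one score per lemma class and the highest-scoring class is mapped back to its lemma.

```python
from typing import Dict, List, Sequence

# Hypothetical toy class inventory: each class id stands for one lemma.
# The real model's label set is far larger and is not reproduced here.
ID2LEMMA: Dict[int, str] = {0: "كتاب", 1: "كتب", 2: "درس"}


def argmax(scores: Sequence[float]) -> int:
    """Return the index of the highest score."""
    return max(range(len(scores)), key=lambda i: scores[i])


def predict_lemmas(
    token_scores: List[Sequence[float]], id2lemma: Dict[int, str]
) -> List[str]:
    """Pick the best-scoring lemma class for each token."""
    return [id2lemma[argmax(scores)] for scores in token_scores]


# Made-up per-token scores over the three classes above (assumed values,
# standing in for the output of a fine-tuned classifier).
scores = [
    [0.1, 2.3, -0.5],  # token 1: class 1 scores highest
    [1.7, 0.2, 0.0],   # token 2: class 0 scores highest
]
lemmas = predict_lemmas(scores, ID2LEMMA)
```

In a real pipeline the scores would come from the fine-tuned model rather than being written by hand; the mapping from class id back to lemma is the part this sketch is meant to convey.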

This model is integrated into the lemmatization workflow available in our GitHub repository.

Model size: 0.1B params (Safetensors)

Tensor type: F32