CAMeLBERT-MSA-POS-MSA-Lemma-Clustering
Model Description
CAMeLBERT-MSA-POS-MSA-Lemma-Clustering is a Modern Standard Arabic (MSA) lemmatization model. It is built by fine-tuning the CAMeLBERT-MSA-POS-MSA model on the Penn Arabic Treebank (PATB) training set. This model approaches lemmatization as a classification task, where each lemma is represented as a unique class within a clustered lemma vocabulary. The fine-tuning procedure, hyperparameters, and detailed methodology are presented in our paper "Lemmatization as a Classification Task: Results from Arabic across Multiple Genres".
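To make the classification framing concrete, the following is a minimal toy sketch, not the model's actual code or API. It assumes a classifier has already produced one score per class for each token, and it simply picks the highest-scoring class and looks up its label. The names `ID2LABEL`, `predict_labels`, and `toy_scores`, as well as the placeholder labels and score values, are hypothetical and chosen only for illustration; the real class inventory and inference pipeline are those described in the paper and the repository.

```python
from typing import Dict, List

# Hypothetical class inventory: each class id stands for one entry in the
# lemma vocabulary. These placeholder labels are NOT the model's real classes.
ID2LABEL: Dict[int, str] = {0: "LEMMA_A", 1: "LEMMA_B", 2: "LEMMA_C"}


def predict_labels(scores: List[List[float]]) -> List[str]:
    """Pick the highest-scoring class for each token and return its label."""
    predictions = []
    for token_scores in scores:
        # Index of the largest score in this token's row (argmax).
        best_id = max(range(len(token_scores)), key=token_scores.__getitem__)
        predictions.append(ID2LABEL[best_id])
    return predictions


# One row of made-up classifier scores per token in a three-token sentence.
toy_scores = [[0.1, 2.3, -0.5], [1.7, 0.2, 0.0], [-1.0, 0.4, 3.1]]
print(predict_labels(toy_scores))  # ['LEMMA_B', 'LEMMA_A', 'LEMMA_C']
```

The point of the sketch is only that the output space is a fixed, finite set of classes, so lemmatization reduces to choosing one class per token rather than generating a lemma string character by character.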
Intended uses
This model is integrated into the lemmatization workflow available in our GitHub repository.