---
license: apache-2.0
language:
- ar
base_model:
- UBC-NLP/AraT5v2-base-1024
library_name: transformers
tags:
- TST
- Arabic
- Author_Style
- AraGenEval
---

# AraStyleTransfer-21 | 21 Arabic Author Styles. One Model.

🏆 **First Place Winner at AraGenEval 2025 Competition**

A state-of-the-art Arabic text style transfer model that transforms text into the writing style of 21 different Arabic authors using descriptive author tokens and prompt engineering.

## 🔗 Paper Link (ACL Anthology)

📘 **ANLPers at AraGenEval Shared Task: Descriptive Author Tokens for Transparent Arabic Authorship Style Transfer**

https://aclanthology.org/2025.arabicnlp-sharedtasks.8.pdf

## 🏗️ Model Architecture

- **Base Model:** UBC-NLP/AraT5v2-base-1024
- **Approach:** Descriptive Author Tokens + Prompt Engineering
- **Input Format:** `"اكتب النص التالي بأسلوب : [text]"`
- **Training:** Fine-tuned with author-specific tokens

## 🔬 Technical Details

### Stylometric Analysis

The model incorporates comprehensive stylometric analysis including:

- **Lexical Features:** Sentence length, word length, vocabulary richness
- **Syntactic Patterns:** Definite articles, conjunctions, prepositions
- **Author-Specific Vocabulary:** TF-IDF based characteristic words
- **Style Classification:** Formality, complexity, emotional intensity

### Prompt Engineering

- **Format:** `"اكتب النص التالي بأسلوب : [original_text]"`
- **Author Tokens:** Descriptive tokens like ``
- **Target:** Generated text in author's style

## 📚 Supported Authors

## 📁 Input File Format

For batch processing, your input file should have the following format:

## 📊 Example Snippets from the Dataset

| id | text_in_msa (partial) | text_in_author_style (partial) |
|----|------------------------|--------------------------------|
| 3835 | "لم أقم مطلقًا بالاحتفال بعيد ميلادي... وكنت أتجادل مع كامل الشناوي..." | "عمري ما احتفلت بعيد ميلادي... وأتشاجر مع كامل الشناوي على ذلك الاكتئاب..." |
| 3836 | "الزمن العام هو العداد الجماعي الذي يسجل السنين... ويبرز الزمن الخاص..." | "الزمن العام يعدّ السنين للناس كلها... أما عدادك الخاص فأنت نادرًا ما تنظر فيه..." |
| 3837 | "مصر الغنية الراقية... اشتراكية وديمقراطية تتفاعل معًا... أحلام الخمسين..." | "مصر المصنِّعة... الكون مائة زهرة... وحين أبلغ الخمسين أبدأ أعيش وأتعلم الموسيقى..." |
| 3838 | "غرابة التجربة... طفولة جادة تمامًا بلا مرح... الطفولة كانت عيبًا..." | "غريبة هي الأفكار... كنتُ رجلًا رهيبًا في ثوب طفل... والطفولة تُهمة نخشى الاعتراف بها..." |
| 3839 | "هذا ليس ندمًا... موجة تفوقك قوة... النصر الحقيقي أن تعيش كما تختار..." | "ليس مرارة ولا ندمًا... أنت تناضل موجة أعتى منك... والحق أن تحيا كما اخترت أنت..." |

## 📊 Performance Metrics

- **BLEU Score:** 24.58
- **chrF Score:** 59.01
- **Competition:** First Place in AraGenEval 2025
- **Supported Authors:** 21 Arabic authors

Official results on the AraGenEval 2025 test set. Our prompt engineering system ranked first.
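For readers unfamiliar with chrF, the sketch below illustrates the idea behind the chrF score reported above: an F-score over character n-grams in which recall is weighted more heavily than precision (β = 2). It is a simplified, pure-Python illustration under stated assumptions, not the official scorer used by the shared task. The function names `char_ngrams` and `simple_chrf` are hypothetical, and the values it produces will not exactly match a reference implementation.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams after removing spaces (an assumption of this sketch)."""
    chars = text.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis: str, reference: str,
                max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified chrF-style score on a 0-100 scale (illustrative only)."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            # One of the strings is too short for this n-gram order; skip it.
            continue
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # F-beta: beta = 2 gives recall more weight than precision.
    return 100 * (1 + beta ** 2) * p * r / (beta ** 2 * p + r)


if __name__ == "__main__":
    # Illustrative strings only; these are not evaluation data.
    print(simple_chrf("الزمن العام", "الزمن الخاص"))
```

Identical strings score 100 and strings with no shared characters score 0. Because the score works at the character level, it gives partial credit for shared word stems and affixes, which is useful for a morphologically rich language such as Arabic.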

## 🚀 Quick Start: Style Transfer Example

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration
import torch

# Load model
model_name = "Omartificial-Intelligence-Space/AraStyleTransfer-21"
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# Input text and author
text = "لم أقم مطلقًا بالاحتفال بعيد ميلادي منذ طفولتي."
author = "يوسف إدريس"

# Prompt format
prompt = f"اكتب النص التالي بأسلوب : {text}"

# Tokenize
inputs = tokenizer(prompt, return_tensors="pt").to(device)

# Generate
output_ids = model.generate(
    **inputs,
    max_length=256,
    num_beams=5,
    early_stopping=True
)

# Decode
generated_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)

print("Original:", text)
print("Author:", author)
print("Output:", generated_text)
```

## 🎯 Use Cases

- **Content Creation:** Generate text in specific author styles
- **Educational Tools:** Demonstrate different writing styles
- **Research:** Study Arabic literary styles and patterns
- **Creative Writing:** Inspire new content in classic styles

## 🤝 Contributing

This model was developed for the [AraGenEval 2025](https://ezzini.github.io/AraGenEval/) competition. For questions or contributions, please refer to the competition guidelines.

## 📄 License

This model is released under the same license as the base AraT5v2 model.

## BibTeX Citation

```bibtex
@inproceedings{nacar2025anlpers,
  title={ANLPers at AraGenEval Shared Task: Descriptive Author Tokens for Transparent Arabic Authorship Style Transfer},
  author={Nacar, Omer and Reda, Mahmoud and Sibaee, Serry and Alhabashi, Yasser and Ammar, Adel and Boulila, Wadii},
  booktitle={Proceedings of The Third Arabic Natural Language Processing Conference: Shared Tasks},
  pages={49--53},
  year={2025}
}
```

---

**🏆 First Place Winner at AraGenEval 2025 - Arabic Text Style Transfer Competition**