---
library_name: transformers
tags:
- text-classification
- ai-detection
- arabic-nlp
- machine-generated-text
datasets:
- CogniSAL/ARATECT
language:
- ar
base_model:
- aubmindlab/bert-base-arabertv02
---

# KONAN

## Model Description

**KONAN** is an **Arabic text classification model** designed to **distinguish between human-written and machine-generated Arabic news articles**.

The model aims to support research and applications related to **AI-generated content detection**, **misinformation analysis**, and **media authenticity** in Arabic-speaking contexts.

It is based on the [aubmindlab/bert-base-arabertv02](https://huggingface.co/aubmindlab/bert-base-arabertv02) base model and fine-tuned on a curated dataset of **Arabic news texts**, each labeled as either:

- **human**
- **machine (AI-written)**

The model learns stylistic, syntactic, and semantic patterns that differentiate human journalism from automatically generated text.

---

## Finetuning Procedure

The model was fine-tuned using **supervised learning for sequence classification**, with **PEFT (LoRA)** adapters to efficiently adapt the base model while retaining its strong Arabic language understanding.

Key aspects of training include:

- Coverage of multiple Arabic news domains (politics, economy, sports, technology, society)
- Exposure to different AI generation styles and prompting strategies
- Normalization of Arabic text (diacritics removal, punctuation consistency)

---

## Intended Use

This model is intended for:

- Detecting AI-generated Arabic news articles
- Assisting journalists, fact-checkers, and researchers
- Studying stylistic differences between human and machine-written Arabic text

---

## How to Use

### Example: Classifying Arabic News Text

```python
from transformers import pipeline

# Sample input. English gloss: "The Ministry of Economy announced today the
# launch of a new plan aimed at supporting small and medium-sized enterprises
# and boosting employment opportunities in the coming years."
text = """
أعلنت وزارة الاقتصاد اليوم عن إطلاق خطة جديدة تهدف إلى دعم الشركات الصغيرة والمتوسطة وتعزيز فرص العمل خلال السنوات القادمة.
"""

# truncation=True cuts inputs that exceed the model's maximum sequence length.
# device=0 runs on the first GPU; use device=-1 to run on CPU.
classifier = pipeline(
    "text-classification",
    model="salmane11/konan",
    tokenizer="salmane11/konan",
    truncation=True,
    device=0,
)


def detect_ai_generated_news(news: str) -> bool:
    """Return True if the classifier labels the text as machine-generated."""
    prediction = classifier(news)
    return prediction[0]["label"] == "machine"


print(detect_ai_generated_news(text))
# detect_ai_generated_news(aljazeera_news['content'][0])
```

## Cite our work

```bibtex
@article{lamsiyah2025m,
  title={M-DAIGT: A Shared Task on Multi-Domain Detection of AI-Generated Text},
  author={Lamsiyah, Salima and Ezzini, Saad and El Mahdaouy, Abdelkader and Alami, Hamza and Benlahbib, Abdessamad and El Amrany, Samir and Chafik, Salmane and Hammouchi, Hicham},
  journal={M-DAIGT-ST 2025},
  pages={1},
  year={2025}
}
```