---
library_name: transformers
tags:
- text-classification
- ai-detection
- arabic-nlp
- machine-generated-text
datasets:
- CogniSAL/ARATECT
language:
- ar
base_model:
- aubmindlab/bert-base-arabertv02
---

# KONAN

## Model Description

**KONAN** is an **Arabic text classification model** designed to **distinguish between human-written and machine-generated Arabic news articles**.
It is intended to support research and applications in **AI-generated content detection**, **misinformation analysis**, and **media authenticity** in Arabic-speaking contexts.

KONAN is fine-tuned from the [aubmindlab/bert-base-arabertv02](https://huggingface.co/aubmindlab/bert-base-arabertv02) base model on a curated dataset of **Arabic news texts**, each labeled as either:
- **human** (human-written)
- **machine** (AI-written)

During fine-tuning, the model learns stylistic, syntactic, and semantic patterns that differentiate human journalism from automatically generated text.

---

## Fine-tuning Procedure

The model was fine-tuned with **supervised learning for sequence classification**, using **PEFT (LoRA)** adapters so that the base model can be adapted efficiently while retaining its strong Arabic language understanding.
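
To make the LoRA idea concrete, here is a minimal NumPy sketch of a single LoRA-augmented linear layer: the pretrained weight `W` stays frozen and only a low-rank update `B @ A`, scaled by `alpha / r`, would be trained. The class `LoRALinear`, the rank `r=8`, `alpha=16`, and the initialization scale are illustrative assumptions; this card does not state the actual LoRA hyperparameters or which modules were adapted. In practice the `peft` library (`LoraConfig`, `get_peft_model`) wraps model layers this way instead of hand-written code.

```python
import numpy as np


class LoRALinear:
    """Frozen linear layer W plus a trainable low-rank update (alpha / r) * B @ A.

    Illustrative sketch only: r, alpha and the init scale are assumptions.
    """

    def __init__(self, weight: np.ndarray, r: int = 8, alpha: float = 16.0, seed: int = 0):
        out_features, in_features = weight.shape
        rng = np.random.default_rng(seed)
        self.weight = weight                                    # frozen pretrained weight, (out, in)
        self.A = rng.normal(0.0, 0.01, size=(r, in_features))  # trainable, small random init
        self.B = np.zeros((out_features, r))                    # trainable, zero init => no change at start
        self.scaling = alpha / r

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # x has shape (batch, in); the output has shape (batch, out).
        base = x @ self.weight.T
        update = (x @ self.A.T) @ self.B.T
        return base + self.scaling * update

    def trainable_params(self) -> int:
        return self.A.size + self.B.size


# BERT-base layers use a hidden size of 768.
hidden = 768
W = np.random.default_rng(1).normal(size=(hidden, hidden))
layer = LoRALinear(W, r=8, alpha=16.0)

x = np.ones((2, hidden))
# Because B starts at zero, the adapted layer initially matches the frozen one.
print(np.allclose(layer(x), x @ W.T))
print(layer.trainable_params(), W.size)
```

With these illustrative settings, the adapter trains 12,288 parameters for a 768 x 768 projection instead of 589,824, roughly 2% of that layer.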

Key aspects of the training setup include:
- Coverage of multiple Arabic news domains (politics, economy, sports, technology, and society)
- Exposure to different AI generation styles and prompting strategies
- Normalization of Arabic text (diacritic removal and consistent punctuation)
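
The exact normalization rules are not documented here. As a rough illustration, a normalizer along these lines would strip Arabic diacritics and make punctuation consistent. `normalize_arabic` is a hypothetical helper, not part of the released model, and the character ranges, the removal of tatweel (ـ), and the choice to map Arabic punctuation to ASCII equivalents are all assumptions.

```python
import re

# Harakat, tanween, shadda and sukun (U+064B-U+0652) plus superscript alef (U+0670).
_DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")
# Tatweel (U+0640) is a purely typographic elongation character.
_TATWEEL = "\u0640"
# Assumption: map the Arabic comma, semicolon and question mark to ASCII equivalents.
_PUNCTUATION = str.maketrans({"\u060C": ",", "\u061B": ";", "\u061F": "?"})
_WHITESPACE = re.compile(r"\s+")


def normalize_arabic(text: str) -> str:
    """Hypothetical normalizer: drop diacritics and tatweel, unify punctuation and spacing."""
    text = _DIACRITICS.sub("", text)
    text = text.replace(_TATWEEL, "")
    text = text.translate(_PUNCTUATION)
    return _WHITESPACE.sub(" ", text).strip()


print(normalize_arabic("مَرْحَبًا، بِالعـالَم؟"))
```

Applying the same function at training and inference time keeps the model's inputs consistent; whether KONAN expects pre-normalized input is not stated in this card.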

---

## Intended Use

This model is intended for:
- Detecting AI-generated Arabic news articles
- Assisting journalists, fact-checkers, and researchers
- Studying stylistic differences between human-written and machine-written Arabic text

---

## How to Use

### Example: Classifying Arabic News Text

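```python
import torch
from transformers import pipeline

# Sample Arabic news text ("The Ministry of Economy announced today the launch of a new
# plan aimed at supporting small and medium-sized enterprises and boosting job
# opportunities over the coming years.")
text = """
أعلنت وزارة الاقتصاد اليوم عن إطلاق خطة جديدة تهدف إلى دعم الشركات
الصغيرة والمتوسطة وتعزيز فرص العمل خلال السنوات القادمة.
"""

# Use the first GPU if one is available, otherwise run on CPU (device=-1).
device = 0 if torch.cuda.is_available() else -1

classifier = pipeline(
    "text-classification",
    model="salmane11/konan",
    tokenizer="salmane11/konan",
    truncation=True,  # truncate inputs longer than the model's maximum length
    device=device,
)


def detect_ai_generated_news(news: str) -> bool:
    """Return True if KONAN labels the text as machine-generated."""
    prediction = classifier(news)[0]  # {'label': ..., 'score': ...}
    return prediction["label"] == "machine"


print(detect_ai_generated_news(text))
```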
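
Because the pipeline is created with `truncation=True`, any text beyond the model's maximum input length (typically 512 tokens for BERT-base models) is dropped, so a long article is judged only on its opening. One possible workaround, sketched below as an assumption rather than the authors' procedure, is to split the article into word-based chunks, classify each chunk, and take a majority vote. `classify_long_article`, the 200-word chunk size, and the voting rule are hypothetical; the classifier is passed in as an argument, so the function accepts the `classifier` pipeline above or any callable that returns one `{'label': ..., 'score': ...}` dict per input string.

```python
from typing import Callable, Dict, List


def classify_long_article(
    article: str,
    classifier: Callable[[List[str]], List[Dict]],
    words_per_chunk: int = 200,
) -> bool:
    """Hypothetical helper: majority vote over per-chunk predictions.

    Returns True if more than half of the chunks are labeled "machine".
    """
    words = article.split()
    if not words:
        raise ValueError("article is empty")
    chunks = [
        " ".join(words[i : i + words_per_chunk])
        for i in range(0, len(words), words_per_chunk)
    ]
    predictions = classifier(chunks)  # one {'label': ..., 'score': ...} dict per chunk
    machine_votes = sum(pred["label"] == "machine" for pred in predictions)
    return machine_votes > len(chunks) / 2


# Stand-in classifier for demonstration; replace it with the pipeline defined above.
def fake_classifier(chunks: List[str]) -> List[Dict]:
    return [{"label": "machine", "score": 0.9} for _ in chunks]


print(classify_long_article("word " * 450, fake_classifier))
```

The 200-word chunk size is chosen so that most chunks should fit within the token limit; truncation still guards against chunks that do not.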

## Cite our work

```bibtex
@article{lamsiyah2025m,
  title={M-DAIGT: A Shared Task on Multi-Domain Detection of AI-Generated Text},
  author={Lamsiyah, Salima and Ezzini, Saad and El Mahdaouy, Abdelkader and Alami, Hamza and Benlahbib, Abdessamad and El Amrany, Samir and Chafik, Salmane and Hammouchi, Hicham},
  journal={M-DAIGT-ST 2025},
  pages={1},
  year={2025}
}
```