---
library_name: transformers
tags:
- text-classification
- ai-detection
- arabic-nlp
- machine-generated-text
datasets:
- CogniSAL/ARATECT
language:
- ar
base_model:
- aubmindlab/bert-base-arabertv02
---
# KONAN
## Model Description
**KONAN** is an **Arabic text classification model** designed to **distinguish between human-written and machine-generated Arabic news articles**.
The model aims to support research and applications related to **AI-generated content detection**, **misinformation analysis**, and **media authenticity** in Arabic-speaking contexts.
It is based on the [aubmindlab/bert-base-arabertv02](https://huggingface.co/aubmindlab/bert-base-arabertv02) base model and fine-tuned on a curated dataset of **Arabic news texts**, labeled as either:
- **human**
- **machine (AI-written)**

The model learns stylistic, syntactic, and semantic patterns that differentiate human journalism from automatically generated text.
---
## Fine-tuning Procedure
The model was fine-tuned using **supervised learning for sequence classification**, with **PEFT (LoRA)** adapters to efficiently adapt the base model while retaining its strong Arabic language understanding.
Key aspects of training include:
- Coverage of multiple Arabic news domains (politics, economy, sports, technology, society)
- Exposure to different AI generation styles and prompting strategies
- Normalization of Arabic text (diacritics removal, punctuation consistency)
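
The normalization step listed above can be sketched as follows. This is a minimal illustration, not the preprocessing actually used for training: the function name `normalize_arabic`, the exact character ranges, and the rule that collapses repeated punctuation are all assumptions chosen for the example.

```python
import re

# Assumption: "diacritics" covers tanween, harakat, shadda, sukun
# (U+064B to U+0652) and the superscript alef (U+0670).
_DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")

# Assumption: the tatweel (kashida) elongation character is also removed.
_TATWEEL = "\u0640"

# Assumption: "punctuation consistency" means collapsing runs of the same
# mark (Latin or Arabic comma, semicolon, question mark) into one.
_REPEATED_PUNCT = re.compile(r"([!?.,;:\u060C\u061B\u061F])\1+")

_WHITESPACE = re.compile(r"\s+")


def normalize_arabic(text: str) -> str:
    """Strip diacritics and tatweel, then tidy punctuation and whitespace."""
    text = _DIACRITICS.sub("", text)
    text = text.replace(_TATWEEL, "")
    text = _REPEATED_PUNCT.sub(r"\1", text)
    return _WHITESPACE.sub(" ", text).strip()


if __name__ == "__main__":
    # A vocalized word loses its diacritics but keeps its base letters.
    print(normalize_arabic("\u0645\u064E\u0631\u0652\u062D\u064E\u0628\u064B\u0627"))
```

If a comparable step was applied during fine-tuning, applying the same normalization at inference time would keep the inputs consistent with the training data.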
---
## Intended Use
This model is intended for:
- Detecting AI-generated Arabic news articles
- Assisting journalists, fact-checkers, and researchers
- Studying stylistic differences between human and machine-written Arabic text
---
## How to Use
### Example: Classifying Arabic News Text
```python
from transformers import pipeline

text = """
أعلنت وزارة الاقتصاد اليوم عن إطلاق خطة جديدة تهدف إلى دعم الشركات
الصغيرة والمتوسطة وتعزيز فرص العمل خلال السنوات القادمة.
"""

# device=0 selects the first GPU; use device=-1 to run on CPU.
classifier = pipeline(
    "text-classification",
    model="salmane11/konan",
    tokenizer="salmane11/konan",
    truncation=True,
    device=0,
)


def detect_ai_generated_news(news: str) -> bool:
    """Return True if the classifier labels the text as machine-generated."""
    prediction = classifier(news)
    return prediction[0]["label"] == "machine"


print(detect_ai_generated_news(text))
```
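
The example above reduces the prediction to a boolean. A `text-classification` pipeline also returns a `score` alongside each `label`, so a stricter decision rule is possible. The sketch below is an illustration only: the helper name `is_machine_generated` and the default threshold are hypothetical and not part of the released model, and any threshold should be calibrated on held-out data.

```python
from typing import Any, Dict, List


def is_machine_generated(
    predictions: List[Dict[str, Any]], threshold: float = 0.5
) -> bool:
    """Flag a text only when the top label is "machine" with enough confidence.

    `predictions` is the list of {"label": ..., "score": ...} dicts that a
    text-classification pipeline returns for a single input string.
    The threshold value is a hypothetical default, not a tuned setting.
    """
    top = predictions[0]
    return top["label"] == "machine" and top["score"] >= threshold


if __name__ == "__main__":
    # Hand-written prediction in the pipeline's output shape (not real model output).
    example = [{"label": "machine", "score": 0.93}]
    print(is_machine_generated(example, threshold=0.8))
```

With a real classifier, the call would look like `is_machine_generated(classifier(text), threshold=0.8)`.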
## Cite our work
```bibtex
@article{lamsiyah2025m,
title={M-DAIGT: A Shared Task on Multi-Domain Detection of AI-Generated Text},
author={Lamsiyah, Salima and Ezzini, Saad and El Mahdaouy, Abdelkader and Alami, Hamza and Benlahbib, Abdessamad and El Amrany, Samir and Chafik, Salmane and Hammouchi, Hicham},
journal={M-DAIGT-ST 2025},
pages={1},
year={2025}
}
```