	---
	library_name: transformers
	tags:
	- text-classification
	- ai-detection
	- arabic-nlp
	- machine-generated-text
	datasets:
	- CogniSAL/ARATECT
	language:
	- ar
	base_model:
	- aubmindlab/bert-base-arabertv02
	---

	# KONAN

	## Model Description

	KONAN is an Arabic text classification model designed to distinguish between human-written and machine-generated Arabic news articles.
	The model aims to support research and applications related to AI-generated content detection, misinformation analysis, and media authenticity in Arabic-speaking contexts.

	It is built on [aubmindlab/bert-base-arabertv02](https://huggingface.co/aubmindlab/bert-base-arabertv02) and fine-tuned on a curated dataset of Arabic news texts, each labeled as either:
	- human
	- machine (AI-written)

	The model learns stylistic, syntactic, and semantic patterns that differentiate human journalism from automatically generated text.

	---

	## Fine-tuning Procedure

	The model was fine-tuned using supervised learning for sequence classification, with PEFT (LoRA) adapters to efficiently adapt the base model while retaining its strong Arabic language understanding.
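
The following is a minimal sketch of how LoRA adapters can be attached to the base model for two-class sequence classification with the `peft` library. The rank, scaling factor, dropout, target modules, and the order of the labels are illustrative assumptions, not the settings actually used to train KONAN.

```python
# Assumed label order; the real id-to-label mapping of KONAN is not stated here.
ID2LABEL = {0: "human", 1: "machine"}
LABEL2ID = {label: idx for idx, label in ID2LABEL.items()}

# Hypothetical LoRA hyperparameters, chosen only for illustration.
LORA_SETTINGS = {
    "r": 8,
    "lora_alpha": 16,
    "lora_dropout": 0.1,
    "target_modules": ["query", "value"],  # BERT self-attention projections
}


def build_lora_classifier(base_model: str = "aubmindlab/bert-base-arabertv02"):
    """Wrap the base model with LoRA adapters for two-class classification."""
    # Imported lazily so the constants above can be inspected without
    # installing the heavy dependencies.
    from peft import LoraConfig, TaskType, get_peft_model
    from transformers import AutoModelForSequenceClassification

    model = AutoModelForSequenceClassification.from_pretrained(
        base_model,
        num_labels=len(ID2LABEL),
        id2label=ID2LABEL,
        label2id=LABEL2ID,
    )
    config = LoraConfig(task_type=TaskType.SEQ_CLS, **LORA_SETTINGS)
    # The base weights are frozen; the adapters are what gets trained.
    return get_peft_model(model, config)
```

Calling `build_lora_classifier()` downloads the base checkpoint and returns a model that can be passed to a standard training loop or to the `transformers` `Trainer`.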

	Key aspects of training include:
	- Coverage of multiple Arabic news domains (politics, economy, sports, technology, society)
	- Exposure to different AI generation styles and prompting strategies
	- Normalization of Arabic text (diacritics removal, punctuation consistency)
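
The normalization step listed above can be sketched as follows. This is an assumed approximation of "diacritics removal" and "punctuation consistency", not the exact preprocessing used for KONAN: it strips harakat and tatweel, maps a few quote variants to a plain double quote, and tidies whitespace around punctuation. The function name `normalize_arabic` is hypothetical.

```python
import re

# Harakat (U+064B to U+0652), superscript alef (U+0670), and tatweel (U+0640).
_DIACRITICS = re.compile(r"[\u064B-\u0652\u0670\u0640]")
# Quote variants mapped to a plain double quote (an assumed convention).
_QUOTES = str.maketrans({"«": '"', "»": '"', "“": '"', "”": '"'})
# Whitespace that appears directly before a punctuation mark.
_SPACE_BEFORE_PUNCT = re.compile(r"\s+([،؛؟.,!?:])")


def normalize_arabic(text: str) -> str:
    """Remove diacritics and make quotes and spacing consistent."""
    text = _DIACRITICS.sub("", text)
    text = text.translate(_QUOTES)
    text = re.sub(r"\s+", " ", text).strip()
    return _SPACE_BEFORE_PUNCT.sub(r"\1", text)
```

Applying the same normalization at inference time as at training time keeps the input distribution consistent for the classifier.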

	---



	## Intended Use

	This model is intended for:
	- Detecting AI-generated Arabic news articles
	- Assisting journalists, fact-checkers, and researchers
	- Studying stylistic differences between human and machine-written Arabic text

	---

	## How to Use

	### Example: Classifying Arabic News Text

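	```python
	from transformers import pipeline

	# Load the fine-tuned classifier. device=0 selects the first GPU;
	# use device=-1 to run on CPU.
	classifier = pipeline(
	    "text-classification",
	    model="salmane11/konan",
	    tokenizer="salmane11/konan",
	    truncation=True,
	    device=0,
	)

	# Sample input: "The Ministry of Economy announced today the launch of a new
	# plan to support small and medium-sized enterprises and boost job
	# opportunities over the coming years."
	text = """
	أعلنت وزارة الاقتصاد اليوم عن إطلاق خطة جديدة تهدف إلى دعم الشركات
	الصغيرة والمتوسطة وتعزيز فرص العمل خلال السنوات القادمة.
	"""


	def detect_ai_generated_news(news: str) -> bool:
	    """Return True if the classifier labels the text as machine-generated."""
	    prediction = classifier(news)
	    return prediction[0]["label"] == "machine"


	print(detect_ai_generated_news(text))
	```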
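
When a hard label is not enough, the per-label scores can be thresholded instead. The sketch below assumes the pipeline is called with `top_k=None` on a single string, in which case it is expected to return one `{"label", "score"}` entry per class; the helper names and the threshold are hypothetical, and the example scores are hand-written rather than real model output.

```python
from typing import Dict, List


def machine_score(scores: List[Dict[str, object]]) -> float:
    """Return the score attached to the "machine" label, or 0.0 if absent."""
    for entry in scores:
        if entry["label"] == "machine":
            return float(entry["score"])
    return 0.0


def is_machine_generated(scores: List[Dict[str, object]], threshold: float = 0.5) -> bool:
    """Flag a text as machine-generated when its score reaches the threshold."""
    return machine_score(scores) >= threshold


# Hand-written scores in the assumed output shape of classifier(text, top_k=None).
example_scores = [
    {"label": "machine", "score": 0.91},
    {"label": "human", "score": 0.09},
]
print(is_machine_generated(example_scores, threshold=0.8))
```

A higher threshold trades recall for fewer false accusations of machine authorship, which may matter in fact-checking workflows.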

	## Cite our work
	```bibtex
	@article{lamsiyah2025m,
	  title={M-DAIGT: A Shared Task on Multi-Domain Detection of AI-Generated Text},
	  author={Lamsiyah, Salima and Ezzini, Saad and El Mahdaouy, Abdelkader and Alami, Hamza and Benlahbib, Abdessamad and El Amrany, Samir and Chafik, Salmane and Hammouchi, Hicham},
	  journal={M-DAIGT-ST 2025},
	  pages={1},
	  year={2025}
	}
	```