---
language: ar
license: mit
library_name: transformers
tags:
- arabic
- authorship-attribution
- text-classification
- arabert
- literature
datasets:
- custom
metrics:
- accuracy
- f1
model-index:
- name: arabic-authorship-classification
  results:
  - task:
      type: text-classification
      name: Authorship Attribution
    metrics:
    - type: accuracy
      value: 0.7912
      name: Accuracy
    - type: f1
      value: 0.7023
      name: F1 Macro
    - type: f1
      value: 0.7891
      name: F1 Weighted
---

# Arabic Authorship Classification Model

## Model Description

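This model is fine-tuned for Arabic authorship attribution: given an Arabic text, it predicts which of **21 authors** wrote it. Eighteen are Arabic-language authors; the other three are foreign authors represented by Arabic translations of their works. Built on the AraBERT architecture, it learns to distinguish literary writing styles across classical and modern Arabic literature, reaching 79.12% accuracy and 70.23% macro F1 on its evaluation set.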

## Model Details

- **Model Type:** Text Classification
- **Base Model:** aubmindlab/bert-base-arabertv2
- **Language:** Arabic (ar)
- **Task:** Multi-class Authorship Attribution
- **Classes:** 21 authors
- **Parameters:** ~136M
- **Dataset Size:** 4,157 texts

## Performance

| Metric | Value |
|--------|-------|
| Accuracy | 79.12% |
| F1 Macro | 70.23% |
| F1 Micro | 79.12% |
| F1 Weighted | 78.91% |
| Training Loss | 0.3439 |
| Validation Loss | 0.7434 |
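The table reports three F1 averages computed from the same predictions. The plain-Python sketch below uses hypothetical toy labels (not this model's outputs) to show how they differ. Micro F1 pools all decisions, so in single-label multi-class classification it equals accuracy; weighted F1 averages per-class F1 by class frequency; macro F1 gives every class equal weight, so a weak minority class pulls it down. The `f1_scores` helper is illustrative, and its results should match scikit-learn's `f1_score` with `average="micro"`, `"macro"` and `"weighted"`.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (micro, macro, weighted) F1 for single-label multi-class predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)  # number of true samples per class

    # Per-class F1 = 2TP / (2TP + FP + FN)
    per_class = {}
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        per_class[c] = 2 * tp / denom if denom else 0.0

    # Micro: pool counts over all classes. Every error is one FP (for the
    # predicted class) and one FN (for the true class), so this equals accuracy.
    tp_all = sum(t == p for t, p in zip(y_true, y_pred))
    errors = len(y_true) - tp_all
    micro = 2 * tp_all / (2 * tp_all + 2 * errors)

    # Macro: unweighted mean over classes, so every author counts equally.
    macro = sum(per_class.values()) / len(labels)

    # Weighted: mean weighted by true-class support, dominated by large classes.
    weighted = sum(per_class[c] * support[c] for c in labels) / len(y_true)
    return micro, macro, weighted


# Hypothetical toy data: a large class "A" (8 texts) and a small class "B" (2 texts).
y_true = ["A"] * 8 + ["B"] * 2
y_pred = ["A"] * 8 + ["B", "A"]  # one "B" text is misattributed to "A"

micro, macro, weighted = f1_scores(y_true, y_pred)
print(f"micro={micro:.3f} weighted={weighted:.3f} macro={macro:.3f}")
# micro=0.900 weighted=0.886 macro=0.804
```

A single error on the small class barely moves micro or weighted F1 but costs macro F1 about ten points. This is the same ordering seen in the table (micro ≥ weighted > macro).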

## Supported Authors

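The model assigns each text to one of the following 21 authors. Counts give the number of samples per author; together they account for all 4,157 texts in the dataset.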

**Arabic Literature:**
- حسن حنفي (Hassan Hanafi) - 548 samples
- عبد الغفار مكاوي (Abdul Ghaffar Makawi) - 396 samples  
- نجيب محفوظ (Naguib Mahfouz) - 327 samples
- جُرجي زيدان (Jurji Zaydan) - 327 samples
- نوال السعداوي (Nawal El Saadawi) - 295 samples
- عباس محمود العقاد (Abbas Mahmoud al-Aqqad) - 267 samples
- محمد حسين هيكل (Mohamed Hussein Heikal) - 260 samples
- طه حسين (Taha Hussein) - 255 samples
- أحمد أمين (Ahmed Amin) - 246 samples
- أمين الريحاني (Ameen Rihani) - 142 samples
- فؤاد زكريا (Fouad Zakaria) - 125 samples
- يوسف إدريس (Yusuf Idris) - 120 samples
- سلامة موسى (Salama Moussa) - 119 samples
- ثروت أباظة (Tharwat Abaza) - 90 samples
- أحمد شوقي (Ahmed Shawqi) - 58 samples
- أحمد تيمور باشا (Ahmed Taymour Pasha) - 57 samples
- جبران خليل جبران (Khalil Gibran) - 30 samples
- كامل كيلاني (Kamel Kilani) - 25 samples

**Translated Literature (works in Arabic translation):**
- ويليام شيكسبير (William Shakespeare) - 238 samples
- غوستاف لوبون (Gustave Le Bon) - 150 samples  
- روبرت بار (Robert Barr) - 82 samples
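The sample counts above are strongly imbalanced. The short sketch below copies them into a dictionary and computes the total, the largest-to-smallest class ratio, and two reference points for the 79.12% accuracy: uniform random guessing over 21 classes, and always predicting the largest class. The baselines assume the evaluation split follows the same class distribution as the full dataset. The card does not state this.

```python
# Per-author sample counts, copied from the "Supported Authors" list.
SAMPLES = {
    "Hassan Hanafi": 548,
    "Abdul Ghaffar Makawi": 396,
    "Naguib Mahfouz": 327,
    "Jurji Zaydan": 327,
    "Nawal El Saadawi": 295,
    "Abbas Mahmoud al-Aqqad": 267,
    "Mohamed Hussein Heikal": 260,
    "Taha Hussein": 255,
    "Ahmed Amin": 246,
    "Ameen Rihani": 142,
    "Fouad Zakaria": 125,
    "Yusuf Idris": 120,
    "Salama Moussa": 119,
    "Tharwat Abaza": 90,
    "Ahmed Shawqi": 58,
    "Ahmed Taymour Pasha": 57,
    "Khalil Gibran": 30,
    "Kamel Kilani": 25,
    "William Shakespeare": 238,
    "Gustave Le Bon": 150,
    "Robert Barr": 82,
}

total = sum(SAMPLES.values())
largest = max(SAMPLES, key=SAMPLES.get)
smallest = min(SAMPLES, key=SAMPLES.get)

imbalance = SAMPLES[largest] / SAMPLES[smallest]  # largest-to-smallest ratio
chance = 1 / len(SAMPLES)                         # uniform random guessing
majority = SAMPLES[largest] / total               # always predict the largest class

print(f"{len(SAMPLES)} authors, {total} samples")
print(f"imbalance {largest} : {smallest} = {imbalance:.1f} : 1")
print(f"chance baseline {chance:.2%}, majority baseline {majority:.2%}")
# 21 authors, 4157 samples
# imbalance Hassan Hanafi : Kamel Kilani = 21.9 : 1
# chance baseline 4.76%, majority baseline 13.18%
```

The largest class has roughly 22 times as many samples as the smallest. This helps explain why macro F1 trails accuracy.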

## Usage

### Direct Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_id = "your-username/arabic-authorship-classification"

# Load the tokenizer and the fine-tuned classifier (from_pretrained returns the model in eval mode)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# Tokenize; inputs longer than 512 tokens are truncated
text = "النص العربي المراد تصنيفه"  # "The Arabic text to be classified"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    logits = model(**inputs).logits
    probs = torch.softmax(logits, dim=-1)[0]  # class probabilities for the single input
    predicted_class = int(torch.argmax(probs))
    confidence = probs[predicted_class].item()

# id2label yields author names only if they were saved in the model config;
# otherwise it falls back to generic names such as "LABEL_0"
print(f"Predicted author: {model.config.id2label[predicted_class]}")
print(f"Confidence: {confidence:.4f}")
```

### Pipeline Usage

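```python
from transformers import pipeline

model_id = "your-username/arabic-authorship-classification"
classifier = pipeline("text-classification", model=model_id, tokenizer=model_id)

# truncation/max_length are forwarded to the tokenizer so long texts fit in 512 tokens
result = classifier("النص العربي للتصنيف", truncation=True, max_length=512)  # "Arabic text for classification"
print(result)  # a list like [{"label": ..., "score": ...}]
```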

## Training Data

- **Size:** 4,157 Arabic text samples
- **Source:** Curated Arabic literary corpus
- **Genres:** Essays, novels, poetry, philosophical works
- **Period:** Classical to modern Arabic literature
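- **Quality:** High-quality literary texts
- **Class balance:** Highly imbalanced, from 548 samples (Hassan Hanafi) down to 25 (Kamel Kilani) per author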

## Training Procedure

### Training Hyperparameters

- **Base Model:** aubmindlab/bert-base-arabertv2
- **Max Length:** 512 tokens
- **Learning Rate:** 2e-5
- **Batch Size:** 8 (train), 16 (eval)
- **Epochs:** Up to 150 (early stopping enabled)
- **Optimizer:** AdamW
- **Weight Decay:** 0.01
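The card sets a maximum of 150 epochs with early stopping. It does not give the patience, the monitored metric, or the number of epochs actually run. The plain-Python sketch below shows the common patience rule, assuming validation loss is monitored. The `patience` value, the loss sequence, and the `epochs_until_stop` helper are hypothetical and are not taken from this model's training run. In the Hugging Face `Trainer`, the same role is usually played by `EarlyStoppingCallback` and its `early_stopping_patience` argument.

```python
def epochs_until_stop(val_losses, patience, min_delta=0.0):
    """Return (epochs_run, best_epoch) under a patience-based early-stopping rule.

    Training stops after `patience` consecutive epochs in which the validation
    loss fails to improve on the best value seen so far by more than `min_delta`.
    The length of `val_losses` acts as the epoch cap (150 in this card). Epochs
    are numbered from 1.
    """
    best = float("inf")
    best_epoch = 0
    bad_epochs = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            best, best_epoch, bad_epochs = loss, epoch, 0  # new best checkpoint
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return epoch, best_epoch
    return len(val_losses), best_epoch


# Hypothetical validation losses; the model card does not publish its curve.
losses = [1.90, 1.20, 0.95, 0.80, 0.76, 0.78, 0.77, 0.79, 0.81, 0.80]
run, best = epochs_until_stop(losses, patience=3)
print(f"stopped after epoch {run}, best checkpoint from epoch {best}")
# stopped after epoch 8, best checkpoint from epoch 5
```

Under this rule, training can end long before the 150-epoch cap. If the best checkpoint is restored at the end, the reported losses come from that epoch rather than the last one.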

### Training Infrastructure

- **Hardware:** GPU-accelerated training
- **Framework:** PyTorch + Transformers
- **Mixed Precision:** Enabled (fp16)

## Evaluation

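Aggregate results over the 21 author classes:

- **Accuracy:** 79.12% on a 21-class task, well above uniform chance (1/21 ≈ 4.8%)
- **Class balance:** Weighted F1 (78.91%) is close to accuracy, but macro F1 (70.23%) is about 9 points lower. Because macro F1 weights every author equally, the gap suggests weaker performance on some of the smaller classes; per-class scores are not reported.
- **Generalization:** Validation loss (0.7434) is roughly twice the training loss (0.3439), so the model fits its training texts noticeably better than held-out texts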
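The two losses can also be read on a probability scale. Assume they are the mean cross-entropy that `AutoModelForSequenceClassification` computes for single-label classification; the card mentions no label smoothing or class weighting. Let $p_i$ be the probability the model assigns to the true author of sample $i$, for $N$ samples. Then the exponentiated negative loss is the geometric mean of these probabilities:

$$
\mathcal{L} = -\frac{1}{N}\sum_{i=1}^{N}\log p_i
\quad\Longrightarrow\quad
e^{-\mathcal{L}} = \Bigl(\prod_{i=1}^{N} p_i\Bigr)^{1/N}
$$

$$
e^{-0.3439} \approx 0.709 \ \text{(training)}, \qquad e^{-0.7434} \approx 0.475 \ \text{(validation)}
$$

Read this way, the model gives the true author about 71% probability on training texts in geometric-mean terms, compared with under 48% on validation texts. Note that a reported training loss may be averaged over the whole run rather than taken at the final checkpoint.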

## Limitations

- Performance may vary on non-literary Arabic texts
- Best suited for Modern Standard Arabic (MSA)
- May struggle with very short texts (<50 words)
- Not tested on dialectal Arabic
- Closed-set: every input is assigned to one of the 21 training authors, even when it was written by someone else
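Given the caveat about very short texts, an application might check input length before trusting a prediction. The helper below is a hypothetical sketch. The 50-word threshold comes from the limitation listed above. The function name and the whitespace-based word count are illustrative assumptions; clitics attached to an Arabic word are counted as part of that word.

```python
MIN_WORDS = 50  # threshold from the Limitations list; tune for your own data


def is_long_enough(text: str, min_words: int = MIN_WORDS) -> bool:
    """Return True if `text` has at least `min_words` whitespace-separated words."""
    return len(text.split()) >= min_words


sample = "النص العربي المراد تصنيفه"  # "The Arabic text to be classified" (4 words)
if not is_long_enough(sample):
    print(f"Only {len(sample.split())} words; prediction may be unreliable.")
# Only 4 words; prediction may be unreliable.
```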

## Bias and Ethical Considerations

- Training data focuses on established literary figures
- May reflect historical and cultural biases in literary canon
- Gender representation is highly skewed: Nawal El Saadawi is the only woman among the 21 authors
- Consider fairness when applying to contemporary texts

## Citation

```bibtex
@misc{arabic-authorship-classification-2024,
  title={Arabic Authorship Classification Model},
  author={Sabari Nathan},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/your-username/arabic-authorship-classification}
}
```

## Model Card Authors

Sabari Nathan