---
license: mit
language:
  - ar
  - en
library_name: transformers
tags:
  - arabic
  - text-generation
  - detoxification
  - ensemble
  - bloom
pipeline_tag: text-generation
model-index:
  - name: arab-detoxification-isp
    results:
    - task:
        type: text-generation
        name: Text Generation
      dataset:
        type: custom
        name: Arabic Detox Dataset
      metrics:
      - type: accuracy
        value: 0.95
        name: STA
---

<div align="center">

# 🛡️ Arabic Text Detoxification Model

### Ensemble Knowledge Distillation Approach

[![Model](https://img.shields.io/badge/Model-Bloom--1b7-blue)](https://huggingface.co/bigscience/bloom-1b7)
[![License](https://img.shields.io/badge/License-MIT-green.svg)](https://opensource.org/licenses/MIT)
[![Language](https://img.shields.io/badge/Language-Arabic-red)](https://en.wikipedia.org/wiki/Arabic)
[![HuggingFace](https://img.shields.io/badge/🤗-HuggingFace-yellow)](https://huggingface.co/ispromashka/arab-detoxification-isp)

**Transform toxic Arabic text into polite, neutral alternatives while preserving meaning**

[Model Demo](#-quick-start) | [Architecture](#-architecture-overview) | [Dataset](https://huggingface.co/datasets/ispromashka/arabic-detox-dataset) | [Results](#-evaluation-results)

</div>

---

## 📊 Architecture Overview

<div align="center">
<img src="https://huggingface.co/ispromashka/arab-detoxification-isp/resolve/main/architecture.png" alt="Model Architecture" width="100%">
</div>

---

## 🎯 Model Description

This model performs **text detoxification** for Arabic: it converts offensive, toxic, or aggressive text into neutral, polite alternatives while preserving the original meaning.

### Key Features

| Feature | Description |
|---------|-------------|
| 🏗️ **Architecture** | Bloom-1b7 (1.7B parameters) fine-tuned with ensemble distillation |
| 🌍 **Language** | Arabic (Modern Standard Arabic + dialects) |
| 📚 **Training** | Ensemble of 3 models → Knowledge distillation → Final model |
| ⚡ **Hardware** | Trained on an NVIDIA A100 40GB; also works on consumer GPUs |
| 📏 **Context** | Up to 2048 tokens |

### Ensemble Components

| Model | Parameters | Role | Source |
|-------|------------|------|--------|
| AraGPT2-Medium | 370M | Arabic Language Expert | AUB MIND Lab |
| Bloom-560m | 560M | Multilingual Generalization | BigScience |
| Bloom-1b7 | 1.7B | High Capacity Patterns | BigScience |

---

## 📈 Evaluation Results

<div align="center">

| Metric | Score | Description |
|--------|-------|-------------|
| **J-Score** | **0.7129** | Joint metric (geometric mean) |
| **STA** | 0.9500 | Style Transfer Accuracy |
| **SIM (ref)** | 0.9995 | Similarity to reference |
| **Fluency** | 1.0000 | Grammatical correctness |

</div>

```
J-Score    ████████████████████████████░░░░░░░░░░  0.71
STA        ██████████████████████████████████████  0.95
SIM (ref)  ██████████████████████████████████████  1.00
Fluency    ██████████████████████████████████████  1.00
```

---

## 🚀 Quick Start

### Installation

```bash
pip install transformers torch
```

### Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

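# Load model: half precision on GPU, full precision on CPU
# (float16 generation is slow or unsupported on many CPUs)
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32
model_name = "ispromashka/arab-detoxification-isp"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=dtype)
model.to(device)
model.eval()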

def detoxify(text: str) -> str:
    """Convert toxic Arabic text to neutral form."""
    prompt = f"سام: {text}\nمهذب:"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    
    outputs = model.generate(
        **inputs,
        max_new_tokens=50,
        temperature=0.7,
        top_p=0.9,
        repetition_penalty=1.2,
        do_sample=True,
        pad_token_id=tokenizer.pad_token_id,
    )
    
    result = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return result.split("مهذب:")[-1].strip().split("\n")[0]

# Example
toxic_text = "أنت غبي جداً"
neutral_text = detoxify(toxic_text)
print(f"Input:  {toxic_text}")
print(f"Output: {neutral_text}")
```

---

## 💡 Examples

| Category | Toxic Input (سام) | Neutral Output (مهذب) |
|----------|-------------------|----------------------|
| Insult | أنت غبي جداً | ربما تحتاج إلى مزيد من الوقت للفهم |
| Command | اخرس يا أحمق | أرجو أن تكون أكثر هدوءاً |
| Criticism | هذا العمل تافه وسخيف | العمل يمكن تطويره |
| Threat | سأجعلك تندم | دعنا نحل هذا بسلام |
| Contempt | أنت فاشل تماماً | النجاح يحتاج لمزيد من الجهد |
| Mockery | يا له من غبي | ربما لم يفهم جيداً |
| Blame | كل شيء خطؤك | نحتاج تحديد المسؤوليات |
| Appearance | منظرك سيء | المظهر يمكن تحسينه |

---

## 🔬 Methodology

### Training Pipeline

```
┌─────────────────────────────────────────────────────────────┐
│                    STAGE 1: Base Models                     │
├─────────────────────────────────────────────────────────────┤
│  Train 3 specialized models independently on detox dataset  │
│  • AraGPT2-Medium (25 epochs)                               │
│  • Bloom-560m (25 epochs)                                   │
│  • Bloom-1b7 (20 epochs)                                    │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│                 STAGE 2: Ensemble Selection                 │
├─────────────────────────────────────────────────────────────┤
│  For each input, select best prediction using:              │
│  Sentence-BERT (paraphrase-multilingual-mpnet-base-v2)      │
│  Selection: argmax(cosine_similarity(pred, reference))      │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│               STAGE 3: Knowledge Distillation               │
├─────────────────────────────────────────────────────────────┤
│  Fine-tune fresh Bloom-1b7 on:                              │
│  • Original dataset (3000+ examples)                        │
│  • Ensemble best predictions (1500+ examples)               │
│  • Total: 4500+ training examples                           │
└─────────────────────────────────────────────────────────────┘
```
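
The Stage 2 selection rule and the way its output feeds Stage 3 can be sketched as follows. This is a minimal illustration, not the training code: `toy_embed` is a hypothetical character-bigram stand-in so the snippet runs without downloads, whereas the pipeline above uses Sentence-BERT (`paraphrase-multilingual-mpnet-base-v2`) embeddings. The function names `select_best` and `build_distillation_pairs` are also assumptions made for this example.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Dict[str, float]


def toy_embed(text: str) -> Vector:
    """Stand-in embedding (character-bigram counts). NOT Sentence-BERT."""
    return dict(Counter(text[i:i + 2] for i in range(len(text) - 1)))


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_best(
    candidates: Sequence[str],
    reference: str,
    embed: Callable[[str], Vector] = toy_embed,
) -> str:
    """Stage 2: argmax over candidates of cosine_similarity(pred, reference)."""
    ref_vec = embed(reference)
    return max(candidates, key=lambda c: cosine(embed(c), ref_vec))


def build_distillation_pairs(
    toxic_inputs: Sequence[str],
    ensemble_outputs: Sequence[Sequence[str]],
    references: Sequence[str],
    embed: Callable[[str], Vector] = toy_embed,
) -> List[Tuple[str, str]]:
    """Stage 3 input: pair each toxic text with the best ensemble prediction."""
    return [
        (toxic, select_best(preds, ref, embed))
        for toxic, preds, ref in zip(toxic_inputs, ensemble_outputs, references)
    ]


# Hypothetical predictions from the three base models for one input
pairs = build_distillation_pairs(
    toxic_inputs=["هذا العمل تافه وسخيف"],
    ensemble_outputs=[["دعنا نحل هذا بسلام", "العمل يمكن تطويره", "المظهر يمكن تحسينه"]],
    references=["العمل يمكن تطويره"],
)
print(pairs)
```

To get closer to the described setup, `embed` could be replaced with a function that returns Sentence-BERT vectors; the selection logic stays the same.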

### Evaluation Metrics

**J-Score** (Primary metric):

$$J = \sqrt[3]{STA \times SIM \times FL}$$

Where:
- **STA** (Style Transfer Accuracy): Measures toxicity removal success
- **SIM** (Semantic Similarity): Content preservation (Sentence-BERT cosine similarity)
- **FL** (Fluency): Ratio of grammatically valid outputs
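
The formula above can be written as a short function. This is a sketch under stated assumptions: the card does not say whether J is computed per sample and then averaged or from corpus-level averages, nor which similarity variant enters the product, so `corpus_j_score` (per-sample J, then mean) is one plausible aggregation rather than the evaluation script itself. The sample scores are hypothetical.

```python
from typing import Iterable, Tuple


def j_score(sta: float, sim: float, fl: float) -> float:
    """Geometric mean of STA, SIM and FL, each expected in [0, 1]."""
    for name, value in (("sta", sta), ("sim", sim), ("fl", fl)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return (sta * sim * fl) ** (1.0 / 3.0)


def corpus_j_score(samples: Iterable[Tuple[float, float, float]]) -> float:
    """Mean of per-sample J over (sta, sim, fl) triples (assumed aggregation)."""
    scores = [j_score(*triple) for triple in samples]
    if not scores:
        raise ValueError("samples must not be empty")
    return sum(scores) / len(scores)


# Hypothetical per-sample scores: a failed style transfer (STA = 0)
# zeroes that sample's J regardless of the other two components.
samples = [(1.0, 0.8, 1.0), (1.0, 0.5, 1.0), (0.0, 0.9, 1.0)]
print(round(corpus_j_score(samples), 4))
```

Because the product is taken before the root, a single weak component pulls J down sharply, which is why J can sit well below the individual averages.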

---

## 📁 Dataset

Dataset used for training and evaluation:  
[**ispromashka/arabic-detox-dataset**](https://huggingface.co/datasets/ispromashka/arabic-detox-dataset)

### Composition

| Category | Examples | Description |
|----------|----------|-------------|
| Personal Insults | 30 | Direct personal attacks |
| Aggressive Commands | 20 | Hostile imperatives |
| Work Criticism | 25 | Professional negative feedback |
| Threats | 15 | Intimidation and warnings |
| Contempt | 15 | Expressions of superiority |
| Blame | 15 | Accusatory statements |
| Appearance Criticism | 15 | Physical/aesthetic insults |
| Mockery | 15 | Sarcastic belittling |
| **Total Unique** | **150** | — |
| **Augmented (×20)** | **3,000+** | Training examples |

### Data Format

```
سام: {toxic_text}
مهذب: {neutral_text}<EOS>
```
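
A small helper set for this format might look like the sketch below. The function names are hypothetical, and the default `eos_token="</s>"` is an assumption; in practice it is safer to pass `tokenizer.eos_token` from the loaded tokenizer. The inference prompt and the extraction step mirror the Quick Start code.

```python
TOXIC_PREFIX = "سام:"
NEUTRAL_PREFIX = "مهذب:"


def format_training_example(toxic: str, neutral: str, eos_token: str = "</s>") -> str:
    """Build one training string: toxic line, neutral line, then EOS."""
    return f"{TOXIC_PREFIX} {toxic}\n{NEUTRAL_PREFIX} {neutral}{eos_token}"


def format_inference_prompt(toxic: str) -> str:
    """Build the prompt used at inference: the neutral line is left open."""
    return f"{TOXIC_PREFIX} {toxic}\n{NEUTRAL_PREFIX}"


def extract_neutral(generated: str) -> str:
    """Take the first line after the last neutral prefix in decoded output."""
    return generated.split(NEUTRAL_PREFIX)[-1].strip().split("\n")[0]


example = format_training_example("أنت غبي جداً", "ربما تحتاج إلى مزيد من الوقت للفهم")
print(example)
print(extract_neutral(example.replace("</s>", "")))
```

Keeping the prefixes in one place helps the training strings and the inference prompt stay byte-for-byte consistent.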

---

## ⚙️ Training Configuration

| Parameter | Base Models | Final Model |
|-----------|-------------|-------------|
| Hardware | NVIDIA A100 40GB | NVIDIA A100 40GB |
| Precision | BF16 | BF16 |
| Batch Size | 8–16 | 8 |
| Learning Rate | 2e-5 – 3e-5 | 1.5e-5 |
| Epochs | 20–25 | 15 |
| Optimizer | AdamW | AdamW |
| Scheduler | Cosine | Cosine |
| Warmup | 10% | 10% |
| Total Time | ~85 min | ~30 min |
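
To make the scheduler row concrete, here is a sketch of a cosine schedule with 10% warmup at the final model's learning rate. The table only states "Cosine" and "10%"; the linear warmup shape, the decay to zero, and the `total_steps` value used below are assumptions for illustration.

```python
import math


def cosine_lr_with_warmup(
    step: int,
    total_steps: int,
    peak_lr: float = 1.5e-5,
    warmup_ratio: float = 0.10,
) -> float:
    """Learning rate at `step`: linear warmup to peak_lr, then cosine decay to 0."""
    warmup_steps = round(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to peak_lr (assumed warmup shape)
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the decay phase completed, in [0, 1]
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Hypothetical run length of 1000 optimizer steps
for step in (0, 50, 100, 550, 1000):
    print(step, f"{cosine_lr_with_warmup(step, 1000):.2e}")
```

With the Hugging Face `Trainer`, the same behavior is usually requested through `lr_scheduler_type="cosine"` and `warmup_ratio=0.1` in `TrainingArguments` rather than implemented by hand.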

---

## ⚠️ Limitations

- **Language Coverage**: Optimized for Modern Standard Arabic; dialectal performance may vary
- **Text Length**: Best for short-medium texts (< 100 tokens)
- **Domain**: Trained on general toxicity; domain-specific content may need fine-tuning
- **Context**: Does not consider conversation history

---

## 📖 Citation

```bibtex
@misc{arabicdetox2024,
  author = {ispromashka},
  title = {Arabic Text Detoxification: Ensemble Knowledge Distillation Approach},
  year = {2024},
  publisher = {HuggingFace},
  url = {https://huggingface.co/ispromashka/arab-detoxification-isp}
}
```

---

## 📄 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

---

<div align="center">

**Made with ❤️ for the Arabic NLP community**

[GitHub](https://github.com/ispromadhka)

</div>