---
license: mit
language:
  - ar
  - en
library_name: transformers
tags:
  - arabic
  - text-generation
  - detoxification
  - ensemble
  - bloom
pipeline_tag: text-generation
model-index:
  - name: arab-detoxification-isp
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          type: custom
          name: Arabic Detox Dataset
        metrics:
          - type: accuracy
            value: 0.95
            name: STA
---

# 🛡️ Arabic Text Detoxification Model

### Ensemble Knowledge Distillation Approach

[![Model](https://img.shields.io/badge/Model-Bloom--1b7-blue)](https://huggingface.co/bigscience/bloom-1b7)
[![License](https://img.shields.io/badge/License-MIT-green.svg)](https://opensource.org/licenses/MIT)
[![Language](https://img.shields.io/badge/Language-Arabic-red)](https://en.wikipedia.org/wiki/Arabic)
[![HuggingFace](https://img.shields.io/badge/🤗-HuggingFace-yellow)](https://huggingface.co/ispromashka/arab-detoxification-isp)

**Transform toxic Arabic text into polite, neutral alternatives while preserving meaning**

[Model Demo](#-quick-start) | [Architecture](#-architecture-overview) | [Dataset](https://huggingface.co/datasets/ispromashka/arabic-detox-dataset) | [Results](#-evaluation-results)

---

## 📊 Architecture Overview

*Model architecture diagram*

---

## 🎯 Model Description

This model performs **text detoxification** for the Arabic language: it converts offensive, toxic, or aggressive text into neutral, polite alternatives while preserving the original semantic meaning.

### Key Features

| Feature | Description |
|---------|-------------|
| 🏗️ **Architecture** | Bloom-1b7 (1.7B parameters) fine-tuned with ensemble distillation |
| 🌍 **Language** | Arabic (Modern Standard Arabic + dialects) |
| 📚 **Training** | Ensemble of 3 models → Knowledge distillation → Final model |
| ⚡ **Hardware** | Optimized for NVIDIA A100 40GB; runs on consumer GPUs |
| 📏 **Context** | Up to 2048 tokens |

### Ensemble Components

| Model | Parameters | Role | Source |
|-------|------------|------|--------|
| AraGPT2-Medium | 370M | Arabic Language Expert | AUB MIND Lab |
| Bloom-560m | 560M | Multilingual Generalization | BigScience |
| Bloom-1b7 | 1.7B | High Capacity Patterns | BigScience |
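
Stage 2 of the training pipeline (see the Methodology section below) keeps, for each input, whichever of these three models' outputs is closest to the reference rewrite under Sentence-BERT cosine similarity. Below is a minimal sketch of that selection step, assuming one candidate per base model and using the `sentence-transformers` package; `select_best` and the placeholder strings are illustrative, not code from this repository:

```python
from sentence_transformers import SentenceTransformer, util

# Sentence-BERT encoder named in Stage 2 of the methodology
encoder = SentenceTransformer("paraphrase-multilingual-mpnet-base-v2")

def select_best(candidates: list[str], reference: str) -> str:
    """Return the candidate with the highest cosine similarity to the reference."""
    cand_emb = encoder.encode(candidates, convert_to_tensor=True)
    ref_emb = encoder.encode([reference], convert_to_tensor=True)
    scores = util.cos_sim(cand_emb, ref_emb).squeeze(-1)  # one score per candidate
    return candidates[int(scores.argmax())]

# One detoxified candidate per base model (AraGPT2-Medium, Bloom-560m, Bloom-1b7)
candidates = ["...", "...", "..."]
best = select_best(candidates, "...")
```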

---

## 📈 Evaluation Results

| Metric | Score | Description |
|--------|-------|-------------|
| **J-Score** | **0.7129** | Joint metric (geometric mean) |
| **STA** | 0.9500 | Style Transfer Accuracy |
| **SIM (ref)** | 0.9995 | Similarity to reference |
| **Fluency** | 1.0000 | Grammatical correctness |
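
The J-Score is the cube-root geometric mean of STA, SIM, and FL (the formula appears under Methodology below). As a quick sanity check in plain Python, using the corpus-level values from the table:

```python
def j_score(sta: float, sim: float, fl: float) -> float:
    """Cube-root geometric mean of STA, SIM, and FL."""
    return (sta * sim * fl) ** (1 / 3)

print(round(j_score(0.9500, 0.9995, 1.0000), 4))  # 0.9829
```

Note that this corpus-level value differs from the reported **0.7129**, which suggests the headline J-Score is averaged over per-sentence scores rather than computed from the aggregate metrics above. The bar chart below visualizes the scores from the table.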

```
J-Score   ████████████████████████████░░░░░░░░░░ 0.71
STA       ██████████████████████████████████████ 0.95
SIM (ref) ██████████████████████████████████████ 1.00
Fluency   ██████████████████████████████████████ 1.00
```

---

## 🚀 Quick Start

### Installation

```bash
pip install transformers torch
```

### Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model
model_name = "ispromashka/arab-detoxification-isp"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
model.to("cuda")  # or "cpu"

def detoxify(text: str) -> str:
    """Convert toxic Arabic text to neutral form."""
    prompt = f"سام: {text}\nمهذب:"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=50,
        temperature=0.7,
        top_p=0.9,
        repetition_penalty=1.2,
        do_sample=True,
        pad_token_id=tokenizer.pad_token_id,
    )
    result = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return result.split("مهذب:")[-1].strip().split("\n")[0]

# Example
toxic_text = "أنت غبي جداً"
neutral_text = detoxify(toxic_text)
print(f"Input: {toxic_text}")
print(f"Output: {neutral_text}")
```

---

## 💡 Examples

| Category | Toxic Input (سام) | Neutral Output (مهذب) |
|----------|-------------------|-----------------------|
| Insult | أنت غبي جداً *("You are so stupid")* | ربما تحتاج إلى مزيد من الوقت للفهم *("Perhaps you need more time to understand")* |
| Command | اخرس يا أحمق *("Shut up, you fool")* | أرجو أن تكون أكثر هدوءاً *("Please be calmer")* |
| Criticism | هذا العمل تافه وسخيف *("This work is trivial and ridiculous")* | العمل يمكن تطويره *("The work can be improved")* |
| Threat | سأجعلك تندم *("I will make you regret this")* | دعنا نحل هذا بسلام *("Let's settle this peacefully")* |
| Contempt | أنت فاشل تماماً *("You are a complete failure")* | النجاح يحتاج لمزيد من الجهد *("Success takes more effort")* |
| Mockery | يا له من غبي *("What an idiot")* | ربما لم يفهم جيداً *("Perhaps he did not understand well")* |
| Blame | كل شيء خطؤك *("Everything is your fault")* | نحتاج تحديد المسؤوليات *("We need to clarify responsibilities")* |
| Appearance | منظرك سيء *("You look awful")* | المظهر يمكن تحسينه *("Appearance can be improved")* |

---

## 🔎 Methodology

### Training Pipeline

```
┌─────────────────────────────────────────────────────────────┐
│ STAGE 1: Base Models                                        │
├─────────────────────────────────────────────────────────────┤
│ Train 3 specialized models independently on detox dataset  │
│ • AraGPT2-Medium (25 epochs)                                │
│ • Bloom-560m (25 epochs)                                    │
│ • Bloom-1b7 (20 epochs)                                     │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│ STAGE 2: Ensemble Selection                                 │
├─────────────────────────────────────────────────────────────┤
│ For each input, select best prediction using:               │
│ Sentence-BERT (paraphrase-multilingual-mpnet-base-v2)       │
│ Selection: argmax(cosine_similarity(pred, reference))       │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│ STAGE 3: Knowledge Distillation                             │
├─────────────────────────────────────────────────────────────┤
│ Fine-tune a fresh Bloom-1b7 on:                             │
│ • Original dataset (3,000+ examples)                        │
│ • Ensemble best predictions (1,500+ examples)               │
│ • Total: 4,500+ training examples                           │
└─────────────────────────────────────────────────────────────┘
```
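
To make Stage 3 concrete, the sketch below shows one plausible way to assemble the distillation mix in the dataset's prompt format (see Data Format further down). The helper names and the list-of-pairs inputs are illustrative, not code from this repository:

```python
def format_example(toxic: str, neutral: str) -> str:
    """Serialize one (toxic, neutral) pair in the dataset's سام/مهذب prompt format."""
    return f"سام: {toxic}\nمهذب: {neutral}"

def build_distillation_mix(original_pairs, ensemble_pairs):
    """Combine the original pairs (3,000+) with the ensemble's best
    predictions (1,500+) into the 4,500+ examples used for Stage 3."""
    return [format_example(t, n) for t, n in original_pairs + ensemble_pairs]
```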

### Evaluation Metrics

**J-Score** (primary metric):

$$J = \sqrt[3]{STA \times SIM \times FL}$$

where:

- **STA** (Style Transfer Accuracy): measures how successfully toxicity is removed
- **SIM** (Semantic Similarity): content preservation (Sentence-BERT cosine similarity)
- **FL** (Fluency): ratio of grammatically valid outputs

---

## 📁 Dataset

Dataset used for training and evaluation: [**ispromashka/arabic-detox-dataset**](https://huggingface.co/datasets/ispromashka/arabic-detox-dataset)

### Composition

| Category | Examples | Description |
|----------|----------|-------------|
| Personal Insults | 30 | Direct personal attacks |
| Aggressive Commands | 20 | Hostile imperatives |
| Work Criticism | 25 | Professional negative feedback |
| Threats | 15 | Intimidation and warnings |
| Contempt | 15 | Expressions of superiority |
| Blame | 15 | Accusatory statements |
| Appearance Criticism | 15 | Physical/aesthetic insults |
| Mockery | 15 | Sarcastic belittling |
| **Total Unique** | **150** | — |
| **Augmented (×20)** | **3,000+** | Training examples |

### Data Format

```
سام: {toxic_text}
مهذب: {neutral_text}
```

---

## ⚙️ Training Configuration

| Parameter | Base Models | Final Model |
|-----------|-------------|-------------|
| Hardware | NVIDIA A100 40GB | NVIDIA A100 40GB |
| Precision | BF16 | BF16 |
| Batch Size | 8–16 | 8 |
| Learning Rate | 2e-5 – 3e-5 | 1.5e-5 |
| Epochs | 20–25 | 15 |
| Optimizer | AdamW | AdamW |
| Scheduler | Cosine | Cosine |
| Warmup | 10% | 10% |
| Total Time | ~85 min | ~30 min |

---

## ⚠️ Limitations

- **Language Coverage**: Optimized for Modern Standard Arabic; dialectal performance may vary
- **Text Length**: Best for short-to-medium texts (< 100 tokens)
- **Domain**: Trained on general toxicity; domain-specific content may need fine-tuning
- **Context**: Does not consider conversation history

---

## 📖 Citation

```bibtex
@misc{arabicdetox2024,
  author    = {ispromashka},
  title     = {Arabic Text Detoxification: Ensemble Knowledge Distillation Approach},
  year      = {2024},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/ispromashka/arab-detoxification-isp}
}
```

---

## 📄 License

This project is licensed under the MIT License; see the [LICENSE](LICENSE) file for details.

---

**Made with ❤️ for the Arabic NLP community**

[GitHub](https://github.com/ispromashka)