Update README.md
README.md CHANGED
```diff
@@ -1,39 +1,29 @@
 ---
 license: mit
 language:
-- ar
-- en
 library_name: transformers
 tags:
-- arabic
-- text-generation
-- detoxification
-- ensemble
-- bloom
-- nlp
 pipeline_tag: text-generation
-base_model:
-- bigscience/bloom-1b7
-datasets:
-- custom
-metrics:
-- accuracy
 model-index:
-- name:
-
-
-
-
-
-
-
-
-
-
-
-  - type: similarity
-    value: 0.9995
-    name: Reference Similarity
 ---
 
 <div align="center">
```
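The old metadata reported a `Reference Similarity` of 0.9995. Per the README's acknowledgments of SBERT multilingual embeddings, such a meaning-preservation score is typically a cosine similarity between sentence embeddings of the source and the detoxified output. A minimal sketch of that computation, with toy vectors standing in for real sentence embeddings (a real pipeline would call a sentence-transformers model):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-d "embeddings" standing in for SBERT sentence vectors.
source_vec = [0.8, 0.1, 0.1]
detoxified_vec = [0.79, 0.12, 0.09]

score = cosine_similarity(source_vec, detoxified_vec)
print(round(score, 4))  # close to 1.0 when meaning is preserved
```

Scores near 1.0, like the reported 0.9995, indicate the detoxified text stays semantically close to the source.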
```diff
@@ -49,7 +39,7 @@ model-index:
 
 **Transform toxic Arabic text into polite, neutral alternatives while preserving meaning**
 
-[Model Demo](
 
 </div>
 
```
```diff
@@ -218,6 +208,9 @@ Where:
 
 ## 📊 Dataset
 
 ### Composition
 
 | Category | Examples | Description |
```
```diff
@@ -248,9 +241,9 @@ Where:
 |-----------|-------------|-------------|
 | Hardware | NVIDIA A100 40GB | NVIDIA A100 40GB |
 | Precision | BF16 | BF16 |
-| Batch Size | 8
-| Learning Rate | 2e-5
-| Epochs | 20
 | Optimizer | AdamW | AdamW |
 | Scheduler | Cosine | Cosine |
 | Warmup | 10% | 10% |
```
````diff
@@ -267,16 +260,6 @@ Where:
 
 ---
 
-## 🔮 Future Work
-
-- Expand to Arabic dialects (Egyptian, Gulf, Levantine)
-- Add toxicity detection classifier
-- Multi-turn conversation support
-- Larger model variants (3B, 7B)
-- Arabic-English code-switching support
-
----
-
 ## 📚 Citation
 
 ```bibtex
````
```diff
@@ -291,15 +274,6 @@ Where:
 
 ---
 
-## 🙏 Acknowledgments
-
-- [BigScience](https://bigscience.huggingface.co/) for BLOOM models
-- [AUB MIND Lab](https://mind.aub.edu.lb/) for AraGPT2
-- [SBERT](https://www.sbert.net/) for multilingual embeddings
-- [Hugging Face](https://huggingface.co/) for model hosting and Transformers library
-
----
-
 ## 📄 License
 
 This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
```
```diff
@@ -1,39 +1,29 @@
 ---
 license: mit
 language:
+- ar
+- en
 library_name: transformers
 tags:
+- arabic
+- text-generation
+- detoxification
+- ensemble
+- bloom
 pipeline_tag: text-generation
 model-index:
+- name: arab-detoxification-isp
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      type: custom
+      name: Arabic Detox Dataset
+    metrics:
+    - type: accuracy
+      value: 0.95
+      name: STA
 ---
 
 <div align="center">
```
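For the Hub to render the new `model-index`, the front matter must parse as valid YAML with the nesting restored above. A quick sanity check of that structure, assuming PyYAML is available (the snippet uses an abbreviated copy of the metadata, with tags omitted):

```python
import yaml

# Abbreviated copy of the README front matter (tags omitted for brevity).
FRONT_MATTER = """
license: mit
language:
- ar
- en
library_name: transformers
pipeline_tag: text-generation
model-index:
- name: arab-detoxification-isp
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      type: custom
      name: Arabic Detox Dataset
    metrics:
    - type: accuracy
      value: 0.95
      name: STA
"""

card = yaml.safe_load(FRONT_MATTER)
metric = card["model-index"][0]["results"][0]["metrics"][0]
print(metric["name"], metric["value"])  # STA 0.95
```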
```diff
@@ -49,7 +39,7 @@ model-index:
 
 **Transform toxic Arabic text into polite, neutral alternatives while preserving meaning**
 
+[Model Demo](#-quick-start) | [Architecture](#-architecture-overview) | [Dataset](https://huggingface.co/datasets/ispromashka/arabic-detox-dataset) | [Results](#-evaluation-results)
 
 </div>
 
```
```diff
@@ -218,6 +208,9 @@ Where:
 
 ## 📊 Dataset
 
+Dataset used for training and evaluation:
+[**ispromashka/arabic-detox-dataset**](https://huggingface.co/datasets/ispromashka/arabic-detox-dataset)
+
 ### Composition
 
 | Category | Examples | Description |
```
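The STA value of 0.95 in the new model-index is, as commonly defined in detoxification benchmarks, the share of generated outputs that a toxicity classifier judges non-toxic. A minimal sketch of that accuracy computation; the `is_toxic` stub is a hypothetical placeholder, not the project's actual classifier:

```python
def is_toxic(text: str) -> bool:
    """Hypothetical stand-in for a real toxicity classifier."""
    toxic_markers = {"idiot", "stupid"}
    return any(word in text.lower() for word in toxic_markers)

def style_transfer_accuracy(outputs):
    """STA = fraction of generated outputs judged non-toxic."""
    if not outputs:
        return 0.0
    return sum(not is_toxic(t) for t in outputs) / len(outputs)

outputs = [
    "Please reconsider your approach.",
    "That idea seems unhelpful.",
    "You are an idiot.",        # one failure case
    "Let's discuss this calmly.",
]
print(style_transfer_accuracy(outputs))  # 0.75
```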
```diff
@@ -248,9 +241,9 @@ Where:
 |-----------|-------------|-------------|
 | Hardware | NVIDIA A100 40GB | NVIDIA A100 40GB |
 | Precision | BF16 | BF16 |
+| Batch Size | 8–16 | 8 |
+| Learning Rate | 2e-5 – 3e-5 | 1.5e-5 |
+| Epochs | 20–25 | 15 |
 | Optimizer | AdamW | AdamW |
 | Scheduler | Cosine | Cosine |
 | Warmup | 10% | 10% |
```
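The table's scheduler settings (cosine decay with 10% warmup) can be sketched as a plain function, assuming the usual linear-warmup-then-cosine-decay shape used by common training libraries; the step counts and base rate below are illustrative:

```python
import math

def lr_at_step(step, total_steps, base_lr=1.5e-5, warmup_frac=0.10):
    """Linear warmup over the first 10% of steps, then cosine decay to 0."""
    warmup_steps = int(total_steps * warmup_frac)
    if step < warmup_steps:
        # Ramp linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Cosine decay from base_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

total = 1000
print(lr_at_step(0, total))     # 0.0 (start of warmup)
print(lr_at_step(100, total))   # peak: 1.5e-05
print(lr_at_step(1000, total))  # fully decayed
```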
````diff
@@ -267,16 +260,6 @@ Where:
 
 ---
 
 ## 📚 Citation
 
 ```bibtex
@@ -291,15 +274,6 @@ Where:
 
 ---
 
 ## 📄 License
 
 This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
````