ispromashka committed on
Commit 91ca32b · verified · 1 Parent(s): 92b22c7

Update README.md

Files changed (1)
  1. README.md +26 -52
README.md CHANGED
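The hunks below rewrite the README's YAML front matter (the block between the two `---` fences that Hugging Face parses as card metadata). A quick stdlib-only way to sanity-check such an edit before committing is to verify that the block is properly fenced and still carries the required top-level keys; this is a minimal sketch, not the Hub's actual validator, and the key list checked here is an assumption:

```python
# Minimal front-matter sanity check for a model-card README: the metadata
# block must open with "---" on the first line and be closed by a second
# "---". A full check would YAML-parse the block; this sketch only verifies
# the fences and that required top-level keys appear, to stay stdlib-only.

def check_front_matter(text: str, required=("license", "pipeline_tag")):
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return False  # no opening fence
    try:
        end = next(i for i, l in enumerate(lines[1:], start=1)
                   if l.strip() == "---")
    except StopIteration:
        return False  # unclosed metadata block
    block = lines[1:end]
    # Top-level keys: unindented "key: value" lines (skip list items).
    keys = {l.split(":", 1)[0].strip()
            for l in block if ":" in l and not l.startswith((" ", "-"))}
    return all(k in keys for k in required)

readme = """---
license: mit
pipeline_tag: text-generation
---

# Title
"""
print(check_front_matter(readme))  # True
```

Running this over the post-edit README would catch the most common metadata-editing mistake, an accidentally deleted closing `---`, before the commit lands.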
@@ -1,39 +1,29 @@
 ---
 license: mit
 language:
-- ar
-- en
+- ar
+- en
 library_name: transformers
 tags:
-- arabic
-- text-generation
-- detoxification
-- ensemble
-- bloom
-- nlp
+- arabic
+- text-generation
+- detoxification
+- ensemble
+- bloom
 pipeline_tag: text-generation
-base_model:
-- bigscience/bloom-1b7
-datasets:
-- custom
-metrics:
-- accuracy
 model-index:
-- name: arabic-detox-ensemble
-  results:
-  - task:
-      type: text-generation
-      name: Text Detoxification
-    metrics:
-    - type: j-score
-      value: 0.7129
-      name: J-Score
-    - type: accuracy
-      value: 0.95
-      name: Style Transfer Accuracy
-    - type: similarity
-      value: 0.9995
-      name: Reference Similarity
+- name: arab-detoxification-isp
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      type: custom
+      name: Arabic Detox Dataset
+    metrics:
+    - type: accuracy
+      value: 0.95
+      name: STA
 ---
 
 <div align="center">
@@ -49,7 +39,7 @@ model-index:
 
 **Transform toxic Arabic text into polite, neutral alternatives while preserving meaning**
 
-[Model Demo](#usage) | [Paper](#methodology) | [Dataset](#dataset) | [Results](#evaluation-results)
+[Model Demo](#-quick-start) | [Architecture](#-architecture-overview) | [Dataset](https://huggingface.co/datasets/ispromashka/arabic-detox-dataset) | [Results](#-evaluation-results)
 
 </div>
 
@@ -218,6 +208,9 @@ Where:
 
 ## 📁 Dataset
 
+Dataset used for training and evaluation:
+[**ispromashka/arabic-detox-dataset**](https://huggingface.co/datasets/ispromashka/arabic-detox-dataset)
+
 ### Composition
 
 | Category | Examples | Description |
@@ -248,9 +241,9 @@ Where:
 |-----------|-------------|-------------|
 | Hardware | NVIDIA A100 40GB | NVIDIA A100 40GB |
 | Precision | BF16 | BF16 |
-| Batch Size | 8-16 | 8 |
-| Learning Rate | 2e-5 - 3e-5 | 1.5e-5 |
-| Epochs | 20-25 | 15 |
+| Batch Size | 8–16 | 8 |
+| Learning Rate | 2e-5 – 3e-5 | 1.5e-5 |
+| Epochs | 20–25 | 15 |
 | Optimizer | AdamW | AdamW |
 | Scheduler | Cosine | Cosine |
 | Warmup | 10% | 10% |
@@ -267,16 +260,6 @@ Where:
 
 ---
 
-## 🔮 Future Work
-
-- Expand to Arabic dialects (Egyptian, Gulf, Levantine)
-- Add toxicity detection classifier
-- Multi-turn conversation support
-- Larger model variants (3B, 7B)
-- Arabic-English code-switching support
-
----
-
 ## 📖 Citation
 
 ```bibtex
@@ -291,15 +274,6 @@ Where:
 
 ---
 
-## 🙏 Acknowledgments
-
-- [BigScience](https://bigscience.huggingface.co/) for BLOOM models
-- [AUB MIND Lab](https://mind.aub.edu.lb/) for AraGPT2
-- [SBERT](https://www.sbert.net/) for multilingual embeddings
-- [Hugging Face](https://huggingface.co/) for model hosting and Transformers library
-
----
-
 ## 📄 License
 
 This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
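The removed metadata block reported three scores (J-Score 0.7129, Style Transfer Accuracy 0.95, Reference Similarity 0.9995), while the new block keeps only STA. In the text-detoxification literature the joint J metric is commonly the product of style accuracy, similarity, and fluency; that formula is an assumption here, since the README's own definition sits in the "Where:" section outside this diff. Under that assumption, the three retained numbers pin down the fluency score the card no longer lists:

```python
# Metric values taken from the old model-index block in the diff above.
STA = 0.95    # Style Transfer Accuracy (kept in the new metadata as "STA")
SIM = 0.9995  # Reference Similarity (dropped from the new metadata)
J   = 0.7129  # J-Score (dropped from the new metadata)

# Assumed joint-metric formula: J = STA * SIM * FL, so the implied fluency is
implied_fl = J / (STA * SIM)
print(f"implied fluency ~ {implied_fl:.4f}")  # ~ 0.7508
```

If the README defines J differently (e.g. as an average over per-sentence products), the implied per-corpus fluency would differ, but the sanity check that J ≤ min(STA, SIM) holds either way.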