Commit
·
e1af151
1
Parent(s):
ee432a1
Update README.md
README.md
CHANGED
|
@@ -1,3 +1,56 @@
|
|
| 1 |
-
|
| 2 |
-
|
| 3 |
-
|
| 1 |
+
## ArBanking77: Intent Detection Neural Model and a New Dataset in Modern and Dialectical Arabic
|
| 2 |
+
|
| 3 |
+
|
| 4 |
+
Online Demo
|
| 5 |
+
--------
|
| 6 |
+
You can try our model using the demo link below:
|
| 7 |
+
|
| 8 |
+
[https://sina.birzeit.edu/arbanking77/](https://sina.birzeit.edu/arbanking77/)
|
| 9 |
+
|
| 10 |
+
|
| 11 |
+
ArBanking77 Corpus
|
| 12 |
+
--------
|
| 13 |
+
ArBanking77 consists of 31,404 queries (in MSA and Palestinian dialect) that were manually Arabized and localized from the original English Banking77 dataset, which consists of 13,083 queries. Each query is classified into one of 77 classes (intents), including card arrival, card linking, exchange rate, and automatic top-up. A neural model based on AraBERT was fine-tuned on the ArBanking77 dataset (F1-score 92% for MSA, 90% for PAL).
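
As a rough illustration of how an F1-score over intent classes can be computed, here is a minimal pure-Python sketch. The averaging scheme (macro) is an assumption, since this README does not state which one was used, and the toy labels below are made up for the example rather than taken from the corpus.

```python
from collections import Counter


def per_class_f1(gold, pred):
    """Return {label: F1} for single-label classification."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p, but it was wrong
            fn[g] += 1  # gold label g was missed
    scores = {}
    for label in set(gold) | set(pred):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(gold, pred):
    """Unweighted mean of the per-class F1 scores (assumed averaging)."""
    scores = per_class_f1(gold, pred)
    return sum(scores.values()) / len(scores)


# Toy example with three hypothetical intent labels (not real corpus data).
gold = ["card_arrival", "card_arrival", "exchange_rate", "exchange_rate", "card_linking"]
pred = ["card_arrival", "exchange_rate", "exchange_rate", "exchange_rate", "card_linking"]
print(round(macro_f1(gold, pred), 4))
```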
|
| 14 |
+
|
| 15 |
+
|
| 16 |
+
Corpus Download
|
| 17 |
+
--------
|
| 18 |
+
A sample of the data is available in the `data` directory, but the entire ArBanking77 corpus is
|
| 19 |
+
available to download upon request for academic and commercial use. Request to download
|
| 20 |
+
ArBanking77 (corpus and the model).
|
| 21 |
+
|
| 22 |
+
[https://sina.birzeit.edu/arbanking77/](https://sina.birzeit.edu/arbanking77/)
|
| 23 |
+
|
| 24 |
+
Model Download
|
| 25 |
+
--------
|
| 26 |
+
Hugging Face: [https://huggingface.co/SinaLab/ArBanking77](https://huggingface.co/SinaLab/ArBanking77)
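
The following is a minimal sketch of how the model might be loaded for inference with the `transformers` library. It assumes that the `SinaLab/ArBanking77` repository holds a standard sequence-classification checkpoint with a tokenizer, and that `torch` and `transformers` are installed; neither is confirmed by this README. The `classify` helper is hypothetical, and the returned label names depend on the `id2label` mapping stored in the model config.

```python
MODEL_ID = "SinaLab/ArBanking77"  # repository linked above


def classify(queries, model_id=MODEL_ID, max_length=128):
    """Return one predicted intent label per query (hypothetical helper)."""
    # Imported lazily so the module can be imported without the heavy dependencies.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    # Assumption: the repository ships a tokenizer and a classification head.
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()

    batch = tokenizer(
        list(queries),
        padding=True,
        truncation=True,
        max_length=max_length,  # same value as --max_length in the training command
        return_tensors="pt",
    )
    with torch.no_grad():
        logits = model(**batch).logits
    predicted_ids = logits.argmax(dim=-1).tolist()
    return [model.config.id2label[i] for i in predicted_ids]
```

Calling `classify(["..."])` with a list of Arabic queries downloads the checkpoint on first use, so it needs network access.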
|
| 27 |
+
|
| 28 |
+
|
| 29 |
+
Model Training
|
| 30 |
+
--------
|
| 31 |
+
|
| 32 |
+
```commandline
|
| 33 |
+
python run_glue_no_trainer.py \
|
| 34 |
+
    --model_name_or_path aubmindlab/bert-base-arabertv2 \
|
| 35 |
+
    --train_file ./data/Banking77_Arabized_Ver3_train_MSA_PAL_merged.json \
|
| 36 |
+
    --validation_file ./data/Banking77_Arabized_Ver3_val_MSA_PAL_merged.json \
|
| 37 |
+
    --seed 42 \
|
| 38 |
+
    --max_length 128 \
|
| 39 |
+
    --learning_rate 4e-5 \
|
| 40 |
+
    --num_train_epochs 20 \
|
| 41 |
+
    --per_device_train_batch_size 32 \
|
| 42 |
+
    --output_dir ./results
|
| 43 |
+
```
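
For scripted or repeated runs, the same invocation can be assembled from Python. This is only a sketch: the `TRAIN_ARGS` dictionary and the `build_command` helper are hypothetical names introduced here, the flag values are copied from the training command in this section, and the script is assumed to sit in the current directory. Training is launched only when `--run` is passed.

```python
import shlex
import subprocess
import sys

# Flag values copied from the training command in this README.
TRAIN_ARGS = {
    "model_name_or_path": "aubmindlab/bert-base-arabertv2",
    "train_file": "./data/Banking77_Arabized_Ver3_train_MSA_PAL_merged.json",
    "validation_file": "./data/Banking77_Arabized_Ver3_val_MSA_PAL_merged.json",
    "seed": 42,
    "max_length": 128,
    "learning_rate": "4e-5",
    "num_train_epochs": 20,
    "per_device_train_batch_size": 32,
    "output_dir": "./results",
}


def build_command(script="run_glue_no_trainer.py", args=TRAIN_ARGS):
    """Build the argument list for the training script (hypothetical helper)."""
    cmd = [sys.executable, script]
    for name, value in args.items():
        cmd += [f"--{name}", str(value)]
    return cmd


if __name__ == "__main__":
    cmd = build_command()
    print(shlex.join(cmd))  # show the command that would be executed
    if "--run" in sys.argv:
        subprocess.run(cmd, check=True)  # starts the actual training
```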
|
| 44 |
+
|
| 45 |
+
File
|
| 46 |
+
source: [run_glue_no_trainer.py](https://github.com/huggingface/transformers/blob/e9ad51306fdcc3fb79d837d667e21c6d075a2451/examples/pytorch/text-classification/run_glue_no_trainer.py)
|
| 47 |
+
|
| 48 |
+
|
| 49 |
+
Credits
|
| 50 |
+
-------
|
| 51 |
+
This research is partially funded by the Palestinian Higher Council for Innovation and Excellence.
|
| 52 |
+
|
| 53 |
+
|
| 54 |
+
Citation
|
| 55 |
+
-------
|
| 56 |
+
Mustafa Jarrar, Ahmet Birim, Mohammed Khalilia, Mustafa Erden, and Sana Ghanem: [ArBanking77: Intent Detection Neural Model and a New Dataset in Modern and Dialectical Arabic](http://www.jarrar.info/publications/JBKEG23.pdf). In Proceedings of the 1st Arabic Natural Language Processing Conference (ArabicNLP), part of EMNLP 2023. ACL.
|