Update README.md
README.md
CHANGED
|
@@ -10,60 +10,120 @@ tags:
|
|
| 10 |
datasets:
|
| 11 |
- SinaLab/ArBanking77
|
| 12 |
---
|
| 13 |
-
## ArBanking77: Intent Detection Neural Model and a New Dataset in Modern and Dialectical Arabic
|
| 14 |
|
| 15 |
https://www.jarrar.info/publications/JBKEG23.pdf
|
| 16 |
|
| 17 |
-
|
| 18 |
-
|
| 19 |
-
|
| 20 |
-
|
| 21 |
-
|
| 22 |
|
| 23 |
|
| 24 |
ArBanking77 Corpus
|
| 25 |
--------
|
| 26 |
-
ArBanking77 consists of 31,404 (MSA and Palestinian
|
| 27 |
|
| 28 |
|
| 29 |
-
Corpus Download
|
| 30 |
--------
|
| 31 |
-
A sample data is available in the `data` directory.
|
| 32 |
-
available to download upon request for academic and commercial use.
|
| 33 |
-
|
| 34 |
|
| 35 |
-
[https://sina.birzeit.edu/arbanking77/](https://sina.birzeit.edu/arbanking77/)
|
| 36 |
|
| 37 |
Model Download
|
| 38 |
--------
|
| 39 |
-
|
| 40 |
|
| 41 |
|
| 42 |
-
|
| 43 |
--------
|
| 44 |
|
| 45 |
-
|
| 46 |
-
|
| 47 |
-
|
| 48 |
-
|
| 49 |
-
|
| 50 |
-
|
| 51 |
-
|
| 52 |
-
|
| 53 |
-
|
| 54 |
-
|
| 55 |
-
|
| 56 |
```
|
| 57 |
|
| 58 |
-
|
| 59 |
-
|
| 60 |
|
| 61 |
|
| 62 |
Credits
|
| 63 |
-------
|
| 64 |
-
This research
|
| 65 |
|
| 66 |
|
| 67 |
Citation
|
| 68 |
-------
|
| 69 |
-
Mustafa Jarrar, Ahmet Birim, Mohammed Khalilia, Mustafa Erden, and Sana
|
| 10 |
datasets:
|
| 11 |
- SinaLab/ArBanking77
|
| 12 |
---
|
| 13 |
|
| 14 |
https://www.jarrar.info/publications/JBKEG23.pdf
|
| 15 |
|
| 16 |
+
## ArBanking77: Intent Detection Neural Model and a New Dataset in Modern and Dialectical Arabic
|
| 17 |
+
======================
|
| 18 |
+
ArBanking77 is an MSA and dialectal Arabic corpus for Arabic intent detection in the banking domain. It consists of 31,404
|
| 19 |
+
samples (MSA and Palestinian dialect). This repo contains the source code and a sample dataset to train and evaluate
|
| 20 |
+
an Arabic intent detection model.
|
| 21 |
|
| 22 |
|
| 23 |
ArBanking77 Corpus
|
| 24 |
--------
|
| 25 |
+
ArBanking77 consists of 31,404 queries (MSA and Palestinian dialect) that were manually Arabized and localized from the original
|
| 26 |
+
English Banking77 dataset, which consists of 13,083 queries. Each query is classified into one of 77 classes (
|
| 27 |
+
intents) including card arrival, card linking, exchange rate, and automatic top-up. You can find the list of these 77
|
| 28 |
+
intents in the `./data/Banking77_intents.csv` file. A neural model based on AraBERT was fine-tuned on the ArBanking77
|
| 29 |
+
dataset (F1-score: 92% for MSA, 90% for PAL).
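
As a minimal sketch of how the 77 intents could be turned into label mappings for a classifier, the snippet below reads an intents CSV and builds `label2id` and `id2label` dictionaries. It assumes the file has a header row and that the intent name sits in the first column; the actual layout of `./data/Banking77_intents.csv` may differ. `load_intents` and `build_label_maps` are hypothetical helpers, not part of this repo.

```python
import csv
from pathlib import Path


def load_intents(csv_path):
    """Read intent names from a CSV file (assumed: header row, name in first column)."""
    with Path(csv_path).open(newline="", encoding="utf-8-sig") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the assumed header row
        # Keep the first cell of every non-empty row, preserving file order.
        return [row[0].strip() for row in reader if row and row[0].strip()]


def build_label_maps(intents):
    """Map each distinct intent name to an integer id and back."""
    unique = list(dict.fromkeys(intents))  # drop duplicates, keep order
    label2id = {name: i for i, name in enumerate(unique)}
    id2label = {i: name for name, i in label2id.items()}
    return label2id, id2label


if __name__ == "__main__":
    path = Path("./data/Banking77_intents.csv")
    if path.exists():
        label2id, id2label = build_label_maps(load_intents(path))
        print(f"{len(label2id)} intents loaded")
```

If the assumptions hold, the script should report 77 intents; any other count suggests the column or header assumption needs adjusting.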
|
| 30 |
|
| 31 |
|
| 32 |
+
Full Corpus Download
|
| 33 |
--------
|
| 34 |
+
Sample data is available in the `data` directory. However, the entire ArBanking77 corpus is
|
| 35 |
+
available for download upon request for academic and commercial use. Note that we cannot provide the augmented data.
|
| 36 |
+
|
| 37 |
+
[Request to download ArBanking77 (corpus and the model)](https://sina.birzeit.edu/arbanking77/)
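
To get a quick feel for the sample data before requesting the full corpus, a sketch like the following prints the column names and row count of each sample file. It makes no assumption about column names, only that each file is comma-delimited with a header row; `summarize_csv` is a hypothetical helper.

```python
import csv
from pathlib import Path

# Sample files shipped in the `data` directory of this repo.
SAMPLE_FILES = [
    "Banking77_Arabized_MSA_PAL_train_sample.csv",
    "Banking77_Arabized_MSA_PAL_val_sample.csv",
    "Banking77_Arabized_MSA_test_sample.csv",
    "Banking77_Arabized_PAL_test_sample.csv",
]


def summarize_csv(path):
    """Return (header, number of data rows) for one CSV file."""
    with Path(path).open(newline="", encoding="utf-8-sig") as f:
        reader = csv.reader(f)
        header = next(reader, [])  # assumed header row
        n_rows = sum(1 for row in reader if row)  # ignore blank lines
    return header, n_rows


if __name__ == "__main__":
    data_dir = Path("./data")
    for name in SAMPLE_FILES:
        path = data_dir / name
        if not path.exists():
            print(f"{name}: not found")
            continue
        header, n_rows = summarize_csv(path)
        print(f"{name}: columns={header}, rows={n_rows}")
```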
|
| 38 |
|
| 39 |
|
| 40 |
Model Download
|
| 41 |
--------
|
| 42 |
+
[SinaLab HuggingFace](https://huggingface.co/SinaLab/ArBanking77)
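
A hedged sketch of how the hosted model might be used for inference with the `transformers` library. It assumes the `SinaLab/ArBanking77` repository contains weights and a tokenizer loadable through `AutoModelForSequenceClassification` and `AutoTokenizer`, and that its config carries the intent names in `id2label`; neither is confirmed here. `top_intent` and `predict_intent` are hypothetical helper names.

```python
def top_intent(scores, id2label):
    """Return the (label, score) pair with the highest score."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return id2label[best], scores[best]


def predict_intent(text, model_id="SinaLab/ArBanking77"):
    """Classify one query; downloads the model on first use."""
    # Imported lazily so that top_intent works without these packages.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()

    # max_length mirrors the value used in the training command.
    inputs = tokenizer(text, truncation=True, max_length=128, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0]
    probs = torch.softmax(logits, dim=-1).tolist()
    return top_intent(probs, model.config.id2label)
```

Calling `predict_intent(query)` with an Arabic banking query would return the predicted intent name and its probability, provided the assumptions above hold.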
|
| 43 |
|
| 44 |
+
Online Demo
|
| 45 |
+
--------
|
| 46 |
+
You can try our model using this [demo link](https://sina.birzeit.edu/arbanking77/).
|
| 47 |
|
| 48 |
+
Requirements
|
| 49 |
--------
|
| 50 |
+
At this point, the code is compatible with `Python 3.11`.
|
| 51 |
+
|
| 52 |
+
Clone this repo
|
| 53 |
+
|
| 54 |
+
git clone https://github.com/SinaLab/ArabicNER.git
|
| 55 |
|
| 56 |
+
This package depends on multiple Python packages. It is recommended to use Conda to create a new environment
|
| 57 |
+
that mimics the same environment the model was trained in. This repo provides a `requirements.txt` from which you
|
| 58 |
+
can create a new conda environment using the command below.
|
| 59 |
+
|
| 60 |
+
conda create -n env_name python=3.11
|
| 61 |
+
|
| 62 |
+
Install requirements using pip command:
|
| 63 |
+
|
| 64 |
+
pip install -r requirements.txt
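
After installing, one way to confirm that the environment matches `requirements.txt` is to check which listed packages are missing. The sketch below uses only the standard library; it assumes simple `name==version` style lines and does not verify version constraints. `requirement_names` and `missing_packages` are hypothetical helpers.

```python
import re
from importlib import metadata
from pathlib import Path


def requirement_names(lines):
    """Extract distribution names from requirements-style lines."""
    names = []
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(names):
    """Return the names that are not installed in the current environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    req = Path("requirements.txt")
    if req.exists():
        names = requirement_names(req.read_text(encoding="utf-8").splitlines())
        print("Missing:", missing_packages(names) or "none")
```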
|
| 65 |
+
|
| 66 |
+
|
| 67 |
+
Project Structure
|
| 68 |
+
--------
|
| 69 |
+
```
.
├── data                    <- data dir
│   ├── Banking77_Arabized_MSA_PAL_train_sample.csv
│   ├── Banking77_Arabized_MSA_PAL_val_sample.csv
│   ├── Banking77_Arabized_MSA_test_sample.csv
│   ├── Banking77_Arabized_PAL_test_sample.csv
│   └── Banking77_intents.csv
├── outputs
│   ├── models              <- trained models
│   └── results             <- evaluation results and reports
├── src                     <- training and evaluation scripts
│   ├── run_glue_no_trainer.py
│   ├── run_glue_no_trainer_eval.py
│   └── utils.py
├── .gitignore
├── LICENSE
├── README.md
└── requirements.txt
```
|
| 89 |
|
| 90 |
+
Model Training
|
| 91 |
+
--------
|
| 92 |
+
You can start model training by running the following command. It's recommended to pass the arguments demonstrated below
|
| 93 |
+
to get results similar to the ones reported in the paper.
|
| 94 |
+
|
| 95 |
+
    python ./src/run_glue_no_trainer.py \
        --model_name_or_path aubmindlab/bert-base-arabertv02 \
        --train_file ./data/Banking77_Arabized_MSA_PAL_train_sample.csv \
        --validation_file ./data/Banking77_Arabized_MSA_PAL_val_sample.csv \
        --seed 42 \
        --max_length 128 \
        --learning_rate 4e-5 \
        --num_train_epochs 20 \
        --per_device_train_batch_size 64 \
        --output_dir ./outputs/models
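
To estimate how long a run with these arguments takes, it can help to work out the number of optimizer steps. The sketch below assumes a single device and no gradient accumulation, which the command does not state; the training-set size used in the example is hypothetical.

```python
import math


def total_training_steps(num_examples, batch_size=64, num_epochs=20):
    """Optimizer steps for a full run (one device, no gradient accumulation)."""
    steps_per_epoch = math.ceil(num_examples / batch_size)  # last batch may be partial
    return steps_per_epoch * num_epochs


# Hypothetical training-set size, for illustration only.
print(total_training_steps(1000))
```

The defaults mirror `--per_device_train_batch_size 64` and `--num_train_epochs 20`; with several devices, the effective batch size would be larger and the step count smaller.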
|
| 105 |
+
|
| 106 |
+
Evaluation
|
| 107 |
+
--------
|
| 108 |
+
Additionally, you can evaluate the trained model on `Banking77_Arabized_MSA_test_sample.csv` and `Banking77_Arabized_PAL_test_sample.csv` test sets as follows:
|
| 109 |
|
| 110 |
+
    python ./src/run_glue_no_trainer_eval.py \
        --model_name_or_path ./outputs/models \
        --validation_file ./data/Banking77_Arabized_MSA_test_sample.csv \
        --seed 42 \
        --per_device_eval_batch_size 64 \
        --results_dir ./outputs/results \
        --log_path ./outputs/logs/log.txt
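
The command above evaluates the MSA test set only. A small driver along these lines could run the same evaluation for both test sets. It reuses the flags shown above; the per-dialect `results_dir` and `log_path` values are assumptions made here to keep the two runs separate, and it is not known whether the script creates those directories itself.

```python
import subprocess
import sys

TEST_FILES = {
    "MSA": "./data/Banking77_Arabized_MSA_test_sample.csv",
    "PAL": "./data/Banking77_Arabized_PAL_test_sample.csv",
}


def build_eval_command(test_file, tag, model_dir="./outputs/models"):
    """Build the argument list for one evaluation run."""
    return [
        sys.executable, "./src/run_glue_no_trainer_eval.py",
        "--model_name_or_path", model_dir,
        "--validation_file", test_file,
        "--seed", "42",
        "--per_device_eval_batch_size", "64",
        # Per-dialect output paths are an assumption, to keep runs separate.
        "--results_dir", f"./outputs/results/{tag}",
        "--log_path", f"./outputs/logs/log_{tag}.txt",
    ]


def evaluate_all(dry_run=True):
    """Print each command; run it only when dry_run is False."""
    for tag, test_file in TEST_FILES.items():
        cmd = build_eval_command(test_file, tag)
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    evaluate_all(dry_run=True)
```

With `dry_run=True` the script only prints the two commands, which makes it easy to review them before launching the actual evaluation.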
|
| 117 |
|
| 118 |
Credits
|
| 119 |
-------
|
| 120 |
+
This research was funded by the Palestinian Higher Council for Innovation and Excellence and the Scientific and
|
| 121 |
+
Technological Research Council of Türkiye (TÜBİTAK) under project No. 120N761 - CONVERSER: Conversational AI System for
|
| 122 |
+
Arabic.
|
| 123 |
|
| 124 |
|
| 125 |
Citation
|
| 126 |
-------
|
| 127 |
+
Mustafa Jarrar, Ahmet Birim, Mohammed Khalilia, Mustafa Erden, and Sana
|
| 128 |
+
Ghanem: [ArBanking77: Intent Detection Neural Model and a New Dataset in Modern and Dialectical Arabic](http://www.jarrar.info/publications/JBKEG23.pdf).
|
| 129 |
+
In Proceedings of the 1st Arabic Natural Language Processing Conference (ArabicNLP), part of EMNLP 2023. ACL.
|