update README
- README.md +24 -3
- assets/marefa-tebyan.png +0 -0
README.md
CHANGED
|
@@ -1,3 +1,4 @@
| 1 |
---
|
| 2 |
language: ar
|
| 3 |
datasets:
|
|
@@ -9,6 +10,9 @@ widget:
|
|
| 9 |
# Tebyan تبيـان
|
| 10 |
## Marefa Arabic Named Entity Recognition Model
|
| 11 |
## Marefa Model for Classifying Parts of Text
|
| 12 |
---------
|
| 13 |
**Version**: 1.3
|
| 14 |
|
|
@@ -31,7 +35,7 @@ Person, Location, Organization, Nationality, Job, Product, Event, Time, Art-Work
|
|
| 31 |
|
| 32 |
*You can test the model quickly by checking this [Colab notebook](https://colab.research.google.com/drive/1OGp9Wgm-oBM5BBhTLx6Qow4dNRSJZ-F5?usp=sharing)*
|
| 33 |
|
| 34 |
-
|
| 35 |
|
| 36 |
Install the following Python packages
|
| 37 |
|
|
@@ -43,8 +47,6 @@ Install the following Python packages
|
|
| 43 |
-----------
|
| 44 |
|
| 45 |
```python
|
| 46 |
-
|
| 47 |
-
# ==== Set configurations
|
| 48 |
from transformers import AutoTokenizer, AutoModelForTokenClassification
|
| 49 |
import torch
|
| 50 |
|
|
@@ -170,6 +172,25 @@ Output
|
|
| 170 |
|
| 171 |
Check this [notebook](https://colab.research.google.com/drive/1WUYrnmDFFEItqGMvbyjqZEJJqwU7xQR-?usp=sharing) to fine-tune the NER model
|
| 172 |
|
| 173 |
## Acknowledgment شكر و تقدير
|
| 174 |
|
| 175 |
The data used to train the model was prepared by a group of volunteers who spent hours cleaning and reviewing it.
|
|
|
|
| 1 |
+
|
| 2 |
---
|
| 3 |
language: ar
|
| 4 |
datasets:
|
|
|
|
| 10 |
# Tebyan تبيـان
|
| 11 |
## Marefa Arabic Named Entity Recognition Model
|
| 12 |
## Marefa Model for Classifying Parts of Text
|
| 13 |
+
|
| 14 |
+

|
| 15 |
+
|
| 16 |
---------
|
| 17 |
**Version**: 1.3
|
| 18 |
|
|
|
|
| 35 |
|
| 36 |
*You can test the model quickly by checking this [Colab notebook](https://colab.research.google.com/drive/1OGp9Wgm-oBM5BBhTLx6Qow4dNRSJZ-F5?usp=sharing)*
|
| 37 |
|
| 38 |
+
----
|
| 39 |
|
| 40 |
Install the following Python packages
|
| 41 |
|
|
|
|
| 47 |
-----------
|
| 48 |
|
| 49 |
```python
| 50 |
from transformers import AutoTokenizer, AutoModelForTokenClassification
|
| 51 |
import torch
|
| 52 |
|
|
|
|
| 172 |
|
| 173 |
Check this [notebook](https://colab.research.google.com/drive/1WUYrnmDFFEItqGMvbyjqZEJJqwU7xQR-?usp=sharing) to fine-tune the NER model
|
| 174 |
|
+## Evaluation
+
+We tested the model against a test set of 1959 sentences. The results are in the following table:
+
+| type | f1-score | precision | recall | support |
+|:-------------|-----------:|------------:|---------:|----------:|
+| person | 0.93298 | 0.931479 | 0.934487 | 4335 |
+| location | 0.891537 | 0.896926 | 0.886212 | 4939 |
+| time | 0.873003 | 0.876087 | 0.869941 | 1853 |
+| nationality | 0.871246 | 0.843153 | 0.901277 | 2350 |
+| job | 0.837656 | 0.79912 | 0.880097 | 2477 |
+| organization | 0.781317 | 0.773328 | 0.789474 | 2299 |
+| event | 0.686695 | 0.733945 | 0.645161 | 744 |
+| artwork | 0.653552 | 0.678005 | 0.630802 | 474 |
+| product | 0.625483 | 0.553531 | 0.718935 | 338 |
+| **weighted avg** | 0.859008 | 0.852365 | 0.86703 | 19809 |
+| **micro avg** | 0.858771 | 0.850669 | 0.86703 | 19809 |
+| **macro avg** | 0.79483 | 0.787286 | 0.806265 | 19809 |
+
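For readers checking the summary rows of the evaluation table, here is a minimal sketch of how the macro and weighted F1 averages can be recomputed from the per-type rows. It assumes the usual definitions: macro is the unweighted mean of per-type scores, and weighted is the mean weighted by support. The README does not say which tool produced the table, so this is an assumption rather than the authors' method. The names `ROWS`, `macro_average`, and `weighted_average` are illustrative only.

```python
# Per-type rows copied from the evaluation table: (type, f1-score, support).
ROWS = [
    ("person", 0.93298, 4335),
    ("location", 0.891537, 4939),
    ("time", 0.873003, 1853),
    ("nationality", 0.871246, 2350),
    ("job", 0.837656, 2477),
    ("organization", 0.781317, 2299),
    ("event", 0.686695, 744),
    ("artwork", 0.653552, 474),
    ("product", 0.625483, 338),
]


def macro_average(rows):
    """Unweighted mean of the per-type scores (assumed definition)."""
    return sum(score for _, score, _ in rows) / len(rows)


def weighted_average(rows):
    """Mean of the per-type scores weighted by support (assumed definition)."""
    total = sum(support for _, _, support in rows)
    return sum(score * support for _, score, support in rows) / total


total_support = sum(support for _, _, support in ROWS)
macro_f1 = macro_average(ROWS)
weighted_f1 = weighted_average(ROWS)

print(f"total support: {total_support}")
print(f"macro F1:      {macro_f1:.6f}")
print(f"weighted F1:   {weighted_f1:.6f}")
```

Under these assumptions the recomputed values agree with the table's **macro avg** and **weighted avg** F1 entries to the precision shown. The micro average cannot be recovered from this table alone, since it needs the raw true-positive, false-positive, and false-negative counts.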
| 194 |
## Acknowledgment شكر و تقدير
|
| 195 |
|
| 196 |
The data used to train the model was prepared by a group of volunteers who spent hours cleaning and reviewing it.
|
assets/marefa-tebyan.png
ADDED
|