---
language:
  - ar
license: apache-2.0
pipeline_tag: token-classification
library_name: spacy
tags:
  - arabic
  - named-entity-recognition
  - ner
  - finance
  - financial-ner
  - spacy
  - information-extraction
  - ontology-aligned
datasets:
  - custom
model-index:
  - name: AMWAL
    results:
      - task:
          type: token-classification
          name: Named Entity Recognition
        dataset:
          name: Arabic Financial News Corpus
          type: custom
        metrics:
          - type: precision
            value: 0.9608
          - type: recall
            value: 0.9587
          - type: f1
            value: 0.9597
    paper:
      title: "AMWAL: Named Entity Recognition for Arabic Financial News"
      authors:
        - Muhammad S. Abdo
        - Yash Hatekar
        - Damir Cavar
      conference: "FinNLP / FNP / LLMFinLegal Workshop (2025)"
      year: 2025
      url: https://aclanthology.org/2025.finnlp-1.20
---

## 📄 Associated Paper

This model is described in the following paper:

**AMWAL: Named Entity Recognition for Arabic Financial News**  
Muhammad S. Abdo, Yash Hatekar, Damir Cavar  

ACL Anthology: https://aclanthology.org/2025.finnlp-1.20


# AMWAL: Arabic Financial Named Entity Recognition (NER)

## Quick Start

### Install (recommended)

```bash
pip install git+https://huggingface.co/Muhsabrys/AMWAL_ArFinNER
```

```python
from amwal import load_ner

ner = load_ner()

text = "يطرح البنك المركزي المصري، بعد غد، سندات خزانة ثابتة ومتغيرة العائد بقيمة 45 مليار جنيه"
result = ner(text)

print(result["entities"])
```

---

## Model Summary

**AMWAL** is a **spaCy-based Named Entity Recognition (NER) system** designed for extracting **financial entities from Arabic text**, with a primary focus on **Arabic financial news and reports**.

The model addresses challenges specific to Arabic financial NLP, including orthographic variation, domain-specific terminology, and the scarcity of annotated financial resources for Arabic.

---

## Intended Use

AMWAL is intended for:

* Arabic financial news analysis
* Information extraction from financial reports
* Financial text preprocessing
* Academic research in Arabic NLP and finance
* Data enrichment for financial knowledge graphs

It is **not intended** for:

* General-purpose Arabic NER
* Non-financial domains
* Direct use with Hugging Face Transformers APIs

---

## Data Collection and Annotation

A specialized Arabic financial corpus was constructed from **three major Arabic financial newspapers**, covering the period **2000–2023**.

The annotation process followed a **semi-automatic workflow**:

1. Automatic candidate entity extraction
2. Manual annotation
3. Expert review and correction

The final dataset contains:

* **17.1K annotated entity tokens**
* **21 financial entity categories**
* Consistent domain coverage across multiple time periods

---

## Entity Schema and Standardization

Entity categories were standardized using concepts from the
**Financial Industry Business Ontology (FIBO, 2020)** to ensure conceptual consistency and compatibility with structured financial representations.

---

## Model Architecture and Training

* **Framework:** spaCy
* **Pipeline:** Custom Named Entity Recognition (NER)
* **Domain:** Arabic financial text

The model was trained on the annotated corpus using spaCy’s NER pipeline.
To mitigate sparsity caused by Arabic orthographic variation, normalization was applied consistently during training and inference.

---

## Arabic Normalization

The following normalization steps are applied **internally during inference**, matching the training setup:

* Removal of all diacritics
* Character normalization:

  * `إ`, `أ`, `آ` → `ا`
  * `ؤ`, `ئ` → `ء`
  * `ة` → `ه`
  * `ى` → `ي`

The original input text is always preserved and returned as `raw_text`.
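
The steps listed above can be approximated with a few lines of standard-library Python. This is a minimal sketch, not the code shipped with the model: `normalize_arabic` is a hypothetical helper name, and the exact set of code points treated as diacritics (here U+064B to U+065F plus U+0670) is an assumption.

```python
import re

# Assumption: "diacritics" means the Arabic combining marks U+064B-U+065F
# plus the superscript alef U+0670. The packaged model may use another set.
_DIACRITICS = re.compile("[\u064B-\u065F\u0670]")

# One-to-one character mappings taken from the list above.
_CHAR_MAP = str.maketrans({
    "\u0625": "\u0627",  # alef with hamza below -> bare alef
    "\u0623": "\u0627",  # alef with hamza above -> bare alef
    "\u0622": "\u0627",  # alef with madda       -> bare alef
    "\u0624": "\u0621",  # waw with hamza        -> hamza
    "\u0626": "\u0621",  # yeh with hamza        -> hamza
    "\u0629": "\u0647",  # teh marbuta           -> heh
    "\u0649": "\u064A",  # alef maksura          -> yeh
})


def normalize_arabic(text: str) -> str:
    """Strip diacritics, then apply the character mappings."""
    return _DIACRITICS.sub("", text).translate(_CHAR_MAP)


print(normalize_arabic("الصادرات البترولية المصرية"))
```

Because the model applies normalization internally, calling a helper like this before `ner(...)` should not be necessary; it is mainly useful for reproducing the normalized surface forms that appear in the output.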

---

## Entity Types

The model recognizes **21 financial entity types**, including (but not limited to):

* `COUNTRY`
* `CITY`
* `CURRENCY`
* `FINANCIAL_INSTRUMENT`
* `BANK`
* `ORGANIZATION`
* `NATIONALITY`
* `EVENT`
* `TIME`
* `QUANTITY_OR_UNIT`

---

## Evaluation Results

The model was evaluated on a held-out test set using standard NER metrics:

| Metric    | Score      |
| --------- | ---------- |
| Precision | **96.08%** |
| Recall    | **95.87%** |
| F1-score  | **95.97%** |

These results are competitive with those reported for financial NER systems in other languages, despite the additional challenges posed by Arabic morphology and orthography.
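
As a quick consistency check on the table, the F1-score can be recomputed from the reported precision and recall. The sketch below assumes the usual definition of F1 as the harmonic mean of precision and recall; `f1_score` is an illustrative helper, not part of the `amwal` package.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the table above.
print(round(f1_score(0.9608, 0.9587), 4))  # → 0.9597
```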

---

## Usage

AMWAL has **two officially supported usage modes**.

### Option 1 — Install via `pip` (recommended)

```bash
pip install git+https://huggingface.co/Muhsabrys/AMWAL_ArFinNER
```

```python
from amwal import load_ner

ner = load_ner()
result = ner("يطرح البنك المركزي المصري، بعد غد، سندات خزانة ثابتة ومتغيرة العائد بقيمة 45 مليار جنيه")
print(result["entities"])
# Example output:
# [{'text': 'البنك المركزي المصري', 'label': 'BANK', 'start': 5, 'end': 25},
#  {'text': 'سندات', 'label': 'FINANCIAL_INSTRUMENT', 'start': 35, 'end': 40},
#  {'text': '45 مليار', 'label': 'QUNATITY_OR_UNIT', 'start': 74, 'end': 82},
#  {'text': 'جنيه', 'label': 'CURRENCY', 'start': 83, 'end': 87}]
```

---

### Option 2 — Use directly from Hugging Face (no installation)

```python
from huggingface_hub import snapshot_download
import sys

repo_path = snapshot_download("Muhsabrys/AMWAL_ArFinNER")
sys.path.append(repo_path)

from amwal import load_ner

ner = load_ner(local_path=repo_path)
result = ner("الصادرات البترولية المصرية ترتفع إلى 3.6 مليار دولار خلال 9 أشهر")
print(result["entities"])
```

---

## Output Format

```json
{
  "entities_in_order": [
    {
      "text": "الصادرات",
      "label": "Events",
      "start": 1,
      "end": 9
    },
    {
      "text": "البتروليه",
      "label": "PRODUCT_OR_SERVICE",
      "start": 10,
      "end": 19
    },
    {
      "text": "المصريه",
      "label": "NATIONALITY",
      "start": 20,
      "end": 27
    },
    {
      "text": "ترتفع",
      "label": "Events",
      "start": 28,
      "end": 33
    },
    {
      "text": "مليار",
      "label": "QUNATITY_OR_UNIT",
      "start": 42,
      "end": 47
    },
    {
      "text": "دولار",
      "label": "CURRENCY",
      "start": 48,
      "end": 53
    }
  ]
}
```
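
Since each entity is a dictionary with `text`, `label`, `start`, and `end` keys, downstream code can regroup the results with plain Python. The following is a minimal sketch: `group_by_label` is a hypothetical helper, not part of the `amwal` package, and the sample list is copied from the Option 1 example rather than produced by running the model.

```python
from collections import defaultdict


def group_by_label(entities):
    """Map each label to the entity strings found for it, in input order."""
    grouped = defaultdict(list)
    for ent in entities:
        grouped[ent["label"]].append(ent["text"])
    return dict(grouped)


# Sample entities copied verbatim from the Option 1 example (labels included).
sample = [
    {"text": "البنك المركزي المصري", "label": "BANK", "start": 5, "end": 25},
    {"text": "سندات", "label": "FINANCIAL_INSTRUMENT", "start": 35, "end": 40},
    {"text": "45 مليار", "label": "QUNATITY_OR_UNIT", "start": 74, "end": 82},
    {"text": "جنيه", "label": "CURRENCY", "start": 83, "end": 87},
]

grouped = group_by_label(sample)
print(sorted(grouped))
print(grouped["BANK"])
```

Grouping by label is a convenient first step when feeding the entities into a table or a knowledge-graph loader, where each entity type typically maps to its own node or column type.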

---

## Limitations

* Domain-specific to financial text
* Not suitable for general-purpose Arabic NER
* Does not model relations between entities
* Not compatible with Hugging Face Transformers APIs

---

## Future Work

Planned future directions include:

* Expanding the annotated corpus
* Introducing hierarchical entity structures
* Modeling relations between financial entities
* Constructing an Arabic financial knowledge graph

---

## Citation

```bibtex
@inproceedings{abdo2025amwal,
  title={AMWAL: Named Entity Recognition for Arabic Financial News},
  author={Abdo, Muhammad S and Hatekar, Yash and {\'C}avar, Damir},
  booktitle={Proceedings of the Joint Workshop of the 9th Financial Technology and Natural Language Processing (FinNLP), the 6th Financial Narrative Processing (FNP), and the 1st Workshop on Large Language Models for Finance and Legal (LLMFinLegal)},
  pages={207--213},
  year={2025}
}
```