Muhsabrys committed on
Commit 1791be7 · verified · 1 Parent(s): a7ea4bc

Update README.md

Files changed (1)
  1. README.md +0 -213
README.md CHANGED
@@ -1,4 +1,3 @@
- ```yaml
  ---
  language:
  - ar
@@ -42,215 +41,3 @@ model-index:
  year: 2025
  url: https://aclanthology.org/2025.finnlp-1.20
  ---
- ```
-
- # AMWAL: Arabic Financial Named Entity Recognition (NER)
-
- ## Quick Start
-
- ### Install (recommended)
-
- ```bash
- pip install git+https://huggingface.co/Muhsabrys/AMWAL-ner-arabic
- ```
-
- ```python
- from amwal import load_ner
-
- ner = load_ner()
-
- text = "أعلن صندوق قطر السيادي عن استثمار بقيمة 500 مليون دولار أمريكي في سندات حكومية يابانية مقومة بالين في طوكيو."
- result = ner(text)
-
- print(result["entities"])
- ```
-
- ---
-
- ## Model Summary
-
- **AMWAL** is a **spaCy-based Named Entity Recognition (NER) system** designed for extracting **financial entities from Arabic text**, with a primary focus on **Arabic financial news and reports**.
-
- The model addresses challenges specific to Arabic financial NLP, including orthographic variation, domain-specific terminology, and the scarcity of annotated financial resources for Arabic.
-
- ---
-
- ## Intended Use
-
- AMWAL is intended for:
-
- * Arabic financial news analysis
- * Information extraction from financial reports
- * Financial text preprocessing
- * Academic research in Arabic NLP and finance
- * Data enrichment for financial knowledge graphs
-
- It is **not intended** for:
-
- * General-purpose Arabic NER
- * Non-financial domains
- * Direct use with Hugging Face Transformers APIs
-
- ---
-
- ## Data Collection and Annotation
-
- A specialized Arabic financial corpus was constructed from **three major Arabic financial newspapers**, covering the period **2000–2023**.
-
- The annotation process followed a **semi-automatic workflow**:
-
- 1. Automatic candidate entity extraction
- 2. Manual annotation
- 3. Expert review and correction
-
- The final dataset contains:
-
- * **17.1K annotated entity tokens**
- * **21 financial entity categories**
- * Consistent domain coverage across multiple time periods
-
- ---
-
- ## Entity Schema and Standardization
-
- Entity categories were standardized using concepts from the
- **Financial Industry Business Ontology (FIBO, 2020)** to ensure conceptual consistency and compatibility with structured financial representations.
-
- ---
-
- ## Model Architecture and Training
-
- * **Framework:** spaCy
- * **Pipeline:** Custom Named Entity Recognition (NER)
- * **Domain:** Arabic financial text
-
- The model was trained on the annotated corpus using spaCy's NER pipeline.
- To mitigate sparsity caused by Arabic orthographic variation, normalization was applied consistently during training and inference.
-
- ---
-
- ## Arabic Normalization
-
- The following normalization steps are applied **internally during inference**, matching the training setup:
-
- * Removal of all diacritics
- * Character normalization:
-
-   * `إ`, `أ`, `آ` → `ا`
-   * `ؤ`, `ئ` → `ء`
-   * `ة` → `ه`
-   * `ى` → `ي`
-
- The original input text is always preserved and returned as `raw_text`.
-
- ---
-
- ## Entity Types
-
- The model recognizes **21 financial entity types**, including (but not limited to):
-
- * `COUNTRY`
- * `CITY`
- * `CURRENCY`
- * `FINANCIAL_INSTRUMENT`
- * `BANK`
- * `ORGANIZATION`
- * `NATIONALITY`
- * `EVENT`
- * `TIME`
- * `QUANTITY_OR_UNIT`
-
- ---
-
- ## Evaluation Results
-
- | Metric    | Score      |
- | --------- | ---------- |
- | Precision | **96.08%** |
- | Recall    | **95.87%** |
- | F1-score  | **95.97%** |
-
- ---
-
- ## Usage
-
- ### Option 1 - Install via `pip` (recommended)
-
- ```bash
- pip install git+https://huggingface.co/Muhsabrys/AMWAL-ner-arabic
- ```
-
- ```python
- from amwal import load_ner
-
- ner = load_ner()
- result = ner("نص عربي مالي")
- ```
-
- ---
-
- ### Option 2 - Use directly from Hugging Face (no installation)
-
- ```python
- from huggingface_hub import snapshot_download
- import sys
-
- repo_path = snapshot_download("Muhsabrys/AMWAL-ner-arabic")
- sys.path.append(repo_path)
-
- from amwal import load_ner
-
- ner = load_ner(local_path=repo_path)
- result = ner("نص عربي مالي")
- ```
-
- ---
-
- ## Output Format
-
- ```json
- {
-   "raw_text": "...",
-   "normalized_text": "...",
-   "entities": [
-     {
-       "text": "قطر",
-       "label": "COUNTRY",
-       "start": 11,
-       "end": 14
-     }
-   ]
- }
- ```
-
- ---
-
- ## Limitations
-
- * Domain-specific to financial text
- * Not suitable for general-purpose Arabic NER
- * Does not model relations between entities
- * Not compatible with Hugging Face Transformers APIs
-
- ---
-
- ## Future Work
-
- * Expanding the annotated corpus
- * Introducing hierarchical entity structures
- * Modeling relations between financial entities
- * Constructing an Arabic financial knowledge graph
-
- ---
-
- ## Citation
-
- ```bibtex
- @inproceedings{abdo2025amwal,
-   title={AMWAL: Named Entity Recognition for Arabic Financial News},
-   author={Abdo, Muhammad S and Hatekar, Yash and {\'C}avar, Damir},
-   booktitle={Proceedings of the Joint Workshop of the 9th Financial Technology and Natural Language Processing (FinNLP), the 6th Financial Narrative Processing (FNP), and the 1st Workshop on Large Language Models for Finance and Legal (LLMFinLegal)},
-   pages={207--213},
-   year={2025}
- }
- ```
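
The Arabic Normalization section removed in this commit lists diacritic removal plus four character mappings. A minimal Python sketch of those steps follows. It is not the `amwal` package's own code: the function name `normalize_arabic` is hypothetical, and the diacritic set (U+064B to U+0652 plus U+0670) is an assumption, since the README does not say which marks are stripped.

```python
import re

# Assumption: "diacritics" means the tashkeel marks U+064B..U+0652 plus the
# superscript alef U+0670. The README does not list the exact set.
_DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")

# Character mappings as listed in the README's Arabic Normalization section.
_CHAR_MAP = str.maketrans({
    "\u0625": "\u0627",  # alef with hamza below -> bare alef
    "\u0623": "\u0627",  # alef with hamza above -> bare alef
    "\u0622": "\u0627",  # alef with madda       -> bare alef
    "\u0624": "\u0621",  # waw with hamza        -> hamza
    "\u0626": "\u0621",  # yeh with hamza        -> hamza
    "\u0629": "\u0647",  # teh marbuta           -> heh
    "\u0649": "\u064A",  # alef maksura          -> yeh
})


def normalize_arabic(text: str) -> str:
    """Strip diacritics, then apply the character mappings (hypothetical helper)."""
    return _DIACRITICS.sub("", text).translate(_CHAR_MAP)


if __name__ == "__main__":
    # First word of the Quick Start sentence: the hamza-carrying alef becomes bare alef.
    print(normalize_arabic("\u0623\u0639\u0644\u0646"))
```

Removing diacritics shortens the string, so character offsets computed on normalized text need not line up with offsets in the raw text when the input is vocalized.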
 
 
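
The Output Format block in the removed README shows a dictionary with an `entities` list. Assuming `load_ner()` returns exactly that shape, a caller could group entity strings by label as sketched below. The helper `group_by_label` is hypothetical, and the sample dictionary is hand-built from the README example rather than taken from real model output.

```python
from collections import defaultdict


def group_by_label(result: dict) -> dict:
    """Collect entity surface strings under their labels (hypothetical helper)."""
    grouped = defaultdict(list)
    for entity in result.get("entities", []):
        grouped[entity["label"]].append(entity["text"])
    return dict(grouped)


# Hand-built sample mirroring the README's Output Format example.
sample = {
    "raw_text": "...",
    "normalized_text": "...",
    "entities": [
        # "\u0642\u0637\u0631" is the Arabic word for Qatar from the README example.
        {"text": "\u0642\u0637\u0631", "label": "COUNTRY", "start": 11, "end": 14},
    ],
}

if __name__ == "__main__":
    for label, texts in group_by_label(sample).items():
        print(label, texts)
```

The README does not say whether `start` and `end` index into `raw_text` or `normalized_text`, so this sketch leaves the offsets untouched.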