edchengg committed
Commit 6a7eb46 · 2 parents: 8f31827, e040700

Merge branch 'main' of https://huggingface.co/ychenNLP/arabic-relation-extraction into main

Files changed (1): README.md (+43, −23)
README.md CHANGED
@@ -12,18 +12,18 @@ datasets:
  # Arabic Relation Extraction Model
  - [Github repo](https://github.com/edchengg/GigaBERT)
  - Relation Extraction model based on [GigaBERTv4](https://huggingface.co/lanwuwei/GigaBERT-v4-Arabic-and-English).
  - ACE2005 Training data: Arabic
  - [Relation tags](https://www.ldc.upenn.edu/sites/www.ldc.upenn.edu/files/arabic-relations-guidelines-v6.5.pdf) including: Physical, Part-whole, Personal-Social, ORG-Affiliation, Agent-Artifact, Gen-Affiliation
-
  ## Hyperparameters
  - learning_rate=2e-5
  - num_train_epochs=10
  - weight_decay=0.01

- ## ACE2005 Evaluation results (F1)
- | Language | Arabic |
- |:----:|:-----------:|
- | | 89.4 |

  ## How to use
  Workflow of a relation extraction model:
@@ -68,7 +68,7 @@ def process_ner_output(entity_mention, inputs):
          re_input.append({"re_input": new_re_input, "arg1": ent_1, "arg2": ent_2, "input": inputs})
      return re_input

- def post_process_re_output(re_output, re_input, ner_output):
      final_output = []
      for idx, out in enumerate(re_output):
          if out["label"] != 'O':
@@ -77,28 +77,48 @@ def post_process_re_output(re_output, re_input, ner_output):
              tmp.pop('re_input', None)
              final_output.append(tmp)

-     template = {"input": re_input["input"],
                  "entity": ner_output,
                  "relation": final_output}

      return template

- >>> input = "Hugging face is a French company in New york."
- >>> output = ner_pip(input) # inference NER tags
-
- >>> re_input = process_ner_output(output, input) # prepare a pair of entity and predict relation type
-
- >>> re_output = []
- >>> for idx in range(len(re_input)):
- >>>     tmp_re_output = re_pip(re_input[idx]["re_input"]) # for each pair of entity, predict relation
- >>>     re_output.append(tmp_re_output)
-
- >>> re_ner_output = post_process_re_output(re_output) # post process NER and relation predictions
- >>> print("Sentence: ", re_ner_output["input"])
- >>> print("Entity: ", re_ner_output["entity"])
- >>> print("Relation: ", re_ner_output["relation"])
  ```

  ### BibTeX entry and citation info
 
  # Arabic Relation Extraction Model
  - [Github repo](https://github.com/edchengg/GigaBERT)
  - Relation Extraction model based on [GigaBERTv4](https://huggingface.co/lanwuwei/GigaBERT-v4-Arabic-and-English).
+ - Model detail: the two candidate entities are marked in the sentence with special tokens (e.g., `XXXX <PER> entity1 </PER> XXXXXXX <ORG> entity2 </ORG> XXXXX`); the BERT [CLS] representation of the marked sentence is then used to predict the relation type.
  - ACE2005 Training data: Arabic
  - [Relation tags](https://www.ldc.upenn.edu/sites/www.ldc.upenn.edu/files/arabic-relations-guidelines-v6.5.pdf) including: Physical, Part-whole, Personal-Social, ORG-Affiliation, Agent-Artifact, Gen-Affiliation

  ## Hyperparameters
  - learning_rate=2e-5
  - num_train_epochs=10
  - weight_decay=0.01

+ ## ACE2005 Evaluation results (F1) - using gold entities
+ | Language | Arabic | English |
+ |:----:|:-----------:|:-----------:|
+ | F1 | 72.6 | 72.1 |

  ## How to use
  Workflow of a relation extraction model:
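The entity-marking step described in the "Model detail" bullet can be sketched as plain string surgery. This is an illustrative sketch only: `mark_entities` and the span-dict layout (`start`, `end`, `type`) are hypothetical names, not the repo's actual helper.

```python
def mark_entities(text, ent1, ent2):
    """Wrap two entity mentions with type markers (e.g. <PER> ... </PER>)
    so a classifier can read the pair off the [CLS] representation."""
    # Process the right-most span first so earlier offsets stay valid.
    spans = sorted([ent1, ent2], key=lambda e: e["start"], reverse=True)
    for ent in spans:
        tag = ent["type"]
        text = (text[:ent["start"]]
                + "<" + tag + "> " + text[ent["start"]:ent["end"]] + " </" + tag + ">"
                + text[ent["end"]:])
    return text

example = "Hugging Face is a company in New York."
ent1 = {"start": 0, "end": 12, "type": "ORG"}   # "Hugging Face"
ent2 = {"start": 29, "end": 37, "type": "GPE"}  # "New York"
marked = mark_entities(example, ent1, ent2)
print(marked)  # <ORG> Hugging Face </ORG> is a company in <GPE> New York </GPE>.
```

Note the reverse-order loop: inserting markers left-to-right would shift the character offsets of the second span.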
 
          re_input.append({"re_input": new_re_input, "arg1": ent_1, "arg2": ent_2, "input": inputs})
      return re_input

+ def post_process_re_output(re_output, text_input, ner_output):
      final_output = []
      for idx, out in enumerate(re_output):
          if out["label"] != 'O':
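`process_ner_output`, shown only partially in the hunk above, builds one marked input per pair of NER mentions. The pair-enumeration step can be sketched on its own; `enumerate_pairs` is a hypothetical helper, and the `word`/`entity_group` fields follow the grouped-NER output format used later in this card.

```python
from itertools import combinations

def enumerate_pairs(mentions):
    """Return every unordered pair of NER mentions; each pair becomes
    one classification input for the relation model."""
    return list(combinations(mentions, 2))

mentions = [
    {"word": "وزير", "entity_group": "PER"},
    {"word": "العدل", "entity_group": "ORG"},
    {"word": "أنقرة", "entity_group": "GPE"},
]
pairs = enumerate_pairs(mentions)
print(len(pairs))  # 3 mentions -> 3 candidate pairs
```

The number of pairs grows quadratically with the number of mentions, which is why the example below runs the relation pipeline once per pair.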
 
              tmp.pop('re_input', None)
              final_output.append(tmp)

+     template = {"input": text_input,
                  "entity": ner_output,
                  "relation": final_output}

      return template

+ text_input = 'قال وزير العدل التركي بكير بوزداغ إن أنقرة تريد 12 مشتبهاً بهم من فنلندا و 21 من السويد'
+ # i.e., "The Turkish Justice Minister Bekir Bozdağ said Ankara wants 12 suspects from Finland and 21 from Sweden"
+ ner_output = ner_pip(text_input) # inference NER tags
+
+ re_input = process_ner_output(ner_output, text_input) # prepare each entity pair as input for relation prediction
+
+ re_output = []
+ for idx in range(len(re_input)):
+     tmp_re_output = re_pip(re_input[idx]["re_input"]) # for each entity pair, predict the relation type
+     re_output.append(tmp_re_output[0])
+
+ re_ner_output = post_process_re_output(re_output, text_input, ner_output) # post-process NER and relation predictions
+ print("Sentence: ", re_ner_output["input"])
+ print('====Entity====')
+ for ent in re_ner_output["entity"]:
+     print('{}--{}'.format(ent["word"], ent["entity_group"]))
+ print('====Relation====')
+ for rel in re_ner_output["relation"]:
+     print('{}--{}:{}'.format(rel['arg1']['word'], rel['arg2']['word'], rel['relation_type']['label']))
+
+ Sentence:  قال وزير العدل التركي بكير بوزداغ إن أنقرة تريد 12 مشتبهاً بهم من فنلندا و 21 من السويد
+ ====Entity====
+ وزير--PER
+ العدل--ORG
+ التركي--GPE
+ بكير بوزداغ--PER
+ انقرة--GPE
+ مشتبها بهم--PER
+ فنلندا--GPE
+ 21--PER
+ السويد--GPE
+ ====Relation====
+ وزير--العدل:ORG-AFF
+ مشتبها بهم--فنلندا:PHYS
+ 21--السويد:PHYS
  ```
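The core of `post_process_re_output` above is a filter: relation predictions labelled `'O'` (no relation) are dropped, and the surviving predictions are joined with their argument entities. A minimal standalone sketch of that filtering, with `keep_relations` as a hypothetical helper and field names taken from the card's example:

```python
def keep_relations(re_output, re_input):
    """Pair each classifier prediction with its argument entities and
    keep only pairs whose predicted label is not 'O' (no relation)."""
    final = []
    for pred, pair in zip(re_output, re_input):
        if pred["label"] != "O":
            final.append({"arg1": pair["arg1"], "arg2": pair["arg2"],
                          "relation_type": pred})
    return final

# Two candidate pairs; only the first is predicted to hold a relation.
preds = [{"label": "ORG-AFF", "score": 0.98},
         {"label": "O", "score": 0.91}]
pairs = [{"arg1": {"word": "وزير"}, "arg2": {"word": "العدل"}},
         {"arg1": {"word": "وزير"}, "arg2": {"word": "أنقرة"}}]
relations = keep_relations(preds, pairs)
print(len(relations))  # 1: only the ORG-AFF pair survives
```

This mirrors why the sample output lists only three relations even though many more entity pairs were scored.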

  ### BibTeX entry and citation info