ethaneng committed on
Commit aa5ad66 · verified · 1 Parent(s): 6c587e1

Update README.md

Files changed (1)
  1. README.md +139 -26
README.md CHANGED
@@ -14,26 +14,34 @@ base_model:
 
 aviation-ner is a fine-tuned transformer-based model that identifies and extracts aviation hazards associated with product factors from Service Difficulty Reports. The work is a collaboration between FAA and Boeing data science teams. The NER model will enhance searchability and facilitate clustering and trend analysis of safety events.
 
-Entity Definition Flight Phase (FLT)
-It references to the IATA taxonomy that focuses on safety management. IATA includes flight planning and ground servicing phases since these phases can directly impact a flight.
 
-Product Location (LOC)
 A location within the airplane, together with directional information, that disambiguates one Product Factor from another or helps identify a specific aircraft component.
 
-Crew Action (ACT)
-A task which is/was carried out to attempt to resolve/correct a Product Condition excluding maintenance action. Examples: follow QRH, complied with procedure, run or accomplished procedure, disable/enable systems, turn on/off systems, change state of the airplane or its systems, change flight phase, change flight altitude, etc. Communication related actions such as request, call, notify, and notice are excluded.
 
-Product (PROD)
 The airplane and the components, equipment, and systems installed on the delivered product. Typically, this means something that you can touch, hold, remove, replace, control, or interact with. Examples: tire pressure, software, navigation database, and cabin pressure.
 
-Product Condition (PCON)
-A specific quality, behavior, or situation with regards to a Product Factor or Product Location. Examples: Smoke, Fire, Fumes, Odor, Loss of Aircraft Control, FOD, Fuel Issue, Gear Up Landing, Ground Strike, Jet Blast, Loss of VLOS
 
-Bird strike or Animal strike (BIRD)
 A bird strike, or a near miss between an aircraft and wildlife, during high-speed take-off or landing. An animal strike is an impact or collision between an aircraft and wildlife (deer, elk, coyote, fox) during high-speed take-off or landing.
 
-Emergency or Abnormal Situation (SIT)
-An emergency situation is one in which the safety of the aircraft or of persons on board or on the ground is endangered for any reason. An abnormal situation is one in which it is no longer possible to continue the flight using normal procedures but the safety of the aircraft or persons on board or on the ground is not in danger. Examples: Evacuated, Flight Cancelled/Delayed, Diverted, Executed Go Around / Missed Approach, Inflight Shutdown, Exited Penetrated Airspace, FLC Overrode Automation, FLC Complied with Automation / Advisory, Landed as Precaution, Landed in Emergency Condition, Overcame Equipment Problem, Regained Aircraft Control, Rejected Takeoff, Requested ATC Assistance/Clarification, Returned to Clearance, Returned to Departure Airport, Returned to Gate, Returned to Home, Took Evasive Action
 
 
 ## Installation & Usage
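
The entity codes listed above (FLT, LOC, ACT, PROD, PCON, BIRD, SIT) correspond to the lowercase begin/inside labels that the usage code passes to the model. The following is a minimal sketch of that mapping, assuming the `b-`/`i-` prefix convention seen in the README's `labels` list. The names `ENTITY_CODES`, `build_bio_labels`, and `base_tag` are illustrative and are not part of the repository.

```python
# Entity codes as named in the README's entity definitions.
ENTITY_CODES = ["FLT", "LOC", "ACT", "PROD", "PCON", "BIRD", "SIT"]


def build_bio_labels(codes):
    """Return lowercase begin/inside labels, e.g. 'b-flt' and 'i-flt'."""
    labels = []
    for code in codes:
        lowered = code.lower()  # the README notes that labels must be lower-cased
        labels.append(f"b-{lowered}")
        labels.append(f"i-{lowered}")
    return labels


def base_tag(label):
    """Strip the b-/i- prefix; the outside tag 'O' is returned unchanged."""
    if label == "O":
        return label
    return label.split("-", 1)[1]


bio_labels = build_bio_labels(ENTITY_CODES)
print(bio_labels[:4])
```

Deriving the label list from the codes keeps the two in sync if an entity type is added later; whether the repository builds its list this way is not stated in the README.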
@@ -45,23 +53,128 @@ An emergency situation is one in which the safety of the aircraft or of persons
 **NuZero requires labels to be lower-cased**
 
 ```python
 from gliner import GLiNER
 
-def merge_entities(entities):
-    if not entities:
-        return []
-    merged = []
-    current = entities[0]
-    for next_entity in entities[1:]:
-        if next_entity['label'] == current['label'] and (next_entity['start'] == current['end'] + 1 or next_entity['start'] == current['end']):
-            current['text'] = text[current['start']: next_entity['end']].strip()
-            current['end'] = next_entity['end']
         else:
-            merged.append(current)
-            current = next_entity
-    # Append the last entity
-    merged.append(current)
-    return merged
 
 
-model = GLiNER.from_pretrained("numind/NuNerZero")
 
 
 
 
 
 aviation-ner is a fine-tuned transformer-based model that identifies and extracts aviation hazards associated with product factors from Service Difficulty Reports. The work is a collaboration between FAA and Boeing data science teams. The NER model will enhance searchability and facilitate clustering and trend analysis of safety events.
 
+## Entity Definition
 
+1. Flight Phase (FLT)
+Refers to the IATA taxonomy, which focuses on safety management. IATA includes the flight planning and ground servicing phases because these phases can directly impact a flight.
+
+2. Product Location (LOC)
 A location within the airplane, together with directional information, that disambiguates one Product Factor from another or helps identify a specific aircraft component.
 
+3. Crew Action (ACT)
+A task carried out to resolve or correct a Product Condition, excluding maintenance actions.
+
+Examples: follow QRH, complied with procedure, run or accomplished procedure, disable/enable systems, turn on/off systems, change state of the airplane or its systems, change flight phase, change flight altitude, etc. Communication-related actions such as request, call, notify, and notice are excluded.
 
+4. Product (PROD)
 The airplane and the components, equipment, and systems installed on the delivered product. Typically, this means something that you can touch, hold, remove, replace, control, or interact with. Examples: tire pressure, software, navigation database, and cabin pressure.
 
+5. Product Condition (PCON)
+A specific quality, behavior, or situation with regard to a Product Factor or Product Location.
+
+Examples: Smoke, Fire, Fumes, Odor, Loss of Aircraft Control, FOD, Fuel Issue, Gear Up Landing, Ground Strike, Jet Blast, Loss of VLOS
 
+6. Bird strike or Animal strike (BIRD)
 A bird strike, or a near miss between an aircraft and wildlife, during high-speed take-off or landing. An animal strike is an impact or collision between an aircraft and wildlife (deer, elk, coyote, fox) during high-speed take-off or landing.
 
+7. Emergency or Abnormal Situation (SIT)
+An emergency situation is one in which the safety of the aircraft or of persons on board or on the ground is endangered for any reason. An abnormal situation is one in which it is no longer possible to continue the flight using normal procedures, but the safety of the aircraft or of persons on board or on the ground is not in danger.
+
+Examples: Evacuated, Flight Cancelled/Delayed, Diverted, Executed Go Around / Missed Approach, Inflight Shutdown, Exited Penetrated Airspace, FLC Overrode Automation, FLC Complied with Automation / Advisory, Landed as Precaution, Landed in Emergency Condition, Overcame Equipment Problem, Regained Aircraft Control, Rejected Takeoff, Requested ATC Assistance/Clarification, Returned to Clearance, Returned to Departure Airport, Returned to Gate, Returned to Home, Took Evasive Action
 
 
 ## Installation & Usage
 
 **NuZero requires labels to be lower-cased**
 
 ```python
+import os
+import pandas as pd
+import re
+import time
 from gliner import GLiNER
+from ner_tokenization import NerTokenization
+
+class NERTagging:
+    labels = ["b-prod", "i-prod", "b-loc", "i-loc", "b-pcon", "i-pcon", "b-sit", "i-sit", "b-act", "i-act", "b-bird", "i-bird", "b-flt", "i-flt"]
+
+    def __init__(self, model_path):
+
+        if os.path.exists(model_path):  # load from a local directory when one exists
+            self.model = GLiNER.from_pretrained(model_path, local_files_only=True)
+        else: self.model = GLiNER.from_pretrained(model_path)
+
+        self.tokenizer = NerTokenization()
+
+    def tokenize_with_offsets(self, text):
+
+        offsets_d = {}
+
+        for match in re.finditer(r'\S+', text):  # \S+ matches any sequence of non-whitespace characters
+            start, end = match.start(), match.end()
+            offsets_d[(start, end)] = [match.group(), "O"]
 
+        return offsets_d
+
+    def strip_bi(self, tag):
+
+        if tag == "O":
+            base_tag = tag
+        else:
+            base_tag = tag.split("-")[1]
+
+        return base_tag
+
+    def get_list_of_tokens_with_tags(self, entities, d, strip_bi):
+
+        for this_ent in entities:
+            start, end, label = this_ent["start"], this_ent["end"], this_ent["label"]
+            k = (start, end)
+            if k in d:
+                d[k][1] = label
+            # else:
+            #     print("misaligned")  # matches currently set to be exact
+
+        sorted_text = sorted(d.items(), key=lambda tup: tup[0][0])
+
+        if strip_bi:
+            tagged_tokens = [(tup[1][0], self.strip_bi(tup[1][1])) for tup in sorted_text]
         else:
+            tagged_tokens = [(tup[1][0], tup[1][1]) for tup in sorted_text]
+
+        return tagged_tokens
+
+    def ner_label_main(self, text, strip_bi):
+
+        text = self.tokenizer.tokenize_string(text)
+        entities = self.model.predict_entities(text, NERTagging.labels)
+        text_d = self.tokenize_with_offsets(text)
+        tups = self.get_list_of_tokens_with_tags(entities, text_d, strip_bi)
+        return tups
+
+    def parse_out_labels_to_dict(self, tups):
+
+        d = {}
+
+        temp_tag, temp_entity = None, []
+
+        for token, tag in tups:
+
+            if tag != "O":  # first check if token is part of entity
+
+                if tag.startswith("i"):  # if not new entity, keep appending to current
+
+                    if temp_tag is None:  # handle mislabels where parts start with I
+                        temp_tag, temp_entity = self.strip_bi(tag), [token]  # reset
+                    else:
+                        temp_entity.append(token)
+
+                else:  # tag starts with B - new entity
+
+                    if temp_tag:  # add old entity to d
+                        if temp_tag not in d:
+                            d[temp_tag] = []
+                        d[temp_tag].append(" ".join(temp_entity))
+
+                    temp_tag, temp_entity = self.strip_bi(tag), [token]  # reset
+
+            else:  # tag is "O"
+
+                if temp_tag:  # add old entity to d
+                    if temp_tag not in d:
+                        d[temp_tag] = []
+                    d[temp_tag].append(" ".join(temp_entity))
+
+                temp_tag, temp_entity = None, []  # reset
+
+        if temp_entity:
+            if temp_tag not in d:
+                d[temp_tag] = []
+            d[temp_tag].append(" ".join(temp_entity))
+
+        return d
+
+if __name__ == "__main__":
+
+    model_path = "boeing/aviation-ner"
+    labeler = NERTagging(model_path)
+
+    # list of strings
+    all_text = ["A Cargojet Boeing 767-300 on behalf of Amazon Prime Air, registration C-GAZI performing flight W8-2387 (dep Nov 18th) from Hamilton,ON to Vancouver,BC (Canada), had declared PAN PAN prior to landing reporting flaps problems, they would land at a higher speed than normal, prompting emergency services to assume their standby locations. The aircraft landed on Vancouver's runway 08L at 01:28L (09:28Z) at a higher than normal speed (about 175 knots over ground), overran the runway by about 572 meters/1880 feet and suffered the collapse of the nose gear, the crew declared Mayday after coming to a stop. Both runways were closed following the runway excursion, runway 08R had been closed for works, runway 08L needed to be closed due to the occurrence, runway 08R was opened following the occurrence."]
+
+    # entity tags
+    tags = ["prod", "loc", "pcon", "sit", "act", "bird", "flt"]
+    for i, this_text in enumerate(all_text):
 
+        # tuples of tokens and tags
+        token_tag_tups = labeler.ner_label_main(this_text, strip_bi=False)
+        print(token_tag_tups)
 
+        # dictionary of tags: mentions from this_text
+        entity_dict = labeler.parse_out_labels_to_dict(token_tag_tups)
+        entity_dict = {key: ", ".join(value) for key, value in entity_dict.items()}
+        print(entity_dict)
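
The tagging flow in `NERTagging` can be tried without downloading the model. The sketch below is a simplified, standalone variant of it. It tokenizes on whitespace with character offsets, labels tokens whose spans exactly match an entity span, and groups BIO-tagged tokens into a dictionary. The hand-written `fake_entities` list stands in for model output and assumes one entity per token with lowercase BIO labels, as the exact-span matching in the README implies. The function names `tag_tokens` and `group_entities` are illustrative and do not exist in the repository.

```python
import re


def tokenize_with_offsets(text):
    """Map (start, end) character spans to [token, tag], defaulting to 'O'."""
    return {(m.start(), m.end()): [m.group(), "O"] for m in re.finditer(r"\S+", text)}


def tag_tokens(entities, spans):
    """Label tokens whose span exactly matches an entity span."""
    for ent in entities:
        key = (ent["start"], ent["end"])
        if key in spans:
            spans[key][1] = ent["label"]
    # Sort by character offset so tokens come back in reading order.
    return [tuple(value) for _, value in sorted(spans.items())]


def group_entities(tagged_tokens):
    """Collect b-/i- runs into {entity_type: [mention, ...]}."""
    grouped = {}
    current_type, current_tokens = None, []
    for token, tag in tagged_tokens:
        if tag.startswith("i-") and current_type is not None:
            current_tokens.append(token)  # continue the open mention
            continue
        # Any other tag closes the open mention, if there is one.
        if current_type is not None:
            grouped.setdefault(current_type, []).append(" ".join(current_tokens))
        if tag == "O":
            current_type, current_tokens = None, []
        else:
            # A b- tag, or a stray i- tag with no open mention, starts a new one.
            current_type, current_tokens = tag.split("-", 1)[1], [token]
    if current_type is not None:
        grouped.setdefault(current_type, []).append(" ".join(current_tokens))
    return grouped


text = "nose gear collapsed after landing"
# Hypothetical predictions, written by hand for illustration only.
fake_entities = [
    {"start": 0, "end": 4, "label": "b-prod"},
    {"start": 5, "end": 9, "label": "i-prod"},
    {"start": 10, "end": 19, "label": "b-pcon"},
    {"start": 26, "end": 33, "label": "b-flt"},
]

tagged = tag_tokens(fake_entities, tokenize_with_offsets(text))
grouped = group_entities(tagged)
print(tagged)
print(grouped)
```

Because matching is exact, an entity whose span does not line up with a whitespace token is silently skipped, which mirrors the commented-out "misaligned" branch in the README code.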