Aviation-ner is a fine-tuned, transformer-based model that identifies and extracts aviation hazards associated with product factors from Service Difficulty Reports (SDRs). SDRs are submitted through the Service Difficulty Reporting System by operators or certified repair stations to document and share information with the aviation community about failures, malfunctions, or defects of aeronautical products. The free-form text description field often contains valuable safety-related information, but it lacks predictable grammatical structure and is not standardized. It can also contain typographical errors, part numbers, abbreviations, and references to specific sections of maintenance manuals or operating procedures, which makes it difficult to extract this information reliably with regular expressions or with language models designed to take clean, full sentences as input.
The work is a collaboration between the FAA and Boeing data science teams. The NER model will enhance searchability and facilitate clustering and trend analysis of safety events.
Entity Definition
Flight Phase (FLT) A phase of flight, following the IATA taxonomy, which focuses on safety management. The IATA taxonomy includes the flight planning and ground servicing phases because these phases can directly affect a flight.
Product Location (LOC) A location within the airplane, or directional information, that disambiguates one Product Factor from another or helps to identify a specific aircraft component.
Crew Action (ACT) A task that is or was carried out to attempt to resolve or correct a Product Condition, excluding maintenance actions.
Examples: follow QRH, complied with procedure, run or accomplished procedure, disable/enable systems, turn on/off systems, change state of the airplane or its systems, change flight phase, change flight altitude, etc. Communication related actions such as request, call, notify, and notice are excluded.
Product (PROD) Airplane and components/equipment/systems installed on the delivered product. Typically, this means something that you can touch, hold, remove, replace, control or interact with. Examples: tire pressure, software, navigation database, and cabin pressure.
Product Condition (PCON) A specific quality, behavior, or situation with regard to a Product Factor or Product Location.
Examples: Smoke, Fire, Fumes, Odor, Loss of Aircraft Control, FOD, Fuel Issue, Gear Up Landing, Ground Strike, Jet Blast, Loss of VLOS
Bird strike or Animal strike (BIRD) A bird strike, or a near miss between an aircraft and wildlife, during high-speed take-off or landing. An animal strike is an impact or collision between an aircraft and wildlife (for example deer, elk, coyote, or fox) during high-speed take-off or landing.
Emergency or Abnormal Situation (SIT) An emergency situation is one in which the safety of the aircraft or of persons on board or on the ground is endangered for any reason. An abnormal situation is one in which it is no longer possible to continue the flight using normal procedures but the safety of the aircraft or persons on board or on the ground is not in danger.
Examples: Evacuated, Flight Cancelled/Delayed, Diverted, Executed Go Around / Missed Approach, Inflight Shutdown, Exited Penetrated Airspace, FLC Overrode Automation, FLC Complied with Automation / Advisory, Landed as Precaution, Landed in Emergency Condition, Overcame Equipment Problem, Regained Aircraft Control, Rejected Takeoff, Requested ATC Assistance/Clarification, Returned to Clearance, Returned to Departure Airport, Returned to Gate, Returned to Home, Took Evasive Action
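The seven entity tags defined above correspond to the label strings passed to the model in the usage code, which uses lower-cased forms with B-/I- prefixes. The following is a minimal sketch of that mapping; the names ENTITY_TYPES and build_bio_labels are illustrative and are not part of the aviation_ner_sdr package.

```python
# Entity tags from the definitions above, mapped to their full names.
ENTITY_TYPES = {
    "FLT": "Flight Phase",
    "LOC": "Product Location",
    "ACT": "Crew Action",
    "PROD": "Product",
    "PCON": "Product Condition",
    "BIRD": "Bird strike or Animal strike",
    "SIT": "Emergency or Abnormal Situation",
}


def build_bio_labels(tags):
    """Return lower-cased B-/I- label strings for each tag, in the given order."""
    labels = []
    for tag in tags:
        labels.append(f"b-{tag.lower()}")  # first token of an entity
        labels.append(f"i-{tag.lower()}")  # continuation token of an entity
    return labels


labels = build_bio_labels(ENTITY_TYPES)
print(labels)
```

Deriving the label list from the tag names keeps the two in sync if a tag is added or renamed, and guarantees that every label is lower-cased.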
Installation & Usage
!pip install gliner==0.1.12
!pip install git+https://github.com/Boeing/aviation_ner_sdr@main
NuNER Zero, the base model, requires labels to be lower-cased, so the labels in the code below are written in lower case.
import pandas as pd
import re
import os
import time
from gliner import GLiNER
from aviation_ner_sdr.ner_tokenization import NerTokenization
class NERTagging:
    # Label strings passed to the model: lower-cased B-/I- forms of the entity tags
    labels = ["b-prod", "i-prod", "b-loc", "i-loc", "b-pcon", "i-pcon", "b-sit", "i-sit", "b-act", "i-act", "b-bird", "i-bird", "b-flt", "i-flt"]

    def __init__(self, model_path):
        # Load from disk if model_path exists locally, otherwise by model name
        if os.path.exists(model_path):
            self.model = GLiNER.from_pretrained(model_path, local_files_only=True)
        else:
            self.model = GLiNER.from_pretrained(model_path)
        self.tokenizer = NerTokenization()

    def tokenize_with_offsets(self, text):
        # Map (start, end) character offsets to [token, tag], with every tag initialized to "O"
        offsets_d = {}
        for match in re.finditer(r'\S+', text):  # \S+ matches any sequence of non-whitespace characters
            start, end = match.start(), match.end()
            offsets_d[(start, end)] = [match.group(), "O"]
        return offsets_d

    def strip_bi(self, tag):
        # Drop the B-/I- prefix, e.g. "b-prod" -> "prod"; "O" is returned unchanged
        if tag == "O":
            base_tag = tag
        else:
            base_tag = tag.split("-")[1]
        return base_tag

    def get_list_of_tokens_with_tags(self, entities, d, strip_bi):
        for this_ent in entities:
            start, end, label = this_ent["start"], this_ent["end"], this_ent["label"]
            k = (start, end)
            if k in d:
                d[k][1] = label
            # else:
            #     print("misaligned")  # matches currently set to be exact
        sorted_text = sorted(d.items(), key=lambda tup: tup[0][0])
        if strip_bi:
            tagged_tokens = [(tup[1][0], self.strip_bi(tup[1][1])) for tup in sorted_text]
        else:
            tagged_tokens = [(tup[1][0], tup[1][1]) for tup in sorted_text]
        return tagged_tokens

    def ner_label_main(self, text, strip_bi):
        text = self.tokenizer.tokenize_string(text)
        entities = self.model.predict_entities(text, NERTagging.labels)
        text_d = self.tokenize_with_offsets(text)
        tups = self.get_list_of_tokens_with_tags(entities, text_d, strip_bi)
        return tups

    def parse_out_labels_to_dict(self, tups):
        # Group consecutive B-/I- tagged tokens into mentions, keyed by entity tag
        d = {}
        temp_tag, temp_entity = None, []
        for token, tag in tups:
            if tag != "O":  # first check if token is part of entity
                if tag.startswith("i"):  # if not new entity, keep appending to current
                    if temp_tag is None:  # handle mislabels where parts start with I
                        temp_tag, temp_entity = self.strip_bi(tag), [token]  # reset
                    else:
                        temp_entity.append(token)
                else:  # tag starts with B - new entity
                    if temp_tag:  # add old entity to d
                        if temp_tag not in d:
                            d[temp_tag] = []
                        d[temp_tag].append(" ".join(temp_entity))
                    temp_tag, temp_entity = self.strip_bi(tag), [token]  # reset
            else:  # tag is "O"
                if temp_tag:  # add old entity to d
                    if temp_tag not in d:
                        d[temp_tag] = []
                    d[temp_tag].append(" ".join(temp_entity))
                temp_tag, temp_entity = None, []  # reset
        if temp_entity:  # flush an entity that runs to the end of the text
            if temp_tag not in d:
                d[temp_tag] = []
            d[temp_tag].append(" ".join(temp_entity))
        return d

if __name__ == "__main__":
    model_path = "boeing/aviation-ner"
    labeler = NERTagging(model_path)
    # list of strings
    all_text = ["A Cargojet Boeing 767-300 on behalf of Amazon Prime Air, registration C-GAZI performing flight W8-2387 (dep Nov 18th) from Hamilton,ON to Vancouver,BC (Canada), had declared PAN PAN prior to landing reporting flaps problems, they would land at a higher speed than normal, prompting emergency services to assume their standby locations. The aircraft landed on Vancouver's runway 08L at 01:28L (09:28Z) at a higher than normal speed (about 175 knots over ground), overran the runway by about 572 meters/1880 feet and suffered the collapse of the nose gear, the crew declared Mayday after coming to a stop. Both runways were closed following the runway excursion, runway 08R had been closed for works, runway 08L needed to be closed due to the occurrence, runway 08R was opened following the occurrence."]
    # entity tags
    tags = ["prod", "loc", "pcon", "sit", "act", "bird", "flt"]
    for i, this_text in enumerate(all_text):
        # tuples of tokens and tags
        token_tag_tups = labeler.ner_label_main(this_text, strip_bi=False)
        print(token_tag_tups)
        # dictionary of tags: mentions from this_text
        entity_dict = labeler.parse_out_labels_to_dict(token_tag_tups)
        entity_dict = {key: ", ".join(value) for key, value in entity_dict.items()}
        print(entity_dict)
Output
{'sit': 'declared, PAN PAN, declared Mayday', 'flt': 'landing, land, landed', 'prod': 'flaps, nose gear', 'pcon': 'problems, higher speed than normal, higher than normal speed, overran, collapse'}
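For trend analysis across many reports, the per-report dictionaries can be tallied by tag. The sketch below shows one way to do this with the standard library. It assumes each dictionary maps a tag to a list of mention strings, which is the form returned by parse_out_labels_to_dict before the values are joined into comma-separated strings. The function name count_mentions is hypothetical. The first sample holds the mentions from the output above, and the second is an invented report included only for illustration.

```python
from collections import Counter, defaultdict


def count_mentions(entity_dicts):
    """Tally how often each mention appears under each tag across reports."""
    counts = defaultdict(Counter)
    for entity_dict in entity_dicts:
        for tag, mentions in entity_dict.items():
            # Lower-case mentions so that "Flaps" and "flaps" are counted together.
            counts[tag].update(mention.lower() for mention in mentions)
    return counts


reports = [
    # Mentions from the output above, kept as lists rather than joined strings.
    {
        "sit": ["declared", "PAN PAN", "declared Mayday"],
        "flt": ["landing", "land", "landed"],
        "prod": ["flaps", "nose gear"],
        "pcon": ["problems", "higher speed than normal",
                 "higher than normal speed", "overran", "collapse"],
    },
    # Invented second report, for illustration only.
    {"prod": ["nose gear"], "pcon": ["collapse"], "flt": ["landing"]},
]

counts = count_mentions(reports)
for tag, counter in sorted(counts.items()):
    print(tag, counter.most_common(3))
```

Counting exact strings treats near-duplicates such as "land" and "landed" as different mentions; normalizing or clustering mentions before counting is a separate design choice that this sketch does not attempt.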
Base model: numind/NuNER_Zero