IMI-HD/medbert-hematopatho

 - GerMedBERT/medbert-512
 pipeline_tag: token-classification
 license: mit
+---
+# Pathology notes NER Model Example
+This example provides the code needed to use our NER model.
+## Part 1: Define label list, load model and tokenizer
+#### 1.1 Define label list
+The label list contains all labels of the IOB scheme:
+each entity or attribute has a B- (beginning) and an I- (inside) label.
+Words that belong to no entity are labeled "O".
+```python
+label_list = ["B-Mutation", "B-ExpressionSignal", "B-Polarity", "I-HematoDiagnosis", "I-MorphologicAbnormality", "B-InfectiousAgent", "I-InfectiousAgent", "I-Proliferation", "B-Hematopoiesis", "I-DiagnosisType", "I-CellAssociation", "B-Size", "I-ShiftSignal", "I-Polarity", "O", "B-Amount", "B-MalignancySignal", "I-Size", "I-OtherDiagnosis", "I-MalignancySignal", "B-Expression", "B-DiagnosisType", "B-Proliferation", "I-Expression", "B-Quantity", "B-MorphologicAbnormality", "B-ShiftSignal", "B-HematoDiagnosis", "B-CellType", "B-OtherDiagnosis", "B-ClonalitySignal", "B-CellAssociation", "I-Quantity", "I-Mutation", "I-Hematopoiesis", "I-CellType", "I-Amount", "I-ClonalitySignal", "I-ExpressionSignal"]
+print(label_list)
+```
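The B-/I- pairing described above can be checked programmatically. The following is a minimal sketch and not part of the original model code: the helper names `entity_types` and `unpaired_types` and the shortened `example_labels` list are assumptions made for illustration. In practice the full `label_list` would be passed in instead.

```python
def entity_types(labels):
    """Return the sorted entity types found in a list of IOB labels."""
    return sorted({label.split("-", 1)[1] for label in labels if label != "O"})


def unpaired_types(labels):
    """Return entity types that lack either their B- or their I- label."""
    present = set(labels)
    return [
        t for t in entity_types(labels)
        if f"B-{t}" not in present or f"I-{t}" not in present
    ]


# Hypothetical short subset of the label list, used only for illustration
example_labels = ["B-CellType", "I-CellType", "O", "B-Expression", "I-Expression"]

print(entity_types(example_labels))    # entity types without their IOB prefix
print(unpaired_types(example_labels))  # empty when every type has both labels
```

A non-empty result from `unpaired_types` would point to a label that was renamed or dropped on only one side of its B-/I- pair.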
+#### 1.2 Load fine-tuned NER model
+```python
+# Build a ClassLabel that maps label names to integer ids and back
+from datasets import ClassLabel
+from transformers import AutoModelForTokenClassification
+classmap = ClassLabel(num_classes=len(label_list), names=label_list)
+# Load the fine-tuned checkpoint and attach the label mappings to its config
+model = AutoModelForTokenClassification.from_pretrained(
+    "GerMedBERT-best_model",
+    num_labels=len(label_list),
+    id2label={i: classmap.int2str(i) for i in range(classmap.num_classes)},
+    label2id={c: classmap.str2int(c) for c in classmap.names},
+)
+```
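The two mappings passed to `from_pretrained` are ordinary dictionaries between label names and their positions in the label list. As a sketch of what `ClassLabel` contributes here, the same dictionaries can be built with `enumerate`. The short `example_labels` list is a hypothetical stand-in for the full `label_list`.

```python
# Hypothetical stand-in for the full label_list defined in 1.1
example_labels = ["B-CellType", "I-CellType", "O"]

# The position of a label in the list becomes its integer id
id2label = {i: label for i, label in enumerate(example_labels)}
label2id = {label: i for i, label in id2label.items()}

print(id2label)
print(label2id)
```

Because the ids are derived from list positions, the order of the label list must match the order used during fine-tuning.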
+#### 1.3 Load tokenizer
+```python
+# Load the tokenizer
+from transformers import AutoTokenizer
+tokenizer = AutoTokenizer.from_pretrained("GerMedBERT/medbert-512")
+```
+## Part 2: Application of the model to an example pathology note
+#### 2.1 Create nlp pipeline
+```python
+# Create pipeline
+from transformers import pipeline
+import pandas as pd
+nlp = pipeline("ner", model=model, tokenizer=tokenizer)
+```
+#### 2.2 First Example in English and German
+The following examples show that, although the model was trained only on annotated German texts, it also works on English text, though less well.
+```python
+# Example 1 in English and German
+english_example1 = "Immunohistochemically, there is a slightly increased amount of plasma cells, which are partly situated in small groups (MUM1, CD138). "
+german_example1 = "Immunhistochemisch zeigt sich eine leichte Vermehrung der Plasmazellen, die teils in kleinen Gruppen angeordnet sind (MUM1, CD138)"
+# Print results of the English example
+eng_results = nlp(english_example1)
+df_eng1 = pd.DataFrame(eng_results)
+print(df_eng1)
+# Print results of the German example
+ger_results = nlp(german_example1)
+df_ger1 = pd.DataFrame(ger_results)
+print(df_ger1)
+```
+#### 2.3 Second example in English and German
+```python
+# Example 2 in English and German
+english_example2 = "The diffuse infiltrates of blasts show a homogeneous and strong expression of CD20 and CD10 in absence of CD3, BCL-2, and TDT."
+german_example2 = "Diffuse Blasteninfiltrate zeigen eine homogene und starke Expression von CD20 und CD10 in Abwesenheit von CD3, BCL-2 und TDT."
+# Print results of the English example
+eng_results = nlp(english_example2)
+df_eng2 = pd.DataFrame(eng_results)
+print(df_eng2)
+# Print results of the German example
+ger_results = nlp(german_example2)
+df_ger2 = pd.DataFrame(ger_results)
+print(df_ger2)
+```
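The pipeline used above returns one prediction per token, so a multi-token entity appears as a B- row followed by I- rows. The sketch below shows one way to merge such rows into entity spans using the `entity`, `index`, `start` and `end` fields of each row. The function name `group_entities` is an assumption, and `toy_tokens` is a hand-written illustration of the output format, not actual model output. The `transformers` pipeline also accepts an `aggregation_strategy` argument that can perform a similar grouping.

```python
def group_entities(text, token_results):
    """Merge token-level IOB predictions into entity spans."""
    spans = []
    prev_index = None
    for tok in token_results:
        prefix, _, ent_type = tok["entity"].partition("-")
        # An I- token extends the previous span only if it has the same
        # type and directly follows the previous labeled token
        continues = (
            prefix == "I"
            and bool(spans)
            and spans[-1]["type"] == ent_type
            and prev_index is not None
            and tok["index"] == prev_index + 1
        )
        if continues:
            spans[-1]["end"] = tok["end"]
        else:
            spans.append({"type": ent_type, "start": tok["start"], "end": tok["end"]})
        prev_index = tok["index"]
    # Recover the surface text of each span from the character offsets
    for span in spans:
        span["text"] = text[span["start"]:span["end"]]
    return spans


# Hand-written rows in the pipeline's output format (illustration only)
toy_text = "Vermehrung der Plasmazellen"
toy_tokens = [
    {"entity": "B-Amount", "index": 1, "word": "Vermehrung", "start": 0, "end": 10},
    {"entity": "B-CellType", "index": 3, "word": "Plasma", "start": 15, "end": 21},
    {"entity": "I-CellType", "index": 4, "word": "##zellen", "start": 21, "end": 27},
]

for span in group_entities(toy_text, toy_tokens):
    print(span["type"], span["text"])
```

With real results, `group_entities(german_example1, nlp(german_example1))` would be called in the same way.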