maga-24 committed on
Commit
eee3d4f
·
verified ·
1 Parent(s): 69ad463

Update README.md

Files changed (1)
  1. README.md +86 -1
README.md CHANGED
@@ -5,4 +5,89 @@ base_model:
5
  - GerMedBERT/medbert-512
6
  pipeline_tag: token-classification
7
  license: mit
8
- ---
8
+ ---
9
+
10
+ # Pathology notes NER Model Example
11
+ This example provides the code needed to use our NER model for pathology notes.
12
+
13
+ ## Part 1: Define label list, load model and tokenizer
14
+
15
+ #### 1.1 Define label list
16
+ The label list contains all labels of the IOB scheme.
17
+ Each entity/attribute has a B- (beginning) and an I- (inside) label.
18
+ Words that do not belong to any entity are labeled "O" (outside).
19
+
20
+ ```python
21
+ ["B-Mutation", "B-ExpressionSignal", "B-PolaritySignal", "I-HematoDiagnosis", "I-MorphologicAbnormality", "B-Infection", "I-Infection", "I-Proliferation", "B-Hematopoiesis", "I-DiagnosisType", "I-CellAssociation", "B-SizeSignal", "I-ShiftSignal", "I-PolaritySignal", "O", "B-AmountSignal", "B-MalignancySignal", "I-SizeSignal", "I-OtherDiagnosis", "I-MalignancySignal", "B-Expression", "B-DiagnosisType", "B-Proliferation", "I-Expression", "B-QuantitySignal", "B-MorphologicAbnormality", "B-ShiftSignal", "B-HematoDiagnosis", "B-CellType", "B-OtherDiagnosis", "B-ClonalitySignal", "B-CellAssociation", "I-QuantitySignal", "I-Mutation", "I-Hematopoiesis", "I-CellType", "I-AmountSignal", "I-ClonalitySignal", "I-ExpressionSignal"]
22
+
23
+ label_list = ["B-Mutation", "B-ExpressionSignal", "B-Polarity", "I-HematoDiagnosis", "I-MorphologicAbnormality", "B-InfectiousAgent", "I-InfectiousAgent", "I-Proliferation", "B-Hematopoiesis", "I-DiagnosisType", "I-CellAssociation", "B-Size", "I-ShiftSignal", "I-Polarity", "O", "B-Amount", "B-MalignancySignal", "I-Size", "I-OtherDiagnosis", "I-MalignancySignal", "B-Expression", "B-DiagnosisType", "B-Proliferation", "I-Expression", "B-Quantity", "B-MorphologicAbnormality", "B-ShiftSignal", "B-HematoDiagnosis", "B-CellType", "B-OtherDiagnosis", "B-ClonalitySignal", "B-CellAssociation", "I-Quantity", "I-Mutation", "I-Hematopoiesis", "I-CellType", "I-Amount", "I-ClonalitySignal", "I-ExpressionSignal"]
24
+ label_list
25
+ ```
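To illustrate how B- and I- labels combine into entities, the following minimal sketch groups a tagged token sequence into spans. The helper `group_iob` and the example tokens and tags are hypothetical illustrations chosen by hand; they are not output of the model.

```python
def group_iob(tokens, tags):
    """Group IOB-tagged tokens into (entity_type, text) spans."""
    spans = []
    current_type, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        # "B-CellType" -> ("B", "-", "CellType"); "O" -> ("O", "", "")
        prefix, _, entity_type = tag.partition("-")
        if prefix == "B" or (prefix == "I" and entity_type != current_type):
            # A B- tag (or an I- tag without a matching open span) starts a new entity.
            if current_tokens:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = entity_type, [token]
        elif prefix == "I":
            # An I- tag of the same type continues the open entity.
            current_tokens.append(token)
        else:
            # "O" closes any open entity.
            if current_tokens:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
    if current_tokens:
        spans.append((current_type, " ".join(current_tokens)))
    return spans


# Hypothetical, hand-written tags (not model predictions)
demo_tokens = ["plasma", "cells", "express", "CD138"]
demo_tags = ["B-CellType", "I-CellType", "O", "B-Expression"]
demo_spans = group_iob(demo_tokens, demo_tags)
print(demo_spans)
```

Here the B-/I- pair on "plasma cells" is merged into one `CellType` entity, while the single B- tag on "CD138" forms an entity of its own.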
26
+
27
+ #### 1.2 Load fine-tuned NER model
28
+
29
+
30
+ ```python
31
+ # create class map
32
+ from datasets import ClassLabel
33
+ classmap = ClassLabel(num_classes=len(label_list), names=label_list)
34
+
35
+
36
+ # load model
37
+ from transformers import AutoModelForTokenClassification
38
+ model = AutoModelForTokenClassification.from_pretrained("GerMedBERT-best_model", num_labels=len(label_list), id2label={i:classmap.int2str(i) for i in range(classmap.num_classes)}, label2id={c:classmap.str2int(c) for c in classmap.names})
39
+ ```
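The `id2label` and `label2id` arguments map between integer class ids and label strings. As a minimal sketch, the same kind of mapping can be built with plain Python; `demo_labels` is a shortened, hypothetical stand-in for the full `label_list`, and the order of the labels is assumed to match the order used during training.

```python
# Shortened, hypothetical stand-in for the full label_list
demo_labels = ["O", "B-CellType", "I-CellType"]

# Position in the list becomes the class id
id2label = dict(enumerate(demo_labels))
label2id = {label: idx for idx, label in id2label.items()}

print(id2label)
print(label2id)
```

Because the ids are derived from list positions, reordering the label list would silently change the meaning of the model's predictions.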
40
+
41
+ #### 1.3 Load tokenizer
42
+
43
+ ```python
44
+ # %% load tokenizer
45
+ from transformers import AutoTokenizer
46
+ tokenizer = AutoTokenizer.from_pretrained("GerMedBERT/medbert-512")
47
+ ```
48
+
49
+ ## Part 2: Application of the model to an example pathology note
50
+
51
+ #### 2.1 Create nlp pipeline
52
+
53
+ ```python
54
+ # Create pipeline
55
+ from transformers import pipeline
56
+ import pandas as pd
57
+
58
+ nlp = pipeline("ner", model=model, tokenizer=tokenizer)
59
+ ```
60
+
61
+ #### 2.2 First example in English and German
62
+ The following examples show that the model, although trained only on annotated German texts, also works on English text, though less well.
63
+
64
+ ```python
65
+ # Example 1 in English and German
66
+ english_example1 = "Immunohistochemically, there is a slightly increased amount of plasma cells, which are partly situated in small groups (MUM1, CD138). "
67
+ german_example1 = "Immunhistochemisch zeigt sich eine leichte Vermehrung der Plasmazellen, die teils in kleinen Gruppen angeordnet sind (MUM1, CD138)"
68
+
69
+ # print results of English example
70
+ eng_results = nlp(english_example1)
71
+ df_eng1 = pd.DataFrame(eng_results)
72
+ print(df_eng1)
73
+ # print results of German example
74
+ ger_results = nlp(german_example1)
75
+ df_ger1 = pd.DataFrame(ger_results)
76
+ print(df_ger1)
77
+ ```
78
+
79
+ #### 2.3 Second example in English and German
80
+ ```python
81
+ english_example2 = "The diffuse infiltrates of blasts show a homogeneous and strong expression of CD20 and CD10 in absence of CD3, BCL-2, and TDT."
82
+ german_example2 = "Diffuse Blasteninfiltrate zeigen eine homogene und starke Expression von CD20 und CD10 in Abwesenheit von CD3, BCL-2 und TDT."
83
+
84
+ # print results of English example
85
+ eng_results = nlp(english_example2)
86
+ df_eng2 = pd.DataFrame(eng_results)
87
+ print(df_eng2)
88
+
89
+ # print results of German example
90
+ ger_results = nlp(german_example2)
91
+ df_ger2 = pd.DataFrame(ger_results)
92
+ print(df_ger2)
93
+ ```
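To compare the English and German results more easily, the token-level predictions can be summarized per entity type. The sketch below assumes the default output format of the `transformers` token-classification pipeline (a list of dicts with an `entity` key, among others). The helper `count_entities` and the sample predictions, including their scores, are hypothetical and not real model output.

```python
from collections import Counter


def count_entities(predictions):
    """Count entities per type by counting B- tags in pipeline-style output."""
    counts = Counter()
    for pred in predictions:
        prefix, _, entity_type = pred["entity"].partition("-")
        if prefix == "B":
            counts[entity_type] += 1
    return dict(counts)


# Hypothetical predictions in the pipeline's output format (not real model output)
sample_predictions = [
    {"entity": "B-CellType", "score": 0.99, "index": 1, "word": "plasma", "start": 0, "end": 6},
    {"entity": "I-CellType", "score": 0.98, "index": 2, "word": "cells", "start": 7, "end": 12},
    {"entity": "B-Expression", "score": 0.97, "index": 4, "word": "MUM1", "start": 14, "end": 18},
    {"entity": "B-Expression", "score": 0.96, "index": 6, "word": "CD138", "start": 20, "end": 25},
]

entity_counts = count_entities(sample_predictions)
print(entity_counts)
```

Applied to the variables above, `count_entities(eng_results)` and `count_entities(ger_results)` would give a rough per-type comparison. Note that without aggregation the pipeline labels each subword token separately, so these counts are only an approximation.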