Spanish
richardjonker2000 committed on
Commit e99bf7b · verified · 1 Parent(s): 2c8f056

Update README.md

Files changed (1)
  1. README.md +100 -3
README.md CHANGED
@@ -1,3 +1,100 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ datasets:
+ - IEETA/SPACCC-Spanish-NER
+ language:
+ - es
+ metrics:
+ - f1
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+ This model performs biomedical Named Entity Recognition (NER) on Spanish clinical texts, a key step in automated information extraction for medical research and treatment improvement.
+ It proposes a novel approach that uses a Multi-Head Conditional Random Field (CRF) classifier for multi-class NER and addresses the challenge of overlapping entity instances.
+ Classes: symptoms, procedures, diseases, chemicals, and proteins.
+
+
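The idea of handling overlapping entities with several heads can be illustrated with a small sketch. It assumes, and the card does not confirm, that each entity class gets its own head emitting a separate BIO tag sequence, so two classes can mark the same token. The `decode_bio` helper, the tag scheme, and the example sentence are hypothetical; the CRF layer itself is not implemented here.

```python
from typing import Dict, List, Tuple

# Entity classes listed in the model card.
CLASSES = ["SYMPTOM", "PROCEDURE", "DISEASE", "PROTEIN", "CHEMICAL"]


def decode_bio(tags: List[str]) -> List[Tuple[int, int]]:
    """Turn one head's BIO tags into (start, end_exclusive) token spans."""
    spans: List[Tuple[int, int]] = []
    start = None
    for i, tag in enumerate(tags):
        if tag == "B" or (tag == "I" and start is None):
            # A new entity begins; close any entity that is still open.
            if start is not None:
                spans.append((start, i))
            start = i
        elif tag == "O" and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans


# Illustrative sentence (not taken from the dataset).
tokens = ["deficiencia", "de", "insulina", "grave"]

# Assumed output format: one tag sequence per class (hypothetical).
head_tags: Dict[str, List[str]] = {label: ["O"] * len(tokens) for label in CLASSES}
head_tags["DISEASE"] = ["B", "I", "I", "O"]  # "deficiencia de insulina"
head_tags["PROTEIN"] = ["O", "O", "B", "O"]  # "insulina", nested in the disease

entities = {label: decode_bio(tags) for label, tags in head_tags.items()}
for label, spans in entities.items():
    for start, end in spans:
        print(label, " ".join(tokens[start:end]))
```

Because each class is decoded independently, the PROTEIN span can sit inside the DISEASE span, which a single shared tag sequence could not express.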
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+
+ - **Developed by:** IEETA
+ - **Shared by [optional]:** IEETA
+ - **Model type:** Multi-Head-CRF, RoBERTa Base
+ - **Language(s) (NLP):** Spanish
+ - **License:** MIT
+ - **Finetuned from model [optional]:** lcampillos/roberta-es-clinical-trials-ner
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** https://github.com/ieeta-pt/Multi-Head-CRF
+ - **Paper:** [More Information Needed]
+
+ ## Uses
+
+ Note: we accept no liability for use of the model in any professional or medical setting. The model is intended for academic purposes only. It performs Named Entity Recognition over five classes: SYMPTOM, PROCEDURE, DISEASE, PROTEIN, and CHEMICAL.
+
+ ## How to Get Started with the Model
+
+ Please refer to our GitHub repository for more information on how to train the model and run inference: https://github.com/ieeta-pt/Multi-Head-CRF
+
+ ## Training Details
+
+ ### Training Data
+
+ The training data is available at IEETA/SPACCC-Spanish-NER and is described further on the dataset card.
+
+ [More Information Needed]
+
+
+
+
+ ### Speeds, Sizes, Times [optional]
+
+ The models were trained on an Nvidia Quadro RTX 8000. The 5-class models took approximately 1 hour to train and occupy around 1 GB of disk space. Furthermore, the model's cost grows linearly with the number of entity classes to classify (about +8 minutes per class).
+
+
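The linear scaling can be made concrete with a rough estimate. The sketch below assumes, without confirmation from the card, that the +8 minutes applies to training time and that the 5-class, 1-hour figure serves as the baseline; `estimated_training_minutes` is a hypothetical helper, not part of the repository.

```python
def estimated_training_minutes(
    num_classes: int,
    base_classes: int = 5,           # reported configuration
    base_minutes: float = 60.0,      # "approximately 1 hour" for 5 classes
    minutes_per_class: float = 8.0,  # reported linear increment per class
) -> float:
    """Linear extrapolation from the reported 5-class figure (assumption)."""
    if num_classes < 1:
        raise ValueError("num_classes must be at least 1")
    return base_minutes + minutes_per_class * (num_classes - base_classes)


for n in (5, 6, 8):
    print(f"{n} classes: about {estimated_training_minutes(n):.0f} minutes")
```

These numbers are only an extrapolation on comparable hardware; actual times depend on the data and hyperparameters.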
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+ The test data is available at IEETA/SPACCC-Spanish-NER and is described further on the dataset card.
+
+
+ #### Metrics
+
+ The models were evaluated using the F1 score, the standard metric for entity recognition tasks.
+
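As a reminder of how an entity-level F1 score is typically computed, here is a minimal sketch. It assumes exact-match scoring over (start, end, label) spans with micro-averaging; the card does not state the matching criterion used in the evaluation, so treat that choice, the `entity_f1` helper, and the example spans as illustrative.

```python
from typing import Set, Tuple

# (start_token, end_token_exclusive, label); a hypothetical span representation.
Span = Tuple[int, int, str]


def entity_f1(gold: Set[Span], predicted: Set[Span]) -> float:
    """Micro-averaged F1 where a prediction counts only on an exact span and label match."""
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


gold = {(0, 2, "DISEASE"), (5, 6, "CHEMICAL"), (8, 10, "SYMPTOM")}
predicted = {
    (0, 2, "DISEASE"),     # correct
    (5, 6, "PROTEIN"),     # right span, wrong label
    (8, 10, "SYMPTOM"),    # correct
    (12, 13, "PROCEDURE"), # spurious
}

score = entity_f1(gold, predicted)
print(f"F1 = {score:.4f}")
```

Here two of four predictions are correct (precision 0.5) and two of three gold entities are found (recall 2/3), giving an F1 of 4/7.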
+ ### Results
+
+ We provide 4 separate models with various hyperparameter changes:
+
+ | HLs per head | Augmentation | Percentage Tags | Augmentation Probability | F1        |
+ |--------------|--------------|-----------------|--------------------------|-----------|
+ | 3            | Random       | 0.25            | 0.50                     | 78.73     |
+ | 3            | Unknown      | 0.50            | 0.25                     | 78.50     |
+ | 3            | None         | -               | -                        | **78.89** |
+ | 1            | Random       | 0.25            | 0.50                     | **78.89** |
+
+ All models were trained with a context size of 32 for 60 epochs.
+
+ #### Summary
+
+
+ ## Citation [optional]
+
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+
+
+
+