IEETA/Multi-Head-CRF

richardjonker2000 committed on May 13, 2024

Commit a68b4ec (verified) · 1 parent: e99bf7b

Update README.md

Files changed (1)

README.md +24 -35


@@ -8,34 +8,33 @@ metrics:
 - f1
 ---
-# Model Card for Model ID
-<!-- Provide a quick summary of what the model is/does. -->
-Our model focuses on Biomedical Named Entity Recognition (NER) in Spanish clinical texts, crucial for automated information extraction in medical research and treatment improvements.
-It proposes a novel approach using a Multi-Head Conditional Random Field (CRF) classifier to tackle multi-class NER tasks, overcoming challenges of overlapping entity instances.
-Classes: symptoms, procedures, diseases, chemicals, and proteins
 ## Model Details
 ### Model Description
-<!-- Provide a longer summary of what this model is. -->
 - **Developed by:** IEETA
-- **Shared by [optional]:** IEETA
 - **Model type:** Multi-Head-CRF, Roberta Base
 - **Language(s) (NLP):** Spanish
 - **License:** MIT
-- **Finetuned from model [optional]:** lcampillos/roberta-es-clinical-trials-ner
-### Model Sources [optional]
-<!-- Provide the basic links for the model. -->
-- **Repository:** https://github.com/ieeta-pt/Multi-Head-CR
-- **Paper:** [More Information Needed]
 ## Uses
@@ -43,37 +42,35 @@ Note we do not take any liability for the use of the model in any professional/m
 ## How to Get Started with the Model
-Please refer to our GitHub repository for more information on how to train the model and run inference.  https://github.com/ieeta-pt/Multi-Head-CRF
 ## Training Details
 ### Training Data
 The training data can be found on IEETA/SPACCC-Spanish-NER, which is further described on the dataset card.
-[More Information Needed]
-### Speeds, Sizes, Times [optional]
-The models were trained using an Nvidia Quadra RTX 8000. The models for 5 classes took approximately 1 hour to train and occupies around 1gb of disk space. Further this model shows linear complexity (+8 minutes) per entity class to classify.
 ### Testing Data, Factors & Metrics
 #### Testing Data
 The testing data can be found on IEETA/SPACCC-Spanish-NER, which is further described on the dataset card.
 #### Metrics
 The models were evaluated using the F1 score metric, the standard for entity recognition tasks.
 ### Results
-We provide 4 seperate models with various hyperparmeter changes:
 | HLs per head | Augmentation | Percentage Tags | Augmentation Probability | F1     |
 |--------------|--------------|-----------------|--------------------------|--------|
@@ -84,17 +81,9 @@ We provide 4 seperate models with various hyperparmeter changes:
 All models are trained with a context size of 32 for 60 epochs.
-#### Summary
-## Citation [optional]
 **BibTeX:**
-[More Information Needed]

 - f1
 ---
+# Model Card for Biomedical Named Entity Recognition in Spanish Clinical Texts
+Our model focuses on Biomedical Named Entity Recognition (NER) in Spanish clinical texts, crucial for automated information extraction in medical research and treatment improvements. It proposes a novel approach using a Multi-Head Conditional Random Field (CRF) classifier to tackle multi-class NER tasks, overcoming challenges of overlapping entity instances. The classes it recognizes include symptoms, procedures, diseases, chemicals, and proteins.
+We provide 4 different models, available as branches of this repository.
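
The summary above says the multi-head design copes with overlapping entity instances. A minimal sketch of why separate heads allow this, assuming each head emits its own B/I/O tag sequence for one entity class: the CRF layers and the encoder are omitted, and the function names and example tags are hypothetical, not taken from the authors' code.

```python
from typing import Dict, List, Tuple


def decode_bio(tags: List[str]) -> List[Tuple[int, int]]:
    """Convert one head's B/I/O tags into (start, end_exclusive) token spans."""
    spans = []
    start = None
    for i, tag in enumerate(tags):
        if tag == "B":
            # A new entity begins; close any entity that was still open.
            if start is not None:
                spans.append((start, i))
            start = i
        elif tag == "I":
            # Tolerate an "I" with no preceding "B" by opening a span here.
            if start is None:
                start = i
        else:
            # "O" closes any open entity.
            if start is not None:
                spans.append((start, i))
                start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans


def decode_heads(head_tags: Dict[str, List[str]]) -> List[Tuple[int, int, str]]:
    """Decode every head independently and merge the labelled spans."""
    entities = []
    for entity_class, tags in head_tags.items():
        entities.extend((s, e, entity_class) for s, e in decode_bio(tags))
    return sorted(entities)


# Hypothetical output of two heads over the same five tokens.
head_tags = {
    "disease": ["O", "B", "I", "I", "O"],
    "symptom": ["O", "O", "B", "I", "O"],
}
entities = decode_heads(head_tags)
```

Because each class has its own tag sequence, tokens 2 and 3 can belong to a disease span and a symptom span at once, which a single shared tag sequence could not express.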
 ## Model Details
 ### Model Description
 - **Developed by:** IEETA
 - **Model type:** Multi-Head-CRF, Roberta Base
 - **Language(s) (NLP):** Spanish
 - **License:** MIT
+- **Finetuned from model:** lcampillos/roberta-es-clinical-trials-ner
+### Model Sources
+- **Repository:** [IEETA Multi-Head-CRF GitHub](https://github.com/ieeta-pt/Multi-Head-CRF)
+- **Paper:** Multi-head CRF classifier for biomedical multi-class Named Entity Recognition on Spanish clinical notes [Awaiting Publication]
+*Authors:*
+- Richard A A Jonker ([ORCID: 0000-0002-3806-6940](https://orcid.org/0000-0002-3806-6940))
+- Tiago Almeida ([ORCID: 0000-0002-4258-3350](https://orcid.org/0000-0002-4258-3350))
+- Rui Antunes ([ORCID: 0000-0003-3533-8872](https://orcid.org/0000-0003-3533-8872))
+- João R Almeida ([ORCID: 0000-0003-0729-2264](https://orcid.org/0000-0003-0729-2264))
+- Sérgio Matos ([ORCID: 0000-0003-1941-3983](https://orcid.org/0000-0003-1941-3983))
 ## Uses
 ## How to Get Started with the Model
+Please refer to our GitHub repository for more information on how to train the model and run inference: [IEETA Multi-Head-CRF GitHub](https://github.com/ieeta-pt/Multi-Head-CRF)
 ## Training Details
 ### Training Data
 The training data can be found on IEETA/SPACCC-Spanish-NER, which is further described on the dataset card.
+The dataset used consists of 4 separate datasets:
+- [MedProcNer](https://zenodo.org/records/8224056)
+- [DisTEMIST](https://zenodo.org/records/7614764)
+- [PharmaCoNER](https://zenodo.org/records/4270158)
+- [SympTEMIST](https://zenodo.org/records/10635215)
+### Speeds, Sizes, Times
+The models were trained using an Nvidia Quadro RTX 8000. The models for 5 classes took approximately 1 hour to train and occupy around 1GB of disk space. Additionally, training time scales linearly with the number of entity classes, adding about 8 minutes per class.
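
The timing figures above (about 1 hour for 5 classes, plus roughly 8 minutes per additional entity class) can be turned into a rough estimate. This is only a sketch: it assumes the linear trend holds outside the measured 5-class setup and on comparable hardware, and the function name is hypothetical.

```python
BASELINE_CLASSES = 5    # configuration reported in the model card
BASELINE_MINUTES = 60   # "approximately 1 hour"
MINUTES_PER_CLASS = 8   # reported per-class increment


def estimated_training_minutes(num_classes: int) -> int:
    """Linear extrapolation of training time from the reported figures."""
    if num_classes < 1:
        raise ValueError("num_classes must be at least 1")
    return BASELINE_MINUTES + MINUTES_PER_CLASS * (num_classes - BASELINE_CLASSES)


minutes_for_seven = estimated_training_minutes(7)
```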
 ### Testing Data, Factors & Metrics
 #### Testing Data
 The testing data can be found on IEETA/SPACCC-Spanish-NER, which is further described on the dataset card.
 #### Metrics
 The models were evaluated using the F1 score metric, the standard for entity recognition tasks.
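
For reference, a minimal sketch of an entity-level F1 computation. It assumes micro-averaging over exact-match spans, which the card does not state explicitly; the span format and the example data are hypothetical.

```python
from typing import Set, Tuple

# (start_token, end_token_exclusive, entity_class)
Span = Tuple[int, int, str]


def entity_f1(gold: Set[Span], predicted: Set[Span]) -> float:
    """Micro-averaged F1 where a prediction counts only on an exact span and class match."""
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical annotations: the second prediction has the right span but the wrong class.
gold = {(0, 2, "disease"), (5, 6, "chemical"), (8, 10, "symptom")}
predicted = {(0, 2, "disease"), (5, 6, "protein"), (8, 10, "symptom")}
score = entity_f1(gold, predicted)
```

Here two of three predictions match, so precision and recall are both 2/3 and the F1 score is 2/3 as well.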
 ### Results
+We provide 4 separate models with various hyperparameter changes:
 | HLs per head | Augmentation | Percentage Tags | Augmentation Probability | F1     |
 |--------------|--------------|-----------------|--------------------------|--------|
 All models are trained with a context size of 32 for 60 epochs.
+## Citation
 **BibTeX:**
+[Awaiting Publication]