HERBERT-P-30 / README.md
metadata
library_name: transformers
tags:
  - contrastive-learning
  - Spanish-UMLS
  - Hierarchical-enrichment
  - entity-linking
  - biomedical
  - spanish
license: mit
language:
  - es
base_model:
  - PlanTL-GOB-ES/roberta-base-biomedical-clinical-es

HERBERT: Leveraging UMLS Hierarchical Knowledge to Enhance Clinical Entity Normalization in Spanish

HERBERT-P is a bi-encoder trained with contrastive learning for medical entity normalization in Spanish. It uses synonym and parent relationships from UMLS to improve candidate retrieval for entity linking in clinical texts.

Key features:

  • Base model: PlanTL-GOB-ES/roberta-base-biomedical-clinical-es.
  • Training: 30 positive pairs per anchor (synonyms + parents).
  • Task: normalization of disease, procedure, and symptom mentions to SNOMED-CT/UMLS codes.
  • Domain: Spanish biomedical/clinical texts.
  • Corpora: DisTEMIST, MedProcNER, SympTEMIST.
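
As a rough illustration of how a bi-encoder like this is typically used for candidate retrieval, the sketch below ranks candidate terms by cosine similarity to a mention embedding. It is not the authors' pipeline. The helper names (`cosine`, `retrieve_top_k`, `load_encoder`), the placeholder codes, and the toy vectors are all hypothetical, and first-token pooling is an assumption, since this card does not state the pooling strategy.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve_top_k(
    mention_vec: Vector,
    candidate_vecs: Sequence[Vector],
    candidate_codes: Sequence[str],
    k: int = 5,
) -> List[Tuple[str, float]]:
    """Return the k candidate codes most similar to the mention, best first."""
    scored = [
        (code, cosine(mention_vec, vec))
        for code, vec in zip(candidate_codes, candidate_vecs)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def load_encoder(model_id: str) -> Callable[[Sequence[str]], List[List[float]]]:
    """Build a text -> embedding function from a Hub checkpoint.

    Assumption: the embedding is the hidden state of the first token (<s>);
    the model card does not say which pooling was used in training.
    """
    import torch  # imported lazily so the ranking code runs without it
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id).eval()

    def encode(texts: Sequence[str]) -> List[List[float]]:
        batch = tokenizer(
            list(texts), padding=True, truncation=True, return_tensors="pt"
        )
        with torch.no_grad():
            hidden = model(**batch).last_hidden_state
        return hidden[:, 0].tolist()

    return encode


# Toy example with hand-made 2-d vectors and placeholder codes (not real data).
codes = ["CODE_A", "CODE_B", "CODE_C"]
vecs = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]
ranked = retrieve_top_k([1.0, 0.1], vecs, codes, k=2)
print(ranked)
```

With the real model, `load_encoder` would be called with this checkpoint's Hub identifier, the candidate vectors would come from encoding the terminology's terms once, and each mention would be encoded at query time. For large terminologies, an approximate nearest-neighbor index would likely replace the linear scan shown here.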

Benchmark Results

| Corpus     | Top-1 | Top-5 | Top-25 | Top-200 |
|------------|-------|-------|--------|---------|
| DisTEMIST  | 0.588 | 0.723 | 0.803  | 0.867   |
| SympTEMIST | 0.635 | 0.784 | 0.882  | 0.946   |
| MedProcNER | 0.651 | 0.765 | 0.838  | 0.892   |
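
The card does not define the Top-k columns. A common reading, assumed here, is the fraction of mentions whose gold code appears among the k highest-ranked candidates. The sketch below computes that quantity. The function name `top_k_accuracy` and the toy rankings are hypothetical and are not taken from the evaluation code behind these results.

```python
from typing import Sequence


def top_k_accuracy(
    ranked_codes: Sequence[Sequence[str]],
    gold_codes: Sequence[str],
    k: int,
) -> float:
    """Fraction of mentions whose gold code is within the first k candidates.

    ranked_codes[i] is the candidate list for mention i, best first.
    """
    if len(ranked_codes) != len(gold_codes):
        raise ValueError("ranked_codes and gold_codes must have the same length")
    if not gold_codes:
        return 0.0
    hits = sum(
        1 for ranked, gold in zip(ranked_codes, gold_codes) if gold in ranked[:k]
    )
    return hits / len(gold_codes)


# Toy rankings for four mentions that all share the gold code "A".
rankings = [
    ["A", "B", "C"],  # gold at rank 1
    ["B", "A", "C"],  # gold at rank 2
    ["C", "B", "A"],  # gold at rank 3
    ["B", "C", "D"],  # gold not retrieved
]
gold = ["A", "A", "A", "A"]

acc_at_1 = top_k_accuracy(rankings, gold, k=1)
acc_at_2 = top_k_accuracy(rankings, gold, k=2)
acc_at_3 = top_k_accuracy(rankings, gold, k=3)
print(acc_at_1, acc_at_2, acc_at_3)
```

Under this definition the metric can only grow or stay flat as k increases, which matches the pattern in each row of the results.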