---
license: apache-2.0
language:
- es
base_model:
- PlanTL-GOB-ES/roberta-base-biomedical-clinical-es
tags:
- medical
- spanish
- bi-encoder
- entity-linking
- sapbert
- umls
- snomed-ct
---
# **ClinLinker**
## Model Description
ClinLinker is a state-of-the-art bi-encoder model for medical entity linking (MEL) in Spanish, optimized for the clinical domain. Mentions and concept names are encoded independently into a shared embedding space, so linking reduces to a nearest-neighbour search over concept embeddings. Concept representations are enriched with synonyms from the UMLS and SNOMED-CT ontologies, and the model was trained contrastively with hard negative mining and a multi-similarity loss.
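
The training objective mentioned above can be illustrated with a small, dependency-free sketch of the multi-similarity loss (Wang et al., 2019) and the hard-pair mining step that the loss itself defines. This is not ClinLinker's training code. The function names are hypothetical, and the hyperparameters (`alpha=2`, `beta=50`, `lam=0.5`, `eps=0.1`) are common defaults assumed here, not the values used for ClinLinker. How ClinLinker selects its hard negatives may also differ from the in-batch mining shown here, and a real implementation would operate on batched GPU tensors.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def multi_similarity_loss(embeddings, labels, alpha=2.0, beta=50.0, lam=0.5, eps=0.1):
    """Multi-similarity loss with MS-style hard-pair mining (illustrative sketch).

    embeddings: one vector per batch item (e.g. encoded mentions/synonyms)
    labels: concept ID per item; items sharing an ID form positive pairs
    Hyperparameter defaults are assumed, not ClinLinker's configuration.
    """
    n = len(embeddings)
    sims = [[cosine(embeddings[i], embeddings[j]) for j in range(n)] for i in range(n)]
    total = 0.0
    for i in range(n):
        pos = [sims[i][j] for j in range(n) if j != i and labels[j] == labels[i]]
        neg = [sims[i][j] for j in range(n) if labels[j] != labels[i]]
        if not pos or not neg:
            continue  # this anchor needs at least one positive and one negative
        # Mining: keep negatives more similar than the hardest (least similar)
        # positive minus eps, and positives less similar than the hardest
        # (most similar) negative plus eps.
        hard_neg = [s for s in neg if s + eps > min(pos)]
        hard_pos = [s for s in pos if s - eps < max(neg)]
        # Soft weighting: pull hard positives above lam, push hard negatives below it
        pos_term = math.log1p(sum(math.exp(-alpha * (s - lam)) for s in hard_pos)) / alpha
        neg_term = math.log1p(sum(math.exp(beta * (s - lam)) for s in hard_neg)) / beta
        total += pos_term + neg_term
    return total / n  # averaged over all anchors in the batch
```

With well-separated concepts no pair survives mining and the loss is zero; when synonyms of different concepts overlap, the mined pairs produce a positive loss.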
## 💡 Intended Use
- **Domain**: Spanish Clinical NLP
- **Tasks**: Entity linking (diseases, symptoms, procedures) to SNOMED-CT
- **Evaluated On**: DisTEMIST, MedProcNER, SympTEMIST
- **Users**: Researchers and practitioners working in clinical NLP
## 📈 Performance Summary (Top-25 Accuracy)
| Model | DisTEMIST | MedProcNER | SympTEMIST |
|--------------------|-----------|------------|------------|
| **ClinLinker** | **0.845** | **0.898** | **0.909** |
| ClinLinker-KB-P | 0.853 | 0.891 | 0.918 |
| ClinLinker-KB-GP | 0.864 | 0.901 | 0.922 |
| SapBERT-XLM-R-large| 0.800 | 0.850 | 0.827 |
| RoBERTa biomedical | 0.600 | 0.668 | 0.609 |

*Results are on the cleaned gold standard, which excludes mentions annotated as "NO CODE" or "COMPOSITE". The bold row marks this model, not the best score in each column.*
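
Top-k accuracy counts a mention as correct when its gold SNOMED-CT code appears among the k highest-ranked candidate codes (k = 25 in the table above). The function below is a minimal sketch, not the official evaluation script. It assumes one gold code per mention, as in the cleaned gold standard. It also assumes that repeated codes in a ranking, which arise when several synonyms of one concept are retrieved, are collapsed so that k counts distinct concepts; the official evaluation may handle this differently.

```python
def top_k_accuracy(ranked_codes, gold_codes, k=25):
    """Fraction of mentions whose gold code is among the top-k candidate codes.

    ranked_codes: per mention, candidate codes sorted by decreasing similarity
                  (may repeat a code when several of its synonyms are retrieved)
    gold_codes: the single gold-standard code for each mention
    """
    hits = 0
    for candidates, gold in zip(ranked_codes, gold_codes):
        # Assumption: deduplicate while preserving rank order, so k counts concepts
        unique = list(dict.fromkeys(candidates))
        if gold in unique[:k]:
            hits += 1
    return hits / len(gold_codes)
```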
## 🧪 Usage
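```python
from transformers import AutoModel, AutoTokenizer
import torch

# Load the ClinLinker encoder and its tokenizer from the Hugging Face Hub
model = AutoModel.from_pretrained("ICB-UMA/ClinLinker")
tokenizer = AutoTokenizer.from_pretrained("ICB-UMA/ClinLinker")

mention = "insuficiencia renal aguda"  # "acute renal failure"
inputs = tokenizer(mention, return_tensors="pt")

with torch.no_grad():  # inference only, no gradients needed
    outputs = model(**inputs)
    # Use the first token (<s>, RoBERTa's [CLS]) as the mention embedding
    embedding = outputs.last_hidden_state[:, 0, :]

print(embedding.shape)  # (1, hidden_size)
```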
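To link a mention, encode the candidate concept names (e.g. SNOMED-CT synonyms) in the same way and retrieve the concepts whose embeddings are closest to the mention embedding. For scalable retrieval over large terminologies, use [Faiss](https://github.com/facebookresearch/faiss) or the [`FaissEncoder`](https://github.com/ICB-UMA/KnowledgeGraph) class.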
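
Linking amounts to embedding every candidate concept name with the same first-token procedure used for the mention, then ranking concepts by cosine similarity. The sketch below shows only that ranking step, in plain Python. Toy 3-dimensional vectors and placeholder `CODE_*` identifiers stand in for real embeddings and SNOMED-CT codes. `link_mention` is a hypothetical helper, and scoring each concept by its best-matching synonym is an assumption of this sketch. Faiss performs the same exact search at scale, for example with an `IndexFlatIP` index over L2-normalised vectors.

```python
import math

def normalize(v):
    """Scale a non-zero vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def link_mention(mention_vec, dictionary, k=5):
    """Rank concept codes for one mention by cosine similarity (hypothetical helper).

    dictionary: list of (code, synonym_vector) pairs; a code appears once per synonym.
    Assumption: each code is scored by its best-matching synonym (max similarity).
    """
    q = normalize(mention_vec)
    best = {}
    for code, vec in dictionary:
        score = sum(a * b for a, b in zip(q, normalize(vec)))
        if code not in best or score > best[code]:
            best[code] = score
    # Highest-scoring concepts first; return (code, score) pairs
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]

# Toy example: CODE_A has two synonyms; the mention lies close to both
dictionary = [
    ("CODE_A", [1.0, 0.0, 0.0]),
    ("CODE_A", [0.9, 0.1, 0.0]),
    ("CODE_B", [0.0, 1.0, 0.0]),
    ("CODE_C", [0.0, 0.0, 1.0]),
]
print(link_mention([0.95, 0.05, 0.0], dictionary, k=2))
```

Normalising both sides first makes the inner product equal to cosine similarity, which is why an inner-product index is suitable for this kind of search.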
## ⚠️ Limitations
- The model is optimized for Spanish clinical data and may underperform outside this domain.
- Expert validation is advised in critical applications.
## 📚 Citation
> Gallego, F., López-García, G., Gasco-Sánchez, L., Krallinger, M., Veredas, F.J. (2024). ClinLinker: Medical Entity Linking of Clinical Concept Mentions in Spanish. In: Franco, L., de Mulatier, C., Paszynski, M., Krzhizhanovskaya, V.V., Dongarra, J.J., Sloot, P.M.A. (eds) Computational Science – ICCS 2024. Lecture Notes in Computer Science, vol 14836. Springer, Cham. https://doi.org/10.1007/978-3-031-63775-9_19
## Authors
Fernando Gallego, Guillermo López-García, Luis Gasco-Sánchez, Martin Krallinger, Francisco J. Veredas