---
license: apache-2.0
language:
- es
base_model:
- PlanTL-GOB-ES/roberta-base-biomedical-clinical-es
tags:
- medical
- spanish
- bi-encoder
- entity-linking
- sapbert
- umls
- snomed-ct
---

# **ClinLinker**

## Model Description

ClinLinker is a state-of-the-art bi-encoder model for medical entity linking (MEL) in Spanish, optimized for clinical-domain tasks. It enriches concept representations by incorporating synonyms from the UMLS and SNOMED-CT ontologies. The model was trained with a contrastive-learning strategy that combines hard negative mining with a multi-similarity loss.

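To make the training objective more concrete, here is a minimal, dependency-free sketch of a multi-similarity loss computed from the pairwise cosine similarities of a batch. It is not ClinLinker's training code: the function name, the hyperparameters `alpha`, `beta`, and `base`, and the toy similarity values are illustrative assumptions, and the hard negative mining step is not shown.

```python
import math


def multi_similarity_loss(sim, labels, alpha=2.0, beta=50.0, base=0.5):
    """Multi-similarity loss for one batch.

    sim    -- square matrix (list of lists) of pairwise cosine similarities
    labels -- concept identifier of each item; equal labels form positive pairs
    alpha, beta, base -- illustrative values, not ClinLinker's settings
    """
    n = len(labels)
    total = 0.0
    for i in range(n):
        # Similarities to other items that share the anchor's concept.
        pos = [sim[i][j] for j in range(n) if j != i and labels[j] == labels[i]]
        # Similarities to items that belong to a different concept.
        neg = [sim[i][j] for j in range(n) if labels[j] != labels[i]]
        if pos:
            # Grows as positives become less similar to the anchor.
            total += math.log1p(sum(math.exp(-alpha * (s - base)) for s in pos)) / alpha
        if neg:
            # Grows as negatives become more similar to the anchor.
            total += math.log1p(sum(math.exp(beta * (s - base)) for s in neg)) / beta
    return total / n


# Toy batch: two mentions of concept "A" and two of concept "B".
labels = ["A", "A", "B", "B"]
well_separated = [
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.9],
    [0.1, 0.1, 0.9, 1.0],
]
poorly_separated = [
    [1.0, 0.1, 0.9, 0.9],
    [0.1, 1.0, 0.9, 0.9],
    [0.9, 0.9, 1.0, 0.1],
    [0.9, 0.9, 0.1, 1.0],
]
print(multi_similarity_loss(well_separated, labels))
print(multi_similarity_loss(poorly_separated, labels))
```

In this toy batch the loss is lower when synonyms of the same concept are close together and different concepts are far apart, which is the geometry a bi-encoder for entity linking aims for.
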
## 💡 Intended Use
- **Domain**: Spanish clinical NLP
- **Tasks**: Linking mentions of diseases, symptoms, and procedures to SNOMED-CT
- **Evaluated On**: DisTEMIST, MedProcNER, SympTEMIST
- **Users**: Researchers and practitioners working in clinical NLP

## 📈 Performance Summary (Top-25 Accuracy)

| Model               | DisTEMIST | MedProcNER | SympTEMIST |
|---------------------|-----------|------------|------------|
| **ClinLinker**      | **0.845** | **0.898**  | **0.909**  |
| ClinLinker-KB-P     | 0.853     | 0.891      | 0.918      |
| ClinLinker-KB-GP    | 0.864     | 0.901      | 0.922      |
| SapBERT-XLM-R-large | 0.800     | 0.850      | 0.827      |
| RoBERTa biomedical  | 0.600     | 0.668      | 0.609      |

*Bold marks the model described in this card. Results are computed on the cleaned gold standard, which excludes "NO CODE" and "COMPOSITE" annotations.*

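For readers unfamiliar with the metric, the sketch below shows one common way to compute top-k accuracy: a mention counts as correct when its gold code appears among the first k ranked candidates. The function name, the input layout, and the placeholder codes are assumptions for illustration, and this is not the evaluation script behind the table.

```python
def top_k_accuracy(ranked_candidates, gold_codes, k=25):
    """Fraction of mentions whose gold code is among the first k candidates.

    ranked_candidates -- one list of candidate codes per mention, best first
    gold_codes        -- the gold code of each mention, in the same order
    """
    if len(ranked_candidates) != len(gold_codes):
        raise ValueError("one candidate list is required per gold code")
    if not gold_codes:
        return 0.0
    hits = sum(
        gold in candidates[:k]
        for candidates, gold in zip(ranked_candidates, gold_codes)
    )
    return hits / len(gold_codes)


# Toy data: placeholder codes, not real SNOMED-CT identifiers.
predictions = [
    ["c1", "c7", "c3"],
    ["c4", "c2", "c9"],
    ["c5", "c6", "c8"],
]
gold = ["c7", "c9", "c0"]

print(top_k_accuracy(predictions, gold, k=1))
print(top_k_accuracy(predictions, gold, k=3))
```
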
## 🧪 Usage

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Load the tokenizer and encoder from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("ICB-UMA/ClinLinker")
model = AutoModel.from_pretrained("ICB-UMA/ClinLinker")

mention = "insuficiencia renal aguda"  # "acute renal failure"
inputs = tokenizer(mention, return_tensors="pt")

# Encode the mention without tracking gradients.
with torch.no_grad():
    outputs = model(**inputs)

# Use the hidden state at the first token position as the mention embedding.
embedding = outputs.last_hidden_state[:, 0, :]
print(embedding.shape)  # (batch_size, hidden_size)
```

For scalable retrieval over a large set of concept embeddings, use [Faiss](https://github.com/facebookresearch/faiss) or the [`FaissEncoder`](https://github.com/ICB-UMA/KnowledgeGraph) class from the ICB-UMA/KnowledgeGraph repository.

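The retrieval step itself amounts to ranking concept embeddings by similarity to the mention embedding. The sketch below does this exhaustively with cosine similarity in plain Python, as a stand-in for what a Faiss index does at scale. The function names, the three-dimensional toy vectors, and the concept identifiers are hypothetical, and the sketch does not use the `FaissEncoder` API.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_concepts(mention_vec, concept_vecs, top_k=25):
    """Return the top_k (code, score) pairs, most similar first."""
    scored = [
        (code, cosine_similarity(mention_vec, vec))
        for code, vec in concept_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy embeddings; real ones would come from the encoder shown above.
concept_vecs = {
    "concept_a": [0.9, 0.1, 0.0],
    "concept_b": [0.1, 0.9, 0.1],
    "concept_c": [0.0, 0.2, 0.9],
}
mention_vec = [0.8, 0.2, 0.1]

ranking = rank_concepts(mention_vec, concept_vecs, top_k=2)
for code, score in ranking:
    print(code, round(score, 3))
```

An exhaustive scan like this is linear in the number of concepts per query, which is why an index such as Faiss is preferable for a full ontology.
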
## ⚠️ Limitations
- The model is optimized for Spanish clinical data and may underperform outside this domain.
- Outputs should be validated by domain experts before use in critical applications.

## 📚 Citation

> Gallego, F., López-García, G., Gasco-Sánchez, L., Krallinger, M., Veredas, F.J. (2024). ClinLinker: Medical Entity Linking of Clinical Concept Mentions in Spanish. In: Franco, L., de Mulatier, C., Paszynski, M., Krzhizhanovskaya, V.V., Dongarra, J.J., Sloot, P.M.A. (eds) Computational Science – ICCS 2024. Lecture Notes in Computer Science, vol 14836. Springer, Cham. https://doi.org/10.1007/978-3-031-63775-9_19

## Authors

Fernando Gallego, Guillermo López-García, Luis Gasco-Sánchez, Martin Krallinger, Francisco J. Veredas