---
license: apache-2.0
language:
- es
base_model:
- PlanTL-GOB-ES/roberta-base-biomedical-clinical-es
tags:
- medical
- spanish
- bi-encoder
- entity-linking
- sapbert
- umls
- snomed-ct
---

# **ClinLinker**

## Model Description

ClinLinker is a state-of-the-art bi-encoder model for medical entity linking (MEL) in Spanish, optimized for clinical-domain tasks. It enriches concept representations with synonyms from the UMLS and SNOMED-CT ontologies. The model was trained with a contrastive-learning strategy that combines hard negative mining with a multi-similarity loss.
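To make the training objective more concrete, here is a minimal pure-Python sketch of a multi-similarity loss over a small batch. It is an illustration under assumptions, not the training code used for ClinLinker: the function name, the `alpha`, `beta`, and `lam` values, and the toy similarity matrices are all hypothetical, and the hard-negative mining step is omitted.

```python
import math

def multi_similarity_loss(sim, labels, alpha=2.0, beta=50.0, lam=0.5):
    """Multi-similarity loss over a square similarity matrix.

    sim[i][k] is the similarity between items i and k; labels[i] is the
    concept code of item i. alpha, beta and lam are illustrative values,
    not the settings used to train ClinLinker. No pair mining is applied.
    """
    n = len(labels)
    total = 0.0
    for i in range(n):
        # Positives share the anchor's code; negatives carry a different one.
        pos = [sim[i][k] for k in range(n) if k != i and labels[k] == labels[i]]
        neg = [sim[i][k] for k in range(n) if labels[k] != labels[i]]
        if pos:
            # Penalizes positives whose similarity falls below lam.
            total += math.log1p(sum(math.exp(-alpha * (s - lam)) for s in pos)) / alpha
        if neg:
            # Penalizes negatives whose similarity rises above lam.
            total += math.log1p(sum(math.exp(beta * (s - lam)) for s in neg)) / beta
    return total / n

# Toy batch: two mentions per (placeholder) concept code.
labels = ["C1", "C1", "C2", "C2"]
well_separated = [
    [1.0, 0.9, 0.1, 0.0],
    [0.9, 1.0, 0.0, 0.1],
    [0.1, 0.0, 1.0, 0.9],
    [0.0, 0.1, 0.9, 1.0],
]
poorly_separated = [
    [1.0, 0.1, 0.9, 0.8],
    [0.1, 1.0, 0.8, 0.9],
    [0.9, 0.8, 1.0, 0.1],
    [0.8, 0.9, 0.1, 1.0],
]
print(multi_similarity_loss(well_separated, labels))
print(multi_similarity_loss(poorly_separated, labels))
```

The loss is lower when synonyms of the same concept are close together and different concepts are far apart, which is the geometry a bi-encoder needs for nearest-neighbour linking.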

## 💡 Intended Use
- **Domain**: Spanish Clinical NLP
- **Tasks**: Entity linking (diseases, symptoms, procedures) to SNOMED-CT
- **Evaluated On**: DisTEMIST, MedProcNER, SympTEMIST
- **Users**: Researchers and practitioners working in clinical NLP

## 📈 Performance Summary (Top-25 Accuracy)

| Model               | DisTEMIST | MedProcNER | SympTEMIST |
|--------------------|-----------|------------|------------|
| **ClinLinker**     | **0.845** | **0.898**  | **0.909**  |
| ClinLinker-KB-P    | 0.853     | 0.891      | 0.918      |
| ClinLinker-KB-GP   | 0.864     | 0.901      | 0.922      |
| SapBERT-XLM-R-large| 0.800     | 0.850      | 0.827      |
| RoBERTa biomedical | 0.600     | 0.668      | 0.609      |

*Bold marks the model described in this card, not the best score in each column. Results correspond to the cleaned gold-standard version (no "NO CODE" or "COMPOSITE").*
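For readers reproducing the table, the sketch below shows one way to compute top-k accuracy. It assumes the usual definition, which this card does not spell out: the fraction of mentions whose gold code appears among the first k retrieved candidates. The function name and the single-letter codes are placeholders, not real SNOMED-CT identifiers.

```python
def top_k_accuracy(ranked_candidates, gold_codes, k=25):
    """Fraction of mentions whose gold code is among the top-k candidates.

    ranked_candidates[i] is the list of candidate codes for mention i,
    ordered from most to least similar; gold_codes[i] is its gold code.
    """
    if not gold_codes:
        raise ValueError("gold_codes must not be empty")
    hits = sum(
        gold in candidates[:k]
        for candidates, gold in zip(ranked_candidates, gold_codes)
    )
    return hits / len(gold_codes)

# Placeholder codes for three mentions.
ranked = [["A", "B", "C"], ["D", "E", "F"], ["G", "H", "I"]]
gold = ["B", "F", "Z"]
print(top_k_accuracy(ranked, gold, k=2))
print(top_k_accuracy(ranked, gold, k=3))
```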

## 🧪 Usage

```python
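import torch
from transformers import AutoModel, AutoTokenizer

# Load the encoder and its tokenizer from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("ICB-UMA/ClinLinker")
model = AutoModel.from_pretrained("ICB-UMA/ClinLinker")
model.eval()  # inference mode (disables dropout)

mention = "insuficiencia renal aguda"
inputs = tokenizer(mention, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Use the hidden state of the first token as the mention embedding
embedding = outputs.last_hidden_state[:, 0, :]
print(embedding.shape)  # (batch_size, hidden_size)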
```

For scalable retrieval, use [Faiss](https://github.com/facebookresearch/faiss) or the [`FaissEncoder`](https://github.com/ICB-UMA/KnowledgeGraph) class.
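As a rough illustration of the retrieval step, the sketch below ranks candidate codes by exhaustive cosine similarity in pure Python. It is not the `FaissEncoder` API: the function names, the toy 3-dimensional vectors, and the placeholder codes are hypothetical, and cosine similarity is an assumed choice of metric. A Faiss index would replace the exhaustive loop for a full terminology.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_candidates(mention_vec, term_vecs, term_codes, k=25):
    """Return up to k distinct codes, most similar terms first.

    term_vecs[i] is the embedding of a terminology entry (for example a
    synonym) and term_codes[i] is the code that entry belongs to.
    """
    scored = sorted(
        zip(term_codes, (cosine(mention_vec, vec) for vec in term_vecs)),
        key=lambda pair: pair[1],
        reverse=True,
    )
    ranked = []
    for code, _score in scored:
        if code not in ranked:  # several synonyms can map to one code
            ranked.append(code)
        if len(ranked) == k:
            break
    return ranked

# Toy stand-ins for embeddings produced by the encoder.
term_vecs = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
term_codes = ["C1", "C1", "C2", "C3"]  # placeholder codes
mention_vec = [1.0, 0.05, 0.0]
print(rank_candidates(mention_vec, term_vecs, term_codes, k=2))
```

Deduplicating by code matters because each concept is indexed once per synonym, so the raw nearest neighbours can repeat the same code several times.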

## ⚠️ Limitations
- The model is optimized for Spanish clinical data and may underperform outside this domain.
- Expert validation is advised in critical applications.

## 📚 Citation

> Gallego, F., López-García, G., Gasco-Sánchez, L., Krallinger, M., Veredas, F.J. (2024). ClinLinker: Medical Entity Linking of Clinical Concept Mentions in Spanish. In: Franco, L., de Mulatier, C., Paszynski, M., Krzhizhanovskaya, V.V., Dongarra, J.J., Sloot, P.M.A. (eds) Computational Science – ICCS 2024. Lecture Notes in Computer Science, vol 14836. Springer, Cham. https://doi.org/10.1007/978-3-031-63775-9_19

## Authors

Fernando Gallego, Guillermo López-García, Luis Gasco-Sánchez, Martin Krallinger, Francisco J. Veredas