---
license: mit
---

<h1 align="center">
MedTok: Multimodal Medical Code Tokenizer
</h1>

## Overview of MedTok

MedTok is a multimodal tokenizer for medical codes. It combines the text descriptions of codes with graph-based representations of the dependencies between codes, derived from clinical ontologies and standard medical terminologies. As a general-purpose tokenizer, MedTok can be integrated into any transformer-based model or system that requires tokenization.

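## How to use MedTok?

```python
from transformers import AutoTokenizer

# trust_remote_code=True lets transformers run the tokenizer code shipped in the model repository
tokenizer = AutoTokenizer.from_pretrained("mims-harvard/MedTok", trust_remote_code=True)

tokens = tokenizer("E11.9")       # tokenize a single medical code
embed = tokenizer.embed("E11.9")  # quantized embedding of the same code
```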

- `embed` is the quantized embedding of the input medical code.

To use the tokenized embedding of each medical code, download it directly from [mims-harvard/MedTok](https://huggingface.co/mims-harvard/MedTok) or [code2embeddings.json.zip](https://doi.org/10.7910/DVN/7XNT3M). Place the downloaded embedding file at `MedTok/embedding.npy` to run the EHR or QA tasks based on MedTok.

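As a rough illustration of how a downloaded embedding file might be consumed, the sketch below assumes the unzipped `code2embeddings.json` is a flat JSON object mapping each code string to a list of floats. That layout is a guess based on the file name and should be checked against the actual file. `load_code_embeddings` and `cosine_similarity` are hypothetical helpers, not part of MedTok.

```python
import json
import math


def load_code_embeddings(path):
    """Read a JSON file assumed to map each medical code to a list of floats.

    Assumption: the real file may use a different layout; inspect it first.
    """
    with open(path, "r", encoding="utf-8") as f:
        raw = json.load(f)
    return {code: [float(x) for x in vec] for code, vec in raw.items()}


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)
```

Under these assumptions, `load_code_embeddings("code2embeddings.json")` would return a dictionary keyed by code, and `cosine_similarity` could then compare the vectors of any two codes present in that dictionary.
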
### 🏥 MedTok for EHR & MedicalQA

Please refer to our GitHub repository, [MedTok](https://github.com/mims-harvard/MedTok).

### Note

MedTok tokenizer V1.0 currently supports only the medical codes used in our paper. Any unseen code is mapped to the `<unk>` token. We will continue updating MedTok to cover more coding systems and to tokenize medical codes dynamically.

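Because unseen codes fall back to `<unk>`, it can help to separate supported from unsupported codes before tokenizing a batch. The sketch below is one possible pre-check, not part of MedTok: `split_known_codes` is a hypothetical helper, and `vocabulary` is assumed to be a set of supported code strings that you have obtained yourself (for example, the keys of a downloaded code-to-embedding mapping).

```python
def split_known_codes(codes, vocabulary):
    """Partition codes into those found in `vocabulary` and those that are not.

    Assumption: `vocabulary` is any container of supported code strings.
    Codes are only stripped of surrounding whitespace; case is left unchanged.
    """
    known, unseen = [], []
    for code in codes:
        normalized = code.strip()
        if normalized in vocabulary:
            known.append(normalized)
        else:
            unseen.append(normalized)
    return known, unseen
```

Codes returned in `unseen` are the ones that would be expected to map to `<unk>`, so they can be logged or handled separately instead of silently sharing one token.
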
## Citation

```bibtex
@article{su2025multimodal,
  title={Multimodal Medical Code Tokenizer},
  author={Su, Xiaorui and Messica, Shvat and Huang, Yepeng and Johnson, Ruth and Fesser, Lukas and Gao, Shanghua and Sahneh, Faryad and Zitnik, Marinka},
  journal={International Conference on Machine Learning, ICML},
  year={2025}
}
```

## Contact

Thank you for your support!

If you have any questions or suggestions, please email [Xiaorui Su](mailto:xiaorui_su@hms.harvard.edu) or [Marinka Zitnik](mailto:marinka@hms.harvard.edu).
