	---
	license: mit
	---
	<h1 align="center">
	MedTok: Multimodal Medical Code Tokenizer
	</h1>

	## Overview of MedTok
	MedTok is a multimodal tokenizer for medical codes. It combines the text description of each code with a graph-based representation of the dependencies between codes, derived from clinical ontologies and standard medical terminologies. As a general-purpose tokenizer, MedTok can be integrated into any transformer-based model or system that requires tokenization.

	## How to use MedTok?
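	```python
	from transformers import AutoTokenizer
	# trust_remote_code=True allows transformers to run the custom tokenizer code shipped in the repo
	tokenizer = AutoTokenizer.from_pretrained("mims-harvard/MedTok", trust_remote_code=True)
	# "E11.9" is an ICD-10-CM code (type 2 diabetes mellitus without complications)
	tokens = tokenizer("E11.9")
	embed = tokenizer.embed("E11.9")
	```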
	- `embed` is the quantized embedding of the input medical code.

	To use the tokenized embedding of each medical code directly, download it from [mims-harvard/MedTok](https://huggingface.co/mims-harvard/MedTok) or [code2embeddings.json.zip](https://doi.org/10.7910/DVN/7XNT3M). Place the downloaded embedding file at `MedTok/embedding.npy` to run the EHR or QA tasks built on MedTok.
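
For readers who want to inspect the downloaded embeddings outside the MedTok pipeline, the following is a minimal sketch using only the Python standard library. It assumes the JSON file is a single object that maps each medical code to a list of floats; that layout is an assumption, not something this card documents, so adjust the loader if the real file differs. The helper names `load_code_embeddings` and `lookup` are hypothetical.

```python
import json
import zipfile
from pathlib import Path


def load_code_embeddings(path):
    """Load a {code: vector} mapping from a .json file or a .zip containing one.

    Assumption: the JSON is one object mapping each medical code
    (e.g. "E11.9") to a list of floats. Adjust if the real layout differs.
    """
    path = Path(path)
    if path.suffix == ".zip":
        with zipfile.ZipFile(path) as archive:
            # Use the first .json member found in the archive.
            name = next(n for n in archive.namelist() if n.endswith(".json"))
            with archive.open(name) as handle:
                return json.load(handle)
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)


def lookup(embeddings, code, default=None):
    """Return the vector for `code`, or `default` if the code is not covered."""
    return embeddings.get(code, default)
```

A call such as `lookup(load_code_embeddings("code2embeddings.json.zip"), "E11.9")` would then return the stored vector, or `None` when the code is missing from the file.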

	### 🏥 MedTok for EHR & Medical QA
	Please refer to our GitHub repository, [MedTok](https://github.com/mims-harvard/MedTok).

	### Note
	MedTok tokenizer V1.0 currently supports only the medical codes used in our paper. Any unseen code is mapped to the `<unk>` token. We will continue to update MedTok so that it covers more coding systems and tokenizes medical codes dynamically.
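
Because unseen codes come back as `<unk>`, it can help to check coverage before running a downstream task. The sketch below separates covered codes from unseen ones. It takes any `tokenize` callable that returns a list of token strings; how to obtain token strings from the MedTok tokenizer is not documented in this card, so `toy_tokenize` is a stand-in for illustration only, and `split_by_coverage` is a hypothetical helper name.

```python
from typing import Callable, Iterable, List, Tuple


def split_by_coverage(
    codes: Iterable[str],
    tokenize: Callable[[str], List[str]],
    unk_token: str = "<unk>",
) -> Tuple[List[str], List[str]]:
    """Split codes into (covered, unseen); a code is unseen if its tokens include unk_token."""
    covered, unseen = [], []
    for code in codes:
        tokens = tokenize(code)
        (unseen if unk_token in tokens else covered).append(code)
    return covered, unseen


# Stand-in tokenizer for illustration only; it is NOT MedTok.
def toy_tokenize(code: str) -> List[str]:
    known = {"E11.9": ["tok_12", "tok_7"]}
    return known.get(code, ["<unk>"])


covered, unseen = split_by_coverage(["E11.9", "X00.0"], toy_tokenize)
print("covered:", covered)
print("unseen:", unseen)
```

To use this with the real tokenizer, replace `toy_tokenize` with a small adapter that converts the tokenizer's output into token strings; the exact adapter depends on the output format of the remote tokenizer code.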

	## Citation
	```bibtex
	@article{su2025multimodal,
	title={Multimodal Medical Code Tokenizer},
	author={Su, Xiaorui and Messica, Shvat and Huang, Yepeng and Johnson, Ruth and Fesser, Lukas and Gao, Shanghua and Sahneh, Faryad and Zitnik, Marinka},
	journal={International Conference on Machine Learning, ICML},
	year={2025}
	}
	```

	## Contact
	Thank you for your support!
	If you have any questions or suggestions, please email [Xiaorui Su](mailto:xiaorui_su@hms.harvard.edu) and [Marinka Zitnik](mailto:marinka@hms.harvard.edu).