Blair1213 committed on
Commit e20ccc2 · verified
1 Parent(s): 6957bb5

Create README.md

  1. README.md +34 -0
README.md ADDED
@@ -0,0 +1,34 @@
+ ---
+ license: mit
+ ---
+ <h1 align="center">
+ MedTok: Multimodal Medical Code Tokenizer
+ </h1>
+
+ ## Overview of MedTok
+ MedTok is a multimodal tokenizer for medical codes that combines the text descriptions of codes with graph-based representations of the dependencies between codes, derived from clinical ontologies and standard medical terminologies. It is a general-purpose tokenizer that can be integrated into any transformer-based model or system that requires tokenization.
+
+ ## How to use MedTok?
+ ```python
+ from transformers import AutoTokenizer
+ tokenizer = AutoTokenizer.from_pretrained("mims-harvard/MedTok")  # load from the Hugging Face Hub
+ tokens = tokenizer.tokenize("E11.9")  # token strings for the medical code
+ ids = tokenizer.encode("E11.9")  # token ids for the medical code
+ embed = tokenizer.embed("E11.9")  # embedding of the medical code
+ ```
+
+ To use the precomputed embedding of each medical code, download it from [mims-harvard/MedTok](https://huggingface.co/mims-harvard/MedTok) or directly as [code2embeddings.json.zip](https://doi.org/10.7910/DVN/7XNT3M). Place the downloaded embedding file at 'MedTok/embedding.npy' to run the EHR or QA tasks built on MedTok.
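
To make the download step concrete, here is a minimal sketch of reading the zipped embedding file and looking up one code. It is not part of the committed README. It assumes that the archive contains a single JSON object mapping each medical code to a list of floats; the README does not describe the file layout, so check the downloaded file first. The helper names `load_code_embeddings` and `lookup` are hypothetical, and the sketch does not cover producing the `.npy` file mentioned above.

```python
import json
import zipfile


def load_code_embeddings(zip_path, member=None):
    """Load a code -> embedding mapping from a zipped JSON file.

    Assumption (not confirmed by the README): the archive holds one JSON
    object whose keys are medical codes and whose values are lists of floats.
    """
    with zipfile.ZipFile(zip_path) as archive:
        # Fall back to the first .json member when no name is given.
        if member is None:
            member = next(n for n in archive.namelist() if n.endswith(".json"))
        with archive.open(member) as handle:
            return json.load(handle)


def lookup(embeddings, code):
    """Return the embedding stored for `code`, or None if it is absent."""
    return embeddings.get(code)


# Hypothetical usage, once the archive has been downloaded:
# embeddings = load_code_embeddings("code2embeddings.json.zip")
# vector = lookup(embeddings, "E11.9")
```

Returning `None` for an unknown code keeps the lookup explicit, so a caller can decide how to handle codes that are missing from the file.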
+
+ ### 🏥 MedTok for EHR & MedicalQA
+ See our GitHub repository, [MedTok](https://github.com/mims-harvard/MedTok).
+
+ ## Citation
+ ```bibtex
+ @article{su2025multimodal,
+ title={Multimodal Medical Code Tokenizer},
+ author={Su, Xiaorui and Messica, Shvat and Huang, Yepeng and Johnson, Ruth and Fesser, Lukas and Gao, Shanghua and Sahneh, Faryad and Zitnik, Marinka},
+ journal={International Conference on Machine Learning, ICML},
+ year={2025}
+ }
+ ```
+ </details>