Add comprehensive model card for EHR-R1-1.7B

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +112 -0
README.md ADDED
@@ -0,0 +1,112 @@
---
license: cc-by-nc-4.0
pipeline_tag: text-generation
library_name: transformers
tags:
- medical
- healthcare
- ehr
- reasoning
- qwen
---

# EHR-R1-1.7B: A Reasoning-Enhanced Foundational Language Model for Electronic Health Record Analysis

This repository contains the **EHR-R1-1.7B** model, part of the **EHR-R1** series, as presented in the paper [EHR-R1: A Reasoning-Enhanced Foundational Language Model for Electronic Health Record Analysis](https://huggingface.co/papers/2510.25628).

**EHR-R1** is a family of reasoning-enhanced Large Language Models (LLMs) tailored for Electronic Health Record (EHR) analysis. It is built on **EHR-Ins**, a large-scale, comprehensive EHR reasoning instruction dataset, and trained through a multi-stage paradigm of domain adaptation, reasoning enhancement, and reinforcement learning. This paradigm systematically instills domain knowledge and diverse reasoning capabilities, enabling accurate and robust EHR analysis. The project also introduces **EHR-Bench**, a benchmark curated from MIMIC-IV for comprehensive assessment across 42 distinct EHR tasks.

* **Paper**: [https://huggingface.co/papers/2510.25628](https://huggingface.co/papers/2510.25628)
* **GitHub Repository**: [https://github.com/MAGIC-AI4Med/EHR-R1](https://github.com/MAGIC-AI4Med/EHR-R1)

<p align="center">
  <img src="https://github.com/MAGIC-AI4Med/EHR-R1/raw/main/assets/teaser.png" alt="EHR-R1 Teaser Image" width="800">
</p>

## 💡 Key Highlights
* We open-source a large-scale instruction dataset, [**EHR-Ins**](data_url), comprising 3.5M non-reasoning and 300k reasoning samples.
* We open-source a comprehensive benchmark, [**EHR-Bench**](data_url), covering 42 distinct EHR analysis tasks.
* We open-source the reasoning-enhanced EHR LLMs **EHR-R1**, including [**EHR-R1-1.7B**](https://huggingface.co/BlueZeros/EHR-R1-1.7B), [**EHR-R1-8B**](https://huggingface.co/BlueZeros/EHR-R1-8B), and [**EHR-R1-72B**](https://huggingface.co/BlueZeros/EHR-R1-72B).
* We open-source the "thinking-graph" pipeline, which synthesizes reasoning chains for EHR analysis tasks based on the relations among EHR entities.

## ⚡ Direct Use

### EHR Input Format
For any EHR data, format the input in Markdown as shown below.
* For an event with a single record:
```markdown
## Event Name [Event Time (YYYY-MM-DD HH:MM:SS)]
- ItemKey_1: ItemValue_1
- ItemKey_2: ItemValue_2
- ItemKey_3: ItemValue_3
```
* For an event with multiple records (e.g., labevents):
```markdown
## Event Name [Event Time (YYYY-MM-DD HH:MM:SS)]
| ItemKey_1 | ItemKey_2 | ItemKey_3 |
| --------- | --------- | --------- |
| ItemValue_1 | ItemValue_2 | ItemValue_3 |
| ItemValue_1 | ItemValue_2 | ItemValue_3 |
| ItemValue_1 | ItemValue_2 | ItemValue_3 |
```

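As an illustration, the single-record layout above can be produced with a small helper. This function is hypothetical (not part of the released code); the event name, timestamp, and item pairs are placeholder values:

```python
def format_single_event(name, time, items):
    """Render an event with a single record in the expected Markdown format.

    `name` is the event name, `time` a 'YYYY-MM-DD HH:MM:SS' string,
    and `items` a list of (key, value) pairs.
    """
    lines = [f"## {name} [{time}]"]
    lines += [f"- {key}: {value}" for key, value in items]
    return "\n".join(lines)

# Example with made-up values:
block = format_single_event(
    "Admissions", "2180-05-06 22:23:00",
    [("admission_type", "URGENT"), ("insurance", "Medicaid")],
)
print(block)
```

The multi-record table layout could be built analogously by emitting a header row from the item keys and one `| ... |` row per record.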
### Model Inference with Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "BlueZeros/EHR-R1-1.7B"  # this specific EHR-R1-1.7B model
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

ehr_input = "{YOUR FORMATTED EHR INPUT}"
instruction = "{YOUR TASK INSTRUCTION}"
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": ehr_input + "\n" + instruction}
]

# For EHR-R1-1.7B & EHR-R1-8B, control the reasoning mode by setting enable_thinking
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,
)
# For EHR-R1-72B, you can instead manually append an empty think block
# to the prompt to close the reasoning mode:
# text += "<think>\n\n</think>\n"

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=2048,
    do_sample=False  # greedy decoding
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```

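When the reasoning mode is enabled, the decoded response may contain a `<think>...</think>` block before the final answer. A minimal sketch for separating the two, assuming that exact tag format (the sample response string is made up):

```python
import re

def split_reasoning(response):
    """Split a decoded response into (reasoning, answer).

    Assumes the chain of thought is wrapped in <think>...</think>
    ahead of the final answer; returns empty reasoning otherwise.
    """
    match = re.search(r"<think>(.*?)</think>", response, flags=re.DOTALL)
    if match is None:
        return "", response.strip()
    reasoning = match.group(1).strip()
    answer = response[match.end():].strip()
    return reasoning, answer

# Example with a made-up response:
reasoning, answer = split_reasoning(
    "<think>\nThe labs suggest sepsis.\n</think>\nDiagnosis: sepsis."
)
```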
## 📖 Citation
If you find our work helpful or inspiring, please feel free to cite it:
```bibtex
@article{liao2025ehrr1,
  title={{EHR-R1: A Reasoning-Enhanced Foundational Language Model for Electronic Health Record Analysis}},
  author={Liao, Yusheng and Wu, Chaoyi and Liu, Junwei and Jiang, Shuyang and Qiu, Pengcheng and Wang, Haowen and Yue, Yun and Zhen, Shuai and Wang, Jian and Fan, Qianrui and Gu, Jinjie and Zhang, Ya and Wang, Yanfeng and Wang, Yu and Xie, Weidi},
  journal={arXiv preprint arXiv:2510.25628},
  year={2025}
}
```