hamsinimk committed · verified
Commit 2516548 · 1 Parent(s): fd3e401

Update README.md

Files changed (1)
  1. README.md +15 -109
README.md CHANGED
@@ -18,9 +18,18 @@ This model takes in doctor's notes as inputs and summarizes them into patient-fr

  ## Introduction

- Healthcare communication and health literacy is a large gap that exists in the healthcare industry between physicians and patients. I often run into the issue of reading long doctor’s notes that are uploaded to my patient portal which I cannot understand. This is definitely a frustrating experience as a patient because the notes are not only long, but also include a lot of technical jargon surrounding the diagnosis which is overwhelming. I often look to doctors in my family to translate doctor’s notes and understand if / what the next steps are. Instead of having a middle man translate the notes, I’m hoping that the LLM can take doctor’s notes as input and summarize them into short, simple notes. I think current LLMs do need training in this as it is a niche topic and I would want to ensure that accuracy and key details are preserved when providing simple summaries to users/patients. Current LLMs may brush over key details if they haven’t been trained specifically in clinical/doctor’s notes or a large enough dataset to understand the context and style of writing. In my own experience, LLMs are great at summarizing, but can lack specificity or leave out information at times. As mentioned in Medium post by Sahil Ahmed (Data Scientist), LLMs, in general, as well as ones that implement RAG systems are not without their disadvantages. Ahmed notes one such failure point as “context limitation” which happens when many documents are passed through the LLM model which forces the system to “consolidate them to fit the LLM’s input limits, which may lead to truncation or selective prioritization, potentially leaving out crucial information” (Sahin Ahmed, 2024). In this medical use case, it is extremely important to maintain the accuracy for the patient such that key details are not brushed over so the model’s summarized output can be relied on for next steps. 
To ensure this accuracy, I think developing a LLM that is dedicated to this use case and has been trained specifically on doctor’s notes and summaries is key. This way, any noise from other unrelated training data can be avoided. I think current LLMs do need training in this as it is a niche topic and I would want to ensure that accuracy and key details are preserved when providing simple summaries to users/patients. Current LLMs may brush over key details if they haven’t been trained specifically in clinical/doctor’s notes or a large enough dataset to understand the context and style of writing. In my own experience, LLMs are great at summarizing, but can lack specificity or leave out information at times. As mentioned in Medium post by Sahil Ahmed (Data Scientist), LLMs, in general, as well as ones that implement RAG systems are not without their disadvantages. Ahmed notes one such failure point as “context limitation” which happens when many documents are passed through the LLM model which forces the system to “consolidate them to fit the LLM’s input limits, which may lead to truncation or selective prioritization, potentially leaving out crucial information” (Sahin Ahmed, 2024). In this medical use case, it is extremely important to maintain the accuracy for the patient such that key details are not brushed over so the model’s summarized output can be relied on for next steps. To ensure this accuracy, I think developing a LLM that is dedicated to this use case and has been trained specifically on doctor’s notes and summaries is key. This way, any noise from other unrelated training data can be avoided.

  ### Model Description

 
@@ -48,136 +57,33 @@ This is the model card of a 🤗 transformers model that has been pushed on the

  <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

- ### Direct Use
-
- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-
- [More Information Needed]
-
- ### Downstream Use [optional]
-
- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-
- [More Information Needed]
-
- ### Out-of-Scope Use
-
- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-
- [More Information Needed]
-
- ## Bias, Risks, and Limitations

  <!-- This section is meant to convey both technical and sociotechnical limitations. -->

  [More Information Needed]

- ### Recommendations
-
  <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

  Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

- ## How to Get Started with the Model

  Use the code below to get started with the model.

  [More Information Needed]

- ## Training Details
-
- ### Training Data
-
- <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-
- [More Information Needed]
-
- ### Training Procedure
-
- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-
- #### Preprocessing [optional]
-
- [More Information Needed]
-
- #### Training Hyperparameters
-
- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
-
- #### Speeds, Sizes, Times [optional]
-
- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-
- [More Information Needed]
-
- ## Evaluation
-
- <!-- This section describes the evaluation protocols and provides the results. -->
-
- ### Testing Data, Factors & Metrics
-
- #### Testing Data
-
- <!-- This should link to a Dataset Card if possible. -->
-
- [More Information Needed]
-
- #### Factors
-
- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-
- [More Information Needed]
-
- #### Metrics
-
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
-
- [More Information Needed]
-
- ### Results
-
- [More Information Needed]
-
- #### Summary
-
- ## Model Examination [optional]
-
- <!-- Relevant interpretability work for the model goes here -->
-
- [More Information Needed]
-
- ## Environmental Impact
-
- <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
-
- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-
- - **Hardware Type:** [More Information Needed]
- - **Hours used:** [More Information Needed]
- - **Cloud Provider:** [More Information Needed]
- - **Compute Region:** [More Information Needed]
- - **Carbon Emitted:** [More Information Needed]
-
- ## Technical Specifications [optional]
-
- ### Model Architecture and Objective
-
- [More Information Needed]
-
- ### Compute Infrastructure
-
- [More Information Needed]
-
- #### Hardware
-
- [More Information Needed]
-
- #### Software
-
- [More Information Needed]

  ## Citation [optional]
 
 
  ## Introduction

+ Healthcare communication and health literacy represent a large gap between physicians and patients in the healthcare industry. I often run into the issue of reading long doctor’s notes uploaded to my patient portal that I cannot understand. This is a frustrating experience as a patient: the notes are not only long but also full of technical jargon around the diagnosis, which is overwhelming. I often ask doctors in my family to translate doctor’s notes and explain whether there are next steps and what they are. Instead of relying on a middleman to translate the notes, I’m hoping an LLM can take doctor’s notes as input and summarize them into short, simple notes. I think current LLMs do need training for this, as it is a niche topic, and I want to ensure that accuracy and key details are preserved when providing simple summaries to users/patients. Current LLMs may brush over key details if they haven’t been trained specifically on clinical/doctor’s notes, or on a large enough dataset to understand the context and style of writing. In my own experience, LLMs are great at summarizing but can lack specificity or leave out information at times. As mentioned in a Medium post by data scientist Sahin Ahmed, LLMs in general, as well as ones that implement RAG systems, are not without their disadvantages. Ahmed notes “context limitation” as one such failure point: when many documents are passed to the LLM, the system must “consolidate them to fit the LLM’s input limits, which may lead to truncation or selective prioritization, potentially leaving out crucial information” (Sahin Ahmed, 2024). In this medical use case, it is extremely important to maintain accuracy for the patient so that key details are not brushed over and the model’s summarized output can be relied on for next steps.
+ To ensure this accuracy, I think developing an LLM dedicated to this use case and trained specifically on doctor’s notes and summaries is key; this also avoids noise from other, unrelated training data.

+ ## Data
+
+ Looking into training data generation, I noticed that it is tough to find pairs of long doctor’s notes and patient summaries. Because of this, I used synthetic data generation to produce summaries from existing doctor’s notes. I found a Hugging Face dataset of 30,000 doctor’s notes (PMC-Patients), which I subset to 1,000 rows. As a note, I used google/gemma-3-4b-it as my model for both data generation and training. For data generation, I prompted the model as follows:
+
+ System Prompt: "Imagine you are a useful medical assistant that is trying to summarize doctor notes that were taken during patient visits into patient friendly summaries that are 3-5 sentences long. The goal is just to summarize the given doctor's note and output a 3-5 sentence summary that captures key details of the note without too much medical jargon."
+ User Prompt: "Now provide a 3–5 sentence summary for the doctor's note written for a patient's understanding. Doctor's note: {row["Doctor's Note"]}"
+
+ I set up a for loop over each note in the 1,000-row subset and generated a summary from the prompt instructions above. After generating the summaries, I saved the doctor’s note and summary pairs to a .csv file for later use in training. I used an 80/10/10 train/validation/test split to ensure adequate training and evaluation: the model was trained on 800 note + summary pairs and validated/tested on the remaining 200 pairs.
+
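The generation loop and 80/10/10 split described above can be sketched as follows. This is a minimal illustration, not the repo’s actual script: the file path, column names, and the `summarize` callable (standing in for a call to google/gemma-3-4b-it) are assumptions.

```python
import csv

def generate_pairs(notes, summarize, out_path="note_summary_pairs.csv"):
    """Generate a summary for each doctor's note and save the pairs to a CSV.

    `summarize` is any callable note -> summary string; in the README's setup
    it would wrap a google/gemma-3-4b-it generation call.
    """
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["doctor_note", "summary"])
        for note in notes:  # loop over each doctor's note in the subset
            writer.writerow([note, summarize(note)])
    return out_path

def split_80_10_10(rows):
    """80/10/10 train/validation/test split (800/100/100 on 1,000 rows)."""
    n = len(rows)
    n_train, n_val = int(0.8 * n), int(0.1 * n)
    return rows[:n_train], rows[n_train:n_train + n_val], rows[n_train + n_val:]
```

On the 1,000-row subset this yields the 800 training pairs and 200 validation/test pairs described above.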
+ ## Methodology
+
+ Based on previous experimentation with LoRA, I think the method does decently well at increasing the model’s ability to perform medical reasoning through fine-tuning, given that accuracy increased on a medical training task. With LoRA, the model can actually change how it thinks and reasons, rather than just being conditioned on the fact that the task at hand is medical (which is roughly what prompt tuning does). Since LoRA updates a subset of weight matrices using low-rank adaptation, modifying the attention weight matrices, the model can better learn the reasoning patterns needed to answer more complex questions. For a complex medical dataset and reasoning task, I would therefore choose LoRA as the fine-tuning method.
+
+ After trying three hyperparameter combinations (low-, medium-, and high-capacity LoRA), the medium- and high-capacity runs performed very similarly in training and validation loss. It didn’t make sense to add more parameters with high-capacity LoRA, so I went forward with medium-capacity LoRA, which provided essentially the same performance: r = 32, alpha = 64, and dropout = 15%. To prevent catastrophic forgetting and possibly increase overall accuracy, training ran for 3 epochs with a learning rate of 1e-5 and evaluation every 200 steps. The auto_find_batch_size parameter was not used; instead, per_device_train_batch_size and per_device_eval_batch_size were set to 1. With evaluation every 200 steps, validation and training loss are computed every 200 of the 800 steps per epoch (the training set size at batch size 1), for 2,400 steps total across all 3 epochs.
+
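The medium-capacity configuration above could be expressed with Hugging Face `peft` and `transformers` roughly as below. This is a hedged sketch, not the training script from this repo: `output_dir` is illustrative, `target_modules` is left to peft’s per-model defaults, and some argument names vary across `transformers` versions.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, TrainingArguments

# Medium-capacity LoRA: r=32, alpha=64, 15% dropout (as chosen above).
lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    lora_dropout=0.15,
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="gemma3-note-summarizer-lora",  # illustrative path
    num_train_epochs=3,
    learning_rate=1e-5,
    per_device_train_batch_size=1,   # auto_find_batch_size deliberately unused
    per_device_eval_batch_size=1,
    eval_strategy="steps",           # "evaluation_strategy" in older versions
    eval_steps=200,                  # 800 steps/epoch at batch size 1 -> 2,400 total
)

# base = AutoModelForCausalLM.from_pretrained("google/gemma-3-4b-it")
# model = get_peft_model(base, lora_config)  # attaches the LoRA adapters
```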
 
  ### Model Description

  <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+ ## Evaluation

+ ## Usage and Intended Uses

  <!-- This section is meant to convey both technical and sociotechnical limitations. -->

  [More Information Needed]

+ ## Prompt Format

  <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

  Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

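Arranging the prompts from the Data section in chat-message form, a request to the model might look like the sketch below. This is an assumption about the expected format (the card does not pin it down), and `build_messages` is an illustrative helper, not part of the repo.

```python
SYSTEM_PROMPT = (
    "Imagine you are a useful medical assistant that is trying to summarize "
    "doctor notes that were taken during patient visits into patient friendly "
    "summaries that are 3-5 sentences long. The goal is just to summarize the "
    "given doctor's note and output a 3-5 sentence summary that captures key "
    "details of the note without too much medical jargon."
)

def build_messages(note: str) -> list[dict]:
    """Chat-format messages matching the prompts used for data generation."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": (
            "Now provide a 3-5 sentence summary for the doctor's note "
            f"written for a patient's understanding. Doctor's note: {note}"
        )},
    ]

# With a tokenizer loaded, the text fed to the model would be built via the
# model's chat template, e.g.:
# tok = AutoTokenizer.from_pretrained("google/gemma-3-4b-it")
# text = tok.apply_chat_template(build_messages(note), tokenize=False,
#                                add_generation_prompt=True)
```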
+ ## Expected Output Format

  Use the code below to get started with the model.

  [More Information Needed]

+ ## Limitations

  ## Citation [optional]