row56 committed
Commit a909fe8 · verified · 1 Parent(s): 0df4742

Update README.md

Files changed (1)
  1. README.md +122 -1
README.md CHANGED
@@ -97,6 +97,127 @@ Additionally, the model achieves high transferability on **i2b2** data (1,118 ad
 
  ## Repository Structure
 
- ProtoPatient/ ├── proto_model/ │ ├── proto.py │ ├── utils.py │ ├── metrics.py │ └── init.py ├── config.json ├── model.safetensors ├── tokenizer.json ├── tokenizer_config.json ├── vocab.txt ├── README.md └── .gitattributes
+ ProtoPatient/
+ ├── proto_model/
+ │ ├── proto.py
+ │ ├── utils.py
+ │ ├── metrics.py
+ │ └── __init__.py
+ ├── config.json
+ ├── model.safetensors
+ ├── tokenizer.json
+ ├── tokenizer_config.json
+ ├── vocab.txt
+ ├── README.md
+ └── .gitattributes
+
+
+ ## How to Use the Model
+
+ ### 1. Install Dependencies
+
+ ```bash
+ pip install transformers torch
+ ```
+
+ ### 2. Load the Model via Hugging Face
+
+ ```python
+ import torch
+ from transformers import AutoTokenizer, AutoModel
+
+ repo_id = "row56/ProtoPatient"
+ tokenizer = AutoTokenizer.from_pretrained(repo_id)
+ model = AutoModel.from_pretrained(repo_id)
+ model.eval()  # switch off dropout for inference
+
+ sample_text = "This patient presents with severe headaches and nausea..."
+ inputs = tokenizer(sample_text, return_tensors="pt")
+ with torch.no_grad():  # gradients are not needed at inference time
+     outputs = model(**inputs)
+ # Shape is (batch size, number of tokens, hidden size)
+ print("Output shape:", outputs.last_hidden_state.shape)
+ ```
+
+ ### 3. Interpreting Outputs
+
+ For a full prototypical classification workflow, use the custom modules in `proto_model/` (e.g., `ProtoForMultiLabelClassification`) to inspect:
+ - Which tokens receive high attention for each diagnosis.
+ - Which prototypical patients are retrieved as similar examples.
+
+ Loading the checkpoint with the standard `AutoModel` returns only raw token embeddings; the custom code is required for label-wise attention and prototype retrieval.
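
The code in `proto_model/` is not reproduced here. As a rough illustration of what label-wise attention and prototype scoring mean, the following toy sketch pools token vectors with one attention query per diagnosis and then scores the pooled vector by its distance to that diagnosis's prototype. Everything in it is an assumption made for illustration: the function names, the dot-product attention, the `exp(-distance)` scoring, and the numbers are hypothetical and may differ from the repository's implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def label_wise_attention(token_vectors, label_query):
    """Return (attention weights, attention-weighted note vector) for one label.

    Assumption: attention scores are plain dot products between each token
    vector and a per-label query vector.
    """
    scores = [sum(t * q for t, q in zip(tok, label_query)) for tok in token_vectors]
    weights = softmax(scores)
    dim = len(token_vectors[0])
    pooled = [
        sum(w * tok[d] for w, tok in zip(weights, token_vectors)) for d in range(dim)
    ]
    return weights, pooled


def prototype_score(note_vector, prototype):
    """Score in (0, 1]: closer to the prototype means a higher score.

    Assumption: exp(-Euclidean distance) is used only because it is simple
    and monotone; it is not taken from the repository.
    """
    return math.exp(-math.dist(note_vector, prototype))


# Hypothetical 2-dimensional "token embeddings" for a three-token note.
token_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
label_query = [2.0, 0.0]   # hypothetical query vector for one diagnosis
prototype = [1.0, 0.5]     # hypothetical prototype vector for that diagnosis

weights, note_vector = label_wise_attention(token_vectors, label_query)
print("attention weights:", weights)
print("score:", prototype_score(note_vector, prototype))
```

The attention weights are what a token-highlighting view would display, and the distance to the prototype is what ties a prediction to a prototypical patient.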
+
+ ---
+
+ ### 4. (Optional) Hugging Face Pipelines
+
+ Integrate the model into a pipeline for feature extraction:
+
+ ```python
+ from transformers import pipeline
+
+ repo_id = "row56/ProtoPatient"  # same repository as in step 2
+ extractor = pipeline("feature-extraction", model=repo_id, tokenizer=repo_id)
+ embeddings = extractor("Severe headaches and vomiting...")
+ # Nested list: one input sequence, then one feature vector per token
+ print(len(embeddings), len(embeddings[0]))
+ ```
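
For a single input string, the feature-extraction pipeline returns nested lists shaped (1, number of tokens, hidden size). If one fixed-size vector per note is needed, a common option is to average over the token axis. The README does not prescribe this; the sketch below is only one possibility. It uses a small hand-written stand-in for `embeddings` so that it runs without downloading the model; `mean_pool` and `fake_embeddings` are hypothetical names.

```python
def mean_pool(embeddings):
    """Average the token vectors of the first (and only) sequence."""
    token_vectors = embeddings[0]
    n_tokens = len(token_vectors)
    hidden_size = len(token_vectors[0])
    return [
        sum(tok[d] for tok in token_vectors) / n_tokens for d in range(hidden_size)
    ]


# Stand-in with the same nesting as the pipeline output:
# 1 sequence, 2 tokens, hidden size 3.
fake_embeddings = [[[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]]
note_vector = mean_pool(fake_embeddings)
print(note_vector)  # [2.0, 3.0, 4.0]
```

Note that this plain average also includes the vectors of any special tokens the tokenizer adds.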
+
+ ## Intended Use, Limitations & Ethical Considerations
+
+ ### Intended Use
+
+ - **Research & Education:**
+ ProtoPatient is designed primarily for academic research and educational purposes in clinical NLP.
+
+ - **Interpretability Demonstration:**
+ The model demonstrates how prototype-based methods can provide interpretable multi-label classification on clinical admission notes.
+
+ ### Limitations
+
+ - **Generalization:**
+ The model was trained on MIMIC-III ICU admission notes and evaluated for transfer on i2b2 data; it may not generalize to other patient populations.
+
+ - **Prototype Scope:**
+ The current version uses a single prototype per diagnosis, although some diagnoses have several typical presentations. Supporting multiple prototypes per diagnosis is an area for future improvement.
+
+ - **Inter-diagnosis Relationships:**
+ The model does not explicitly model relationships (e.g., conflicts or comorbidities) between different diagnoses.
+
+ ### Ethical & Regulatory Considerations
+
+ - **Not for Direct Clinical Use:**
+ This model is not intended for direct clinical decision-making. Always consult healthcare professionals.
+
+ - **Bias and Fairness:**
+ Users should be aware of potential biases in the training data; rare conditions in particular may be misclassified.
+
+ - **Patient Privacy:**
+ When applying the model to real clinical data, patient privacy must be strictly maintained.
+
+ ---
+
+ ## Example Interpretability Output
+
+ Based on the approach described in the paper (see Section 5 and Table 5):
+
+ - **Highlighted Tokens:**
+ Tokens such as “worst headache of her life,” “vomiting,” “fever,” and “infiltrate” strongly indicate specific diagnoses.
+
+ - **Prototypical Sample:**
+ A snippet from a training patient with similar text segments provides a rationale for the prediction.
+
+ *This interpretability output aids clinicians in understanding the model's reasoning, for example: "The system suggests intracerebral hemorrhage because the patient's note closely resembles typical cases with that diagnosis."*
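
The two kinds of output above can be sketched in a few lines: rank tokens by their attention weight for one diagnosis, then look up the stored training example whose vector lies closest to the note's vector. This is a toy illustration, not the repository's code. The helper names, the attention weights, the vectors, and the placeholder snippets are all hypothetical.

```python
import math


def top_k_tokens(tokens, weights, k=3):
    """Return the k (token, weight) pairs with the highest attention weight."""
    ranked = sorted(zip(tokens, weights), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


def nearest_example(query_vector, examples):
    """Return the snippet whose vector is closest in Euclidean distance.

    `examples` is a list of (snippet, vector) pairs.
    """
    return min(examples, key=lambda ex: math.dist(query_vector, ex[1]))[0]


# Hypothetical attention weights for one diagnosis over five tokens.
tokens = ["worst", "headache", "of", "her", "life"]
weights = [0.30, 0.40, 0.05, 0.05, 0.20]
print(top_k_tokens(tokens, weights, k=2))

# Hypothetical stored training examples (placeholder snippets, made-up vectors).
examples = [
    ("training note A", [1.0, 0.0]),
    ("training note B", [0.0, 1.0]),
]
print(nearest_example([0.9, 0.1], examples))
```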
+
+ ---
+
+ ## Recommended Citation
+
+ If you use ProtoPatient in your research, please cite:
+
+ ```bibtex
+ @misc{vanaken2022this,
+ title={This Patient Looks Like That Patient: Prototypical Networks for Interpretable Diagnosis Prediction from Clinical Text},
+ author={van Aken, Betty and Papaioannou, Jens-Michalis and Naik, Marcel G. and Eleftheriadis, Georgios and Nejdl, Wolfgang and Gers, Felix A. and L{\"o}ser, Alexander},
+ year={2022},
+ eprint={2210.08500},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL}
+ }
 