Upload 6 files
- app.py +1342 -0
- config.py +86 -0
- create_sample_data.py +251 -0
- preembed_trials.py +445 -0
- requirements.txt +19 -0
- sample_patient_notes.csv +1312 -0
app.py
ADDED
@@ -0,0 +1,1342 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

"""
Clinical Trial Matching Pipeline - Gradio Web Interface

This interface allows users to:
1. Configure models (tagger, embedder, LLM)
2. Upload trial space database OR load pre-embedded trials
3. Upload patient notes or enter patient summary
4. Get ranked trial recommendations with eligibility predictions
"""

import gradio as gr
import pandas as pd
import numpy as np
import torch
import re
import os
import json
from typing import List, Tuple, Optional, Dict
from pathlib import Path
import tempfile

# HuggingFace imports
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    pipeline
)
from sentence_transformers import SentenceTransformer
from datasets import load_dataset

# Try to import configuration
try:
    import config
    HAS_CONFIG = True
    print("✓ Found config.py - will auto-load models on startup")
except ImportError:
    HAS_CONFIG = False
    print("○ No config.py found - using manual model loading")

# Global state to hold loaded models and embedded trials
class AppState:
    def __init__(self):
        self.tagger_model = None
        self.tagger_tokenizer = None
        self.embedder_model = None
        self.embedder_tokenizer = None
        self.llm_model = None
        self.llm_tokenizer = None
        self.trial_checker_model = None
        self.trial_checker_tokenizer = None
        self.boilerplate_checker_model = None
        self.boilerplate_checker_tokenizer = None

        self.trial_spaces_df = None
        self.trial_embeddings = None

        self.trial_preview_df = None

        self.device = "cuda" if torch.cuda.is_available() else "cpu"

        # Store auto-load status messages to display in UI
        self.auto_load_status = {
            "tagger": "",
            "embedder": "",
            "llm": "",
            "trial_checker": "",
            "boilerplate_checker": "",
            "trials": ""
        }

    def reset_trials(self):
        self.trial_spaces_df = None
        self.trial_embeddings = None
        self.trial_preview_df = None

state = AppState()

# ============================================================================
# UTILITY FUNCTIONS
# ============================================================================

def split_into_excerpts(text: str) -> List[str]:
    """Split text into sentence-level excerpts."""
    if not text or pd.isna(text):
        return []
    t = re.sub(r'[\n\r]+', ' ', text.strip())
    t = re.sub(r'\s+', ' ', t)
    if not t:
        return []
    t2 = t.replace(". ", "<excerpt break>")
    parts = [p.strip() for p in t2.split("<excerpt break>") if p.strip()]
    return parts

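To make the splitting rule in `split_into_excerpts` concrete, here is a minimal standalone sketch of the same logic without the pandas null check. The name `split_on_period_space` and the sample note are illustrative assumptions, not part of app.py. The split happens only on a period followed by a space, so the last excerpt keeps its trailing period, and an abbreviation such as "Dr. " also triggers a break.

```python
import re
from typing import List


def split_on_period_space(text: str) -> List[str]:
    """Illustrative copy of the excerpt-splitting rule (hypothetical name)."""
    if not text:
        return []
    # Collapse newlines and runs of whitespace into single spaces.
    flat = re.sub(r"\s+", " ", text.strip())
    # A period followed by a space marks an excerpt boundary.
    return [part.strip() for part in flat.split(". ") if part.strip()]


note = "Stage IV lung adenocarcinoma.\nEGFR L858R mutant. Seen by Dr. Lee."
excerpts = split_on_period_space(note)
print(excerpts)
```

If abbreviation handling matters for the tagger input, a dedicated sentence segmenter would be one option to evaluate; the simple rule trades accuracy for speed and zero dependencies.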
def truncate_text(text: str, tokenizer, max_tokens: int = 1500) -> str:
    """Truncate text to a maximum number of tokens."""
    return tokenizer.decode(
        tokenizer.encode(text, add_special_tokens=True, truncation=True, max_length=max_tokens),
        skip_special_tokens=True
    )

def format_probability_visual(val, is_exclusion=False):
    """
    Helper to format probabilities with visual indicators (emojis) for the dataframe.
    """
    try:
        val_float = float(val)
    except (TypeError, ValueError):
        # Not a number: return the value unchanged
        return val

    # Logic for Eligibility (High is good)
    if not is_exclusion:
        if val_float >= 0.8:
            return f"🟢 **{val_float:.2f}**"
        elif val_float >= 0.5:
            return f"🟡 {val_float:.2f}"
        else:
            return f"🔴 {val_float:.2f}"

    # Logic for Exclusion (High is bad)
    else:
        if val_float >= 0.5:
            return f"🔴 **{val_float:.2f}**"  # High exclusion prob is bad
        elif val_float >= 0.2:
            return f"🟡 {val_float:.2f}"
        else:
            return f"🟢 {val_float:.2f}"  # Low exclusion prob is good

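The two colour scales in `format_probability_visual` are not mirror images: eligibility turns green at 0.8 and yellow at 0.5, while exclusion turns red at 0.5 and yellow at 0.2. The sketch below restates those thresholds as a plain label function so they can be checked without emoji formatting. The name `traffic_light` is a hypothetical helper, not part of app.py.

```python
def traffic_light(prob: float, is_exclusion: bool = False) -> str:
    """Return 'green', 'yellow', or 'red' using the thresholds from app.py."""
    if is_exclusion:
        # High exclusion probability is bad: red from 0.5, yellow from 0.2.
        if prob >= 0.5:
            return "red"
        return "yellow" if prob >= 0.2 else "green"
    # High eligibility probability is good: green from 0.8, yellow from 0.5.
    if prob >= 0.8:
        return "green"
    return "yellow" if prob >= 0.5 else "red"


for p in (0.1, 0.3, 0.6, 0.9):
    print(p, traffic_light(p), traffic_light(p, is_exclusion=True))
```

A score of 0.6, for instance, is yellow as an eligibility probability but red as an exclusion probability.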
# ============================================================================
# TRIAL SPACE EXTRACTION CONSTANTS
# ============================================================================

MAX_EMBEDDER_SEQ_LEN = 2500
MAX_LONGTEXT_SEQ_LEN = 110000
MAX_TRIAL_CHECKER_LENGTH = 4096
MAX_BOILERPLATE_CHECKER_LENGTH = 3192
REASONING_MARKER = "assistantfinal"
BOILERPLATE_MARKER = "Boilerplate"

TRIAL_SPACE_PROMPT_HEADER = (
    "You are an expert clinical oncologist with an encyclopedic knowledge of cancer and its treatments.\n"
    "Your job is to review a clinical trial document and extract a list of structured clinical spaces that are eligible for that trial.\n"
    "A clinical space is defined as a unique combination of patient age range, sex (if any sex criteria), cancer primary site, histology, which treatments a patient must have received, "
    "which treatments a patient must not have received, cancer burden (eg presence of metastatic disease; this also includes cancer type-specific prognostic scores, risk indices, or categories), tumor biomarkers (such as "
    "germline or somatic gene mutations or alterations, or protein expression on tumor), that a patient must have or must not have to "
    "be eligible for the trial. \n"
    "With respect to sex criteria: For cancers originating in organs only present in one sex, you must assume the sex criteria even if not stated explicitly.\n"
    "For example, a trial space for uterine, ovarian, vulvar, vaginal, or fallopian tube cancer must be assumed to be for female patients.\n"
    "Similarly, a trial space for testicular, penile, or prostate cancer must be assumed to be for male patients.\n"
    "For all other cancer types (including breast cancer), you should assume the trial is open to both sexes unless the clinical trial document states otherwise.\n"
    "Trials often specify that a particular treatment is excluded only if it was given within a short period of time, for example 14 days, "
    "one month, etc., prior to trial start. This is called a washout period. Do not include this type of time-specific treatment washout "
    "eligibility criteria in your output at all.\n"
    "Some trials have only one space, while others have several. Do not output a space that contains multiple cancer types and/or histologies. "
    "Instead, generate separate spaces for each cancer type/histology combination.\n"
    "CRITICAL: Each trial space must contain all information necessary to define that space on its own. It may not refer to other previously "
    "defined spaces for the same trial, since for later use, the spaces will be extracted and separated from each other. YOU MAY NOT include "
    "text describing a given space that refers to a previous space; eg, \"Same as above\"-style output is not allowed!\n"
    "For biomarkers, if the trial specifies whether the biomarker will be assessed during screening, note that.\n"
    "Spell out cancer types; do not abbreviate them. For example, write \"non-small cell lung cancer\" rather than \"NSCLC\".\n"
    "Structure your output like this, as a list of spaces, with spaces separated by newlines, as below. STRICTLY adhere to the formatting.\n"
    "1. Age range allowed: <age_range_allowed>. Sex allowed: <sex_allowed>. Cancer type allowed: <cancer_type_allowed>. Histology allowed: <histology_allowed>. Cancer burden allowed: <cancer_burden_allowed>. Prior treatment required: <prior_treatments_required>. Prior treatment excluded: <prior_treatments_excluded>. Biomarkers required: <biomarkers_required>. Biomarkers excluded: <biomarkers_excluded>. \n"
    "2. Cancer type allowed: <cancer_type_allowed>, etc.\n"
    "If a concept is not relevant, such as if there are no prior treatments required, simply output NA for that concept.\n"
    "CRITICAL: Anytime you provide a list for a particular concept, you must be completely clear on whether \"or\" versus \"and\" logic applies "
    "to the list. For example, do not output \"EGFR L858R mutant, TP53 mutant\"; if both are required, output \"EGFR L858R mutant and TP53 mutant\". "
    "As another example, do not output \"ER+, PR+\"; if the patient can have either an ER or a PR positive tumor, output \"ER+ or PR+\".\n"
    "NEVER put a newline within a single trial space.\n"
    "After you output the trial spaces, output a newline, then the text \"Boilerplate exclusions:\" VERBATIM, then another newline.\n"
    "Then, list exclusion criteria described in the trial text that are unrelated to the trial space definitions. Such exclusions tend to be common "
    "to clinical trials in general.\n"
    "Common boilerplate exclusion criteria include a history of pneumonitis, heart failure, renal dysfunction, liver dysfunction, uncontrolled brain "
    "metastases, HIV or hepatitis, and poor performance status.\n"
    "ALWAYS output plain text only. NEVER output unicode, Markdown, or tables.\n"
)

TRIAL_SPACE_PROMPT_SUFFIX = (
    "Now, generate your list of the trial space(s), followed by any boilerplate exclusions, formatted as above.\n"
    "Do not provide any introductory, explanatory, concluding, or disclaimer text.\n"
    "Reminder: Treatment history is an important component of trial space definitions, but treatment history \"washout\" requirements that are "
    "described as applying only in a given period of time prior to trial treatment MUST BE IGNORED.\n"
    "CRITICAL: A given trial space MUST NEVER refer to another previously defined space. You must NEVER output text like \"same as #1\" or "
    "\"same criteria as above.\" Instead, you MUST REPEAT all relevant criteria for each new space SO THAT IT STANDS ON ITS OWN. A user who later "
    "looks at the text for one space will not have access to text for other spaces, and so output like \"Same criteria as #1...\" renders a space useless!"
)


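The prompt fixes an output layout: numbered spaces, one per line, followed by a line reading "Boilerplate exclusions:" and the boilerplate text. One way a response in that layout could be split is sketched below. This is an assumption about downstream parsing, not the parser app.py actually uses; `parse_trial_space_output` and the sample response are hypothetical. Only the two marker values are taken from the constants defined in app.py.

```python
import re
from typing import List, Tuple

REASONING_MARKER = "assistantfinal"  # same value as the constant in app.py
BOILERPLATE_MARKER = "Boilerplate"   # same value as the constant in app.py


def parse_trial_space_output(raw: str) -> Tuple[List[str], str]:
    """Split a model response into numbered spaces and boilerplate text."""
    # Keep only the text after the last reasoning marker, if one is present.
    answer = raw.rsplit(REASONING_MARKER, 1)[-1]
    # Everything before the boilerplate marker holds the numbered spaces.
    spaces_part, _, boilerplate_part = answer.partition(BOILERPLATE_MARKER)
    spaces = []
    for line in spaces_part.splitlines():
        match = re.match(r"^\s*\d+\.\s*(.+)$", line)
        if match:
            spaces.append(match.group(1).strip())
    # Drop the remainder of the "Boilerplate exclusions:" heading.
    boilerplate = boilerplate_part.partition(":")[2].strip()
    return spaces, boilerplate


sample = (
    "thinking...assistantfinal"
    "1. Age range allowed: 18 and older. Cancer type allowed: non-small cell lung cancer.\n"
    "2. Age range allowed: 18 and older. Cancer type allowed: small cell lung cancer.\n"
    "\n"
    "Boilerplate exclusions:\n"
    "History of pneumonitis. Uncontrolled brain metastases."
)
spaces, boilerplate = parse_trial_space_output(sample)
print(len(spaces), "spaces;", boilerplate)
```

Because the prompt forbids newlines inside a space, treating each numbered line as one space is a reasonable starting point, though a real parser would need to handle responses that ignore the format.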
# ============================================================================
# AUTO-LOADING FROM CONFIG
# ============================================================================

def auto_load_models_from_config():
    """Auto-load models specified in config.py"""
    if not HAS_CONFIG:
        return

    print("\n" + "="*70)
    print("AUTO-LOADING MODELS FROM CONFIG")
    print("="*70)

    # Load tagger
    if config.MODEL_CONFIG.get("tagger"):
        print(f"\n[1/5] Loading tagger: {config.MODEL_CONFIG['tagger']}")
        status, _ = load_tagger_model(config.MODEL_CONFIG["tagger"])
        state.auto_load_status["tagger"] = status
        print(status)

    # Load embedder
    if config.MODEL_CONFIG.get("embedder"):
        print(f"\n[2/5] Loading embedder: {config.MODEL_CONFIG['embedder']}")
        status, _, _ = load_embedder_model(config.MODEL_CONFIG["embedder"])
        state.auto_load_status["embedder"] = status
        print(status)

    # Load LLM
    if config.MODEL_CONFIG.get("llm"):
        print(f"\n[3/5] Loading LLM: {config.MODEL_CONFIG['llm']}")
        status, _ = load_llm_model(config.MODEL_CONFIG["llm"])
        state.auto_load_status["llm"] = status
        print(status)

    # Load trial checker
    if config.MODEL_CONFIG.get("trial_checker"):
        print(f"\n[4/5] Loading trial checker: {config.MODEL_CONFIG['trial_checker']}")
        status, _ = load_trial_checker(config.MODEL_CONFIG["trial_checker"])
        state.auto_load_status["trial_checker"] = status
        print(status)

    # Load boilerplate checker
    if config.MODEL_CONFIG.get("boilerplate_checker"):
        print(f"\n[5/5] Loading boilerplate checker: {config.MODEL_CONFIG['boilerplate_checker']}")
        status, _ = load_boilerplate_checker(config.MODEL_CONFIG["boilerplate_checker"])
        state.auto_load_status["boilerplate_checker"] = status
        print(status)

    print("\n" + "="*70)
    print("MODEL AUTO-LOADING COMPLETE")
    print("="*70 + "\n")

def auto_load_trials_from_config():
    """Auto-load trial database from config.py - prefers pre-embedded over fresh embedding."""
    if not HAS_CONFIG:
        return

    # Check for pre-embedded trials first (much faster)
    if hasattr(config, 'PREEMBEDDED_TRIALS') and config.PREEMBEDDED_TRIALS:
        preembed_path = config.PREEMBEDDED_TRIALS

        print("\n" + "="*70)
        print(f"AUTO-LOADING PRE-EMBEDDED TRIALS: {preembed_path}")
        print("="*70)

        status, preview = load_preembedded_trials(preembed_path)
        state.auto_load_status["trials"] = status
        # Store the preview so it can be displayed in the UI
        state.trial_preview_df = preview

        print("="*70)
        print("PRE-EMBEDDED TRIALS AUTO-LOADING COMPLETE")
        print("="*70 + "\n")
        return

    # Fall back to fresh embedding if no pre-embedded trials specified
    if not hasattr(config, 'DEFAULT_TRIAL_DB') or not config.DEFAULT_TRIAL_DB:
        print("○ No trial database specified in config")
        return

    if not os.path.exists(config.DEFAULT_TRIAL_DB):
        print(f"✗ Default trial database not found: {config.DEFAULT_TRIAL_DB}")
        state.auto_load_status["trials"] = f"✗ Trial database file not found: {config.DEFAULT_TRIAL_DB}"
        return

    if state.embedder_model is None:
        print("○ Embedder not loaded yet - skipping trial database auto-load")
        state.auto_load_status["trials"] = "○ Waiting for embedder model to be loaded..."
        return

    print("\n" + "="*70)
    print(f"AUTO-LOADING TRIAL DATABASE: {config.DEFAULT_TRIAL_DB}")
    print("="*70)

    # Minimal stand-in for an uploaded file object, exposing the path as .name
    class FilePath:
        def __init__(self, path):
            self.name = path

    status, preview = load_and_embed_trials(FilePath(config.DEFAULT_TRIAL_DB), show_progress=True)
    state.auto_load_status["trials"] = status
    # Store the preview so it can be displayed in the UI
    state.trial_preview_df = preview

    print("="*70)
    print("TRIAL DATABASE AUTO-LOADING COMPLETE")
    print("="*70 + "\n")

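The auto-load functions read three attributes from `config`: a `MODEL_CONFIG` dict with the keys `tagger`, `embedder`, `llm`, `trial_checker`, and `boilerplate_checker`, plus the optional `PREEMBEDDED_TRIALS` and `DEFAULT_TRIAL_DB`. A hypothetical `config.py` with that shape might look like the sketch below. Every path and model ID is a placeholder and an assumption; the real config.py from this commit is not shown here.

```python
# Hypothetical config.py; every value below is a placeholder, not a real model ID.
MODEL_CONFIG = {
    "tagger": "your-org/your-tagger-model",
    "embedder": "your-org/your-embedder-model",
    "llm": None,  # falsy values are skipped by the .get(...) checks
    "trial_checker": "your-org/your-trial-checker",
    "boilerplate_checker": "your-org/your-boilerplate-checker",
}

# Checked first: a local parquet path or an http(s) URL to a parquet file.
PREEMBEDDED_TRIALS = "trial_space_embeddings.parquet"

# Used only when PREEMBEDDED_TRIALS is unset or empty; a CSV or Excel file.
DEFAULT_TRIAL_DB = None

# Names of the models that auto-loading would attempt with this config.
to_load = [name for name, path in MODEL_CONFIG.items() if path]
print(to_load)
```

Setting `PREEMBEDDED_TRIALS` skips the embedding step at startup, which is why the loader checks it before `DEFAULT_TRIAL_DB`.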
# ============================================================================
# MODEL LOADING FUNCTIONS
# ============================================================================

def load_tagger_model(model_path: str) -> Tuple[str, str]:
    """Load TinyBERT tagger model."""
    try:
        state.tagger_tokenizer = AutoTokenizer.from_pretrained(model_path)
        state.tagger_model = pipeline(
            "text-classification",
            model=model_path,
            tokenizer=state.tagger_tokenizer,
            device=0 if state.device == "cuda" else -1,
            truncation=True,
            padding="max_length",
            max_length=128
        )
        return f"✓ Tagger model loaded from {model_path}", ""
    except Exception as e:
        return f"✗ Error loading tagger model: {str(e)}", str(e)

def load_embedder_model(model_path: str) -> Tuple[str, str, str]:
    """Load sentence transformer embedder model."""
    try:
        # Check if trials are already loaded
        will_need_reembed = state.trial_spaces_df is not None and len(state.trial_spaces_df) > 0

        if will_need_reembed:
            warning_msg = f"\n⚠️ Warning: {len(state.trial_spaces_df)} trials are currently loaded. They will need to be re-embedded with the new model."
        else:
            warning_msg = ""

        state.embedder_model = SentenceTransformer(model_path, device=state.device, trust_remote_code=True)
        state.embedder_tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

        # Set the instruction prompt (best effort: skipped if the model does not support it)
        try:
            state.embedder_model.prompts['query'] = (
                "Instruct: Given a cancer patient summary, retrieve clinical trial options "
                "that are reasonable for that patient; or, given a clinical trial option, "
                "retrieve cancer patients who are reasonable candidates for that trial."
            )
        except Exception:
            pass

        # Cap the input length (best effort)
        try:
            state.embedder_model.max_seq_length = MAX_EMBEDDER_SEQ_LEN
        except Exception:
            pass

        success_msg = f"✓ Embedder model loaded from {model_path}{warning_msg}"

        # If trials were loaded, invalidate embeddings
        if will_need_reembed:
            state.trial_embeddings = None
            success_msg += "\n→ Trial embeddings cleared. Please reload trial database to re-embed."

        return success_msg, "", warning_msg
    except Exception as e:
        return f"✗ Error loading embedder model: {str(e)}", str(e), ""

def load_llm_model(model_path: str) -> Tuple[str, str]:
    """Load LLM for patient summarization."""
    try:
        # Check if vLLM is available
        try:
            from vllm import LLM, SamplingParams

            # Determine tensor parallel size
            gpu_count = torch.cuda.device_count()
            tp_size = min(gpu_count, 4) if gpu_count > 1 else 1

            state.llm_model = LLM(
                model=model_path,
                tensor_parallel_size=tp_size,
                gpu_memory_utilization=0.60,
                max_model_len=15000
            )
            state.llm_tokenizer = state.llm_model.get_tokenizer()
            return f"✓ LLM loaded from {model_path} (vLLM, tp={tp_size})", ""
        except ImportError:
            # Fallback to HuggingFace transformers
            from transformers import AutoModelForCausalLM
            state.llm_tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
            state.llm_model = AutoModelForCausalLM.from_pretrained(
                model_path,
                torch_dtype=torch.float16 if state.device == "cuda" else torch.float32,
                device_map="auto",
                trust_remote_code=True
            )
            return f"✓ LLM loaded from {model_path} (HuggingFace)", ""
    except Exception as e:
        return f"✗ Error loading LLM: {str(e)}", str(e)

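The vLLM branch of `load_llm_model` picks a tensor-parallel size from the visible GPU count: zero or one GPU gives 1, and several GPUs are capped at 4. The helper below restates that rule as a pure function so it can be checked without a GPU. `choose_tp_size` and its `cap` parameter are illustrative names, not part of app.py.

```python
def choose_tp_size(gpu_count: int, cap: int = 4) -> int:
    """Mirror of `min(gpu_count, 4) if gpu_count > 1 else 1`."""
    return min(gpu_count, cap) if gpu_count > 1 else 1


for n in (0, 1, 2, 3, 8):
    print(n, "GPUs ->", choose_tp_size(n))
```

With 3 GPUs the rule yields 3. As far as I know, vLLM expects the tensor-parallel size to divide the model's attention-head count, so an odd size may be rejected for some models; that is worth checking against the vLLM documentation before relying on it.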
def load_trial_checker(model_path: str) -> Tuple[str, str]:
    """Load ModernBERT trial checker."""
    try:
        state.trial_checker_tokenizer = AutoTokenizer.from_pretrained(model_path)
        state.trial_checker_model = AutoModelForSequenceClassification.from_pretrained(
            model_path,
            torch_dtype=torch.float16 if state.device == "cuda" else torch.float32
        ).to(state.device)
        state.trial_checker_model.eval()
        return f"✓ Trial checker loaded from {model_path}", ""
    except Exception as e:
        return f"✗ Error loading trial checker: {str(e)}", str(e)

def load_boilerplate_checker(model_path: str) -> Tuple[str, str]:
    """Load ModernBERT boilerplate checker."""
    try:
        state.boilerplate_checker_tokenizer = AutoTokenizer.from_pretrained(model_path)
        state.boilerplate_checker_model = AutoModelForSequenceClassification.from_pretrained(
            model_path,
            torch_dtype=torch.float16 if state.device == "cuda" else torch.float32
        ).to(state.device)
        state.boilerplate_checker_model.eval()
        return f"✓ Boilerplate checker loaded from {model_path}", ""
    except Exception as e:
        return f"✗ Error loading boilerplate checker: {str(e)}", str(e)

# ============================================================================
# TRIAL SPACE PROCESSING (WITH PRE-EMBEDDING SUPPORT)
# ============================================================================

def load_preembedded_trials(path_or_url: str) -> Tuple[str, Optional[pd.DataFrame]]:
    """Load pre-embedded trial database from a local parquet file or a Hugging Face URL."""
    try:

        print(f"\n{'='*70}")
        print("LOADING PRE-EMBEDDED TRIALS")
        print(f"{'='*70}")
        print(f"Loading from: {path_or_url}")

        # Check if it's a URL or a local path
        if path_or_url.startswith("http"):
            # It's a URL, load from the Hugging Face Hub
            print("Detected URL, loading from Hugging Face Hub...")
            dataset = load_dataset("parquet", data_files=path_or_url, split='train')
            df = dataset.to_pandas()
            print(f"✓ Loaded {len(df)} trials from Hub")


        else:
            # It's a local path
            parquet_path = path_or_url
            if not parquet_path.endswith('.parquet'):
                parquet_path = parquet_path + '.parquet'

            # Check file exists
            if not os.path.exists(parquet_path):
                return f"✗ Pre-embedded parquet file not found: {parquet_path}", None

            # Load parquet file
            print("Loading trial dataframe with embeddings...")
            df = pd.read_parquet(parquet_path)
            print(f"✓ Loaded {len(df)} trials from local file")

        # Check for embedding column
        if 'embedding' not in df.columns:
            return f"✗ Parquet file missing 'embedding' column: {path_or_url}", None

        # Extract embeddings from the column and convert to numpy array
        print("Extracting embeddings...")
        embeddings = np.array(df['embedding'].tolist())
        print(f"✓ Extracted embeddings: {embeddings.shape}")

        # Remove embedding column from dataframe (not needed in the df itself)
        df_without_embeddings = df.drop(columns=['embedding'])

        # Store in state
        state.trial_spaces_df = df_without_embeddings
        state.trial_embeddings = embeddings

        print(f"{'='*70}")
        print("PRE-EMBEDDED TRIALS LOADED SUCCESSFULLY")
        print(f"{'='*70}\n")

        preview = df_without_embeddings[['nct_id', 'this_space']].head(10)
        return f"✓ Loaded {len(df)} pre-embedded trials from {path_or_url}", preview

    except Exception as e:
        import traceback
        traceback.print_exc()
        return f"✗ Error loading pre-embedded trials: {str(e)}", None

| 483 |
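The URL-versus-local-path branching in `load_preembedded_trials` can be pulled into a small helper so it can be checked without touching the network or the filesystem. The following is a minimal sketch under that assumption, not code from `app.py`; the helper name `resolve_parquet_source` and the `"hub"`/`"local"` labels are hypothetical.

```python
from typing import Tuple


def resolve_parquet_source(path_or_url: str) -> Tuple[str, str]:
    """Classify a source as 'hub' (URL) or 'local' and normalise local paths.

    Hypothetical helper mirroring the branching in load_preembedded_trials:
    anything starting with "http" is treated as a URL, everything else as a
    local path that gets a '.parquet' suffix when it lacks one.
    """
    if path_or_url.startswith("http"):
        return "hub", path_or_url
    if not path_or_url.endswith(".parquet"):
        path_or_url = path_or_url + ".parquet"
    return "local", path_or_url
```

A caller could then dispatch on the first element of the tuple and keep the loading code itself free of string handling.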
def load_and_embed_trials(file, show_progress: bool = False) -> Tuple[str, pd.DataFrame]:
    """Load trial spaces CSV/Excel and embed them."""
    try:
        if state.embedder_model is None:
            return "✗ Please load the embedder model first!", None

        # Read file
        if file.name.endswith('.csv'):
            df = pd.read_csv(file.name)
        elif file.name.endswith(('.xlsx', '.xls')):
            df = pd.read_excel(file.name)
        else:
            return "✗ Unsupported file format. Use CSV or Excel.", None

        # Check required columns
        required_cols = ['nct_id', 'this_space', 'trial_text', 'trial_boilerplate_text']
        missing = [col for col in required_cols if col not in df.columns]
        if missing:
            return f"✗ Missing required columns: {', '.join(missing)}", None

        # Clean data
        df = df[~df['this_space'].isnull()].copy()
        df['trial_boilerplate_text'] = df['trial_boilerplate_text'].fillna('')

        # Prepare texts for embedding
        df['this_space_trunc'] = df['this_space'].apply(
            lambda x: truncate_text(str(x), state.embedder_tokenizer, max_tokens=MAX_EMBEDDER_SEQ_LEN)
        )

        # Add instruction prefix
        prefix = (
            "Instruct: Given a cancer patient summary, retrieve clinical trial options "
            "that are reasonable for that patient; or, given a clinical trial option, "
            "retrieve cancer patients who are reasonable candidates for that trial. "
        )
        texts_to_embed = [prefix + txt for txt in df['this_space_trunc'].tolist()]

        # Report progress: UI toast when called from Gradio, console output otherwise
        if not show_progress:
            gr.Info(f"Embedding {len(df)} trial spaces...")
        else:
            print(f"Embedding {len(df)} trial spaces...")

        with torch.no_grad():
            # NOTE: in sentence-transformers, `prompt` is a literal string prepended to
            # each text, whereas `prompt_name` selects a named prompt from the model
            # config. Kept as-is because match_trials() encodes the patient summary
            # the same way, which keeps both sides comparable.
            embeddings = state.embedder_model.encode(
                texts_to_embed,
                batch_size=64,
                convert_to_tensor=True,
                normalize_embeddings=True,
                show_progress_bar=show_progress,
                prompt='query'
            )

        # Store in state
        state.trial_spaces_df = df
        state.trial_embeddings = embeddings.cpu().numpy()

        preview = df[['nct_id', 'this_space']].head(10)

        success_msg = f"✓ Loaded and embedded {len(df)} trial spaces"
        if show_progress:
            print(success_msg)

        return success_msg, preview

    except Exception as e:
        return f"✗ Error processing trials: {str(e)}", None

# ============================================================================
# PATIENT NOTE PROCESSING
# ============================================================================

def process_patient_notes(file, prob_threshold: float = 0.1) -> Tuple[str, str]:
    """Process patient notes through tagger and create long note."""
    try:
        if state.tagger_model is None:
            return "✗ Please load the tagger model first!", ""

        # Read file
        if file.name.endswith('.csv'):
            df = pd.read_csv(file.name)
        elif file.name.endswith(('.xlsx', '.xls')):
            df = pd.read_excel(file.name)
        else:
            return "✗ Unsupported file format. Use CSV or Excel.", ""

        # Check required columns
        if 'date' not in df.columns or 'text' not in df.columns:
            return "✗ File must contain 'date' and 'text' columns", ""

        # Sort by date (unparseable dates become NaT and sort last)
        df['date'] = pd.to_datetime(df['date'], errors='coerce')
        df = df.sort_values('date').reset_index(drop=True)

        # Extract all excerpts
        all_excerpts = []
        all_dates = []
        all_note_types = []

        for _, row in df.iterrows():
            excerpts = split_into_excerpts(str(row['text']))
            note_type = row.get('note_type', 'clinical_note')
            if pd.isna(note_type):
                # A missing note type would be dropped by the groupby below
                note_type = 'clinical_note'

            for exc in excerpts:
                all_excerpts.append(exc)
                all_dates.append(row['date'])
                all_note_types.append(note_type)

        if not all_excerpts:
            return "✗ No valid excerpts extracted from notes", ""

        gr.Info(f"Tagging {len(all_excerpts)} excerpts...")

        # Run tagger
        predictions = state.tagger_model(all_excerpts, batch_size=256)

        # Collect excerpts with their predictions
        excerpts_df = pd.DataFrame({
            'excerpt': all_excerpts,
            'date': all_dates,
            'note_type': all_note_types,
            'label': [p['label'] for p in predictions],
            'score': [p['score'] for p in predictions]
        })

        # Calculate positive probability: the score is the probability of the
        # predicted label, so invert it when the predicted label is NEGATIVE
        excerpts_df['positive_prob'] = np.where(
            excerpts_df['label'] == 'NEGATIVE',
            1.0 - excerpts_df['score'],
            excerpts_df['score']
        )

        # Filter by threshold
        keep = excerpts_df[excerpts_df['positive_prob'] > prob_threshold].copy()

        # Capture the raw count of excerpts *before* grouping
        raw_keep_count = len(keep)

        if len(keep) == 0:
            return "✗ No excerpts passed the threshold", ""

        # Group by date and note type. Dates that failed to parse are NaT; label
        # them explicitly so groupby does not silently drop those excerpts.
        keep['date_str'] = keep['date'].dt.strftime('%Y-%m-%d').fillna('unknown-date')
        keep = keep.groupby(['date_str', 'note_type'])['excerpt'].agg(lambda x: ' '.join(x)).reset_index()

        keep['date_text'] = (
            keep['date_str'] + " " +
            keep['note_type'] + " " +
            keep['excerpt']
        )

        # Create long note
        long_note = "\n".join(keep['date_text'].tolist())

        # Report the raw (pre-grouping) count in the stats message
        stats = (
            f"Processed {len(df)} notes → {len(all_excerpts)} excerpts → "
            f"{raw_keep_count} relevant excerpts (threshold={prob_threshold})"
        )

        return stats, long_note

    except Exception as e:
        return f"✗ Error processing notes: {str(e)}", ""

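The tagger step in `process_patient_notes` converts each `(label, score)` prediction into a probability of the positive class and then filters on a threshold. The following is a minimal, dependency-free sketch of that conversion. It assumes, as the code above does, that `score` is the probability of the predicted label and that the negative class is literally named `NEGATIVE`. The function names are hypothetical.

```python
from typing import Dict, List


def positive_probability(label: str, score: float, negative_label: str = "NEGATIVE") -> float:
    """Return P(positive) given the predicted label and its score."""
    if label == negative_label:
        return 1.0 - score
    return score


def filter_excerpts(excerpts: List[str], predictions: List[Dict], threshold: float) -> List[str]:
    """Keep excerpts whose positive probability is strictly above the threshold."""
    kept = []
    for excerpt, pred in zip(excerpts, predictions):
        if positive_probability(pred["label"], pred["score"]) > threshold:
            kept.append(excerpt)
    return kept


example_predictions = [
    {"label": "NEGATIVE", "score": 0.75},  # P(positive) = 0.25
    {"label": "POSITIVE", "score": 0.5},   # P(positive) = 0.5
    {"label": "NEGATIVE", "score": 1.0},   # P(positive) = 0.0
]
example_kept = filter_excerpts(["a", "b", "c"], example_predictions, threshold=0.1)
```

With a low threshold such as 0.1, an excerpt is kept even when the tagger's top label is `NEGATIVE`, provided the tagger was not very sure about it. This favours recall over precision.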
def summarize_patient_history(long_note: str) -> Tuple[str, str]:
    """Summarize patient long note using LLM and split into summary and boilerplate sections."""
    try:
        if state.llm_model is None:
            return "✗ Please load the LLM model first!", ""

        if not long_note or len(long_note.strip()) == 0:
            return "✗ No patient history to summarize", ""

        # Truncate if needed, keeping the first and last halves of the token budget
        tokens = state.llm_tokenizer.encode(long_note, add_special_tokens=False)
        max_tokens = MAX_LONGTEXT_SEQ_LEN  # Leave room for prompt and response

        if len(tokens) > max_tokens:
            half = max_tokens // 2
            first_part = state.llm_tokenizer.decode(tokens[:half])
            last_part = state.llm_tokenizer.decode(tokens[-half:])
            patient_text = first_part + " ... " + last_part
        else:
            patient_text = long_note

        # Build prompt
        messages = [
            {'role': 'system', 'content': 'Reasoning: high'},
            {'role': 'user', 'content': """You are an experienced clinical oncology history summarization bot.
Your job is to construct a summary of the cancer history for a patient based on an excerpt of the patient's electronic health record. The text in the excerpt is provided in chronological order. Each paragraph in the excerpt represents a summary of a clinical document written on the date indicated in the paragraph.
Document the patient's most recent age; sex; cancer type/primary site (eg breast cancer, lung cancer, etc); histology (eg adenocarcinoma, squamous carcinoma, etc); current extent (localized, advanced, metastatic, etc); biomarkers (genomic results, protein expression, etc); and treatment history (surgery, radiation, chemotherapy/targeted therapy/immunotherapy, etc, including start and stop dates and best response if known).
Do not consider localized basal cell or squamous carcinomas of the skin, or colon polyps, to be cancers for your purposes.
Do not include the patient's name, but do include relevant dates whenever documented, including dates of diagnosis and start/stop dates of each treatment.
If a patient has a history of more than one cancer, document the cancers one at a time.
CRITICAL: Format your response as free text ONLY. Do NOT output markdown, Unicode, or tables.
Also document any history of conditions that might meet "boilerplate" exclusion criteria, including uncontrolled brain metastases, lack of measurable disease, congestive heart failure, pneumonitis, renal dysfunction, liver dysfunction, and HIV or hepatitis infection. For each of these, present the evidence from the history that the patient has a history of such a condition, including dates.
Clearly separate the "boilerplate" section by labeling it "Boilerplate: " before describing any such conditions.
Here is an example of the desired output format:

Age: 70
Sex: Male
Cancer type: Lung cancer
Histology: Adenocarcinoma
Current extent: Metastatic
Biomarkers: PD-L1 75%, KRAS G12C mutant
Treatment history:
# 1/5/2020-2/5/2021: carboplatin/pemetrexed/pembrolizumab
# 1/2021: Palliative radiation to progressive spinal metastases
# 3/2021-present: docetaxel
Boilerplate:
No evidence of common boilerplate exclusion criteria
""" + "The excerpt for you to summarize is:\n" + patient_text + """\nNow, write your summary. Do not add preceding text before the abstraction, and do not add notes or commentary afterwards. This will not be used for clinical care, so do not write any disclaimers or cautionary notes."""}
        ]

        gr.Info("Summarizing patient history with LLM...")

        # Check if using vLLM or HuggingFace
        if hasattr(state.llm_model, 'generate') and hasattr(state.llm_model, 'get_tokenizer'):
            # vLLM
            from vllm import SamplingParams

            prompt = state.llm_tokenizer.apply_chat_template(
                conversation=messages,
                add_generation_prompt=True,
                tokenize=False
            )

            response = state.llm_model.generate(
                [prompt],
                SamplingParams(
                    temperature=0.0,
                    top_k=1,
                    max_tokens=7500,
                    repetition_penalty=1.2
                )
            )

            output = response[0].outputs[0].text
        else:
            # HuggingFace
            input_ids = state.llm_tokenizer.apply_chat_template(
                conversation=messages,
                add_generation_prompt=True,
                return_tensors="pt"
            ).to(state.device)

            with torch.no_grad():
                outputs = state.llm_model.generate(
                    input_ids,
                    max_new_tokens=7500,
                    temperature=0.0,
                    # Greedy decoding: sampling with temperature=0.0 is rejected by
                    # transformers, and greedy matches the vLLM branch (top_k=1).
                    do_sample=False,
                    repetition_penalty=1.2
                )

            output = state.llm_tokenizer.decode(outputs[0], skip_special_tokens=True)

        # 1. Strip everything up to and including the reasoning marker
        if REASONING_MARKER in output:
            cleaned = output.split(REASONING_MARKER, 1)[-1]
        else:
            cleaned = output

        # 2. Handle boilerplate marker (split by line)
        lines = cleaned.splitlines(keepends=True)

        marker_line_index = -1

        # Find which line contains the marker
        for i, line in enumerate(lines):
            if BOILERPLATE_MARKER in line:
                marker_line_index = i
                break

        if marker_line_index != -1:
            # Join lines before the marker line
            pre = "".join(lines[:marker_line_index])
            # Join lines after the marker line (skipping the marker line itself)
            post = "".join(lines[marker_line_index + 1:])

            summary_text = pre
            boilerplate_text = post
        else:
            # If marker not found, the full text goes to both
            summary_text = cleaned
            boilerplate_text = cleaned

        return summary_text, boilerplate_text

    except Exception as e:
        return f"✗ Error summarizing: {str(e)}", ""

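Two pieces of `summarize_patient_history` are plain list and string manipulation and can be exercised without a model: the head-and-tail truncation of an over-long token sequence, and the line-based split on the boilerplate marker. The following is a minimal sketch of both. The function names are hypothetical, and the marker value used in the usage note is only an assumption, since `BOILERPLATE_MARKER` is defined elsewhere in `app.py`.

```python
from typing import List, Tuple


def keep_head_and_tail(tokens: List[int], max_tokens: int) -> List[int]:
    """Keep the first and last max_tokens // 2 tokens of an over-long sequence."""
    if len(tokens) <= max_tokens:
        return list(tokens)
    half = max_tokens // 2
    if half == 0:
        # Guard: tokens[-0:] would return the whole list instead of nothing
        return []
    return tokens[:half] + tokens[-half:]


def split_on_marker_line(text: str, marker: str) -> Tuple[str, str]:
    """Split text into (before, after) around the first line containing marker.

    The marker line itself is dropped. If no line contains the marker, the
    full text is returned for both parts, as in summarize_patient_history.
    """
    lines = text.splitlines(keepends=True)
    for i, line in enumerate(lines):
        if marker in line:
            return "".join(lines[:i]), "".join(lines[i + 1:])
    return text, text
```

For example, `split_on_marker_line("Age: 70\nBoilerplate:\nNone\n", "Boilerplate:")` separates the summary line from the boilerplate line. Because the whole marker line is dropped, any text the model writes on the same line as the marker is discarded too.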
# ============================================================================
# TRIAL SPACE EXTRACTION
# ============================================================================

def extract_trial_spaces(trial_text: str) -> str:
    """Extract trial spaces and boilerplate criteria from trial text using LLM."""
    try:
        if state.llm_model is None:
            return "✗ Please load the LLM model first!"

        if not trial_text or len(trial_text.strip()) == 0:
            return "✗ No trial text provided"

        # Build prompt messages
        messages = [
            {"role": "system", "content": "Reasoning: high."},
            {
                "role": "user",
                "content": (
                    TRIAL_SPACE_PROMPT_HEADER
                    + "Here is a clinical trial document:\n"
                    + str(trial_text)
                    + "\n"
                    + TRIAL_SPACE_PROMPT_SUFFIX
                ),
            },
        ]

        gr.Info("Extracting trial spaces with LLM...")

        # Check if using vLLM or HuggingFace
        if hasattr(state.llm_model, 'generate') and hasattr(state.llm_model, 'get_tokenizer'):
            # vLLM
            from vllm import SamplingParams

            prompt = state.llm_tokenizer.apply_chat_template(
                conversation=messages,
                add_generation_prompt=True,
                tokenize=False
            )

            response = state.llm_model.generate(
                [prompt],
                SamplingParams(
                    temperature=0.0,
                    top_k=1,
                    max_tokens=7500,
                    repetition_penalty=1.3
                )
            )

            output = response[0].outputs[0].text
        else:
            # HuggingFace
            input_ids = state.llm_tokenizer.apply_chat_template(
                conversation=messages,
                add_generation_prompt=True,
                return_tensors="pt"
            ).to(state.device)

            with torch.no_grad():
                outputs = state.llm_model.generate(
                    input_ids,
                    max_new_tokens=7500,
                    temperature=0.0,
                    do_sample=False,
                    repetition_penalty=1.3
                )

            output = state.llm_tokenizer.decode(outputs[0], skip_special_tokens=True)

        # Keep only the final answer: drop everything up to and including the
        # reasoning marker (this also removes the echoed prompt in the HuggingFace path)
        if REASONING_MARKER in output:
            output = output.split(REASONING_MARKER, 1)[-1]

        output = output.strip()

        return output

    except Exception as e:
        return f"✗ Error extracting trial spaces: {str(e)}"

# ============================================================================
# TRIAL MATCHING
# ============================================================================

def match_trials(patient_summary: str, patient_boilerplate: str, top_k: int = 20) -> pd.DataFrame:
    """Match patient to trials and run checkers."""
    try:
        if state.embedder_model is None:
            raise ValueError("Embedder model not loaded")
        if state.trial_embeddings is None:
            raise ValueError("Trial spaces not loaded")
        if state.trial_checker_model is None:
            raise ValueError("Trial checker model not loaded")
        if state.boilerplate_checker_model is None:
            raise ValueError("Boilerplate checker model not loaded")

        # Embed patient summary
        prefix = (
            "Instruct: Given a cancer patient summary, retrieve clinical trial options "
            "that are reasonable for that patient; or, given a clinical trial option, "
            "retrieve cancer patients who are reasonable candidates for that trial. "
        )

        patient_text = truncate_text(patient_summary, state.embedder_tokenizer, max_tokens=MAX_EMBEDDER_SEQ_LEN)
        patient_text_with_prefix = prefix + patient_text

        gr.Info("Embedding patient summary...")

        with torch.no_grad():
            patient_emb = state.embedder_model.encode(
                [patient_text_with_prefix],
                convert_to_tensor=True,
                normalize_embeddings=True,
                prompt='query'
            )

        # Calculate similarities (dot product; this equals cosine similarity when
        # the stored trial embeddings are L2-normalized, as load_and_embed_trials
        # produces). ravel() keeps a 1-D result even when only one trial is loaded.
        patient_emb_np = patient_emb.cpu().numpy()
        similarities = np.dot(state.trial_embeddings, patient_emb_np.T).ravel()

        # Get top-k (the slider value may arrive as a float)
        top_indices = np.argsort(similarities)[::-1][:int(top_k)]

        # Get top trials
        top_trials = state.trial_spaces_df.iloc[top_indices].copy()
        top_trials['similarity_score'] = similarities[top_indices]

        gr.Info(f"Running eligibility checks on top {len(top_trials)} trials...")

        # Run trial checker
        trial_check_inputs = [
            f"{row['this_space']}\nNow here is the patient summary:{patient_summary}"
            for _, row in top_trials.iterrows()
        ]

        trial_check_encodings = state.trial_checker_tokenizer(
            trial_check_inputs,
            truncation=True,
            max_length=MAX_TRIAL_CHECKER_LENGTH,
            padding=True,
            return_tensors='pt'
        ).to(state.device)

        with torch.no_grad():
            trial_check_outputs = state.trial_checker_model(**trial_check_encodings)
            trial_probs = torch.softmax(trial_check_outputs.logits, dim=1)[:, 1].cpu().numpy()

        top_trials['eligibility_probability'] = trial_probs

        # Run boilerplate checker
        boilerplate_check_inputs = [
            f"Patient history: {patient_boilerplate}\nTrial exclusions:{row['trial_boilerplate_text']}"
            for _, row in top_trials.iterrows()
        ]

        boilerplate_check_encodings = state.boilerplate_checker_tokenizer(
            boilerplate_check_inputs,
            truncation=True,
            max_length=MAX_BOILERPLATE_CHECKER_LENGTH,
            padding=True,
            return_tensors='pt'
        ).to(state.device)

        with torch.no_grad():
            boilerplate_check_outputs = state.boilerplate_checker_model(**boilerplate_check_encodings)
            boilerplate_probs = torch.softmax(boilerplate_check_outputs.logits, dim=1)[:, 1].cpu().numpy()

        top_trials['exclusion_probability'] = boilerplate_probs

        # Sort by eligibility probability
        top_trials = top_trials.sort_values('eligibility_probability', ascending=False)

        # Apply visual formatting for the display table
        top_trials['eligibility_display'] = top_trials['eligibility_probability'].apply(
            lambda x: format_probability_visual(x, is_exclusion=False)
        )
        top_trials['exclusion_display'] = top_trials['exclusion_probability'].apply(
            lambda x: format_probability_visual(x, is_exclusion=True)
        )
        top_trials['similarity_display'] = top_trials['similarity_score'].apply(
            lambda x: f"{x:.3f}"
        )

        # Select columns for display - use the display versions for the UI
        display_cols = [
            'nct_id',
            'eligibility_display',
            'exclusion_display',
            'similarity_display',
            'this_space'
        ]

        result_df = top_trials[display_cols].reset_index(drop=True)

        # Rename columns for better UI reading
        result_df.columns = [
            'NCT ID',
            'Eligibility',
            'Exclusion',
            'Similarity',
            'Criteria Space'
        ]

        return result_df

    except Exception as e:
        # gr.Error only reaches the UI when it is raised; gr.Warning shows a toast
        # while still letting this handler return an empty table.
        gr.Warning(f"Error matching trials: {str(e)}")
        return pd.DataFrame()

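The retrieval step in `match_trials` ranks trials by the dot product between the patient embedding and each trial embedding, which equals cosine similarity when both sides are L2-normalized. The following is a dependency-free sketch of that ranking on toy vectors. It uses plain Python lists instead of NumPy arrays, and the names `normalize` and `top_k_by_similarity` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def top_k_by_similarity(
    query: Sequence[float],
    candidates: Sequence[Sequence[float]],
    k: int,
) -> List[Tuple[int, float]]:
    """Return (index, score) pairs for the k candidates with the highest dot product."""
    scores = [sum(q * c for q, c in zip(query, cand)) for cand in candidates]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order[:k]]


toy_query = normalize([1.0, 0.0])
toy_trials = [normalize([0.0, 1.0]), normalize([1.0, 0.0]), normalize([1.0, 1.0])]
toy_ranking = top_k_by_similarity(toy_query, toy_trials, k=2)
```

In the application this similarity only selects the shortlist. The final order of the results table comes from the trial checker's eligibility probability.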
def get_trial_details(df: pd.DataFrame, evt: gr.SelectData) -> str:
    """Get full trial details when user clicks on a row."""
    try:
        if df is None or len(df) == 0:
            return "No trial selected"

        row_idx = evt.index[0]
        # Map renamed columns back to logic
        nct_id = df.iloc[row_idx]['NCT ID']
        this_space = df.iloc[row_idx]['Criteria Space']

        # Find the specific trial space in original dataframe
        # Match both NCT ID and the exact trial space text
        matching_rows = state.trial_spaces_df[
            (state.trial_spaces_df['nct_id'] == nct_id) &
            (state.trial_spaces_df['this_space'] == this_space)
        ]

        if len(matching_rows) == 0:
            return f"Error: Could not find matching trial space for {nct_id}"

        trial_row = matching_rows.iloc[0]

        # Create clinicaltrials.gov link
        ct_gov_link = f"https://clinicaltrials.gov/study/{nct_id}"

        details = f"""
# Trial Details: {nct_id}

**🔗 [View on ClinicalTrials.gov]({ct_gov_link})**

---

## Eligibility Criteria Summary (Selected Space)
{trial_row['this_space']}

## Full Trial Text
{trial_row['trial_text']}

## Boilerplate Exclusions
{trial_row['trial_boilerplate_text']}
"""
        return details

    except Exception as e:
        return f"Error retrieving trial details: {str(e)}"

# ============================================================================
# GRADIO INTERFACE
# ============================================================================

def create_interface():

    # Custom theme and CSS for a cleaner, modern look
    theme = gr.themes.Soft(
        primary_hue="blue",
        secondary_hue="slate",
    ).set(
        body_background_fill="*neutral_50",
        block_background_fill="white",
        block_border_width="1px",
        block_label_background_fill="*primary_50",
    )

    custom_css = """
    .gradio-container { font-family: 'Inter', Arial, sans-serif !important; }
    .model-status { min-height: 80px !important; font-size: 0.9em; }
    .status-box { background: #f9fafb; border: 1px solid #e5e7eb; border-radius: 8px; padding: 10px; }
    h1 { color: #1e3a8a; }
    """

    with gr.Blocks(title="MatchMiner-AI", theme=theme, css=custom_css) as demo:

        with gr.Row(variant="panel"):
            with gr.Column(scale=4):
                gr.Markdown("""
                # 🏥 MatchMiner-AI
                **Clinical Trial Matching Pipeline**
                """)
            with gr.Column(scale=1):
                pass

        with gr.Tabs():
            # ============= TAB 1: PATIENT INPUT =============
            with gr.Tab("1️⃣ Patient Input"):

                with gr.Tab("Option A: Upload Clinical Notes"):
                    with gr.Group():
                        gr.Markdown("### 📄 Upload Records")
                        notes_file = gr.File(
                            label="Upload Patient Notes (CSV or Excel)",
                            file_types=[".csv", ".xlsx", ".xls"]
                        )

                        # Hidden advanced option
                        with gr.Accordion("Advanced Options", open=False):
                            prob_threshold = gr.Slider(
                                minimum=0.0, maximum=1.0, value=0.5, step=0.05,
                                label="Tagger Threshold",
                                info="Probability threshold for including excerpts"
                            )

                        process_notes_btn = gr.Button("Process Notes", variant="primary")

                    with gr.Row():
                        with gr.Column():
                            notes_status = gr.Textbox(label="Processing Status", interactive=False)
                        with gr.Column():
                            pass  # The summarize button lives above the summary box instead

                    long_note_output = gr.Textbox(
                        label="Extracted Patient History (Long Note)",
                        lines=10,
                        interactive=False,
                        show_copy_button=True
                    )

                    process_notes_btn.click(
                        fn=process_patient_notes,
                        inputs=[notes_file, prob_threshold],
                        outputs=[notes_status, long_note_output]
                    )

                with gr.Tab("Option B: Enter Patient Summary"):
                    gr.Markdown("Enter a patient summary directly (skip note processing)")

                # Shared summary fields in a visual group
                with gr.Group():
                    gr.Markdown("### 📝 Patient Summary & Boilerplate")

                    # Summarize button sits right above the summary box
                    summarize_btn = gr.Button("Summarize Patient History (from Long Note)", variant="secondary")

                    with gr.Row():
                        patient_summary = gr.Textbox(
                            label="Patient Summary",
                            lines=12,
                            placeholder="Enter or generate patient summary here...",
                            info="Age, sex, Cancer type, histology, extent, biomarkers, treatment history",
                            show_copy_button=True
                        )
                        patient_boilerplate = gr.Textbox(
                            label="Patient Boilerplate Text",
                            lines=12,
                            placeholder="Mentions of exclusion criteria (brain mets, etc.)",
                            info="Evidence of potential boilerplate exclusions",
                            show_copy_button=True
                        )

                    # Wire up summarization to output to BOTH textboxes
                    summarize_btn.click(
                        fn=summarize_patient_history,
                        inputs=[long_note_output],
                        outputs=[patient_summary, patient_boilerplate]
                    )

            # ============= TAB 2: TRIAL DATABASE =============
            with gr.Tab("2️⃣ Trial Database"):
                with gr.Row():
                    with gr.Column(scale=1):
                        gr.Markdown("""
                        ### 🗃️ Upload Trial Space Database

                        Upload a CSV or Excel file containing trial information.

                        **Required Columns:** `nct_id`, `this_space`, `trial_text`, `trial_boilerplate_text`

                        **💡 TIP:** For faster loading, use pre-embedded trials defined in `config.py`.
                        """)

                        trial_file = gr.File(
                            label="Upload Trial Database",
                            file_types=[".csv", ".xlsx", ".xls"]
                        )

                        trial_upload_btn = gr.Button("Load and Embed Trials", variant="primary")

                        trial_status = gr.Textbox(
                            label="Status",
                            interactive=False,
                            value=state.auto_load_status.get("trials", "")
                        )

                    with gr.Column(scale=2):
                        gr.Markdown("### Preview")
                        trial_preview = gr.Dataframe(
                            label="Preview (first 10 trials)",
                            interactive=False,
                            value=state.trial_preview_df,
                            wrap=True
                        )

                trial_upload_btn.click(
                    fn=load_and_embed_trials,
                    inputs=[trial_file],
                    outputs=[trial_status, trial_preview]
                )

            # ============= TAB 3: TRIAL MATCHING =============
            with gr.Tab("3️⃣ Trial Matching"):

                with gr.Row():
                    with gr.Column(scale=1):
                        match_btn = gr.Button("🔍 Find Matching Trials", variant="primary", size="lg")
                    with gr.Column(scale=3):
                        # Hidden advanced option for sliders
                        with gr.Accordion("Search Settings", open=False):
                            top_k_slider = gr.Slider(
                                minimum=5, maximum=50, value=20, step=5,
                                label="Number of Top Trials to Check",
                                info="How many top-ranked trials to run eligibility checks on"
                            )

                gr.Markdown("### 📊 Results")

                with gr.Row():
                    with gr.Column(scale=7):
                        results_df = gr.Dataframe(
                            label="Matched Trials",
                            interactive=False,
                            wrap=True,
                            datatype=["str", "markdown", "markdown", "str", "str"],  # Markdown enables colored text/emojis
                            column_widths=["15%", "15%", "15%", "10%", "45%"]
                        )

                    with gr.Column(scale=5):
                        trial_details = gr.Markdown(
                            label="Trial Details",
                            value="<div style='text-align: center; padding: 50px; color: #666;'>👈 Click on a trial row to see full details here</div>"
                        )

                # Wire up matching
                match_btn.click(
                    fn=match_trials,
                    inputs=[patient_summary, patient_boilerplate, top_k_slider],
                    outputs=[results_df]
                )

                results_df.select(
                    fn=get_trial_details,
                    inputs=[results_df],
                    outputs=[trial_details]
                )

+
        # ============= TAB 4: MODEL CONFIGURATION =============
        with gr.Tab("4️⃣ Model Configuration"):
            gr.Markdown("### 🧠 Model Management")

            status_msg = (
                "**Config file detected** - Models will auto-load on startup."
                if HAS_CONFIG
                else "**No config file found** - Please load models manually below."
            )
            # gr.Info is a toast notification intended for use inside event handlers,
            # so the status is rendered as Markdown in the layout instead.
            gr.Markdown(status_msg)

            with gr.Group():
                with gr.Row():
                    with gr.Column():
                        tagger_input = gr.Textbox(label="TinyBERT Tagger Model", placeholder="prajjwal1/bert-tiny")
                        tagger_btn = gr.Button("Load Tagger")
                        tagger_status = gr.Textbox(label="Status", interactive=False, value=state.auto_load_status.get("tagger", ""), elem_classes=["model-status"])

                    with gr.Column():
                        embedder_input = gr.Textbox(label="Trial Space Embedder", placeholder="Qwen/Qwen3-Embedding-0.6B")
                        embedder_btn = gr.Button("Load Embedder")
                        embedder_status = gr.Textbox(label="Status", interactive=False, value=state.auto_load_status.get("embedder", ""), elem_classes=["model-status"])
                        embedder_warning = gr.Textbox(visible=False)

            with gr.Group():
                with gr.Row():
                    with gr.Column():
                        llm_input = gr.Textbox(label="LLM Model (Summarization)", placeholder="openai/gpt-oss-120b")
                        llm_btn = gr.Button("Load LLM")
                        llm_status = gr.Textbox(label="Status", interactive=False, value=state.auto_load_status.get("llm", ""), elem_classes=["model-status"])

                    with gr.Column():
                        trial_checker_input = gr.Textbox(label="Trial Checker Model", placeholder="answerdotai/ModernBERT-large")
                        trial_checker_btn = gr.Button("Load Trial Checker")
                        trial_checker_status = gr.Textbox(label="Status", interactive=False, value=state.auto_load_status.get("trial_checker", ""), elem_classes=["model-status"])

                with gr.Row():
                    with gr.Column(scale=1):
                        boilerplate_checker_input = gr.Textbox(label="Boilerplate Checker Model", placeholder="answerdotai/ModernBERT-large")
                        boilerplate_checker_btn = gr.Button("Load Boilerplate Checker")
                        boilerplate_checker_status = gr.Textbox(label="Status", interactive=False, value=state.auto_load_status.get("boilerplate_checker", ""), elem_classes=["model-status"])
                    with gr.Column(scale=1):
                        pass

            # Wire up model loading
            tagger_btn.click(fn=load_tagger_model, inputs=[tagger_input], outputs=[tagger_status, gr.Textbox(visible=False)])
            embedder_btn.click(fn=load_embedder_model, inputs=[embedder_input], outputs=[embedder_status, gr.Textbox(visible=False), embedder_warning])
            llm_btn.click(fn=load_llm_model, inputs=[llm_input], outputs=[llm_status, gr.Textbox(visible=False)])
            trial_checker_btn.click(fn=load_trial_checker, inputs=[trial_checker_input], outputs=[trial_checker_status, gr.Textbox(visible=False)])
            boilerplate_checker_btn.click(fn=load_boilerplate_checker, inputs=[boilerplate_checker_input], outputs=[boilerplate_checker_status, gr.Textbox(visible=False)])

        # ============= TAB 5: TRIAL SPACE EXTRACTION =============
        with gr.Tab("5️⃣ Trial Space Extraction"):
            gr.Markdown("""
            ### 🧬 Extract Trial Spaces
            Paste clinical trial text (title + summary + eligibility) to extract structured spaces.
            """)

            with gr.Row():
                with gr.Column():
                    trial_text_input = gr.Textbox(
                        label="Clinical Trial Text",
                        placeholder="Paste text from ClinicalTrials.gov...",
                        lines=15,
                    )
                    extract_btn = gr.Button("Extract Trial Spaces", variant="primary")

                with gr.Column():
                    trial_spaces_output = gr.Textbox(
                        label="Extracted Results",
                        lines=15,
                        interactive=False,
                        show_copy_button=True
                    )

            extract_btn.click(
                fn=extract_trial_spaces,
                inputs=[trial_text_input],
                outputs=[trial_spaces_output]
            )

    return demo

# ============================================================================
# MAIN
# ============================================================================

if __name__ == "__main__":
    print(f"Device: {state.device}")
    print(f"GPU Available: {torch.cuda.is_available()}")
    if torch.cuda.is_available():
        print(f"GPU Count: {torch.cuda.device_count()}")

    # Auto-load models from config if available
    if HAS_CONFIG:
        auto_load_models_from_config()

        # Auto-load trials after embedder is ready
        if state.embedder_model is not None or (hasattr(config, 'PREEMBEDDED_TRIALS') and config.PREEMBEDDED_TRIALS):
            auto_load_trials_from_config()

    demo = create_interface()
    demo.launch(
        server_name="0.0.0.0",
        server_port=7860,
        share=False
    )
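The startup block above depends on a `HAS_CONFIG` flag and a `config` module whose definitions fall outside this part of the diff. The following is a minimal sketch of one way an optional config import could be handled, not the app's actual code: the helper names `load_optional_config` and `preembedded_path` are hypothetical, and only the `PREEMBEDDED_TRIALS` attribute name comes from config.py.

```python
import importlib
from types import ModuleType, SimpleNamespace
from typing import Optional, Tuple


def load_optional_config(module_name: str = "config") -> Tuple[Optional[ModuleType], bool]:
    """Return (module, True) if the module imports, else (None, False)."""
    try:
        return importlib.import_module(module_name), True
    except ImportError:
        # No config file next to the app: fall back to manual model loading.
        return None, False


def preembedded_path(config_module) -> Optional[str]:
    """Read PREEMBEDDED_TRIALS defensively; None and '' both mean 'disabled'."""
    return getattr(config_module, "PREEMBEDDED_TRIALS", None) or None


# Hypothetical demo values, used only to exercise the two helpers.
missing_module, has_config = load_optional_config("no_such_config_module_for_demo")
example = SimpleNamespace(PREEMBEDDED_TRIALS="trial_embeddings.parquet")
print(has_config, preembedded_path(missing_module), preembedded_path(example))
```

Using `getattr` with a default keeps the check safe even when the config module is absent or does not define the attribute.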
config.py
ADDED
|
@@ -0,0 +1,86 @@
# Configuration for Clinical Trial Matching Pipeline
#
# Edit the values below to set your default models and trial database.
# Models will auto-load on application startup.

# ============================================================================
# MODEL PATHS - Set your default models here
# ============================================================================

# Set to None to skip auto-loading, or provide model path/HuggingFace ID
MODEL_CONFIG = {
    # TinyBERT tagger for extracting relevant excerpts
    "tagger": "/ksg/kehl_mm_data/meta/2024/v20/v20_models/tagger",  # e.g., "prajjwal1/bert-tiny" or "./auto-tiny-bert-tagger"

    # Sentence transformer for embedding patient summaries and trials
    "embedder": "/ksg/kehl_mm_data/meta/2024/v20/v20_models/trialspace",  # e.g., "Qwen/Qwen3-Embedding-0.6B" or "./reranker_round2.model"

    # Large language model for patient history summarization
    "llm": "/ksg/kehl_mm_data/meta/2024/v20/v20_oncoreasoning_training/gcp_export/checkpoint-60000-for-export/",
    #"llm": "openai/gpt-oss-120b",

    # ModernBERT classifier for eligibility prediction
    "trial_checker": "/ksg/kehl_mm_data/meta/2024/v20/v20_models/trialchecker",  # e.g., "answerdotai/ModernBERT-large" or "./modernbert-trial-checker"

    # ModernBERT classifier for boilerplate exclusion prediction
    "boilerplate_checker": "/ksg/kehl_mm_data/meta/2024/v20/v20_models/boilerplatechecker",  # e.g., "answerdotai/ModernBERT-large" or "./modernbert-boilerplate-checker"
}

# Example configuration with base models:
# MODEL_CONFIG = {
#     "tagger": "prajjwal1/bert-tiny",
#     "embedder": "Qwen/Qwen3-Embedding-0.6B",
#     "llm": "microsoft/Phi-3-mini-4k-instruct",
#     "trial_checker": "answerdotai/ModernBERT-large",
#     "boilerplate_checker": "answerdotai/ModernBERT-large",
# }

# Example configuration with fine-tuned models:
# MODEL_CONFIG = {
#     "tagger": "./auto-tiny-bert-tagger",
#     "embedder": "./reranker_round2.model",
#     "llm": "/data/models/gpt-oss-120b",
#     "trial_checker": "./modernbert-trial-checker",
#     "boilerplate_checker": "./modernbert-boilerplate-checker",
# }

# ============================================================================
# DEFAULT TRIAL DATABASE
# ============================================================================

# Path to default trial database CSV/Excel file
# Will auto-load and embed when embedder model is ready
# Set to None to disable auto-loading
DEFAULT_TRIAL_DB = "./trial_space_lineitems.csv"  # e.g., "./my_trials.csv" or "./sample_trials.csv"

# ============================================================================
# PRE-EMBEDDED TRIALS (Recommended for faster startup)
# ============================================================================

# Path to pre-embedded trial database (parquet file with 'embedding' column)
# This is preferred over DEFAULT_TRIAL_DB as it loads instantly without re-embedding
# Generate with: python preembed_trials.py --trials trials.csv --embedder model --output trial_embeddings.parquet
# Set to None to disable pre-embedded loading (will fall back to DEFAULT_TRIAL_DB)
PREEMBEDDED_TRIALS = "https://huggingface.co/datasets/ksg-dfci/mmai-synthetic/resolve/main/trial_embeddings.parquet"

# ============================================================================
# USAGE NOTES
# ============================================================================
#
# 1. Set the model paths above to your preferred models
# 2. Optionally set DEFAULT_TRIAL_DB to your trial database file
# 3. For faster startup, pre-embed your trials:
#    python preembed_trials.py --trials your_trials.csv --embedder your_model --output trial_embeddings.parquet
#    Then set PREEMBEDDED_TRIALS = "trial_embeddings.parquet"
# 4. Save this file
# 5. Run: python app.py
# 6. Models will load automatically on startup
#
# You can still manually load different models through the web interface
# if you need to switch models during a session.
#
# PRE-EMBEDDED FORMAT:
# The parquet file contains all original trial columns plus an 'embedding' column
# where each row has a list of floats representing the trial's embedding vector.
# This format is compatible with HuggingFace Datasets for easy sharing.
#
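The pre-embedded format described in the usage notes stores one embedding vector (a list of floats) per trial row. As a rough illustration of how such rows could be ranked against a patient embedding, here is a standard-library sketch of top-k cosine ranking. It is an assumption about the general technique, not the app's actual matching code: the function names `l2_normalize` and `top_k_trials`, the toy NCT IDs, and the two-dimensional vectors are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned as zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return [0.0 for _ in vec]
    return [x / norm for x in vec]


def top_k_trials(query: Sequence[float], rows: List[Dict], k: int) -> List[Tuple[str, float]]:
    """Rank rows by cosine similarity to the query and keep the best k."""
    q = l2_normalize(query)
    scored = []
    for row in rows:
        t = l2_normalize(row["embedding"])
        # For unit vectors, the dot product equals the cosine similarity.
        scored.append((row["nct_id"], sum(a * b for a, b in zip(q, t))))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy rows mimicking the 'nct_id' + 'embedding' columns (values are made up).
rows = [
    {"nct_id": "NCT00000001", "embedding": [1.0, 0.0]},
    {"nct_id": "NCT00000002", "embedding": [0.0, 1.0]},
    {"nct_id": "NCT00000003", "embedding": [1.0, 1.0]},
]
ranked = top_k_trials([2.0, 0.1], rows, k=2)
print(ranked)
```

If embeddings are already normalized when they are written, the normalization step at query time becomes redundant and ranking reduces to a plain dot product.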
create_sample_data.py
ADDED
|
@@ -0,0 +1,251 @@
#!/usr/bin/env python3
"""
Generate sample data for testing the Clinical Trial Matching Pipeline
"""

import pandas as pd
from datetime import datetime, timedelta

def create_sample_trials():
    """Create a sample trial database CSV."""

    trials = [
        {
            'nct_id': 'NCT12345678',
            'this_space': '''Metastatic non-small cell lung cancer (NSCLC) with EGFR exon 19 deletion or L858R mutation
Prior treatment: At least one prior platinum-based chemotherapy regimen
ECOG performance status: 0-2
Measurable disease per RECIST v1.1
Adequate organ function''',
            'trial_text': '''Phase III randomized study of osimertinib versus platinum-based chemotherapy in patients with
EGFR-mutated metastatic NSCLC who have progressed on first-line EGFR TKI therapy. Primary endpoint is progression-free
survival. Secondary endpoints include overall survival, objective response rate, and quality of life.''',
            'trial_boilerplate_text': '''No active brain metastases requiring immediate intervention
No prior treatment with third-generation EGFR TKIs
No interstitial lung disease or pneumonitis
No congestive heart failure NYHA class III-IV
No HIV, hepatitis B, or hepatitis C infection'''
        },
        {
            'nct_id': 'NCT23456789',
            'this_space': '''HER2-positive metastatic breast cancer
Prior treatment: Trastuzumab and pertuzumab in any setting
ECOG performance status: 0-1
Brain metastases allowed if treated and stable
LVEF ≥50%''',
            'trial_text': '''Phase II study of trastuzumab deruxtecan in HER2-positive metastatic breast cancer patients
who have received prior trastuzumab and pertuzumab. Primary endpoint is objective response rate. Key secondary endpoints
include duration of response, progression-free survival, and safety.''',
            'trial_boilerplate_text': '''No history of pneumonitis or interstitial lung disease
No concurrent cardiac dysfunction
No active hepatitis B or C infection
No pregnancy or breastfeeding'''
        },
        {
            'nct_id': 'NCT34567890',
            'this_space': '''Advanced melanoma with BRAF V600E or V600K mutation
Treatment-naive for metastatic disease (adjuvant therapy allowed if completed >6 months prior)
ECOG performance status: 0-1
No active autoimmune disease requiring systemic therapy
Adequate bone marrow, hepatic, and renal function''',
            'trial_text': '''Phase III randomized trial comparing dabrafenib plus trametinib versus vemurafenib monotherapy
in previously untreated BRAF-mutant metastatic melanoma. Primary endpoint is overall survival. Secondary endpoints include
progression-free survival, response rate, and toxicity.''',
            'trial_boilerplate_text': '''No prior systemic therapy for metastatic melanoma
No active brain metastases (treated and stable brain metastases allowed)
No history of inflammatory bowel disease
No significant cardiac disease
No HIV infection on antiretroviral therapy'''
        },
        {
            'nct_id': 'NCT45678901',
            'this_space': '''Microsatellite instability-high (MSI-H) or mismatch repair deficient (dMMR) advanced solid tumors
Progressive disease on or after prior standard therapy
ECOG performance status: 0-2
Measurable disease per RECIST v1.1
No prior checkpoint inhibitor therapy''',
            'trial_text': '''Phase II basket study of pembrolizumab in patients with MSI-H/dMMR advanced solid tumors.
Primary endpoint is objective response rate by tumor type. Secondary endpoints include duration of response,
progression-free survival, and overall survival.''',
            'trial_boilerplate_text': '''No active autoimmune disease requiring systemic therapy
No history of severe immune-related adverse events
No active pneumonitis or interstitial lung disease
No concurrent systemic corticosteroids (>10mg prednisone equivalent daily)
No HIV, hepatitis B, or hepatitis C infection'''
        },
        {
            'nct_id': 'NCT56789012',
            'this_space': '''Advanced or metastatic renal cell carcinoma (RCC), clear cell histology
No prior systemic therapy for advanced disease
Intermediate or poor risk per IMDC criteria
ECOG performance status: 0-1
Measurable disease per RECIST v1.1''',
            'trial_text': '''Phase III randomized study of cabozantinib plus nivolumab versus sunitinib in previously
untreated advanced RCC. Primary endpoint is progression-free survival. Secondary endpoints include overall survival,
objective response rate, and safety.''',
            'trial_boilerplate_text': '''No prior systemic therapy for metastatic RCC
No active brain metastases
No history of bowel perforation or fistula
No poorly controlled hypertension
No active hepatitis B or C infection
No significant cardiovascular disease'''
        }
    ]

    df = pd.DataFrame(trials)
    df.to_csv('sample_trials.csv', index=False)
    print(f"✓ Created sample_trials.csv with {len(df)} trials")
    return df

def create_sample_patient_notes():
    """Create sample patient clinical notes CSV."""

    base_date = datetime(2023, 1, 1)

    notes = [
        {
            'date': base_date,
            'text': 'Patient is a 67-year-old male with a 40 pack-year smoking history presenting with cough and weight loss. CT chest shows a 4.5 cm right upper lobe mass with mediastinal lymphadenopathy.',
            'note_type': 'clinical_note'
        },
        {
            'date': base_date + timedelta(days=7),
            'text': 'CT-guided lung biopsy performed. Pathology shows adenocarcinoma, moderately differentiated.',
            'note_type': 'pathology_report'
        },
        {
            'date': base_date + timedelta(days=14),
            'text': 'PET/CT shows FDG-avid right upper lobe mass (SUVmax 12.3), right hilar nodes (SUVmax 8.7), and mediastinal nodes (SUVmax 9.2). No distant metastatic disease identified.',
            'note_type': 'imaging_report'
        },
        {
            'date': base_date + timedelta(days=21),
            'text': '''Next-generation sequencing (NGS) performed on lung biopsy specimen.
Results: EGFR exon 19 deletion (L747_A750delinsP) detected.
Other findings: TP53 p.R273H mutation, MYC amplification (copy number gain).
PD-L1 expression by immunohistochemistry: 75% tumor proportion score.
TMB: 4 mutations/Mb (low).
No ALK, ROS1, BRAF, MET, RET, or KRAS alterations detected.''',
            'note_type': 'ngs_report'
        },
        {
            'date': base_date + timedelta(days=28),
            'text': 'Mediastinoscopy with biopsy of station 4R and 7 lymph nodes. Pathology confirms metastatic adenocarcinoma. Clinical stage: T2aN2M0, stage IIIA.',
            'note_type': 'pathology_report'
        },
        {
            'date': base_date + timedelta(days=42),
            'text': 'Patient underwent concurrent chemoradiation with carboplatin/pemetrexed and 60 Gy radiation to primary tumor and mediastinum. Tolerated well with grade 2 esophagitis.',
            'note_type': 'clinical_note'
        },
        {
            'date': base_date + timedelta(days=112),
            'text': 'Post-treatment CT chest shows near-complete response of primary tumor (now 1.2 cm) and resolution of lymphadenopathy. Started consolidation durvalumab.',
            'note_type': 'imaging_report'
        },
        {
            'date': base_date + timedelta(days=280),
            'text': 'Surveillance CT shows new liver lesions (segment 6 and 7, largest 2.3 cm) and increase in size of lung primary to 3.1 cm. Progression of disease.',
            'note_type': 'imaging_report'
        },
        {
            'date': base_date + timedelta(days=287),
            'text': 'Patient now has metastatic NSCLC (stage IV). ECOG performance status 1. Discussed treatment options. Given EGFR mutation, recommend EGFR TKI therapy.',
            'note_type': 'clinical_note'
        },
        {
            'date': base_date + timedelta(days=294),
            'text': 'Started osimertinib 80 mg daily for EGFR-mutant metastatic NSCLC.',
            'note_type': 'clinical_note'
        },
        {
            'date': base_date + timedelta(days=378),
            'text': 'Restaging CT shows partial response. Liver lesions decreased to 1.2 and 0.9 cm. Primary lung tumor stable at 2.8 cm. Tolerating osimertinib well with mild diarrhea and dry skin.',
            'note_type': 'imaging_report'
        },
        {
            'date': base_date + timedelta(days=560),
            'text': 'Patient reports increased fatigue and back pain over past 3 weeks.',
            'note_type': 'clinical_note'
        },
        {
            'date': base_date + timedelta(days=567),
            'text': '''CT chest/abdomen/pelvis shows:
- Progression of liver metastases (segment 6: 3.8 cm, previously 1.2 cm; segment 7: 2.9 cm, previously 0.9 cm)
- New liver lesions in segments 4 and 5
- Lung primary increased to 4.2 cm
- New small pleural effusion
Assessment: Progressive disease on osimertinib.''',
            'note_type': 'imaging_report'
        },
        {
            'date': base_date + timedelta(days=574),
            'text': 'MRI brain with contrast shows no brain metastases. Patient has progressive EGFR-mutant NSCLC after first-line osimertinib. ECOG PS 1. Discussing clinical trial options for second-line therapy.',
            'note_type': 'clinical_note'
        }
    ]

    df = pd.DataFrame(notes)
    df.to_csv('sample_patient_notes.csv', index=False)
    print(f"✓ Created sample_patient_notes.csv with {len(df)} notes")
    return df

def create_sample_patient_summary():
    """Create a sample patient summary text file."""

    summary = """Age: 67
Sex: Male
Cancer type: Non-small cell lung cancer (NSCLC)
Histology: Adenocarcinoma, moderately differentiated
Stage at diagnosis: Stage IIIA (T2aN2M0)
Current extent: Metastatic (stage IV) with liver metastases

Biomarkers:
- EGFR exon 19 deletion (L747_A750delinsP)
- TP53 p.R273H mutation
- MYC amplification
- PD-L1 75% TPS
- TMB: 4 mutations/Mb (low)

Treatment history:
# 1/28/2023 - 4/15/2023: Concurrent chemoradiation (carboplatin/pemetrexed with 60 Gy)
# 4/22/2023 - 10/5/2023: Consolidation durvalumab
# 10/19/2023 - present: Osimertinib 80 mg daily for metastatic disease

Disease course:
- Initial diagnosis: January 2023, stage IIIA
- Near-complete response to chemoradiation
- Progression to stage IV in September 2023 (liver metastases)
- Partial response to osimertinib
- Current progression on osimertinib (July 2024) after ~9 months of therapy

Current status:
- ECOG performance status: 1
- Progressive disease with liver metastases
- No brain metastases on recent MRI

Boilerplate:
No evidence of brain metastases (MRI brain 7/22/2024).
No history of pneumonitis, interstitial lung disease, congestive heart failure, HIV, or hepatitis infection documented.
Adequate performance status (ECOG 1).
"""

    with open('sample_patient_summary.txt', 'w') as f:
        f.write(summary)

    print("✓ Created sample_patient_summary.txt")
    return summary

if __name__ == "__main__":
    print("Generating sample data for Clinical Trial Matching Pipeline...\n")

    create_sample_trials()
    create_sample_patient_notes()
    create_sample_patient_summary()

    print("\n✓ All sample files created successfully!")
    print("\nFiles generated:")
    print(" - sample_trials.csv (5 clinical trials)")
    print(" - sample_patient_notes.csv (14 clinical notes)")
    print(" - sample_patient_summary.txt (pre-made summary)")
    print("\nYou can now use these files to test the Gradio application.")
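The sample trials above carry four columns (`nct_id`, `this_space`, `trial_text`, `trial_boilerplate_text`), several of which hold multi-line text. A small sketch of how a generated CSV could be sanity-checked with only the standard library follows. The helper names `missing_columns` and `read_rows` and the inline sample text are hypothetical; the sketch assumes the file uses ordinary CSV quoting, so quoted fields may contain line breaks.

```python
import csv
import io
from typing import Dict, List

# Column names taken from the sample trial dictionaries.
REQUIRED_COLUMNS = ["nct_id", "this_space", "trial_text", "trial_boilerplate_text"]


def missing_columns(csv_text: str, required: List[str] = REQUIRED_COLUMNS) -> List[str]:
    """Return the required column names that are absent from the CSV header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    return [col for col in required if col not in header]


def read_rows(csv_text: str) -> List[Dict[str, str]]:
    """Parse all rows; quoted fields keep their embedded newlines."""
    return list(csv.DictReader(io.StringIO(csv_text)))


# Made-up CSV text with one multi-line field and one column left out on purpose.
sample = (
    "nct_id,this_space,trial_text\n"
    'NCT12345678,"line one\nline two",Phase III study\n'
)
print(missing_columns(sample))
print(read_rows(sample)[0]["this_space"])
```

Checking the header before embedding avoids spending model time on a file that would be rejected later for a missing column.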
preembed_trials.py
ADDED
|
@@ -0,0 +1,445 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

"""
Pre-embed Clinical Trials Script (Multi-GPU Support)

This script pre-processes and embeds a clinical trial database,
saving the results to a single parquet file for easy sharing on HuggingFace.

Usage:
    # Single GPU
    python preembed_trials.py --trials trials.csv --embedder path/to/embedder --output trial_embeddings.parquet --devices cuda:0

    # Multi-GPU (parallel embedding)
    python preembed_trials.py --trials trial_space_lineitems.csv --embedder ksg-dfci/TrialSpace-1225 --output trial_embeddings.parquet --devices cuda:2,cuda:3

This will create:
    - trial_embeddings.parquet: Trial dataframe with 'embedding' column containing vectors
    - trial_embeddings_metadata.json: Metadata about the embedding process (optional)
"""

import argparse
import pandas as pd
import numpy as np
import torch
import json
import os
from pathlib import Path
from datetime import datetime
from typing import Tuple, List
from transformers import AutoTokenizer
import multiprocessing as mp


def truncate_text(text: str, tokenizer, max_tokens: int = 1500) -> str:
    """Truncate text to a maximum number of tokens."""
    return tokenizer.decode(
        tokenizer.encode(text, add_special_tokens=True, truncation=True, max_length=max_tokens),
        skip_special_tokens=True
    )


def load_trials(file_path: str) -> pd.DataFrame:
    """Load trials from CSV or Excel file."""
    print(f"\n{'='*70}")
    print(f"Loading trial database from: {file_path}")
    print(f"{'='*70}")

    if file_path.endswith('.csv'):
        df = pd.read_csv(file_path)
    elif file_path.endswith(('.xlsx', '.xls')):
        df = pd.read_excel(file_path)
    else:
        raise ValueError("Unsupported file format. Use CSV or Excel.")

    # Check required columns
    required_cols = ['nct_id', 'this_space', 'trial_text', 'trial_boilerplate_text']
    missing = [col for col in required_cols if col not in df.columns]
    if missing:
        raise ValueError(f"Missing required columns: {', '.join(missing)}")

    print(f"✓ Loaded {len(df)} trials")
    print(f" Columns: {', '.join(df.columns.tolist())}")

    # Clean data
    original_count = len(df)
    df = df[~df['this_space'].isnull()].copy()
    df['trial_boilerplate_text'] = df['trial_boilerplate_text'].fillna('')

    if len(df) < original_count:
        print(f" ⚠ Removed {original_count - len(df)} trials with missing 'this_space'")

    return df


def embed_chunk_on_device(args: Tuple[int, List[str], str, str]) -> Tuple[int, np.ndarray]:
    """
    Worker function to embed a chunk of texts on a specific GPU.

    Args:
        args: Tuple of (chunk_index, texts_to_embed, embedder_path, device)

    Returns:
        Tuple of (chunk_index, embeddings_array)
    """
    chunk_idx, texts, embedder_path, device = args

    # Import here to ensure fresh CUDA context in spawned process
    from sentence_transformers import SentenceTransformer
    import torch

    print(f" [GPU {device}] Loading model for chunk {chunk_idx} ({len(texts)} texts)...")

    # Load model on specific device
    embedder_model = SentenceTransformer(embedder_path, device=device, trust_remote_code=True)

    # Set the instruction prompt (skipped if this model object has no `prompts` mapping)
    try:
        embedder_model.prompts['query'] = (
            "Instruct: Given a cancer patient summary, retrieve clinical trial options "
            "that are reasonable for that patient; or, given a clinical trial option, "
            "retrieve cancer patients who are reasonable candidates for that trial."
        )
    except Exception:
        pass

    # Raise the sequence limit where the model allows it
    try:
        embedder_model.max_seq_length = 2500
    except Exception:
        pass

    print(f" [GPU {device}] Embedding {len(texts)} texts...")

    # Embed
    # Note: `prompt` is prepended verbatim to each text; `prompt_name='query'`
    # is the argument that selects the prompt registered above. The texts
    # passed in already carry the instruction prefix.
    with torch.no_grad():
        embeddings = embedder_model.encode(
            texts,
            batch_size=64,
            convert_to_tensor=True,
            normalize_embeddings=True,
            show_progress_bar=True,
            prompt='query'
        )

    embeddings_np = embeddings.cpu().numpy()
    print(f" [GPU {device}] ✓ Chunk {chunk_idx} complete: {embeddings_np.shape}")

    # Explicitly clean up to free GPU memory
    del embedder_model
    del embeddings
    torch.cuda.empty_cache()

    return chunk_idx, embeddings_np


def embed_trials_multi_gpu(df: pd.DataFrame, embedder_path: str, devices: List[str]) -> Tuple[np.ndarray, str]:
    """Embed trials using multiple GPUs in parallel."""
    print(f"\n{'='*70}")
    print(f"MULTI-GPU EMBEDDING")
    print(f"{'='*70}")
    print(f"Embedder model: {embedder_path}")
    print(f"Devices: {', '.join(devices)}")
    print(f"Total trials: {len(df)}")

    # Load tokenizer for text preparation (on CPU)
    print(f"\nPreparing texts...")
    embedder_tokenizer = AutoTokenizer.from_pretrained(embedder_path, trust_remote_code=True)

    # Prepare texts for embedding
    df['this_space_trunc'] = df['this_space'].apply(
        lambda x: truncate_text(str(x), embedder_tokenizer, max_tokens=1500)
    )

    # Add instruction prefix
    prefix = (
        "Instruct: Given a cancer patient summary, retrieve clinical trial options "
        "that are reasonable for that patient; or, given a clinical trial option, "
        "retrieve cancer patients who are reasonable candidates for that trial. "
    )
    all_texts = [prefix + txt for txt in df['this_space_trunc'].tolist()]

    print(f" Text length stats:")
    print(f" Mean: {np.mean([len(t) for t in all_texts]):.0f} chars")
    print(f" Max: {max([len(t) for t in all_texts])} chars")

    # Split texts into chunks for each GPU
    num_gpus = len(devices)
    chunk_size = len(all_texts) // num_gpus
    chunks = []

    for i, device in enumerate(devices):
        start_idx = i * chunk_size
        # Last GPU gets any remainder
        end_idx = len(all_texts) if i == num_gpus - 1 else (i + 1) * chunk_size
        chunk_texts = all_texts[start_idx:end_idx]
        chunks.append((i, chunk_texts, embedder_path, device))
        print(f" Chunk {i} -> {device}: indices {start_idx}-{end_idx} ({len(chunk_texts)} texts)")

    print(f"\n{'='*70}")
    print(f"Starting parallel embedding on {num_gpus} GPUs...")
    print(f"{'='*70}")

    # Run embedding in parallel using multiprocessing with spawn context
    ctx = mp.get_context('spawn')
    with ctx.Pool(processes=num_gpus) as pool:
        results = pool.map(embed_chunk_on_device, chunks)

    # Sort results by chunk index and concatenate
    results.sort(key=lambda x: x[0])
    embeddings_list = [r[1] for r in results]
    embeddings_np = np.vstack(embeddings_list)

    print(f"\n{'='*70}")
    print(f"✓ Embedding complete")
    print(f" Final shape: {embeddings_np.shape}")
    print(f" Dtype: {embeddings_np.dtype}")
    print(f"{'='*70}")

    return embeddings_np, embedder_path


def embed_trials_single_gpu(df: pd.DataFrame, embedder_path: str, device: str) -> Tuple[np.ndarray, str]:
    """Embed trials using a single GPU (original behavior)."""
    from sentence_transformers import SentenceTransformer

    print(f"\n{'='*70}")
    print(f"Loading embedder model: {embedder_path}")
    print(f"{'='*70}")
    print(f"Device: {device}")

    # Load embedder
    embedder_model = SentenceTransformer(embedder_path, device=device, trust_remote_code=True)
    embedder_tokenizer = AutoTokenizer.from_pretrained(embedder_path, trust_remote_code=True)

    print(f"✓ Embedder loaded")

    # Set the instruction prompt (skipped if this model object has no `prompts` mapping)
    try:
        embedder_model.prompts['query'] = (
            "Instruct: Given a cancer patient summary, retrieve clinical trial options "
            "that are reasonable for that patient; or, given a clinical trial option, "
            "retrieve cancer patients who are reasonable candidates for that trial."
        )
    except Exception:
        pass

    # Raise the sequence limit where the model allows it
    try:
        embedder_model.max_seq_length = 2500
    except Exception:
        pass

    print(f"\n{'='*70}")
    print(f"Embedding {len(df)} trials")
    print(f"{'='*70}")

    # Prepare texts for embedding
    df['this_space_trunc'] = df['this_space'].apply(
        lambda x: truncate_text(str(x), embedder_tokenizer, max_tokens=1500)
    )

    # Add instruction prefix
    prefix = (
        "Instruct: Given a cancer patient summary, retrieve clinical trial options "
        "that are reasonable for that patient; or, given a clinical trial option, "
        "retrieve cancer patients who are reasonable candidates for that trial. "
    )
    texts_to_embed = [prefix + txt for txt in df['this_space_trunc'].tolist()]

    print(f" Text length stats:")
    print(f" Mean: {np.mean([len(t) for t in texts_to_embed]):.0f} chars")
    print(f" Max: {max([len(t) for t in texts_to_embed])} chars")

    # Embed with progress bar
    # Note: `prompt` is prepended verbatim to each text; `prompt_name='query'`
    # is the argument that selects the prompt registered above.
    with torch.no_grad():
        embeddings = embedder_model.encode(
            texts_to_embed,
            batch_size=64,
            convert_to_tensor=True,
            normalize_embeddings=True,
            show_progress_bar=True,
            prompt='query'
        )

    embeddings_np = embeddings.cpu().numpy()

    print(f"✓ Embedding complete")
    print(f" Shape: {embeddings_np.shape}")
    print(f" Dtype: {embeddings_np.dtype}")

    return embeddings_np, embedder_path


def save_embeddings(df: pd.DataFrame, embeddings: np.ndarray, output_path: str, embedder_path: str, devices: List[str]):
    """Save trial data with embeddings to a single parquet file."""
    print(f"\n{'='*70}")
    print(f"Saving to: {output_path}")
    print(f"{'='*70}")

    # Ensure output directory exists
    output_file = Path(output_path)
    output_file.parent.mkdir(parents=True, exist_ok=True)

    # Add embeddings as a column (convert each row to a list for parquet compatibility)
    df_out = df.copy()
    df_out['embedding'] = [emb.tolist() for emb in embeddings]

    # Save to parquet
    df_out.to_parquet(output_path, index=False)
    print(f"✓ Saved parquet file: {output_path}")
    print(f" Size: {output_file.stat().st_size / 1024 / 1024:.2f} MB")
    print(f" Rows: {len(df_out)}")
    print(f" Embedding dimension: {embeddings.shape[1]}")

    # Save metadata alongside (optional, for reference)
    metadata = {
        "created_at": datetime.now().isoformat(),
        "embedder_model": embedder_path,
        "num_trials": len(df),
        "embedding_dim": embeddings.shape[1],
        "nct_ids_sample": df['nct_id'].tolist()[:10] + (["..."] if len(df) > 10 else []),
        "embedding_dtype": str(embeddings.dtype),
        "normalized": True,
        "format": "parquet",
        "embedding_column": "embedding",
        "devices_used": devices
    }

    metadata_file = str(output_file.with_suffix('.metadata.json'))
    with open(metadata_file, 'w') as f:
        json.dump(metadata, f, indent=2)
    print(f"✓ Saved metadata: {metadata_file}")

    print(f"\n{'='*70}")
    print(f"PRE-EMBEDDING COMPLETE")
    print(f"{'='*70}")
    print(f"\nTo use these pre-embedded trials in your app:")
    print(f"1. Update config.py with:")
    print(f" PREEMBEDDED_TRIALS = '{output_path}'")
    print(f"2. Restart the application")
    print(f"\nThe app will automatically load these embeddings on startup!")
    print(f"\nTo share on HuggingFace:")
    print(f" huggingface-cli upload your-username/dataset-name {output_path}")


def parse_devices(devices_str: str) -> List[str]:
    """Parse comma-separated device string into list of devices."""
    if not devices_str:
        return ["cuda" if torch.cuda.is_available() else "cpu"]

    devices = [d.strip() for d in devices_str.split(',')]

    # Validate devices
    for device in devices:
        if device.startswith('cuda'):
            if ':' in device:
                gpu_id = int(device.split(':')[1])
                if gpu_id >= torch.cuda.device_count():
                    raise ValueError(f"GPU {gpu_id} not available. Only {torch.cuda.device_count()} GPUs found.")
            elif not torch.cuda.is_available():
                raise ValueError("CUDA not available")

    return devices


def main():
    parser = argparse.ArgumentParser(
        description="Pre-embed clinical trials for faster loading (supports multi-GPU)",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  # Single GPU
  python preembed_trials.py --trials data/trials.csv --embedder models/embedder --output trial_embeddings.parquet --devices cuda:0

  # Multi-GPU (4 GPUs in parallel)
  python preembed_trials.py --trials trials.csv --embedder Qwen/Qwen3-Embedding-0.6B --output trial_embeddings.parquet --devices cuda:0,cuda:1,cuda:2,cuda:3

  # CPU only
  python preembed_trials.py --trials trials.csv --embedder model --output trial_embeddings.parquet --devices cpu
        """
    )

    parser.add_argument(
        '--trials',
        type=str,
        required=True,
        help='Path to trial database (CSV or Excel)'
    )

    parser.add_argument(
        '--embedder',
        type=str,
        required=True,
        help='Path to embedder model or HuggingFace model name'
    )

    parser.add_argument(
        '--output',
        type=str,
        required=True,
        help='Output path for parquet file (e.g., "trial_embeddings.parquet")'
    )

    parser.add_argument(
        '--devices',
        type=str,
        default=None,
        help='Comma-separated list of devices (e.g., "cuda:0,cuda:1,cuda:2" or "cuda:0" or "cpu"). Default: auto-detect single GPU'
    )

    # Keep --device for backwards compatibility
    parser.add_argument(
        '--device',
        type=str,
        default=None,
        help='(Deprecated) Use --devices instead. Single device to use for embedding.'
    )

    args = parser.parse_args()

    # Handle backwards compatibility with --device
    if args.device and not args.devices:
        args.devices = args.device

    # Parse devices
    devices = parse_devices(args.devices)

    # Ensure output has .parquet extension
    output_path = args.output
    if not output_path.endswith('.parquet'):
        output_path = output_path + '.parquet'

    print(f"\n{'='*70}")
    print(f"CLINICAL TRIAL PRE-EMBEDDING SCRIPT")
    print(f"{'='*70}")
    print(f"Trial Database: {args.trials}")
    print(f"Embedder Model: {args.embedder}")
    print(f"Output File: {output_path}")
    print(f"Devices: {', '.join(devices)}")
    print(f"{'='*70}\n")

    try:
        # Load trials
        df = load_trials(args.trials)

        # Embed trials (choose single vs multi-GPU based on device count)
        if len(devices) > 1:
            embeddings, embedder_path = embed_trials_multi_gpu(df, args.embedder, devices)
        else:
            embeddings, embedder_path = embed_trials_single_gpu(df, args.embedder, devices[0])

        # Save everything to parquet
        save_embeddings(df, embeddings, output_path, embedder_path, devices)

        print(f"\n✓ SUCCESS!")

    except Exception as e:
        print(f"\n✗ ERROR: {str(e)}")
        import traceback
        traceback.print_exc()
        return 1

    return 0


if __name__ == "__main__":
    # SystemExit carries main()'s return code without relying on the
    # site-provided `exit` helper, which is not always available.
    raise SystemExit(main())
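
The multi-GPU path in the listing above splits the texts with integer division and hands the remainder to the last device, so earlier devices receive empty chunks whenever there are fewer texts than devices. A minimal sketch of a more even contiguous split is given here for comparison. `split_evenly` is a hypothetical helper and is not part of `preembed_trials.py`; its tuples are simplified to (chunk_index, texts, device) and it uses only the standard library.

```python
from typing import List, Sequence, Tuple


def split_evenly(items: Sequence[str], devices: Sequence[str]) -> List[Tuple[int, List[str], str]]:
    """Split items into contiguous, near-equal chunks, one per device.

    Hypothetical helper for illustration only. Devices that would
    receive no items are skipped, so no worker starts on an empty chunk.
    """
    if not devices:
        raise ValueError("at least one device is required")

    # `base` items go to every device; the first `extra` devices take one more.
    base, extra = divmod(len(items), len(devices))
    chunks: List[Tuple[int, List[str], str]] = []
    start = 0
    for i, device in enumerate(devices):
        size = base + (1 if i < extra else 0)
        if size == 0:
            continue
        # Chunk indices stay consecutive so results can be re-ordered later.
        chunks.append((len(chunks), list(items[start:start + size]), device))
        start += size
    return chunks
```

Because the chunks remain contiguous and are indexed in order, sorting worker results by chunk index and stacking them would still reproduce the original row order.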
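
The script encodes with `normalize_embeddings=True` and records `"normalized": True` in the metadata. For unit-length vectors, cosine similarity reduces to a plain dot product, which is presumably why the flag is recorded. The sketch below ranks stored vectors against a query under that assumption. `normalize`, `top_k` and the toy vectors are hypothetical and pure Python; how the application actually searches the `embedding` column is not shown in this listing.

```python
import math
from typing import List, Sequence, Tuple


def normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def top_k(query: Sequence[float], rows: Sequence[Sequence[float]], k: int = 3) -> List[Tuple[int, float]]:
    """Return (row_index, score) pairs for the k highest dot products.

    Assumes query and rows are already unit-length, so each dot
    product equals the cosine similarity.
    """
    scores = [(i, sum(q * r for q, r in zip(query, row))) for i, row in enumerate(rows)]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]
```

A call such as `top_k(normalize(query_vec), stored_rows, k=10)` would return the ten best-matching row indices with their scores; with real data a vectorized matrix product would be the more practical choice.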
requirements.txt
ADDED
@@ -0,0 +1,19 @@
gradio>=4.0.0
pandas>=2.0.0
numpy>=1.24.0
torch>=2.0.0
transformers>=4.35.0
sentence-transformers>=2.2.0
openpyxl>=3.1.0
xlrd>=2.0.0

# Optional but recommended for faster LLM inference
vllm>=0.5.0

# For CUDA support (if using GPU)
# Install PyTorch with CUDA separately:
# pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

# For huggingface datasets
datasets>=2.0.0
pyarrow>=10.0.0
sample_patient_notes.csv
ADDED
@@ -0,0 +1,1312 @@
,pseudo_mrn,date,text
0,74643,2020-01-01,"**PATHOLOGY REPORT**

---

### Patient Information
**Name:** David Kim **Sex/Age:** Male / 47 y/o

### Specimen Identification
- **Accession #:** LP‑47321‑BXC
- **Procedure Date:** [date withheld] – CT‑Guided Percutaneous Core Needle Biopsy
- **Anatomic Site:** Left Upper Lobe (segment S³⁄⁴), Lung Parenchymal Mass, C34.2

### Type of Specimen
Core needle biopsy specimens obtained under computed tomography guidance; 5 dedicated cores placed in formalin.

---

## Gross Description

| Item | Quantity | Dimensions (mm) | Appearance |
|------|----------|-----------------|------------|
| Core #1 | 1 | 18 × 2 × 2 | Tan–white, firm, focally hemorrhagic |
| Core #2 | 1 | 16 × 2 × 2 | Tan–white, gritty |
| Core #3 | 1 | 15 × 2 × 2 | Grayish-pink, soft |
| Core #4 | 1 | 13 × 2 × 2 | White, rubbery |
| Core #5 | 1 | 12 × 2 × 2 | Pink-tan, friable |

All cores were inked, bisected longitudinally, submitted entirely in cassettes labeled “LP‑47321‑BX”. Tissue fixed in neutral buffered formalin for ≥24 hours before processing.

---

## Microscopic Examination

*Histopathology:* Sections reveal sheets and nests of markedly atypical malignant cells set against a background of desmoplastic stroma and foci of geographic necrosis. Tumor cells display moderate-to-large eosinophilic cytoplasm, vesicular nuclei with coarse clumped chromatin, conspicuous nucleoli (≥2 × size of adjacent lymphocyte nucleus), and frequent abnormal mitoses (>20/10 HPF). Occasional rosette-like arrangements and scant intracytoplasmic granules suggest neuroendocrine differentiation.

*Architectural pattern*: Predominantly solid growth with occasional trabecular formation; absence of glandular/tubular structures excludes adenocarcinoma component. No keratin pearls or squamous maturation observed.

*Margin evaluation*: All margins represented by peripheral fragments of normal alveolar parenchyma; unable to assess true radial margin due to limited sampling inherent to core biopsies.

---

## Immunohistochemistry & Ancillary Tests

| Marker | Result | Interpretation |
|--------|--------|----------------|
| Pan‑Cytokeratin (AE1/AE3) | Diffuse strong membranous staining | Confirms epithelial origin |
| Neuroendocrine Markers:<br>- Synaptophysin | Focal (+) in ≈30 % of tumor cells | Supports neuroendocrine phenotype |
| - Chromogranin A | Weak focal (+) | Consistent with neuroendocrine differentiation |
| - CD56 (NCAM) | Positive (moderate) | Reinforces neuroendocrine profile |
| TTF‑1 | Negative | Typical for large‑cell histotype lacking adenocarcinoma lineage |
| Napsin A | Negative | Excludes conventional adenocarcinoma |
| p40/p63 | Negative | Rules out squamous differentiation |
| Ki‑67 (MIB‑1) | Approximately 55 % labeling index (high) | High proliferative activity |
| PD‑L1 (22C3 pharmDx) | Tumor Proportion Score = 10 % (TPS) | Low-level expression |

*Molecular Testing:* None performed on this specimen; subsequent next‑generation sequencing ordered on parallel material (see separate record).

---

## Diagnostic Summary

**Primary Diagnosis:**
Large Cell Carcinoma of the Lung, left upper lobe, exhibiting neuroendocrine features, WHO grade III (high proliferative rate).

**Additional Comments:**
- The presence of focal neuroendocrine marker positivity indicates divergent differentiation but does not meet criteria for definitive small‑cell carcinoma.
- Ki‑67 index of 55 % underscores aggressive biology; however, therapeutic decisions should integrate staging data and molecular profile (KRAS, STK11, etc.) obtained subsequently.
- PD‑L1 expression measured at 10 % (22C3) suggests modest checkpoint inhibitor targetability; interpretation should consider overall tumor microenvironment and forthcoming systemic options.

---

**Prepared By:**
Dr. ***[Pathologist Name], MD***
Board Certified Anatomic Pathology

**Report Verified On:** [date withheld]

--- "
1,74643,2020-01-02,"**Next‑Generation Sequencing (NGS) Molecular Profiling Report**
|
| 81 |
+
|
| 82 |
+
---
|
| 83 |
+
|
| 84 |
+
### Patient Information
|
| 85 |
+
**Name:** David Kim **Sex/Age:** Male / 47 y/o
|
| 86 |
+
|
| 87 |
+
### Specimen Details
|
| 88 |
+
| Item | Description |
|
| 89 |
+
|------|-------------|
|
| 90 |
+
| **Specimen ID** | LR-LU‑00123 (Left Upper Lobe Core Biopsy) |
|
| 91 |
+
| **Date of Collection** | [date omitted] |
|
| 92 |
+
| **Source** | Computed tomography–guided core needle biopsy – left upper lobe pulmonary mass (≈5 cm) |
|
| 93 |
+
| **Histology (from accompanying H&E)** | Large‑cell carcinoma with focal neuroendocrine differentiation |
|
| 94 |
+
|
| 95 |
+
### Test Performed
|
| 96 |
+
*Targeted DNA & RNA hybrid capture panel covering >300 cancer‐relevant genes (including SNVs, indels, copy number alterations, splice site changes, selected gene fusions).*
|
| 97 |
+
|
| 98 |
+
Sequencing platform: Illumina NovaSeq 6000
|
| 99 |
+
Average depth of coverage (tumor): ~850×
|
| 100 |
+
Tumor cellularity estimated at 65 %.
|
| 101 |
+
|
| 102 |
+
### Detected Genomic Alterations
|
| 103 |
+
|
| 104 |
+
| Gene | Variant | Allele Frequency* | Interpretation |
|
| 105 |
+
|------|---------|-------------------|---------------|
|
| 106 |
+
| **KRAS** | c.35G>T (p.Gly12Val) | 38 % | Activating missense mutation typical of driver oncogene in NSCLC. |
|
| 107 |
+
| **STK11** | Whole‑gene deletion (loss) inferred from copy‑number analysis | NA | Tumor suppressor loss associated with aggressive phenotype and reduced response to immune checkpoint blockade. |
|
| 108 |
+
| **KEAP1** | c.1796_1800del (p.Leu599Serfs*13) | 32 % | Loss‑of‑function frameshift leading to NRF2 pathway activation; linked to resistance to oxidative stress and potentially poorer outcomes. |
|
| 109 |
+
| **TP53** | c.658A>G (p.Tyr220Cys) | 41 % | Missense mutation affecting DNA‑binding domain; classic tumor‑suppressor alteration conferring genomic instability. |
|
| 110 |
+
| **MET** | Exon 14 skipping transcript detected (low‑level) – junction reads supporting Δex14 | Approx. 5 % (estimated from RNA read count) | In-frame splicing variant resulting in impaired degradation of MET protein; low allelic burden suggests subclonal population. |
|
| 111 |
+
| **ALK** | Rearrangement – Not detected (RNA fusion assay negative) | — | No actionable ALK fusion identified. |
|
| 112 |
+
| **ROS1** | Rearrangement – Not detected (RNA fusion assay negative) | — | No actionable ROS1 fusion identified. |
|
| 113 |
+
|
| 114 |
+
\*Allele frequency reflects proportion of mutant reads relative to total reads at the locus (DNA‑based unless otherwise noted).
|
| 115 |
+
|
| 116 |
+
### Additional Technical Notes
|
| 117 |
+
|
| 118 |
+
- **Quality Metrics:** All target regions achieved ≥100× coverage; mean uniformity = 96 %. No evidence of sample contamination.
|
| 119 |
+
- **Microsatellite Instability (MSI):** Stable (no MSI‑high signature observed).
|
| 120 |
+
- **Tumor Mutational Burden (TMB):** Estimated 7 mut/Mb (intermediate range for NSCLC).
|
| 121 |
+
|
| 122 |
+
### Interpretative Summary
|
| 123 |
+
|
| 124 |
+
The comprehensive profiling of the left upper lobe large‑cell carcinoma reveals:
|
| 125 |
+
|
| 126 |
+
1. An *activating KRAS G12V* point mutation, establishing KRAS as the principal oncogenic driver. This alteration is known to confer sensitivity to emerging KRAS^G12C inhibitors (e.g., adagrasib, sotorasib) when the cysteine substitution is present; however, G12V does not fall under current FDA‑approved KRAS inhibitor indications, though investigational agents targeting non‑cysteine KRAS mutants exist.
|
| 127 |
+
|
| 128 |
+
2. Co‑occurrence of **STK11 loss**, **KEAP1 mutation**, and **TP53 Y220C**—a constellation frequently seen in “KP” (KRAS + TP53) or “KL” (KRAS + LKB1/STK11) molecular subsets of lung adenocarcinoma. The presence of STK11 loss and KEAP1 mutation has been correlated with diminished responsiveness to anti‑PD‑(L)1 immunotherapy and more aggressive clinical behavior.
|
| 129 |
+
|
| 130 |
+
3. Low‑frequency **MET exon 14 skipping** transcripts suggest a minor subclone harboring this alteration. While MET exon 14 skipping is an established actionable target (responsive to MET TKIs such as crizotinib, tepotinib, savolitinib), the subclonal nature (<10 %) raises uncertainty regarding therapeutic benefit.
|
| 131 |
+
|
| 132 |
+
4. Absence of **ALK** or **ROS1** rearrangements eliminates eligibility for approved ALK/ROS1 tyrosine kinase inhibitors.
|
| 133 |
+
|
| 134 |
+
Overall, the molecular landscape underscores KRAS-driven oncogenesis accompanied by additional tumor‑suppressor losses that may influence prognosis and selection of systemic therapy strategies.
|
| 135 |
+
|
| 136 |
+
### Limitations
|
| 137 |
+
|
| 138 |
+
- The detection limit for subclonal alterations approximates 5 %; very low‑abundance events below this threshold could remain undetected.
|
| 139 |
+
- Formal validation of MET exon 14 skipping via orthogonal methods (e.g., RT‑PCR) was not performed due to limited tissue availability.
|
| 140 |
+
|
| 141 |
+
### Reporting Physician
|
| 142 |
+
|
| 143 |
+
Dr. ***[Molecular Pathology Fellow]***
|
| 144 |
+
Department of Pathology & Laboratory Medicine
|
| 145 |
+
|
| 146 |
+
*(Signature electronically generated)*
|
| 147 |
+
|
| 148 |
+
|
| 149 |
+
|
| 150 |
+
---
|
| 151 |
+
|
| 152 |
+
*End of Report*"
|
| 153 |
+
2,74643,2020-01-03,"**PATIENT:** David Kim Sex: Male Age: 47
|
| 154 |
+
**REFERRING PHYSICIAN:** Pulmonology / Oncology Team
|
| 155 |
+
|
| 156 |
+
---
|
| 157 |
+
|
| 158 |
+
### EXAMINATION
|
| 159 |
+
**Modality:** Integrated whole‑body **^18F‑FDG Positron Emission Tomography / Computed Tomography (PET/CT)**
|
| 160 |
+
**Protocol Summary:** Patient fasted ≥6 h, blood glucose measured at 92 mg/dL before intravenous administration of 370 MBq (≈10 mCi) ^18F‑FDG. Uptake phase lasted ≈60 min. Low‑dose non‑contrast CT obtained for attenuation correction and anatomical localization, followed by standard‐duration helical PET acquisition from skull base through mid‑thigh. Images reconstructed with iterative algorithm; axial slice thickness 3–5 mm.
|
| 161 |
+
|
| 162 |
+
**Comparison:** None – this is the baseline pre‑therapy study following recent tissue diagnosis.
|
| 163 |
+
|
| 164 |
+
---
|
| 165 |
+
|
| 166 |
+
## FINDINGS
|
| 167 |
+
|
| 168 |
+
| Region | Observation |
|
| 169 |
+
|--------|-------------|
|
| 170 |
+
| **Thorax – Lungs** | • **Left Upper Lobe (segment S¹⁺²):** 5.0 × 4.3 × 4.0 cm irregular soft‑tissue mass centered in the posterior segment. On PET, markedly increased FDG accumulation with **maximum standardized uptake value (SUVₘₐₓ) = 12.0**, heterogeneous internal metabolism, peripheral rim of slightly lower activity likely reflecting necrosis.<br>• **Ipsilateral (left) hilar region (station 10L):** Single enlarged lymph node measuring 1.4 cm short axis, demonstrating focal FDG avidity (**SUVₘₐₓ ≈ 8.5**) concordant with metabolically active nodal disease.<br>• **Contralateral lung fields:** No discrete pulmonary nodules or areas of abnormally increased FDG uptake.<br>• **Pleura:** No pleural thickening or effusion. |
|
| 171 |
+
| **Mediastinum & Central Structures** | No additional FDG‑avid mediastinal lymph nodes beyond the aforementioned hilar node. Cardiac silhouette unremarkable. |
|
| 172 |
+
| **Upper Abdomen (liver, adrenal glands, pancreas, spleen)** | Normal physiologic hepatic uptake; no focal hypermetabolic foci in liver, bilateral adrenals, pancreas, or spleen. |
|
| 173 |
+
| **Lower Abdomen/Pelvis** | Physiologic bowel activity without focal abnormalities. No osseous lesions demonstrably hypermetabolic. |
|
| 174 |
+
| **Bone Survey** | No focal skeletal uptake above background suggesting metastatic disease. |
|
| 175 |
+
| **Soft Tissue** | Unremarkable musculature and skin. |
|
| 176 |
+
|
| 177 |
+
*Overall image quality satisfactory.*
|
| 178 |
+
|
| 179 |
+
---
|
| 180 |
+
|
| 181 |
+
## IMPRESSION
|
| 182 |
+
|
| 183 |
+
1. **Baseline metabolic characterization** of histologically proven left upper‑lobe large‑cell carcinoma with prominent FDG uptake (**SUVₘₐₓ = 12**), confirming high glycolytic activity typical of aggressive NSCLC.
|
| 184 |
+
2. **Single ipsilateral hilar lymph node** (station 10L) demonstrates significant FDG avidity (**SUVₘₐₓ ≈ 8.5**), supporting clinical N1 classification.
|
| 185 |
+
3. **Absence of FDG‑avid disease** elsewhere (contralateral lung, mediastinum, supraclavicular regions, abdomen, pelvis, skeleton) → no detectable distant metastasis (cM0).
|
| 186 |
+
4. Radiographic staging aligns with **clinical stage IIIA (cT2b N1 M0)** per AJCC 8ᵗʰ edition.
|
| 187 |
+
|
| 188 |
+
These findings provide essential baseline data for forthcoming multimodality therapy planning (induction chemoradiation followed by surgery). Continued close imaging surveillance is advised post‑treatment to assess response.
|
| 189 |
+
|
| 190 |
+
|
| 191 |
+
|
| 192 |
+
---
|
| 193 |
+
|
| 194 |
+
**Prepared By:** Dr. _______________________, MD
|
| 195 |
+
Radiology Department – Thoracic Imaging Service
|
| 196 |
+
|
| 197 |
+
|
| 198 |
+
*(Signature electronic)*"
|
| 199 |
+
3,74643,2020-01-04,"**Oncology Consultation Note – Multidisciplinary Tumor Board Recommendation**
|
| 200 |
+
|
| 201 |
+
---
|
| 202 |
+
|
| 203 |
+
### Patient Information
|
| 204 |
+
**Name:** David Kim **Age:** 47 **Sex:** Male
|
| 205 |
+
|
| 206 |
+
---
|
| 207 |
+
|
| 208 |
+
## Chief Complaint
|
| 209 |
+
Evaluation and management plan for newly diagnosed left‐upper‑lobe non–small cell lung cancer (NSCLC), stage IIIA (cT2b N1 M0), histologically classified as large‑cell carcinoma with neuroendocrine features.
|
| 210 |
+
|
| 211 |
+
---
|
| 212 |
+
|
| 213 |
+
## History of Present Illness
|
| 214 |
+
David Kim is a previously healthy 47‑year‑old man who presented three months ago following an incidentally detected pulmonary opacity on a screening computed tomography (CT) scan obtained for occupational health surveillance. He denies cough, hemoptysis, dyspnea, wheezing, fever, weight loss, night sweats, or chest pain. His performance status is estimated at Eastern Cooperative Oncology Group (ECOG) 0.
|
| 215 |
+
|
| 216 |
+
Bronchoscopy with CT‑guided core needle biopsy of the left upper lobe (segment 1) yielded a poorly differentiated large‑cell carcinoma exhibiting focal neuroendocrine differentiation. Immunohistochemistry demonstrated synaptophysin positivity in scattered cells. The proliferative index (Ki‑67) was elevated at 55 %; programmed death‑ligand 1 (PD‑L1) expression measured by 22C3 assay was 10 %.
|
| 217 |
+
|
| 218 |
+
Comprehensive next‑generation sequencing (NGS) performed on the same tissue sample identified:
|
| 219 |
+
|
| 220 |
+
| Gene | Alteration |
|
| 221 |
+
|------|------------|
|
| 222 |
+
| **KRAS** | G12V (activating) |
|
| 223 |
+
| **STK11** | Loss-of-function |
|
| 224 |
+
| **KEAP1** | Missense mutation |
|
| 225 |
+
| **TP53** | Y220C missense |
|
| 226 |
+
| **MET** | Low‑level exon 14 skipping |
|
| 227 |
+
|
| 228 |
+
No ALK or ROS1 rearrangements were detected. Fluorescence in situ hybridization (FISH) for EGFR amplification was negative.
|
| 229 |
+
|
| 230 |
+
Staging work‑up consisted of contrast‑enhanced chest CT demonstrating a 5.0 × 4.5 cm spiculated mass involving the posterior segment of the left upper lobe with associated atelectasis, and a solitary enlarged ipsilateral hilar lymph node measuring 1.8 cm. No additional intrathoracic adenopathy was visualized. Whole‑body ^18F‑FDG PET/CT disclosed intense metabolic activity confined to the primary lesion (SUV_max = 12) and the involved hilar node (SUV_max ≈ 8). Brain MRI was unremarkable. Overall staging is cT2b N1 M0 → stage IIIA (AJCC 8th edition).
|
| 231 |
+
|
| 232 |
+
He has never smoked cigarettes (never‑smoker), works as a software engineer, lives with his spouse, and drinks socially (<2 drinks/week). There is no personal or familial history of malignancy aside from a paternal uncle with colorectal cancer diagnosed at age 68.
|
| 233 |
+
|
| 234 |
+
---
|
| 235 |
+
|
| 236 |
+
## Review of Systems
|
| 237 |
+
*General:* Denies fevers, chills, recent infections, unintended weight change.
|
| 238 |
+
*Respiratory:* No chronic cough, sputum production, dyspnea, pleuritic chest pain, or hemoptysis.
|
| 239 |
+
*Cardiovascular:* No palpitations, orthopnea, edema.
|
| 240 |
+
*GI:* Normal appetite, bowel habits regular, no nausea/vomiting.
|
| 241 |
+
*Genitourinary:* No dysuria, frequency changes.
|
| 242 |
+
*Neurologic:* No headaches, dizziness, seizures.
|
| 243 |
+
*Skin:* No rashes or lesions.
|
| 244 |
+
All other systems reviewed and negative.
|
| 245 |
+
|
| 246 |
+
---
|
| 247 |
+
|
| 248 |
+
## Past Medical History
|
| 249 |
+
- Hypertension – well controlled on lisinopril 10 mg daily (diagnosed at age 42).
|
| 250 |
+
- Hyperlipidemia – atorvastatin 20 mg nightly.
|
| 251 |
+
- No known cardiac, hepatic, renal, or hematologic disorders.
|
| 252 |
+
|
| 253 |
+
## Surgical History
|
| 254 |
+
Appendectomy (age 23).
|
| 255 |
+
|
| 256 |
+
## Social History
|
| 257 |
+
Never smoker, occasional alcohol, no illicit drugs. Lives with wife, employed full‑time, exercises regularly (running 3 times/week).
|
| 258 |
+
|
| 259 |
+
## Family History
|
| 260 |
+
Father alive, hypertension; mother deceased (stroke at 78); brother healthy; paternal uncle colon CA at 68.
|
| 261 |
+
|
| 262 |
+
## Allergies
|
| 263 |
+
NKDA (no known drug allergies).
|
| 264 |
+
|
| 265 |
+
## Current Medications
|
| 266 |
+
- Lisinopril 10 mg PO QD
|
| 267 |
+
- Atorvastatin 20 mg PO HS
|
| 268 |
+
- Vitamin D₃ 2000 IU daily
|
| 269 |
+
|
| 270 |
+
---
|
| 271 |
+
|
| 272 |
+
## Physical Examination
|
| 273 |
+
**Vital Signs:** BP 122/76 mmHg, HR 72 bpm, RR 16/min, Temp 36.8°C, SpO₂ 98 % RA.
|
| 274 |
+
|
| 275 |
+
**General:** Well‑appearing, NAD, alert & oriented ×3.
|
| 276 |
+
|
| 277 |
+
**HEENT:** Normocephalic, atraumatic, mucous membranes moist.
|
| 278 |
+
|
| 279 |
+
**Neck:** Supple, no cervical adenopathy.
|
| 280 |
+
|
| 281 |
+
**Chest/Lungs:** Clear breath sounds bilaterally, no crackles, rhonchi, or egophony. Slight reduction of tactile fremitus over left apex correlates with radiographic abnormality.
|
| 282 |
+
|
| 283 |
+
**Heart:** Regular rate/rhythm, S1/S2 audible, no murmurs/gallops.
|
| 284 |
+
|
| 285 |
+
**Abdomen:** Soft, nondistended, normoactive bowel sounds, no hepatosplenomegaly.
|
| 286 |
+
|
| 287 |
+
**Extremities:** No clubbing, cyanosis, edema.
|
| 288 |
+
|
| 289 |
+
**Neurological:** Grossly intact cranial nerves II‑XII, strength 5/5 throughout, sensation preserved.
|
| 290 |
+
|
| 291 |
+
---
|
| 292 |
+
|
| 293 |
+
## Laboratory Data (most recent, drawn today)
|
| 294 |
+
|
| 295 |
+
| Test | Result | Reference |
|
| 296 |
+
|------|--------|-----------|
|
| 297 |
+
| CBC w/diff | WBC 6.2 ×10⁹/L, Hb 13.8 g/dL, Plts 240 ×10⁹/L | Normal |
|
| 298 |
+
| Comprehensive Metabolic Panel | Na 138, K 4.2, Cl 102, CO₂ 24, BUN 15, Cr 0.92, Glu 94, AST 21, ALT 19, Alk Phos 71, Total Bilirubin 0.6 | Within limits |
|
| 299 |
+
| LDH | 180 U/L | ≤250 |
|
| 300 |
+
| CEA | 2.1 ng/mL | ≤5 |
|
| 301 |
+
| Serum electrolytes, thyroid panel – pending | — | — |
|
| 302 |
+
|
| 303 |
+
---
|
| 304 |
+
|
| 305 |
+
## Radiology Summary
|
| 306 |
+
|
| 307 |
+
**Contrast Chest CT (Axial):** Left upper lobe posterior segment harbors a 5.0 × 4.5 cm irregular soft‑tissue mass causing adjacent bronchial narrowing; internal heterogeneity suggests necrosis. One ipsilateral hilar LN measures 1.8 cm short axis, mildly hyperattenuating. No contralateral hilar or mediastinal nodes >1 cm. No pleural effusion or chest wall invasion.
|
| 308 |
+
|
| 309 |
+
**^18F‑FDG PET/CT:** Intense FDG avidity localized to the aforementioned primary lesion (SUV_max = 12) and the left hilar node (SUV_max ≈ 8). No extrathoracic foci of increased metabolism.
|
| 310 |
+
|
| 311 |
+
Interpretation aligns with clinically staged cT2b N1 M0 disease.
|
| 312 |
+
|
| 313 |
+
---
|
| 314 |
+
|
| 315 |
+
## Assessment
|
| 316 |
+
|
| 317 |
+
1. **Stage IIIA (cT2b N1 M0) left‑sided large‑cell carcinoma with neuroendocrine features**, harboring KRAS G12V, STK11 loss, KEAP1 and TP53 mutations, low‑level MET exon 14 skipping, PD‑L1 ≈10 %.
|
| 318 |
+
- High‑risk biologic profile (concurrent KRAS/STK11/KEAP1 alterations) predicts modest response to immunotherapy alone.
|
| 319 |
+
- Molecular landscape does not presently support FDA‑approved targeted agents beyond investigational KRAS inhibitors.
|
| 320 |
+
|
| 321 |
+
2. **Overall Health Status:** Excellent functional reserve (ECOG 0), adequate organ function, suitable candidate for multimodality curative intent therapy.
|
| 322 |
+
|
| 323 |
+
---
|
| 324 |
+
|
| 325 |
+
## Plan / Recommendations
|
| 326 |
+
|
| 327 |
+
After thorough interdisciplinary discussion among Thoracic Surgery, Radiation Oncology, Pulmonary Medicine, and Medical Oncology, consensus recommends **induction concurrent chemoradiation followed by surgical resection**—the standard approach for fit patients with potentially resectable stage IIIA NSCLC when mediastinal clearance cannot be guaranteed preoperatively.
|
| 328 |
+
|
| 329 |
+
1. **Radiation Oncology Referral**
|
| 330 |
+
- Initiate simulation for definitive thoracic irradiation targeting the primary tumor and involved hilar station with elective coverage of stations 5–7. Planned dose: **60 Gy in 30 fractions** (2 Gy/fraction).
|
| 331 |
+
- Discuss volumetric modulated arc therapy (VMAT) vs intensity‑modulated RT (IMRT) to minimize esophageal and heart exposure.
|
| 332 |
+
|
| 333 |
+
2. **Medical Oncology – Chemotherapy Regimen**
|
| 334 |
+
- Concurrent systemic therapy: **carboplatin AUC 2 IV weekly** + **paclitaxel 45 mg/m² IV weekly** administered on days 1, 8, 15, 22, 29 of radiation course. This regimen balances efficacy with tolerability in combined modality settings.
|
| 335 |
+
- Pre‑medication with dexamethasone, diphenhydramine, and famotidine per institutional protocol to mitigate hypersensitivity reactions.
|
| 336 |
+
- Baseline labs (CBC, CMP) repeated before each cycle; hold chemotherapy for ANC < 1500/mm³ or platelets < 100 000/mm³.
|
| 337 |
+
|
| 338 |
+
3. **Supportive Care Measures**
|
| 339 |
+
- Prophylactic antiemetics (ondansetron PRN) and stool softeners (docusate) to reduce GI toxicity.
|
| 340 |
+
- Nutritional counseling; consider oral cryotherapy during radiation sessions to lessen esophagitis risk.
|
| 341 |
+
- Encourage continued aerobic exercise; monitor weight and muscle mass.
|
| 342 |
+
|
| 343 |
+
4. **Pre‑operative Evaluation**
|
| 344 |
+
- Upon completion of chemoradiation (approximately 6‑8 weeks), repeat staging CT/PET to assess radiographic response.
|
| 345 |
+
- Reassess cardiopulmonary fitness (PFTs, VO₂ max) to confirm operative candidacy.
|
| 346 |
+
- Schedule video‑assisted thoracoscopic surgery (VATS) left upper lobectomy with systematic mediastinal lymphadenectomy (stations 5, 6, 7, 10L) contingent upon favorable restaging.
|
| 347 |
+
|
| 348 |
+
5. **Discussion of Alternatives & Risks**
|
| 349 |
+
- Reviewed alternative approaches including definitive chemoradiation without surgery versus upfront surgery followed by adjuvant therapy. Emphasized higher local control rates and potential survival benefit with trimodal strategy in appropriately selected patients.
|
| 350 |
+
- Informed consent regarding acute toxicities (esophagitis, pneumonitis, myelosuppression) and late sequelae (fibrosis, reduced pulmonary reserve).
|
| 351 |
+
|
| 352 |
+
6. **Clinical Trial Consideration**
|
| 353 |
+
- Offered referral to ongoing trials evaluating KRAS G12C/G12V selective inhibitors in combination with immune checkpoint blockade; patient expressed interest but prefers standard of care at this juncture.
|
| 354 |
+
|
| 355 |
+
7. **Follow‑Up Timeline**
|
| 356 |
+
- Weekly visits during chemoradiation for symptom assessment, toxicity grading (CTCAE v5.0), and laboratory monitoring.
|
| 357 |
+
- Post‑chemoradiation reassessment appointment scheduled 4 weeks after completing radiation to finalize surgical timing.
|
| 358 |
+
- Ongoing survivorship education concerning smoking avoidance, vaccination updates (influenza, COVID‑19, pneumococcal), and psychosocial resources.
|
| 359 |
+
|
| 360 |
+
**Signature:** _________________________
|
| 361 |
+
Dr. [Oncologist Name], MD
|
| 362 |
+
Board Certified Medical Oncologist
|
| 363 |
+
Date: *(to be inserted)*"
|
| 364 |
+
4,74643,2020-01-05,"**Oncology Progress Note – Follow‑Up After Definitive Chemoradiation**
|
| 365 |
+
|
| 366 |
+
**Patient:** David Kim
|
| 367 |
+
**MRN:** [redacted]
|
| 368 |
+
**DOB:** **[age 49]**
|
| 369 |
+
**Date:** *[to be inserted]*
|
| 370 |
+
|
| 371 |
+
---
|
| 372 |
+
|
| 373 |
+
### Chief Complaint
|
| 374 |
+
Routine interval follow‑up; review of post‑chemoradiation CT chest obtained today.
|
| 375 |
+
|
| 376 |
+
### History of Present Illness
|
| 377 |
+
David is a 49‑year‑old man who was diagnosed at age 47 with stage IIIA (cT2b N1 M0) large‑cell carcinoma of the left upper lobe (LUL). He completed concurrent chemoradiotherapy 6 weeks ago consisting of weekly carboplatin AUC 2 + paclitaxel 45 mg/m² together with 60 Gy in 30 fractions delivered via IMRT. During treatment he experienced Grade 2 esophagitis (managed conservatively with topical lidocaine slurry and dietary modification) and Grade 3 thrombocytopenia (platelets nadir 62 × 10⁹/L, required a brief transfusion course). Both toxicities fully resolved by week 4 post‑CRT.
|
| 378 |
+
|
| 379 |
+
He presents today for review of the most recent contrast‑enhanced CT chest performed 2 weeks after completion of CRT. The scan demonstrates marked shrinkage of the LUL mass from 5.0 cm to 2.8 cm (longest axial dimension) with central cavitation suggestive of necrosis. The previously avid ipsilateral hilar node (station 10L) is no longer visualized. No new parenchymal lesions, pleural effusions, or mediastinal adenopathy are seen. Radiographically this meets criteria for a Partial Response (PR) per RECIST 1.1.
|
| 380 |
+
|
| 381 |
+
He denies cough, dyspnea, hemoptysis, dysphagia, fevers, chills, weight change, or night sweats since finishing RT. His performance status remains ECOG 0. He feels “back to baseline” aside from occasional mild throat soreness that has improved.
|
| 382 |
+
|
| 383 |
+
### Review of Systems
|
| 384 |
+
| System | Positive / Negative |
|
| 385 |
+
|--------|----------------------|
|
| 386 |
+
| Constitutional | Denies fever, chills, unexplained weight loss |
|
| 387 |
+
| Respiratory | No cough, sputum, dyspnea, wheeze |
|
| 388 |
+
| Cardiovascular | No chest pain, palpitations |
|
| 389 |
+
| GI | No nausea/vomiting, bowel habits regular |
|
| 390 |
+
| GU | No dysuria, frequency |
|
| 391 |
+
| Neurologic | No headache, dizziness |
|
| 392 |
+
| Dermatologic | No rash |
|
| 393 |
+
| Endocrine | No polyuria/polydipsia |
|
| 394 |
+
|
| 395 |
+
### Past Medical History *(selected)*
|
| 396 |
+
- Large‑cell lung carcinoma, left upper lobe, diagnosed 24 months ago
|
| 397 |
+
- Hypertension, well controlled on lisinopril 20 mg daily
|
| 398 |
+
- Hyperlipidemia, on atorvastatin 40 mg nightly
|
| 399 |
+
|
| 400 |
+
### Surgical History
|
| 401 |
+
Appendectomy (childhood)
|
| 402 |
+
|
| 403 |
+
### Social History
|
| 404 |
+
Never smoker (never >100 cigarettes lifetime). Occasional alcohol (<2 drinks/week). Works as software engineer; physically active (jogging 3–4 times/wk). Lives with spouse; no illicit drugs.
|
| 405 |
+
|
| 406 |
+
### Family History
|
| 407 |
+
Father died of myocardial infarction at 68. Mother alive, hypertension. No known familial cancers.
|
| 408 |
+
|
| 409 |
+
### Allergies
|
| 410 |
+
No drug allergies documented.
|
| 411 |
+
|
| 412 |
+
### Medications
|
| 413 |
+
- Lisinopril 20 mg PO QD
|
| 414 |
+
- Atorvastatin 40 mg PO HS
|
| 415 |
+
- Vitamin D₃ 2000 IU PO Daily
|
| 416 |
+
|
| 417 |
+
*(Chemotherapy agents discontinued after last dose 6 weeks ago.)*
|
| 418 |
+
|
| 419 |
+
### Physical Examination
|
| 420 |
+
Vital signs: BP 122/78 mmHg, HR 72 bpm, RR 14/min, SpO₂ 98% RA, Temp 36.8°C.
|
| 421 |
+
General: Well‑appearing, NAD, alert, oriented ×3.
|
| 422 |
+
HEENT: Oropharynx clear, mild erythema of posterior pharyngeal wall (no ulceration).
|
| 423 |
+
Neck: Supple, no cervical LAD.
|
| 424 |
+
Cardiovascular: Regular rate/rhythm, S1/S2 audible, no murmurs.
|
| 425 |
+
Respiratory: Clear breath sounds bilaterally, no rales or wheezes.
|
| 426 |
+
Abdomen: Soft, non‑tender, normoactive BS.
|
| 427 |
+
Extremities: No edema, pulses intact.
|
| 428 |
+
Neurological: Grossly intact cranial nerves, strength 5/5 UE/LE, sensation preserved.
|
| 429 |
+
|
| 430 |
+
### Laboratory Data (drawn 3 days ago)
|
| 431 |
+
| Test | Result | Reference |
|
| 432 |
+
|------|--------|-----------|
|
| 433 |
+
| CBC w/diff | WBC 6.2 × 10³/µL, Hb 13.8 g/dL, Plts 210 × 10⁹/L | Normal |
|
| 434 |
+
| CMP | Na 138, K 4.2, Cl 102, CO₂ 25, BUN 15, Cr 0.92, AST 32, ALT 35, Alk Phos 88 | Within limits |
|
| 435 |
+
| LDH | 180 U/L | ≤250 |
|
| 436 |
+
| CEA | 2.1 ng/mL | <5 |
|
| 437 |
+
| Serum electrolytes & glucose | Unremarkable | — |
|
| 438 |
+
|
| 439 |
+
*Prior labs during CRT reflected transient cytopenias; values have normalized.*
|
| 440 |
+
|
| 441 |
+
### Imaging Results (Contrast‑Enhanced Chest CT, 2 wks post‑CRT)
|
| 442 |
+
- **Primary Lesion (Left Upper Lobe):** Residual solid component measuring 2.8 cm (previous 5.0 cm). Central low attenuation suggests necrotic cavity. Margins smooth, without spiculations.
|
| 443 |
+
- **Hilar Nodes:** Station 10L node absent; no other enlarged mediastinal stations (5, 6, 7 negative).
|
| 444 |
+
- **Pleura:** No effusion or thickening.
|
| 445 |
+
- **Other Lung Parenchyma:** No additional nodules or infiltrates.
|
| 446 |
+
- **Upper Abdomen:** Unchanged hepatic steatosis; spleen normal.
|
| 447 |
+
|
| 448 |
+
Interpretation: Marked volumetric regression compatible with favorable biologic response to combined modality therapy. No evidence of progression.
|
| 449 |
+
|
| 450 |
+
### Assessment
|
| 451 |
+
1. **Stage IIIA large‑cell carcinoma of left upper lobe**, cT2b N1 M0 → now post‑CRT PR, pending pathological staging. Molecular profile: KRAS G12V, STK11 loss, KEAP1 mut, TP53 Y220C, low‑level METex14 skip; PD‑L1 10%.
|
| 452 |
+
- Current disease burden markedly reduced; patient maintains excellent functional reserve (ECOG 0, PFTs pending).
|
| 453 |
+
2. **Treatment‑related toxicities:** Resolved Grade 2 esophagitis, resolved Grade 3 thrombocytopenia. No ongoing sequelae.
|
| 454 |
+
3. **Hypertension, hyperlipidemia** – stable on current regimen.
|
| 455 |
+
|
| 456 |
+
### Plan
|
| 457 |
+
1. **Surgical Management**
|
| 458 |
+
- Proceed with curative intent left upper lobectomy with systematic mediastinal lymphadenectomy (stations 5, 6, 7, 10L) once pre‑operative work‑up cleared.
|
| 459 |
+
- Order Pulmonary Function Tests (spirometry, DLCO) within the next week; target FEV1 ≥80% predicted and DLCO ≥70%.
|
| 460 |
+
- Cardiac clearance (EKG ± stress test) due to age >45 despite lack of cardiac symptoms.
|
| 461 |
+
- Discuss operative timing with Thoracic Surgery—target operation ∼4–6 weeks post‑CRT to allow tissue recovery while avoiding undue delay.
|
| 462 |
+
|
| 463 |
+
2. **Adjuvant Considerations**
|
| 464 |
+
- Given the partial radiographic response and absence of nodal disease on imaging, adjuvant chemotherapy is not planned unless intra‑operative pathology reveals unexpected residual disease (>pT2a or nodal positivity).
|
| 465 |
+
- Continue close surveillance per NCCN guidelines: CT chest q3–4 mo for the first year post‑resection.
|
| 466 |
+
|
| 467 |
+
3. **Supportive Care**
|
| 468 |
+
- Reinforce nutrition counseling; encourage protein‑rich diet to aid wound healing.
|
| 469 |
+
- Prescribe oral rinses (chlorhexidine gluconate) prophylactically for potential post‑op mucosal irritation.
|
| 470 |
+
- Encourage continued aerobic activity as tolerated; avoid heavy lifting >10 lb for 4 wk post‑op.
|
| 471 |
+
|
| 472 |
+
4. **Follow‑up Labs**
|
| 473 |
+
- Repeat CBC/CMP 1 week prior to scheduled surgery to confirm hematologic adequacy.
|
| 474 |
+
- Monitor fasting lipid panel annually; blood pressure check at each visit.
|
| 475 |
+
|
| 476 |
+
5. **Education & Counseling**
|
| 477 |
+
- Reviewed expected peri‑operative course, risks (bleeding, air leak, infection) and benefits of achieving R0 resection.
|
| 478 |
+
- Emphasized importance of smoking abstinence (patient never smoked) and avoidance of vaping/e-cigarettes.
|
| 479 |
+
- Provided printed material regarding postoperative pulmonary rehabilitation.
|
| 480 |
+
|
| 481 |
+
**Next Appointment:** Return in 2 weeks for PFT results and pre‑operative clearance documentation. Contact office sooner if new respiratory symptoms develop.
|
| 482 |
+
|
| 483 |
+
---
|
| 484 |
+
|
| 485 |
+
*Prepared by:* Dr. ***[Name], MD***
|
| 486 |
+
Medical Oncology – Thoracic Malignancies
|
| 487 |
+
Signature: ______________________ Date: ____________"
|
| 488 |
+
5,74643,2020-01-06,"**Oncology Progress Note – Post‑Operative Follow‑Up**
|
| 489 |
+
|
| 490 |
+
**Patient:** David Kim
|
| 491 |
+
**MRN:** [redacted]
|
| 492 |
+
**DOB:** [redacted]
|
| 493 |
+
|
| 494 |
+
---
|
| 495 |
+
|
| 496 |
+
### CHIEF COMPLAINT
|
| 497 |
+
Routine postoperative evaluation following left upper lobectomy with mediastinal lymphadenectomy for previously diagnosed large‑cell carcinoma of the left upper lobe.
|
| 498 |
+
|
| 499 |
+
---
|
| 500 |
+
|
| 501 |
+
### HISTORY OF PRESENT ILLNESS
|
| 502 |
+
David is a 49‑year‑old man who presents today for his first postoperative oncology appointment approximately **four weeks** after undergoing a left upper lobectomy with systematic mediastinal LN dissection (stations 5, 6, 7, 10L). He recovered uneventfully from surgery, reporting minimal incisional discomfort controlled with acetaminophen PRN. No dyspnea, cough, hemoptysis, chest pain, dysphagia, or constitutional symptoms were endorsed. His performance status remains excellent (ECOG 0).
|
| 503 |
+
|
| 504 |
+
He completed definitive concurrent chemoradiotherapy twelve months ago consisting of weekly carboplatin (AUC 2) and paclitaxel (45 mg/m²) combined with 60 Gy in 30 fractions. During that course he experienced grade 2 esophagitis and transient grade 3 thrombocytopenia, both resolved without sequelae. Pre‑surgical staging PET/CT documented a solitary left upper lobe mass (≈5 cm) with a single ipsilateral hilar node, staged clinically as cIII‑A (cT2b N1 M0). The operative pathology disclosed a residual focus of large‑cell carcinoma measuring 1.9 cm, negative radial margins (R0), and **no metastatic involvement in 12 examined mediastinal nodes** (ypT1b N0). Molecular profiling of the original biopsy had shown KRAS G12V, STK11 loss, KEAP1 mutation, TP53 Y220C, and low‑level MET exon 14 skipping; PD‑L1 expression was 10 %.
|
| 505 |
+
|
| 506 |
+
Given the absence of viable nodal disease, clear margins, and limited residual tumor size, the multidisciplinary team elected close radiographic surveillance rather than immediate adjuvant systemic therapy.
|
| 507 |
+
|
| 508 |
+
---
|
| 509 |
+
|
| 510 |
+
### REVIEW OF SYSTEMS
|
| 511 |
+
| System | Positive / Negative |
|
| 512 |
+
|--------|----------------------|
|
| 513 |
+
| Constitutional | Denies fever, chills, night sweats, weight change |
|
| 514 |
+
| Respiratory | No cough, sputum production, dyspnea, wheezing |
|
| 515 |
+
| Cardiovascular | No chest pressure, palpitations, edema |
|
| 516 |
+
| Gastrointestinal | Normal appetite, denies nausea/vomiting, abdominal pain |
|
| 517 |
+
| Genitourinary | No dysuria, frequency, hematuria |
|
| 518 |
+
| Neurologic | No headache, dizziness, focal deficits |
|
| 519 |
+
| Musculoskeletal | No myalgias, arthralgias |
|
| 520 |
+
| Dermatologic | No rash, pruritus |
|
| 521 |
+
| Endocrine | No polyuria/polydipsia |
|
| 522 |
+
|
| 523 |
+
Overall ROS unremarkable.
|
| 524 |
+
|
| 525 |
+
---
|
| 526 |
+
|
| 527 |
+
### PAST MEDICAL HISTORY
|
| 528 |
+
* Non–small cell lung cancer – Large‑cell carcinoma, left upper lobe, Stage IIIA (treated with CRT → lobectomy)
|
| 529 |
+
* Hypertension, well controlled on lisinopril 20 mg daily
|
| 530 |
+
* Hyperlipidemia, on atorvastatin 40 mg nightly
|
| 531 |
+
* Seasonal allergic rhinitis
|
| 532 |
+
|
| 533 |
+
### SURGERY
|
| 534 |
+
Left upper lobectomy with mediastinal lymph node dissection (stations 5, 6, 7, 10L) – 04/XX/XXXX (date omitted per protocol)
|
| 535 |
+
|
| 536 |
+
### SOCIAL HISTORY
|
| 537 |
+
* Former smoker: 15 pack‑years, quit 2 years ago
|
| 538 |
+
* Occasional alcohol (≤2 drinks/week)
|
| 539 |
+
* Lives with spouse, employed full‑time as software engineer
|
| 540 |
+
* Exercise: walks briskly 30 min most days
|
| 541 |
+
|
| 542 |
+
### FAMILY HISTORY
|
| 543 |
+
* Father deceased at 68 from myocardial infarction
|
| 544 |
+
* Mother alive, hypertension
|
| 545 |
+
* No known familial cancers
|
| 546 |
+
|
| 547 |
+
### ALLERGIES
|
| 548 |
+
* NKDA (No Known Drug Allergies)
|
| 549 |
+
|
| 550 |
+
### CURRENT MEDICATIONS
|
| 551 |
+
| Medication | Dose & Frequency |
|
| 552 |
+
|------------|------------------|
|
| 553 |
+
| Lisinopril | 20 mg PO QD |
|
| 554 |
+
| Atorvastatin | 40 mg PO HS |
|
| 555 |
+
| Acetaminophen (PRN incision pain) | 500 mg q6h PRN |
|
| 556 |
+
| Multivitamin | Daily |
|
| 557 |
+
|
| 558 |
+
---
|
| 559 |
+
|
| 560 |
+
### PHYSICAL EXAMINATION
|
| 561 |
+
Vital signs: BP 122/78 mmHg, HR 72 bpm, RR 14/min, Temp 36.8°C, SpO₂ 98% RA.
|
| 562 |
+
General: Well‑appearing, NAD, alert, oriented ×3.
|
| 563 |
+
HEENT: Normocephalic, atraumatic; oral mucosa moist, no thrush.
|
| 564 |
+
Neck: Supple, no cervical adenopathy.
|
| 565 |
+
Cardiovascular: Regular rate/rhythm, S₁/S₂ audible, no murmurs.
|
| 566 |
+
Respiratory: Clear breath sounds bilaterally, faint scar over left posterior axilla, no rales or wheezes.
|
| 567 |
+
Abdomen: Soft, non‑tender, normoactive bowel sounds.
|
| 568 |
+
Extremities: No clubbing, cyanosis, edema.
|
| 569 |
+
Neurological: Grossly intact cranial nerves II‑XII, strength 5/5 throughout, sensation preserved.
|
| 570 |
+
|
| 571 |
+
---
|
| 572 |
+
|
| 573 |
+
### LABORATORY RESULTS *(drawn today)*
|
| 574 |
+
| Test | Result | Reference |
|
| 575 |
+
|------|--------|-----------|
|
| 576 |
+
| CBC w/diff | WBC 6.2 ×10⁹/L, Hb 13.8 g/dL, Plt 210 ×10⁹/L | Normal |
|
| 577 |
+
| Comprehensive Metabolic Panel | Na 138 mmol/L, K 4.2 mmol/L, Cl 102 mmol/L, BUN 16 mg/dL, Cr 0.92 mg/dL, AST 24 U/L, ALT 27 U/L, Alk Phos 84 U/L, Total Bilirubin 0.6 mg/dL | Within limits |
|
| 578 |
+
| LDH | 165 U/L (nl ≤250) | nl |
|
| 579 |
+
| CEA | 1.8 ng/mL (nl ≤5) | nl |
|
| 580 |
+
| Serum glucose fasting | 96 mg/dL | nl |
|
| 581 |
+
|
| 582 |
+
All values within expected ranges; no evidence of cytopenias or hepatic dysfunction.
|
| 583 |
+
|
| 584 |
+
---
|
| 585 |
+
|
| 586 |
+
### IMAGING RESULT SUMMARY
|
| 587 |
+
Post‑operative contrast‑enhanced chest CT obtained 3 weeks post‑lobectomy (reviewed):
|
| 588 |
+
• Surgical bed demonstrates expected postsurgical changes with linear scarring; no residual soft tissue density.
|
| 589 |
+
• Mediastinum: No enlarged lymph nodes; stations sampled remain unchanged.
|
| 590 |
+
• No pleural effusion or pneumothorax.
|
| 591 |
+
Impression: No detectable residual disease on imaging, consistent with the complete (R0) resection reported on pathology.
|
| 592 |
+
|
| 593 |
+
---
|
| 594 |
+
|
| 595 |
+
### ASSESSMENT
|
| 596 |
+
1. **Large‑cell carcinoma of left upper lobe – Status post neoadjuvant chemoradiation and curative‑intent left upper lobectomy (ypT1b N0, R0)**
|
| 597 |
+
- Pathology confirms negative margins and zero involved nodes.
|
| 598 |
+
- Current imaging reveals no residual tumor.
|
| 599 |
+
- Given favorable pathological response, observation is appropriate.
|
| 600 |
+
|
| 601 |
+
2. **Hypertension**, well controlled.
|
| 602 |
+
3. **Hyperlipidemia**, stable on statin.
|
| 603 |
+
|
| 604 |
+
---
|
| 605 |
+
|
| 606 |
+
### PLAN
|
| 607 |
+
1. **Surveillance Strategy**
|
| 608 |
+
* Chest CT with IV contrast every **3 months** for the next year, then spacing to **every 6 months** through Year 3, thereafter annually per NCCN guidelines for surgically resected NSCLC.
|
| 609 |
+
* Consider low‑dose screening CT annually beyond Year 3 if smoking history warrants continued vigilance.
|
| 610 |
+
|
| 611 |
+
2. **Laboratory Monitoring**
|
| 612 |
+
* Repeat CBC/CMP concurrently with imaging visits to detect early hematologic/hepatic toxicity—though none anticipated absent systemic therapy.
|
| 613 |
+
* Lipid panel and blood pressure check at each office visit.
|
| 614 |
+
|
| 615 |
+
3. **Vaccinations & Preventive Care**
|
| 616 |
+
* Annual influenza vaccine; COVID‑19 booster per CDC guidance.
|
| 617 |
+
* Pneumococcal vaccination series up to date (PCV13 + PPSV23).
|
| 618 |
+
|
| 619 |
+
4. **Lifestyle Counseling**
|
| 620 |
+
* Continue abstinence from tobacco; reinforce benefits of remaining smoke‑free.
|
| 621 |
+
* Maintain regular aerobic activity (>150 minutes moderate intensity per week) and balanced diet.
|
| 622 |
+
|
| 623 |
+
5. **Adjuvant Therapy Discussion**
|
| 624 |
+
* Reviewed rationale for observation versus adjuvant chemotherapy/immunotherapy. Consensus: Observation favored due to lack of nodal disease, negative margins, and adequate recovery from prior chemoradiation. Patient agrees.
|
| 625 |
+
|
| 626 |
+
6. **Follow‑up Appointment**
|
| 627 |
+
* Return to clinic in **3 months** for interval review and coordination of upcoming CT scan.
|
| 628 |
+
|
| 629 |
+
7. **Documentation**
|
| 630 |
+
* All pathology slides archived; molecular profile retained for potential future therapeutic decision‑making (KRAS G12V, STK11 loss, KEAP1 mutant, TP53 Y220C). Will reassess eligibility for emerging KRAS inhibitors should recurrence occur.
|
| 631 |
+
|
| 632 |
+
**Signature:** ___________________________
|
| 633 |
+
Dr. ***[Name], MD*** — Medical Oncology
|
| 634 |
+
Date: _____________ (to be auto‑populated)"
|
| 635 |
+
6,74643,2020-01-07,"**Imaging Report – Chest Computed Tomography**
|
| 636 |
+
|
| 637 |
+
---
|
| 638 |
+
|
| 639 |
+
**Patient:** David Kim
|
| 640 |
+
**Sex/Age:** Male / 50 years
|
| 641 |
+
|
| 642 |
+
**Study Requested By:** Oncology Service – Surveillance Imaging
|
| 643 |
+
|
| 644 |
+
**Exam Performed:** Multidetector computed tomography (MDCT) of the chest, intravenous iodinated contrast administered (unless contraindicated), axial images obtained with ≤1 mm slice thickness, reconstructed in coronal and sagittal planes. Standard lung window and soft‑tissue (mediastinal) window settings applied.
|
| 645 |
+
|
| 646 |
+
**Comparison:** Prior postoperative surveillance CT dated approximately 12 months earlier (post‑lobectomy, age 49).
|
| 647 |
+
|
| 648 |
+
---
|
| 649 |
+
|
| 650 |
+
### FINDINGS
|
| 651 |
+
|
| 652 |
+
#### **Pulmonary Parenchyma**
|
| 653 |
+
- Right Lung: No focal consolidations, ground‑glass opacities, cavitary lesions, or solid nodules identified. Airway patency preserved throughout segmental bronchi. No bronchial wall thickening.
|
| 654 |
+
- Left Lung (post‑left upper lobectomy): Surgical changes evident with complete removal of the left upper lobe. Post‑operative fibrosis and staple lines visualized without associated fluid collections. Residual left lower lobe appears aerated and free of discrete masses. No parenchymal scarring extending into adjacent segments.
|
| 655 |
+
- No new pulmonary nodules detected bilaterally. No residual mass related to previously treated left upper lobe lesion.
|
| 656 |
+
|
| 657 |
+
#### **Mediastinum & Hilar Structures**
|
| 658 |
+
- Central airways (trachea, mainstem bronchi) maintain normal caliber; no endobronchial obstruction.
|
| 659 |
+
- Mediastinal fat planes intact. Cardiac silhouette within normal limits for body habitus.
|
| 660 |
+
- No enlarged mediastinal, prevascular, or paraesophageal lymph nodes. All measured short axes ≤ 8 mm; none demonstrate abnormal enhancement.
|
| 661 |
+
- Bilateral hila appear unremarkable; no suspicious hilar adenopathy.
|
| 662 |
+
|
| 663 |
+
#### **Pleural Spaces**
|
| 664 |
+
- No pleural effusions, pneumothorax, or loculated fluid collections observed. Pleural surfaces smooth.
|
| 665 |
+
|
| 666 |
+
#### **Chest Wall & Upper Abdomen**
|
| 667 |
+
- Visualized portions of the ribs, vertebral bodies, and scapulae show normal bone density without destructive lesions or periosteal reaction.
|
| 668 |
+
- Subcentimeter calcified granuloma noted in the posterior right costophrenic angle—stable appearance compared with prior examination.
|
| 669 |
+
- Liver, adrenal glands, and spleen partially visualized through inferior slices; appear homogeneous with no focal hepatic lesions or adrenal enlargement.
|
| 670 |
+
|
| 671 |
+
#### **Upper Thoracic Vessels**
|
| 672 |
+
- Great vessels patent; no aneurysmal dilatation or intravascular filling defects suggestive of thrombus.
|
| 673 |
+
|
| 674 |
+
---
|
| 675 |
+
|
| 676 |
+
### IMPRESSION
|
| 677 |
+
1. **No evidence of recurrent or metastatic disease** in the chest. Absence of residual tumor in the operative bed and lack of new pulmonary nodules or mediastinal/hilar lymphadenopathy.
|
| 678 |
+
2. Postsurgical changes secondary to left upper lobectomy with expected postoperative anatomy; unchanged from prior imaging.
|
| 679 |
+
3. Incidental small calcified granuloma in the right posteroinferior costophrenic region—stable.
|
| 680 |
+
4. Overall chest CT demonstrates normal thoracic structures without acute abnormalities.
|
| 681 |
+
|
| 682 |
+
*Recommendation:* Continue routine oncology surveillance per institutional protocol. No immediate further imaging required at this interval."
|
| 683 |
+
7,74643,2020-01-08,"**Oncology Follow‑Up Visit Note**
|
| 684 |
+
*Patient:* David Kim *M/D.O.B.:* **[redacted]** *Age:* 51 *Sex:* Male
|
| 685 |
+
|
| 686 |
+
---
|
| 687 |
+
|
| 688 |
+
### Chief Complaint
|
| 689 |
+
Routine survivorship/follow‑up appointment. Patient denies any new symptoms.
|
| 690 |
+
|
| 691 |
+
### History of Present Illness (HPI)
|
| 692 |
+
David is a 51‑year‑old man who presents today for his scheduled post‑operative/on‑therapy follow‑up approximately 3 years after definitive management of a stage IIIA left upper‐lobe large‑cell lung carcinoma (ypT1bN0, R0). He reports feeling “well,” with return to baseline activities and exercise tolerance without limitation. Denies cough, dyspnea, wheeze, hemoptysis, chest pain, fever, night sweats, weight change, dysphagia, odynophagia, abdominal discomfort, nausea/vomiting, changes in bowel habits, polyuria/polydipsia, skin rash, neuropathy, or any other constitutional concerns.
|
| 693 |
+
|
| 694 |
+
He has fully recovered from his combined‑modality therapy (concurrent carboplatin–paclitaxel + 60 Gy thoracic RT) completed at age 48, followed by left upper lobectomy with mediastinal LN dissection at age 49. Post‑surgical pathology documented residual 1.9 cm large‑cell carcinoma, 0/12 nodes involved (ypT1bN0). Surveillance computed tomography of the chest obtained at age 50 demonstrated complete radiographic response with no detectable pulmonary masses or suspicious lymphadenopathy. There have been no intervening systemic therapies since then.
|
| 695 |
+
|
| 696 |
+
His performance status remains excellent (ECOG 0). He continues to work full‑time as a software engineer, exercises regularly (≈150 min/week moderate cardio), and adheres to a balanced diet. No tobacco use since quitting at age 46; occasional alcohol consumption (<2 drinks per week).
|
| 697 |
+
|
| 698 |
+
### Review of Systems (ROS)
|
| 699 |
+
|
| 700 |
+
| System | Positive / Negative |
|
| 701 |
+
|--------|----------------------|
|
| 702 |
+
| Constitutional | No fevers, chills, night sweats, unexplained weight loss or gain |
|
| 703 |
+
| HEENT | No sore throat, hoarseness, sinus congestion |
|
| 704 |
+
| Respiratory | No cough, sputum production, dyspnea, pleuritic pain |
|
| 705 |
+
| Cardiovascular | No chest pressure, palpitations, edema |
|
| 706 |
+
| Gastrointestinal | Normal appetite, no nausea, vomiting, dyspepsia, melena, hematochezia |
|
| 707 |
+
| Genitourinary | No dysuria, frequency, nocturia |
|
| 708 |
+
| Musculoskeletal | No myalgias, arthralgias, bone pain |
|
| 709 |
+
| Neurologic | No headaches, dizziness, peripheral neuropathy |
|
| 710 |
+
| Dermatologic | No rashes, pruritus |
|
| 711 |
+
| Endocrine | No polydipsia/polyuria, thyroid symptoms |
|
| 712 |
+
| Hematologic/Lymphatic | No bruising, bleeding, swollen glands |
|
| 713 |
+
|
| 714 |
+
Overall ROS is unremarkable.
|
| 715 |
+
|
| 716 |
+
### Physical Examination
|
| 717 |
+
|
| 718 |
+
- **Vitals:** BP 122/78 mmHg, HR 68 bpm, RR 14/min, Temp 36.8°C, SpO₂ 98% RA, BMI 26 kg/m².
|
| 719 |
+
- **General:** Well‑appearing, NAD, alert & oriented ×3, ECOG 0.
|
| 720 |
+
- **HEENT:** Normocephalic, atraumatic, mucous membranes moist.
|
| 721 |
+
- **Neck:** Supple, no cervical adenopathy.
|
| 722 |
+
- **Cardiovascular:** Regular rate/rhythm, S₁S₂ audible, no murmurs, rubs, gallops.
|
| 723 |
+
- **Respiratory:** Clear breath sounds bilaterally, no crackles, wheezes, or rhonchi. Surgical scar over left posterior axilla appears healed, no erythema or drainage.
|
| 724 |
+
- **Abdomen:** Soft, non‑distended, normoactive bowel sounds, no hepatosplenomegaly, no tenderness.
|
| 725 |
+
- **Extremities:** No clubbing, cyanosis, edema. Full range of motion, strength 5/5 throughout.
|
| 726 |
+
- **Skin:** Warm, dry, intact; no lesions.
|
| 727 |
+
- **Neurological:** Cranial nerves II‑XII intact, gait steady, sensation preserved.
|
| 728 |
+
|
| 729 |
+
### Laboratory Results *(drawn today)*
|
| 730 |
+
|
| 731 |
+
| Test | Result | Reference |
|
| 732 |
+
|------|--------|-----------|
|
| 733 |
+
| CBC w/diff | WBC 6.2 ×10⁹/L, Hb 13.8 g/dL, Plt 240 ×10⁹/L | All within limits |
|
| 734 |
+
| Comprehensive Metabolic Panel | Na 138 mmol/L, K 4.2 mmol/L, Cl 102 mmol/L, CO₂ 24 mmol/L, BUN 15 mg/dL, Cr 0.92 mg/dL, Glucose 94 mg/dL, AST 21 U/L, ALT 19 U/L, Alk Phos 71 U/L, Total Bilirubin 0.6 mg/dL | Unremarkable |
|
| 735 |
+
| Lipid panel | LDL 112 mg/dL, HDL 52 mg/dL, TG 115 mg/dL | Acceptable |
|
| 736 |
+
| Serum CEA | 1.8 ng/mL (≤5) | Within normal |
|
| 737 |
+
| LDH | 180 U/L (125‑250) | Normal |
|
| 738 |
+
| Thyroid function (TSH/T4) | TSH 2.1 µIU/mL, Free T4 1.1 ng/dL | Euthyroid |
|
| 739 |
+
|
| 740 |
+
All values are stable compared with prior assessments.
|
| 741 |
+
|
| 742 |
+
### Imaging Results
|
| 743 |
+
|
| 744 |
+
Most recent cross‑sectional imaging: **Chest CT (non‑contrast)** performed 6 months ago (age 50) — *Interpretation*: No residual parenchymal opacity in the left upper lobe operative bed, no new solid/subsolid nodules, mediastinum and hila free of enlarged nodes, no effusions. Radiology impression: “No evidence of recurrent or metastatic disease.” No additional scans ordered at today’s encounter.
|
| 745 |
+
|
| 746 |
+
### Assessment
|
| 747 |
+
|
| 748 |
+
1. **Stage IIIA (cT2bN1M0) Large Cell Carcinoma of Left Upper Lobe**, treated with definitive CRT → lobectomy (ypT1bN0, R0) – presently **no evidence of disease (NED)**. Ongoing remission >2 yr. Molecular profiling previously disclosed KRAS G12V, STK11 loss, KEAP1 mutation, TP53 Y220C, low‑level MET exon 14 skipping; no actionable EGFR, ALK, ROS1 alterations. Current disease‑free interval supports continuation of standard surveillance rather than adjuvant systemic therapy.
|
| 749 |
+
|
| 750 |
+
2. **Performance status/ECOG 0**, fully functional, no treatment‑related sequelae.
|
| 751 |
+
|
| 752 |
+
3. **Preventive health** – Up‑to‑date immunizations (influenza annually, COVID‑19 booster series, pneumococcal PCV20 administered at age 49). Screening colonoscopy due at age 55 (patient plans accordingly). Low cardiovascular risk; lipid panel shows modestly elevated LDL, so continue dietary counseling.
|
| 753 |
+
|
| 754 |
+
### Plan
|
| 755 |
+
|
| 756 |
+
| Item | Details |
|
| 757 |
+
|------|---------|
|
| 758 |
+
| **Surveillance Imaging** | Schedule contrast‑enhanced Chest CT in 6 months (approx. age 52) per NCCN guideline for resected NSCLC. Consider annual low‑dose CT thereafter if interim scans remain negative. |
|
| 759 |
+
| **Laboratory Monitoring** | Repeat CBC/CMP and serum CEA concurrently with imaging. Monitor fasting glucose quarterly given emerging data linking KRAS‑mutant tumors and metabolic derangement (though patient currently euglycemic). |
|
| 760 |
+
| **Lifestyle Counseling** | Reinforce abstinence from tobacco (maintain quit status). Encourage ≥150 min/week aerobic activity, Mediterranean‑style diet, limit processed red meat. Discuss sun protection and vitamin D supplementation (25‑OH level pending). |
|
| 761 |
+
| **Vaccinations** | Ensure seasonal influenza vaccine before fall season; verify tetanus-diphtheria boosters up to date. |
|
| 762 |
+
| **Psychosocial Support** | Offer referral to survivorship support group; screen PHQ‑9 (score 2) – no depression. |
|
| 763 |
+
| **Future Therapeutic Planning** | Documented KRAS G12V mutation positions him for eligibility in ongoing KRAS‑targeted trials (e.g., adagrasib, combination KRAS inhibitor + SHP2 blockade). Counsel patient that such options will be revisited if recurrence occurs. |
|
| 764 |
+
| **Follow‑up Appointment** | Return to clinic in 3 months for symptom check and labs; imaging result discussion to occur at subsequent visit when CT available. |
|
| 765 |
+
| **Documentation** | Update EMR problem list to reflect “Post‑lung cancer resection – disease‑free” and “KRAS G12V mutant NSCLC”. Add current medication list (none chronic beyond multivitamin). Record allergy status – NKDA. |
|
| 766 |
+
|
| 767 |
+
**Signature:** ________________________ Dr. ***[Oncologist Name], MD***
|
| 768 |
+
**Date:** [auto‑generated]
|
| 769 |
+
|
| 770 |
+
---"
|
| 771 |
+
8,74643,2020-01-09,"**Imaging Report – Computed Tomography (Chest)**
|
| 772 |
+
|
| 773 |
+
**Patient:** David Kim
|
| 774 |
+
**Sex/Age:** Male / 53 years
|
| 775 |
+
**Referring Physician:** Oncology Service
|
| 776 |
+
**Study Date:** [Date omitted per protocol]
|
| 777 |
+
**Accession #:** _______________________
|
| 778 |
+
|
| 779 |
+
---
|
| 780 |
+
|
| 781 |
+
### TECHNIQUE
|
| 782 |
+
Multidetector computed tomography of the chest was performed without intravenous contrast due to renal function considerations. Axial images were obtained from the lung apices through the adrenal glands with ≤1 mm collimation and reconstructed with standard soft‑tissue (window width 400 HU, level 40 HU) and high‑frequency lung kernels (window width 1500 HU, level −600 HU). Sagittal and coronal multiplanar reformats were generated for anatomic correlation. No motion artifacts compromising image quality were observed.
|
| 783 |
+
|
| 784 |
+
Comparison made to prior imaging:
|
| 785 |
+
|
| 786 |
+
* Baseline staging CT (age 47) showing a 5.0 cm left upper lobe mass with ipsilateral hilar adenopathy.
|
| 787 |
+
* Post‑induction chemoradiation CT (age 49) demonstrating reduction to 2.8 cm.
|
| 788 |
+
* Surveillance CTs at ages 50–52 documenting complete response following left upper lobectomy (R0, ypT1bN0).
|
| 789 |
+
|
| 790 |
+
---
|
| 791 |
+
|
| 792 |
+
### FINDINGS
|
| 793 |
+
|
| 794 |
+
#### **Pulmonary Parenchyma**
|
| 795 |
+
- **Right Middle Lobe:** There is a solitary, partially solid pulmonary nodule measuring *approximately 1.2 × 1.0 cm* (long axis × short axis) located centrally within the posterior segment. The lesion demonstrates mixed attenuation with a peripheral ground‑glass halo surrounding a central higher‑density focus suggestive of a solid component comprising roughly 35 % of the total volume. Margins are mildly irregular but there is no definitive spiculation or cavitation. No associated bronchovascular distortion is evident. This represents a newly detected abnormality when compared with the most recent surveillance CT (age 52), where no such nodule was visualized.
|
| 796 |
+
|
| 797 |
+
- **Left Lung:** Status post left upper lobectomy with surgical changes including staple lines and fibrosis along the resection margin. No residual parenchymal masses or consolidations. Scarring appears stable relative to prior examinations.
|
| 798 |
+
|
| 799 |
+
- **Other Lobes:** Right upper and lower lobes, as well as remaining left lung fields, demonstrate normal aeration without additional focal opacities, infiltrates, or cystic change.
|
| 800 |
+
|
| 801 |
+
#### **Mediastinum & Hila**
|
| 802 |
+
- Central airways are patent. No endobronchial lesions identified.
|
| 803 |
+
- Mediastinal fat planes preserved. No enlarged mediastinal, prevascular, or para‑aortic lymph nodes (>1 cm short-axis). Bilateral hila appear unremarkable; no evidence of recurrent hilar adenopathy.
|
| 804 |
+
|
| 805 |
+
#### **Pleural Space**
|
| 806 |
+
- No pleural effusion, thickening, or plaques. Surgical clips noted adjacent to left hemithorax correlating with prior lobectomy.
|
| 807 |
+
|
| 808 |
+
#### **Osseous Structures**
|
| 809 |
+
- Visualized ribs, vertebrae, scapulae, and clavicles show no destructive lesions, cortical disruption, or suspicious sclerosis.
|
| 810 |
+
|
| 811 |
+
#### **Upper Abdomen (included portion)**
|
| 812 |
+
- Upper abdominal cuts reveal normal-sized adrenals without enlargement or heterogeneity. No hepatic or splenic lesions appreciably seen within limited field-of-view.
|
| 813 |
+
|
| 814 |
+
#### **Incidental Findings**
|
| 815 |
+
- Small (<5 mm) calcified granuloma in the right lower lobe, unchanged from prior exams—stable benign appearance.
|
| 816 |
+
- Mildly increased bibasilar subsegmental atelectasis likely secondary to shallow breathing; no other incidental parenchymal findings.
|
| 817 |
+
|
| 818 |
+
---
|
| 819 |
+
|
| 820 |
+
### IMPRESSION
|
| 821 |
+
1. New 1.2 cm partly solid (mixed ground‑glass/solid) nodule in the right middle lobe, absent on prior imaging → concerning for early metastatic recurrence versus second primary lung neoplasm in the setting of known KRAS‑mutant large‑cell carcinoma. Correlation with prior imaging timeline and consideration for tissue sampling recommended.
|
| 822 |
+
|
| 823 |
+
2. Stable post‑lobectomy left lung with no residual or recurrent disease.
|
| 824 |
+
|
| 825 |
+
3. No mediastinal/hilar lymphadenopathy, pleural effusion, or osseous metastases identified.
|
| 826 |
+
|
| 827 |
+
4. Incidental small calcified granuloma unchanged; clinically insignificant.
|
| 828 |
+
|
| 829 |
+
---
|
| 830 |
+
|
| 831 |
+
**Prepared By:** ___________________________________
|
| 832 |
+
Board‑Certified Radiologist, MD
|
| 833 |
+
[Institution Name]
|
| 834 |
+
|
| 835 |
+
*(Signature withheld per documentation policy)* "
|
| 836 |
+
9,74643,2020-01-10,"**Oncology Progress Note – Follow‑Up Visit**
|
| 837 |
+
|
| 838 |
+
---
|
| 839 |
+
|
| 840 |
+
### Patient
|
| 841 |
+
**Name:** David Kim **MRN:** ██████ **DOB:** ████‑██‑██ **Age:** 54 **Sex:** Male
|
| 842 |
+
|
| 843 |
+
### Date of Service
|
| 844 |
+
[Date omitted – to be inserted electronically]
|
| 845 |
+
|
| 846 |
+
---
|
| 847 |
+
|
| 848 |
+
## Chief Complaint
|
| 849 |
+
Routine follow‑up after 8 weeks of sotorasib therapy for recurrent KRAS‑mutant non‑small cell lung cancer (right middle lobe lesion).
|
| 850 |
+
|
| 851 |
+
## History of Present Illness
|
| 852 |
+
David presents today for his scheduled interval assessment following initiation of sotorasib 960 mg PO QD eight weeks ago for a solitary right‑middle‑lobe subsolid nodule (initial size 1.2 cm) detected on surveillance chest CT at age 53. At the start of therapy he experienced Grade 2 watery diarrhea lasting approximately 5 days, managed conservatively with loperamide and fluid replacement, and Grade 1 transaminase elevation (AST 52 U/L, ALT 61 U/L) that peaked at week 4 and subsequently returned to baseline. He denied recurrence of diarrheal symptoms since week 5, reports normal bowel habits, and feels “generally well.” No fevers, chills, cough, dyspnea, chest pain, weight change, or neurologic complaints were noted. His performance status remains ECOG 0.
|
| 853 |
+
|
| 854 |
+
He continues sotorasib uninterrupted and denies missed doses. Current concerns focus on reassurance regarding radiographic response and discussion of continued safety monitoring.
|
| 855 |
+
|
| 856 |
+
## Review of Systems *(pertinent positives & negatives)*
|
| 857 |
+
|
| 858 |
+
| System | Positive / Negative |
|
| 859 |
+
|--------|----------------------|
|
| 860 |
+
| Constitutional | Denies fever, night sweats, recent weight loss or gain |
|
| 861 |
+
| Respiratory | No cough, sputum production, hemoptysis, dyspnea, wheeze |
|
| 862 |
+
| Cardiovascular | No chest pain, palpitations, edema |
|
| 863 |
+
| Gastrointestinal | Diarrhea resolved; occasional mild nausea, no vomiting |
|
| 864 |
+
| Hepatic/Biliary | No abdominal pain, jaundice; liver enzymes normalized |
|
| 865 |
+
| Endocrine | No polyuria/polydipsia; fasting glucose last month 98 mg/dL |
|
| 866 |
+
| Musculoskeletal | No myalgias or arthralgias |
|
| 867 |
+
| Neurologic | No headaches, dizziness, neuropathy |
|
| 868 |
+
| Dermatologic | No rash or skin changes |
|
| 869 |
+
|
| 870 |
+
## Past Medical History
|
| 871 |
+
|
| 872 |
+
* **Non–Small Cell Lung Cancer**, Large‑cell histology, KRAS G12V mutant, Stage IIIA at presentation (left upper lobe) → definitive chemoradiation + lobectomy → disease‑free until age 53, when an isolated contralateral subsolid nodule appeared.
|
| 873 |
+
* Hypertension – controlled on lisinopril 20 mg daily.
|
| 874 |
+
* Hyperlipidemia – rosuvastatin 10 mg nightly.
|
| 875 |
+
|
| 876 |
+
## Surgical History
|
| 877 |
+
|
| 878 |
+
* Left upper lobectomy with mediastinal LN dissection (ypT1bN0) – age 49, uncomplicated recovery.
|
| 879 |
+
|
| 880 |
+
## Family History
|
| 881 |
+
|
| 882 |
+
* Father died of myocardial infarction at 68.
|
| 883 |
+
* Mother alive, hypertension.
|
| 884 |
+
* No known familial cancers.
|
| 885 |
+
|
| 886 |
+
## Social History
|
| 887 |
+
|
| 888 |
+
* Former smoker, 15 pack‑years, quit at age 46.
|
| 889 |
+
* Occasional alcohol (≤2 drinks/week).
|
| 890 |
+
* Works as software engineer; sedentary desk work.
|
| 891 |
+
* Lives with spouse; no illicit drugs.
|
| 892 |
+
|
| 893 |
+
## Allergies
|
| 894 |
+
|
| 895 |
+
* NKDA (no known drug allergies)
|
| 896 |
+
|
| 897 |
+
## Medications
|
| 898 |
+
|
| 899 |
+
| Medication | Dose/Frequency | Indication |
|
| 900 |
+
|------------|---------------|-----------|
|
| 901 |
+
| Sotorasib (Lumakras®) | 960 mg PO qd | KRAS‑mutant NSCLC |
|
| 902 |
+
| Lisinopril | 20 mg PO qd | HTN |
|
| 903 |
+
| Rosuvastatin | 10 mg PO hs | Dyslipidemia |
|
| 904 |
+
| Vitamin D₃ | 2000 IU PO qd | Supplement |
|
| 905 |
+
| Multivitamin | Daily | General health |
|
| 906 |
+
|
| 907 |
+
*(Patient uses over‑the‑counter antidiarrheals PRN.)*
|
| 908 |
+
|
| 909 |
+
## Physical Examination
|
| 910 |
+
|
| 911 |
+
| Parameter | Finding |
|
| 912 |
+
|----------|---------|
|
| 913 |
+
| Vital signs | BP 122/78 mmHg, HR 72 bpm, RR 16/min, Temp 36.8°C, SpO₂ 97% RA |
|
| 914 |
+
| General | Well‑appearing, NAD, alert, oriented ×3 |
|
| 915 |
+
| HEENT | Normocephalic, mucous membranes moist |
|
| 916 |
+
| Neck | Supple, no cervical adenopathy |
|
| 917 |
+
| Cardiac | Regular rate/rhythm, no murmurs |
|
| 918 |
+
| Pulmonary | Clear breath sounds bilaterally, no rales/wheezes |
|
| 919 |
+
| Abdomen | Soft, nondistended, normoactive BS, no hepatosplenomegaly |
|
| 920 |
+
| Extremities | No edema, pulses intact |
|
| 921 |
+
| Skin | Warm, dry, no rash |
|
| 922 |
+
| Neuro | Grossly intact cranial nerves, strength 5/5 UE/LE, sensation preserved |
|
| 923 |
+
|
| 924 |
+
## Laboratory Data (drawn today)
|
| 925 |
+
|
| 926 |
+
| Test | Result | Reference |
|
| 927 |
+
|------|-------|-----------|
|
| 928 |
+
| CBC w diff | WBC 6.2 ×10⁹/L, Hb 13.8 g/dL, Plt 210 ×10⁹/L | Normal |
|
| 929 |
+
| Comprehensive Metabolic Panel | Na 138, K 4.2, Cl 102, CO₂ 24, BUN 14, Cr 0.92, Glucose 96 mg/dL, AST 31 U/L, ALT 35 U/L, Alk Phos 84 U/L, Total Bilirubin 0.6 mg/dL | Within limits |
|
| 930 |
+
| Lipid panel | LDL 95 mg/dL, HDL 52 mg/dL, TG 115 mg/dL | Stable |
|
| 931 |
+
| Serum CEA | 2.1 ng/mL (previous 2.3 ng/mL) | Low/normal |
|
| 932 |
+
| Urinalysis | Neg for protein/glucose/blood | Unremarkable |
|
| 933 |
+
|
| 934 |
+
All values reflect stability compared with pre‑therapy baselines; hepatic enzymes remain ≤Grade 1.
|
| 935 |
+
|
| 936 |
+
## Imaging Results
|
| 937 |
+
|
| 938 |
+
**Chest CT (non‑contrast, thin slice)** – Performed 8 weeks after starting sotorasib:
|
| 939 |
+
|
| 940 |
+
* Right middle lobe subsolid nodule measured **0.6 cm** (long axis), previously 1.2 cm. Morphologically unchanged except for slight attenuation increase suggestive of fibrosis/reduction in cellularity. No cavitation.
|
| 941 |
+
* No additional pulmonary nodules, consolidations, pleural effusion, or atelectasis.
|
| 942 |
+
* Mediastinum: No enlarged stations; largest short-axis dimension 5 mm (stable vs prior).
|
| 943 |
+
* Upper abdomen visualized incidentally – unremarkable.
|
| 944 |
+
|
| 945 |
+
Interpretation: **Partial response** according to RECIST v1.1 criteria (≥30% reduction in longest diameter). Radiologist comment: “Findings compatible with therapeutic effect; continue close imaging surveillance.”
|
| 946 |
+
|
| 947 |
+
## Assessment
|
| 948 |
+
|
| 949 |
+
1. **Recurrent KRAS‑G12V mutated NSCLC, right middle lobe subsolid nodule** – Currently demonstrating radiographic partial response to sotorasib after 8 weeks of therapy. Clinically asymptomatic, ECOG 0.
|
| 950 |
+
2. **Treatment‑related toxicities** – Prior Grade 2 diarrhea and transient Grade 1 transaminitis have resolved; no active adverse events presently.
|
| 951 |
+
3. **Hypertension, hyperlipidemia** – Controlled on current regimen.
|
| 952 |
+
4. **Overall functional status** – Excellent; maintains regular employment and exercise.
|
| 953 |
+
|
| 954 |
+
## Plan
|
| 955 |
+
|
| 956 |
+
| Item | Details |
|
| 957 |
+
|------|---------|
|
| 958 |
+
| **Continue sotorasib** | Maintain Lumakras® 960 mg PO daily. Emphasize adherence; advise patient to contact office promptly if ≥Grade 2 GI upset, persistent elevated LFTs, or unexplained fatigue occurs. |
|
| 959 |
+
| **Safety Monitoring** | • CBC, CMP, fasting glucose at next monthly visit.<br>• Repeat liver function tests in 4 weeks (currently within normal limits). |
|
| 960 |
+
| **Imaging Surveillance** | Schedule contrast‑enhanced chest CT in 8 weeks (approximately 16 weeks total on therapy) to confirm durability of response; thereafter every 12 weeks while on sotorasib unless progression suspected clinically. |
|
| 961 |
+
| **Supportive Care** | Continue OTC loperamide PRN for breakthrough loose stools (<3 days). Encourage adequate hydration and balanced diet. |
|
| 962 |
+
| **Vaccinations** | Ensure up‑to‑date influenza vaccine (administered annually) and COVID‑19 booster per CDC schedule; no live vaccines while on targeted agent. |
|
| 963 |
+
| **Lifestyle Counseling** | Reinforce smoking abstinence, maintain moderate aerobic activity (150 min/week), limit alcohol intake. |
|
| 964 |
+
| **Follow‑up Appointment** | Return in 4 weeks for laboratory review and symptom check; sooner if acute issues arise. |
|
| 965 |
+
| **Documentation** | Update electronic problem list to reflect “NSCLC – KRAS G12V – Partial Response to sotorasib” and record current toxicity profile as *Resolved*. |
|
| 966 |
+
|
| 967 |
+
**Signature:** ___________________________
|
| 968 |
+
Dr. _________, MD — Medical Oncology
|
| 969 |
+
(Phone/EHR signature block) "
|
| 970 |
+
10,74643,2020-01-11,"**Imaging Report – Positron Emission Tomography / Computed Tomography (PET/CT)**
|
| 971 |
+
|
| 972 |
+
**Patient:** David Kim **Sex/Age:** Male, 55 years
|
| 973 |
+
**Referring Physician:** Medical Oncology
|
| 974 |
+
**Study Date:** [Date omitted per protocol]
|
| 975 |
+
**Indication:** Surveillance of previously treated right‑middle‑lobe pulmonary nodule (KRAS‑mutant non‑small‑cell lung cancer) following response to sotorasib therapy.
|
| 976 |
+
|
| 977 |
+
---
|
| 978 |
+
|
| 979 |
+
### Technique
|
| 980 |
+
Whole‑body ^18F‑FDG PET acquired 60 minutes after intravenous administration of 370 MBq (10 mCi) FDG. Low‑dose helical CT obtained for attenuation correction and anatomical correlation (120 kVp, automated mA modulation, slice thickness 3 mm). Images reconstructed with standard OSEM algorithm; axial, coronal, and sagittal fused datasets reviewed. Prior PET/CT examinations dated at ages 54 and 52 were available for comparative evaluation.
|
| 981 |
+
|
| 982 |
+
---
|
| 983 |
+
|
| 984 |
+
### Findings
|
| 985 |
+
|
| 986 |
+
#### Thorax
|
| 987 |
+
- **Right Middle Lobe Pulmonary Nodule:** Subsolid nodule measuring 0.6 × 0.5 cm (previously 0.6 cm on the most recent scan). Mild FDG avidity observed with maximum standardized uptake value (**SUVmax ≈ 2.3**, compared with SUVmax ≈ 2.5 on the preceding examination). Morphologically unchanged contour, without spiculated margins or cavitation. No associated bronchial obstruction or atelectasis.
|
| 988 |
+
- **Left Lung & Mediastinum:** Post‑lobectomy changes in the left upper lobe with complete fibrosis; no residual soft tissue abnormality. No enlarged mediastinal or hilar lymph nodes; all stations ≤ 6 mm in short axis, physiologic FDG activity.
|
| 989 |
+
- **Pleura & Chest Wall:** Unremarkable; no pleural effusion or thickening.
|
| 990 |
+
|
| 991 |
+
#### Upper Abdomen
|
| 992 |
+
- Liver, spleen, adrenal glands, pancreas unremarkable; physiological hepatic and splenic FDG distribution. No focal metabolic abnormalities detected.
|
| 993 |
+
|
| 994 |
+
#### Pelvis & Lower Extremities
|
| 995 |
+
- Normal marrow FDG activity throughout vertebral bodies and proximal femora; expected heterogeneous uptake related to age. No osseous lesions identified.
|
| 996 |
+
|
| 997 |
+
#### Additional Observations
|
| 998 |
+
- No evidence of metabolically active metastatic disease in bone, brain (limited CT component), or soft tissues.
|
| 999 |
+
- Physiological urinary excretion of tracer noted.
|
| 1000 |
+
|
| 1001 |
+
---
|
| 1002 |
+
|
| 1003 |
+
### Comparison
|
| 1004 |
+
Findings are essentially unchanged relative to the PET/CT performed eight weeks earlier (age 54): the right‑middle‑lobe nodule remains stable in size and demonstrates persistently low‑grade FDG uptake. There has been no emergence of new FDG‑avid foci elsewhere in the body since baseline staging at age 47.
|
| 1005 |
+
|
| 1006 |
+
---
|
| 1007 |
+
|
| 1008 |
+
### Impression
|
| 1009 |
+
1. Stable low‑grade FDG uptake in the right middle‑lobe subsolid nodule (0.6 cm, SUVmax ~ 2.3). Imaging appearance compatible with indeterminate persistent disease versus benign inflammatory change; however, stability over successive scans favors a controlled neoplastic process under current KRAS‑directed therapy (sotorasib).
|
| 1010 |
+
2. No new FDG‑avid lesions suggestive of locoregional progression or distant metastasis.
|
| 1011 |
+
3. Overall disease burden unchanged; continued close radiographic surveillance recommended in conjunction with ongoing systemic therapy.
|
| 1012 |
+
|
| 1013 |
+
*Prepared by:* _______________________
|
| 1014 |
+
Board‑Certified Radiologist, MD
|
| 1015 |
+
Nuclear Medicine/PET Imaging Division "
|
| 1016 |
+
11,74643,2020-01-12,"**Oncology Progress Note – Follow‑Up Visit**
|
| 1017 |
+
|
| 1018 |
+
---
|
| 1019 |
+
|
| 1020 |
+
### Patient Information
|
| 1021 |
+
**Name:** David Kim **MRN:** [redacted] **DOB:** [redacted] **Age:** 56 y/o **Sex:** Male
|
| 1022 |
+
|
| 1023 |
+
---
|
| 1024 |
+
|
| 1025 |
+
## Chief Complaint
|
| 1026 |
+
“Feeling a little more tired than usual over the last few weeks.”
|
| 1027 |
+
|
| 1028 |
+
---
|
| 1029 |
+
|
| 1030 |
+
## History of Present Illness (HPI)
|
| 1031 |
+
David is a 56‑year‑old man who presents for routine follow‑up while on **sotorasib 960 mg PO daily**, started 13 months ago for recurrent KRAS G12V‑mutant non‑small cell lung cancer (NSCLC) involving a right middle‑lobe subsolid nodule. Since initiating therapy he has tolerated treatment reasonably well aside from transient Grade 2 diarrhea and Grade 1 transaminitis early in the course, both of which resolved without dose modification.
|
| 1032 |
+
|
| 1033 |
+
Over the preceding month he has noticed *mild generalized fatigue* that does not limit his activities; he continues to perform ADLs independently, walks ≥30 minutes daily, and remains employed full‑time as a software engineer (ECOG performance status = 0). No associated dyspnea, cough, fever, weight change, night sweats, abdominal pain, nausea/vomiting, melena, hematochezia, dysuria, joint pains, rash, or neurologic symptoms. Denies new medications, increased alcohol intake, or travel exposures.
|
| 1034 |
+
|
| 1035 |
+
He adheres to sotorasib dosing schedule, takes occasional loperamide PRN for loose stools (last used >4 weeks ago), and monitors blood sugars per home glucometer (most readings 90–110 mg/dL). No episodes of hypoglycemia.
|
| 1036 |
+
|
| 1037 |
+
Overall impression: Stable disease on systemic therapy with emerging fatigue likely multifactorial—possible cumulative effect of KRAS inhibitor, subtle anemia of chronic disease, or lifestyle stressors. No evidence of acute toxicity warranting interruption.
|
| 1038 |
+
|
| 1039 |
+
---
|
| 1040 |
+
|
| 1041 |
+
## Review of Systems (ROS)
|
| 1042 |
+
| System | Positive / Negative |
|
| 1043 |
+
|--------|---------------------|
|
| 1044 |
+
| Constitutional | +Fatigue (mild); -Weight loss, -Fever, -Chills |
|
| 1045 |
+
| HEENT | -Headache, -Vision changes, -Sore throat |
|
| 1046 |
+
| Cardiovascular | -Chest pain, -Palpitations, -Edema |
|
| 1047 |
+
| Respiratory | -Dyspnea, -Cough, -Hemoptysis |
|
| 1048 |
+
| Gastrointestinal | -Nausea/Vomiting, -Diarrhea (resolved), -Abdominal pain |
|
| 1049 |
+
| Genitourinary | -Dysuria, -Frequency, -Polyuria |
|
| 1050 |
+
| Musculoskeletal | -Myalgias, -Arthralgias |
|
| 1051 |
+
| Neurologic | -Weakness, -Sensory deficits |
|
| 1052 |
+
| Dermatologic | -Rash, -Pruritus |
|
| 1053 |
+
| Endocrine | -Polyphagia, -Polydipsia (none reported) |
|
| 1054 |
+
| Hematologic/Lymphatic | -Bleeding/bruising |
|
| 1055 |
+
|
| 1056 |
+
---
|
| 1057 |
+
|
| 1058 |
+
## Past Medical History (PMHx)
|
| 1059 |
+
- **Stage IIIA Large Cell Carcinoma of Left Upper Lobe** (diagnosed age 47) → neoadjuvant carbo/Taxol + 60 Gy RT → left upper lobectomy (ypT1bN0, R0) – completed curative intent, observed thereafter.
|
| 1060 |
+
- **Recurrent KRAS G12V‑mutant NSCLC** (right middle‑lobe subsolid nodule detected age 53) → treated with sotorasib since age 53.
|
| 1061 |
+
- Hypertension (well controlled on lisinopril 20 mg QD).
|
| 1062 |
+
- Hyperlipidemia (atorvastatin 40 mg nightly).
|
| 1063 |
+
- Seasonal allergic rhinitis.
|
| 1064 |
+
|
| 1065 |
+
---
|
| 1066 |
+
|
| 1067 |
+
## Surgical & Procedural History
|
| 1068 |
+
- Right video‑assisted thoracoscopic wedge resection (biopsy) of right middle‑lobe nodule – pathology confirming persistent large‑cell histology with KRAS G12V, STK11 loss, KEAP1 mutation, TP53 Y220C; PD‑L1 10 %.
|
| 1069 |
+
- Left upper lobectomy with mediastinal LN dissection (stations 5, 6, 7, 10L) – R0, 0/12 nodes involved.
|
| 1070 |
+
|
| 1071 |
+
---
|
| 1072 |
+
|
| 1073 |
+
## Social History (SH)
|
| 1074 |
+
- Former smoker: 15 pack‑years, quit at age 46 (quit year coincident with initial diagnosis).
|
| 1075 |
+
- Alcohol: Occasional wine (<2 drinks/week).
|
| 1076 |
+
- Occupation: Software developer, sedentary work environment.
|
| 1077 |
+
- Living situation: Lives with spouse; independent ADLs.
|
| 1078 |
+
- Exercise: Regular brisk walking 3–4 times/week.
|
| 1079 |
+
|
| 1080 |
+
---
|
| 1081 |
+
|
| 1082 |
+
## Family History (FH)
|
| 1083 |
+
- Father deceased at 68 from myocardial infarction (no known malignancy).
|
| 1084 |
+
- Mother alive, 80, hypertension.
|
| 1085 |
+
- Siblings healthy.
|
| 1086 |
+
- No familial cancers reported.
|
| 1087 |
+
|
| 1088 |
+
---
|
| 1089 |
+
|
| 1090 |
+
## Allergies
|
| 1091 |
+
- NKDA (No Known Drug Allergies).
|
| 1092 |
+
|
| 1093 |
+
---
|
| 1094 |
+
|
| 1095 |
+
## Current Medications
|
| 1096 |
+
| Medication | Dose | Frequency | Indication |
|
| 1097 |
+
|------------|------|-----------|-------------|
|
| 1098 |
+
| Sotorasib | 960 mg | PO daily | KRAS‑mutant NSCLC |
|
| 1099 |
+
| Lisinopril | 20 mg | PO daily | HTN |
|
| 1100 |
+
| Atorvastatin | 40 mg | PO nightly | Dyslipidemia |
|
| 1101 |
+
| Vitamin D₃ | 2000 IU | PO daily | Supplement |
|
| 1102 |
+
| Acetaminophen PRN | 500 mg | q6h prn pain/fever | Symptom control |
|
| 1103 |
+
| Loperamide PRN | 2 mg | PO up to 4 tabs/day PRN diarrhea | Diarrhea management |
|
| 1104 |
+
|
| 1105 |
+
---
|
| 1106 |
+
|
| 1107 |
+
## Physical Examination
|
| 1108 |
+
**Vital Signs:** BP 122/78 mmHg, HR 72 bpm regular, RR 16/min, Temp 36.8°C, SpO₂ 98% RA, Weight 84 kg (BMI 27 kg/m²).
|
| 1109 |
+
|
| 1110 |
+
**General:** Alert, oriented ×3, NAD, appears well‑nourished.
|
| 1111 |
+
|
| 1112 |
+
**HEENT:** Normocephalic, atraumatic; mucous membranes moist; no cervical adenopathy.
|
| 1113 |
+
|
| 1114 |
+
**Neck:** Supple, trachea midline, no JVD.
|
| 1115 |
+
|
| 1116 |
+
**Cardiovascular:** RRR, no murmurs, rubs, gallops. Peripheral pulses intact bilaterally.
|
| 1117 |
+
|
| 1118 |
+
**Respiratory:** Clear breath sounds bilaterally, no wheezes/rhonchi, good expansion.
|
| 1119 |
+
|
| 1120 |
+
**GI:** Soft, nondistended, normoactive bowel sounds, no tenderness.
|
| 1121 |
+
|
| 1122 |
+
**Extremities:** No edema, clubbing absent.
|
| 1123 |
+
|
| 1124 |
+
**Skin:** Warm, dry, no rashes or lesions.
|
| 1125 |
+
|
| 1126 |
+
**Neurological:** Grossly intact cranial nerves II–XII, strength 5/5 UE/LE, sensation preserved, gait steady.
|
| 1127 |
+
|
| 1128 |
+
---
|
| 1129 |
+
|
| 1130 |
+
## Laboratory Data *(drawn today)*
|
| 1131 |
+
|
| 1132 |
+
| Test | Result | Reference Range | Comment |
|
| 1133 |
+
|------|--------|-----------------|---------|
|
| 1134 |
+
| CBC w/ diff | WBC 6.2 ×10⁹/L ; Hb 13.2 g/dL ; Hct 39%; Plts 210 ×10⁹/L | WBC 4‑10 ; Hb 13‑17 ; Plts 150‑400 | Mild dip in hemoglobin compared to baseline (previous 13.8 g/dL) — may reflect chronic disease/fatigue. |
|
| 1135 |
+
| Comprehensive Metabolic Panel | Na 138 mmol/L ; K 4.2 ; Cl 102 ; CO₂ 24 ; BUN 18 ; Cr 0.92 ; Glucose 96 mg/dL ; AST 32 IU/L ; ALT 38 IU/L ; Alk Phos 85 IU/L ; Total Bilirubin 0.6 mg/dL | Within limits | Slight elevation of transaminases versus pre‑therapy baseline (AST/ALT ≤25 IU/L) – stable, no intervention needed. |
|
| 1136 |
+
| Lipid panel | LDL 112 mg/dL ; HDL 44 mg/dL ; TG 135 mg/dL | Target LDL <130 | Unchanged. |
|
| 1137 |
+
| Thyroid Stimulating Hormone | 2.1 µU/mL | 0.4‑4.0 | Normal. |
|
| 1138 |
+
| Fasting Insulin | 8 µU/mL | 2‑25 | Normal. |
|
| 1139 |
+
| CEA* | 2.1 ng/mL | ≤5.0 | Low, stable trend. |
|
| 1140 |
+
| CA19‑9 | 12 U/mL | ≤37 | Within range. |
|
| 1141 |
+
|
| 1142 |
+
\*Tumor markers drawn for longitudinal tracking; trends remain flat.
|
| 1143 |
+
|
| 1144 |
+
---
|
| 1145 |
+
|
| 1146 |
+
## Radiology Summary
|
| 1147 |
+
|
| 1148 |
+
**Most Recent Chest CT (performed 4 weeks ago):** Thin‑slice axial images demonstrate persistence of a solitary right middle‑lobe subsolid nodule measuring **0.62 cm x 0.58 cm** (stable compared with prior measurement of 0.61 cm). No interval development of additional pulmonary nodules, consolidations, pleural effusion, or mediastinal/hilar lymphadenopathy. Lung parenchymal architecture otherwise unremarkable.
|
| 1149 |
+
|
| 1150 |
+
**Prior PET/CT (age 55, 6 months earlier):** Showed low‑grade FDG avidity within the aforementioned nodule (SUVmax ≈2.1) with no new foci elsewhere. Correlates with indolent behavior under sotorasib.
|
| 1151 |
+
|
| 1152 |
+
Interpretation: Disease remains radiographically stable (RECIST criteria: non‑progressive).
|
| 1153 |
+
|
| 1154 |
+
---
|
| 1155 |
+
|
| 1156 |
+
## Assessment
|
| 1157 |
+
|
| 1158 |
+
1. **KRAS G12V‑mutant NSCLC, recurrent, right middle‑lobe subsolid nodule – stable disease** on sotorasib therapy.
|
| 1159 |
+
- Molecular profile includes co‑alterations (STK11 loss, KEAP1 mutation, TP53 Y220C) which confer modest resistance risk but response thus far sustained.
|
| 1160 |
+
- Current imaging demonstrates stability; laboratory parameters acceptable apart from minimal transaminase rise and borderline drop in hemoglobin.
|
| 1161 |
+
|
| 1162 |
+
2. **Treatment‑related fatigue (Grade 1)** – likely multifactorial (chronic therapy exposure, mild anemia, psychosocial factors).
|
| 1163 |
+
|
| 1164 |
+
3. **Hypertension, hyperlipidemia** – controlled on current regimen.
|
| 1165 |
+
|
| 1166 |
+
4. **Preventive health:** Up‑to‑date immunizations (influenza annually, COVID‑19 booster series).
|
| 1167 |
+
|
| 1168 |
+
---
|
| 1169 |
+
|
| 1170 |
+
## Plan
|
| 1171 |
+
|
| 1172 |
+
| Item | Details |
|
| 1173 |
+
|------|----------|
|
| 1174 |
+
| **Continue sotorasib** | Maintain 960 mg PO daily. Reinforce adherence. No dose reduction required at this time. |
|
| 1175 |
+
| **Monitor labs** | Repeat CBC/CMP in 4 weeks; specifically watch hemoglobin trending and hepatic enzymes. Consider iron studies if Hb falls below 12 g/dL. |
|
| 1176 |
+
| **Manage fatigue** | • Counsel regarding sleep hygiene, balanced nutrition, moderate aerobic activity.<br>• Check vitamin B12 level; supplement if deficient.<br>• Discuss possibility of low‑dose methylphenidate if fatigue becomes disabling (shared decision). |
|
| 1177 |
+
| **Supportive care** | Keep loperamide available PRN; educate patient to contact office if ≥3 watery stools/day lasting >48 hrs. |
|
| 1178 |
+
| **Imaging follow‑up** | Schedule thin‑slice chest CT in 12 weeks (±2 wks) to reassess nodule size; interim PET/CT reserved for symptomatic progression or significant growth (>20%). |
|
| 1179 |
+
| **Vaccinations** | Verify influenza vaccine administered; arrange COVID‑19 booster per CDC guidance. |
|
| 1180 |
+
| **Comorbidity optimization** | Blood pressure target <130/80 mmHg – continue lisinopril; lipid goal LDL <70 mg/dL – discuss intensifying statin if ASCVD risk escalates. |
|
| 1181 |
+
| **Referral** | None presently; consider referral to cardio‑oncology if hypertensive burden increases secondary to sotorasib (rare). |
|
| 1182 |
+
| **Next appointment** | Return in 4 weeks for labs review; sooner if new symptoms develop (e.g., worsening fatigue, GI upset, jaundice, shortness of breath). |
|
| 1183 |
+
| **Documentation** | Updated problem list entered into EMR; consent reaffirmed for continued off‑label KRAS inhibition. |
|
| 1184 |
+
|
| 1185 |
+
**Signature:** ___________________________
|
| 1186 |
+
Dr. ***[Oncologist Name], MD***
|
| 1187 |
+
Medical Oncology Division
|
| 1188 |
+
Date: _____________
|
| 1189 |
+
|
| 1190 |
+
---
|
| 1191 |
+
|
| 1192 |
+
*End of Document*"
|
| 1193 |
+
12,74643,2020-01-13,"**Oncology Follow‑Up Visit – Progress Note**
|
| 1194 |
+
|
| 1195 |
+
**Patient:** David Kim **MRN:** ██████ **DOB:** ███─██─██ **Age:** 58 y **Sex:** Male
|
| 1196 |
+
**Date of Service:** [to be inserted]
|
| 1197 |
+
|
| 1198 |
+
---
|
| 1199 |
+
|
| 1200 |
+
### Chief Complaint
|
| 1201 |
+
“Just here for my scheduled check‑up.” No acute complaints.
|
| 1202 |
+
|
| 1203 |
+
---
|
| 1204 |
+
|
| 1205 |
+
### History of Present Illness (HPI)
|
| 1206 |
+
|
| 1207 |
+
David is a 58‑year‑old man with a remote history of Stage IIIA left upper‐lobe large‑cell carcinoma (treated with definitive CRT → lobectomy → observation) who subsequently developed a solitary right‑middle‑lobe subsolid nodule at age 53. Molecular profiling disclosed KRAS G12V, prompting initiation of sotorasib 960 mg PO daily (Lumakras®) at age 53. He has been on continuous therapy for **≈5 years**, achieving radiographic partial response (nodule reduced from 1.2 cm → 0.6 cm) and thereafter maintaining stable disease over multiple assessments.
|
| 1208 |
+
|
| 1209 |
+
Therapy‑related toxicities have been modest:
|
| 1210 |
+
|
| 1211 |
+
* Grade 2 watery diarrhea (managed with dietary modification & loperamide PRN) – resolved by month 3 of therapy.
|
| 1212 |
+
* Grade 1 transaminase elevation (AST/ALT ≤2× ULN) – remained static, monitored quarterly.
|
| 1213 |
+
* New onset Grade 3 hyperglycemia at age 57 necessitating initiation of basal–bolus insulin (glargine 20 U nightly, lispro sliding scale).
|
| 1214 |
+
|
| 1215 |
+
He reports **mild intermittent fatigue** (no impact on ADLs), **well‑controlled blood sugars** on his insulin regimen (fasting BG 110‑130 mg/dL), **no abdominal pain**, **no dysphagia**, and **no dyspnea**. Denies fevers, chills, night sweats, weight change (>2 kg), cough, hemoptysis, or neurologic symptoms.
|
| 1216 |
+
|
| 1217 |
+
Overall functional status remains excellent (ECOG 0). He continues to work full‑time as a software engineer and engages in regular aerobic exercise (≥150 min/week).
|
| 1218 |
+
|
| 1219 |
+
---
|
| 1220 |
+
|
| 1221 |
+
### Review of Systems (ROS)
|
| 1222 |
+
| System | Positive / Negative |
|
| 1223 |
+
|--------|----------------------|
|
| 1224 |
+
| Constitutional | Fatigue (mild); denies fever, chills, weight loss |
|
| 1225 |
+
| HEENT | No headache, visual changes, sore throat |
|
| 1226 |
+
| Cardiovascular | No chest pain, palpitations, edema |
|
| 1227 |
+
| Respiratory | No cough, sputum, dyspnea, wheeze |
|
| 1228 |
+
| Gastrointestinal | Occasional loose stools (<2/day); no nausea/vomiting, melena, hematochezia |
|
| 1229 |
+
| Genitourinary | No dysuria, frequency, hematuria |
|
| 1230 |
+
| Musculoskeletal | No bone pain, arthralgia |
|
| 1231 |
+
| Neurologic | No weakness, paresthesias, seizures |
|
| 1232 |
+
| Dermatologic | No rash, pruritus |
|
| 1233 |
+
| Endocrine | Hyperglycemia controlled on insulin; no polyuria/polydipsia beyond baseline |
|
| 1234 |
+
|
| 1235 |
+
---
|
| 1236 |
+
|
| 1237 |
+
### Physical Examination
|
| 1238 |
+
- **General:** Well‑appearing, NAD, alert, oriented ×3, BMI 27 kg/m².
|
| 1239 |
+
- **Vital Signs:** BP 128/78 mmHg, HR 72 bpm, RR 16/min, SpO₂ 98% RA, Temp 36.8°C.
|
| 1240 |
+
- **HEENT:** Normocephalic, atraumatic, mucous membranes moist.
|
| 1241 |
+
- **Neck:** Supple, no cervical adenopathy.
|
| 1242 |
+
- **Cardiovascular:** Regular rate/rhythm, S₁S₂ normal, no murmurs.
|
| 1243 |
+
- **Respiratory:** Clear breath sounds bilaterally, no rales/wheezes, good air entry.
|
| 1244 |
+
- **Abdomen:** Soft, non‑distended, normoactive bowel sounds, no hepatosplenomegaly, no tenderness.
|
| 1245 |
+
- **Extremities:** No clubbing, cyanosis, edema. Peripheral pulses intact.
|
| 1246 |
+
- **Skin:** Warm, dry, no lesions.
|
| 1247 |
+
- **Neurological:** Grossly intact cranial nerves II‑XII, strength 5/5 UE/LE, sensation preserved, gait steady.
|
| 1248 |
+
|
| 1249 |
+
---
|
| 1250 |
+
|
| 1251 |
+
### Laboratory Results *(drawn today)*
|
| 1252 |
+
|
| 1253 |
+
| Test | Result | Reference |
|
| 1254 |
+
|------|--------|-----------|
|
| 1255 |
+
| CBC w diff | WBC 6.2 (4‑10) k/µL, ANC 3.4 (1.5‑7.5), Hb 13.8 (g/dL), Plts 210 (k/µL) | Normal |
|
| 1256 |
+
| Comprehensive Metabolic Panel | Na⁺ 138 (mEq/L), K⁺ 4.2, Cl⁻ 102, CO₂ 24, BUN 15, Cr 0.92, Glucose 118 mg/dL (random), AST 38 (U/L) ↑, ALT 42 (U/L) ↑, Alk Phos 84, Total Bilirubin 0.6 | Transaminases ≤2×ULN |
|
| 1257 |
+
| Lipid panel | LDL 112 mg/dL, HDL 46 mg/dL, TG 140 mg/dL | — |
|
| 1258 |
+
| Hemoglobin A1c | 7.2 % (target <7%) | Slightly above goal |
|
| 1259 |
+
| Serum CEA | 2.1 ng/mL (≤5) | Within normal limits |
|
| 1260 |
+
| Urinalysis | Neg for protein/glucose/blood | — |
|
| 1261 |
+
|
| 1262 |
+
*(Prior trend: transaminases plateaued at ≈40 U/L for >6 months; glycemic control improved after insulin titration.)*
|
| 1263 |
+
|
| 1264 |
+
---
|
| 1265 |
+
|
| 1266 |
+
### Imaging Results
|
| 1267 |
+
|
| 1268 |
+
**Chest CT (contrast‑enhanced)** – Performed 4 weeks ago, reviewed today.
|
| 1269 |
+
|
| 1270 |
+
- Right middle lobe: Subsolid nodule measuring **0.62 cm x 0.58 cm**, unchanged compared with prior study (0.61 cm). No solid component, margins smooth, no cavitation.
|
| 1271 |
+
- Left lung fields: Post‑lobectomy changes without evidence of recurrence.
|
| 1272 |
+
- Mediastinum/hilum: No enlarged lymph nodes; stations 5,6,7,10L surgically cleared previously remain negative.
|
| 1273 |
+
- Osseous structures: No destructive lesions.
|
| 1274 |
+
|
| 1275 |
+
Interpretation: Radiographically stable disease per RECIST v1.1 criteria (non‑progressive, non‑regressive). No new pulmonary or extrapulmonary foci.
|
| 1276 |
+
|
| 1277 |
+
---
|
| 1278 |
+
|
| 1279 |
+
### Assessment
|
| 1280 |
+
|
| 1281 |
+
1. **KRAS‑mutant NSCLC – metastatic/recurrent disease, currently stable on sotorasib 960 mg QD**
|
| 1282 |
+
- Ongoing durable disease control ≥5 yr; no radiographic progression.
|
| 1283 |
+
2. **Treatment‑related toxicities**
|
| 1284 |
+
- *Grade 1 transaminitis*: Stable, asymptomatic.
|
| 1285 |
+
- *Diarrhea*: Resolved; occasional mild episodes managed with OTC loperamide.
|
| 1286 |
+
- *Hyperglycemia (insulin-dependent Type 2 DM secondary to KRAS inhibitor)* – Controlled on basal‑bolus regimen; A1c mildly elevated.
|
| 1287 |
+
3. **Fatigue** – Likely multifactorial (therapy, sleep hygiene); otherwise benign.
|
| 1288 |
+
4. **Performance Status** – ECOG 0, fully active.
|
| 1289 |
+
|
| 1290 |
+
---
|
| 1291 |
+
|
| 1292 |
+
### Plan
|
| 1293 |
+
|
| 1294 |
+
| Item | Details |
|
| 1295 |
+
|------|---------|
|
| 1296 |
+
| **Continue sotorasib** | Maintain Lumakras® 960 mg PO daily. Reassess tolerance at next visit. Consider dose hold only if hepatic enzymes rise >3×ULN or symptomatic worsening. |
|
| 1297 |
+
| **Laboratory Monitoring** | Repeat CBC/CMP including liver function tests in 6 weeks; repeat fasting glucose/HgbA1c in 3 months. |
|
| 1298 |
+
| **Insulin Management** | Current regimen (Glargine 20 U HS, Lispro sliding scale) appears adequate. Encourage daily SMBG, diet counseling, and annual ophthalmology screening. Adjust doses pending upcoming A1c. |
|
| 1299 |
+
| **GI Symptom Control** | Keep loperamide 2 mg PRN up to 8 mg/day; advise hydration and fiber intake. |
|
| 1300 |
+
| **Imaging Surveillance** | Schedule thin‑slice Chest CT in 12 weeks (±2 wk) per NCCN recommendation for KRAS‑directed therapy. Should there be any symptom suggestive of progression, obtain sooner. |
|
| 1301 |
+
| **Supportive Care** | Discuss potential enrollment in a prospective registry for long‑term KRAS inhibitors. Offer referral to survivorship clinic for lifestyle optimization (exercise, nutrition). |
|
| 1302 |
+
| **Vaccinations** | Ensure influenza vaccine annually; COVID‑19 booster updated; pneumococcal vaccination per CDC guidelines (PCV20 alone, or PCV15 followed by PPSV23). |
|
| 1303 |
+
| **Follow‑up Appointment** | Return in 3 months for office evaluation, sooner if new symptoms develop. |
|
| 1304 |
+
| **Documentation** | Update problem list, medication reconciliation, and provide copy of latest imaging to patient portal. |
|
| 1305 |
+
|
| 1306 |
+
**Signature:** ___________________________
|
| 1307 |
+
Dr. ____________, MD – Medical Oncology
|
| 1308 |
+
[Institution Name]
|
| 1309 |
+
|
| 1310 |
+
---
|
| 1311 |
+
|
| 1312 |
+
*Prepared electronically; dictation verified.*"
|