st192011 committed on
Commit 531b9e2 · verified · 1 Parent(s): ffbc724

Update app.py

Files changed (1)
  1. app.py +41 -11
app.py CHANGED
@@ -203,8 +203,6 @@ report_md = """
  # 🏛️ The Janus Interface: Research & Technical Analysis
  **Project Status:** Research Prototype v2.0 (Gold Standard)
 
- ---
-
  ### 1. Research Motivation: The Privacy-Utility Paradox
  In regulated domains (Healthcare, Legal, Finance), Generative AI adoption is stalled by a fundamental conflict:
  * **Utility:** Large Cloud Models (GPT-4, Claude) offer superior reasoning but require sending data off-premise.
@@ -227,11 +225,8 @@ The system utilizes a **Multi-Task Adapter** trained to switch between two disti
  * **Function:** A secure, offline engine that accepts the JanusScript and a Local SQL Database record.
  * **Output:** It merges the abstract logic with the concrete identity to generate the final, human-readable document.
 
- ---
-
  ### 3. Data Engineering: The "Gold Standard" Pipeline
  To achieve high fidelity without using private patient data, we developed a **Synthesized Data Pipeline**:
-
  1. **Synthesis:** We generated **306 high-quality clinical scenarios** using Large Language Models (LLMs).
  2. **Alignment:** Unlike previous iterations where headers were random, this dataset ensured strict mathematical alignment between the Identity Header (Age/Sex) and the Clinical Narrative.
  3. **Result:** This eliminated the "hallucination" issues seen in earlier tests where the model would confuse patient gender or age due to conflicting training signals.
@@ -240,15 +235,50 @@ To achieve high fidelity without using private patient data, we developed a **Sy
  * **Base Model:** Microsoft Phi-3.5-mini-instruct (3.8B Parameters).
  * **Framework:** **Unsloth** (Optimized QLoRA).
  * **Technique:** **DoRA (Weight-Decomposed Low-Rank Adaptation)**.
- * *Why DoRA?* Standard LoRA struggles with strict syntax/coding tasks. DoRA updates both magnitude and direction vectors, allowing the model to learn the strict `JanusScript` grammar effectively.
  * **Loss Masking:** We used `train_on_responses_only`. The model was **never** trained on the input text, only on the output. This prevents the model from memorizing patient PII from the training set.
- * **Hyperparameters:** Rank 16, Alpha 16, Learning Rate 2e-4, **2 Epochs** (approx 78 steps used for final checkpoint).
-
- ### 5. Results & Conclusion
- * **Zero-Trust Validation:** The "Vault" successfully reconstructs documents using *only* the database for identity.
- * **Semantic Expansion:** The model demonstrates the ability to take a concise code (`Dx(Pneumonia)`) and expand it into fluent medical narrative ("Patient presented with symptoms consistent with Pneumonia...").
  """
 
  # ==============================================================================
  # 6. LAUNCHER
  # ==============================================================================
 
  # 🏛️ The Janus Interface: Research & Technical Analysis
  **Project Status:** Research Prototype v2.0 (Gold Standard)
 
  ### 1. Research Motivation: The Privacy-Utility Paradox
  In regulated domains (Healthcare, Legal, Finance), Generative AI adoption is stalled by a fundamental conflict:
  * **Utility:** Large Cloud Models (GPT-4, Claude) offer superior reasoning but require sending data off-premise.

  * **Function:** A secure, offline engine that accepts the JanusScript and a Local SQL Database record.
  * **Output:** It merges the abstract logic with the concrete identity to generate the final, human-readable document.
 
  ### 3. Data Engineering: The "Gold Standard" Pipeline
  To achieve high fidelity without using private patient data, we developed a **Synthesized Data Pipeline**:
  1. **Synthesis:** We generated **306 high-quality clinical scenarios** using Large Language Models (LLMs).
  2. **Alignment:** Unlike previous iterations where headers were random, this dataset ensured strict mathematical alignment between the Identity Header (Age/Sex) and the Clinical Narrative.
  3. **Result:** This eliminated the "hallucination" issues seen in earlier tests where the model would confuse patient gender or age due to conflicting training signals.
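The alignment requirement in step 2 can be checked mechanically on each synthesized pair. The following is a minimal sketch under several assumptions. The header is treated as a hypothetical dict with `age` and `sex` keys, and narratives are assumed to state demographics as "N-year-old male/female". The `is_aligned` helper is illustrative only, because the pipeline's actual schema and checks are not described here.

```python
import re

# Hypothetical pattern: matches phrases such as "45-year-old male".
DEMOGRAPHIC_RE = re.compile(r"\b(\d{1,3})-year-old (male|female)\b", re.IGNORECASE)

# Assumed mapping from the header's sex code to the word used in the narrative.
SEX_WORDS = {"M": "male", "F": "female"}


def is_aligned(header: dict, narrative: str) -> bool:
    # True only if every age/sex mention in the narrative matches the header.
    mentions = DEMOGRAPHIC_RE.findall(narrative)
    if not mentions:
        return False  # no demographic mention: treat as unverifiable
    expected_age = int(header["age"])
    expected_sex = SEX_WORDS[header["sex"]]
    return all(
        int(age) == expected_age and sex.lower() == expected_sex
        for age, sex in mentions
    )


header = {"age": 45, "sex": "M"}
print(is_aligned(header, "The patient is a 45-year-old male admitted for appendicitis."))
print(is_aligned(header, "The patient is a 54-year-old female admitted for appendicitis."))
```

The check rejects rather than accepts a narrative with no demographic mention. That way, a pair which cannot be verified is not silently kept.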
 
  * **Base Model:** Microsoft Phi-3.5-mini-instruct (3.8B Parameters).
  * **Framework:** **Unsloth** (Optimized QLoRA).
  * **Technique:** **DoRA (Weight-Decomposed Low-Rank Adaptation)**.
  * **Loss Masking:** We used `train_on_responses_only`. The model was **never** trained on the input text, only on the output. This prevents the model from memorizing patient PII from the training set.
+ * **Hyperparameters:** Rank 16, Alpha 16, Learning Rate 2e-4, 2 Epochs (approx. 78 steps used for the final checkpoint).
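DoRA splits each pretrained weight matrix into a magnitude and a direction. The low-rank update is applied to the direction only, and the result is then renormalized: `W' = m * V / ||V||_c` with `V = W0 + (alpha / r) * B @ A`, where `||.||_c` is the column-wise norm. The sketch below illustrates that formula with plain Python lists on a tiny matrix. Including the `alpha / r` scaling follows common LoRA practice and is an assumption here; with Rank 16 and Alpha 16 it equals 1.0. This is not Unsloth's or PEFT's implementation, and the helper names are hypothetical.

```python
import math


def matmul(a, b):
    # Plain-Python product of an (n x k) and a (k x m) list-of-lists matrix.
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def column_norms(w):
    # L2 norm of each column of w (the column-wise norm in the DoRA formulation).
    return [math.sqrt(sum(row[j] ** 2 for row in w)) for j in range(len(w[0]))]


def dora_weight(w0, a, b, m, alpha, r):
    # W' = m * V / ||V||_c, with V = W0 + (alpha / r) * B @ A.
    scale = alpha / r
    ba = matmul(b, a)
    v = [[w0[i][j] + scale * ba[i][j] for j in range(len(w0[0]))] for i in range(len(w0))]
    norms = column_norms(v)
    return [[m[j] * v[i][j] / norms[j] for j in range(len(v[0]))] for i in range(len(v))]


# Tiny example: W0 is d x k = 2 x 3, rank r = 1.
w0 = [[3.0, 0.0, 1.0],
      [4.0, 2.0, 1.0]]
m = column_norms(w0)          # magnitude initialised from W0's column norms
a = [[0.5, -0.5, 1.0]]        # A: r x k (random in practice)
b_init = [[0.0], [0.0]]       # B: d x r, zero at initialisation
b_trained = [[0.2], [-0.1]]   # made-up "trained" B, for illustration only

print(dora_weight(w0, a, b_init, m, alpha=16, r=16))   # reproduces W0 at initialisation
print(column_norms(dora_weight(w0, a, b_trained, m, alpha=16, r=16)))  # matches m up to rounding
```

With `B` at zero the adapted weight reproduces `W0` exactly. After an update, the column norms stay pinned to `m`, which shows that the low-rank term only changes direction while `m` controls magnitude.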
+
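Training "on responses only" means the loss is computed over the answer tokens alone. The prompt tokens are still fed to the model as context, but their labels are set to an ignore value, so they contribute no gradient. The sketch below shows that label construction. It assumes the common convention of `-100` as the ignore value, which is the default `ignore_index` of PyTorch's `CrossEntropyLoss`. The token ids and helper names are hypothetical, and this is not Unsloth's `train_on_responses_only` implementation.

```python
IGNORE_INDEX = -100  # PyTorch's CrossEntropyLoss skips targets equal to this by default


def build_example(prompt_ids, response_ids):
    # Concatenate prompt and response token ids; mask the prompt out of the loss.
    input_ids = list(prompt_ids) + list(response_ids)
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)
    return {"input_ids": input_ids, "labels": labels}


def loss_positions(labels):
    # Indices whose tokens actually contribute to the training loss.
    return [i for i, tok in enumerate(labels) if tok != IGNORE_INDEX]


# Hypothetical token ids: a raw note (prompt) followed by its JanusScript (response).
example = build_example(prompt_ids=[101, 2054, 2003, 1037], response_ids=[7592, 2088, 102])
print(example["labels"])                  # [-100, -100, -100, -100, 7592, 2088, 102]
print(loss_positions(example["labels"]))  # [4, 5, 6]
```

Because the raw note sits entirely inside the masked span, the optimizer never rewards the model for predicting its tokens.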
+ ### 5. Validated Output Examples
+ *Due to hardware constraints on the free-tier CPU, live inference may be slow. The outputs below were validated with the model running on a T4 GPU.*
+
+ #### **Phase 1: The Scout (Logic Extraction)**
+ *Input (Raw Note):* "Pt admitted for appendicitis. 45M. CT showed inflamed appendix. Taken to OR for Lap Appy. Uncomplicated. Discharged home on Percocet."
+
+ *Output (JanusScript):*
+ ```javascript
+ Hx(Appendicitis.Suspected);
+ Sx(Pain.RightLowerQuadrant);
+ Dx(Appendicitis.Confirmed);
+ Lab(CT).result(InflamedAppendix);
+ Tx(Surgery).action(LaparoscopicAppendectomy);
+ Crs(PostOp).status(Uncomplicated);
+ Tx(Meds).action(Percocet);
+ Plan(Discharge.Home).
+ ```
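JanusScript is a small, regular notation, so Scout output can be parsed and checked before it is passed to the Vault. The sketch below infers a grammar from the examples in this section only: `Tag(Arg)`, optionally followed by `.method(Value)`, terminated by `;` or `.`. The real grammar is not specified here, so the pattern and the `parse_janus` and `Statement` names are assumptions.

```python
import re
from typing import NamedTuple, Optional

# Inferred (hypothetical) clause shape: Tag(Arg)[.method(Value)] followed by ';' or '.'.
STATEMENT_RE = re.compile(
    r"(?P<tag>\w+)\((?P<arg>[^()]*)\)"
    r"(?:\.(?P<method>\w+)\((?P<value>[^()]*)\))?"
    r"\s*[;.]"
)


class Statement(NamedTuple):
    tag: str
    arg: str
    method: Optional[str]
    value: Optional[str]


def parse_janus(script: str) -> list[Statement]:
    # One Statement per recognised clause, in source order; unmatched text is skipped.
    return [
        Statement(m["tag"], m["arg"], m["method"], m["value"])
        for m in STATEMENT_RE.finditer(script)
    ]


scout_output = (
    "Hx(Appendicitis.Suspected); Lab(CT).result(InflamedAppendix); "
    "Tx(Surgery).action(LaparoscopicAppendectomy); Plan(Discharge.Home)."
)
for stmt in parse_janus(scout_output):
    print(stmt)
```

Confirming that the number of parsed clauses matches the number of terminators is one inexpensive way to catch malformed Scout output before reconstruction.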
+
+ #### **Phase 2: The Vault (Reconstruction)**
+ *Input (Protocol):* `Dx(Pneumonia); Tx(Meds).action(Antibiotics.IV); Crs(Response).result(Improved); Plan(Discharge.Home);`
+
+ *Input (Secure DB):* `{"pt_name": "Walter White", "pt_mrn": "HEISEN-01", ...}`
+
+ *Output (Final Document):*
+ ```text
+ DISCHARGE SUMMARY
+ Patient: Walter White (MRN: HEISEN-01)
+ DOB: 1958-09-07 (Sex: M)
+ Attending: Dr. Goodman (Internal Medicine)
+ Dates: 2025-05-01 to 2025-05-05
+
+ **History & Diagnosis:**
+ The patient is a 67-year-old male who presented with fever, productive cough, and shortness of breath. Diagnosis was Pneumonia.
+
+ **Hospital Course:**
+ He was treated with IV antibiotics. His respiratory status improved, and he was able to maintain oxygen saturation on room air.
+
+ **Discharge Plan:**
+ The patient is discharged home.
+ ```
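The Vault's inputs can be assembled entirely on-premise: the protocol comes from the Scout, and the identity fields come from a local database lookup. The sketch below uses Python's built-in `sqlite3` with an in-memory database. The `patients` table, the prompt layout, and the `fetch_patient` and `build_vault_input` helpers are all hypothetical. Only the `pt_name` and `pt_mrn` field names come from the example record.

```python
import json
import sqlite3


def fetch_patient(conn: sqlite3.Connection, mrn: str) -> dict:
    # Look up one identity record by MRN in a local (offline) database.
    cur = conn.execute(
        "SELECT pt_name, pt_mrn FROM patients WHERE pt_mrn = ?", (mrn,)
    )
    row = cur.fetchone()
    if row is None:
        raise KeyError(f"no patient with MRN {mrn!r}")
    columns = [desc[0] for desc in cur.description]
    return dict(zip(columns, row))


def build_vault_input(protocol: str, record: dict) -> str:
    # Hypothetical prompt layout: de-identified logic plus the locally held identity.
    return (
        "### Protocol\n" + protocol.strip() + "\n\n"
        "### Secure DB Record\n" + json.dumps(record, sort_keys=True)
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (pt_name TEXT, pt_mrn TEXT PRIMARY KEY)")
conn.execute("INSERT INTO patients VALUES (?, ?)", ("Walter White", "HEISEN-01"))

protocol = "Dx(Pneumonia); Tx(Meds).action(Antibiotics.IV); Plan(Discharge.Home);"
print(build_vault_input(protocol, fetch_patient(conn, "HEISEN-01")))
```

The identity record is joined to the protocol only at this local step. As a result, the JanusScript produced upstream never needs to carry the name or MRN.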
+
+ ### 6. Conclusion
+ * **Zero-Trust Validation:** The "Vault" successfully reconstructs documents using *only* the database for identity.
+ * **Semantic Expansion:** The model demonstrates the ability to take a concise code (`Dx(Pneumonia)`) and expand it into a fluent medical narrative ("Patient presented with symptoms consistent with Pneumonia...").
  """
 
+
  # ==============================================================================
  # 6. LAUNCHER
  # ==============================================================================