# 🏛️ The Janus Interface: Research & Technical Analysis

**Project Status:** Research Prototype v2.0 (Gold Standard)

### 1. Research Motivation: The Privacy-Utility Paradox

In regulated domains (Healthcare, Legal, Finance), Generative AI adoption is stalled by a fundamental conflict:

* **Utility:** Large Cloud Models (GPT-4, Claude) offer superior reasoning but require sending data off-premise.
* **Function:** A secure, offline engine that accepts the JanusScript and a Local SQL Database record.
* **Output:** It merges the abstract logic with the concrete identity to generate the final, human-readable document.
### 3. Data Engineering: The "Gold Standard" Pipeline

To achieve high fidelity without using private patient data, we developed a **Synthesized Data Pipeline**:

1. **Synthesis:** We generated **306 high-quality clinical scenarios** using Large Language Models (LLMs).
2. **Alignment:** Unlike previous iterations, where headers were random, this dataset enforced strict alignment between the Identity Header (Age/Sex) and the Clinical Narrative.
3. **Result:** This eliminated the "hallucination" issues seen in earlier tests, where the model would confuse patient gender or age due to conflicting training signals.
* **Base Model:** Microsoft Phi-3.5-mini-instruct (3.8B parameters).
* **Framework:** **Unsloth** (optimized QLoRA).
* **Technique:** **DoRA (Weight-Decomposed Low-Rank Adaptation)**.
    * *Why DoRA?* Standard LoRA struggles with strict syntax/coding tasks; DoRA updates both magnitude and direction vectors, allowing the model to learn the strict `JanusScript` grammar effectively.
* **Loss Masking:** We used `train_on_responses_only`; the model was **never** trained on the input text, only on the output. This prevents the model from memorizing patient PII from the training set.
* **Hyperparameters:** Rank 16, Alpha 16, Learning Rate 2e-4, 2 Epochs (~78 steps for the final checkpoint).
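Conceptually, response-only training assigns the ignore label to every prompt token, so only response tokens contribute to the loss. A toy sketch of the idea (the token ids and prompt length are made up; `-100` is the conventional ignore index for PyTorch-style cross-entropy):

```python
IGNORE_INDEX = -100  # conventional "ignore" label for PyTorch-style cross-entropy

# Toy sketch of response-only loss masking: prompt tokens get IGNORE_INDEX,
# so the model is never trained to reproduce the (potentially sensitive) input.
def mask_prompt_labels(token_ids: list[int], prompt_len: int) -> list[int]:
    return [IGNORE_INDEX] * prompt_len + token_ids[prompt_len:]

tokens = [101, 102, 103, 104, 105]  # hypothetical ids: 3 prompt + 2 response tokens
labels = mask_prompt_labels(tokens, prompt_len=3)
# labels == [-100, -100, -100, 104, 105]
```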

### 5. Validated Output Examples

*Due to hardware constraints on the Free Tier CPU, live inference may be slow. Below are validated outputs from the model running on a T4 GPU.*

#### **Phase 1: The Scout (Logic Extraction)**

*Input (Raw Note):* "Pt admitted for appendicitis. 45M. CT showed inflamed appendix. Taken to OR for Lap Appy. Uncomplicated. Discharged home on Percocet."

*Output (JanusScript):*

```javascript
Hx(Appendicitis.Suspected);
Sx(Pain.RightLowerQuadrant);
Dx(Appendicitis.Confirmed);
Lab(CT).result(InflamedAppendix);
Tx(Surgery).action(LaparoscopicAppendectomy);
Crs(PostOp).status(Uncomplicated);
Tx(Meds).action(Percocet);
Plan(Discharge.Home);
```

#### **Phase 2: The Vault (Reconstruction)**

*Input (Protocol):* `Dx(Pneumonia); Tx(Meds).action(Antibiotics.IV); Crs(Response).result(Improved); Plan(Discharge.Home);`

*Input (Secure DB):* `{"pt_name": "Walter White", "pt_mrn": "HEISEN-01", ...}`

*Output (Final Document):*

```text
DISCHARGE SUMMARY
Patient: Walter White (MRN: HEISEN-01)
DOB: 1958-09-07 (Sex: M)
Attending: Dr. Goodman (Internal Medicine)
Dates: 2025-05-01 to 2025-05-05

**History & Diagnosis:**
The patient is a 67-year-old male who presented with fever, productive cough, and shortness of breath. Diagnosis was Pneumonia.

**Hospital Course:**
He was treated with IV antibiotics. His respiratory status improved, and he was able to maintain oxygen saturation on room air.

**Discharge Plan:**
The patient is discharged home.
```

### 6. Conclusion

* **Zero-Trust Validation:** The "Vault" successfully reconstructs documents using only the database for identity.
* **Semantic Expansion:** The model demonstrates the ability to take a concise code (`Dx(Pneumonia)`) and expand it into fluent medical narrative ("Patient presented with symptoms consistent with Pneumonia...").
"""
# ==============================================================================
# 6. LAUNCHER
# ==============================================================================