Evidence-first generation
Answer generation only follows an accepted EMR/CPG evidence bundle, rather than raw top-K context.
diff --git "a/index.html" "b/index.html"
--- "a/index.html"
+++ "b/index.html"
@@ -1,1617 +1,449 @@
+ MedSwin
+ Evidence Observatory
+ Clinical RAG · EMR + CPG · audit trail
+MedSwin combines patient-specific EMR retrieval, clinical practice guidelines, biomedical reranking, specialist critique, and provenance tracking so each answer can be traced back to the evidence that shaped it.
-MedSwin is an evidence-constrained clinical QA stack: specialised agents coordinate retrieval, EMR summarisation, guideline synthesis, and safety critique, while a calibrated reranker enforces evidence sufficiency under a token budget.
-Design primitives for clinical deployment readiness
-- Typed artifacts + explicit provenance (document IDs, sections, timestamps, chunk offsets) enable replay and review.
-- Retrieval is accepted only when EMR/CPG coverage targets are met under a strict token budget.
-- Long-context biomedical reranker outputs calibrated probabilities for deterministic inclusion policies.
-- Critic checks missing evidence, contraindications, and unsafe advice, then routes “retrieve-more” when needed.
-Live evidence sonar
+Hover a beacon to inspect an evidence channel.
-MedSwin frames clinical QA as an evidence-constrained decision pipeline. Every answer is gated by evidence sufficiency, bounded by a strict context budget, and accompanied by a replayable trace suitable for audit and safety review.
-- Clinically phrased, uncertainty-aware output generated only when evidence gates are satisfied.
-- Compact EMR + guideline passages selected under token and diversity constraints.
-- Structured artifact log: retrieval, ranking, policies, safety checks.
-System principles
+The system tracks provenance, calibrated relevance, facet sufficiency, contradiction handling, agent reliability, benchmark performance, and release artifacts as distinct parts of the clinical evidence workflow.
+Answer generation only follows an accepted EMR/CPG evidence bundle, rather than raw top-K context.
Reranker logits are converted to calibrated probabilities for threshold-based policy checks.
Contraindications, interactions, exclusions, and contradictions remain visible throughout selection.
Each claim keeps source metadata, score context, evidence grade, and facet role.
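The calibration principle above can be illustrated with a minimal sketch. The page does not say which calibration method MedSwin uses, so temperature scaling is assumed here, and the function names, the temperature of 1.5, and the threshold of 0.7 are all hypothetical values chosen for illustration.

```python
import math


def calibrated_probability(logit: float, temperature: float = 1.5) -> float:
    """Map a raw reranker logit to a probability via temperature scaling.

    Assumption: a single relevance logit per passage and a temperature
    fitted on held-out data. The value 1.5 is illustrative only.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    z = logit / temperature
    # Numerically stable sigmoid: avoid exp() overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def passes_policy(logit: float, threshold: float = 0.7,
                  temperature: float = 1.5) -> bool:
    """Threshold-based inclusion check on the calibrated probability."""
    return calibrated_probability(logit, temperature) >= threshold
```

Because the threshold is applied to a probability rather than a raw logit, the same cut-off keeps its meaning if the reranker is retrained and recalibrated.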
-Every agent produces artifacts with provenance metadata and logs tool calls + selected evidence. This creates a structured audit trail suitable for review and replay.
-- Stage-1 retrieves candidates via dense retrieval + BM25 union. Stage-2 reranks with a long-context biomedical reranker (LoRA-adapted) and outputs calibrated probabilities used by policy constraints.
-Architecture animation
+A clinician query is decomposed into patient context and guideline evidence, reranked with biomedical relevance signals, checked for sufficiency, and returned with a grounded answer plus audit trail.
+Pipeline playback
-A compact student is built with SFT on augmented biomedical QA, then refined using hard+soft KD from a larger instructor. This targets deployability while preserving calibrated reasoning behavior.
-- MedSwin outputs an answer only when the evidence bundle is sufficient. Otherwise it asks clarifying questions or expands retrieval.
-Canonicalise terms, expand abbreviations, form retrieval probes.
-Hybrid candidates, rerank with calibrated probabilities, enforce sufficiency.
-Summarise EMR, synthesise guideline actions, run safety critique, then answer.
-- Explore MedSwin layers: multi-agent workflow, two-stage retrieval, evidence sufficiency checks, and MAC coordination.
-Medical specialist
+The specialist model is built from supervised biomedical instruction data, teacher-student distillation, and training-free merge operators that control destructive interference between updates.
+SFT mixture
-flowchart LR
-  U["Clinician UI / EMR"] -->|q + patient context| ORCH["Orchestrator (MAC)\nplanning · policy checks · logging"]
-  ORCH --> QN["Query Normaliser"]
-  ORCH --> RET["Evidence Retriever"]
-  ORCH --> EMRS["EMR Summariser"]
-  ORCH --> GS["Guideline Synthesiser"]
-  ORCH --> SC["Safety Critic"]
-
-  subgraph IR["Two-Stage Retrieval (Budgeted)"]
-    DENSE["Stage 1: Dense ANN (MedEmbed)"] --> CAND["Candidates (dense OR BM25)"]
-    BM25["Stage 1: Lexical (BM25)"] --> CAND
-    CAND --> RER["Stage 2: Long-context Reranker\n(LoRA-adapted, calibrated)"]
-    RER --> SEL["Policy-aware selection\nMMR + sufficiency constraints"]
-  end
-
-  RET --> IR
-  SEL --> EVID["Evidence bundle M\nEMR + CPG + metadata"]
-
-  EMRS --> STATE["Clinical state summary"]
-  GS --> ACTIONS["Guideline actions + contraindications"]
-  SC --> FLAGS["Safety flags / missing evidence"]
-
-  EVID --> FUSE["Evidence-constrained synthesis"]
-  STATE --> FUSE
-  ACTIONS --> FUSE
-  FLAGS --> FUSE
-  FUSE --> OUT["Final answer + citations + cautions\n+ structured trace"]
Hybrid SFT + KD
-Evidence selection is separated into recall-oriented candidate generation and precision-oriented reranking. This avoids early truncation while enabling deterministic, policy-aware inclusion decisions.
-- Dense ANN retrieval is unioned with BM25 to preserve rare clinical terms, abbreviations, and lab-specific phrasing.
-- A biomedical LLM reranker scores each passage and outputs calibrated probabilities usable as policy thresholds.
-- Final selection enforces EMR + guideline sufficiency, diversity (MMR-style), and a strict token budget.
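The final selection step described in the removed text (MMR-style diversity under a strict token budget) might look roughly like the sketch below. It is not taken from the MedSwin code: the `mmr_select` name, the tuple layout, the `lam` trade-off weight, and the caller-supplied `similarity` function are assumptions made for illustration.

```python
def mmr_select(candidates, similarity, token_budget, lam=0.7):
    """Greedy MMR-style selection under a token budget.

    candidates: list of (doc_id, relevance, tokens) tuples, where relevance
        is assumed to be a calibrated probability from the reranker.
    similarity: hypothetical callable (doc_id, doc_id) -> float in [0, 1].
    lam: weight on relevance versus redundancy (illustrative default).
    Returns the selected doc_ids in selection order.
    """
    selected, used = [], 0
    remaining = list(candidates)
    while remaining:
        best, best_score = None, float("-inf")
        for cand in remaining:
            doc_id, relevance, tokens = cand
            if used + tokens > token_budget:
                continue  # would exceed the budget, skip this passage
            # Redundancy is the highest similarity to anything already chosen.
            redundancy = max(
                (similarity(doc_id, s[0]) for s in selected), default=0.0
            )
            score = lam * relevance - (1.0 - lam) * redundancy
            if score > best_score:
                best, best_score = cand, score
        if best is None:
            break  # nothing left that fits the budget
        selected.append(best)
        used += best[2]
        remaining.remove(best)
    return [s[0] for s in selected]
```

With two near-duplicate EMR passages and one guideline passage, the redundancy penalty tends to favour the guideline passage as the second pick even when its raw relevance is slightly lower.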
-Interference-aware composition
-MedSwin’s deployable 7B model is trained for reliability rather than raw scale, combining large-scale augmentation, supervised fine-tuning, and knowledge distillation.
-- Paraphrasing, formatting variants, deduplication, and medical consistency checks expand coverage without semantic drift.
-- Aligns the student to clinical instruction style, tone control, and structured answers.
-- Hard labels expand task coverage; soft labels preserve calibration and uncertainty from a larger instructor.
-Retrieval and sufficiency
+Each retrieval stage uses a responsive diagram, and the sufficiency simulator animates toward the threshold as more evidence is retrieved.
+Facet sufficiency simulator
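One way to read the simulator's behaviour is as a per-facet score that rises toward a threshold as passages are added. The sketch below assumes a noisy-OR combination of calibrated passage probabilities. The page does not state the actual scoring rule, so the `Passage` type, the facet names, and the 0.8 threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    doc_id: str
    facet: str   # hypothetical facet label, e.g. "guideline" or "patient"
    prob: float  # calibrated relevance probability in [0, 1]


def facet_scores(passages):
    """Noisy-OR per facet: 1 - product of (1 - prob) over its passages."""
    miss = {}
    for p in passages:
        miss[p.facet] = miss.get(p.facet, 1.0) * (1.0 - p.prob)
    return {facet: 1.0 - m for facet, m in miss.items()}


def is_sufficient(passages, required, threshold=0.8):
    """Accept the bundle only if every required facet meets the threshold."""
    scores = facet_scores(passages)
    return all(scores.get(facet, 0.0) >= threshold for facet in required)
```

Under this assumption, a bundle with one strong guideline passage and one moderate EMR passage fails the gate, and retrieving a second moderate EMR passage can lift the patient facet over the threshold, which matches the "retrieve more" behaviour the simulator animates.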
-MedSwin evaluates clinical QA systems beyond answer accuracy, focusing on evidence quality, guideline compliance, and runtime safety behaviour.
-- Evidence relevance and coverage under a fixed token budget.
-- Presence of actionable recommendations and contraindications.
-- Final answers remain grounded in cited evidence only.
-Multi-agent coordination
+Agents explore different hypotheses and return claim-level ledgers, which are aggregated through reliability-weighted evidence selection.
+Claim-level ledger
| Facet | Role | Polarity | Grade | Trace |
|---|---|---|---|---|
| Guideline concordance | Recommendation | supports | CPG | doc · version · section |
| Patient applicability | Lab / comorbidity | qualifies | EMR | encounter · timestamp |
| Safety risk | Contraindication | conflicts | Safety | severity · source |
| Uncertainty | Contradiction pair | preserved | Mixed | adjudication status |
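Reliability-weighted aggregation over ledgers like the one tabulated above could be sketched as follows. This is an illustrative reading, not the page's implementation: the ledger shape, the per-agent reliability weights, and the `aggregate_ledgers` name are assumptions. Only the `supports` and `conflicts` polarities from the table are tallied, and other polarities are ignored by this sketch.

```python
from collections import defaultdict


def aggregate_ledgers(ledgers, reliability):
    """Combine per-agent claim ledgers using reliability weights.

    ledgers: {agent_name: [(claim_id, polarity), ...]}
    reliability: {agent_name: weight}; unknown agents get weight 0.
    Returns {claim_id: {"net": float, "contested": bool}}.
    """
    support = defaultdict(float)
    conflict = defaultdict(float)
    for agent, entries in ledgers.items():
        weight = reliability.get(agent, 0.0)
        for claim_id, polarity in entries:
            if polarity == "supports":
                support[claim_id] += weight
            elif polarity == "conflicts":
                conflict[claim_id] += weight
            # Other polarities (e.g. "qualifies") are not tallied here.
    result = {}
    for claim_id in sorted(set(support) | set(conflict)):
        s, c = support[claim_id], conflict[claim_id]
        result[claim_id] = {
            "net": s - c,
            # Keep contradictions visible instead of silently resolving them.
            "contested": s > 0 and c > 0,
        }
    return result
```

Keeping a `contested` flag alongside the net weight mirrors the ledger's "Uncertainty" row, where contradiction pairs are preserved for adjudication rather than dropped.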
-A multidisciplinary research team building an auditable medical AI system.
+Evaluation dashboard
+Overlap metrics, semantic similarity, retrieval quality, latency, and audit completeness are kept in separate views so answer similarity is not mistaken for deployment readiness.
+