diff --git "a/index.html" "b/index.html"
--- "a/index.html"
+++ "b/index.html"
@@ -1,1617 +1,449 @@
- MedSwin — Project Introduction
+ MedSwin · Clinical Specialist
- Evidence-first
- Two-stage retrieval
- Auditable traces
- Distilled 7B
- Multi-Agent Biomedical Reasoning
- Grounded in EMR & Guidelines
- MedSwin is an evidence-constrained clinical QA stack:
- specialised agents coordinate retrieval, EMR summarisation, guideline synthesis, and safety critique—while a
- calibrated reranker enforces evidence sufficiency under a token budget.
- MedSwin
- Local-deployable by design
- Provenance-aware output
- Replayable traces
- Evidence sufficiency checks
- Token budget control
- Hybrid ANN + BM25
- Calibrated reranker
- Multi-agent coordination (MAC)
- Trust Stack
- Design primitives for clinical deployment readiness
- System
- Auditability
- Typed artifacts + explicit provenance (document IDs, sections, timestamps, chunk offsets) enable replay and review.
- Evidence sufficiency
- Retrieval is accepted only when EMR/CPG coverage targets are met under a strict token budget.
- Calibrated ranking
- Long-context biomedical reranker outputs calibrated probabilities for deterministic inclusion policies.
- Safety critique
- Critic checks missing evidence, contraindications, and unsafe advice—then routes “retrieve-more” when needed.
- Research prototype — not a substitute for professional medical advice.
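The "Evidence sufficiency" card states that retrieval is accepted only when EMR/CPG coverage targets are met under a strict token budget. The Python below is a minimal sketch of such a gate under stated assumptions: `Passage`, `accept_bundle`, the `min_emr`/`min_cpg` targets and the whitespace token count are all hypothetical stand-ins, since the page does not give MedSwin's actual targets or tokenizer.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Passage:
    # Hypothetical passage record; source is "EMR" or "CPG".
    doc_id: str
    source: str
    text: str


def count_tokens(text: str) -> int:
    # Stand-in for the model tokenizer: whitespace split.
    return len(text.split())


def accept_bundle(bundle: List[Passage], token_budget: int,
                  min_emr: int = 1, min_cpg: int = 1) -> Tuple[bool, str]:
    """Accept only if the bundle fits the budget and meets per-source coverage targets."""
    used = sum(count_tokens(p.text) for p in bundle)
    if used > token_budget:
        return False, f"over budget ({used} > {token_budget} tokens)"
    emr = sum(1 for p in bundle if p.source == "EMR")
    cpg = sum(1 for p in bundle if p.source == "CPG")
    if emr < min_emr:
        return False, f"EMR coverage {emr} < {min_emr}"
    if cpg < min_cpg:
        return False, f"CPG coverage {cpg} < {min_cpg}"
    return True, "sufficient"


bundle = [
    Passage("emr-001", "EMR", "latest renal function panel for this encounter"),
    Passage("cpg-017", "CPG", "guideline section on dosing in reduced renal function"),
]
print(accept_bundle(bundle, token_budget=64))      # (True, 'sufficient')
print(accept_bundle(bundle[:1], token_budget=64))  # (False, 'CPG coverage 0 < 1')
```

Checking the budget before coverage means an oversized bundle is rejected even when it covers both sources, which matches the "under a strict token budget" wording.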
+ Live evidence sonar
+ Clinical channels under audit
+ 4,895 m
+ MedSwin evidence lock
+ 7B student
+ 27B teacher
+ agents
+ Recall@10
- Overview
- MedSwin frames clinical QA as an evidence-constrained decision pipeline.
- Every answer is gated by evidence sufficiency, bounded by a strict context budget, and accompanied by a
- replayable trace suitable for audit and safety review.
- Answer
- Clinically phrased, uncertainty-aware output generated only when evidence gates are satisfied.
- Evidence bundle
- Compact EMR + guideline passages selected under token and diversity constraints.
- Trace
- Structured artifact log: retrieval, ranking, policies, safety checks.
+ System principles
+ MedSwin separates retrieval, evidence quality, safety critique, and answer synthesis.
+ The system tracks provenance, calibrated relevance, facet sufficiency, contradiction handling, agent reliability, benchmark performance, and release artifacts as distinct parts of the clinical evidence workflow.
+ 01
+ Evidence-first generation
+ Answers are generated only after an EMR/CPG evidence bundle is accepted, never from raw top-K context.
+ 02
+ Calibrated inclusion
+ Reranker logits are converted to calibrated probabilities for threshold-based policy checks.
+ 03
+ Safety is a first-class facet
+ Contraindications, interactions, exclusions, and contradictions remain visible throughout selection.
+ 04
+ Auditability over confidence
+ Each claim keeps source metadata, score context, evidence grade, and facet role.
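Principle 02 says reranker logits are converted to calibrated probabilities for threshold-based policy checks. The page does not name the calibration method; the sketch below assumes temperature scaling, one standard choice, fitted by grid search on a small held-out set of (logit, relevance label) pairs. `fit_temperature`, `include` and the toy data are hypothetical.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def nll(logits: List[float], labels: List[int], T: float) -> float:
    # Mean binary negative log-likelihood under p = sigmoid(logit / T).
    eps = 1e-12
    total = 0.0
    for z, y in zip(logits, labels):
        p = sigmoid(z / T)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(logits)


def fit_temperature(logits: List[float], labels: List[int]) -> float:
    # Grid search over T in {0.25, 0.5, ..., 10.0}; T > 1 softens overconfident scores.
    grid = [0.25 * k for k in range(1, 41)]
    return min(grid, key=lambda T: nll(logits, labels, T))


def include(logit: float, T: float, threshold: float = 0.5) -> bool:
    # Deterministic inclusion policy applied to the calibrated probability.
    return sigmoid(logit / T) >= threshold


# Toy held-out set: the reranker is confidently wrong on two passages.
dev_logits = [4.0, 3.5, -3.0, 3.0, -4.0, 2.5]
dev_labels = [1, 0, 0, 1, 0, 0]
T = fit_temperature(dev_logits, dev_labels)
print(T, include(3.0, T, threshold=0.6))
```

Because the temperature is fitted once and then frozen, the same logit always maps to the same probability, so a fixed threshold gives the deterministic inclusion decisions the page describes.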
- 1) Auditable Multi-Agent Orchestration
- Every agent produces artifacts with provenance metadata and logs tool calls + selected evidence. This creates a
- structured audit trail suitable for review and replay.
-   • Typed artifacts: ids, sections, timestamps, offsets
-   • Deterministic “retrieve-more” instead of guessing
-   • Critique + safety checks before finalisation
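The bullets above describe typed artifacts with IDs, sections, timestamps and offsets, plus logged tool calls and selected evidence. A minimal sketch of such a schema with a JSON round-trip that allows replay, assuming hypothetical class and field names (`EvidenceArtifact`, `TraceEvent`, `Trace`) rather than MedSwin's actual schema:

```python
import json
from dataclasses import dataclass, asdict, field
from typing import List


@dataclass(frozen=True)
class EvidenceArtifact:
    # Hypothetical schema mirroring the provenance fields listed on the page.
    doc_id: str
    section: str
    timestamp: str      # ISO 8601 string
    chunk_start: int    # character offsets into the source document
    chunk_end: int
    produced_by: str    # agent name


@dataclass
class TraceEvent:
    step: str           # e.g. "retrieve", "rerank", "critique"
    tool_call: str
    selected: List[EvidenceArtifact] = field(default_factory=list)


class Trace:
    def __init__(self) -> None:
        self.events: List[TraceEvent] = []

    def log(self, event: TraceEvent) -> None:
        self.events.append(event)

    def to_json(self) -> str:
        # asdict() recurses into the nested artifact dataclasses.
        return json.dumps([asdict(e) for e in self.events], indent=2)

    @classmethod
    def from_json(cls, payload: str) -> "Trace":
        # Rebuild typed objects so a stored trace can be replayed and reviewed.
        trace = cls()
        for raw in json.loads(payload):
            selected = [EvidenceArtifact(**a) for a in raw["selected"]]
            trace.log(TraceEvent(raw["step"], raw["tool_call"], selected))
        return trace


trace = Trace()
artifact = EvidenceArtifact("emr-0042", "Labs", "2024-03-02T09:15:00Z", 120, 410, "EvidenceRetriever")
trace.log(TraceEvent("retrieve", "hybrid_search", [artifact]))
replayed = Trace.from_json(trace.to_json())
print(replayed.events == trace.events)  # True
```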
- 2) Two-Stage Retrieval with Calibrated Reranking
- Stage-1 retrieves candidates via dense retrieval + BM25 union. Stage-2 reranks with a long-context biomedical reranker
- (LoRA-adapted) and outputs calibrated probabilities used by policy constraints.
-   • Hybrid candidate pool: ANN + lexical coverage
-   • Evidence sufficiency thresholds for EMR/CPG
-   • Budgeted, diverse selection (MMR-style)
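Stage 1 above is described as a union of dense ANN and BM25 candidates. The sketch below shows that union on toy data, assuming Okapi BM25 with the common defaults `k1=1.5`, `b=0.75` and brute-force cosine similarity in place of a real ANN index and encoder; all function names and the hand-made embedding vectors are hypothetical.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: str, docs: List[str], k1: float = 1.5, b: float = 0.75) -> List[float]:
    # Okapi BM25 over lower-cased, whitespace-tokenised documents.
    tokenised = [d.lower().split() for d in docs]
    n = len(tokenised)
    avgdl = sum(len(t) for t in tokenised) / n
    df = Counter(term for t in tokenised for term in set(t))
    scores = []
    for toks in tokenised:
        tf = Counter(toks)
        s = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            s += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(toks) / avgdl))
        scores.append(s)
    return scores


def cosine(u: List[float], v: List[float]) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def top_k(scores: List[float], k: int) -> List[int]:
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def hybrid_candidates(query, query_vec, docs, doc_vecs, k=2) -> List[int]:
    # Stage 1: a document is a candidate if it is in the dense top-k OR the BM25 top-k.
    dense = top_k([cosine(query_vec, v) for v in doc_vecs], k)
    lexical = top_k(bm25_scores(query, docs), k)
    seen, pool = set(), []
    for i in dense + lexical:
        if i not in seen:
            seen.add(i)
            pool.append(i)
    return pool


docs = [
    "patient on warfarin INR 3.4",
    "anticoagulation guideline for atrial fibrillation",
    "hba1c trend stable",
    "warfarin dose adjustment table",
]
doc_vecs = [[0.9, 0.1, 0.0], [0.8, 0.3, 0.1], [0.0, 1.0, 0.0], [0.1, 0.0, 1.0]]  # toy embeddings
pool = hybrid_candidates("warfarin INR", [1.0, 0.0, 0.0], docs, doc_vecs, k=2)
print(pool)  # [0, 1, 3]
```

Document 3 enters the pool only through the lexical branch, which is the point of the union: exact terms such as drug names survive even when the dense encoder ranks them low.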
+ Architecture animation
+ Trace a clinical query through the evidence chamber.
+ A clinician query is decomposed into patient context and guideline evidence, reranked with biomedical relevance signals, checked for sufficiency, and returned with a grounded answer plus an audit trail.
+ Pipeline playback
+ Clinician query enters the chamber.
- 3) Distilled 7B Medical LLM Pipeline
- A compact student is built with SFT on augmented biomedical QA, then refined using hard+soft KD from a larger instructor.
- This targets deployability while preserving calibrated reasoning behavior.
-   • Large-scale augmentation with semantic checks
-   • Hard labels expand coverage; soft labels preserve uncertainty
-   • PEFT (QLoRA/LoRA) enables training on modest GPUs
- From question → audited answer
- MedSwin outputs an answer only when the evidence bundle is sufficient. Otherwise it asks clarifying questions or expands retrieval.
- Clarify
- Retrieve-more
- Safe final
-   1. Normalise
-      Canonicalise terms, expand abbreviations, form retrieval probes.
-   2. Retrieve + Rank
-      Hybrid candidates, rerank with calibrated probabilities, enforce sufficiency.
-   3. Synthesise + Critique
-      Summarise EMR, synthesise guideline actions, run safety critique, then answer.
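The three steps and the Clarify / Retrieve-more / Safe final routes above can be read as a bounded control loop. A minimal sketch, assuming hypothetical callables for retrieval, the sufficiency check and the safety critic, a toy abbreviation table, and a fixed round limit; it returns the chosen route rather than generating text.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Illustrative abbreviation table; a real normaliser would use a curated clinical lexicon.
ABBREVIATIONS = {"htn": "hypertension", "ckd": "chronic kidney disease"}


def normalise(query: str) -> str:
    # Step 1: canonicalise case and expand known abbreviations into a retrieval probe.
    return " ".join(ABBREVIATIONS.get(t, t) for t in query.lower().split())


@dataclass
class Outcome:
    route: str                  # "safe_final" or "clarify"
    rounds: int
    evidence: List[str] = field(default_factory=list)


def run(query: str,
        retrieve: Callable[[str, int], List[str]],
        sufficient: Callable[[List[str]], bool],
        critique: Callable[[List[str]], List[str]] = lambda ev: [],
        max_rounds: int = 3) -> Outcome:
    probe = normalise(query)
    evidence: List[str] = []
    for rnd in range(1, max_rounds + 1):
        # Step 2: retrieve + rank; later rounds widen the search ("retrieve-more").
        evidence = retrieve(probe, rnd)
        # Step 3 gate: answer only if evidence is sufficient and the critic raises no flags.
        if sufficient(evidence) and not critique(evidence):
            return Outcome("safe_final", rnd, evidence)
    # Retrieval budget exhausted: ask a clarifying question instead of guessing.
    return Outcome("clarify", max_rounds, evidence)


def fake_retrieve(probe: str, rnd: int) -> List[str]:
    # Toy retriever that returns one more passage per round.
    return [f"{probe} #{i}" for i in range(rnd)]


print(run("HTN with CKD", fake_retrieve, lambda ev: len(ev) >= 2))
```

Capping the number of rounds is what makes "retrieve-more" deterministic: the loop either reaches a sufficient, critic-approved bundle or falls back to asking the clinician.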
+ [Figure: architecture diagram, "Private Clinical System". Labels: Clinician Input / Case Query (Patient ID, Case query); Patient-Specific EMR (Grounded EMR); CPG / Guidelines (Guideline Support); Query + EMR Context; Orchestrator; Stage 1: Hybrid Biomedical Retrieval (EMR candidates, CPG candidates); Stage 2: Biomedical Reranker; Iterative Reasoning Loop; Grounded Clinical Answer]
- Architecture Explorer
- Explore MedSwin layers: multi-agent workflow, two-stage retrieval, evidence sufficiency checks, and MAC coordination.
- Orchestrated
- Provenance
- Budgeted
+ Medical specialist
+ Training, distillation, and model merging are displayed as separate mechanisms.
+ The specialist model is built from supervised biomedical instruction data, teacher-student distillation, and training-free merge operators that control destructive interference between updates.
+ SFT mixture
+ Biomedical supervision sources
+ augmented
- High-Level System Diagram
- Mermaid · interactive tab
-flowchart LR
-  U["Clinician UI / EMR"] -->|q + patient context| ORCH["Orchestrator (MAC)\nplanning · policy checks · logging"]
-  ORCH --> QN["Query Normaliser"]
-  ORCH --> RET["Evidence Retriever"]
-  ORCH --> EMRS["EMR Summariser"]
-  ORCH --> GS["Guideline Synthesiser"]
-  ORCH --> SC["Safety Critic"]
-
-  subgraph IR["Two-Stage Retrieval (Budgeted)"]
-    DENSE["Stage 1: Dense ANN (MedEmbed)"] --> CAND["Candidates (dense OR BM25)"]
-    BM25["Stage 1: Lexical (BM25)"] --> CAND
-    CAND --> RER["Stage 2: Long-context Reranker\n(LoRA-adapted, calibrated)"]
-    RER --> SEL["Policy-aware selection\nMMR + sufficiency constraints"]
-  end
-
-  RET --> IR
-  SEL --> EVID["Evidence bundle M\nEMR + CPG + metadata"]
-
-  EMRS --> STATE["Clinical state summary"]
-  GS --> ACTIONS["Guideline actions + contraindications"]
-  SC --> FLAGS["Safety flags / missing evidence"]
-
-  EVID --> FUSE["Evidence-constrained synthesis"]
-  STATE --> FUSE
-  ACTIONS --> FUSE
-  FLAGS --> FUSE
-  FUSE --> OUT["Final answer + citations + cautions\n+ structured trace"]
+ Hybrid SFT + KD
+ Teacher-student transfer
+ QLoRA
+ 27B Teacher
+ hard labels · top-k soft logits · uncertainty transfer
+ 7B Student
+ $$\mathcal{L}_t = \alpha\,\mathrm{CE} + (1-\alpha)\,\tau^{2}\,\mathrm{KL}\big(p_T(\cdot \mid \tau)\,\|\,p_S(\cdot)\big)$$
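The loss above mixes a hard-label cross-entropy term with a temperature-scaled KL term. A minimal pure-Python sketch for a single example over a small vocabulary, assuming full teacher logits (not the top-k logits mentioned above) and softening the student with the same temperature τ, as in the common Hinton-style variant; the equation as written shows the student term without τ, so treat that choice as an assumption. Function names are hypothetical.

```python
import math
from typing import List


def softmax(logits: List[float], tau: float = 1.0) -> List[float]:
    # Temperature-scaled softmax, shifted by the max for numerical stability.
    scaled = [z / tau for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl(p: List[float], q: List[float]) -> float:
    # KL(p || q); entries with p_i = 0 contribute nothing.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def hybrid_kd_loss(student_logits: List[float], teacher_logits: List[float],
                   hard_label: int, alpha: float = 0.5, tau: float = 2.0) -> float:
    # Hard-label term: cross-entropy of the unsoftened student distribution.
    ce = -math.log(softmax(student_logits)[hard_label])
    # Soft-label term: teacher and student both softened by tau (the assumption noted above).
    p_t = softmax(teacher_logits, tau)
    p_s = softmax(student_logits, tau)
    # tau^2 keeps soft-target gradients on a comparable scale as tau changes.
    return alpha * ce + (1 - alpha) * tau ** 2 * kl(p_t, p_s)


student = [1.0, 0.2, -0.5]
teacher = [2.5, 1.5, -1.0]
print(hybrid_kd_loss(student, teacher, hard_label=0, alpha=0.5, tau=2.0))
```

When the student already matches the teacher, the KL term vanishes and the loss reduces to α times the hard-label cross-entropy.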
- Two-Stage Retrieval & Calibrated Reranking
- Evidence selection is separated into recall-oriented candidate generation and precision-oriented reranking.
- This avoids early truncation while enabling deterministic, policy-aware inclusion decisions.
- Stage 1 — Candidate generation
- Dense ANN retrieval is unioned with BM25 to preserve rare clinical terms, abbreviations, and lab-specific phrasing.
- Stage 2 — Long-context reranking
- A biomedical LLM reranker scores each passage and outputs calibrated probabilities that can be compared against policy thresholds.
- Policy-aware selection
- Final selection enforces EMR + guideline sufficiency, diversity (MMR-style), and a strict token budget.
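The policy-aware selection step combines relevance, MMR-style diversity and a token budget. A greedy sketch, assuming each passage carries a calibrated relevance `prob` and a precomputed `tokens` count, and using word-level Jaccard overlap as a cheap stand-in for embedding similarity; sufficiency constraints are left out, and `mmr_select` and `lam` are hypothetical names.

```python
from typing import Dict, List


def jaccard(a: str, b: str) -> float:
    # Cheap lexical similarity, standing in for embedding similarity.
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def mmr_select(passages: List[Dict], token_budget: int, lam: float = 0.7) -> List[Dict]:
    """Greedy MMR under a token budget.
    score = lam * prob - (1 - lam) * max similarity to already-selected passages."""
    selected: List[Dict] = []
    remaining = list(passages)
    used = 0
    while remaining:
        def mmr(p: Dict) -> float:
            redundancy = max((jaccard(p["text"], s["text"]) for s in selected), default=0.0)
            return lam * p["prob"] - (1 - lam) * redundancy
        best = max(remaining, key=mmr)
        remaining.remove(best)
        # Skip (rather than truncate) a passage that would overflow the budget.
        if used + best["tokens"] <= token_budget:
            selected.append(best)
            used += best["tokens"]
    return selected


passages = [
    {"id": "A", "text": "renal dose adjustment guidance", "prob": 0.90, "tokens": 40},
    {"id": "B", "text": "renal dose adjustment guidance update", "prob": 0.88, "tokens": 40},
    {"id": "C", "text": "contraindication with contrast media", "prob": 0.70, "tokens": 30},
]
print([p["id"] for p in mmr_select(passages, token_budget=80)])   # ['A', 'C']
print([p["id"] for p in mmr_select(passages, token_budget=200)])  # ['A', 'C', 'B']
```

With a budget of 80 tokens the near-duplicate B is skipped; with a budget of 200 it is still picked, but only last, after the more diverse passage C.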
+ Interference-aware composition
+ Merge operators reduce destructive update conflict.
+ training-free
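The page does not say which merge operators are used. As one published example of a training-free, interference-aware operator, the sketch below follows the TIES-merging recipe (trim small updates, elect a per-coordinate sign, average only agreeing entries) on plain Python lists standing in for flattened weight deltas; `ties_merge`, `apply_merge` and the toy task vectors are hypothetical.

```python
from typing import List


def ties_merge(task_vectors: List[List[float]], density: float = 0.5,
               scale: float = 1.0) -> List[float]:
    """TIES-style merge of task vectors (fine-tuned weights minus base weights)."""
    dim = len(task_vectors[0])
    # 1) Trim: keep only the top `density` fraction of each vector by magnitude.
    trimmed = []
    for tv in task_vectors:
        k = max(1, int(round(density * dim)))
        keep = set(sorted(range(dim), key=lambda i: abs(tv[i]), reverse=True)[:k])
        trimmed.append([tv[i] if i in keep else 0.0 for i in range(dim)])
    merged = []
    for i in range(dim):
        column = [tv[i] for tv in trimmed]
        # 2) Elect: the sign of the summed trimmed values wins this coordinate.
        elected = 1.0 if sum(column) >= 0 else -1.0
        # 3) Disjoint mean: average only nonzero entries that agree with the elected sign.
        agreeing = [x for x in column if x != 0.0 and (x > 0) == (elected > 0)]
        merged.append(scale * sum(agreeing) / len(agreeing) if agreeing else 0.0)
    return merged


def apply_merge(base: List[float], merged_delta: List[float]) -> List[float]:
    return [w + d for w, d in zip(base, merged_delta)]


tv_a = [0.5, -0.1, 0.4, 0.0]
tv_b = [0.3, 0.2, -0.6, 0.05]
merged = ties_merge([tv_a, tv_b], density=0.5)
print(merged)  # [0.4, 0.0, -0.6, 0.0]
```

A plain average would give -0.1 for the third coordinate, partly cancelling both updates; the elected-sign mean keeps the dominant -0.6 instead.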
- Data, Training & Distillation
- MedSwin’s deployable 7B model is trained for reliability rather than raw scale,
- combining large-scale augmentation, supervised fine-tuning, and knowledge distillation.
- A · Data augmentation
- Paraphrasing, formatting variants, deduplication, and medical consistency checks expand coverage without semantic drift.
- B · Supervised fine-tuning
- Aligns the student to clinical instruction style, tone control, and structured answers.
- C · Knowledge distillation
- Hard labels expand task coverage; soft labels preserve calibration and uncertainty from a larger instructor.
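Card A lists deduplication and medical consistency checks. A small sketch of both as filters over candidate paraphrases, assuming word-trigram Jaccard for near-duplicate detection and a deliberately crude drift check (numbers and negation cues must be preserved); the threshold, the negation list and all names are hypothetical, and a real pipeline would need far stronger semantic checks.

```python
import re
from typing import List, Set, Tuple

# Illustrative negation cues; a real check would need a clinical NLP pipeline.
NEGATIONS = {"no", "not", "never", "without", "avoid", "contraindicated"}


def words(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def shingles(text: str, n: int = 3) -> Set[Tuple[str, ...]]:
    toks = words(text)
    return {tuple(toks[i:i + n]) for i in range(max(1, len(toks) - n + 1))}


def jaccard(a: Set, b: Set) -> float:
    return len(a & b) / len(a | b) if a | b else 1.0


def preserves_facts(original: str, paraphrase: str) -> bool:
    # Crude drift check: numeric values and negation cues must survive the rewrite.
    def nums(s: str) -> Set[str]:
        return set(re.findall(r"\d+(?:\.\d+)?", s))

    def negs(s: str) -> Set[str]:
        return NEGATIONS & set(words(s))

    return nums(original) == nums(paraphrase) and negs(original) == negs(paraphrase)


def filter_augmented(original: str, candidates: List[str], max_sim: float = 0.8) -> List[str]:
    kept: List[str] = []
    seen = [shingles(original)]
    for cand in candidates:
        if not preserves_facts(original, cand):
            continue  # semantic drift: drop
        sh = shingles(cand)
        if any(jaccard(sh, s) >= max_sim for s in seen):
            continue  # near-duplicate of the original or of a kept paraphrase: drop
        kept.append(cand)
        seen.append(sh)
    return kept


original = "Do not exceed 2 g of drug X per day in adults."
candidates = [
    "In adults, do not exceed 2 g of drug X per day.",
    "Do not exceed 2 g of drug X per day in adults",
    "Exceed 2 g of drug X per day in adults.",
    "Do not exceed 4 g of drug X per day in adults.",
]
print(filter_augmented(original, candidates))  # ['In adults, do not exceed 2 g of drug X per day.']
```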
+ Retrieval and sufficiency
+ Evidence is selected by clinical utility, not by raw top-K truncation.
+ Each retrieval stage uses a responsive diagram, and the sufficiency simulator animates toward the acceptance threshold as more evidence is retrieved.
+ Facet sufficiency simulator
+ Build an evidence bundle
+ Critical facets are below the acceptance threshold.
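The facet sufficiency simulator reports when critical facets fall below an acceptance threshold. A minimal sketch, borrowing facet names from the claim-level ledger on this page and assuming hypothetical per-facet thresholds plus noisy-OR aggregation of calibrated passage probabilities (which treats passages as independent evidence):

```python
from typing import Dict, List, Tuple

# Hypothetical acceptance thresholds for critical facets.
CRITICAL: Dict[str, float] = {
    "guideline_concordance": 0.8,
    "patient_applicability": 0.8,
    "safety_risk": 0.9,
}


def facet_coverage(bundle: List[Tuple[str, float]]) -> Dict[str, float]:
    """Noisy-OR per facet: coverage = 1 - prod(1 - p) over passages tagged with that facet.
    `bundle` holds (facet, calibrated probability) pairs."""
    miss: Dict[str, float] = {}
    for facet, p in bundle:
        miss[facet] = miss.get(facet, 1.0) * (1.0 - p)
    return {facet: 1.0 - m for facet, m in miss.items()}


def check(bundle: List[Tuple[str, float]]) -> Tuple[bool, str]:
    cov = facet_coverage(bundle)
    below = sorted(f for f, t in CRITICAL.items() if cov.get(f, 0.0) < t)
    if below:
        return False, "Critical facets are below acceptance threshold: " + ", ".join(below)
    return True, "All critical facets meet their thresholds."


bundle = [("guideline_concordance", 0.7), ("patient_applicability", 0.85), ("safety_risk", 0.6)]
print(check(bundle))
# Retrieving more evidence pushes coverage toward the thresholds.
print(check(bundle + [("guideline_concordance", 0.5), ("safety_risk", 0.8)]))
```

Noisy-OR makes coverage rise monotonically as passages are added, which gives the "animates toward threshold" behaviour; correlated passages would overstate coverage under this assumption.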
- Evaluation & Safety
- MedSwin evaluates clinical QA systems beyond answer accuracy, focusing on evidence quality,
- guideline compliance, and runtime safety behaviour.
- Retrieval quality
- Evidence relevance and coverage under a fixed token budget.
- Guideline coverage
- Presence of actionable recommendations and contraindications.
- Faithfulness
- Final answers are grounded only in cited evidence.
+ Multi-agent coordination
+ The MAC layer behaves like a specialist dive team.
+ Agents explore different hypotheses and return claim-level ledgers, which are aggregated through reliability-weighted evidence selection.
+ Claim-level ledger
+ Every claim keeps its source role.
+ audit artifact
+ Facet | Role | Polarity | Grade | Trace
+ Guideline concordance | Recommendation | supports | CPG | doc · version · section
+ Patient applicability | Lab / comorbidity | qualifies | EMR | encounter · timestamp
+ Safety risk | Contraindication | conflicts | Safety | severity · source
+ Uncertainty | Contradiction pair | preserved | Mixed | adjudication status
+ Contradictions are not averaged away. High-grade conflicts are preserved until the critic or final synthesiser explicitly adjudicates them.
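The ledger and note above imply two rules: weight each agent's claim by its reliability, and never average away a high-grade contradiction. A sketch of both, assuming hypothetical agent names, reliability weights, a `margin` for the supported/refuted decision and a two-level grade; it is not MedSwin's actual aggregator.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Claim:
    agent: str
    claim_id: str   # normalised key shared by agents making the same claim
    polarity: int   # +1 supports, -1 conflicts
    grade: str      # "high" or "low" (hypothetical two-level grade)


def aggregate(claims: List[Claim], reliability: Dict[str, float],
              margin: float = 0.2) -> Dict[str, str]:
    by_claim: Dict[str, List[Claim]] = defaultdict(list)
    for c in claims:
        by_claim[c.claim_id].append(c)
    verdicts: Dict[str, str] = {}
    for cid, group in by_claim.items():
        high_pos = any(c.polarity > 0 and c.grade == "high" for c in group)
        high_neg = any(c.polarity < 0 and c.grade == "high" for c in group)
        if high_pos and high_neg:
            # High-grade conflict: preserve it for explicit adjudication, never average it.
            verdicts[cid] = "contradiction: adjudicate"
            continue
        weights = [reliability.get(c.agent, 0.5) for c in group]  # 0.5 for unknown agents
        total = sum(weights)
        if total == 0:
            verdicts[cid] = "uncertain"
            continue
        score = sum(w * c.polarity for w, c in zip(weights, group)) / total
        if score >= margin:
            verdicts[cid] = "supported"
        elif score <= -margin:
            verdicts[cid] = "refuted"
        else:
            verdicts[cid] = "uncertain"
    return verdicts


reliability = {"guideline_agent": 0.9, "emr_agent": 0.8, "explorer": 0.3}
claims = [
    Claim("guideline_agent", "dose_reduction", +1, "high"),
    Claim("explorer", "dose_reduction", -1, "low"),
    Claim("guideline_agent", "interaction", +1, "high"),
    Claim("emr_agent", "interaction", -1, "high"),
    Claim("explorer", "off_label_use", +1, "low"),
    Claim("emr_agent", "off_label_use", -1, "low"),
]
print(aggregate(claims, reliability))
```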
- Team
- A multidisciplinary research team building an auditable medical AI system.
+ Evaluation dashboard
+ QA, reranking, and audit metrics are inspected separately.
+ Overlap metrics, semantic similarity, retrieval quality, latency, and audit completeness are kept in separate views so answer similarity is not mistaken for deployment readiness.
- Swinburne
- Multi-role
- 🎖️ Liam · Leader
- 🧪 Henry · LLM
- 🔗 Hai · System
+ Model | ROUGE-L | BERT-F1 | Token F1 | Unigram precision | Bigram precision
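The overlap columns above (ROUGE-L, token F1, unigram and bigram precision) have standard definitions that fit in a few lines; BERT-F1 needs a pretrained encoder and is omitted. The sketch assumes lower-cased whitespace tokenisation, which real benchmark scripts usually refine (punctuation stripping, stemming), so scores will not match published numbers exactly.

```python
from collections import Counter
from typing import List


def tokens(text: str) -> List[str]:
    return text.lower().split()


def token_f1(pred: str, ref: str) -> float:
    # Bag-of-tokens F1 with multiset overlap.
    p, r = tokens(pred), tokens(ref)
    common = sum((Counter(p) & Counter(r)).values())
    if common == 0:
        return 0.0
    precision, recall = common / len(p), common / len(r)
    return 2 * precision * recall / (precision + recall)


def ngram_precision(pred: str, ref: str, n: int) -> float:
    # Clipped n-gram precision (n=1 unigram, n=2 bigram).
    def grams(t: List[str]) -> Counter:
        return Counter(tuple(t[i:i + n]) for i in range(len(t) - n + 1))
    pg, rg = grams(tokens(pred)), grams(tokens(ref))
    total = sum(pg.values())
    return sum((pg & rg).values()) / total if total else 0.0


def lcs_len(a: List[str], b: List[str]) -> int:
    # Longest common subsequence length with a single rolling DP row.
    dp = [0] * (len(b) + 1)
    for x in a:
        prev = 0  # dp value diagonally up-left
        for j, y in enumerate(b, 1):
            cur = dp[j]
            dp[j] = prev + 1 if x == y else max(dp[j], dp[j - 1])
            prev = cur
    return dp[-1]


def rouge_l_f(pred: str, ref: str, beta: float = 1.0) -> float:
    p, r = tokens(pred), tokens(ref)
    lcs = lcs_len(p, r)
    if lcs == 0:
        return 0.0
    prec, rec = lcs / len(p), lcs / len(r)
    return (1 + beta ** 2) * prec * rec / (rec + beta ** 2 * prec)


pred = "reduce the dose in renal impairment"
ref = "reduce dose in severe renal impairment"
print(round(rouge_l_f(pred, ref), 3), round(token_f1(pred, ref), 3),
      round(ngram_precision(pred, ref, 1), 3), round(ngram_precision(pred, ref, 2), 3))
# 0.833 0.833 0.833 0.4
```

The example also shows why these are kept apart from safety checks: the prediction scores highly on overlap even though it drops the word "severe".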
+ MSAS component families
+ Auditability is separated from answer similarity.
+ Clinical interpretation boundary
+ Overlap and semantic metrics are useful signals, but unsupported claims, unsafe omissions, missing provenance, and unresolved contradictions are tracked separately because answer similarity alone is not enough for deployment safety.