Gemeo

State-of-the-art (SOTA) digital twin module for rare-disease patients, grounded in Brazil's Unified Health System (SUS). Gemeo is a learned, graph-native, continuously evolving twin that fuses a Heterogeneous Graph Transformer (HGT) over PrimeKG with SUS constraints (PCDT/CEAF/UF).

gemeo = patient embedding
      + cohort retrieval (patients-like-mine)
      + reasoning subgraph (KG sparsification)
      + trajectory (TGNN over snapshot chains)
      + risk + survival (NeuralSurv)
      + drug repurposing (TxGNN fine-tuned)
      + active learning (info-gain on KG)
      + counterfactual (what-if engine)
      + SUS grounding (PCDT/CEAF/UF)
      + feedback loop
      + viz payload

Installation

No separate installation is needed: gemeo ships as part of rarasnet-swarm-py, and main.py mounts it automatically at /api/gemeo/*.

Phase-2 training is optional and needs two extra packages:

pip install torch_geometric tqdm
python -m gemeo.train.primekg
python -m gemeo.train.hgt

Quickstart

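import asyncio

from gemeo import build_gemeo, what_if  # what_if: counterfactual engine (not used in this snippet)


async def main():
    # build_gemeo is awaitable, so it must be called from async code.
    # case_text is in Portuguese: "Boy, 5 years old, progressive ataxia, telangiectasia, elevated AFP."
    twin = await build_gemeo(
        case_text="Menino, 5 anos, ataxia progressiva, telangiectasia, AFP elevado.",
        patient_info={"age": 5, "sex": "M"},
        context={"sus_region": "SP"},
    )

    twin.diagnoses[:3]              # top hypotheses (ranked)
    twin.cohort.members[:5]          # patients-like-mine
    twin.subgraph                    # reasoning subgraph
    twin.trajectory.horizons         # 6/12/24m predictions
    twin.risk.survival_curve         # months → P(alive)
    twin.drugs.candidates[:3]        # repurposing
    twin.next_questions[:3]          # active learning
    twin.sus_check.pcdt_url          # PCDT compliance
    twin.viz_data                    # ready for react-force-graph
    return twin


# Inside an already running event loop (e.g. a FastAPI handler), await build_gemeo directly instead.
asyncio.run(main())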
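
The quickstart describes `twin.risk.survival_curve` as a mapping from months to P(alive), and the bootstrap tier is described later in this README as "rule-based severity → exponential survival". A minimal sketch of what such a curve could look like follows. The severity labels, the hazard values, and the function name are illustrative assumptions, not values or names taken from gemeo, and they carry no clinical meaning.

```python
import math

# Hypothetical monthly hazard rates per severity label (illustrative only).
HAZARD_PER_MONTH = {"mild": 0.002, "moderate": 0.01, "severe": 0.03}


def exponential_survival_curve(severity, horizons=(6, 12, 24)):
    """Return {months: P(alive)} under a constant hazard: S(t) = exp(-lambda * t)."""
    lam = HAZARD_PER_MONTH[severity]
    return {t: math.exp(-lam * t) for t in horizons}


curve = exponential_survival_curve("severe")
```

The default horizons mirror the 6/12/24-month horizons mentioned in the quickstart. A constant hazard is the simplest possible choice, and it always yields a monotonically decreasing curve.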

API endpoints

| Method | Path | Purpose |
| --- | --- | --- |
| POST | `/api/gemeo/build` | Create a twin from a case |
| GET | `/api/gemeo/{id}` | Fetch the full twin |
| POST | `/api/gemeo/{id}/evolve` | Add new clinical data |
| POST | `/api/gemeo/{id}/whatif` | Run a counterfactual |
| POST | `/api/gemeo/{id}/feedback` | Record a correction |
| GET | `/api/gemeo/{id}/{cohort,subgraph,trajectory,risk,drugs,trials,next-questions,sus,viz}` | Per-capability getters |
| GET | `/api/gemeo/health` | Bridge + feedback stats |
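
As a rough illustration of how a client might address these routes, the sketch below prepares requests with the Python standard library. Only the paths and HTTP methods come from the table above. The base URL, the twin id, the helper names, and the JSON body shape (assumed here to mirror the `build_gemeo` keyword arguments) are assumptions and may not match the real request schema.

```python
import json
import urllib.request

# Capability names taken from the per-capability getter row of the endpoint table.
CAPABILITIES = ("cohort", "subgraph", "trajectory", "risk", "drugs",
                "trials", "next-questions", "sus", "viz")


def build_request(base_url, case_text, patient_info, context):
    """Prepare (but do not send) POST /api/gemeo/build.

    The JSON field names are an assumption mirroring build_gemeo's arguments.
    """
    body = json.dumps({
        "case_text": case_text,
        "patient_info": patient_info,
        "context": context,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/gemeo/build",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def capability_request(base_url, twin_id, capability):
    """Prepare (but do not send) GET /api/gemeo/{id}/{capability}."""
    if capability not in CAPABILITIES:
        raise ValueError(f"unknown capability: {capability}")
    return urllib.request.Request(
        f"{base_url}/api/gemeo/{twin_id}/{capability}", method="GET"
    )
```

Passing either request to `urllib.request.urlopen` would send it. The sketch stops short of that so it can be read and tested without a running server.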

Architecture

Two tiers:

Bootstrap (today): wraps existing swarm-py modules and raras-app artifacts. Everything works on day one, with no training needed.
Phase-2 SOTA (training): gemeo/train/ holds scaffolds for HGT, TxGNN, TGNN, NeuralSurv, and CF-GNN. When checkpoints land in gemeo/artifacts/, the runtime auto-detects them and overrides the bootstrap paths.
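
The checkpoint auto-detection described above could work roughly like the sketch below. This is not gemeo's actual loader: the function name `resolve_tier` and the checkpoint file name `hgt.pt` are hypothetical, and only the `gemeo/artifacts/` directory comes from this README.

```python
from pathlib import Path


def resolve_tier(artifacts_dir, checkpoint_name):
    """Pick the learned path when a checkpoint file exists, else fall back to bootstrap.

    Returns ("learned", Path) or ("bootstrap", None).
    """
    path = Path(artifacts_dir) / checkpoint_name
    if path.is_file():
        return "learned", path
    return "bootstrap", None


# Hypothetical usage: the checkpoint file name is an assumption.
tier, ckpt = resolve_tier("gemeo/artifacts", "hgt.pt")
```

Resolving the tier per capability, rather than once for the whole module, would let a single trained checkpoint take effect while the other capabilities keep running on their bootstrap paths.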

                  ┌────────────────────────┐
                  │    raras-app             │
                  │  data/graph-ml/*.npz     │ ← read-only via gemeo.bridge
                  │  Patient.embedding (Neo4j)│
                  │  /grafo (force-graph)     │ ← consumes /api/gemeo/{id}/viz
                  └─────────────┬──────────────┘
                                │
                  ┌─────────────▼──────────────┐
                  │     gemeo (this module)     │
                  │                             │
                  │  bridge.py   ── load .npz   │
                  │  encoder.py  ── HGT or boot │
                  │  cohort.py   ── kNN+graph   │
                  │  subgraph.py ── KG sparsify │
                  │  trajectory  ── TGNN or LLM │
                  │  risk.py     ── NeuralSurv  │
                  │  repurpose   ── TxGNN+SUS   │
                  │  whatif.py   ── CF-GNN      │
                  │  ask.py      ── info-gain   │
                  │  ground_sus  ── PCDT/UF     │
                  │  feedback    ── jsonl ledger│
                  │  viz.py      ── force-graph │
                  │  core.py     ── orchestrator│
                  │  api.py      ── FastAPI     │
                  └─────────────┬──────────────┘
                                │
                  ┌─────────────▼──────────────┐
                  │   swarm-py existing infra   │
                  │  digital_twin_workflow      │
                  │  patient_space (KG)          │
                  │  trajectory_engine, risk_qua │
                  │  drug_repurposer, trial_     │
                  │  matcher, brazilian_context  │
                  └────────────────────────────┘

What's bootstrap vs. learned

| Capability | Bootstrap (works today) | Phase-2 SOTA |
| --- | --- | --- |
| Patient embedding | Weighted mean of fused-768/3072-dim disease+HPO+gene embeddings (matches raras-app) | HGT trained on PrimeKG with disease link-pred + patient contrastive losses |
| Cohort | Neo4j vector kNN + Cypher overlap | Same retrieval, learned embedding |
| Subgraph | Cypher 1-hop sparsification | KG sparsification trained on diagnostic outcomes |
| Trajectory | LLM over disease natural history | TRANS-style TGNN over snapshot chains |
| Risk / survival | Rule-based severity → exponential survival | NeuralSurv Bayesian survival on KG-walk features |
| Drug repurposing | KG walks Disease→Gene→Drug | TxGNN fine-tuned on PrimeKG + SUS auxiliary head |
| What-if | Heuristic: mutate snapshot, re-run | CF-GNNExplainer + do-calculus |
| Active learning | Info-gain over KG annotation frequencies | Bayesian acquisition over learned dx posterior |
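
Two of the bootstrap entries above are simple enough to sketch in a few lines: the weighted-mean patient embedding and the information gain of asking about one phenotype. The code below is an illustration under stated assumptions, not gemeo's implementation. How gemeo chooses the weights is not specified here, and the information-gain inputs (a prior over candidate diseases plus, per disease, the frequency with which the phenotype is annotated) are an assumed reading of "KG annotation frequencies".

```python
import math


def weighted_mean_embedding(vectors, weights):
    """Weighted mean of equal-length embedding vectors (plain Python lists)."""
    if not vectors or len(vectors) != len(weights):
        raise ValueError("need one weight per vector")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    dim = len(vectors[0])
    return [
        sum(w * v[i] for v, w in zip(vectors, weights)) / total
        for i in range(dim)
    ]


def entropy(probs):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def expected_info_gain(prior, freq):
    """Expected entropy reduction from asking whether a phenotype is present.

    prior[i] is P(disease i); freq[i] is P(phenotype present | disease i).
    """
    p_yes = sum(p * f for p, f in zip(prior, freq))
    p_no = 1.0 - p_yes
    gain = entropy(prior)
    if p_yes > 0:
        post_yes = [p * f / p_yes for p, f in zip(prior, freq)]
        gain -= p_yes * entropy(post_yes)
    if p_no > 0:
        post_no = [p * (1.0 - f) / p_no for p, f in zip(prior, freq)]
        gain -= p_no * entropy(post_no)
    return gain
```

A phenotype that is always present in one of two equally likely diseases and never in the other resolves the full bit of uncertainty, while a phenotype with the same frequency in both yields a gain of about zero, so ranking candidate questions by this quantity favours the discriminating ones.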

Citation

Timmers D, Kawassaki A. Gemeo: Heterogeneous graph foundation model for rare disease digital twins grounded in Brazilian SUS. Raras, 2026.