metadata
license: cc-by-nc-4.0
language:
  - pt
  - en
tags:
  - rare-disease
  - digital-twin
  - world-model
  - patient-trajectory
  - knowledge-graph
  - primekg
  - hgt
  - tgnn
  - txgnn
  - neural-survival
  - causal-inference
  - brazilian-sus
  - datasus
library_name: pytorch
pipeline_tag: graph-ml
datasets:
  - DATASUS-SIH-RD
  - DATASUS-APAC-SIA
  - DATASUS-BPA-I
  - DATASUS-SIM
  - PrimeKG
  - RareBench
metrics:
  - c-index
  - recall-at-k
  - mrr
  - auroc
  - brier
extra_gated_prompt: >-
  This module is released for **research only**. It is NOT a medical device, NOT
  approved by any regulator, and MUST NOT be used to inform diagnosis or
  treatment without physician oversight and regulatory clearance.
extra_gated_fields:
  Name: text
  Affiliation: text
  Intended use: text
  I agree to non-clinical research use only: checkbox

GEMEO Twin Stack: Application Layer for the GEMEO Patient World Model

The graph-native digital-twin application stack that runs on top of the GEMEO World Model. Six inference modes (trajectory, diagnosis, risk/survival, counterfactual, repurposing, cohort) are wired together with auxiliary heads, KG embeddings, and a FastAPI surface. Research preview. Not a medical device.

Authors: Raras.ai team · Contact: dimas@raras.ai
Source: github companion repo (raras.org)
Paper v1 (Zenodo): DOI 10.5281/zenodo.20092131
🌍 World Model (the dynamics core this stack runs on top of): Raras-AI/gemeo-world-model
📱 Mobile decision-support sibling: Raras-AI/araras-gemma4
License: CC-BY-NC 4.0 + non-clinical-use rider (see LICENSE)

Note on naming. Previously released as Raras-AI/gemeo-world-model (HF auto-redirects). Renamed to gemeo-twin-stack because the actual world model, the generative dynamics core, is the Causal Diffusion Forcing transformer now living at the new gemeo-world-model slug. This repo is the application layer (encoder, cohort, risk, whatif, repurpose, ask, ground_sus, api) that turns the world model into a usable digital twin with six inference modes.


What this is

This is the application layer of GEMEO: a module (≈22k LOC) that takes the GEMEO World Model (a Causal Diffusion Forcing transformer, published separately at Raras-AI/gemeo-world-model) and wires it together with auxiliary heads, KG embeddings, and tooling to produce a complete digital-twin product. Each inference mode has a clean Python API, a bootstrap implementation that runs today, and an optional learned slot that drops in when a checkpoint exists.

GEMEO Twin Stack  (this repo)
├── Patient embedding       (gemeo/encoder.py)        HGT scaffolded; bootstrap = weighted KG embedding
├── Cohort retrieval        (gemeo/cohort.py)         kNN + Cypher overlap on PrimeKG
├── Subgraph reasoning      (gemeo/subgraph.py)       1-hop sparsification (learned variant in train/)
├── Trajectory mode         (gemeo/trajectory.py)     ← calls into the GEMEO World Model
├── Diagnostic mode         (deeprare_diagnosis.py +  ← multi-agent over PrimeKG paths
│                            fast_dx.py, ensemble)
├── Risk / survival         (gemeo/risk.py)           NeuralSurv trained on DATASUS SIM (c-index 0.70)
├── Drug repurposing        (gemeo/repurpose.py)      TxGNN slot + SUS auxiliary head
├── Counterfactual          (gemeo/whatif.py)         do-calculus mutator; uses world-model rollout
├── Active learning         (gemeo/ask.py)            info-gain over KG annotations
├── SUS grounding           (gemeo/ground_sus.py)     PCDT/CEAF/UF coverage check
└── FastAPI                 (gemeo/api.py)            production /api/gemeo/* endpoints

GEMEO World Model  (separate repo: Raras-AI/gemeo-world-model)
└── Causal Diffusion Forcing transformer (19.86M params)
    The generative dynamics core. Trajectory + counterfactual modes
    above delegate to it.
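The "bootstrap implementation plus optional learned slot" pattern described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the repo's actual code: `ModeSlot`, `bootstrap_fn`, `learned_loader`, `bootstrap_risk`, and the checkpoint path are all hypothetical names.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Callable, Optional


@dataclass
class ModeSlot:
    """One inference mode: a bootstrap function plus an optional learned model.

    Hypothetical helper for illustration; not part of the gemeo package.
    """
    name: str
    bootstrap_fn: Callable[[dict], Any]
    learned_loader: Optional[Callable[[Path], Callable[[dict], Any]]] = None
    checkpoint: Optional[Path] = None

    def resolve(self) -> Callable[[dict], Any]:
        # Use the learned model only when a loader is registered
        # and the checkpoint file actually exists on disk.
        if self.learned_loader and self.checkpoint and self.checkpoint.exists():
            return self.learned_loader(self.checkpoint)
        return self.bootstrap_fn

    def run(self, patient: dict) -> Any:
        return self.resolve()(patient)


def bootstrap_risk(patient: dict) -> dict:
    # Placeholder heuristic so the sketch runs end to end.
    return {"backend": "bootstrap", "age": patient["age"]}


risk_slot = ModeSlot(
    name="risk",
    bootstrap_fn=bootstrap_risk,
    checkpoint=Path("artifacts/does_not_exist.pt"),  # assumed path, absent here
)
result = risk_slot.run({"age": 5, "sex": "M"})
print(result["backend"])
```

Because no loader is registered and the checkpoint is absent, the call falls back to the bootstrap function. Callers use the same `run` entry point whichever backend answers.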

Headline numbers (real, audited)

| Mode | Metric | Value | Notes |
|---|---|---|---|
| Diagnosis (RareBench, 200 cases, v49) | R@1 canonical | 57.0% (114/200) | matches DeepRare phenotype-only published level |
| | R@5 canonical | 81.5% | |
| | R@1 strict (ORPHA-code) | 39.5% | strict pattern matching |
| | timeout rate | 0.0% | |
| Risk / survival (NeuralSurv, DATASUS SIM) | val c-index | 0.70 (best @ ep 30) | 4,624 SIM mortality records, 37,494 censoring samples |
| | final c-index | 0.694 | after 100 epochs |
| World Model backbone (gemeo-world-model) | val cross-entropy | 0.030 | per-token diffusion forcing |
| | calibration ICI | 0.0006 | well calibrated |
| | training | 5.8 min on 1×H100 | $0.48 |

These are the audited, end-to-end numbers from the same swarm-py module; they are NOT promises or projections.
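For readers unfamiliar with the diagnosis metrics, here is a minimal sketch of how recall@k and MRR are typically computed from ranked predictions. The toy cases and labels are made up for illustration and are not RareBench data. The function names are hypothetical, not the repo's evaluation code.

```python
from typing import Sequence


def recall_at_k(ranked: Sequence[Sequence[str]], truth: Sequence[str], k: int) -> float:
    """Fraction of cases whose true label appears in the top-k predictions."""
    hits = sum(1 for preds, gold in zip(ranked, truth) if gold in preds[:k])
    return hits / len(truth)


def mean_reciprocal_rank(ranked: Sequence[Sequence[str]], truth: Sequence[str]) -> float:
    """Average of 1/rank of the true label (0 when it is not ranked at all)."""
    total = 0.0
    for preds, gold in zip(ranked, truth):
        if gold in preds:
            total += 1.0 / (list(preds).index(gold) + 1)
    return total / len(truth)


# Toy data (made up): three cases, each with a ranked list of candidate codes.
ranked = [
    ["ORPHA:100", "ORPHA:586"],
    ["ORPHA:586", "ORPHA:100"],
    ["ORPHA:558", "ORPHA:100"],
]
truth = ["ORPHA:100", "ORPHA:100", "ORPHA:586"]

print(recall_at_k(ranked, truth, 1))        # 1 of 3 cases hit at rank 1
print(recall_at_k(ranked, truth, 2))        # 2 of 3 cases hit within the top 2
print(mean_reciprocal_rank(ranked, truth))  # (1 + 1/2 + 0) / 3
print(f"{114 / 200:.1%}")                   # the reported R@1 canonical ratio
```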


Six inference modes

1 · Trajectory (digital twin)

# Top-level await: run this in a notebook/IPython or under `python -m asyncio`.
from gemeo import build_gemeo

twin = await build_gemeo(
    case_text="5-year-old boy, progressive ataxia, telangiectasia, elevated AFP.",
    patient_info={"age": 5, "sex": "M"},
    context={"sus_region": "SP"},
)
twin.trajectory.horizons   # 6/12/24-month event predictions

Backbone: bootstrap LLM-over-natural-history today, Causal Diffusion Forcing when the CDF checkpoint is mounted.

2 · Diagnosis

twin.diagnoses[:5]   # top hypotheses ranked, with evidence chain

Three diagnostic engines wired in:

  • deeprare_diagnosis.py (DeepRare-style multi-agent over PrimeKG paths)
  • fast_dx.py (low-latency phenotype → ORPHA ranking)
  • ensemble_diagnostic.py (ensemble over the above)

Baseline RareBench numbers reported above. v49 corresponds to the checkpoint shipping in this repo.
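The source does not say how ensemble_diagnostic.py combines the two engines. As one plausible illustration only, the sketch below fuses two ranked lists with reciprocal rank fusion (RRF), a standard rank-aggregation technique swapped in here for demonstration. The function name, the constant `k=60`, and the toy codes are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists: score(item) = sum over lists of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Sort by descending score; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))


# Toy outputs from two hypothetical engines (codes are placeholders).
engine_a = ["ORPHA:100", "ORPHA:586", "ORPHA:558"]
engine_b = ["ORPHA:100", "ORPHA:558", "ORPHA:730"]

fused = reciprocal_rank_fusion([engine_a, engine_b])
print(fused)
```

Items ranked by both engines accumulate score from each list, so agreement is rewarded without needing calibrated confidence values from either engine.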

3 · Risk / survival

twin.risk.survival_curve   # months → P(alive), with bootstrap CI

artifacts/neuralsurv.pt (186 KB): Bayesian-style neural survival head over KG-walk features. Trained on DATASUS SIM (n=4,624 deaths, n=37,494 censoring samples). See artifacts/neuralsurv_datasus_summary.json for the per-ORPHA sanity check (e.g., ORPHA:586 / cystic fibrosis: 96.4% alive at 12 m, 80.3% at 72 m).
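The c-index reported for this head measures how well predicted risk orders patients by time to death in the presence of censoring. Below is a minimal pure-Python sketch of Harrell's concordance index on toy data. It is not the repo's evaluation code, and the function name and sample values are illustrative assumptions.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: among comparable pairs, the fraction where the subject
    who died earlier has the higher predicted risk (risk ties count 0.5)."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when i has an observed event
            # strictly before j's follow-up time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


times = [12, 30, 48, 72]      # months of follow-up (toy values)
events = [1, 0, 1, 0]         # 1 = death observed, 0 = censored
risk = [0.9, 0.2, 0.3, 0.4]   # higher = predicted to die sooner

print(concordance_index(times, events, risk))  # 3 of 4 comparable pairs ordered correctly
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which the 0.70 figure should be read.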

4 · Counterfactual (what-if)

twin.what_if({"add_treatment": "ivacaftor"})   # mutated trajectory

Bootstrap = heuristic snapshot mutation; learned variant = CF-GNNExplainer slot.
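To make "snapshot mutation" concrete, here is a minimal sketch of applying an `add_treatment` mutation to a copy of a patient snapshot while leaving the original untouched. The snapshot layout (`diagnosis`, `treatments`) and the helper `apply_what_if` are hypothetical. The real whatif.py may represent state quite differently.

```python
import copy


def apply_what_if(snapshot: dict, mutation: dict) -> dict:
    """Return a mutated copy of a patient snapshot; the input is not modified.

    Only the `add_treatment` key is handled in this sketch.
    """
    mutated = copy.deepcopy(snapshot)
    treatments = mutated.setdefault("treatments", [])
    new_treatment = mutation.get("add_treatment")
    if new_treatment is not None and new_treatment not in treatments:
        treatments.append(new_treatment)
    return mutated


# Toy snapshot (assumed structure, made-up content).
baseline = {"diagnosis": "ORPHA:586", "treatments": ["dornase alfa"]}
counterfactual = apply_what_if(baseline, {"add_treatment": "ivacaftor"})

print(baseline["treatments"])
print(counterfactual["treatments"])
```

Keeping the baseline immutable lets the factual and counterfactual snapshots be rolled forward side by side and compared.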

5 · Drug repurposing

twin.drugs.candidates[:5]   # KG-walk Disease→Gene→Drug, SUS-availability ranked

TxGNN slot + SUS auxiliary head (filters by CONITEC/PCDT coverage).
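The Disease→Gene→Drug walk with SUS-availability ranking can be illustrated on a toy graph. Everything below is a sketch under assumptions: the edges are made up (not PrimeKG content), `sus_available` stands in for a CONITEC/PCDT coverage lookup, and ranking by availability first, then path count, is one plausible ordering rather than the repo's documented rule.

```python
# Toy knowledge graph (made-up edges for illustration only).
disease_to_genes = {"disease_X": ["GENE_A", "GENE_B"]}
gene_to_drugs = {
    "GENE_A": ["drug_1", "drug_2"],
    "GENE_B": ["drug_2", "drug_3"],
}
sus_available = {"drug_2", "drug_3"}  # assumed coverage flags


def repurposing_candidates(disease):
    """Walk Disease -> Gene -> Drug and rank by (SUS availability, path count)."""
    path_counts = {}
    for gene in disease_to_genes.get(disease, []):
        for drug in gene_to_drugs.get(gene, []):
            path_counts[drug] = path_counts.get(drug, 0) + 1
    # Available drugs first, then more supporting paths, then name for stability.
    return sorted(
        path_counts,
        key=lambda d: (d not in sus_available, -path_counts[d], d),
    )


candidates = repurposing_candidates("disease_X")
print(candidates)
```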

6 · Cohort + active learning

twin.cohort.members[:10]      # patients-like-mine (kNN on PrimeKG)
twin.next_questions[:3]       # information-gain ranked clinical questions
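Ranking questions by information gain means preferring the question whose answer is expected to reduce uncertainty over the diagnosis the most. The sketch below computes expected entropy reduction for yes/no questions over a toy two-disease prior. The probabilities, question texts, and function names are illustrative assumptions, not the logic in ask.py.

```python
import math


def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def expected_information_gain(prior, likelihood_yes):
    """Expected entropy reduction from asking one yes/no question.

    prior: P(disease); likelihood_yes: P(answer = yes | disease).
    """
    p_yes = sum(prior[d] * likelihood_yes[d] for d in prior)
    likelihood_no = {d: 1 - v for d, v in likelihood_yes.items()}
    gain = entropy(prior.values())
    for p_answer, lik in ((p_yes, likelihood_yes), (1 - p_yes, likelihood_no)):
        if p_answer > 0:
            posterior = [prior[d] * lik[d] / p_answer for d in prior]
            gain -= p_answer * entropy(posterior)
    return gain


prior = {"A": 0.5, "B": 0.5}
questions = {
    "discriminating finding?": {"A": 1.0, "B": 0.0},
    "uninformative finding?": {"A": 0.5, "B": 0.5},
}
ranked_questions = sorted(
    questions,
    key=lambda q: expected_information_gain(prior, questions[q]),
    reverse=True,
)
print(ranked_questions[0])
```

The discriminating question resolves the full bit of uncertainty in the prior, while the uninformative one leaves the posterior unchanged and scores zero.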

What's in this repo

src/gemeo/                            # 22k LOC, the twin-stack application module
├── core.py, api.py, types.py         # orchestrator + FastAPI + types
├── encoder.py, cohort.py, subgraph.py
├── trajectory.py, risk.py, whatif.py
├── repurpose.py, ask.py, ground_sus.py
├── deeprare_diagnosis-style modules  # diagnostic mode wiring
├── train/                            # HGT, TGNN, TxGNN, NeuralSurv training
├── datasus/                          # DATASUS pull (SIH, APAC, SIM, CNS linkage)
└── cwm/                              # earlier causal-world-model exploration

artifacts/
├── neuralsurv.pt                     # 186 KB, NeuralSurv risk head (c-index 0.70)
├── neuralsurv_datasus_summary.json   # per-ORPHA survival sanity check
└── dt_fm.pt                          # 7.6 MB, decision-time foundation-model baseline

data_derived/                         # ALL derived from PrimeKG (CC-BY 4.0)
├── fused_embeddings_fp16.npz         # 43 MB, 3072-dim disease/HPO/gene embeddings
├── graph_embeddings.npz              # 7 MB, 768-dim KG node embeddings
├── hetero_graph.json                 # 6 MB, edge index by relation
└── node_ids.json                     # 530 KB, id → ORPHA/HPO/HGNC mapping

benchmarks/
└── rarebench_v49_diagnose_200.json   # 200-case RareBench eval, per-case results

examples/
└── quickstart.py                     # minimal usage

LICENSE
README.md   ← this file

The GEMEO World Model (the Causal Diffusion Forcing dynamics core) is published separately at Raras-AI/gemeo-world-model (~80 MB). Keep that repo for the heavy checkpoint and this repo for the application stack that runs on top of it.


Quickstart

# Top-level await: run this in a notebook/IPython or under `python -m asyncio`.
import sys; sys.path.append("src")
from gemeo import build_gemeo

twin = await build_gemeo(
    case_text="Menino, 5 anos, ataxia progressiva, telangiectasia, AFP elevado.",
    patient_info={"age": 5, "sex": "M"},
    context={"sus_region": "SP"},
)

twin.diagnoses[:3]              # → e.g. Ataxia-Telangiectasia (ORPHA:100) at top
twin.cohort.members[:5]         # patients-like-mine
twin.risk.survival_curve        # months β†’ P(alive)
twin.drugs.candidates[:3]       # repurposing
twin.next_questions[:3]         # active learning
twin.sus_check.pcdt_url         # PCDT compliance link
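Since build_gemeo is awaited, a plain Python script needs an event loop around the call. The sketch below shows the usual `asyncio.run` wrapper. To stay runnable without the package installed it uses `build_gemeo_stub`, a hypothetical stand-in that only echoes its arguments; swap in the real `build_gemeo` import when running against the repo.

```python
import asyncio
from types import SimpleNamespace


async def build_gemeo_stub(case_text, patient_info, context):
    """Stand-in for gemeo.build_gemeo so this sketch runs without the package."""
    await asyncio.sleep(0)  # yield to the event loop once, as a real async call would
    return SimpleNamespace(case_text=case_text, patient_info=patient_info, context=context)


async def main():
    # Same keyword arguments as the quickstart call.
    return await build_gemeo_stub(
        case_text="5-year-old boy, progressive ataxia, telangiectasia, elevated AFP.",
        patient_info={"age": 5, "sex": "M"},
        context={"sus_region": "SP"},
    )


twin = asyncio.run(main())
print(twin.patient_info["age"], twin.context["sus_region"])
```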

For the FastAPI server (production layout, same as raras.org runs):

pip install -r requirements.txt
uvicorn gemeo.api:app --reload --port 8000
# POST /api/gemeo/build           create a twin
# POST /api/gemeo/{id}/evolve     add new clinical data
# POST /api/gemeo/{id}/whatif     counterfactual
# GET  /api/gemeo/{id}/{cohort,subgraph,trajectory,risk,drugs,trials,next-questions,sus,viz}
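A client call to the build endpoint might look like the sketch below, which uses only the standard library. The JSON field names simply mirror build_gemeo's keyword arguments, so the real request schema is an assumption, as are the helper name `make_build_request` and the localhost base URL. The request is constructed but not sent, so the snippet runs without a server.

```python
import json
import urllib.request


def make_build_request(base_url, case_text, patient_info, context):
    """Build (but do not send) a POST request for /api/gemeo/build.

    The payload field names are assumed, not taken from the API schema.
    """
    payload = {"case_text": case_text, "patient_info": patient_info, "context": context}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/gemeo/build",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = make_build_request(
    "http://localhost:8000",
    case_text="5-year-old boy, progressive ataxia, telangiectasia, elevated AFP.",
    patient_info={"age": 5, "sex": "M"},
    context={"sus_region": "SP"},
)
print(req.get_method(), req.full_url)
# To send it against a running server: urllib.request.urlopen(req)
```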

Data, ethics, governance

  • Source: Brazilian DATASUS subsystems (SIH-RD, APAC-SIA, BPA-I, SIM).
  • Linkage: CNS-hash deterministic (Tier 1 via APAC).
  • De-identification: ages bucketed into 5-year bins; UF only (no município); k-anonymity ≥ 5.
  • Ethics: Brazilian Res. CNS 466/2012 + 510/2016. LGPD-compliant.
  • Not on this repo: PHI, raw CNS hashes, individual-level data. Only derived embeddings (PrimeKG), trained weights (NeuralSurv, DT-FM), source code, and aggregate metrics.
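The de-identification rules above (5-year age bins, UF-level geography, k-anonymity ≥ 5) can be sketched as a small check. This is an illustration on made-up records, not the DATASUS pipeline. The helper names and the record layout are assumptions.

```python
from collections import Counter


def bucket_age(age, width=5):
    """Map an age to its bin label, e.g. 7 -> '5-9' for 5-year bins."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def k_anonymity_violations(records, quasi_identifiers, k=5):
    """Return quasi-identifier combinations shared by fewer than k records."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return {combo: n for combo, n in counts.items() if n < k}


# Made-up records: five children in SP, one adult in AC.
records = [{"age": a, "uf": "SP"} for a in (5, 6, 7, 8, 9)] + [{"age": 42, "uf": "AC"}]
deidentified = [{"age_bin": bucket_age(r["age"]), "uf": r["uf"]} for r in records]

violations = k_anonymity_violations(deidentified, ["age_bin", "uf"], k=5)
print(violations)  # the lone AC record forms a group smaller than k
```

Groups flagged this way would need suppression or coarser binning before any aggregate is released.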

What's in development (and what Mayo would enable)

The architecture is mode-complete: every box has a working bootstrap or trained slot. The bottleneck is data substrate:

| Mode | SUS today | What Mayo / multimodal substrate unlocks |
|---|---|---|
| Trajectory | events only (no notes) | + 1.65 B clinical notes → richer event tokens |
| Diagnosis | R@1 57% canonical / 200 cases | + WES variants + HPO from notes → target R@1 ≥ 78% (DeepRare bar) |
| Risk | c-index 0.70 on 4,624 deaths | + longitudinal labs + meds → target c-index ≥ 0.80 |
| Counterfactual | heuristic | + trial-emulation labels → causal-grade what-if |
| Repurposing | TxGNN + SUS heuristics | + Mayo trial outcomes → real evidence ranking |

This is the basis of our Mayo Clinic Platform_Accelerate proposal: same world model, multimodal substrate.


Citation

@misc{gemeo_world_model_2026,
  title  = {GEMEO: A Patient World Model for Rare Disease, Grounded in
            Brazilian SUS Data},
  author = {Timmers, Dimas and the Raras.ai team},
  year   = {2026},
  url    = {https://huggingface.co/Raras-AI/gemeo-world-model},
  note   = {Research preview. Not a medical device.}
}

@misc{gemeo_v1_2026,
  title  = {GEMEO v1: A SUS-Grounded Patient Digital Twin for Rare-Disease
            Trajectory Forecasting},
  author = {Timmers, Dimas and Kawassaki, Alexandre},
  year   = {2026},
  doi    = {10.5281/zenodo.20092131},
  url    = {https://doi.org/10.5281/zenodo.20092131}
}

Building blocks (please also cite as appropriate)

  • PrimeKG: Chandak, Huang, Zitnik. Scientific Data (2023).
  • HGT: Hu et al., WWW 2020. Heterogeneous Graph Transformer.
  • TGNN: Rossi et al., ICML 2020. Temporal Graph Networks.
  • TxGNN: Huang et al., Nature Medicine 2024. A foundation model for clinician-centered drug repurposing.
  • NeuralSurv: Lee et al., Stat. Med. 2021.
  • Diffusion Forcing: Chen et al., NeurIPS 2024 (arXiv 2407.01392).
  • DeepRare: Nature 2026, s41586-025-10097-9.
  • PhenoKG: arXiv 2506.13119 (Jun 2025).
  • MEDS: Medical Event Data Standard v0.4.1, McDermott et al. 2024.

Changelog

  • 2026-05 (this release): Initial public release of the twin-stack application layer + NeuralSurv ckpt (c-index 0.70) + DT-FM baseline + PrimeKG-derived embeddings + RareBench v49 results.
  • 2026-05-19: Renamed from Raras-AI/gemeo-world-model → Raras-AI/gemeo-twin-stack (HF auto-redirects). The slug gemeo-world-model now hosts the actual Causal Diffusion Forcing world model.
  • 2026-05: GEMEO World Model v2 (CDF backbone) published at Raras-AI/gemeo-world-model.
  • 2026-04: GEMEO v1 paper published on Zenodo (DOI 10.5281/zenodo.20092131).

⚠️ Reminder: Research only. Not a medical device. No clinical use without physician oversight and applicable regulatory clearance.