Gemeo
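State-of-the-art (SOTA) digital twin module for rare disease patients, grounded in Brazil's public health system (SUS, Sistema Único de Saúde). Gemeo is a learned, graph-native, continuously evolving twin. It fuses a Heterogeneous Graph Transformer (HGT) over the PrimeKG knowledge graph with SUS constraints: clinical protocols (PCDT), specialized pharmaceutical assistance (CEAF), and state-level (UF) context.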
```
gemeo = patient embedding
      + cohort retrieval (patients-like-mine)
      + reasoning subgraph (KG sparsification)
      + trajectory (TGNN over snapshot chains)
      + risk + survival (NeuralSurv)
      + drug repurposing (TxGNN fine-tuned)
      + active learning (info-gain on KG)
      + counterfactual (what-if engine)
      + SUS grounding (PCDT/CEAF/UF)
      + feedback loop
      + viz payload
```
Installation
Gemeo already ships as part of rarasnet-swarm-py, and its API is auto-mounted in main.py at /api/gemeo/*, so there is nothing extra to install.
Optional Phase-2 training:
```bash
pip install torch_geometric tqdm
python -m gemeo.train.primekg
python -m gemeo.train.hgt
```
Quickstart
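```python
import asyncio

from gemeo import build_gemeo, what_if


async def main() -> None:
    # case_text (Portuguese): "Boy, 5 years old, progressive ataxia, telangiectasia, elevated AFP."
    twin = await build_gemeo(
        case_text="Menino, 5 anos, ataxia progressiva, telangiectasia, AFP elevado.",
        patient_info={"age": 5, "sex": "M"},
        context={"sus_region": "SP"},
    )

    print(twin.diagnoses[:3])          # top hypotheses (ranked)
    print(twin.cohort.members[:5])     # patients-like-mine
    print(twin.subgraph)               # reasoning subgraph
    print(twin.trajectory.horizons)    # 6/12/24-month predictions
    print(twin.risk.survival_curve)    # months → P(alive)
    print(twin.drugs.candidates[:3])   # repurposing candidates
    print(twin.next_questions[:3])     # active learning
    print(twin.sus_check.pcdt_url)     # PCDT compliance
    print(twin.viz_data)               # ready for react-force-graph


asyncio.run(main())  # in a notebook or async REPL, `await build_gemeo(...)` works directly
```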
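The quickstart's twin.risk.survival_curve maps months to P(alive). For the bootstrap tier, Gemeo describes a rule-based severity mapped to an exponential survival model. The sketch below assumes the rule step has already produced a median survival time and derives S(t) = exp(-λt) with λ = ln 2 / median. The function name and the 24-month input are illustrative assumptions, not values from the module.

```python
import math
from typing import Dict, Iterable


def exponential_survival_curve(
    median_months: float, horizons: Iterable[int] = (6, 12, 24)
) -> Dict[int, float]:
    """Map months -> P(alive) under an exponential model S(t) = exp(-lam * t).

    Hypothetical helper: lam is set so that S(median_months) = 0.5,
    i.e. lam = ln(2) / median_months (a constant hazard).
    """
    if median_months <= 0:
        raise ValueError("median_months must be positive")
    lam = math.log(2) / median_months
    return {t: math.exp(-lam * t) for t in horizons}


curve = exponential_survival_curve(median_months=24)  # illustrative input only
print({t: round(p, 3) for t, p in curve.items()})  # {6: 0.841, 12: 0.707, 24: 0.5}
```

The constant hazard is what makes the model exponential. It also means the curve cannot represent risk that changes as the disease progresses.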
API endpoints
| Method | Path | Purpose |
|---|---|---|
| POST | /api/gemeo/build | Create a twin from a case |
| GET | /api/gemeo/{id} | Fetch the full twin |
| POST | /api/gemeo/{id}/evolve | Add new clinical data |
| POST | /api/gemeo/{id}/whatif | Run a counterfactual query |
| POST | /api/gemeo/{id}/feedback | Record a correction |
| GET | /api/gemeo/{id}/{cohort,subgraph,trajectory,risk,drugs,trials,next-questions,sus,viz} | Per-capability getters |
| GET | /api/gemeo/health | Bridge and feedback stats |
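A minimal client sketch using only the Python standard library. It assumes the service listens on http://localhost:8000 and that the /build request body mirrors the build_gemeo keyword arguments (case_text, patient_info, context). Both are assumptions, as are the helper names gemeo_request and capability_request. The code only constructs requests, so it runs without a server. Pass a request to urllib.request.urlopen to actually send it.

```python
import json
import urllib.request
from typing import Optional

# Assumption: a local rarasnet-swarm-py deployment; adjust host/port as needed.
BASE_URL = "http://localhost:8000/api/gemeo"

# Per-capability getters listed in the endpoints table.
CAPABILITIES = {
    "cohort", "subgraph", "trajectory", "risk", "drugs",
    "trials", "next-questions", "sus", "viz",
}


def gemeo_request(method: str, path: str, body: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a JSON request against the Gemeo API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


def capability_request(twin_id: str, capability: str) -> urllib.request.Request:
    """GET /api/gemeo/{id}/{capability} for one of the per-capability getters."""
    if capability not in CAPABILITIES:
        raise ValueError(f"unknown capability: {capability!r}")
    return gemeo_request("GET", f"/{twin_id}/{capability}")


# Hypothetical body: assumes /build accepts the same fields as build_gemeo().
build = gemeo_request("POST", "/build", {
    "case_text": "Menino, 5 anos, ataxia progressiva, telangiectasia, AFP elevado.",
    "patient_info": {"age": 5, "sex": "M"},
    "context": {"sus_region": "SP"},
})
risk = capability_request("abc123", "risk")  # "abc123" is a placeholder twin id

# To send: with urllib.request.urlopen(build) as resp: twin = json.load(resp)
```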
Architecture
Two tiers:
- Bootstrap (today): wraps existing swarm-py modules and raras-app artifacts. Everything works from day one; no training is needed.
- Phase-2 SOTA (training): gemeo/train/ holds scaffolds for HGT, TxGNN, TGNN, NeuralSurv, and CF-GNN. When checkpoints land in gemeo/artifacts/, the runtime auto-detects them and overrides the bootstrap paths.
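The override rule can be sketched as a per-capability lookup. If a checkpoint exists in gemeo/artifacts/, the runtime uses it; otherwise it falls back to the bootstrap path. The checkpoint filenames (hgt.pt, txgnn.pt, and so on) and the resolve_backends helper are hypothetical. The real runtime may use different names, formats, or detection logic.

```python
import tempfile
from pathlib import Path
from typing import Dict

# Hypothetical capability -> checkpoint filename mapping; the real artifact
# names and formats in gemeo/artifacts/ may differ.
CHECKPOINTS = {
    "encoder": "hgt.pt",
    "repurpose": "txgnn.pt",
    "trajectory": "tgnn.pt",
    "risk": "neuralsurv.pt",
    "whatif": "cfgnn.pt",
}


def resolve_backends(artifacts_dir: Path) -> Dict[str, str]:
    """Pick 'learned' where a checkpoint file exists, else keep 'bootstrap'."""
    return {
        capability: "learned" if (artifacts_dir / filename).is_file() else "bootstrap"
        for capability, filename in CHECKPOINTS.items()
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        artifacts = Path(tmp)
        (artifacts / "hgt.pt").touch()  # simulate the HGT checkpoint landing
        print(resolve_backends(artifacts))
        # {'encoder': 'learned', 'repurpose': 'bootstrap', 'trajectory': 'bootstrap',
        #  'risk': 'bootstrap', 'whatif': 'bootstrap'}
```

Checking each capability separately, rather than all at once, lets each Phase-2 model replace its bootstrap path as soon as its own checkpoint is available.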
```
┌──────────────────────────────┐
│ raras-app                    │
│ data/graph-ml/*.npz          │  ← read-only via gemeo.bridge
│ Patient.embedding (Neo4j)    │
│ /grafo (force-graph)         │  ← consumes /api/gemeo/{id}/viz
└──────────────┬───────────────┘
               │
┌──────────────▼───────────────┐
│ gemeo (this module)          │
│                              │
│ bridge.py   ── load .npz     │
│ encoder.py  ── HGT or boot   │
│ cohort.py   ── kNN+graph     │
│ subgraph.py ── KG sparsify   │
│ trajectory  ── TGNN or LLM   │
│ risk.py     ── NeuralSurv    │
│ repurpose   ── TxGNN+SUS     │
│ whatif.py   ── CF-GNN        │
│ ask.py      ── info-gain     │
│ ground_sus  ── PCDT/UF       │
│ feedback    ── jsonl ledger  │
│ viz.py      ── force-graph   │
│ core.py     ── orchestrator  │
│ api.py      ── FastAPI       │
└──────────────┬───────────────┘
               │
┌──────────────▼───────────────┐
│ swarm-py existing infra      │
│ digital_twin_workflow        │
│ patient_space (KG)           │
│ trajectory_engine, risk_qua  │
│ drug_repurposer              │
│ trial_matcher                │
│ brazilian_context            │
└──────────────────────────────┘
```
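The feedback component is described as a JSONL ledger. An append-only ledger can be sketched as follows. Each record is one JSON object per line, and existing lines are never rewritten. Recording a correction is therefore a single append, and stats such as those reported by /health can be recomputed by scanning the file. The record fields (twin_id, field, old, new) and the function names are assumptions, not the module's actual schema.

```python
import json
import tempfile
import time
from collections import Counter
from pathlib import Path
from typing import Any, Dict, Iterator


def append_feedback(ledger: Path, twin_id: str, field: str, old: Any, new: Any) -> Dict[str, Any]:
    """Append one correction as a single JSON line; earlier lines are never modified."""
    record = {"ts": time.time(), "twin_id": twin_id, "field": field, "old": old, "new": new}
    with ledger.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record


def read_feedback(ledger: Path) -> Iterator[Dict[str, Any]]:
    """Yield records in insertion order, skipping blank lines."""
    if not ledger.exists():
        return
    with ledger.open(encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                yield json.loads(line)


def feedback_stats(ledger: Path) -> Dict[str, int]:
    """Count corrections per field (hypothetical input for a stats endpoint)."""
    return dict(Counter(rec["field"] for rec in read_feedback(ledger)))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        ledger = Path(tmp) / "feedback.jsonl"
        append_feedback(ledger, "abc123", "diagnoses", "dx_A", "dx_B")  # placeholder values
        print(feedback_stats(ledger))  # {'diagnoses': 1}
```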
What's bootstrap vs. learned
| Capability | Bootstrap (works today) | Phase-2 SOTA |
|---|---|---|
| Patient embedding | Weighted mean of fused-768/3072-dim disease+HPO+gene embeddings (matches raras-app) | HGT trained on PrimeKG with disease link-pred + patient contrastive losses |
| Cohort | Neo4j vector kNN + Cypher overlap | same retrieval, learned embedding |
| Subgraph | Cypher 1-hop sparsification | KG sparsification trained on diagnostic outcomes |
| Trajectory | LLM over disease natural history | TRANS-style TGNN over snapshot chains |
| Risk / survival | Rule-based severity → exponential survival | NeuralSurv Bayesian survival on KG-walk features |
| Drug repurposing | KG walks Disease→Gene→Drug | TxGNN fine-tuned on PrimeKG + SUS auxiliary head |
| What-if | Heuristic: mutate snapshot, re-run | CF-GNNExplainer + do-calculus |
| Active learning | Info-gain over KG annotation frequencies | Bayesian acquisition over learned dx posterior |
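The bootstrap patient embedding and cohort rows can be sketched with the standard library alone. The patient vector is a weighted mean of disease, HPO, and gene embeddings, and the cohort is its nearest neighbours by cosine similarity. The weights, the toy 3-dimensional vectors, and all function names are illustrative assumptions. The real vectors are 768- or 3072-dimensional, and the bootstrap cohort step runs as a Neo4j vector kNN combined with a Cypher overlap score, which this sketch does not model.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def weighted_mean(vectors: Sequence[Vector], weights: Sequence[float]) -> Vector:
    """Weighted mean of equally sized vectors (the bootstrap patient embedding)."""
    if not vectors or len(vectors) != len(weights):
        raise ValueError("need one weight per vector")
    total = sum(weights)
    dim = len(vectors[0])
    return [sum(w * v[i] for v, w in zip(vectors, weights)) / total for i in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def knn(query: Vector, patients: Dict[str, Vector], k: int = 5) -> List[Tuple[str, float]]:
    """Exact cosine kNN; a stand-in for the Neo4j vector index lookup."""
    scored = [(pid, cosine(query, emb)) for pid, emb in patients.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


# Toy 3-dim vectors standing in for fused disease / HPO / gene embeddings.
disease, hpo, gene = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]
patient = weighted_mean([disease, hpo, gene], weights=[2.0, 1.0, 1.0])  # [0.5, 0.25, 0.25]
cohort = {"p1": [0.9, 0.1, 0.1], "p2": [0.0, 1.0, 0.0], "p3": [0.4, 0.3, 0.3]}
print([pid for pid, _ in knn(patient, cohort, k=2)])  # ['p3', 'p1']
```

Exact kNN scans every stored patient, which is fine for a sketch; a vector index avoids that scan at scale.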
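Bootstrap drug repurposing walks Disease→Gene→Drug in the knowledge graph. One simple way to turn those walks into a ranking is to score each drug by the number of distinct genes that link it to the disease. This scoring rule is an assumption, not the module's documented method. The identifiers below are toy placeholders, not PrimeKG content, and the SUS/CEAF filtering step is not modelled.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def repurposing_candidates(
    disease: str,
    disease_genes: Dict[str, Set[str]],
    gene_drugs: Dict[str, Set[str]],
) -> List[Tuple[str, int]]:
    """Rank drugs by the number of distinct Disease→Gene→Drug paths from `disease`."""
    support: Dict[str, Set[str]] = defaultdict(set)
    for gene in disease_genes.get(disease, set()):
        for drug in gene_drugs.get(gene, set()):
            support[drug].add(gene)
    # Sort by path count (descending), then by drug name for a deterministic order.
    return sorted(
        ((drug, len(genes)) for drug, genes in support.items()),
        key=lambda item: (-item[1], item[0]),
    )


# Toy adjacency lists; G4 is not linked to D1, so drugC is never reached.
disease_genes = {"D1": {"G1", "G2", "G3"}}
gene_drugs = {"G1": {"drugA", "drugB"}, "G2": {"drugA"}, "G4": {"drugC"}}
print(repurposing_candidates("D1", disease_genes, gene_drugs))  # [('drugA', 2), ('drugB', 1)]
```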
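Bootstrap active learning chooses the next question by information gain over KG annotation frequencies. This sketch assumes two inputs: a current posterior over candidate diagnoses, and P(phenotype present | disease) taken from annotation frequencies. The next question is the phenotype whose yes/no answer gives the largest expected drop in posterior entropy. Function names, phenotype IDs, and numbers are hypothetical placeholders.

```python
import math
from typing import Dict, List, Tuple


def entropy(dist: Dict[str, float]) -> float:
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def expected_info_gain(prior: Dict[str, float], freq: Dict[str, float]) -> float:
    """Expected entropy reduction from asking whether one phenotype is present.

    prior: P(disease). freq: P(phenotype present | disease); diseases missing
    from freq are treated as never showing the phenotype.
    """
    p_yes = sum(p * freq.get(d, 0.0) for d, p in prior.items())
    expected_posterior_entropy = 0.0
    for answer_is_yes in (True, False):
        p_answer = p_yes if answer_is_yes else 1.0 - p_yes
        if p_answer <= 0.0:
            continue  # this answer is impossible under the prior
        posterior = {}
        for d, p in prior.items():
            f = freq.get(d, 0.0)
            posterior[d] = p * (f if answer_is_yes else 1.0 - f) / p_answer  # Bayes' rule
        expected_posterior_entropy += p_answer * entropy(posterior)
    return entropy(prior) - expected_posterior_entropy


def rank_questions(
    prior: Dict[str, float], annotations: Dict[str, Dict[str, float]], top: int = 3
) -> List[Tuple[str, float]]:
    """Rank candidate phenotypes (keys of `annotations`) by expected information gain."""
    scored = [(hpo, expected_info_gain(prior, freq)) for hpo, freq in annotations.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top]


prior = {"D1": 0.5, "D2": 0.5}
annotations = {
    "HP:A": {"D1": 0.9, "D2": 0.1},  # separates D1 from D2
    "HP:B": {"D1": 0.5, "D2": 0.5},  # equally common in both: uninformative
}
print(rank_questions(prior, annotations, top=1)[0][0])  # HP:A
```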
Citation
Timmers D, Kawassaki A. Gemeo: Heterogeneous graph foundation model for rare disease digital twins grounded in Brazilian SUS. Raras, 2026.