# Gemeo

> **State-of-the-art (SOTA) digital twin module for rare-disease patients, grounded in Brazil's Sistema Único de Saúde (SUS).**
> A learned, graph-native, continuously evolving twin that combines a Heterogeneous
> Graph Transformer (HGT) over PrimeKG with the public-health constraints of the SUS.

```
gemeo = patient embedding
      + cohort retrieval          (patients-like-mine)
      + reasoning subgraph        (KG sparsification)
      + trajectory                (TGNN over snapshot chains)
      + risk + survival           (NeuralSurv)
      + drug repurposing          (TxGNN fine-tuned)
      + active learning           (info-gain on KG)
      + counterfactual            (what-if engine)
      + SUS grounding             (PCDT/CEAF/UF)
      + feedback loop
      + viz payload
```

## Installation

Gemeo already ships as part of `rarasnet-swarm-py` and is auto-mounted in `main.py` under `/api/gemeo/*`, so there is nothing extra to install.

Optional Phase-2 training (produces the learned checkpoints that replace the bootstrap paths; see Architecture):
```bash
pip install torch_geometric tqdm
python -m gemeo.train.primekg
python -m gemeo.train.hgt
```

## Quickstart

`build_gemeo` is a coroutine, so call it from async code (or `await` it directly in a Jupyter/IPython notebook):

```python
import asyncio

from gemeo import build_gemeo


async def main() -> None:
    twin = await build_gemeo(
        # "Boy, 5 years old, progressive ataxia, telangiectasia, elevated AFP."
        case_text="Menino, 5 anos, ataxia progressiva, telangiectasia, AFP elevado.",
        patient_info={"age": 5, "sex": "M"},
        context={"sus_region": "SP"},
    )

    print(twin.diagnoses[:3])          # top-ranked diagnostic hypotheses
    print(twin.cohort.members[:5])     # patients-like-mine
    print(twin.subgraph)               # reasoning subgraph
    print(twin.trajectory.horizons)    # predictions at 6, 12 and 24 months
    print(twin.risk.survival_curve)    # months → P(alive)
    print(twin.drugs.candidates[:3])   # drug-repurposing candidates
    print(twin.next_questions[:3])     # active-learning questions
    print(twin.sus_check.pcdt_url)     # PCDT compliance
    print(twin.viz_data)               # payload ready for react-force-graph


asyncio.run(main())
```
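
In the bootstrap tier, `twin.risk.survival_curve` comes from a rule-based severity score fed into an exponential survival model (the "Risk / survival" row of the bootstrap-vs-learned table). The sketch below illustrates that model shape only: the severity labels, the monthly hazard rates and the function names `survival_curve` and `median_survival` are hypothetical, not Gemeo's actual rules. It evaluates S(t) = exp(-λt) at the trajectory horizons and reports the median survival ln 2 / λ.

```python
from __future__ import annotations

import math

# Hypothetical severity -> constant monthly hazard (lambda). Illustrative
# numbers only; Gemeo's real rule-based severity scores are not documented here.
MONTHLY_HAZARD = {"mild": 0.001, "moderate": 0.005, "severe": 0.02}


def survival_curve(severity: str, months: tuple[int, ...] = (6, 12, 24)) -> dict[int, float]:
    """Exponential survival S(t) = exp(-lambda * t), keyed by month (months -> P(alive))."""
    lam = MONTHLY_HAZARD[severity]
    return {t: math.exp(-lam * t) for t in months}


def median_survival(severity: str) -> float:
    """Month at which S(t) = 0.5, i.e. ln(2) / lambda for a constant hazard."""
    return math.log(2) / MONTHLY_HAZARD[severity]


if __name__ == "__main__":
    print(survival_curve("severe"))
    print(round(median_survival("severe"), 1))
```

A constant hazard is the simplest survival model and has a single parameter per severity level, which is why it suits a no-training bootstrap; the Phase-2 NeuralSurv model replaces it.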

## API endpoints

| Method | Path | Purpose |
|---|---|---|
| POST | `/api/gemeo/build` | Create a twin from a clinical case |
| GET | `/api/gemeo/{id}` | Fetch the full twin |
| POST | `/api/gemeo/{id}/evolve` | Add new clinical data |
| POST | `/api/gemeo/{id}/whatif` | Run a counterfactual |
| POST | `/api/gemeo/{id}/feedback` | Record a correction |
| GET | `/api/gemeo/{id}/{cohort,subgraph,trajectory,risk,drugs,trials,next-questions,sus,viz}` | Per-capability getters, one path each (e.g. `/api/gemeo/{id}/risk`) |
| GET | `/api/gemeo/health` | Bridge status and feedback stats |
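
A minimal client sketch for these endpoints, using only the standard library. Two things are assumptions not documented in this README: that the service listens on `http://localhost:8000`, and that the `POST /api/gemeo/build` JSON body mirrors the keyword arguments of `build_gemeo` from the Quickstart. The helper names `build_request` and `twin_url` are hypothetical.

```python
from __future__ import annotations

import json
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:8000"  # assumption: wherever main.py is served

# Per-capability getters listed in the endpoint table.
CAPABILITIES = {
    "cohort", "subgraph", "trajectory", "risk", "drugs",
    "trials", "next-questions", "sus", "viz",
}


def build_request(case_text: str, patient_info: dict, context: dict) -> urllib.request.Request:
    """Prepare (but do not send) POST /api/gemeo/build.

    Assumption: the JSON body uses the same field names as build_gemeo().
    """
    body = json.dumps(
        {"case_text": case_text, "patient_info": patient_info, "context": context},
        ensure_ascii=False,
    ).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/gemeo/build",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def twin_url(twin_id: str, capability: Optional[str] = None) -> str:
    """URL for GET /api/gemeo/{id} or one of its per-capability getters."""
    if capability is None:
        return f"{BASE_URL}/api/gemeo/{twin_id}"
    if capability not in CAPABILITIES:
        raise ValueError(f"unknown capability: {capability!r}")
    return f"{BASE_URL}/api/gemeo/{twin_id}/{capability}"
```

Sending the request is then `urllib.request.urlopen(req)`. How the response exposes the new twin's `{id}` is not specified here, so parse it according to the actual schema in `api.py`.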

## Architecture

Gemeo has two tiers:

- **Bootstrap (today):** wraps existing swarm-py modules and raras-app artifacts.
  Every capability works from day one, with no training required.
- **Phase-2 SOTA (training):** `gemeo/train/` contains training scaffolds for HGT, TxGNN,
  TGNN, NeuralSurv and CF-GNN. When their checkpoints appear in `gemeo/artifacts/`,
  the runtime detects them automatically and uses them instead of the bootstrap paths.
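
The override rule in the second bullet amounts to a per-capability lookup: use the learned model if its checkpoint exists, otherwise fall back to bootstrap. This is a sketch of that idea only; the capability keys, the checkpoint filenames and the `select_backends` helper are hypothetical, since the README does not specify what `gemeo/train/` writes to `gemeo/artifacts/`.

```python
from __future__ import annotations

import tempfile
from pathlib import Path

# Hypothetical capability -> checkpoint filename mapping; the real file names
# written to gemeo/artifacts/ are not documented in this README.
CHECKPOINTS = {
    "encoder": "hgt.pt",        # HGT patient encoder
    "repurpose": "txgnn.pt",    # TxGNN drug repurposing
    "trajectory": "tgnn.pt",    # TGNN over snapshot chains
    "risk": "neuralsurv.pt",    # NeuralSurv survival model
    "whatif": "cfgnn.pt",       # CF-GNN counterfactuals
}


def select_backends(artifacts_dir: Path) -> dict[str, str]:
    """Map each capability to 'learned' if its checkpoint file exists, else 'bootstrap'."""
    return {
        capability: "learned" if (artifacts_dir / filename).is_file() else "bootstrap"
        for capability, filename in CHECKPOINTS.items()
    }


if __name__ == "__main__":
    # Demo: an empty artifacts dir, then the same dir with only the HGT checkpoint.
    with tempfile.TemporaryDirectory() as tmp:
        artifacts = Path(tmp)
        print(select_backends(artifacts))
        (artifacts / "hgt.pt").touch()
        print(select_backends(artifacts))
```

Deciding per capability, rather than all-or-nothing, lets checkpoints land one at a time while the remaining capabilities keep using their bootstrap paths.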

```
┌──────────────────────────────┐
│ raras-app                    │
│ data/graph-ml/*.npz          │ ← read-only via gemeo.bridge
│ Patient.embedding (Neo4j)    │
│ /grafo (force-graph)         │ ← consumes /api/gemeo/{id}/viz
└──────────────┬───────────────┘
               │
┌──────────────▼───────────────┐
│ gemeo (this module)          │
│                              │
│ bridge.py   ── load .npz     │
│ encoder.py  ── HGT or boot   │
│ cohort.py   ── kNN+graph     │
│ subgraph.py ── KG sparsify   │
│ trajectory  ── TGNN or LLM   │
│ risk.py     ── NeuralSurv    │
│ repurpose   ── TxGNN+SUS     │
│ whatif.py   ── CF-GNN        │
│ ask.py      ── info-gain     │
│ ground_sus  ── PCDT/UF       │
│ feedback    ── jsonl ledger  │
│ viz.py      ── force-graph   │
│ core.py     ── orchestrator  │
│ api.py      ── FastAPI       │
└──────────────┬───────────────┘
               │
┌──────────────▼───────────────┐
│ swarm-py existing infra      │
│ digital_twin_workflow        │
│ patient_space (KG)           │
│ trajectory_engine, risk_qua  │
│ drug_repurposer, trial_      │
│ matcher, brazilian_context   │
└──────────────────────────────┘
```
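
The feedback module keeps corrections in a JSON Lines ledger (one JSON object per line), which is append-only and easy to aggregate, for example into the feedback stats reported by `/api/gemeo/health`. Below is a minimal sketch of such a ledger; the record fields (`twin_id`, `field`, `correction`, `recorded_at`) and the function names `record_feedback` and `feedback_stats` are hypothetical, and the actual schema in the feedback module may differ.

```python
from __future__ import annotations

import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def record_feedback(ledger: Path, twin_id: str, field: str, correction: str) -> dict:
    """Append one correction to the ledger as a single JSON line (hypothetical schema)."""
    entry = {
        "twin_id": twin_id,
        "field": field,            # which part of the twin was corrected, e.g. "diagnoses"
        "correction": correction,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }
    with ledger.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry, ensure_ascii=False) + "\n")
    return entry


def feedback_stats(ledger: Path) -> dict[str, int]:
    """Count recorded corrections per field; a missing ledger gives {}."""
    counts: dict[str, int] = {}
    if not ledger.exists():
        return counts
    with ledger.open(encoding="utf-8") as fh:
        for line in fh:
            if line.strip():  # tolerate blank lines
                field = json.loads(line)["field"]
                counts[field] = counts.get(field, 0) + 1
    return counts


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        ledger = Path(tmp) / "feedback.jsonl"
        record_feedback(ledger, "twin-1", "diagnoses", "confirmed ataxia-telangiectasia")
        record_feedback(ledger, "twin-1", "drugs", "candidate not available via CEAF")
        print(feedback_stats(ledger))
```

Appending whole lines means a crash can at worst leave one partial trailing record, and past corrections are never rewritten.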

## What's bootstrap vs. learned

| Capability | Bootstrap (works today) | Phase-2 SOTA |
|---|---|---|
| **Patient embedding** | Weighted mean of fused 768/3072-dim disease, HPO and gene embeddings (same scheme as raras-app) | HGT trained on PrimeKG with a disease link-prediction loss plus a patient contrastive loss |
| **Cohort** | Neo4j vector kNN + Cypher overlap | Same retrieval over the learned embedding |
| **Subgraph** | Cypher 1-hop sparsification | KG sparsification trained on diagnostic outcomes |
| **Trajectory** | LLM over disease natural history | TRANS-style TGNN over snapshot chains |
| **Risk / survival** | Rule-based severity → exponential survival | NeuralSurv Bayesian survival on KG-walk features |
| **Drug repurposing** | KG walks Disease → Gene → Drug | TxGNN fine-tuned on PrimeKG + SUS auxiliary head |
| **What-if** | Heuristic: mutate the snapshot and re-run | CF-GNNExplainer + do-calculus |
| **Active learning** | Info-gain over KG annotation frequencies | Bayesian acquisition over the learned diagnosis posterior |
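
Three of the bootstrap rows reduce to small, well-known computations: a weighted mean for the patient embedding, cosine-similarity kNN for the cohort, and expected information gain for picking the next question. The sketch below shows them in plain Python on toy vectors. It is illustrative only: in Gemeo the vectors are the 768/3072-dim raras-app embeddings and the kNN runs as a Neo4j vector query, while the weights, the yes/no question model, the toy frequencies and all names here (`patient_embedding`, `cohort_knn`, `expected_info_gain`, `best_question`) are assumptions.

```python
from __future__ import annotations

import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def patient_embedding(vectors: Sequence[Vector], weights: Sequence[float]) -> Vector:
    """Weighted mean of disease/HPO/gene embedding vectors (all the same length)."""
    total = sum(weights)
    dim = len(vectors[0])
    return [sum(w * v[i] for v, w in zip(vectors, weights)) / total for i in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def cohort_knn(query: Vector, index: Dict[str, Vector], k: int) -> List[Tuple[str, float]]:
    """Brute-force cosine kNN: the k most similar patient ids with their scores."""
    scored = [(pid, cosine(query, vec)) for pid, vec in index.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


def entropy(probs) -> float:
    """Shannon entropy in bits, skipping zero-probability terms."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def expected_info_gain(posterior: Dict[str, float], freq: Dict[str, float]) -> float:
    """Expected entropy reduction of a normalized diagnosis posterior from one yes/no question.

    freq[dx] is P(finding present | dx), e.g. a KG annotation frequency.
    """
    p_yes = sum(posterior[dx] * freq[dx] for dx in posterior)
    no_freq = {dx: 1 - f for dx, f in freq.items()}
    gain = entropy(posterior.values())
    for p_answer, likelihood in ((p_yes, freq), (1 - p_yes, no_freq)):
        if p_answer > 0:
            # Bayes update: P(dx | answer) = P(dx) * P(answer | dx) / P(answer)
            updated = [posterior[dx] * likelihood[dx] / p_answer for dx in posterior]
            gain -= p_answer * entropy(updated)
    return gain


def best_question(posterior: Dict[str, float], questions: Dict[str, Dict[str, float]]) -> str:
    """Pick the finding whose answer is expected to be most informative."""
    return max(questions, key=lambda q: expected_info_gain(posterior, questions[q]))


if __name__ == "__main__":
    # Toy data: 2-dim vectors and made-up frequencies, not real embeddings or HPO stats.
    me = patient_embedding([[1.0, 0.0], [0.0, 1.0]], weights=[3.0, 1.0])
    print(cohort_knn(me, {"p1": [1.0, 0.1], "p2": [0.0, 1.0], "p3": [1.0, 1.0]}, k=2))
    posterior = {"ataxia-telangiectasia": 0.5, "Friedreich ataxia": 0.5}
    questions = {
        "telangiectasia": {"ataxia-telangiectasia": 0.9, "Friedreich ataxia": 0.05},
        "fever": {"ataxia-telangiectasia": 0.3, "Friedreich ataxia": 0.3},
    }
    print(best_question(posterior, questions))
```

A finding with the same frequency under every candidate diagnosis (like the toy "fever") has zero expected gain, so the selector naturally prefers discriminating findings.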

## Citation

Timmers D, Kawassaki A. *Gemeo: Heterogeneous graph foundation model for rare disease digital twins grounded in Brazilian SUS.* Raras, 2026.