# Gemeo

> **SOTA digital twin module for rare disease patients, grounded in Brazilian SUS.**
> A learned, graph-native, continuously-evolving twin that fuses a Heterogeneous
> Graph Transformer over PrimeKG with the country's public-health constraints.

```
gemeo = patient embedding
      + cohort retrieval    (patients-like-mine)
      + reasoning subgraph  (KG sparsification)
      + trajectory          (TGNN over snapshot chains)
      + risk + survival     (NeuralSurv)
      + drug repurposing    (TxGNN fine-tuned)
      + active learning     (info-gain on KG)
      + counterfactual      (what-if engine)
      + SUS grounding       (PCDT/CEAF/UF)
      + feedback loop
      + viz payload
```

## Installation

Gemeo is already part of `rarasnet-swarm-py` and is auto-mounted in `main.py` at `/api/gemeo/*`.

Optional Phase-2 training:

```bash
pip install torch_geometric tqdm
python -m gemeo.train.primekg
python -m gemeo.train.hgt
```

## Quickstart

```python
import asyncio

# what_if is the counterfactual engine; it is imported here but not used below.
from gemeo import build_gemeo, what_if


async def main():
    # build_gemeo is a coroutine, so it must be awaited inside an event loop.
    twin = await build_gemeo(
        # English: "Boy, 5 years old, progressive ataxia, telangiectasia, elevated AFP."
        case_text="Menino, 5 anos, ataxia progressiva, telangiectasia, AFP elevado.",
        patient_info={"age": 5, "sex": "M"},
        context={"sus_region": "SP"},
    )

    twin.diagnoses[:3]          # top hypotheses (ranked)
    twin.cohort.members[:5]     # patients-like-mine
    twin.subgraph               # reasoning subgraph
    twin.trajectory.horizons    # 6/12/24m predictions
    twin.risk.survival_curve    # months → P(alive)
    twin.drugs.candidates[:3]   # repurposing
    twin.next_questions[:3]     # active learning
    twin.sus_check.pcdt_url     # PCDT compliance
    twin.viz_data               # ready for react-force-graph


asyncio.run(main())
```

## API endpoints

| Method | Path | Purpose |
|---|---|---|
| POST | `/api/gemeo/build` | create twin from case |
| GET | `/api/gemeo/{id}` | full twin |
| POST | `/api/gemeo/{id}/evolve` | add new clinical data |
| POST | `/api/gemeo/{id}/whatif` | counterfactual |
| POST | `/api/gemeo/{id}/feedback` | record correction |
| GET | `/api/gemeo/{id}/{cohort,subgraph,trajectory,risk,drugs,trials,next-questions,sus,viz}` | per-capability getters |
| GET | `/api/gemeo/health` | bridge + feedback stats |

## Architecture

Gemeo has two tiers:

- **Bootstrap (today):** wraps existing swarm-py modules and raras-app artifacts. Everything works on day one, with no training needed.
- **Phase-2 SOTA (training):** `gemeo/train/` holds scaffolds for HGT, TxGNN, TGNN, NeuralSurv, and CF-GNN. When checkpoints land in `gemeo/artifacts/`, the runtime auto-detects them and overrides the bootstrap paths.

```
┌────────────────────────────┐
│ raras-app                  │
│ data/graph-ml/*.npz        │  ← read-only via gemeo.bridge
│ Patient.embedding (Neo4j)  │
│ /grafo (force-graph)       │  ← consumes /api/gemeo/{id}/viz
└─────────────┬──────────────┘
              │
┌─────────────▼──────────────┐
│ gemeo (this module)        │
│                            │
│ bridge.py   ── load .npz   │
│ encoder.py  ── HGT or boot │
│ cohort.py   ── kNN+graph   │
│ subgraph.py ── KG sparsify │
│ trajectory  ── TGNN or LLM │
│ risk.py     ── NeuralSurv  │
│ repurpose   ── TxGNN+SUS   │
│ whatif.py   ── CF-GNN      │
│ ask.py      ── info-gain   │
│ ground_sus  ── PCDT/UF     │
│ feedback    ── jsonl ledger│
│ viz.py      ── force-graph │
│ core.py     ── orchestrator│
│ api.py      ── FastAPI     │
└─────────────┬──────────────┘
              │
┌─────────────▼──────────────┐
│ swarm-py existing infra    │
│ digital_twin_workflow      │
│ patient_space (KG)         │
│ trajectory_engine, risk_qua│
│ drug_repurposer, trial_    │
│ matcher, brazilian_context │
└────────────────────────────┘
```

## What's bootstrap vs. learned

| Capability | Bootstrap (works today) | Phase-2 SOTA |
|---|---|---|
| **Patient embedding** | Weighted mean of fused-768/3072-dim disease+HPO+gene embeddings (matches raras-app) | HGT trained on PrimeKG with disease link-pred + patient contrastive losses |
| **Cohort** | Neo4j vector kNN + Cypher overlap | same retrieval, learned embedding |
| **Subgraph** | Cypher 1-hop sparsification | KG sparsification trained on diagnostic outcomes |
| **Trajectory** | LLM over disease natural history | TRANS-style TGNN over snapshot chains |
| **Risk / survival** | Rule-based severity → exponential survival | NeuralSurv Bayesian survival on KG-walk features |
| **Drug repurposing** | KG walks Disease→Gene→Drug | TxGNN fine-tuned on PrimeKG + SUS auxiliary head |
| **What-if** | Heuristic: mutate snapshot, re-run | CF-GNNExplainer + do-calculus |
| **Active learning** | Info-gain over KG annotation frequencies | Bayesian acquisition over learned dx posterior |

## Citation

Timmers D, Kawassaki A. *Gemeo: Heterogeneous graph foundation model for rare disease digital twins grounded in Brazilian SUS.* Raras, 2026.
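
## Example: bootstrap survival sketch

The bootstrap **Risk / survival** row describes a rule-based severity score feeding an exponential survival curve. A minimal sketch of that idea follows, assuming a constant monthly hazard per severity tier. The tier names, the hazard values, the function name `exponential_survival_curve`, and the 6/12/24-month horizons are all hypothetical choices made for illustration; they are not taken from `risk.py` and carry no clinical meaning.

```python
import math

# Hypothetical severity tiers mapped to constant monthly hazard rates.
# Placeholder numbers for illustration only: not clinical values and
# not Gemeo's actual parameters.
HAZARD_PER_MONTH = {"mild": 0.002, "moderate": 0.01, "severe": 0.03}


def exponential_survival_curve(severity, horizons=(6, 12, 24)):
    """Return {months: P(alive)} under a constant-hazard model.

    With hazard rate lam, the exponential survival function is
    S(t) = exp(-lam * t).
    """
    rate = HAZARD_PER_MONTH[severity]
    return {t: math.exp(-rate * t) for t in horizons}


curve = exponential_survival_curve("moderate")
for months, p_alive in curve.items():
    print(f"{months:>2} months: P(alive) = {p_alive:.3f}")
```

The returned mapping has the same months → P(alive) shape that the Quickstart comment gives for `twin.risk.survival_curve`. A constant hazard is the simplest possible assumption, which is presumably why it suits a no-training bootstrap tier.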