---
license: apache-2.0
language: [en, pt]
tags:
- world-model
- patient-digital-twin
- reference-architecture
- diffusion-forcing
- meds
- primekg
- rare-disease
library_name: pytorch
pipeline_tag: time-series-forecasting
---

# GEMEO Architecture v1.0

> **A reference architecture for patient world models.** Six principles,
> pluggable substrate, three open instances. *Not a model* — a recipe for
> building a model.

![GEMEO Architecture v2.0 — Propose → Simulate → Verify](./figure1_gemeo_architecture.png)

> **New to GEMEO? Want to run it on your own data?** See [**ADAPTING_TO_A_NEW_DATABASE.md**](./ADAPTING_TO_A_NEW_DATABASE.md) — a step-by-step guide to instantiate `gemeo-` on any MEDS v0.4.1 EHR.

This repo contains the **architecture specification** and a **reference implementation** (Apache-2.0 source, no weights). To use:

1. Read [`gemeo_architecture_spec_v1.md`](./gemeo_architecture_spec_v1.md) for the 6-principle conformance definition.
2. Copy `reference_impl/` into your repo, adapt it to your substrate (any MEDS v0.4.1-compliant EHR), and train.
3. Name your instance `gemeo--v` and submit a conformance report.

## Open instances (May 2026)

**Architecture spec:** v1.0 = single world model (6 principles). **v2.0 = three-pillar Propose→Simulate→Verify** (10 principles, recurrence-aware, L3-targeted) → [`gemeo_architecture_spec_v2.md`](./gemeo_architecture_spec_v2.md).
| Instance | Substrate | Headline | Status |
|---|---|---|---|
| [`Raras-AI/gemeo-sus`](https://huggingface.co/Raras-AI/gemeo-sus) **(flagship, recurrence-aware)** | DATASUS (42K) | **new-onset Top-1 53.7% vs 38.2%; discontinuation AUROC 0.838 vs 0.696; will-change 0.906; transition-12mo 0.827 — wins every novelty & long-context task** | ✅ released |
| [`Raras-AI/gemeo-twin-stack`](https://huggingface.co/Raras-AI/gemeo-twin-stack) | application layer | NeuralSurv + 6-mode digital twin | ✅ released |
| `Raras-AI/gemeo-mayo-v?` | Mayo Clinic Platform | multimodal substrate; full L3 counterfactual validation | in proposal |
| `Raras-AI/gemeo-mimic-demo` | MIMIC-IV-DEMO | non-SUS architecture proof | in progress |

> **Flagship results.** The recurrence-aware objective makes the model predict *novel* events, not repeats, so the wins are real signal rather than autocorrelation. On the public [RareBench-BR Trajectory v2](https://huggingface.co/datasets/Raras-AI/rarebench-br-trajectory) benchmark, with mandatory baselines on the same candidate space and 95% bootstrap CIs, GEMEO leads on **every** novelty and long-context task: new-onset Top-1 **53.7%** vs 38.2%, will-change AUROC **0.906**, transition-within-12mo **0.827**, treatment discontinuation **0.838** vs 0.696. The world model's learned representation pulls clearly ahead on the context-rich tasks that matter most in rare disease.

## The six architectural principles

1. **Diffusion Forcing backbone** with per-token σ ∼ 𝒰(0, 1)
2. **Gated KG cross-attention** with tanh(α), α init = 0; real PrimeKG edges
3. **MEDS v0.4.1 substrate** — `(subject_id, time, code, value)`
4. **Bootstrap-then-learn** pattern per inference mode
5. **Bidirectional health-system grounding** (formulary re-rank)
6. **Audit-driven training** (Chinchilla scaling + SOTA component validation)

Full definitions and conformance tests are in [`gemeo_architecture_spec_v1.md`](./gemeo_architecture_spec_v1.md).
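Principles 1 and 2 can be illustrated with a short PyTorch sketch. It is not taken from `reference_impl/`: the function and class names (`diffusion_forcing_noise`, `GatedKGCrossAttention`), the linear interpolation used as the noise schedule, the pre-attention LayerNorm and the head count are all illustrative assumptions. What it does show is the behaviour the principles name: every event token gets its own noise level σ ∼ 𝒰(0, 1), and because α is initialised to 0, tanh(α) = 0, so the KG cross-attention branch is an exact identity at the start of training and only opens as α is learned.

```python
from typing import Optional, Tuple

import torch
import torch.nn as nn


def diffusion_forcing_noise(
    x: torch.Tensor, generator: Optional[torch.Generator] = None
) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor]:
    """Principle 1 (sketch): corrupt each token with its own noise level.

    x: (batch, seq, dim) clean event embeddings.
    ASSUMPTION: linear interpolation x_noisy = (1 - sigma) * x + sigma * eps,
    with sigma = 0 meaning "clean". The spec's actual schedule may differ.
    """
    batch, seq, _ = x.shape
    # One independent sigma ~ U(0, 1) per (patient, token), broadcast over features.
    sigma = torch.rand(batch, seq, 1, generator=generator, dtype=x.dtype).to(x.device)
    eps = torch.randn(x.shape, generator=generator, dtype=x.dtype).to(x.device)
    x_noisy = (1.0 - sigma) * x + sigma * eps
    return x_noisy, sigma.squeeze(-1), eps


class GatedKGCrossAttention(nn.Module):
    """Principle 2 (sketch): patient tokens attend to KG node embeddings via a tanh gate.

    HYPOTHETICAL layout: pre-norm on the query, standard multi-head attention,
    residual add scaled by tanh(alpha).
    """

    def __init__(self, d_model: int, n_heads: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # alpha = 0 -> tanh(alpha) = 0 -> the block starts as an exact identity.
        self.alpha = nn.Parameter(torch.zeros(1))

    def forward(
        self,
        h: torch.Tensor,
        kg_nodes: torch.Tensor,
        kg_padding_mask: Optional[torch.Tensor] = None,
    ) -> torch.Tensor:
        # h: (batch, seq, d_model) patient-event states.
        # kg_nodes: (batch, n_nodes, d_model) embeddings of KG nodes linked to the patient.
        attended, _ = self.attn(
            self.norm(h), kg_nodes, kg_nodes,
            key_padding_mask=kg_padding_mask, need_weights=False,
        )
        return h + torch.tanh(self.alpha) * attended


if __name__ == "__main__":
    torch.manual_seed(0)
    x = torch.randn(2, 16, 64)   # 2 patients, 16 event tokens, 64-dim embeddings
    kg = torch.randn(2, 32, 64)  # 32 KG node embeddings per patient (random stand-ins)
    x_noisy, sigma, eps = diffusion_forcing_noise(x)
    block = GatedKGCrossAttention(d_model=64)
    out = block(x_noisy, kg)
    print(sigma.shape, out.shape)
    print("identity at init:", torch.equal(out, x_noisy))
```

The zero-initialised gate means a KG branch can be added to an already-trained backbone without perturbing its outputs at step 0; whether GEMEO applies the gate once or in every layer is specified in the conformance document, not in this sketch.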
## Licensing & scope

**Open the recipe, keep the spice.**

- **This repo (architecture spec + reference implementation):** Apache-2.0. Adopt and build on it freely, with attribution. A reference architecture only becomes a standard if people can use it.
- **Trained weights** (`gemeo-sus`, …) and the **RareBench-BR benchmark:** CC-BY-NC 4.0. No commercial reuse of our trained artifacts or data without a separate agreement.
- **Held back on purpose:** the proprietary DATASUS ETL (raw extraction, CNS-hash linkage, trajectory construction), the cohort/preprocessing heuristics, and the future multimodal (Mayo) substrate. The reference `meds_export.py` consumes an already-built trajectory file; it does not produce one.

## Citation

```bibtex
@misc{gemeo_arch_v1_2026,
  title  = {GEMEO Architecture Specification v1.0: Reference architecture for patient world models},
  author = {Timmers, Dimas and Kawassaki, Alexandre and {the Raras AI team}},
  year   = {2026},
  url    = {https://huggingface.co/Raras-AI/gemeo-arch},
}
```

⚠️ Research only. Not a medical device. No clinical use.