---
license: apache-2.0
language: [en, pt]
tags:
- world-model
- patient-digital-twin
- reference-architecture
- diffusion-forcing
- meds
- primekg
- rare-disease
library_name: pytorch
pipeline_tag: time-series-forecasting
---
# GEMEO Architecture v1.0

> **A reference architecture for patient world models.** Six principles,
> pluggable substrate, three open instances. *Not a model*: a recipe for
> building a model.

> **New to GEMEO? Want to run it on your own data?** See [**ADAPTING_TO_A_NEW_DATABASE.md**](./ADAPTING_TO_A_NEW_DATABASE.md), a step-by-step guide to instantiating `gemeo-<your-substrate>` on any MEDS v0.4.1 EHR.

This repo contains the **architecture specification** and a **reference
implementation** (Apache-2.0 source, no weights). To use it:

1. Read [`gemeo_architecture_spec_v1.md`](./gemeo_architecture_spec_v1.md)
   for the 6-principle conformance definition.
2. Copy `reference_impl/` into your repo, adapt it to your substrate (any
   MEDS v0.4.1-compliant EHR), and train.
3. Name your instance `gemeo-<substrate>-v<n>` and submit a conformance
   report.
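Step 2 assumes the substrate is already expressed as MEDS events. As a rough illustration, the sketch below models the `(subject_id, time, code, value)` tuple named in the principles list as a Python dataclass and groups events into per-patient, time-ordered trajectories. The class `MedsEvent`, the helper `build_trajectories`, and the example codes are hypothetical and are not taken from `reference_impl/`; the MEDS v0.4.1 schema remains the authority on column names and types.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class MedsEvent:
    """One event row; field names follow the tuple quoted in this README."""
    subject_id: int
    time: datetime
    code: str
    value: Optional[float] = None  # not every event carries a value


def build_trajectories(events):
    """Group events by subject and sort each subject's events by time."""
    by_subject = {}
    for ev in events:
        by_subject.setdefault(ev.subject_id, []).append(ev)
    for subject_events in by_subject.values():
        subject_events.sort(key=lambda ev: ev.time)
    return by_subject


# Placeholder codes for illustration only; not a real vocabulary.
events = [
    MedsEvent(1, datetime(2024, 3, 1), "LAB//EXAMPLE", 4.2),
    MedsEvent(1, datetime(2023, 11, 15), "DIAGNOSIS//EXAMPLE"),
    MedsEvent(2, datetime(2024, 1, 10), "MEDICATION//EXAMPLE"),
]
trajectories = build_trajectories(events)
```

Each value in `trajectories` is one patient's event sequence in chronological order, which is the shape a sequence model would typically consume.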
## Open instances (May 2026)

**Architecture spec:** v1.0 specifies a single world model (6 principles). **v2.0 specifies a three-pillar Propose→Simulate→Verify design** (10 principles, recurrence-aware, L3-targeted); see [`gemeo_architecture_spec_v2.md`](./gemeo_architecture_spec_v2.md).

| Instance | Substrate | Headline | Status |
|---|---|---|---|
| [`Raras-AI/gemeo-sus`](https://huggingface.co/Raras-AI/gemeo-sus) **(flagship, recurrence-aware)** | DATASUS (42K) | **new-onset Top-1 53.7% vs 38.2%; discontinuation AUROC 0.838 vs 0.696; will-change AUROC 0.906; transition-12mo 0.827; wins every novelty & long-context task** | ✅ released |
| [`Raras-AI/gemeo-twin-stack`](https://huggingface.co/Raras-AI/gemeo-twin-stack) | application layer | NeuralSurv + 6-mode digital twin | ✅ released |
| `Raras-AI/gemeo-mayo-v?` | Mayo Clinic Platform | multimodal substrate; full L3 counterfactual validation | in proposal |
| `Raras-AI/gemeo-mimic-demo` | MIMIC-IV-DEMO | non-SUS architecture proof | in progress |

> **Flagship results.** The recurrence-aware objective trains the model to predict *novel* events rather than repeats, so the gains reflect real signal rather than autocorrelation. On the public [RareBench-BR Trajectory v2](https://huggingface.co/datasets/Raras-AI/rarebench-br-trajectory) benchmark, with mandatory baselines on the same candidate space and 95% bootstrap CIs, GEMEO leads on **every** novelty and long-context task: new-onset Top-1 **53.7%** vs 38.2% (+15.5 points), will-change AUROC **0.906**, transition-within-12mo **0.827**, treatment discontinuation AUROC **0.838** vs 0.696 (+0.142). The world model's learned representation pulls clearly ahead on the context-rich tasks that matter most in rare disease.
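The 95% bootstrap CIs mentioned above can be computed in several ways, and the benchmark's exact procedure is not described in this README. As one common option, the sketch below runs a percentile bootstrap over per-patient correctness flags for a Top-1 metric. The function names, the resample count, and the toy data are assumptions made for illustration; the numbers are not GEMEO results.

```python
import random


def top1_accuracy(correct_flags):
    """Fraction of cases whose top-ranked prediction was correct."""
    return sum(correct_flags) / len(correct_flags)


def bootstrap_ci(correct_flags, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for Top-1 accuracy."""
    rng = random.Random(seed)
    n = len(correct_flags)
    stats = []
    for _ in range(n_resamples):
        # Resample cases with replacement, keeping the sample size fixed.
        sample = [correct_flags[rng.randrange(n)] for _ in range(n)]
        stats.append(top1_accuracy(sample))
    stats.sort()
    tail = (1.0 - level) / 2.0
    lower = stats[int(tail * n_resamples)]
    upper = stats[int((1.0 - tail) * n_resamples) - 1]
    return lower, upper


# Toy data: 60 correct and 40 incorrect predictions (illustrative only).
flags = [1] * 60 + [0] * 40
point_estimate = top1_accuracy(flags)
ci_low, ci_high = bootstrap_ci(flags)
```

When comparing a model against a baseline on the same cases, resampling the paired difference instead of each score separately usually gives a tighter and more relevant interval.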
## The six architectural principles

1. **Diffusion Forcing backbone** with a per-token noise level σ ∼ 𝒰(0, 1)
2. **Gated KG cross-attention** with a tanh(α) gate, α initialized to 0; real PrimeKG edges
3. **MEDS v0.4.1 substrate**: events as `(subject_id, time, code, value)`
4. **Bootstrap-then-learn** pattern per inference mode
5. **Bidirectional health-system grounding** (formulary re-rank)
6. **Audit-driven training** (Chinchilla scaling + SOTA component validation)

Full definitions and conformance tests are in [`gemeo_architecture_spec_v1.md`](./gemeo_architecture_spec_v1.md).
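To make principles 1 and 2 more concrete, here is a minimal pure-Python sketch rather than the reference implementation. It draws an independent σ for every token, then applies a tanh-gated cross-attention update over knowledge-graph node embeddings. Several parts are assumptions: the linear-interpolation noising rule, the residual form `token + tanh(α) · attention`, the omission of learned projections, and the toy vectors standing in for PrimeKG nodes. All function names are hypothetical.

```python
import math
import random


def sample_noise_levels(seq_len, rng):
    """Principle 1: one independent noise level per token, sigma ~ U(0, 1)."""
    return [rng.random() for _ in range(seq_len)]


def add_noise(tokens, sigmas, rng):
    """Corrupt each token embedding with its own noise level.

    ASSUMPTION: x_noisy = (1 - sigma) * x + sigma * eps, eps ~ N(0, 1).
    The spec may prescribe a different noise schedule.
    """
    noisy = []
    for x, sigma in zip(tokens, sigmas):
        eps = [rng.gauss(0.0, 1.0) for _ in x]
        noisy.append([(1.0 - sigma) * xi + sigma * ei for xi, ei in zip(x, eps)])
    return noisy


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def kg_cross_attention(token, kg_keys, kg_values):
    """Scaled dot-product attention of one token over KG node embeddings.

    Learned query/key/value projections are omitted to keep the sketch short.
    """
    scale = math.sqrt(len(token))
    scores = [sum(q * k for q, k in zip(token, key)) / scale for key in kg_keys]
    weights = softmax(scores)
    dim = len(kg_values[0])
    return [sum(w * v[i] for w, v in zip(weights, kg_values)) for i in range(dim)]


def gated_kg_update(token, kg_keys, kg_values, alpha):
    """Principle 2 (assumed residual form): token + tanh(alpha) * attention."""
    gate = math.tanh(alpha)
    attended = kg_cross_attention(token, kg_keys, kg_values)
    return [t + gate * a for t, a in zip(token, attended)]


rng = random.Random(0)
tokens = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # toy token embeddings
kg_keys = [[1.0, 0.0], [0.0, 1.0]]             # toy stand-ins for PrimeKG nodes
kg_values = [[0.2, 0.4], [0.6, 0.8]]

sigmas = sample_noise_levels(len(tokens), rng)
noisy_tokens = add_noise(tokens, sigmas, rng)

# With alpha = 0 the gate is tanh(0) = 0, so the KG branch contributes nothing.
at_init = [gated_kg_update(t, kg_keys, kg_values, alpha=0.0) for t in noisy_tokens]
# A non-zero alpha opens the gate and lets KG information flow in.
opened = [gated_kg_update(t, kg_keys, kg_values, alpha=1.0) for t in noisy_tokens]
```

Initializing α at 0 means the model starts out behaving exactly as it would without the knowledge graph, and the gate only opens as training finds the KG signal useful.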
## Licensing & scope

**Open the recipe, keep the spice.**

- **This repo (architecture spec + reference implementation):** Apache-2.0. Adopt and build on it freely, with attribution. A reference architecture only becomes a standard if people can use it.
- **Trained weights** (`gemeo-sus`, …) and the **RareBench-BR benchmark:** CC-BY-NC 4.0. No commercial reuse of our trained artifacts or data without a separate agreement.
- **Held back on purpose:** the proprietary DATASUS ETL (raw extraction, CNS-hash linkage, trajectory construction), the cohort/preprocessing heuristics, and the future multimodal (Mayo) substrate. The reference `meds_export.py` consumes an already-built trajectory file; it does not produce one.
## Citation

```bibtex
@misc{gemeo_arch_v1_2026,
  title  = {GEMEO Architecture Specification v1.0:
            Reference architecture for patient world models},
  author = {Timmers, Dimas and Kawassaki, Alexandre and {the Raras AI team}},
  year   = {2026},
  url    = {https://huggingface.co/Raras-AI/gemeo-arch},
}
```

⚠️ Research only. Not a medical device. No clinical use.