GEMEO Architecture v1.0
A reference architecture for patient world models. Six principles, pluggable substrate, three open instances. Not a model, but a recipe for building one.
New to GEMEO? Want to run it on your own data? See `ADAPTING_TO_A_NEW_DATABASE.md`, a step-by-step guide to instantiating `gemeo-<your-substrate>` on any MEDS v0.4.1 EHR.
This repo contains the architecture specification and a reference implementation (Apache-2.0 source, no weights). To use it:
- Read `gemeo_architecture_spec_v1.md` for the 6-principle conformance definition.
- Copy `reference_impl/` into your repo, adapt it to your substrate (any MEDS v0.4.1-compliant EHR), and train.
- Name your instance `gemeo-<substrate>-v<n>` and submit a conformance report.
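Since every substrate must be MEDS v0.4.1-compliant, it helps to picture the data as MEDS event rows. The sketch below illustrates that shape in plain Python; it is not part of `reference_impl/`. `MedsEvent` mirrors the `(subject_id, time, code, value)` tuple (the MEDS schema names the value column `numeric_value`), `to_trajectories` is a hypothetical helper that groups rows per subject in time order, and the codes are made-up examples.

```python
from dataclasses import dataclass
from datetime import datetime
from itertools import groupby
from typing import Optional


@dataclass(frozen=True)
class MedsEvent:
    """One MEDS-style event row: (subject_id, time, code, value)."""
    subject_id: int
    time: Optional[datetime]  # None marks a static, time-less fact
    code: str
    numeric_value: Optional[float] = None


def to_trajectories(events):
    """Hypothetical helper: group events per subject, ordered by time.

    In this sketch, static (None-time) events sort before timed events.
    """
    def sort_key(e):
        return (e.subject_id, e.time is not None, e.time or datetime.min)

    ordered = sorted(events, key=sort_key)
    return {sid: list(group)
            for sid, group in groupby(ordered, key=lambda e: e.subject_id)}


# Made-up example rows (codes are illustrative, not from any real substrate).
events = [
    MedsEvent(2, datetime(2021, 3, 1), "DX//G71.0"),
    MedsEvent(1, datetime(2020, 5, 2), "LAB//CK", 850.0),
    MedsEvent(1, None, "GENDER//F"),
    MedsEvent(1, datetime(2020, 1, 10), "DX//G71.0"),
]
traj = to_trajectories(events)
print([e.code for e in traj[1]])  # ['GENDER//F', 'DX//G71.0', 'LAB//CK']
```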
Open instances (May 2026)
Architecture spec: v1.0 is a single world model (6 principles); v2.0 is the three-pillar Propose→Simulate→Verify design (10 principles, recurrence-aware, L3-targeted), specified in `gemeo_architecture_spec_v2.md`.
| Instance | Substrate | Headline | Status |
|---|---|---|---|
| Raras-AI/gemeo-sus (flagship, recurrence-aware) | DATASUS (42K) | new-onset Top-1 53.7% vs 38.2%; discontinuation AUROC 0.838 vs 0.696; will-change 0.906; transition-12mo 0.827; wins every novelty & long-context task | ✅ released |
| Raras-AI/gemeo-twin-stack | application layer | NeuralSurv + 6-mode digital twin | ✅ released |
| Raras-AI/gemeo-mayo-v? | Mayo Clinic Platform | multimodal substrate; full L3 counterfactual validation | in proposal |
| Raras-AI/gemeo-mimic-demo | MIMIC-IV-DEMO | non-SUS architecture proof | in progress |
Flagship results. The recurrence-aware objective trains the model to predict novel events rather than repeats of events already in the history, so its gains reflect real signal, not autocorrelation. On the public RareBench-BR Trajectory v2 benchmark, with mandatory baselines evaluated on the same candidate space and 95% bootstrap CIs, GEMEO leads on every novelty and long-context task: new-onset Top-1 53.7% vs 38.2%, will-change AUROC 0.906, transition-within-12mo 0.827, and treatment-discontinuation AUROC 0.838 vs 0.696. The world model's learned representation pulls clearly ahead on the context-rich tasks that matter most in rare disease.
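As an illustration of the paired comparison behind such CIs, the sketch below computes a 95% percentile bootstrap interval for the Top-1 accuracy gap between a model and a baseline scored on the same examples. It shows the general technique under that assumption and is not the RareBench-BR evaluation code; `paired_bootstrap_ci` is a hypothetical helper and the inputs are synthetic.

```python
import random


def paired_bootstrap_ci(model_correct, baseline_correct,
                        n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the accuracy gap (model - baseline).

    Both lists hold 0/1 correctness for the same examples in the same order,
    so every resample keeps each model/baseline pair together.
    """
    if len(model_correct) != len(baseline_correct):
        raise ValueError("inputs must be paired on the same examples")
    n = len(model_correct)
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        diffs.append(sum(model_correct[i] - baseline_correct[i] for i in idx) / n)
    diffs.sort()
    lo = diffs[int((alpha / 2) * n_boot)]
    hi = diffs[int((1 - alpha / 2) * n_boot) - 1]
    point = (sum(model_correct) - sum(baseline_correct)) / n
    return point, lo, hi


# Synthetic correctness vectors (not benchmark data).
rng = random.Random(1)
model = [1 if rng.random() < 0.6 else 0 for _ in range(500)]
base = [1 if rng.random() < 0.4 else 0 for _ in range(500)]
point, lo, hi = paired_bootstrap_ci(model, base)
print(f"gap = {point:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

Resampling examples rather than each system independently is what makes the interval reflect the gap on a shared candidate space.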
The six architectural principles
- Diffusion Forcing backbone with per-token noise level σ ∼ 𝒰(0, 1)
- Gated KG cross-attention with gate tanh(α), α initialized to 0; real PrimeKG edges
- MEDS v0.4.1 substrate: `(subject_id, time, code, value)`
- Bootstrap-then-learn pattern per inference mode
- Bidirectional health-system grounding (formulary re-rank)
- Audit-driven training (Chinchilla scaling + SOTA component validation)
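To make the first principle concrete: in Diffusion Forcing each token is corrupted with its own independently sampled noise level, rather than one level shared by the whole sequence. The sketch below assumes a simple linear interpolation between the clean token and Gaussian noise; the actual noise schedule, parameterization, and training target are defined in the spec, not here, and `diffusion_forcing_corrupt` is a hypothetical name.

```python
import random


def diffusion_forcing_corrupt(tokens, rng):
    """Corrupt each token with its own noise level sigma_t ~ U(0, 1).

    tokens: list of embedding vectors (lists of floats), one per timestep.
    Assumed schedule: x_noisy = (1 - sigma) * x + sigma * eps, so sigma = 0
    leaves a token clean and sigma -> 1 approaches pure Gaussian noise.
    Returns (noisy_tokens, sigmas, noises).
    """
    noisy, sigmas, noises = [], [], []
    for x in tokens:
        sigma = rng.random()  # sampled independently for every token
        eps = [rng.gauss(0.0, 1.0) for _ in x]
        noisy.append([(1.0 - sigma) * xi + sigma * ei for xi, ei in zip(x, eps)])
        sigmas.append(sigma)
        noises.append(eps)
    return noisy, sigmas, noises


rng = random.Random(0)
seq = [[0.5, -1.0], [2.0, 0.0], [1.0, 1.0]]  # toy 2-d token embeddings
noisy, sigmas, noises = diffusion_forcing_corrupt(seq, rng)
print([round(s, 3) for s in sigmas])  # three different noise levels
```

A denoiser would then receive the noisy tokens together with their σ values and learn to recover the clean tokens (or the noise). Because σ varies per token, the same model can be conditioned on a clean history (σ = 0) while denoising future tokens.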
Full definitions and conformance tests are in `gemeo_architecture_spec_v1.md`.
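The KG principle uses the zero-initialized tanh gate familiar from Flamingo's gated cross-attention: the KG branch is added as a residual scaled by tanh(α), and since tanh(0) = 0 the block starts as an exact identity, leaving the backbone undisturbed until α is learned away from zero. A minimal pure-Python sketch follows; it omits learned Q/K/V projections and multi-head structure, and the KG node embeddings are hypothetical vectors standing in for PrimeKG entities.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(query, keys, values):
    """Single-head scaled dot-product attention of one query over KG nodes."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    return [sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))]


def gated_kg_block(hidden, kg_embeddings, alpha):
    """Residual update h + tanh(alpha) * CrossAttn(h, KG) for every token."""
    gate = math.tanh(alpha)
    out = []
    for h in hidden:
        ctx = cross_attention(h, kg_embeddings, kg_embeddings)
        out.append([hi + gate * ci for hi, ci in zip(h, ctx)])
    return out


hidden = [[1.0, 0.0], [0.0, 1.0]]                # toy token states
kg = [[1.0, 1.0], [-1.0, 0.5], [0.2, -0.3]]      # hypothetical KG node embeddings
assert gated_kg_block(hidden, kg, alpha=0.0) == hidden  # identity at init
print(gated_kg_block(hidden, kg, alpha=0.5))            # KG context now mixed in
```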
Licensing & scope
Open the recipe, keep the spice.
- This repo (architecture spec + reference implementation): Apache-2.0 — adopt and build on it freely, with attribution. A reference architecture only becomes a standard if people can use it.
- Trained weights (`gemeo-sus`, …) and the RareBench-BR benchmark: CC-BY-NC 4.0. No commercial reuse of our trained artifacts or data without a separate agreement.
- Held back on purpose: the proprietary DATASUS ETL (raw extraction, CNS-hash linkage, trajectory construction), the cohort/preprocessing heuristics, and the future multimodal (Mayo) substrate. The reference `meds_export.py` consumes an already-built trajectory file; it does not produce one.
Citation
@misc{gemeo_arch_v1_2026,
title = {GEMEO Architecture Specification v1.0:
Reference architecture for patient world models},
author = {Timmers, Dimas and Kawassaki, Alexandre and the Raras AI team},
year = {2026},
url = {https://huggingface.co/Raras-AI/gemeo-arch},
}
⚠️ Research only. Not a medical device. No clinical use.
