GEMEO Architecture v1.0

A reference architecture for patient world models. Six principles, pluggable substrate, three open instances. Not a model — a recipe for building a model.

GEMEO Architecture v2.0 — Propose → Simulate → Verify

New to GEMEO? Want to run it on your own data? See ADAPTING_TO_A_NEW_DATABASE.md — a step-by-step guide to instantiate gemeo-<your-substrate> on any MEDS v0.4.1 EHR.

This repo contains the architecture specification and a reference implementation (Apache-2.0 source, no weights). To use:

  1. Read gemeo_architecture_spec_v1.md for the 6-principle conformance definition.
  2. Copy reference_impl/ into your repo, adapt to your substrate (any MEDS v0.4.1-compliant EHR), train.
  3. Name your instance gemeo-<substrate>-v<n> and submit a conformance report.
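
The naming convention in step 3 can be checked mechanically before a conformance report is submitted. The following is a minimal sketch, not part of the reference implementation: the function name `parse_instance_name` is hypothetical, and the allowed character set for `<substrate>` (lowercase alphanumerics with internal hyphens) and the requirement that `<n>` be a positive integer are assumptions, since the convention above only gives the overall shape.

```python
import re

# Assumed pattern for gemeo-<substrate>-v<n>. The character set allowed in
# <substrate> is a guess for illustration, not taken from the spec.
_INSTANCE_RE = re.compile(
    r"^gemeo-(?P<substrate>[a-z0-9]+(?:-[a-z0-9]+)*)-v(?P<n>[1-9][0-9]*)$"
)


def parse_instance_name(name: str) -> tuple[str, int]:
    """Return (substrate, version) or raise ValueError if the name is malformed."""
    match = _INSTANCE_RE.match(name)
    if match is None:
        raise ValueError(f"not a gemeo-<substrate>-v<n> name: {name!r}")
    return match.group("substrate"), int(match.group("n"))
```

A placeholder such as `gemeo-mayo-v?` is rejected by this check, which is the intended behaviour for a name whose version has not been assigned yet.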

Open instances (May 2026)

Architecture spec versions: v1.0 defines a single world model (6 principles). v2.0 defines a three-pillar Propose→Simulate→Verify design (10 principles, recurrence-aware, L3-targeted); see gemeo_architecture_spec_v2.md.

| Instance | Substrate | Headline | Status |
|---|---|---|---|
| Raras-AI/gemeo-sus (flagship, recurrence-aware) | DATASUS (42K) | new-onset Top-1 53.7% vs 38.2%; discontinuation AUROC 0.838 vs 0.696; will-change 0.906; transition-12mo 0.827; wins every novelty & long-context task | ✅ released |
| Raras-AI/gemeo-twin-stack | application layer | NeuralSurv + 6-mode digital twin | ✅ released |
| Raras-AI/gemeo-mayo-v? | Mayo Clinic Platform | multimodal substrate; full L3 counterfactual validation | in proposal |
| Raras-AI/gemeo-mimic-demo | MIMIC-IV-DEMO | non-SUS architecture proof | in progress |

Flagship results. The recurrence-aware objective trains the model to predict novel events rather than repeats of earlier ones, so the reported gains are not an artifact of autocorrelation in the trajectory. On the public RareBench-BR Trajectory v2 benchmark, with mandatory baselines scored on the same candidate space and 95% bootstrap CIs, GEMEO leads on every novelty and long-context task: new-onset Top-1 53.7% vs 38.2%, treatment-discontinuation AUROC 0.838 vs 0.696, will-change AUROC 0.906, and transition-within-12mo 0.827. The learned representation shows its clearest advantage on the context-rich tasks that matter most in rare disease.
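
For readers unfamiliar with the 95% bootstrap CIs mentioned above, the sketch below shows a standard percentile bootstrap over per-example 0/1 outcomes (for instance, whether the Top-1 prediction was correct). It illustrates the general technique only: the function name `bootstrap_ci`, the number of resamples, and the choice of the percentile method are assumptions, and the benchmark's own evaluation code may differ.

```python
import random


def bootstrap_ci(correct: list[int], n_resamples: int = 2000,
                 level: float = 0.95, seed: int = 0) -> tuple[float, float]:
    """Percentile bootstrap confidence interval for the mean of 0/1 outcomes."""
    rng = random.Random(seed)
    n = len(correct)
    means = []
    for _ in range(n_resamples):
        # Resample the evaluation set with replacement and recompute the metric.
        sample = rng.choices(correct, k=n)
        means.append(sum(sample) / n)
    means.sort()
    # Take the empirical quantiles that bracket the central `level` mass.
    lo_idx = int((1 - level) / 2 * n_resamples)
    hi_idx = int((1 + level) / 2 * n_resamples) - 1
    return means[lo_idx], means[hi_idx]
```

Calling `bootstrap_ci(outcomes)` on a list of per-patient correctness flags returns the lower and upper bounds. Comparing a model with a baseline on the same resampled indices (a paired bootstrap) would be a natural extension but is not shown here.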

The six architectural principles

  1. Diffusion Forcing backbone with per-token σ ∼ 𝒰(0, 1)
  2. Gated KG cross-attention with tanh(α), α init = 0; real PrimeKG edges
  3. MEDS v0.4.1 substrate (subject_id, time, code, value)
  4. Bootstrap-then-learn pattern per inference mode
  5. Bidirectional health-system grounding (formulary re-rank)
  6. Audit-driven training (Chinchilla scaling + SOTA component validation)
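
Principle 1 can be illustrated with a small sketch. Its defining feature is that every token in a sequence receives its own independently sampled noise level instead of one level shared by the whole sequence. The code below assumes tokens are plain lists of floats and uses a linear interpolation between data and Gaussian noise as the noising rule. That rule, and the function name `noise_tokens`, are assumptions for illustration; the spec may define a different schedule.

```python
import random


def noise_tokens(tokens: list[list[float]], seed: int = 0):
    """Draw an independent sigma ~ U(0, 1) per token and noise each token."""
    rng = random.Random(seed)
    sigmas, noised = [], []
    for vec in tokens:
        sigma = rng.random()  # independent noise level for this token
        eps = [rng.gauss(0.0, 1.0) for _ in vec]
        # Assumed noising rule: linear interpolation between data and noise.
        noised.append([(1.0 - sigma) * x + sigma * e for x, e in zip(vec, eps)])
        sigmas.append(sigma)
    return sigmas, noised
```

Because the levels are independent, a single training sequence mixes nearly clean and heavily noised tokens, which is what lets one model serve both next-event prediction and full-sequence denoising.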

Full definitions and conformance tests in gemeo_architecture_spec_v1.md.
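
The gate in principle 2 is worth a short illustration. With the cross-attention branch scaled by tanh(α) and α initialised to 0, the knowledge-graph branch contributes nothing at the start of training and is opened gradually as α is learned. The sketch below is a single-head, pure-Python toy under stated assumptions: it omits the learned query/key/value projections, uses made-up node vectors in place of PrimeKG embeddings, and the function name `gated_kg_cross_attention` is hypothetical.

```python
import math


def softmax(xs: list[float]) -> list[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def gated_kg_cross_attention(tokens, kg_nodes, alpha=0.0):
    """tokens: list of d-dim queries; kg_nodes: list of d-dim keys/values."""
    gate = math.tanh(alpha)  # 0.0 when alpha = 0, so the KG branch is off
    d = len(tokens[0])
    out = []
    for q in tokens:
        # Scaled dot-product scores of this token against every KG node.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in kg_nodes]
        weights = softmax(scores)
        attended = [sum(w * k[j] for w, k in zip(weights, kg_nodes))
                    for j in range(d)]
        # Gated residual: token plus tanh(alpha) times the attended KG summary.
        out.append([qi + gate * ai for qi, ai in zip(q, attended)])
    return out
```

With `alpha=0.0` the function returns the input tokens unchanged, so adding the branch to a pretrained backbone does not perturb its behaviour at initialisation.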

Licensing & scope

Open the recipe, keep the spice.

  • This repo (architecture spec + reference implementation): Apache-2.0 — adopt and build on it freely, with attribution. A reference architecture only becomes a standard if people can use it.
  • Trained weights (gemeo-sus, …) and the RareBench-BR benchmark: CC-BY-NC 4.0 — no commercial reuse of our trained artifacts/data without a separate agreement.
  • Held back on purpose: the proprietary DATASUS ETL (raw extraction, CNS-hash linkage, trajectory construction), the cohort/preprocessing heuristics, and the future multimodal (Mayo) substrate. The reference meds_export.py consumes an already-built trajectory file — it does not produce one.

Citation

@misc{gemeo_arch_v1_2026,
  title  = {GEMEO Architecture Specification v1.0:
            Reference architecture for patient world models},
  author = {Timmers, Dimas and Kawassaki, Alexandre and the Raras AI team},
  year   = {2026},
  url    = {https://huggingface.co/Raras-AI/gemeo-arch},
}

⚠️ Research only. Not a medical device. No clinical use.
