---
license: apache-2.0
language: [en, pt]
tags:
- world-model
- patient-digital-twin
- reference-architecture
- diffusion-forcing
- meds
- primekg
- rare-disease
library_name: pytorch
pipeline_tag: time-series-forecasting
---
# GEMEO Architecture v1.0
> **A reference architecture for patient world models.** Six principles,
> pluggable substrate, three open instances. *Not a model* — a recipe for
> building a model.
![GEMEO Architecture v2.0 — Propose → Simulate → Verify](./figure1_gemeo_architecture.png)
> **New to GEMEO? Want to run it on your own data?** See [**ADAPTING_TO_A_NEW_DATABASE.md**](./ADAPTING_TO_A_NEW_DATABASE.md) — a step-by-step guide to instantiate `gemeo-<your-substrate>` on any MEDS v0.4.1 EHR.
This repo contains the **architecture specification** and a **reference
implementation** (Apache-2.0 source, no weights). To use:
1. Read [`gemeo_architecture_spec_v1.md`](./gemeo_architecture_spec_v1.md)
for the 6-principle conformance definition.
2. Copy `reference_impl/` into your repo, adapt to your substrate (any
MEDS v0.4.1-compliant EHR), train.
3. Name your instance `gemeo-<substrate>-v<n>` and submit a conformance
report.
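
The naming convention in step 3 can be checked mechanically. The sketch below is illustrative only: the README gives just the template `gemeo-<substrate>-v<n>`, so the allowed characters in `<substrate>` (lowercase alphanumerics separated by hyphens) and the helper name `parse_instance_name` are assumptions, not part of the spec.

```python
import re

# ASSUMPTION: <substrate> is lowercase alphanumerics, optionally hyphen-separated,
# and <n> is a non-negative integer. The spec may define this differently.
INSTANCE_NAME = re.compile(
    r"gemeo-(?P<substrate>[a-z0-9]+(?:-[a-z0-9]+)*)-v(?P<version>\d+)"
)


def parse_instance_name(name: str):
    """Return (substrate, version) if the name fits the template, else None."""
    match = INSTANCE_NAME.fullmatch(name)
    if match is None:
        return None
    return match.group("substrate"), int(match.group("version"))


if __name__ == "__main__":
    # Hypothetical instance names, used only to exercise the pattern.
    for candidate in ["gemeo-mimic-v1", "gemeo-mimic-demo-v2", "gemeo-sus"]:
        print(candidate, "->", parse_instance_name(candidate))
```

A name without a `-v<n>` suffix returns `None` under this pattern, which makes it easy to flag before a conformance report is submitted.
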
## Open instances (May 2026)
**Architecture spec versions:** v1.0 defines a single world model (6 principles). **v2.0 defines a three-pillar Propose→Simulate→Verify design** (10 principles, recurrence-aware, L3-targeted); see [`gemeo_architecture_spec_v2.md`](./gemeo_architecture_spec_v2.md).
| Instance | Substrate | Headline | Status |
|---|---|---|---|
| [`Raras-AI/gemeo-sus`](https://huggingface.co/Raras-AI/gemeo-sus) **(flagship, recurrence-aware)** | DATASUS (42K) | **new-onset Top-1 53.7% vs 38.2%; discontinuation AUROC 0.838 vs 0.696; will-change 0.906; transition-12mo 0.827 — wins every novelty & long-context task** | ✅ released |
| [`Raras-AI/gemeo-twin-stack`](https://huggingface.co/Raras-AI/gemeo-twin-stack) | application layer | NeuralSurv + 6-mode digital twin | ✅ released |
| `Raras-AI/gemeo-mayo-v?` | Mayo Clinic Platform | multimodal substrate; full L3 counterfactual validation | in proposal |
| `Raras-AI/gemeo-mimic-demo` | MIMIC-IV-DEMO | non-SUS architecture proof | in progress |
> **Flagship results.** The recurrence-aware objective trains the model to predict *novel* events rather than repeats of earlier ones, so the gains reflect new-event prediction rather than autocorrelation. On the public [RareBench-BR Trajectory v2](https://huggingface.co/datasets/Raras-AI/rarebench-br-trajectory) benchmark, with mandatory baselines evaluated on the same candidate space and 95% bootstrap CIs, GEMEO leads on **every** novelty and long-context task: new-onset Top-1 **53.7%** vs 38.2%, will-change AUROC **0.906**, transition-within-12mo **0.827**, treatment discontinuation AUROC **0.838** vs 0.696. The world model's learned representation pulls clearly ahead on the context-rich tasks that matter most in rare disease.
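
For readers unfamiliar with bootstrap confidence intervals, here is a minimal sketch of a percentile bootstrap over a Top-1 accuracy. It is not the benchmark's evaluation code: the percentile method, the resample count, and the toy `hits` vector are all assumptions made for illustration, and the numbers it produces have no relation to the results above.

```python
import random


def bootstrap_ci(correct, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI for the mean of a 0/1 correctness vector.

    Returns (point_estimate, lower, upper). ASSUMPTION: the benchmark may use
    a different bootstrap variant; this is the plain percentile method.
    """
    rng = random.Random(seed)
    n = len(correct)
    stats = []
    for _ in range(n_resamples):
        # Resample n cases with replacement and recompute the accuracy.
        resample = [correct[rng.randrange(n)] for _ in range(n)]
        stats.append(sum(resample) / n)
    stats.sort()
    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return sum(correct) / n, lower, upper


if __name__ == "__main__":
    # Synthetic toy data: 60 correct and 40 incorrect Top-1 predictions.
    hits = [1] * 60 + [0] * 40
    point, lower, upper = bootstrap_ci(hits)
    print(f"Top-1 = {point:.3f}, 95% CI = [{lower:.3f}, {upper:.3f}]")
```

Running the same resampling for the model and for each baseline on the same cases is what makes the reported intervals comparable.
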
## The six architectural principles
1. **Diffusion Forcing backbone** with per-token σ ∼ 𝒰(0, 1)
2. **Gated KG cross-attention** with tanh(α), α init = 0; real PrimeKG edges
3. **MEDS v0.4.1 substrate** `(subject_id, time, code, value)`
4. **Bootstrap-then-learn** pattern per inference mode
5. **Bidirectional health-system grounding** (formulary re-rank)
6. **Audit-driven training** (Chinchilla scaling + SOTA component validation)
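
Principles 1 and 2 can be pictured with a dependency-free sketch. It is not taken from `reference_impl/`: the function names are hypothetical, the linear noising rule `(1 - σ)·x + σ·ε` is an assumption (the spec may use a different schedule), and `kg_context` stands in for the output of a cross-attention layer over PrimeKG that is not shown. The gate is written as a Flamingo-style residual, which is one common reading of "tanh(α), α init = 0".

```python
import math
import random


def sample_per_token_sigma(num_tokens, rng):
    """Principle 1: draw one independent noise level per token, sigma ~ U(0, 1)."""
    return [rng.random() for _ in range(num_tokens)]


def noise_tokens(tokens, sigmas, rng):
    """Noise each token vector at its own level.

    ASSUMPTION: linear interpolation between the clean value and Gaussian
    noise; the actual noising rule is defined by the spec, not here.
    """
    return [
        [(1.0 - s) * x + s * rng.gauss(0.0, 1.0) for x in token]
        for token, s in zip(tokens, sigmas)
    ]


def gated_residual(hidden, kg_context, alpha):
    """Principle 2: hidden + tanh(alpha) * kg_context.

    With alpha = 0 the gate is exactly 0, so the layer starts as an identity
    map and the knowledge-graph signal is blended in as alpha is learned.
    """
    gate = math.tanh(alpha)
    return [
        [h + gate * c for h, c in zip(h_row, c_row)]
        for h_row, c_row in zip(hidden, kg_context)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    tokens = [[0.5, -1.0], [2.0, 0.0], [1.5, 1.5]]  # toy 3-token sequence
    sigmas = sample_per_token_sigma(len(tokens), rng)
    noisy = noise_tokens(tokens, sigmas, rng)
    kg_context = [[1.0, 1.0]] * len(tokens)  # placeholder attention output
    print(gated_residual(noisy, kg_context, alpha=0.0) == noisy)
```

Initialising α at zero means a freshly added knowledge-graph branch cannot perturb the backbone at the start of training, which is the usual motivation for this kind of gate.
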
Full definitions and conformance tests in [`gemeo_architecture_spec_v1.md`](./gemeo_architecture_spec_v1.md).
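
As a rough picture of the principle 3 event tuple, the sketch below models one row as a small dataclass and orders a trajectory by subject and time. The field names follow the tuple in this README; the field types, the `Event` and `sort_trajectory` names, and the example codes are assumptions for illustration, and the MEDS v0.4.1 schema itself remains the authority on column names and types.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class Event:
    """One event row, following the (subject_id, time, code, value) tuple."""
    subject_id: int
    time: datetime
    code: str
    value: Optional[float] = None  # None for events without a measurement


def sort_trajectory(events: List[Event]) -> List[Event]:
    """Order events by subject, then chronologically within each subject."""
    return sorted(events, key=lambda e: (e.subject_id, e.time))


if __name__ == "__main__":
    # Hypothetical codes and values, used only to show the shape of the data.
    events = [
        Event(2, datetime(2021, 3, 1), "DX//EXAMPLE_A"),
        Event(1, datetime(2020, 6, 15), "LAB//EXAMPLE_B", 4.2),
        Event(1, datetime(2019, 1, 10), "DX//EXAMPLE_C"),
    ]
    for event in sort_trajectory(events):
        print(event)
```
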
## Licensing & scope
**Open the recipe, keep the spice.**
- **This repo (architecture spec + reference implementation):** Apache-2.0 — adopt and build on it freely, with attribution. A reference architecture only becomes a standard if people can use it.
- **Trained weights** (`gemeo-sus`, …) and the **RareBench-BR benchmark:** CC-BY-NC 4.0 — no commercial reuse of our trained artifacts/data without a separate agreement.
- **Held back on purpose:** the proprietary DATASUS ETL (raw extraction, CNS-hash linkage, trajectory construction), the cohort/preprocessing heuristics, and the future multimodal (Mayo) substrate. The reference `meds_export.py` consumes an already-built trajectory file — it does not produce one.
## Citation
```bibtex
@misc{gemeo_arch_v1_2026,
title = {GEMEO Architecture Specification v1.0:
Reference architecture for patient world models},
author = {Timmers, Dimas and Kawassaki, Alexandre and the Raras AI team},
year = {2026},
url = {https://huggingface.co/Raras-AI/gemeo-arch},
}
```
⚠️ Research only. Not a medical device. No clinical use.