GEMEO/SUS — Recurrence-Aware Patient World Model
The flagship instance of GEMEO Architecture v2.0: a patient world model for rare disease, trained on real Brazilian SUS (Unified Health System) data from DATASUS. It predicts novel clinical events and long-context outcomes, grounded in a biomedical knowledge graph, and follows the lineage of Dreamer (Hafner 2025), Diffusion Forcing (Chen, NeurIPS 2024), Sora, and Genie.
Family: gemeo-arch (architecture) · gemeo-sus (this, flagship) · gemeo-twin-stack (6-mode app layer) · rarebench-br-trajectory (benchmark)
State-of-the-art results
Evaluated on the public RareBench-BR Trajectory v2 (RBT-v2) benchmark plus a new-onset task. Every task includes a mandatory strong baseline scored on the same candidate space, and results carry 95% bootstrap confidence intervals. GEMEO leads on every novelty and long-context task:
| Task | GEMEO | Strong baseline | Margin |
|---|---|---|---|
| New-onset prediction (Top-1) | 53.7% | 38.2% (frequency) | +15.5 pp |
| Will-change (AUROC) | 0.906 | 0.889 (count-based) | +0.017 |
| Transition-within-12mo (AUROC) | 0.827 | 0.790 (count-based) | +0.037 |
| Treatment discontinuation (AUROC) | 0.838 | 0.696 (count-based) | +0.142 |
Long-context outcomes, especially treatment discontinuation (patients dropping out of treatment drive poor outcomes in rare disease), are where the world model's learned representation pulls clearly ahead of count-based methods; the largest margin is +0.142 AUROC on discontinuation. This is consistent with what the EHR literature predicts for context-rich tasks (arXiv 2511.00782). The recurrence-aware objective steers the model toward predicting novel events rather than repeats.
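The card reports 95% bootstrap CIs but does not include its evaluation code. One common way to put an interval on an AUROC margin such as +0.142 is a paired percentile bootstrap: resample patients with replacement, score both models on the same resample, and take percentiles of the difference. The sketch below assumes binary labels and one score per patient; `auroc` and `paired_bootstrap_delta` are hypothetical helpers, and the data in the usage lines is synthetic, not RBT-v2.

```python
import random
from typing import List, Sequence, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC via the Mann-Whitney U statistic, using average ranks for tied scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks: List[float] = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def paired_bootstrap_delta(labels, model_scores, baseline_scores,
                           n_boot: int = 1000, alpha: float = 0.05,
                           seed: int = 0) -> Tuple[float, Tuple[float, float]]:
    """Point estimate and percentile CI for AUROC(model) - AUROC(baseline)."""
    rng = random.Random(seed)
    n = len(labels)
    deltas = []
    while len(deltas) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # same patients for both models
        y = [labels[i] for i in idx]
        if 0 < sum(y) < n:  # AUROC is undefined if a resample has only one class
            deltas.append(auroc(y, [model_scores[i] for i in idx])
                          - auroc(y, [baseline_scores[i] for i in idx]))
    deltas.sort()
    lo = deltas[int((alpha / 2) * n_boot)]
    hi = deltas[int((1 - alpha / 2) * n_boot) - 1]
    point = auroc(labels, model_scores) - auroc(labels, baseline_scores)
    return point, (lo, hi)


# Synthetic usage: an informative model versus an uninformative baseline.
rng = random.Random(1)
labels = [i % 2 for i in range(200)]
model_scores = [y + rng.gauss(0, 0.5) for y in labels]
baseline_scores = [rng.random() for _ in labels]
point, (lo, hi) = paired_bootstrap_delta(labels, model_scores, baseline_scores, n_boot=500)
print(f"delta AUROC = {point:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

Pairing matters here: because both models are scored on the same resampled patients, the shared patient-level noise cancels in the difference, which usually gives a tighter interval than bootstrapping each AUROC separately.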
Architecture
```
GEMEO/SUS (19.97M params)
├── Token embedding (tied with LM head)
├── PositionalFeatureEmbed(age, calendar_year, position)
├── 8 × Transformer blocks (SwiGLU + RMSNorm + RoPE + AdaLN-Zero)
│   └── Gated PrimeKG cross-attention (tanh(α), α init=0)
└── Tied LM head
```
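The gate tanh(α) with α initialized to 0 means the KG branch contributes nothing at the start of training (the same tanh-gating trick used in Flamingo's gated cross-attention). A minimal PyTorch sketch of that pattern, assuming standard multi-head attention from patient tokens to PrimeKG node embeddings; `GatedKGCrossAttention`, the head count, and the `LayerNorm` standing in for RMSNorm are illustrative assumptions, not the GEMEO source.

```python
import torch
import torch.nn as nn


class GatedKGCrossAttention(nn.Module):
    """Hypothetical sketch: patient-token states attend to KG node embeddings,
    and the attended signal is added through a tanh(alpha) gate with alpha = 0 at init."""

    def __init__(self, d_model: int, n_heads: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)  # stand-in for the card's RMSNorm
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.alpha = nn.Parameter(torch.zeros(1))  # gate parameter, init = 0

    def forward(self, x, kg_nodes, kg_pad_mask=None):
        # x: (B, T, d) patient tokens; kg_nodes: (B, N, d) ego-subgraph nodes
        # kg_pad_mask: (B, N) bool, True marks padded node slots to ignore
        attended, _ = self.attn(self.norm(x), kg_nodes, kg_nodes,
                                key_padding_mask=kg_pad_mask)
        return x + torch.tanh(self.alpha) * attended


torch.manual_seed(0)
block = GatedKGCrossAttention(d_model=32)
x = torch.randn(2, 10, 32)   # 2 patients, 10 events each
kg = torch.randn(2, 5, 32)   # 5 KG nodes per patient
mask = torch.tensor([[False, False, False, False, True],
                     [False, False, False, False, False]])
out = block(x, kg, mask)
print(torch.equal(out, x))   # identity at init because tanh(0) = 0
```

Starting the gate at zero lets the KG conditioning be added to a warm-started backbone without perturbing its initial predictions; the gradient with respect to α is nonzero at α = 0, so the model can open the gate as far as the KG signal proves useful.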
- Backbone: Causal Diffusion Forcing transformer with an independent noise level σ per token (Chen et al., NeurIPS 2024).
- Training objective: recurrence-weighted loss (RAVEN, arXiv 2603.24562). First occurrences of an event carry full weight and repeats are down-weighted, so the model learns to predict genuinely new events rather than echo its history.
- Conditioning: gated cross-attention to a real PrimeKG ego-subgraph (disease–gene and disease–phenotype edges).
- Training recipe: warm start, warmup-stable-decay (WSD) learning-rate schedule, bf16 precision; about 5 minutes on a single H100 (≈ $0.40).
- Data: 42,265 DATASUS rare-disease trajectories in MEDS (Medical Event Data Standard) v0.4.1 format.
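The card does not spell out RAVEN's exact weighting formula. As a hedged illustration of the idea in the list above (full weight on first occurrences), the sketch below down-weights any target code that already appeared earlier in the same sequence by a constant `repeat_weight`. The function names and the 0.1 default are hypothetical, and tracking "seen" codes only within the target sequence is a simplifying assumption.

```python
import torch
import torch.nn.functional as F


def first_occurrence_weights(targets: torch.Tensor, repeat_weight: float = 0.1) -> torch.Tensor:
    """(B, T) integer codes -> (B, T) float weights: 1.0 on a code's first
    appearance in its sequence, `repeat_weight` (hypothetical) on later repeats."""
    weights = torch.full(targets.shape, repeat_weight, dtype=torch.float32)
    for b, row in enumerate(targets.tolist()):
        seen = set()
        for t, code in enumerate(row):
            if code not in seen:
                weights[b, t] = 1.0
                seen.add(code)
    return weights


def recurrence_weighted_ce(logits: torch.Tensor, targets: torch.Tensor,
                           repeat_weight: float = 0.1) -> torch.Tensor:
    """Next-event cross-entropy averaged with first-occurrence weights.
    logits: (B, T, V); targets: (B, T)."""
    vocab = logits.size(-1)
    per_token = F.cross_entropy(logits.reshape(-1, vocab), targets.reshape(-1),
                                reduction="none").view_as(targets)
    w = first_occurrence_weights(targets, repeat_weight).to(per_token.device)
    return (w * per_token).sum() / w.sum()  # weighted mean over tokens


targets = torch.tensor([[5, 7, 5, 5, 9]])      # code 5 repeats twice
print(first_occurrence_weights(targets))       # weights 1, 1, 0.1, 0.1, 1
logits = torch.randn(1, 5, 10, requires_grad=True)
loss = recurrence_weighted_ce(logits, targets)
loss.backward()
```

With `repeat_weight=1.0` this reduces to ordinary mean cross-entropy; lowering it shifts the gradient budget toward events the patient has not had before, which is what the new-onset task rewards.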
Subgroup performance shows no notable disparities across pediatric, adult, and elderly age bands. Two checkpoints are bundled: cdf_v6_raven.pt is the recurrence-aware flagship, and cdf_v7_10digit_rbt.pt is the 10-digit variant used for the RBT-v2 transition evaluation.
Usage
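```python
import sys
from dataclasses import fields
import torch
import torch.nn as nn
sys.path.append("src")  # assumes the repo's src/ directory holds diffusion_forcing_v13.py
from diffusion_forcing_v13 import CDFv13Transformer, CDFv13Config

class PositionalFeatureEmbed(nn.Module):
    """Projects (age, calendar year, sequence position) into a d-dim embedding."""
    def __init__(self, d):
        super().__init__()
        self.age_proj = nn.Linear(1, d // 4)
        self.year_proj = nn.Linear(1, d // 4)
        self.pos_proj = nn.Linear(1, d // 4)
        self.combine = nn.Linear(3 * (d // 4), d)
        self.norm = nn.LayerNorm(d)

    def forward(self, ages, years, positions):
        # Cast to float (int/float64 inputs), then scale to [0, 1]: age 0-100, year 2010-2030, position 0-512.
        a = ages.float().clamp(0, 100) / 100
        y = (years.float() - 2010).clamp(0, 20) / 20
        p = (positions.float() / 512).clamp(0, 1)
        e = torch.cat([self.age_proj(a.unsqueeze(-1)),
                       self.year_proj(y.unsqueeze(-1)),
                       self.pos_proj(p.unsqueeze(-1))], dim=-1)
        return self.norm(self.combine(e))

# weights_only=False unpickles arbitrary objects: only load checkpoints you trust.
ck = torch.load("cdf_v6_raven.pt", map_location="cpu", weights_only=False)
known = {f.name for f in fields(CDFv13Config)}  # ignore config keys CDFv13Config does not define
cfg = CDFv13Config(**{k: v for k, v in ck["config"].items() if k in known})
model = CDFv13Transformer(cfg); model.load_state_dict(ck["model_state"])
pfe = PositionalFeatureEmbed(cfg.d_model); pfe.load_state_dict(ck["pos_feat_state"])
model.eval(); pfe.eval()  # inference mode
```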
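To check what the positional features look like before projection, the scaling in `PositionalFeatureEmbed.forward` can be reproduced on its own. The helper `normalize_features` is hypothetical and only copies that arithmetic; it shows that ages saturate at 100, calendar years are clipped to the 2010 to 2030 window, and positions saturate at token 512, so inputs beyond those limits are indistinguishable from the boundary values.

```python
import torch


def normalize_features(ages, years, positions):
    """Hypothetical helper: the same [0, 1] scaling PositionalFeatureEmbed applies
    before its linear projections."""
    a = ages.float().clamp(0, 100) / 100
    y = (years.float() - 2010).clamp(0, 20) / 20
    p = (positions.float() / 512).clamp(0, 1)
    return a, y, p


ages = torch.tensor([4, 45, 103])          # 103 is clamped to 100
years = torch.tensor([2008, 2018, 2024])   # 2008 is clamped to 2010
positions = torch.tensor([0, 128, 600])    # 600 is clamped to 512
a, y, p = normalize_features(ages, years, positions)
# a ≈ [0.04, 0.45, 1.0], y ≈ [0.0, 0.4, 0.7], p ≈ [0.0, 0.25, 1.0]
```

Events recorded before 2010 or after 2030, and positions past 512 in long trajectories, therefore all share one embedding value for that feature.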
Scope
GEMEO leads on novelty (new-onset) and long-context outcomes (discontinuation, time-to-transition, will-change). For single-step procedure transitions, count-based methods remain competitive; the world model's value lies in long-range trajectory reasoning. Rigorous counterfactual/interventional validation and the full three-pillar loop (KG proposer + agentic verifier) are not covered by this release; they extend naturally to a multimodal substrate (clinical notes, whole-exome sequencing, labs).
Citation
```bibtex
@misc{gemeo_sus_2026,
  title  = {GEMEO/SUS: Recurrence-Aware Patient World Model for Rare Disease},
  author = {Timmers, Dimas and the Raras AI team},
  year   = {2026},
  url    = {https://huggingface.co/Raras-AI/gemeo-sus}
}
```
⚠️ Research only. Not a medical device. No clinical use without physician oversight and applicable regulatory clearance.