GEMEO/SUS — Recurrence-Aware Patient World Model

The flagship instance of GEMEO Architecture v2.0 — a patient world model for rare disease, trained on real Brazilian SUS (DATASUS) data. Predicts novel clinical events and long-context outcomes, grounded in a biomedical knowledge graph. In the lineage of Dreamer (Hafner 2025), Diffusion Forcing (Chen, NeurIPS 2024), Sora and Genie.

Family: gemeo-arch (architecture) · gemeo-sus (this, flagship) · gemeo-twin-stack (6-mode app layer) · rarebench-br-trajectory (benchmark)

State-of-the-art results

Evaluated on the public RareBench-BR Trajectory v2 benchmark + a new-onset task, with mandatory baselines on the same candidate space and 95% bootstrap CIs. GEMEO leads on every novelty and long-context task:

| Task | GEMEO | Strong baseline | Margin |
| --- | --- | --- | --- |
| New-onset prediction (Top-1) | 53.7% | 38.2% (frequency) | +15.5 pp |
| Will-change (AUROC) | 0.906 | 0.889 (count-based) | +0.017 |
| Transition-within-12mo (AUROC) | 0.827 | 0.790 (count-based) | +0.037 |
| Treatment discontinuation (AUROC) | 0.838 | 0.696 (count-based) | +0.142 |
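
The 95% bootstrap CIs mentioned above can be obtained with a paired percentile bootstrap over patients. The sketch below is a minimal illustration on synthetic data, not the benchmark's own evaluation code; the function name `bootstrap_margin_ci`, the resample count, and the toy hit vectors are all assumptions made for the example.

```python
import numpy as np

def bootstrap_margin_ci(model_hits, baseline_hits, n_boot=2000, alpha=0.05, seed=0):
    """Paired percentile bootstrap for the Top-1 accuracy margin (model minus baseline)."""
    model_hits = np.asarray(model_hits, dtype=float)
    baseline_hits = np.asarray(baseline_hits, dtype=float)
    assert model_hits.shape == baseline_hits.shape
    rng = np.random.default_rng(seed)
    n = len(model_hits)
    # Resample the same patient indices for both systems so the comparison stays paired.
    idx = rng.integers(0, n, size=(n_boot, n))
    margins = model_hits[idx].mean(axis=1) - baseline_hits[idx].mean(axis=1)
    lo, hi = np.quantile(margins, [alpha / 2, 1 - alpha / 2])
    point = model_hits.mean() - baseline_hits.mean()
    return point, lo, hi

# Synthetic per-patient correctness flags (hypothetical, for illustration only).
toy_rng = np.random.default_rng(1)
model_hits = toy_rng.random(500) < 0.54
baseline_hits = toy_rng.random(500) < 0.38
point, lo, hi = bootstrap_margin_ci(model_hits, baseline_hits)
print(f"margin = {point:+.3f}, 95% CI [{lo:+.3f}, {hi:+.3f}]")
```

Resampling both systems on the same indices keeps per-patient difficulty paired, which usually gives a tighter interval on the margin than bootstrapping each system independently.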

Long-context outcomes are where the world model's learned representation pulls clearly ahead of count-based methods, most of all for treatment discontinuation (dropout drives bad outcomes in rare disease). This matches what the EHR literature predicts for context-rich tasks (arXiv 2511.00782). The recurrence-aware objective trains the model to predict novel events rather than repeats.
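
One way to picture a recurrence-aware objective: a token that repeats an event already seen in the patient's history gets a reduced loss weight, while a first occurrence keeps full weight. The sketch below shows that idea in plain Python. The `repeat_weight` value, both function names, and the toy tokens are hypothetical, and the actual RAVEN weighting may differ.

```python
def recurrence_weights(tokens, repeat_weight=0.1):
    """Return one loss weight per token: 1.0 for a first occurrence, repeat_weight for a repeat."""
    seen = set()
    weights = []
    for tok in tokens:
        weights.append(repeat_weight if tok in seen else 1.0)
        seen.add(tok)
    return weights

def weighted_mean_loss(per_token_loss, weights):
    """Weighted average of per-token losses, normalised by the total weight."""
    total = sum(weights)
    return sum(l * w for l, w in zip(per_token_loss, weights)) / total

# Toy trajectory: "A" and "B" each recur once (placeholder tokens, not real codes).
trajectory = ["A", "B", "A", "C", "B"]
w = recurrence_weights(trajectory)
print(w)  # [1.0, 1.0, 0.1, 1.0, 0.1]
```

In a training loop, the same weights would multiply the per-token cross-entropy before reduction, so repeated events contribute little gradient.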

Architecture

GEMEO/SUS (19.97M params)
├── Token embedding (tied with LM head)
├── PositionalFeatureEmbed(age, calendar_year, position)
├── 8 × Transformer blocks (SwiGLU + RMSNorm + RoPE + AdaLN-Zero)
│   └── Gated PrimeKG cross-attention (tanh(α), α init=0)
└── Tied LM head
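
The `tanh(α), α init=0` annotation in the tree describes a zero-initialised gate: at the start of training the cross-attention branch contributes nothing, so the block behaves as an identity on the token stream. A minimal sketch of such a gate, assuming standard `nn.MultiheadAttention` and hypothetical names and sizes (`GatedCrossAttention`, `d_model=32`, `kg_nodes`), is shown below. It is not the GEMEO/SUS implementation.

```python
import torch
import torch.nn as nn

class GatedCrossAttention(nn.Module):
    """Residual cross-attention whose contribution is scaled by tanh(alpha)."""

    def __init__(self, d_model, n_heads):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.alpha = nn.Parameter(torch.zeros(1))  # tanh(0) = 0, so the gate starts closed

    def forward(self, x, kg_nodes):
        # x: (batch, seq, d_model) token states
        # kg_nodes: (batch, n_nodes, d_model) knowledge-graph node embeddings
        attended, _ = self.attn(query=x, key=kg_nodes, value=kg_nodes)
        return x + torch.tanh(self.alpha) * attended

torch.manual_seed(0)
block = GatedCrossAttention(d_model=32, n_heads=4)
x = torch.randn(2, 5, 32)
kg_nodes = torch.randn(2, 7, 32)
out = block(x, kg_nodes)
print(torch.allclose(out, x))  # True: the closed gate leaves the input unchanged at init
```

Because the gate opens only as α moves away from zero, graph conditioning can be added to a warm-started backbone without disturbing its initial predictions.
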
  • Backbone: Causal Diffusion Forcing transformer (per-token σ; Chen et al., NeurIPS 2024).
  • Training objective: recurrence-weighted loss (RAVEN, arXiv 2603.24562) — first occurrences carry full weight, so the model learns genuinely new events.
  • Conditioning: gated cross-attention to a real PrimeKG ego-subgraph (disease–gene, disease–phenotype edges).
  • Recipe: warm-start, WSD LR schedule, bf16, single H100, ~5 min, ≈ $0.40.
  • MEDS v0.4.1 substrate · 42,265 DATASUS rare-disease trajectories.
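
The WSD (warmup-stable-decay) schedule named in the recipe ramps the learning rate up, holds it flat for most of training, and decays it only in the final stretch. A minimal sketch follows; the phase fractions, peak rate, linear decay shape, and the name `wsd_lr` are illustrative assumptions, not the values used to train GEMEO/SUS.

```python
def wsd_lr(step, total_steps, peak_lr=3e-4, warmup_frac=0.05, decay_frac=0.2, min_lr=0.0):
    """Learning rate at `step` for a warmup-stable-decay schedule with linear ramps."""
    warmup_steps = max(1, int(total_steps * warmup_frac))
    decay_steps = max(1, int(total_steps * decay_frac))
    decay_start = total_steps - decay_steps
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps    # linear warmup
    if step < decay_start:
        return peak_lr                                # stable plateau
    progress = min(1.0, (step - decay_start) / decay_steps)
    return peak_lr + (min_lr - peak_lr) * progress    # linear decay towards min_lr

schedule = [wsd_lr(s, total_steps=1000) for s in range(1000)]
print(schedule[0], schedule[500], schedule[-1])
```

With `peak_lr=1.0` the same function returns a multiplier, which is the form `torch.optim.lr_scheduler.LambdaLR` expects for its `lr_lambda` argument.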

Subgroup fairness is clean across pediatric / adult / elderly bands. The bundled cdf_v7_10digit_rbt.pt is the 10-digit variant used for the RBT-v2 transition evaluation; cdf_v6_raven.pt is the recurrence-aware flagship.

Usage

import sys

import torch
import torch.nn as nn

sys.path.append("src")  # make the repo's src/ directory importable
from diffusion_forcing_v13 import CDFv13Transformer, CDFv13Config

class PositionalFeatureEmbed(nn.Module):
    """Embeds age, calendar year and sequence position into one d-dimensional vector."""
    def __init__(self, d):
        super().__init__()
        self.age_proj = nn.Linear(1, d // 4)
        self.year_proj = nn.Linear(1, d // 4)
        self.pos_proj = nn.Linear(1, d // 4)
        self.combine = nn.Linear(3 * (d // 4), d)
        self.norm = nn.LayerNorm(d)

    def forward(self, ages, years, positions):
        # Scale each feature to [0, 1]: age over 0-100, year over 2010-2030, position over 0-512.
        a = ages.clamp(0, 100) / 100
        y = (years - 2010).clamp(0, 20) / 20
        p = (positions / 512).clamp(0, 1)
        e = torch.cat([self.age_proj(a.unsqueeze(-1)), self.year_proj(y.unsqueeze(-1)),
                       self.pos_proj(p.unsqueeze(-1))], -1)
        return self.norm(self.combine(e))

# weights_only=False unpickles arbitrary objects, so only load checkpoints you trust.
ck = torch.load("cdf_v6_raven.pt", map_location="cpu", weights_only=False)
# Keep only the config keys that CDFv13Config actually defines.
cfg = CDFv13Config(**{k: v for k, v in ck["config"].items() if k in CDFv13Config.__dataclass_fields__})
model = CDFv13Transformer(cfg)
model.load_state_dict(ck["model_state"])
pfe = PositionalFeatureEmbed(cfg.d_model)
pfe.load_state_dict(ck["pos_feat_state"])

Scope

GEMEO leads on novelty (new-onset) and long-context outcomes (discontinuation, time-to-transition, will-change). For single-step procedure transitions, count-based methods remain competitive — the world model's value is in long-range trajectory reasoning. Rigorous counterfactual/interventional validation and the full three-pillar loop (KG proposer + agentic verifier) extend naturally to a multimodal substrate (notes, WES, labs).

Citation

@misc{gemeo_sus_2026,
  title  = {GEMEO/SUS: Recurrence-Aware Patient World Model for Rare Disease},
  author = {Timmers, Dimas and the Raras AI team},
  year   = {2026},
  url    = {https://huggingface.co/Raras-AI/gemeo-sus}
}

⚠️ Research only. Not a medical device. No clinical use without physician oversight and applicable regulatory clearance.
