NTV3-AR-650M unconditional enhancer generator (T1)
Unconditional NTv3 generation baseline for the 7-cell-type brain
enhancer task. No cell-type signal is provided at training or
inference time, so the model learns the marginal p(enhancer) and
emits the same mode-collapsed enhancer for every gold input,
regardless of the requested cell type. It serves as a lower bound
for cell-type specificity (see the MDLM/AR rows in
explcre/biomodel_reasoning_calling_study2
regureasoner_loop/docs/EXPERIMENTS.md).
Trained on train.enhancer_generation.strat7c.n35k (35k rows
stratified across 7 brain cell types). This model (AR-650M) uses
standard next-token cross-entropy (CE). For comparison, the
MDLM-8m / MDLM-650m baselines use the weighted-CE diffusion loss
(1/t schedule).
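The two objectives named above can be sketched on a toy sequence. This is an illustrative sketch only, not the training code: the function names, the per-token normalization, and the reading of the "1/t schedule" as a 1/t weight on the CE over masked positions at noise level t are all assumptions.

```python
import math

def next_token_ce(logprobs, tokens):
    """Mean next-token cross-entropy for one sequence.

    logprobs[i] maps each vocabulary symbol to log p(symbol | tokens[:i]).
    """
    return -sum(lp[tok] for lp, tok in zip(logprobs, tokens)) / len(tokens)

def masked_diffusion_ce(logprobs, tokens, masked, t):
    """CE summed over masked positions, weighted by 1/t (assumed form).

    masked[i] is True where position i was replaced by the mask token;
    t in (0, 1] is the noise level used to draw the mask.
    """
    total = -sum(logprobs[i][tokens[i]] for i in range(len(tokens)) if masked[i])
    return total / (t * len(tokens))

# Toy check: a model that predicts a uniform distribution over A/C/G/T.
uniform = [{base: math.log(0.25) for base in "ACGT"} for _ in range(4)]
seq = "ACGT"
ar_loss = next_token_ce(uniform, seq)
mdlm_loss = masked_diffusion_ce(uniform, seq, [True, False, True, False], t=0.5)
print(ar_loss, mdlm_loss)
```

With half the positions masked at t = 0.5, the 1/t weight compensates for the smaller number of scored tokens, so both losses equal log 4 for the uniform predictor.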
Files
| File | Contents |
|---|---|
| best.pt | EMA / best-checkpoint state dict |
| metrics.json | 28-column eval against gold n7k held-out test |
| training_log.jsonl | per-step train metrics |
| manifest.json | run config snapshot |
| predictions_preview.jsonl | first 200 sampled rows (for spot-check) |
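The JSON and JSONL files listed above can be read with the Python standard library alone. The helpers below are a minimal sketch; `load_json` and `load_jsonl` are hypothetical names, and no field names are assumed because the schemas are not documented here.

```python
import json
from pathlib import Path

def load_json(path):
    """Read a single JSON document (e.g. metrics.json or manifest.json)."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)

def load_jsonl(path, limit=None):
    """Read a JSON-lines file, one object per non-empty line."""
    rows = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            rows.append(json.loads(line))
            if limit is not None and len(rows) >= limit:
                break
    return rows

# Only runs when the file is present in the working directory.
preview = Path("predictions_preview.jsonl")
if preview.exists():
    rows = load_jsonl(preview, limit=5)
    print(f"loaded {len(rows)} preview rows")
```

If best.pt is a regular PyTorch checkpoint (an assumption), `torch.load("best.pt", map_location="cpu")` is the usual way to inspect its state dict.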
Headline eval (held-out gold n7k)
| parse_rate | argmax_acc | on_target | off_target | specificity | FID |
|---|---|---|---|---|---|
| 1.000 | 0.150 | -16.296 | -16.339 | 0.042 | 75.584 |
Random argmax baseline over 7 cell types = 1/7 ≈ 0.143, so the observed argmax_acc of 0.150 is effectively chance; the unconditional baselines all sit there because the model has no signal to differentiate cell types. FID is misleadingly low: the average enhancer is statistically plausible, but cell-type recall is absent. See the cell-type-conditioned MDLM repos for the conditional fix.
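The at-chance reading can be checked with a short calculation. The sketch below compares the reported argmax_acc with the 1/7 baseline using a binomial standard error. The test-set size of 7000 is an assumption read off the "n7k" name, and `chance_z_score` is a hypothetical helper, not part of the eval code.

```python
import math

def chance_z_score(acc, n_classes, n_examples):
    """Return (chance accuracy, z-score of acc against uniform guessing)."""
    p0 = 1.0 / n_classes
    se = math.sqrt(p0 * (1.0 - p0) / n_examples)  # binomial standard error
    return p0, (acc - p0) / se

# n_examples=7000 is assumed from the "n7k" test-set name.
p0, z = chance_z_score(acc=0.150, n_classes=7, n_examples=7000)
print(f"chance = {p0:.3f}, z = {z:.2f}")
```

Under that assumed test size, 0.150 lands within two standard errors of 1/7, which is consistent with a model that carries no cell-type signal.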
Citation
See the parent repo's EXPERIMENTS.md for the full per-baseline
comparison table.