NTV3-AR-650M unconditional enhancer generator (T1)

Unconditional NTv3 generation baseline for the brain 7-cell enhancer task. There is no cell-type signal at training or inference time, so this model learns p(enhancer) and emits a single mode-collapsed enhancer for every gold input regardless of the requested cell type. It is useful as a lower bound for cell-type specificity (see the MDLM/AR rows in explcre/biomodel_reasoning_calling_study2 regureasoner_loop/docs/EXPERIMENTS.md).

Trained on train.enhancer_generation.strat7c.n35k (35k rows stratified across 7 brain cell types). This model (AR-650m) uses standard next-token cross-entropy (CE); the sibling MDLM-8m and MDLM-650m baselines use the weighted-CE diffusion loss (1/t schedule).
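To make the difference between the two objectives concrete, here is a minimal pure-Python sketch on a toy 4-letter vocabulary. It is not the training code from this repo: the function names, the toy probabilities, and the reading of "1/t schedule" as a 1/t weight on the cross-entropy of masked positions are all assumptions made for illustration.

```python
import math


def next_token_ce(probs, targets):
    """Mean negative log-likelihood of each target token under the
    distribution predicted for its position (autoregressive objective)."""
    return -sum(math.log(p[y]) for p, y in zip(probs, targets)) / len(targets)


def masked_diffusion_ce(probs, targets, masked, t):
    """Cross-entropy summed over masked positions only, weighted by 1/t and
    normalised by sequence length (assumed form of the MDLM-style loss)."""
    nll = -sum(math.log(p[y]) for p, y, m in zip(probs, targets, masked) if m)
    return nll / (t * len(targets))


# Toy example: vocabulary A, C, G, T encoded as 0..3, sequence "ACG".
probs = [
    [0.70, 0.10, 0.10, 0.10],  # confident and correct at position 0
    [0.25, 0.25, 0.25, 0.25],  # uniform at position 1
    [0.10, 0.10, 0.70, 0.10],  # confident and correct at position 2
]
targets = [0, 1, 2]

ar_loss = next_token_ce(probs, targets)
# Hypothetical noise level t = 0.5 with only position 1 masked.
md_loss = masked_diffusion_ce(probs, targets, masked=[False, True, False], t=0.5)
print(f"AR loss: {ar_loss:.4f}  masked-diffusion loss: {md_loss:.4f}")
```

In this sketch the autoregressive loss scores every position, while the diffusion-style loss scores only the masked ones and upweights low-noise steps through the 1/t factor.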

Files

best.pt                    # EMA / best-checkpoint state dict
metrics.json               # 28-column eval against gold n7k held-out test
training_log.jsonl         # per-step train metrics
manifest.json              # run config snapshot
predictions_preview.jsonl  # first 200 sampled rows (for spot-check)
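
The JSON and JSONL files above can be inspected with the Python standard library alone. The sketch below assumes the files have been downloaded into the current directory; the helper names are hypothetical, and nothing is assumed about the keys inside each file.

```python
import json
from pathlib import Path


def load_json(path):
    """Read a single JSON document such as metrics.json or manifest.json."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def read_jsonl(path, limit=None):
    """Read a JSON Lines file into a list, skipping blank lines.
    Stops after `limit` rows when a limit is given."""
    rows = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            rows.append(json.loads(line))
            if limit is not None and len(rows) >= limit:
                break
    return rows


if __name__ == "__main__":
    run_dir = Path(".")  # assumed location of the downloaded files
    metrics_path = run_dir / "metrics.json"
    preview_path = run_dir / "predictions_preview.jsonl"
    if metrics_path.exists():
        metrics = load_json(metrics_path)
        print("metrics.json top-level type:", type(metrics).__name__)
    if preview_path.exists():
        for row in read_jsonl(preview_path, limit=3):
            print(row)
```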

Headline eval (held-out gold n7k)

parse_rate	argmax_acc	on_target	off_target	specificity	FID
1.000	0.150	-16.296	-16.339	0.042	75.584

Random argmax baseline over 7 cells = 1/7 ≈ 0.143; the unconditional baselines all sit at chance because the model has no signal to differentiate cell types. FID is misleadingly low because the average enhancer is statistically plausible, but cell-type recall is absent. See the cell-type-conditioned MDLM repos for the conditional fix.
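
As a quick sanity check on the headline row, the sketch below recomputes the chance level and compares the reported argmax accuracy against it. Two things here are assumptions rather than facts from the eval: that the held-out set has exactly 7000 rows (read off "n7k"), and that specificity is the on-target score minus the off-target score.

```python
import math

N_CELL_TYPES = 7
N_TEST = 7000          # assumption: "n7k" taken to mean 7000 held-out rows
argmax_acc = 0.150     # from the headline table
on_target = -16.296    # from the headline table
off_target = -16.339   # from the headline table

# Accuracy expected from guessing one of 7 cell types uniformly at random.
chance = 1 / N_CELL_TYPES

# Binomial standard error of an accuracy estimate at the chance rate.
std_err = math.sqrt(chance * (1 - chance) / N_TEST)

# Distance of the observed accuracy from chance, in standard errors.
z = (argmax_acc - chance) / std_err

# Assumed definition of specificity: on-target minus off-target score.
gap = on_target - off_target

print(f"chance={chance:.3f}  z={z:.2f}  on-off gap={gap:.3f}")
```

Under these assumptions the accuracy is within two standard errors of chance, and the on-target minus off-target gap matches the reported specificity up to rounding of the table entries.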

Citation

See the parent repo's EXPERIMENTS.md for the full per-baseline comparison table.
