BioTitan: Neural Long-Term Memory for Genomic Foundation Modeling

First application of the TITANS architecture to single-cell genomics, enabling test-time adaptive gene embeddings.

BioTitan applies TITANS (Behrouz et al., Google Research, NeurIPS 2025) to single-cell transcriptomics. Unlike existing genomic foundation models, whose gene representations are fixed after training, BioTitan's neural memory updates its weights during inference: gene embeddings improve as the model processes more cells, without any retraining.

Headline Result

Test-time memory adaptation closes 54% of the gap to Geneformer V1, without any retraining.

BioTitan Static:    0.636 avg AUC (53 tasks)
BioTitan CTX 254K:  0.716 avg AUC  ← +12.6% relative improvement, zero retraining
Geneformer V1:      0.782 avg AUC  (trained on 120× more data)

On the 23 Expression tasks, the family where single-cell models are expected to excel, BioTitan CTX reaches 0.815, outperforming Gene2vec (0.773) and approaching Geneformer (0.869) despite being trained on 120× less data.

Contextualization saturates at ~60K cells (+0.002 from 60K→254K), indicating that clinically-relevant sample sizes are sufficient for effective memory adaptation.

IBM Gene Benchmark (53 Tasks, 5 Families)

All results verified on the same machine using BiomedSciAI/gene-benchmark. Geneformer and Gene2vec baselines reproduced locally. Published baselines from the IBM benchmark paper (Kan-Tor et al., 2024).

Task Family Averages

| Family | Geneformer V1 | Gene2vec | BioTitan Static | BioTitan CTX | Tasks |
| --- | --- | --- | --- | --- | --- |
| Expression | 0.869 | 0.773 | 0.732 | 0.815 | 23 |
| Genomic Properties | 0.782 | 0.725 | 0.640 | 0.687 | 7 |
| Regulatory Functions | 0.759 | 0.769 | 0.623 | 0.704 | 4 |
| Localization | 0.725 | 0.668 | 0.616 | 0.699 | 2 |
| Protein Properties | 0.678 | 0.641 | 0.571 | 0.598 | 17 |
| Overall | 0.782 | 0.715 | 0.636 | 0.716 | 53 |

Comparison with All Published Baselines

Family averages from the IBM benchmark paper's Figure 2 heatmap; BioTitan run locally.

Expression / Localization (23 tasks), BioTitan's strongest family:

| Model | Type | Avg AUC |
| --- | --- | --- |
| Geneformer | RNA-seq (30M cells) | 0.869 |
| cellPLM | RNA-seq (11M cells) | ~0.85 |
| ScGPT-H | RNA-seq (33M cells) | ~0.84 |
| Gene2vec | Bulk co-expression | ~0.82 |
| BioTitan CTX | RNA-seq (254K cells) | 0.815 |
| ScGPT-B | RNA-seq (10.3M blood) | ~0.75 |
| ESM-1 / ESM-2 | Protein sequence | ~0.74–0.75 |
| MPNet / DNABert-2 | Text / DNA | ~0.72 |
| MTEB-S / MTEB-L | Text | ~0.67–0.71 |
| Bag of Words | Text | ~0.69 |

BioTitan CTX outperforms all text, protein, and DNA models on expression tasks, as well as every RNA-seq model trained on less diverse tissue data.

Genomic Properties (7 tasks):

| Model | Type | Avg AUC |
| --- | --- | --- |
| ESM-2 | Protein sequence | 0.84 |
| MTEB-L / Bag of Words | Text | 0.81 |
| ScGPT-H / MPNet | Mixed | 0.80 |
| Geneformer | RNA-seq (30M cells) | 0.79 |
| DNABert-2 | DNA sequence | 0.79 |
| cellPLM | RNA-seq (11M cells) | 0.76 |
| Gene2vec | Bulk co-expression | 0.73 |
| BioTitan CTX | RNA-seq (254K cells) | 0.687 |
| ScGPT-B | RNA-seq (10.3M blood) | 0.67 |

Regulatory Functions (4 tasks):

| Model | Type | Avg AUC |
| --- | --- | --- |
| MTEB-S | Text (335M) | 0.81 |
| ESM-1 / ESM-2 | Protein sequence | 0.79 |
| ScGPT-H | RNA-seq (33M cells) | 0.77 |
| cellPLM | RNA-seq (11M cells) | 0.75 |
| Geneformer / Bag of Words | Mixed | 0.74 |
| Gene2vec | Bulk co-expression | 0.73 |
| BioTitan CTX | RNA-seq (254K cells) | 0.704 |
| ScGPT-B | RNA-seq (10.3M blood) | 0.68 |
| DNABert-2 | DNA sequence | 0.66 |

Selected Binary Tasks (detail)

11 of 53 tasks. Overall averages in the family table above are computed across all 53 tasks (including 42 categorical tasks not shown here).

| Task | Geneformer V1 | Gene2vec | BioTitan Static | BioTitan CTX |
| --- | --- | --- | --- | --- |
| Dosage sensitive TFs | 0.919 | 0.878 | 0.723 | 0.891 |
| Bivalent vs lys4-only | 0.925 | 0.894 | 0.797 | 0.889 |
| Bivalent vs non-methylated | 0.827 | 0.688 | 0.616 | 0.676 |
| CCD Transcript | 0.797 | 0.744 | 0.638 | 0.647 |
| N1 network | 0.805 | 0.796 | 0.733 | 0.719 |
| HLA class I vs II | 0.745 | 0.925 | 0.445 | 0.730 |
| Gene2Gene | 0.730 | 0.695 | 0.643 | 0.702 |
| TF vs non-TF | 0.749 | 0.719 | 0.630 | 0.698 |
| N1 targets | 0.736 | 0.635 | 0.684 | 0.668 |
| Long vs short range TF | 0.726 | 0.614 | 0.520 | 0.459 |
| CCD Protein | 0.552 | 0.559 | 0.539 | 0.545 |

What This Tells Us

1. Test-time learning is a unique capability. Contextualization improved BioTitan by +0.080 AUC across 53 tasks (0.636 → 0.716), closing 54% of the gap to Geneformer without any retraining. No other model in this benchmark can do this; their embeddings are architecturally fixed after training.

2. BioTitan excels where expression models should. On Expression tasks (23 tasks), BioTitan CTX (0.815) outperforms every non-RNA-seq model and places 5th among all 13 models evaluated, despite training on 120× less data.

3. The gap is data, not architecture. Among RNA-seq models, performance broadly tracks training-data scale: ScGPT-B (10.3M cells, single tissue) < BioTitan CTX (254K cells, 8 tissues) < Gene2vec (bulk) < cellPLM (11M) < Geneformer (30M) < ScGPT-H (33M). BioTitan sits where its data volume predicts, and test-time learning pushes it above its "data class".

4. Contextualization saturates efficiently. Moving from 60K to 254K inference cells yields only +0.002 avg AUC. This means clinically relevant sample sizes (~10K–60K cells) are sufficient for effective memory adaptation, a practical advantage for real-world deployment.
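As a sanity check, the two headline numbers above follow directly from the family-table averages:

```python
# Average AUCs over the 53 IBM benchmark tasks (from the tables above)
static, ctx, geneformer = 0.636, 0.716, 0.782

relative_gain = (ctx - static) / static              # ~0.126 -> "+12.6% relative improvement"
gap_closed = (ctx - static) / (geneformer - static)  # ~0.55  -> "closes ~54% of the gap"
```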

What Is Test-Time Learning?

Existing models (Geneformer, scGPT, AIDO.Cell, scFoundation, cellPLM) process every cell identically at inference; their weights are frozen. BioTitan's TITANS memory MLP updates its own weights during the forward pass via gradient descent on a surprise signal:

Cell 1:      Memory is fresh. Gene representations are generic.
Cell 1,000:  Memory has learned tissue-specific co-expression patterns.
Cell 60,000: Memory has seen diverse cellular contexts.
             Gene representations are now RICHER than the static embedding table.
             Further cells provide diminishing returns.

This happens at inference speed (~36 cells/sec on RTX 3090). No optimizer, no backward pass through the full model, no labeled data needed.
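The mechanism can be illustrated with a toy associative memory. This is a deliberately simplified linear stand-in for BioTitan's 2-layer memory MLP; the class name, learning rate, and loss scaling are illustrative, not the actual implementation:

```python
import numpy as np

class LinearMemory:
    """Toy test-time memory: a linear map M from keys to values, updated
    during inference by one gradient step per chunk on the 'surprise'
    loss ||M @ k - v||^2. No optimizer state, no labels, no outer loop."""

    def __init__(self, dim, lr=0.1):
        self.M = np.zeros((dim, dim))
        self.lr = lr

    def read(self, k):
        return self.M @ k

    def update(self, k, v):
        err = self.M @ k - v                   # prediction error = surprise signal
        self.M -= self.lr * np.outer(err, k)   # gradient step on memory weights only
        return float(err @ err)                # scalar surprise (before the step)

# Surprise shrinks as the same association is revisited:
k = np.array([1.0, 0.0, 0.0, 0.0])
v = np.array([0.0, 2.0, 0.0, 1.0])
mem = LinearMemory(dim=4)
surprises = [mem.update(k, v) for _ in range(3)]
```

The same shape of update (surprise-weighted gradient descent on memory parameters during the forward pass) is what TITANS applies to its memory MLP at each chunk.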

Practical implications:

  • Feed the model a patient's cells → memory adapts → adapted gene representations in minutes
  • No retraining, no fine-tuning, no GPU cluster needed for adaptation
  • The same model binary works for every patient, every tissue, every disease
  • ~60K cells is sufficient for near-optimal adaptation

Architecture

TITANS Memory-as-Context (MAC) variant with 6 stacked blocks:

| Component | Details |
| --- | --- |
| Parameters | 18.7M |
| Architecture | TITANS MAC (6 layers, 256 dim, 4 heads) |
| Gene vocabulary | 25,424 (Geneformer-compatible tokenization) |
| Memory | 2-layer MLP per block, chunk-wise gradient updates (128 tokens/step) |
| Persistent memory | 32 learnable tokens per block |
| FFN | SwiGLU, hidden dim 512 |
| Pre-training | Masked gene prediction (15% masking rate) |
| Training data | 254,394 cells from Tabula Sapiens (8 human tissues) |
| Compute | 2 epochs, AdamW, cosine LR, 2× RTX 3090 (~8 hours) |
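For reference, the architecture table can be captured as a config object. The field names below are illustrative, not the actual titans-trainer config schema:

```python
from dataclasses import dataclass

@dataclass
class BioTitanConfig:
    # Values mirror the architecture table; field names are illustrative.
    n_blocks: int = 6              # stacked TITANS MAC blocks
    dim: int = 256                 # model width
    n_heads: int = 4
    ffn_hidden: int = 512          # SwiGLU hidden dim
    vocab_size: int = 25_424       # Geneformer-compatible gene vocabulary
    memory_layers: int = 2         # memory MLP depth per block
    memory_chunk: int = 128        # tokens per chunk-wise memory update
    persistent_tokens: int = 32    # learnable persistent-memory tokens per block
    mask_rate: float = 0.15        # masked gene prediction rate

cfg = BioTitanConfig()
```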

Using Pre-computed Embeddings (no model code needed)

import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

# Load contextualized gene embeddings
df = pd.read_parquet("gene_embeddings_ctx_254k.parquet")
# columns: symbol, dim_0, dim_1, ..., dim_255

# Get embedding for a specific gene
tp53 = df[df['symbol'] == 'TP53'].iloc[:, 1:].values

# Find most similar genes
symbols = df['symbol'].values
embeddings = df.iloc[:, 1:].values
sims = cosine_similarity(tp53, embeddings)[0]
top_10 = np.argsort(-sims)[1:11]
for i in top_10:
    print(f"  {symbols[i]}: {sims[i]:.3f}")
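Downstream, the IBM benchmark scores embeddings by fitting lightweight supervised probes on top of them. The sketch below illustrates that protocol with logistic regression and synthetic embeddings standing in for the parquet columns; the benchmark's own probe and the real gene labels may differ:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 500 "genes" with 256-dim embeddings and a binary label
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 256))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)   # toy separable task

# Fit a probe on frozen embeddings, report AUC (the benchmark's metric)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
auc = roc_auc_score(y_te, probe.predict_proba(X_te)[:, 1])
```

Swapping the synthetic `X` for the parquet's `dim_0 … dim_255` columns and `y` for a benchmark task's labels reproduces the evaluation setup used in the tables above.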

Loading Model Weights

pip install titans-trainer

from titans_trainer import TitansModel

model = TitansModel.from_pretrained("biotitan-20m-tabula-sapiens.pt")
model.eval()

# Get surprise scores (test-time learning)
surprise = model.get_surprise_scores(token_ids)  # (batch, seq_len, n_layers)

# Get cell embeddings
cell_emb = model.get_embeddings(token_ids)  # (batch, 256)

Run IBM Gene Benchmark

# Use pre-computed embeddings directly with the benchmark
# No BioTitan code needed -- just the parquet file
# See: https://github.com/BiomedSciAI/gene-benchmark

Training Framework

BioTitan was trained using titans-trainer, a HuggingFace-style training framework for the TITANS architecture.

Training Data

Tabula Sapiens: 254,394 cells from 8 human tissues (Blood, Lung, Heart, Liver, Kidney, Pancreas, Neural, Bone Marrow), tokenized using rank-value encoding with median normalization.
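Rank-value encoding can be sketched as follows. This is a simplified reconstruction of the Geneformer-style scheme named above; the exact tokenizer details (vocabulary handling, truncation) live in the training code:

```python
import numpy as np

def rank_value_encode(expr, gene_medians, max_len=2048):
    """Sketch of rank-value encoding with median normalization:
    scale each gene's count by that gene's corpus-wide median, then
    emit expressed gene indices sorted by normalized value, highest first."""
    norm = np.where(expr > 0, expr / gene_medians, 0.0)
    order = np.argsort(-norm, kind="stable")   # stable sort: ties keep index order
    expressed = order[norm[order] > 0]         # drop unexpressed genes
    return expressed[:max_len].tolist()

# One toy cell over a 4-gene vocabulary:
tokens = rank_value_encode(
    expr=np.array([0.0, 4.0, 2.0, 8.0]),
    gene_medians=np.array([1.0, 2.0, 1.0, 8.0]),
)
# tokens == [1, 2, 3]: normalized values 2.0, 2.0, 1.0; gene 0 is unexpressed
```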

Limitations

  • Gene-level only. Cell-level tasks (cell type annotation, perturbation prediction) are not yet benchmarked.
  • Small training set. 254K cells vs 30–50M for Geneformer/scGPT/AIDO.Cell. Performance scales with data; scaling up is expected to close the remaining gap.
  • 8 tissues. Broader tissue coverage would improve gene representation diversity.
  • Contextualization overhead. Extracting contextualized embeddings requires a forward pass over reference cells (~36 cells/sec on RTX 3090). Static embeddings are instant.
  • Some tasks regress with contextualization. 3 of 11 binary tasks show small decreases, suggesting memory-saturation effects on certain task types.
  • Model weights require titans-trainer. Run pip install titans-trainer to load the .pt file. Pre-computed embeddings in parquet format can be used without any dependencies.

Roadmap

  • Scale to 30M cells (Genecorpus-30M); expected to match or exceed Geneformer
  • 150M parameter model
  • Full IBM benchmark (multi-label and regression tasks)
  • Cell-level benchmarks (cell type annotation, zero-shot clustering)
  • Disease-specific test-time learning demo (cardiomyopathy, Alzheimer's)
  • BERT ablation (same architecture without TITANS memory)

Citation

@article{yermekov2026biotitan,
  title={BioTitan: Neural Long-Term Memory for Genomic Foundation Modeling},
  author={Yermekov, Akbar},
  year={2026}
}

@article{behrouz2025titans,
  title={Titans: Learning to Memorize at Test Time},
  author={Behrouz, Ali and Zhong, Peilin and Mirrokni, Vahab},
  journal={NeurIPS},
  year={2025}
}

License

Apache 2.0
