BioTitan: Neural Long-Term Memory for Genomic Foundation Modeling
First application of the TITANS architecture to single-cell genomics, enabling test-time adaptive gene embeddings.
BioTitan applies TITANS (Behrouz et al., Google Research, NeurIPS 2025) to single-cell transcriptomics. Unlike existing genomic foundation models whose gene representations are fixed after training, BioTitan's neural memory updates its weights during inference β gene embeddings improve as the model processes more cells, without any retraining.
Headline Result
Test-time memory adaptation closes 54% of the gap to Geneformer V1 β without any retraining.
BioTitan Static: 0.636 avg AUC (53 tasks)
BioTitan CTX 254K: 0.716 avg AUC β +12.6% relative improvement, zero retraining
Geneformer V1: 0.782 avg AUC (trained on 120Γ more data)
On Expression tasks (23 tasks) β the family where single-cell models are expected to excel β BioTitan CTX reaches 0.815, outperforming Gene2vec (0.773) and approaching Geneformer (0.869), trained on 120Γ less data.
Contextualization saturates at ~60K cells (+0.002 from 60Kβ254K), indicating that clinically-relevant sample sizes are sufficient for effective memory adaptation.
IBM Gene Benchmark (53 Tasks, 5 Families)
All results verified on the same machine using BiomedSciAI/gene-benchmark. Geneformer and Gene2vec baselines reproduced locally. Published baselines from the IBM benchmark paper (Kan-Tor et al., 2024).
Task Family Averages
| Family | Geneformer V1 | Gene2vec | BioTitan Static | BioTitan CTX | Tasks |
|---|---|---|---|---|---|
| Expression | 0.869 | 0.773 | 0.732 | 0.815 | 23 |
| Genomic Properties | 0.782 | 0.725 | 0.640 | 0.687 | 7 |
| Regulatory Functions | 0.759 | 0.769 | 0.623 | 0.704 | 4 |
| Localization | 0.725 | 0.668 | 0.616 | 0.699 | 2 |
| Protein Properties | 0.678 | 0.641 | 0.571 | 0.598 | 17 |
| Overall | 0.782 | 0.715 | 0.636 | 0.716 | 53 |
Comparison with All Published Baselines
Family averages from the IBM benchmark paper's Figure 2 heatmap; BioTitan run locally.
Expression / Localization (23 tasks) β BioTitan's strongest family:
| Model | Type | Avg AUC |
|---|---|---|
| Geneformer | RNA-seq (30M cells) | 0.869 |
| cellPLM | RNA-seq (11M cells) | ~0.85 |
| ScGPT-H | RNA-seq (33M cells) | ~0.84 |
| Gene2vec | Bulk co-expression | ~0.82 |
| BioTitan CTX | RNA-seq (255K cells) | 0.815 |
| ScGPT-B | RNA-seq (10.3M blood) | ~0.75 |
| ESM-1 / ESM-2 | Protein sequence | ~0.74β0.75 |
| MPNet / DNABert-2 | Text / DNA | ~0.72 |
| MTEB-S / MTEB-L | Text | ~0.67β0.71 |
| Bag of Words | Text | ~0.69 |
BioTitan CTX outperforms all text, protein, and DNA models on expression tasks β and all RNA-seq models trained on fewer diverse tissues.
Genomic Properties (7 tasks):
| Model | Type | Avg AUC |
|---|---|---|
| ESM-2 | Protein sequence | 0.84 |
| MTEB-L / Bag of Words | Text | 0.81 |
| ScGPT-H / MPNet | Mixed | 0.80 |
| Geneformer | RNA-seq (30M cells) | 0.79 |
| DNABert-2 | DNA sequence | 0.79 |
| cellPLM | RNA-seq (11M cells) | 0.76 |
| Gene2vec | Bulk co-expression | 0.73 |
| BioTitan CTX | RNA-seq (255K cells) | 0.687 |
| ScGPT-B | RNA-seq (10.3M blood) | 0.67 |
Regulatory Functions (4 tasks):
| Model | Type | Avg AUC |
|---|---|---|
| MTEB-S | Text (335M) | 0.81 |
| ESM-1 / ESM-2 | Protein sequence | 0.79 |
| ScGPT-H | RNA-seq (33M cells) | 0.77 |
| cellPLM | RNA-seq (11M cells) | 0.75 |
| Geneformer / Bag of Words | Mixed | 0.74 |
| Gene2vec | Bulk co-expression | 0.73 |
| BioTitan CTX | RNA-seq (255K cells) | 0.704 |
| ScGPT-B | RNA-seq (10.3M blood) | 0.68 |
| DNABert-2 | DNA sequence | 0.66 |
Selected Binary Tasks (detail)
11 of 53 tasks. Overall averages in the family table above are computed across all 53 tasks (including 42 categorical tasks not shown here).
| Task | Geneformer V1 | Gene2vec | BioTitan Static | BioTitan CTX |
|---|---|---|---|---|
| Dosage sensitive TFs | 0.919 | 0.878 | 0.723 | 0.891 |
| Bivalent vs lys4-only | 0.925 | 0.894 | 0.797 | 0.889 |
| Bivalent vs non-methylated | 0.827 | 0.688 | 0.616 | 0.676 |
| CCD Transcript | 0.797 | 0.744 | 0.638 | 0.647 |
| N1 network | 0.805 | 0.796 | 0.733 | 0.719 |
| HLA class I vs II | 0.745 | 0.925 | 0.445 | 0.730 |
| Gene2Gene | 0.730 | 0.695 | 0.643 | 0.702 |
| TF vs non-TF | 0.749 | 0.719 | 0.630 | 0.698 |
| N1 targets | 0.736 | 0.635 | 0.684 | 0.668 |
| Long vs short range TF | 0.726 | 0.614 | 0.520 | 0.459 |
| CCD Protein | 0.552 | 0.559 | 0.539 | 0.545 |
What This Tells Us
1. Test-time learning is a unique capability. Contextualization improved BioTitan by +0.080 AUC across 53 tasks (0.636β0.716), closing 54% of the gap to Geneformer without any retraining. No other model in this benchmark can do this β their embeddings are architecturally fixed after training.
2. BioTitan excels where expression models should. On Expression tasks (23 tasks), BioTitan CTX (0.815) outperforms every non-RNA-seq model and places 5th among all 13 models evaluated, despite training on 120Γ less data.
3. The gap is data, not architecture. Among RNA-seq models, performance scales with training data: ScGPT-B (10M, single tissue) < BioTitan CTX (255K, 8 tissues) < Gene2vec (bulk) < cellPLM (11M) < Geneformer (30M) < ScGPT-H (33M). BioTitan sits where its data volume predicts β and test-time learning pushes it above its "data class."
4. Contextualization saturates efficiently. Moving from 60K to 254K inference cells yields only +0.002 avg AUC. This means clinically-relevant sample sizes (~10Kβ60K cells) are sufficient for effective memory adaptation β a practical advantage for real-world deployment.
What Is Test-Time Learning?
Existing models (Geneformer, scGPT, AIDO.Cell, scFoundation, cellPLM) process every cell identically at inference β their weights are frozen. BioTitan's TITANS memory MLP updates its own weights during the forward pass via gradient descent on a surprise signal:
Cell 1: Memory is fresh. Gene representations are generic.
Cell 1,000: Memory has learned tissue-specific co-expression patterns.
Cell 60,000: Memory has seen diverse cellular contexts.
Gene representations are now RICHER than the static embedding table.
Further cells provide diminishing returns.
This happens at inference speed (~36 cells/sec on RTX 3090). No optimizer, no backward pass through the full model, no labeled data needed.
Practical implications:
- Feed the model a patient's cells β memory adapts β adapted gene representations in minutes
- No retraining, no fine-tuning, no GPU cluster needed for adaptation
- The same model binary works for every patient, every tissue, every disease
- ~60K cells is sufficient for near-optimal adaptation
Architecture
TITANS Memory-as-Context (MAC) variant with 6 stacked blocks:
| Component | Details |
|---|---|
| Parameters | 18.7M |
| Architecture | TITANS MAC (6 layers, 256 dim, 4 heads) |
| Gene vocabulary | 25,424 (Geneformer-compatible tokenization) |
| Memory | 2-layer MLP per block, chunk-wise gradient updates (128 tokens/step) |
| Persistent memory | 32 learnable tokens per block |
| FFN | SwiGLU, hidden dim 512 |
| Pre-training | Masked gene prediction (15% masking rate) |
| Training data | 254,394 cells from Tabula Sapiens (8 human tissues) |
| Compute | 2 epochs, AdamW, cosine LR, 2ΓRTX 3090 (~8 hours) |
Using Pre-computed Embeddings (no code needed)
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np
# Load contextualized gene embeddings
df = pd.read_parquet("gene_embeddings_ctx_254k.parquet")
# columns: symbol, dim_0, dim_1, ..., dim_255
# Get embedding for a specific gene
tp53 = df[df['symbol'] == 'TP53'].iloc[:, 1:].values
# Find most similar genes
symbols = df['symbol'].values
embeddings = df.iloc[:, 1:].values
sims = cosine_similarity(tp53, embeddings)[0]
top_10 = np.argsort(-sims)[1:11]
for i in top_10:
print(f" {symbols[i]}: {sims[i]:.3f}")
Loading Model Weights
pip install titans-trainer
from titans_trainer import TitansModel
model = TitansModel.from_pretrained("biotitan-20m-tabula-sapiens.pt")
model.eval()
# Get surprise scores (test-time learning)
surprise = model.get_surprise_scores(token_ids) # (batch, seq_len, n_layers)
# Get cell embeddings
cell_emb = model.get_embeddings(token_ids) # (batch, 256)
Run IBM Gene Benchmark
# Use pre-computed embeddings directly with the benchmark
# No BioTitan code needed -- just the parquet file
# See: https://github.com/BiomedSciAI/gene-benchmark
Training Framework
BioTitan was trained using titans-trainer, a HuggingFace-style training framework for the TITANS architecture.
Training Data
Tabula Sapiens β 254,394 cells from 8 human tissues (Blood, Lung, Heart, Liver, Kidney, Pancreas, Neural, Bone Marrow), tokenized using rank-value encoding with median normalization.
Limitations
- Gene-level only. Cell-level tasks (cell type annotation, perturbation prediction) not yet benchmarked.
- Small training set. 255K cells vs 30β50M for Geneformer/scGPT/AIDO.Cell. Performance scales with data β scaling is expected to close the remaining gap.
- 8 tissues. Broader tissue coverage would improve gene representation diversity.
- Contextualization overhead. Extracting contextualized embeddings requires a forward pass over reference cells (~36 cells/sec on RTX 3090). Static embeddings are instant.
- Some tasks regress with contextualization. 3 of 11 binary tasks show small decreases, suggesting memory saturation effects on certain task types.
- Model weights require titans-trainer.
pip install titans-trainerto load the .pt file. Pre-computed embeddings in parquet format can be used without any dependencies.
Roadmap
- Scale to 30M cells (Genecorpus-30M) β expected to match/exceed Geneformer
- 150M parameter model
- Full IBM benchmark (multi-label and regression tasks)
- Cell-level benchmarks (cell type annotation, zero-shot clustering)
- Disease-specific test-time learning demo (cardiomyopathy, Alzheimer's)
- BERT ablation (same architecture without TITANS memory)
Citation
@article{yermekov2026biotitan,
title={BioTitan: Neural Long-Term Memory for Genomic Foundation Modeling},
author={Yermekov, Akbar},
year={2026}
}
@article{behrouz2025titans,
title={Titans: Learning to Memorize at Test Time},
author={Behrouz, Ali and Zhong, Peilin and Mirrokni, Vahab},
journal={NeurIPS},
year={2025}
}
License
Apache 2.0
- Downloads last month
- 11