BioTitan: TITANS genomic foundation model with test-time learning

18.7M parameter model trained on 254K Tabula Sapiens cells. Test-time memory adaptation improves gene embeddings by +12.6% AUC
(0.636 -> 0.716 across 53 IBM gene-benchmark tasks), closing 54% of the gap to Geneformer V1 (30M cells) without retraining.

Includes pre-computed gene embeddings (static + contextualized).

Files changed (7) hide show

.gitattributes +3 -0
README.md +215 -0
biotitan-20m-tabula-sapiens.pt +3 -0
config.json +48 -0
gene_embeddings_ctx_254k.parquet +3 -0
gene_embeddings_static.parquet +3 -0
token_dictionary.pkl +3 -0

.gitattributes ADDED Viewed

	@@ -0,0 +1,3 @@

+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text

README.md ADDED Viewed

	@@ -0,0 +1,215 @@

+---
+language: en
+license: apache-2.0
+tags:
+  - genomics
+  - single-cell
+  - transcriptomics
+  - gene-expression
+  - foundation-model
+  - titans
+  - test-time-learning
+  - biology
+datasets:
+  - tabula-sapiens
+library_name: pytorch
+pipeline_tag: feature-extraction
+---
+# BioTitan: Neural Long-Term Memory for Genomic Foundation Modeling
+**First application of the TITANS architecture to single-cell genomics, enabling test-time adaptive gene embeddings.**
+BioTitan applies [TITANS](https://arxiv.org/abs/2501.00663) (Behrouz et al., Google Research, NeurIPS 2025) to single-cell transcriptomics. Unlike existing genomic foundation models whose gene representations are fixed after training, BioTitan's neural memory **updates its weights during inference** — gene embeddings improve as the model processes more cells, without any retraining.
+## Headline Result
+Test-time memory adaptation closes **54% of the gap** to Geneformer V1 — without any retraining.
+```
+BioTitan Static:    0.636 avg AUC (53 tasks)
+BioTitan CTX 254K:  0.716 avg AUC  ← +12.6% relative improvement, zero retraining
+Geneformer V1:      0.782 avg AUC  (trained on 120× more data)
+```
+On Expression tasks (23 tasks) — the family where single-cell models are expected to excel — BioTitan CTX reaches **0.815**, outperforming Gene2vec (0.773) and approaching Geneformer (0.869), trained on 120× less data.
+Contextualization saturates at ~60K cells (+0.002 from 60K→254K), indicating that clinically-relevant sample sizes are sufficient for effective memory adaptation.
+## IBM Gene Benchmark (53 Tasks, 5 Families)
+All results verified on the same machine using [BiomedSciAI/gene-benchmark](https://github.com/BiomedSciAI/gene-benchmark). Geneformer and Gene2vec baselines reproduced locally. Published baselines from the [IBM benchmark paper](https://arxiv.org/abs/2412.04075) (Kan-Tor et al., 2024).
+### Task Family Averages
+| Family | Geneformer V1 | Gene2vec | BioTitan Static | **BioTitan CTX** | Tasks |
+|--------|:---:|:---:|:---:|:---:|:---:|
+| Expression | **0.869** | 0.773 | 0.732 | **0.815** | 23 |
+| Genomic Properties | **0.782** | 0.725 | 0.640 | 0.687 | 7 |
+| Regulatory Functions | 0.759 | **0.769** | 0.623 | 0.704 | 4 |
+| Localization | **0.725** | 0.668 | 0.616 | 0.699 | 2 |
+| Protein Properties | **0.678** | 0.641 | 0.571 | 0.598 | 17 |
+| **Overall** | **0.782** | 0.715 | 0.636 | **0.716** | **53** |
+### Comparison with All Published Baselines
+Family averages from the IBM benchmark paper's Figure 2 heatmap; BioTitan run locally.
+**Expression / Localization (23 tasks) — BioTitan's strongest family:**
+| Model | Type | Avg AUC |
+|-------|------|:---:|
+| Geneformer | RNA-seq (30M cells) | **0.869** |
+| cellPLM | RNA-seq (11M cells) | ~0.85 |
+| ScGPT-H | RNA-seq (33M cells) | ~0.84 |
+| Gene2vec | Bulk co-expression | ~0.82 |
+| **BioTitan CTX** | **RNA-seq (255K cells)** | **0.815** |
+| ScGPT-B | RNA-seq (10.3M blood) | ~0.75 |
+| ESM-1 / ESM-2 | Protein sequence | ~0.74–0.75 |
+| MPNet / DNABert-2 | Text / DNA | ~0.72 |
+| MTEB-S / MTEB-L | Text | ~0.67–0.71 |
+| Bag of Words | Text | ~0.69 |
+BioTitan CTX outperforms all text, protein, and DNA models on expression tasks — and all RNA-seq models trained on fewer diverse tissues.
+**Genomic Properties (7 tasks):**
+| Model | Type | Avg AUC |
+|-------|------|:---:|
+| ESM-2 | Protein sequence | 0.84 |
+| MTEB-L / Bag of Words | Text | 0.81 |
+| ScGPT-H / MPNet | Mixed | 0.80 |
+| Geneformer | RNA-seq (30M cells) | 0.79 |
+| DNABert-2 | DNA sequence | 0.79 |
+| cellPLM | RNA-seq (11M cells) | 0.76 |
+| Gene2vec | Bulk co-expression | 0.73 |
+| **BioTitan CTX** | **RNA-seq (255K cells)** | **0.687** |
+| ScGPT-B | RNA-seq (10.3M blood) | 0.67 |
+**Regulatory Functions (4 tasks):**
+| Model | Type | Avg AUC |
+|-------|------|:---:|
+| MTEB-S | Text (335M) | 0.81 |
+| ESM-1 / ESM-2 | Protein sequence | 0.79 |
+| ScGPT-H | RNA-seq (33M cells) | 0.77 |
+| cellPLM | RNA-seq (11M cells) | 0.75 |
+| Geneformer / Bag of Words | Mixed | 0.74 |
+| Gene2vec | Bulk co-expression | 0.73 |
+| **BioTitan CTX** | **RNA-seq (255K cells)** | **0.704** |
+| ScGPT-B | RNA-seq (10.3M blood) | 0.68 |
+| DNABert-2 | DNA sequence | 0.66 |
+### Selected Binary Tasks (detail)
+11 of 53 tasks. Overall averages in the family table above are computed across all 53 tasks (including 42 categorical tasks not shown here).
+| Task | Geneformer V1 | Gene2vec | BioTitan Static | **BioTitan CTX** |
+|------|:---:|:---:|:---:|:---:|
+| Dosage sensitive TFs | **0.919** | 0.878 | 0.723 | 0.891 |
+| Bivalent vs lys4-only | **0.925** | 0.894 | 0.797 | 0.889 |
+| Bivalent vs non-methylated | **0.827** | 0.688 | 0.616 | 0.676 |
+| CCD Transcript | **0.797** | 0.744 | 0.638 | 0.647 |
+| N1 network | **0.805** | 0.796 | 0.733 | 0.719 |
+| HLA class I vs II | 0.745 | **0.925** | 0.445 | 0.730 |
+| Gene2Gene | **0.730** | 0.695 | 0.643 | 0.702 |
+| TF vs non-TF | **0.749** | 0.719 | 0.630 | 0.698 |
+| N1 targets | **0.736** | 0.635 | 0.684 | 0.668 |
+| Long vs short range TF | **0.726** | 0.614 | 0.520 | 0.459 |
+| CCD Protein | 0.552 | **0.559** | 0.539 | 0.545 |
+### What This Tells Us
+**1. Test-time learning is a unique capability.** Contextualization improved BioTitan by +0.080 AUC across 53 tasks (0.636→0.716), closing 54% of the gap to Geneformer without any retraining. No other model in this benchmark can do this — their embeddings are architecturally fixed after training.
+**2. BioTitan excels where expression models should.** On Expression tasks (23 tasks), BioTitan CTX (0.815) outperforms every non-RNA-seq model and places 5th among all 13 models evaluated, despite training on 120× less data.
+**3. The gap is data, not architecture.** Among RNA-seq models, performance scales with training data: ScGPT-B (10M, single tissue) < BioTitan CTX (255K, 8 tissues) < Gene2vec (bulk) < cellPLM (11M) < Geneformer (30M) < ScGPT-H (33M). BioTitan sits where its data volume predicts — and test-time learning pushes it above its "data class."
+**4. Contextualization saturates efficiently.** Moving from 60K to 254K inference cells yields only +0.002 avg AUC. This means clinically-relevant sample sizes (~10K–60K cells) are sufficient for effective memory adaptation — a practical advantage for real-world deployment.
+## What Is Test-Time Learning?
+Existing models (Geneformer, scGPT, AIDO.Cell, scFoundation, cellPLM) process every cell identically at inference — their weights are frozen. BioTitan's TITANS memory MLP updates its own weights during the forward pass via gradient descent on a surprise signal:
+```
+Cell 1:      Memory is fresh. Gene representations are generic.
+Cell 1,000:  Memory has learned tissue-specific co-expression patterns.
+Cell 60,000: Memory has seen diverse cellular contexts.
+             Gene representations are now RICHER than the static embedding table.
+             Further cells provide diminishing returns.
+```
+This happens at inference speed (~36 cells/sec on RTX 3090). No optimizer, no backward pass through the full model, no labeled data needed.
+**Practical implications:**
+- Feed the model a patient's cells → memory adapts → adapted gene representations in minutes
+- No retraining, no fine-tuning, no GPU cluster needed for adaptation
+- The same model binary works for every patient, every tissue, every disease
+- ~60K cells is sufficient for near-optimal adaptation
+## Architecture
+TITANS Memory-as-Context (MAC) variant with 6 stacked blocks:
+| Component | Details |
+|-----------|---------|
+| Parameters | 18.7M |
+| Architecture | TITANS MAC (6 layers, 256 dim, 4 heads) |
+| Gene vocabulary | 25,424 (Geneformer-compatible tokenization) |
+| Memory | 2-layer MLP per block, chunk-wise gradient updates (128 tokens/step) |
+| Persistent memory | 32 learnable tokens per block |
+| FFN | SwiGLU, hidden dim 512 |
+| Pre-training | Masked gene prediction (15% masking rate) |
+| Training data | 254,394 cells from Tabula Sapiens (8 human tissues) |
+| Compute | 2 epochs, AdamW, cosine LR, 2×RTX 3090 (~8 hours) |
+## Training Framework
+BioTitan was trained using [titans-trainer](https://github.com/pafos-ai/titans-trainer), a HuggingFace-style training framework for the TITANS architecture.
+```bash
+pip install titans-trainer
+```
+## Training Data
+[Tabula Sapiens](https://tabula-sapiens-portal.ds.czbiohub.org/) — 254,394 cells from 8 human tissues (Blood, Lung, Heart, Liver, Kidney, Pancreas, Neural, Bone Marrow), tokenized using rank-value encoding with median normalization.
+## Limitations
+- **Gene-level only.** Cell-level tasks (cell type annotation, perturbation prediction) not yet benchmarked.
+- **Small training set.** 255K cells vs 30–50M for Geneformer/scGPT/AIDO.Cell. Performance scales with data — scaling is expected to close the remaining gap.
+- **8 tissues.** Broader tissue coverage would improve gene representation diversity.
+- **Contextualization overhead.** Extracting contextualized embeddings requires a forward pass over reference cells (~36 cells/sec on RTX 3090). Static embeddings are instant.
+- **Some tasks regress with contextualization.** 3 of 11 binary tasks show small decreases, suggesting memory saturation effects on certain task types.
+## Roadmap
+- [ ] Scale to 30M cells (Genecorpus-30M) — expected to match/exceed Geneformer
+- [ ] 150M parameter model
+- [ ] Full IBM benchmark (multi-label and regression tasks)
+- [ ] Cell-level benchmarks (cell type annotation, zero-shot clustering)
+- [ ] Disease-specific test-time learning demo (cardiomyopathy, Alzheimer's)
+- [ ] BERT ablation (same architecture without TITANS memory)
+## Citation
+```bibtex
+@article{yermekov2026biotitan,
+  title={BioTitan: Neural Long-Term Memory for Genomic Foundation Modeling},
+  author={Yermekov, Akbar},
+  year={2026}
+}
+@article{behrouz2025titans,
+  title={Titans: Learning to Memorize at Test Time},
+  author={Behrouz, Ali and Zhong, Peilin and Mirrokni, Vahab},
+  journal={NeurIPS},
+  year={2025}
+}
+```
+## License
+Apache 2.0

biotitan-20m-tabula-sapiens.pt ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:6e79b40c7fdd87dbf2cd5e831b6545a65fb99b5705f039d7cf2e7bd6a8e7473b
+size 226574859

config.json ADDED Viewed

	@@ -0,0 +1,48 @@

+{
+  "model_type": "biotitan",
+  "architecture": "titans",
+  "n_genes": 25424,
+  "d_model": 256,
+  "n_layers": 6,
+  "n_heads": 4,
+  "d_ff": 512,
+  "max_seq_len": 2048,
+  "memory_depth": 2,
+  "n_persistent": 32,
+  "dropout": 0.02,
+  "vocab_size": 25426,
+  "pad_token_id": 0,
+  "mask_token_id": 25425,
+  "training": {
+    "dataset": "tabula_sapiens",
+    "n_cells": 254394,
+    "epochs": 2,
+    "batch_size": 32,
+    "learning_rate": 5e-4,
+    "weight_decay": 0.001,
+    "warmup_steps": 300,
+    "mask_prob": 0.15,
+    "optimizer": "AdamW",
+    "mixed_precision": true
+  },
+  "files": {
+    "model_weights": "biotitan-20m-tabula-sapiens.pt",
+    "token_dictionary": "token_dictionary.pkl",
+    "contextualized_embeddings": "gene_embeddings_ctx_254k.parquet",
+    "static_embeddings": "gene_embeddings_static.parquet"
+  },
+  "benchmark_results": {
+    "overall_53_tasks": {
+      "static_auc": 0.636,
+      "ctx_254k_auc": 0.716,
+      "geneformer_v1_auc": 0.782
+    },
+    "family_averages": {
+      "expression_23_tasks": { "static": 0.732, "ctx_254k": 0.815, "geneformer": 0.869 },
+      "genomic_properties_7_tasks": { "static": 0.640, "ctx_254k": 0.687, "geneformer": 0.782 },
+      "regulatory_functions_4_tasks": { "static": 0.623, "ctx_254k": 0.704, "geneformer": 0.759 },
+      "localization_2_tasks": { "static": 0.616, "ctx_254k": 0.699, "geneformer": 0.725 },
+      "protein_properties_17_tasks": { "static": 0.571, "ctx_254k": 0.598, "geneformer": 0.678 }
+    }
+  }
+}

gene_embeddings_ctx_254k.parquet ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:1e2309268d0025cce8a0f27d80a1cf0c621fe3f905200aa0d21172e3ce691b3b
+size 52740260

gene_embeddings_static.parquet ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:63980543960452f144b25f384b6a658680b1e8275916805b8daa96e06594d2a2
+size 54446255

token_dictionary.pkl ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ab9dc40973fa5224d77b793e2fd114cacf3d08423ed9c4c49caf0ba9c7f218f1
+size 788424