TERRA-112M

JEPA-based spatial-transcriptomics foundation model (TERRA). Code & docs: https://github.com/Lotfollahi-lab/terra

Training data

Trained on the full HST-Corpus-112M (~112M cells), with no held-out split. Because every cell in the corpus was seen during training, use TERRA-96M for benchmarking and downstream analyses that require held-out data. See the manuscript for details.

Files

model_checkpoint.pt — target-encoder weights (inference)
model_config.yaml — model / tokenization config
token_dictionary.pkl — gene-token vocabulary
ensembl_dictionary.pkl — gene-name to Ensembl-ID mapping (harmonization)
gene_count_dictionary.pkl — gene occurrence counts (rare-gene filtering)
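
The dictionary files can be inspected directly, which helps when checking gene coverage before running the pipeline. The sketch below is illustrative only: it assumes each `.pkl` file unpickles to a plain Python `dict` (for `gene_count_dictionary.pkl`, gene identifier to occurrence count), which is not confirmed here. `load_pickle`, `frequent_genes`, and the `min_count` threshold are hypothetical names that are not part of the TERRA codebase. It uses toy data so it runs without the real files.

```python
import pickle
import tempfile
from pathlib import Path


def load_pickle(path):
    """Load one pickled object from disk (only unpickle files you trust)."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


def frequent_genes(gene_counts, min_count):
    """Return the genes whose occurrence count is at least min_count.

    Assumes gene_counts is a dict of gene identifier -> integer count.
    """
    return {gene for gene, count in gene_counts.items() if count >= min_count}


# Toy round trip standing in for the real gene_count_dictionary.pkl.
with tempfile.TemporaryDirectory() as tmp:
    toy_path = Path(tmp) / "gene_count_dictionary.pkl"
    toy_path.write_bytes(pickle.dumps({"GENE_A": 120, "GENE_B": 3, "GENE_C": 45}))
    toy_counts = load_pickle(toy_path)

# Hypothetical threshold; the cutoff the pipeline actually applies may differ.
kept = frequent_genes(toy_counts, min_count=10)
print(sorted(kept))  # ['GENE_A', 'GENE_C']
```

To run this against the downloaded model, point `load_pickle` at the file inside the model folder and confirm the unpickled object's type before relying on the dict assumption.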

Usage

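import anndata as ad

from app.huggingface import download_pretrained
from app.inference import harmonize_tokenize_embed_pipeline

# Assumption: adata is an AnnData object; the path below is a placeholder.
adata = ad.read_h5ad("your_dataset.h5ad")

# Fetch the model files; the returned path is used as the model folder.
model_dir = download_pretrained("Lotfollahi-lab/TERRA-112M")

adata = harmonize_tokenize_embed_pipeline(
    adata=adata,
    model_folder_path=model_dir,    # gene-reference files auto-resolved from here
    # ... sample_key / batch_key / etc.
)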

Citation
