TERRA-112M
JEPA-based spatial-transcriptomics foundation model (TERRA). Code & docs: https://github.com/Lotfollahi-lab/terra
Training data
Trained on the full HST-Corpus-112M (~112M cells; no held-out split). For benchmarking and downstream analyses on held-out data, see TERRA-96M. See the manuscript for details.
Files
model_checkpoint.pt: target-encoder weights (inference)
model_config.yaml: model / tokenization config
token_dictionary.pkl: gene-token vocabulary
ensembl_dictionary.pkl: gene-name to Ensembl-ID mapping (harmonization)
gene_count_dictionary.pkl: gene occurrence counts (rare-gene filtering)
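As a quick sanity check before running inference, the file list above can be verified against a downloaded model folder. The sketch below is not part of the TERRA codebase: `missing_model_files` is a hypothetical helper, and it assumes all five files sit at the top level of the folder, which this card does not state explicitly.

```python
from pathlib import Path

# File names taken from the "Files" list of this model card.
EXPECTED_FILES = (
    "model_checkpoint.pt",
    "model_config.yaml",
    "token_dictionary.pkl",
    "ensembl_dictionary.pkl",
    "gene_count_dictionary.pkl",
)


def missing_model_files(model_folder_path):
    """Hypothetical helper: list expected files that are absent from the folder.

    Assumes a flat layout (files directly inside `model_folder_path`).
    """
    folder = Path(model_folder_path)
    return [name for name in EXPECTED_FILES if not (folder / name).is_file()]
```

An empty list means every listed file was found. A non-empty list names the files to look for, for example after an interrupted download.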
Usage
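from app.huggingface import download_pretrained
from app.inference import harmonize_tokenize_embed_pipeline

# Download the model files; the returned path is passed as the model folder below.
d = download_pretrained("Lotfollahi-lab/TERRA-112M")

# `adata` is your own input data object (e.g. an AnnData), loaded beforehand.
adata = harmonize_tokenize_embed_pipeline(
    adata=adata,
    model_folder_path=d,  # gene-reference files auto-resolved from here
    # ... sample_key / batch_key / etc.
)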
Citation
<add paper / bioRxiv reference>