---
license: mit
language:
- en
library_name: pytorch
tags:
- scRNA-seq
- single-cell
- self-supervised-learning
- JEPA
- biology
datasets:
- vevotx/Tahoe-100M
pretty_name: GeneJEPA (Perceiver JEPA for scRNA-seq)
pipeline_tag: feature-extraction
---

# GeneJEPA — A Perceiver-style JEPA for scRNA-seq

**GeneJEPA** is a Joint-Embedding Predictive Architecture (JEPA) trained for self-supervised representation learning on single-cell RNA-seq.
It uses a Perceiver-style encoder to handle sparse, high-dimensional gene count vectors and is trained with masked block prediction—no labels required.

> **Why?** It produces compact cell embeddings you can use for clustering, transfer learning, linear probes, and other downstream biological tasks.

---

## Repository contents

This model repo intentionally contains **artifacts only** (no training code):

- **`genejepa-epoch=49.ckpt`** — final PyTorch Lightning checkpoint (student encoder, predictor, EMA state, etc.).
- **`gene_metadata.parquet`** — mapping between foundation token IDs and the gene identifiers used to build the embedding vocabulary.
- **`global_stats.json`** — global `log1p(counts)` normalization stats (`mean`, `std`) computed over a large sample of the training data; see the usage sketch below.

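
As a rough illustration of how these artifacts fit together, the sketch below reads the gene metadata and the global stats and standardizes a raw count vector. It assumes the files have been downloaded locally (see the download snippet further down), that `global_stats.json` holds the `mean`/`std` values described above, and that the parquet has one row per vocabulary gene; the dummy Poisson counts are purely illustrative, not the repo's actual schema or pipeline.

```python
# Minimal sketch (not the reference pipeline): standardize log1p(counts)
# with the shipped global statistics.
import json

import numpy as np
import pandas as pd

gene_meta = pd.read_parquet("gene_metadata.parquet")  # token ID <-> gene identifier mapping
with open("global_stats.json") as f:
    stats = json.load(f)  # assumed layout: {"mean": ..., "std": ...}

def normalize_counts(counts: np.ndarray) -> np.ndarray:
    """Standardize log1p-transformed counts with the global training stats."""
    x = np.log1p(counts.astype(np.float32))
    return (x - stats["mean"]) / stats["std"]

# Dummy cell: one count value per gene in the vocabulary (assumes one row per gene).
raw_counts = np.random.poisson(1.0, size=len(gene_meta))
x_norm = normalize_counts(raw_counts)
print(gene_meta.columns.tolist(), x_norm.shape)
```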
---

## Model summary

- **Backbone:** Perceiver-style encoder over tokenized genes (gene-identity embedding + Fourier features of the expression value)
- **Latents:** 512
- **Dimensionality:** 768
- **Blocks:** 24 transformer blocks on the latent array
- **Heads:** 12
- **Masking:** stochastic block-wise targets; the context is the complement of the target blocks
- **Predictor:** BYOL-style MLP head
- **EMA teacher:** maintained during training to provide prediction targets

> Default tokenizer Fourier settings: `N_f=64`, `min_freq=0.1`, `max_freq=100.0`, `freq_scale=1.0`.

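
To make these settings concrete, here is one plausible reading of the value tokenizer: each normalized expression value is expanded into sine/cosine features at `N_f` log-spaced frequencies between `min_freq` and `max_freq`, scaled by `freq_scale`, and combined with the gene-identity embedding. The exact construction in the training code may differ (e.g. frequency spacing or a 2π factor), so treat this as an illustrative sketch rather than the reference implementation.

```python
import math

import torch

def fourier_features(x: torch.Tensor,
                     n_f: int = 64,
                     min_freq: float = 0.1,
                     max_freq: float = 100.0,
                     freq_scale: float = 1.0) -> torch.Tensor:
    """Sketch of the per-gene value embedding: sin/cos at log-spaced frequencies.

    x: (..., n_genes) normalized expression values.
    Returns: (..., n_genes, 2 * n_f) Fourier features.
    """
    freqs = torch.logspace(math.log10(min_freq), math.log10(max_freq), steps=n_f) * freq_scale
    angles = x.unsqueeze(-1) * freqs              # (..., n_genes, n_f)
    return torch.cat([angles.sin(), angles.cos()], dim=-1)

values = torch.randn(1, 2048)                     # one cell, 2048 genes (dummy values)
value_feats = fourier_features(values)            # (1, 2048, 128)
print(value_feats.shape)
```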
## Download artifacts

```python
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(repo_id="<your-username>/<your-model-id>",
                            filename="genejepa-epoch=49.ckpt")
meta_path = hf_hub_download(repo_id="<your-username>/<your-model-id>",
                            filename="gene_metadata.parquet")
stats_path = hf_hub_download(repo_id="<your-username>/<your-model-id>",
                             filename="global_stats.json")
```
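
Because the repo ships artifacts only, restoring the full model requires the GeneJEPA training code. You can still open the checkpoint and inspect what it contains; the layout assumed below (a top-level `state_dict` plus Lightning metadata) is the standard PyTorch Lightning convention and has not been verified against this exact file.

```python
import torch

# Inspect the Lightning checkpoint on CPU. weights_only=False because the file
# holds Lightning metadata in addition to raw tensors.
ckpt = torch.load(ckpt_path, map_location="cpu", weights_only=False)

print(list(ckpt.keys()))            # e.g. ['state_dict', 'hyper_parameters', ...] (assumed layout)
state_dict = ckpt["state_dict"]
print(len(state_dict), "parameter tensors")
for name, tensor in list(state_dict.items())[:5]:
    print(f"{name:60s} {tuple(tensor.shape)}")
```

From here, producing cell embeddings means instantiating the student encoder with the architecture listed above (512 latents, dimensionality 768, 24 blocks, 12 heads) and loading the matching weights from this state dict.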
## Contact

elonlit@biostate.ai