X-Cell

A diffusion language model for genome-scale perturbation prediction across diverse cellular contexts.

Status: Model weights and inference code are not yet released. The Python API, weights, and tutorials are under active development; watch the GitHub repository for release updates.

[Figure: X-Cell architecture]

Model Description

X-Cell predicts genome-scale transcriptional responses to genetic perturbations across diverse cellular contexts. Trained on X-Atlas/Pisces (25.6M perturbed single cells, 7 CRISPRi Perturb-seq screens), X-Cell integrates multi-modal biological priors through cross-attention and generalizes zero-shot to unseen cell types and perturbations.

Key Results

  • 5x higher Pearson delta than the next-best method on held-out iPSC perturbations
  • Zero-shot T-cell inactivation — predicts CD3 complex inactivators and novel regulators (LRBA, APPL2)
  • LLM-class scaling laws — train loss scales as L(N) ~ N^-0.32 (R^2 = 0.96)
  • Zero-shot cell type generalization to melanocyte progenitors and primary human CD4+ T cells
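
Two of the quantities above can be made concrete with a short sketch. It assumes the common definition of Pearson delta (the Pearson correlation between predicted and observed expression changes relative to control, computed on per-gene means) and fits the scaling exponent by ordinary least squares in log-log space. The function names are illustrative and not part of the X-Cell API, the card does not specify what N measures, and the data below are synthetic.

```python
import numpy as np


def pearson_delta(pred, truth, control):
    """Pearson correlation between predicted and observed expression changes.

    Each argument is a 1-D array of mean expression per gene. The control
    mean is subtracted from both so that the score reflects the perturbation
    effect rather than baseline expression.
    """
    control = np.asarray(control, dtype=float)
    pred_delta = np.asarray(pred, dtype=float) - control
    true_delta = np.asarray(truth, dtype=float) - control
    return float(np.corrcoef(pred_delta, true_delta)[0, 1])


def fit_power_law(sizes, losses):
    """Fit L(N) = a * N**(-alpha) by least squares on log-transformed data.

    Returns (a, alpha, r_squared), with r_squared measured in log space.
    """
    log_n = np.log(np.asarray(sizes, dtype=float))
    log_l = np.log(np.asarray(losses, dtype=float))
    slope, intercept = np.polyfit(log_n, log_l, 1)
    residuals = log_l - (slope * log_n + intercept)
    r_squared = 1.0 - residuals.var() / log_l.var()
    return float(np.exp(intercept)), float(-slope), float(r_squared)


# Synthetic check: points drawn from an exact power law with alpha = 0.32.
sizes = np.array([1e6, 1e7, 1e8, 1e9])
a, alpha, r2 = fit_power_law(sizes, 5.0 * sizes ** -0.32)

# Synthetic check: a prediction that overshoots every change by 2x still
# has a Pearson delta of 1, because correlation ignores scale.
score = pearson_delta(pred=[3.0, 1.0, -1.0], truth=[2.0, 1.0, 0.0],
                      control=[1.0, 1.0, 1.0])
```

Because correlation is scale-invariant, Pearson delta rewards getting the direction and relative size of changes right; it does not penalize a uniformly over- or under-estimated effect magnitude.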

Model

| Model       | Parameters | Description                            |
|-------------|------------|----------------------------------------|
| X-Cell Mini | 55M        | Fast inference; initialized from scGPT |

Architecture

X-Cell is a set-level diffusion transformer that operates on sets of cells (not individual cells) and refines predictions iteratively via a masked diffusion process. Key components:

  • Diffusion-based training with 4-step coarse-to-fine refinement at inference
  • Multi-modal biological priors via Flamingo-style cross-attention (ESM-2, STRING, GenePT, DepMap, JUMP-Cell Painting, scGPT)
  • Tied output embeddings with PaLM-style 1/sqrt(d) scaling
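
The sampler itself is not described in this card, so the following is only a minimal sketch of two of the components above under stated assumptions. `tied_logits` reuses the input embedding matrix for the output projection and scales the logits by 1/sqrt(d). `refine` stands in for the 4-step refinement using a generic confidence-based unmasking loop (in the style of MaskGIT), which may differ from what X-Cell actually does. It operates on one token sequence rather than a set of cells, and `MASK_ID`, `predict_logits`, and `toy_model` are hypothetical names.

```python
import numpy as np

MASK_ID = -1  # hypothetical sentinel marking a masked position


def tied_logits(hidden, embedding):
    """Project hidden states with the input embedding matrix, scaled by 1/sqrt(d)."""
    d = embedding.shape[1]
    return hidden @ embedding.T / np.sqrt(d)


def refine(predict_logits, n_positions, n_steps=4):
    """Fill a fully masked sequence over `n_steps` rounds.

    `predict_logits(tokens)` returns an array of shape (n_positions, vocab).
    Each round commits the most confident still-masked positions, so the
    number of masked positions shrinks linearly to zero.
    """
    tokens = np.full(n_positions, MASK_ID, dtype=np.int64)
    masked = np.ones(n_positions, dtype=bool)
    for step in range(n_steps):
        logits = predict_logits(tokens)
        # Numerically stable softmax over the vocabulary axis.
        shifted = logits - logits.max(axis=1, keepdims=True)
        probs = np.exp(shifted)
        probs /= probs.sum(axis=1, keepdims=True)
        guess = probs.argmax(axis=1)
        confidence = probs.max(axis=1)
        # Positions that should still be masked after this round.
        keep_masked = n_positions * (n_steps - step - 1) // n_steps
        n_commit = int(masked.sum()) - keep_masked
        # Rank only the still-masked positions; committed ones sort last.
        scores = np.where(masked, confidence, -np.inf)
        commit = np.argsort(scores)[::-1][:n_commit]
        tokens[commit] = guess[commit]
        masked[commit] = False
    return tokens


# Toy setup: one-hot embeddings make the correct token the argmax by construction.
vocab, n_positions = 16, 8
embedding = np.eye(vocab)
rng = np.random.default_rng(0)
target = rng.integers(0, vocab, size=n_positions)
sharpness = np.linspace(1.0, 8.0, n_positions)  # later positions are more confident


def toy_model(tokens):
    # Stand-in for the transformer. A real model would condition on the
    # tokens committed so far; this toy ignores them.
    hidden = embedding[target] * sharpness[:, None]
    return tied_logits(hidden, embedding)


result = refine(toy_model, n_positions)
```

With 8 positions and 4 steps, each round commits two positions, starting with the ones the toy model is most confident about.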

Intended Use

X-Cell is designed for predicting transcriptional responses to CRISPRi gene knockdowns. It is intended for research use in computational biology and genomics.

Training Data

Trained on X-Atlas/Pisces — the largest CRISPRi Perturb-seq compendium to date:

| Screen          | Context                            | Perturbations | Cells |
|-----------------|------------------------------------|---------------|-------|
| HCT116          | Colorectal cancer                  | 18,924        | 3.4M  |
| HEK293T         | Kidney epithelial                  | 18,312        | 4.5M  |
| HepG2           | Hepatocellular carcinoma           | 9,735         | 2.6M  |
| iPSC            | Induced pluripotent stem cells     | 10,095        | 4.2M  |
| Jurkat Resting  | T lymphoblastic leukemia           | 10,872        | 2.8M  |
| Jurkat Active   | CD3/CD28-stimulated T cells        | 10,878        | 2.8M  |
| iPSC Multi-Diff | Multi-lineage differentiation      | 12,175        | 5.1M  |

Dataset: Xaira-Therapeutics/X-Atlas-Pisces

Usage (Coming Soon)

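# Planned interface; the package is not yet released and names may change.
from xcell import XCell

# Load the 55M-parameter Mini variant.
model = XCell.from_pretrained("Xaira-Therapeutics/X-Cell", variant="mini")

# Predict the response of control cells to a CRISPRi knockdown of BRCA1.
predictions = model.predict("control_cells.h5ad", perturbation="BRCA1")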

Full documentation: Xaira-Therapeutics.github.io/x-cell

Citation

@article{xcell2026,
  title   = {X-Cell: Scaling Causal Perturbation Prediction Across Diverse
             Cellular Contexts via Diffusion Language Models},
  year    = {2026},
}

License

This model is released under the CC BY-NC-SA 4.0 license.
