license: mit
language:
  - en
tags:
  - biology
  - single-cell
  - scRNA-seq
  - macrophage
  - IBD
  - inflammatory-bowel-disease
  - perturbation
  - gene-knockout
  - scGPT
  - transformers
metrics:
  - pearsonr
  - roc_auc

MacroPert: IBD Macrophage Perturbation Model

A fine-tuned scGPT model for predicting macrophage polarization state and IBD disease status from single-cell RNA-seq, combined with in silico gene knockout predictions for 13 IBD target genes.

Model Description

Macrophages in inflammatory bowel disease (IBD) exist along a continuous spectrum between pro-inflammatory (M1) and anti-inflammatory (M2) states, rather than in discrete classes. MacroPert captures this continuous polarization spectrum and predicts how individual gene knockouts shift macrophage state.

The repository contains:

| File | Description |
|---|---|
| scgpt_ibd_v2/best_model.pt | scGPT v2 fine-tuned weights (continuous polarization model) |
| scgpt_ibd_v2/test_metrics.json | Test set performance metrics |
| scgpt_ibd_v1/best_model.pt | scGPT v1 weights (discrete 4-class baseline, archived) |
| scgpt_ibd_v1/test_metrics.json | v1 test metrics |
| results/ibd_ko_predictions_combined.json | In silico KO predictions for 8 IBD target genes |

Performance

| Metric | scGPT v1 (4-class) | scGPT v2 (continuous) |
|---|---|---|
| IBD AUROC | 0.915 | 0.989 |
| Polarization Pearson r | n/a | 0.909 |
| Classification Accuracy | 0.754 | n/a |
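For readers who want to recompute the two headline metrics on their own predictions, the following is a minimal sketch in plain Python. The function names and the toy values are illustrative assumptions, not part of the repository, and the evaluation script that produced the numbers above may differ in detail (for example in how ties are handled).

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))


# Toy values for illustration only (not model output).
true_pol = [-1.2, -0.3, 0.4, 1.1]
pred_pol = [-1.0, -0.5, 0.6, 0.9]
ibd_labels = [0, 0, 1, 1]
ibd_scores = [0.1, 0.4, 0.35, 0.8]

print(round(pearson_r(true_pol, pred_pol), 3))
print(auroc(ibd_labels, ibd_scores))
```

The pairwise AUROC is quadratic in the number of cells, so for full test sets a library implementation is the more practical choice.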

Training Data

Fine-tuned on four IBD macrophage scRNA-seq datasets:

| GEO Accession | Cells | Condition |
|---|---|---|
| GSE134809 | 13,794 | IBD macrophages (primary) |
| GSE116222 | ~8,000 | Ulcerative colitis macrophages |
| GSE182270 | ~6,000 | IBD macrophage substates |
| GSE148810 | ~5,000 | Crohn's disease macrophages |

Model Architecture

  • Base: scGPT pretrained transformer
  • Fine-tuning objective:
    L = 0.5 × L_soft_contrastive + 0.3 × L_MSE(pol) + 0.2 × L_BCE(ibd)
  • Polarization score: pol_score = z-score(M1_score − M2_score) using literature-based gene signatures
    • M1 genes: TNF, IL1B, IL6, CXCL10, NOS2, CD80, CD86, CCL5, CXCL9, PTGS2, IRF5, HIF1A, CXCL8, IL12A
    • M2 genes: CD163, MRC1, ARG1, IL10, TGFB1, CCL18, CD209, FOLR2, SOCS3, HMOX1, CLEC7A
  • Soft contrastive loss: Gaussian kernel exp(−d²/(2σ²)) over polarization score distances; this shapes the embedding space continuously, without discrete cluster boundaries
  • Output heads: polarization regression + IBD binary classification
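The polarization score, the Gaussian kernel and the loss weighting listed above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the training code: the card does not say how M1_score and M2_score are computed, so the sketch assumes the mean expression of the signature genes present in each cell. It also assumes σ = 1.0 and the function names are hypothetical. How the kernel weights are paired with embedding similarities inside the soft contrastive term is not specified here, so only the kernel itself is shown.

```python
import math
from statistics import mean, pstdev

# Gene signatures as listed in this model card.
M1_GENES = ["TNF", "IL1B", "IL6", "CXCL10", "NOS2", "CD80", "CD86",
            "CCL5", "CXCL9", "PTGS2", "IRF5", "HIF1A", "CXCL8", "IL12A"]
M2_GENES = ["CD163", "MRC1", "ARG1", "IL10", "TGFB1", "CCL18",
            "CD209", "FOLR2", "SOCS3", "HMOX1", "CLEC7A"]


def signature_score(cell, genes):
    """Mean expression of the signature genes found in the cell.

    ASSUMPTION: the scoring rule is not given in the card; a plain mean
    is used here for illustration.
    """
    values = [cell[g] for g in genes if g in cell]
    return mean(values) if values else 0.0


def polarization_scores(cells):
    """z-score of (M1_score - M2_score) across all cells."""
    raw = [signature_score(c, M1_GENES) - signature_score(c, M2_GENES)
           for c in cells]
    mu, sd = mean(raw), pstdev(raw)
    return [(r - mu) / sd if sd > 0 else 0.0 for r in raw]


def gaussian_kernel(pol_i, pol_j, sigma=1.0):
    """Target similarity exp(-d^2 / (2 sigma^2)) for two polarization scores.

    ASSUMPTION: sigma = 1.0 is an illustrative default.
    """
    d = pol_i - pol_j
    return math.exp(-(d * d) / (2.0 * sigma * sigma))


def total_loss(l_soft_contrastive, l_mse_pol, l_bce_ibd):
    """Weighted sum of the three loss terms, using the weights from the card."""
    return 0.5 * l_soft_contrastive + 0.3 * l_mse_pol + 0.2 * l_bce_ibd


# Toy cells (gene -> expression), for illustration only.
cells = [
    {"TNF": 5.0, "IL1B": 4.0, "CD163": 0.5, "MRC1": 0.5},  # M1-like
    {"TNF": 2.0, "IL1B": 2.0, "CD163": 2.0, "MRC1": 2.0},  # intermediate
    {"TNF": 0.5, "IL1B": 0.5, "CD163": 4.0, "MRC1": 5.0},  # M2-like
]
pol = polarization_scores(cells)
print([round(p, 3) for p in pol])
print(round(gaussian_kernel(pol[0], pol[1]), 3))
```

Cells with similar polarization scores get a kernel value close to 1 and distant cells a value close to 0, which is what lets the contrastive term treat similarity as graded rather than as class membership.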

In Silico KO Predictions

KO effects for 13 IBD target genes were predicted using two methods, CellOracle (GRN) and OmniPath propagation. The table below summarizes eight of them:

| Gene | Method | Key Predicted Effect |
|---|---|---|
| HIF1A | CellOracle (GRN) | PTGS2↓, CXCL8↓, CD74↓ |
| IRF5 | CellOracle (GRN) | ISG15↓, IFI6↓, IFIT3↓, IRF7↓ |
| IL6 | OmniPath propagation | JAK2↓, IL6ST↓, JAK1↓, IL6R↓, TYK2↓ |
| SOCS3 | OmniPath propagation | JAK2↓, STAT5A↓, STAT1↓, STAT3↓; AKT1↑ |
| TNF | OmniPath propagation | TNFRSF1A↓, TNFRSF1B↓, PIK3CG↓, AKT1↓ |
| TGFB1 | OmniPath propagation | PIK3R1↓, RAC1↓, TGFBR1↓, GRB2↓ |
| IL1B | OmniPath propagation | IL1R2↓, MYD88↓, STAT3↓, NR3C1↓ |
| PTGS2 | OmniPath propagation | IL4R↓, IL2RG↓, NFE2L2↓ |

Full predictions (top 15 up/down per gene) are in results/ibd_ko_predictions_combined.json.

Usage

import torch
import json

# Load model weights
checkpoint = torch.load("scgpt_ibd_v2/best_model.pt", map_location="cpu")

# Load KO predictions
with open("results/ibd_ko_predictions_combined.json") as f:
    ko_predictions = json.load(f)

# Example: top downregulated genes after IL6 KO
il6_ko = ko_predictions["IL6"]
print(il6_ko["top_downregulated"][:5])
# [['JAK2', -0.207], ['IL6ST', ...], ['JAK1', ...], ...]
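Building on the snippet above, a small helper can pull the strongest predicted effects for any knocked-out gene. This is a sketch: it assumes each gene entry maps a direction key to a list of [gene, score] pairs, as the `top_downregulated` example suggests. The `top_upregulated` key and the helper name `top_genes` are assumptions, and the toy dictionary stands in for the real JSON file.

```python
def top_genes(ko_predictions, gene, direction="top_downregulated", n=5):
    """Return up to n gene names with the largest absolute predicted effect.

    ASSUMPTION: ko_predictions[gene][direction] is a list of
    [gene_name, score] pairs; check the JSON file for the actual layout.
    """
    if gene not in ko_predictions:
        raise KeyError(f"No KO predictions found for {gene!r}")
    pairs = ko_predictions[gene].get(direction, [])
    ranked = sorted(pairs, key=lambda pair: abs(pair[1]), reverse=True)
    return [name for name, _score in ranked[:n]]


# Toy stand-in with placeholder names and values (not model output).
mock_predictions = {
    "TOY_KO": {
        "top_downregulated": [["G1", -0.05], ["G2", -0.30], ["G3", -0.10]],
        "top_upregulated": [["G4", 0.20]],
    }
}

print(top_genes(mock_predictions, "TOY_KO", n=2))
print(top_genes(mock_predictions, "TOY_KO", direction="top_upregulated"))
```

With the real file, passing the loaded `ko_predictions` dictionary and a gene symbol such as "IL6" should work the same way, provided the layout matches the assumption above.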

Citation

If you use this model, please cite the underlying datasets and tools:

  • scGPT: Cui et al., Nature Methods 2024
  • CellOracle: Kamimoto et al., Nature 2023
  • OmniPath: Türei et al., Nature Methods 2021
  • GSE134809: Smillie et al., Cell 2019