MacroPert: IBD Macrophage Perturbation Model
A fine-tuned scGPT model for predicting macrophage polarization state and IBD disease status from single-cell RNA-seq, combined with in silico gene knockout predictions for 13 IBD target genes.
Model Description
Macrophages in inflammatory bowel disease (IBD) exist along a continuous spectrum between pro-inflammatory (M1) and anti-inflammatory (M2) states, rather than in discrete classes. MacroPert captures this continuous polarization spectrum and predicts how individual gene knockouts shift macrophage state.
The repository contains:
| File | Description |
|---|---|
| scgpt_ibd_v2/best_model.pt | scGPT v2 fine-tuned weights (continuous polarization model) |
| scgpt_ibd_v2/test_metrics.json | Test set performance metrics |
| scgpt_ibd_v1/best_model.pt | scGPT v1 weights (discrete 4-class baseline, archived) |
| scgpt_ibd_v1/test_metrics.json | v1 test metrics |
| results/ibd_ko_predictions_combined.json | In silico KO predictions for 8 IBD target genes |
Performance
| Metric | scGPT v1 (4-class) | scGPT v2 (continuous) |
|---|---|---|
| IBD AUROC | 0.915 | 0.989 |
| Polarization Pearson r | n/a | 0.909 |
| Classification Accuracy | 0.754 | n/a |
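For readers unfamiliar with the two v2 metrics, the sketch below shows how Pearson r and AUROC are conventionally computed from per-cell predictions. It is an illustration only: the function names and the toy values are assumptions, not the evaluation code or outputs of this repository.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def auroc(labels, scores):
    """AUROC as the probability that a positive cell outranks a negative one.

    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Toy values for illustration only (not model outputs)
true_pol = [-1.2, -0.3, 0.4, 1.1]   # reference polarization scores
pred_pol = [-1.0, -0.1, 0.2, 0.9]   # predicted polarization scores
ibd_labels = [0, 0, 1, 1]           # 1 = IBD, 0 = control
ibd_probs = [0.1, 0.6, 0.4, 0.9]    # predicted IBD probabilities

print(pearson_r(true_pol, pred_pol))
print(auroc(ibd_labels, ibd_probs))
```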
Training Data
Fine-tuned on four IBD macrophage scRNA-seq datasets:
| GEO Accession | Cells | Condition |
|---|---|---|
| GSE134809 | 13,794 | IBD macrophages (primary) |
| GSE116222 | ~8,000 | Ulcerative colitis macrophages |
| GSE182270 | ~6,000 | IBD macrophage substates |
| GSE148810 | ~5,000 | Crohn's disease macrophages |
Model Architecture
- Base: scGPT pretrained transformer
- Fine-tuning objective: `L = 0.5 × L_soft_contrastive + 0.3 × L_MSE(pol) + 0.2 × L_BCE(ibd)`
- Polarization score: `pol_score = z-score(M1_score − M2_score)` using literature-based gene signatures
  - M1 genes: TNF, IL1B, IL6, CXCL10, NOS2, CD80, CD86, CCL5, CXCL9, PTGS2, IRF5, HIF1A, CXCL8, IL12A
  - M2 genes: CD163, MRC1, ARG1, IL10, TGFB1, CCL18, CD209, FOLR2, SOCS3, HMOX1, CLEC7A
- Soft contrastive loss: Gaussian kernel `exp(−d²/2σ²)` over polarization score distances, which shapes the embedding space continuously without discrete cluster boundaries
- Output heads: polarization regression + IBD binary classification
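The polarization score, the Gaussian kernel, and the weighted objective listed above can be sketched in a few lines. This is a minimal illustration under stated assumptions: scoring each signature as the mean expression of its genes, using the population standard deviation for the z-score, and the default `sigma=1.0` are all choices made here for the example, and the function names are hypothetical. The actual training code may differ.

```python
import math
from statistics import mean, pstdev

# Gene signatures as listed in this README
M1_GENES = ["TNF", "IL1B", "IL6", "CXCL10", "NOS2", "CD80", "CD86",
            "CCL5", "CXCL9", "PTGS2", "IRF5", "HIF1A", "CXCL8", "IL12A"]
M2_GENES = ["CD163", "MRC1", "ARG1", "IL10", "TGFB1", "CCL18",
            "CD209", "FOLR2", "SOCS3", "HMOX1", "CLEC7A"]

def signature_score(cell, genes):
    """Mean expression over the signature genes present in the cell.

    Assumption: the mean is used as the scoring rule; the README does not say.
    """
    vals = [cell[g] for g in genes if g in cell]
    return mean(vals) if vals else 0.0

def polarization_scores(cells):
    """z-score of (M1_score - M2_score) across all cells."""
    raw = [signature_score(c, M1_GENES) - signature_score(c, M2_GENES)
           for c in cells]
    mu, sd = mean(raw), pstdev(raw)
    if sd == 0:
        return [0.0 for _ in raw]  # all cells identical: no spread to scale by
    return [(r - mu) / sd for r in raw]

def gaussian_kernel(p_i, p_j, sigma=1.0):
    """Soft similarity exp(-d^2 / (2 sigma^2)) between two polarization scores."""
    d = p_i - p_j
    return math.exp(-d * d / (2 * sigma ** 2))

def total_loss(l_contrastive, l_mse, l_bce):
    """Weighted sum with the 0.5 / 0.3 / 0.2 weights from the objective above."""
    return 0.5 * l_contrastive + 0.3 * l_mse + 0.2 * l_bce

# Toy expression values for illustration only (not real data)
cells = [
    {"TNF": 2.0, "IL1B": 1.0, "CD163": 0.0, "MRC1": 0.0},  # M1-like
    {"TNF": 0.0, "IL1B": 0.0, "CD163": 2.0, "MRC1": 1.0},  # M2-like
    {"TNF": 1.0, "IL1B": 1.0, "CD163": 1.0, "MRC1": 1.0},  # intermediate
]
pol = polarization_scores(cells)
print(pol)
print(gaussian_kernel(pol[0], pol[2]), gaussian_kernel(pol[0], pol[1]))
```

Cells with similar polarization scores get a kernel value near 1 and dissimilar cells a value near 0, so the kernel can serve as a soft pairwise target instead of a hard same-class/different-class label.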
In Silico KO Predictions
KO effects for 13 IBD target genes were predicted using two methods:
| Gene | Method | Key Predicted Effect |
|---|---|---|
| HIF1A | CellOracle (GRN) | PTGS2β, CXCL8β, CD74β |
| IRF5 | CellOracle (GRN) | ISG15β, IFI6β, IFIT3β, IRF7β |
| IL6 | OmniPath propagation | JAK2β, IL6STβ, JAK1β, IL6Rβ, TYK2β |
| SOCS3 | OmniPath propagation | JAK2β, STAT5Aβ, STAT1β, STAT3β; AKT1β |
| TNF | OmniPath propagation | TNFRSF1Aβ, TNFRSF1Bβ, PIK3CGβ, AKT1β |
| TGFB1 | OmniPath propagation | PIK3R1β, RAC1β, TGFBR1β, GRB2β |
| IL1B | OmniPath propagation | IL1R2β, MYD88β, STAT3β, NR3C1β |
| PTGS2 | OmniPath propagation | IL4Rβ, IL2RGβ, NFE2L2β |
Full predictions (top 15 up/down per gene) are in results/ibd_ko_predictions_combined.json.
Usage
```python
import torch
import json

# Load model weights
checkpoint = torch.load("scgpt_ibd_v2/best_model.pt", map_location="cpu")

# Load KO predictions
with open("results/ibd_ko_predictions_combined.json") as f:
    ko_predictions = json.load(f)

# Example: top downregulated genes after IL6 KO
il6_ko = ko_predictions["IL6"]
print(il6_ko["top_downregulated"][:5])
# [['JAK2', -0.207], ['IL6ST', ...], ['JAK1', ...], ...]
```
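To pull the strongest predicted targets for any knocked-out gene, a small helper along these lines may be convenient. It assumes the schema visible in the example above, where each gene maps to a dict whose `top_downregulated` entry is a list of `[target, score]` pairs. The helper name `top_hits` is hypothetical, and any other key (for example one for upregulated genes) would need to be checked against the actual JSON file.

```python
def top_hits(ko_predictions, gene, direction="top_downregulated", k=5):
    """Return the k [target, score] pairs with the largest absolute score.

    Returns an empty list if the gene or the direction key is missing.
    """
    entries = ko_predictions.get(gene, {}).get(direction, [])
    return sorted(entries, key=lambda pair: abs(pair[1]), reverse=True)[:k]

# Toy dictionary in the assumed schema. Only the JAK2 value comes from the
# README example; TARGET_B and TARGET_C are placeholders, not predictions.
toy_predictions = {
    "IL6": {
        "top_downregulated": [
            ["JAK2", -0.207],
            ["TARGET_B", -0.10],
            ["TARGET_C", -0.15],
        ]
    }
}

print(top_hits(toy_predictions, "IL6", k=2))
print(top_hits(toy_predictions, "NOT_A_GENE"))
```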
Citation
If you use this model, please cite the underlying datasets and tools:
- scGPT: Cui et al., Nature Methods 2024
- CellOracle: Kamimoto et al., Nature 2023
- OmniPath: Türei et al., Nature Methods 2021
- GSE134809: Martin et al., Cell 2019