---
license: mit
language:
- en
tags:
- biology
- single-cell
- scRNA-seq
- macrophage
- IBD
- inflammatory-bowel-disease
- perturbation
- gene-knockout
- scGPT
- transformers
metrics:
- pearsonr
- roc_auc
---

# MacroPert: IBD Macrophage Perturbation Model

A fine-tuned scGPT model for predicting macrophage polarization state and IBD disease status from single-cell RNA-seq, combined with in silico gene knockout predictions for 13 IBD target genes.

## Model Description

Macrophages in inflammatory bowel disease (IBD) exist along a continuous spectrum between pro-inflammatory (M1) and anti-inflammatory (M2) states, rather than in discrete classes. **MacroPert** captures this continuous polarization spectrum and predicts how individual gene knockouts shift macrophage state.

The repository contains:

| File | Description |
|------|-------------|
| `scgpt_ibd_v2/best_model.pt` | scGPT v2 fine-tuned weights (continuous polarization model) |
| `scgpt_ibd_v2/test_metrics.json` | Test set performance metrics |
| `scgpt_ibd_v1/best_model.pt` | scGPT v1 weights (discrete 4-class baseline, archived) |
| `scgpt_ibd_v1/test_metrics.json` | v1 test metrics |
| `results/ibd_ko_predictions_combined.json` | In silico KO predictions for 8 IBD target genes |

## Performance

| Metric | scGPT v1 (4-class) | scGPT v2 (continuous) |
|--------|--------------------|-----------------------|
| IBD AUROC | 0.915 | **0.989** |
| Polarization Pearson r | n/a | **0.909** |
| Classification Accuracy | 0.754 | n/a |

## Training Data

Fine-tuned on four IBD macrophage scRNA-seq datasets:

| GEO Accession | Cells | Condition |
|---------------|-------|-----------|
| GSE134809 | 13,794 | IBD macrophages (primary) |
| GSE116222 | ~8,000 | Ulcerative colitis macrophages |
| GSE182270 | ~6,000 | IBD macrophage substates |
| GSE148810 | ~5,000 | Crohn's disease macrophages |

## Model Architecture

- **Base:** scGPT pretrained transformer
- **Fine-tuning objective:**
  ```
  L = 0.5 × L_soft_contrastive + 0.3 × L_MSE(pol) + 0.2 × L_BCE(ibd)
  ```
- **Polarization score:** `pol_score = z-score(M1_score − M2_score)` using literature-based gene signatures
  - M1 genes: TNF, IL1B, IL6, CXCL10, NOS2, CD80, CD86, CCL5, CXCL9, PTGS2, IRF5, HIF1A, CXCL8, IL12A
  - M2 genes: CD163, MRC1, ARG1, IL10, TGFB1, CCL18, CD209, FOLR2, SOCS3, HMOX1, CLEC7A
- **Soft contrastive loss:** Gaussian kernel `exp(−d²/2σ²)` over polarization score distances; it shapes the embedding space continuously, without discrete cluster boundaries
- **Output heads:** polarization regression + IBD binary classification

## In Silico KO Predictions

KO effects for 13 IBD target genes were predicted using two methods:

| Gene | Method | Key Predicted Effect |
|------|--------|---------------------|
| HIF1A | CellOracle (GRN) | PTGS2↓, CXCL8↓, CD74↓ |
| IRF5 | CellOracle (GRN) | ISG15↓, IFI6↓, IFIT3↓, IRF7↓ |
| IL6 | OmniPath propagation | JAK2↓, IL6ST↓, JAK1↓, IL6R↓, TYK2↓ |
| SOCS3 | OmniPath propagation | JAK2↓, STAT5A↓, STAT1↓, STAT3↓; AKT1↑ |
| TNF | OmniPath propagation | TNFRSF1A↓, TNFRSF1B↓, PIK3CG↓, AKT1↓ |
| TGFB1 | OmniPath propagation | PIK3R1↓, RAC1↓, TGFBR1↓, GRB2↓ |
| IL1B | OmniPath propagation | IL1R2↓, MYD88↓, STAT3↓, NR3C1↓ |
| PTGS2 | OmniPath propagation | IL4R↓, IL2RG↓, NFE2L2↓ |

Full predictions (top 15 up/down per gene) are in `results/ibd_ko_predictions_combined.json`.

## Usage

```python
import torch
import json

# Load model weights
checkpoint = torch.load("scgpt_ibd_v2/best_model.pt", map_location="cpu")

# Load KO predictions
with open("results/ibd_ko_predictions_combined.json") as f:
    ko_predictions = json.load(f)

# Example: top downregulated genes after IL6 KO
il6_ko = ko_predictions["IL6"]
print(il6_ko["top_downregulated"][:5])
# [['JAK2', -0.207], ['IL6ST', ...], ['JAK1', ...], ...]
```

## Citation

If you use this model, please cite the underlying datasets and tools:

- **scGPT:** Cui et al., *Nature Methods* 2024
- **CellOracle:** Kamimoto et al., *Nature* 2023
- **OmniPath:** Türei et al., *Nature Methods* 2021
- **GSE134809:** Smillie et al., *Cell* 2019
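
## Sketch: Polarization Score and Loss Weighting

The formulas listed under Model Architecture can be illustrated with a small, dependency-free sketch. It is not the training code behind these weights. It assumes that each signature score is the plain mean expression of the signature genes present in a cell (the scoring rule is not specified here), uses shortened gene lists, and introduces hypothetical helper names (`signature_score`, `polarization_scores`, `gaussian_similarity`, `combined_loss`) and made-up toy expression values purely for illustration.

```python
import math
from statistics import mean, pstdev

# Shortened signature lists for illustration; the full lists are given
# under Model Architecture.
M1_GENES = ["TNF", "IL1B", "IL6", "CXCL10"]
M2_GENES = ["CD163", "MRC1", "ARG1", "IL10"]


def signature_score(cell, genes):
    """Mean expression over the signature genes found in `cell`.

    ASSUMPTION: a plain mean is used here; the actual scoring rule may differ.
    """
    values = [cell[g] for g in genes if g in cell]
    return mean(values) if values else 0.0


def polarization_scores(cells):
    """pol_score = z-score(M1_score - M2_score), computed across all cells."""
    raw = [signature_score(c, M1_GENES) - signature_score(c, M2_GENES) for c in cells]
    mu, sd = mean(raw), pstdev(raw)
    if sd == 0:
        return [0.0] * len(raw)
    return [(r - mu) / sd for r in raw]


def gaussian_similarity(pol_a, pol_b, sigma=1.0):
    """Gaussian kernel exp(-d^2 / (2 * sigma^2)) over the polarization distance d."""
    d = pol_a - pol_b
    return math.exp(-(d * d) / (2.0 * sigma * sigma))


def combined_loss(l_soft_contrastive, l_mse_pol, l_bce_ibd):
    """Weighted sum: 0.5 * soft contrastive + 0.3 * MSE(pol) + 0.2 * BCE(ibd)."""
    return 0.5 * l_soft_contrastive + 0.3 * l_mse_pol + 0.2 * l_bce_ibd


# Toy cells with made-up expression values (not real data).
cells = [
    {"TNF": 5.0, "IL1B": 4.0, "CD163": 0.5, "MRC1": 0.5},  # M1-like
    {"TNF": 0.5, "IL1B": 0.5, "CD163": 5.0, "MRC1": 4.0},  # M2-like
    {"TNF": 2.0, "IL1B": 2.0, "CD163": 2.0, "MRC1": 2.0},  # intermediate
]
scores = polarization_scores(cells)

# Cells with closer polarization scores get a larger soft-similarity target.
sim_near = gaussian_similarity(scores[0], scores[2])
sim_far = gaussian_similarity(scores[0], scores[1])
print(scores, sim_near, sim_far)
```

In this sketch the kernel value acts as a soft similarity target: pairs of cells with similar polarization scores are treated as near-positives and distant pairs as near-negatives, which is one way to read "without discrete cluster boundaries". How the kernel enters the actual contrastive loss, and the value of σ, are not stated here.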