| --- |
| license: mit |
| language: |
| - en |
| tags: |
| - biology |
| - single-cell |
| - scRNA-seq |
| - macrophage |
| - IBD |
| - inflammatory-bowel-disease |
| - perturbation |
| - gene-knockout |
| - scGPT |
| - transformers |
| metrics: |
| - pearsonr |
| - roc_auc |
| --- |
| |
| # MacroPert: IBD Macrophage Perturbation Model |
|
|
| A fine-tuned scGPT model that predicts macrophage polarization state and IBD status from single-cell RNA-seq data, packaged with in silico gene knockout (KO) predictions for 8 IBD target genes. |
|
|
| ## Model Description |
|
|
| Macrophages in inflammatory bowel disease (IBD) exist along a continuous spectrum between pro-inflammatory (M1) and anti-inflammatory (M2) states, rather than in discrete classes. **MacroPert** captures this continuous polarization spectrum and predicts how individual gene knockouts shift macrophage state. |
|
|
| The repository contains: |
|
|
| | File | Description | |
| |------|-------------| |
| `scgpt_ibd_v2/best_model.pt` | scGPT v2 fine-tuned weights (continuous polarization model) | |
| `scgpt_ibd_v2/test_metrics.json` | Test set performance metrics | |
| `scgpt_ibd_v1/best_model.pt` | scGPT v1 weights (discrete 4-class baseline, archived) | |
| | `scgpt_ibd_v1/test_metrics.json` | v1 test metrics | |
| | `results/ibd_ko_predictions_combined.json` | In silico KO predictions for 8 IBD target genes | |
|
|
| ## Performance |
|
|
| | Metric | scGPT v1 (4-class) | scGPT v2 (continuous) | |
| |--------|--------------------|-----------------------| |
| | IBD AUROC | 0.915 | **0.989** | |
| Polarization Pearson r | n/a | **0.909** | |
| Classification Accuracy | 0.754 | n/a | |
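For readers who want to recompute metrics of this kind from per-cell predictions, the following is a minimal sketch using only the standard library. The helper names `pearson_r` and `auroc` are hypothetical, and the toy arrays are made-up illustrative values, not outputs of this model.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def auroc(labels, scores):
    """AUROC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy values for illustration only (not real model outputs).
ibd_labels = [1, 1, 0, 0, 1, 0]
ibd_probs = [0.9, 0.7, 0.4, 0.2, 0.3, 0.1]
pol_true = [1.2, 0.4, -0.3, -1.1, 0.8, -0.9]
pol_pred = [1.0, 0.5, -0.1, -1.0, 0.6, -1.2]

ibd_auroc = auroc(ibd_labels, ibd_probs)
pol_r = pearson_r(pol_true, pol_pred)
print(f"IBD AUROC: {ibd_auroc:.3f}")
print(f"Polarization Pearson r: {pol_r:.3f}")
```

The pairwise AUROC is quadratic in the number of cells, so for full test sets a rank-based implementation (for example from scikit-learn) would be the practical choice.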
|
|
| ## Training Data |
|
|
| Fine-tuned on four IBD macrophage scRNA-seq datasets: |
|
|
| | GEO Accession | Cells | Condition | |
| |---------------|-------|-----------| |
| | GSE134809 | 13,794 | IBD macrophages (primary) | |
| | GSE116222 | ~8,000 | Ulcerative colitis macrophages | |
| | GSE182270 | ~6,000 | IBD macrophage substates | |
| | GSE148810 | ~5,000 | Crohn's disease macrophages | |
|
|
| ## Model Architecture |
|
|
| - **Base:** scGPT pretrained transformer |
| - **Fine-tuning objective:** |
| ``` |
| L = 0.5 × L_soft_contrastive + 0.3 × L_MSE(pol) + 0.2 × L_BCE(ibd) |
| ``` |
| - **Polarization score:** `pol_score = z-score(M1_score - M2_score)` using literature-based gene signatures |
| - M1 genes: TNF, IL1B, IL6, CXCL10, NOS2, CD80, CD86, CCL5, CXCL9, PTGS2, IRF5, HIF1A, CXCL8, IL12A |
| - M2 genes: CD163, MRC1, ARG1, IL10, TGFB1, CCL18, CD209, FOLR2, SOCS3, HMOX1, CLEC7A |
| - **Soft contrastive loss:** Gaussian kernel `exp(-d²/(2σ²))` over polarization score distances `d`; this shapes the embedding space continuously, without discrete cluster boundaries |
| - **Output heads:** polarization regression + IBD binary classification |
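The polarization score defined in the list above can be sketched as follows. This is an illustration under stated assumptions: each signature score is taken to be the mean log-normalized expression of the signature genes measured in a cell, and the z-score is computed across all cells. The card does not say which scoring rule was actually used, and the helper names and toy expression values are hypothetical.

```python
from statistics import mean, pstdev

M1_GENES = ["TNF", "IL1B", "IL6", "CXCL10", "NOS2", "CD80", "CD86", "CCL5",
            "CXCL9", "PTGS2", "IRF5", "HIF1A", "CXCL8", "IL12A"]
M2_GENES = ["CD163", "MRC1", "ARG1", "IL10", "TGFB1", "CCL18", "CD209",
            "FOLR2", "SOCS3", "HMOX1", "CLEC7A"]


def signature_score(cell, genes):
    """Mean expression over the signature genes measured in this cell (assumed rule)."""
    values = [cell[g] for g in genes if g in cell]
    return mean(values) if values else 0.0


def polarization_scores(cells):
    """z-score of (M1_score - M2_score) across all cells."""
    raw = [signature_score(c, M1_GENES) - signature_score(c, M2_GENES) for c in cells]
    mu, sd = mean(raw), pstdev(raw)
    if sd == 0:
        return [0.0 for _ in raw]
    return [(x - mu) / sd for x in raw]


# Toy log-normalized expression for three cells (illustrative values only).
cells = [
    {"TNF": 2.1, "IL1B": 3.0, "CD163": 0.2, "MRC1": 0.1},  # M1-like
    {"TNF": 0.3, "IL1B": 0.2, "CD163": 2.5, "MRC1": 2.8},  # M2-like
    {"TNF": 1.0, "IL1B": 1.1, "CD163": 1.0, "MRC1": 1.2},  # intermediate
]
pol = polarization_scores(cells)
print([round(p, 2) for p in pol])
```

Positive scores indicate an M1-leaning cell and negative scores an M2-leaning one; because of the z-scoring, values are relative to the cell population being scored.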
|
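To make the fine-tuning objective concrete, here is one plausible formulation in plain Python. The card does not give the exact form of the soft contrastive term, so this sketch assumes a cross-entropy between Gaussian-kernel targets (from polarization-score distances) and a softmax over cosine similarities of cell embeddings. The `temperature` and `sigma` defaults, all function names, and the toy data are assumptions; a training implementation would use batched tensor operations instead of Python loops.

```python
import math


def gaussian_weight(pol_i, pol_j, sigma=1.0):
    """Soft target exp(-d^2 / (2 sigma^2)) for polarization distance d."""
    d = pol_i - pol_j
    return math.exp(-d * d / (2.0 * sigma * sigma))


def cosine(u, v):
    """Cosine similarity between two non-zero embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def soft_contrastive(embeddings, pol, sigma=1.0, temperature=0.1):
    """Cross-entropy between kernel-weighted targets and a softmax over similarities."""
    n = len(embeddings)
    total = 0.0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        logits = [cosine(embeddings[i], embeddings[j]) / temperature for j in others]
        top = max(logits)
        log_z = top + math.log(sum(math.exp(s - top) for s in logits))  # stable log-sum-exp
        weights = [gaussian_weight(pol[i], pol[j], sigma) for j in others]
        w_sum = sum(weights)
        total -= sum(w / w_sum * (s - log_z) for w, s in zip(weights, logits))
    return total / n


def mse(pred, target):
    """Mean squared error for the polarization regression head."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def bce(prob, label, eps=1e-7):
    """Binary cross-entropy for the IBD classification head."""
    clipped = [min(max(p, eps), 1 - eps) for p in prob]
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for p, y in zip(clipped, label)) / len(prob)


def total_loss(embeddings, pol_true, pol_pred, ibd_prob, ibd_label):
    """Weighted sum with the 0.5 / 0.3 / 0.2 weights stated in the objective."""
    return (0.5 * soft_contrastive(embeddings, pol_true)
            + 0.3 * mse(pol_pred, pol_true)
            + 0.2 * bce(ibd_prob, ibd_label))


# Toy data: two M1-like and two M2-like cells with 2-D embeddings.
pol = [1.0, 0.9, -1.0, -1.1]
aligned = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0], [-0.9, -0.1]]
shuffled = [aligned[0], aligned[2], aligned[1], aligned[3]]  # embeddings ignore polarization
loss_aligned = soft_contrastive(aligned, pol)
loss_shuffled = soft_contrastive(shuffled, pol)
combined = total_loss(aligned, pol, [0.8, 1.0, -0.9, -1.0], [0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])
print(loss_aligned < loss_shuffled, round(combined, 3))
```

In this formulation, embeddings whose neighborhood structure follows the polarization score receive a lower contrastive loss than embeddings that ignore it, which is the behavior the kernel is meant to encourage.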
|
| ## In Silico KO Predictions |
|
|
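| KO effects for the 8 IBD target genes listed below were predicted using two methods, CellOracle (gene regulatory network simulation) and OmniPath-based network propagation: |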
|
|
| Gene | Method | Key Predicted Targets (direction and effect size are in the JSON file) | |
|------|--------|---------------------| |
| HIF1A | CellOracle (GRN) | PTGS2, CXCL8, CD74 | |
| IRF5 | CellOracle (GRN) | ISG15, IFI6, IFIT3, IRF7 | |
| IL6 | OmniPath propagation | JAK2, IL6ST, JAK1, IL6R, TYK2 | |
| SOCS3 | OmniPath propagation | JAK2, STAT5A, STAT1, STAT3; AKT1 | |
| TNF | OmniPath propagation | TNFRSF1A, TNFRSF1B, PIK3CG, AKT1 | |
| TGFB1 | OmniPath propagation | PIK3R1, RAC1, TGFBR1, GRB2 | |
| IL1B | OmniPath propagation | IL1R2, MYD88, STAT3, NR3C1 | |
| PTGS2 | OmniPath propagation | IL4R, IL2RG, NFE2L2 | |
|
|
| Full predictions (top 15 up/down per gene) are in `results/ibd_ko_predictions_combined.json`. |
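As a rough illustration of what network propagation of a knockout can look like, the sketch below pushes a KO signal through a small signed, directed graph with a per-hop decay. It is not the OmniPath API and not necessarily the procedure used to produce the predictions in this repository: the edge list is a hand-written toy, and `propagate_knockout`, `decay`, and `max_hops` are hypothetical names and parameters.

```python
from collections import deque

# Hand-written toy signed edges: source -> [(target, sign)].
# +1 means activation, -1 means inhibition. Not an OmniPath query result.
TOY_EDGES = {
    "IL6": [("IL6R", +1)],
    "IL6R": [("IL6ST", +1)],
    "IL6ST": [("JAK1", +1), ("JAK2", +1)],
    "JAK2": [("STAT3", +1)],
    "SOCS3": [("JAK2", -1)],
}


def propagate_knockout(gene, edges, decay=0.5, max_hops=3):
    """Breadth-first signed propagation: each hop multiplies the effect by sign * decay."""
    effects = {}
    queue = deque([(gene, -1.0, 0)])  # a knockout removes the gene's activity
    visited = {gene}
    while queue:
        node, effect, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not propagate beyond the hop limit
        for target, sign in edges.get(node, []):
            if target in visited:
                continue  # keep the first (shortest-path) effect only
            visited.add(target)
            effects[target] = effect * sign * decay
            queue.append((target, effects[target], hops + 1))
    return effects


il6_effects = propagate_knockout("IL6", TOY_EDGES)
socs3_effects = propagate_knockout("SOCS3", TOY_EDGES)
print(sorted(il6_effects.items(), key=lambda kv: kv[1]))
print(socs3_effects)
```

In this toy graph, knocking out an activator yields negative downstream effects, while knocking out an inhibitor such as SOCS3 yields positive ones. Keeping only the first path to each node is a simplification; schemes that sum over multiple paths are also possible.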
|
|
| ## Usage |
|
|
| ```python |
| import torch |
| import json |
| |
| # Load model weights |
| checkpoint = torch.load("scgpt_ibd_v2/best_model.pt", map_location="cpu") |
| |
| # Load KO predictions |
| with open("results/ibd_ko_predictions_combined.json") as f: |
| ko_predictions = json.load(f) |
| |
| # Example: top downregulated genes after IL6 KO |
| il6_ko = ko_predictions["IL6"] |
| print(il6_ko["top_downregulated"][:5]) |
| # [['JAK2', -0.207], ['IL6ST', ...], ['JAK1', ...], ...] |
| ``` |
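A small helper can make the KO predictions easier to browse. The sketch below assumes each gene entry holds lists of `[gene, score]` pairs, as in the example above. The key name `top_upregulated` is an assumption mirroring `top_downregulated`, and `summarize_ko` is a hypothetical helper. The inline dictionary is a toy stand-in so the snippet runs without the JSON file: only the JAK2 value is taken from the example above, and the `GENE_A` and `GENE_B` entries are placeholders.

```python
def summarize_ko(ko_predictions, gene, n=3):
    """Return the n strongest down- and up-regulated targets for one knockout."""
    entry = ko_predictions.get(gene)
    if entry is None:
        raise KeyError(f"no KO predictions for {gene!r}")
    # Most negative scores first.
    down = sorted(entry.get("top_downregulated", []), key=lambda pair: pair[1])[:n]
    # "top_upregulated" is an assumed key name; adjust it to the real schema.
    up = sorted(entry.get("top_upregulated", []), key=lambda pair: pair[1], reverse=True)[:n]
    return {"down": down, "up": up}


# Toy stand-in for results/ibd_ko_predictions_combined.json (placeholder values).
toy_predictions = {
    "IL6": {
        "top_downregulated": [["GENE_A", -0.1], ["JAK2", -0.207], ["GENE_B", -0.15]],
        "top_upregulated": [],
    }
}
summary = summarize_ko(toy_predictions, "IL6", n=2)
print(summary)
```

With the real file, passing the loaded `ko_predictions` dictionary in place of `toy_predictions` should work as long as the schema matches these assumptions.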
|
|
| ## Citation |
|
|
| If you use this model, please cite the underlying datasets and tools: |
|
|
| - **scGPT:** Cui et al., *Nature Methods* 2024 |
| - **CellOracle:** Kamimoto et al., *Nature* 2023 |
| - **OmniPath:** Türei et al., *Nature Methods* 2021 |
| - **GSE134809:** Martin et al., *Cell* 2019 |
|
|