Update README.md
README.md CHANGED
@@ -1,5 +1,5 @@
---
-license:
library_name: pytorch
tags:
- biology
@@ -27,6 +27,8 @@ A multimodal deep learning ensemble for predicting T-cell functional states from

**89.6% accuracy** | **macro F1 0.88** | **7 functional states** | **top-5 ensemble**

## Model Description

This repository contains the weights for a top-5 ensemble of `FullGenesVJClassifier` models. Each model takes three input modalities:
@@ -71,15 +73,17 @@ Classification of T-cell functional states from paired scRNA-seq + TCR-seq data.

## Training Data

-

-| Dataset | Platform | Cells | Tissue |
|---|---|---|---|
| GSE144469 | 10x Genomics | ~60,000 | Colitis (colon) |
| GSE179994 | 10x Genomics | ~77,000 | PBMC (exhaustion study) |
| GSE181061 | 10x Genomics | ~31,000 | ccRCC (tumor-infiltrating) |
| GSE108989 | Smart-seq2 | ~12,000 | CRC (tumor + blood) |

Preprocessing: QC → normalization (scanpy) → 3,000 HVGs → Harmony batch correction → CDR3/V/J extraction via scirpy.

## Evaluation
@@ -109,37 +113,20 @@ Preprocessing: QC → normalization (scanpy) → 3,000 HVGs → Harmony batch co

## How to Use

-### Quick Start

```bash
-
-tcell-
```

-Model weights (~300 MB)
-
-```bash
-tcell-predict data.h5ad -o results/ # custom output dir
-tcell-predict data.h5ad --true-labels cell_type # evaluate vs ground truth
-tcell-predict data.h5ad --device cpu # force CPU
-```

Output: interactive HTML report, predictions.csv, annotated .h5ad.

-###
-
-```python
-from src.hub import ensure_weights
-from src.inference import load_ensemble, ensemble_predict
-from src.data import InferenceDataset, prepare_inference_features
-
-model_dir = ensure_weights() # auto-downloads from this repo
-models = load_ensemble(model_dir, device)
-dataset = InferenceDataset(gex, tcr_a_emb, tcr_b_emb, vj_encoded)
-predictions, probabilities, agreement = ensemble_predict(models, dataset, device)
-```
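The removed Python API returned `predictions, probabilities, agreement` from `ensemble_predict`, but this page does not show how the five models are combined. One common scheme is soft voting: average each model's class probabilities and take the top class. The sketch below assumes that scheme and also assumes "agreement" means the fraction of models whose own top class matches the ensemble's; `soft_vote` and `argmax` are hypothetical helpers, not the repository's implementation.

```python
def argmax(values: list[float]) -> int:
    """Index of the largest value (the first one wins on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def soft_vote(model_probs: list[list[list[float]]]):
    """Soft-voting ensemble over per-model class probabilities.

    ASSUMPTION: illustrates soft voting in general, not this repo's ensemble_predict.
    model_probs[m][i][k] is the probability model m assigns to class k for cell i.
    Returns (predictions, mean_probabilities, agreement), where agreement[i] is the
    fraction of models whose own top class equals the ensemble's top class for cell i.
    """
    if not model_probs:
        raise ValueError("need at least one model")
    n_models = len(model_probs)
    predictions, mean_probs, agreement = [], [], []
    for i in range(len(model_probs[0])):
        per_model = [probs[i] for probs in model_probs]  # one probability vector per model
        avg = [sum(col) / n_models for col in zip(*per_model)]  # class-wise mean
        pred = argmax(avg)
        votes = sum(argmax(p) == pred for p in per_model)  # models backing the ensemble call
        predictions.append(pred)
        mean_probs.append(avg)
        agreement.append(votes / n_models)
    return predictions, mean_probs, agreement


# Toy input: 5 models x 2 cells x 3 classes (made-up values).
model_probs = [
    [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]],
    [[0.6, 0.3, 0.1], [0.1, 0.1, 0.8]],
    [[0.2, 0.7, 0.1], [0.1, 0.1, 0.8]],
    [[0.5, 0.4, 0.1], [0.1, 0.1, 0.8]],
    [[0.1, 0.8, 0.1], [0.1, 0.1, 0.8]],
]
predictions, mean_probs, agreement = soft_vote(model_probs)
print(predictions)  # [1, 2]
print(agreement)    # [0.4, 1.0]
```

For the first cell, three of five models individually prefer class 0, yet the averaged probabilities favor class 1 (0.48 vs 0.42), so the ensemble call is backed by only 2 of 5 models. A low agreement value like this is one way to flag cells whose label is uncertain.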
-
-### Manual Download

```python
from huggingface_hub import snapshot_download
@@ -178,14 +165,14 @@ snapshot_download("VirialyD/tcell-classifier", local_dir="./weights")
## Citation

```bibtex
-@software{
author = {Shirokikh, Polina},
title = {Multimodal T-Cell Functional State Classifier},
-year = {
url = {https://github.com/polinavd/multimodal-tcell-classifier}
}
```

## License

-

---
+license: mit
library_name: pytorch
tags:
- biology

**89.6% accuracy** | **macro F1 0.88** | **7 functional states** | **top-5 ensemble**
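Macro F1 is the unweighted mean of the per-state F1 scores, so each of the 7 functional states counts equally however many cells it contains, and the figure can diverge from accuracy when states are imbalanced. A minimal stdlib sketch of both metrics follows; the state labels in the example are hypothetical placeholders rather than the model's class names, and the repository's own evaluation code may differ. Over the labels that occur in either list it should agree with scikit-learn's `f1_score(y_true, y_pred, average="macro")`.

```python
from collections import Counter


def accuracy(y_true: list[str], y_pred: list[str]) -> float:
    """Fraction of cells whose predicted state equals the true state."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Unweighted mean of per-class F1 over all classes seen in y_true or y_pred."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    true_counts = Counter(y_true)  # cells per true state (TP + FN)
    pred_counts = Counter(y_pred)  # cells per predicted state (TP + FP)
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    scores = []
    for label in sorted(true_counts.keys() | pred_counts.keys()):
        precision = tp[label] / pred_counts[label] if pred_counts[label] else 0.0
        recall = tp[label] / true_counts[label] if true_counts[label] else 0.0
        total = precision + recall
        # F1 is the harmonic mean of precision and recall; 0 when both are 0.
        scores.append(2 * precision * recall / total if total else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels for illustration only.
y_true = ["Tex", "Tex", "Tem", "Treg"]
y_pred = ["Tex", "Tem", "Tem", "Treg"]
print(accuracy(y_true, y_pred))            # 0.75
print(round(macro_f1(y_true, y_pred), 4))  # 0.7778
```

Here a single Tex cell predicted as Tem lowers both Tex recall and Tem precision; macro F1 reflects that per-state cost, while accuracy only counts the one error.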

+**GitHub**: [polinavd/multimodal-tcell-classifier](https://github.com/polinavd/multimodal-tcell-classifier)
+
## Model Description

This repository contains the weights for a top-5 ensemble of `FullGenesVJClassifier` models. Each model takes three input modalities:

## Training Data

+**136,667 T-cells** (after QC filtering) from 4 public scRNA-seq datasets:

+| Dataset | Platform | Cells* | Tissue |
|---|---|---|---|
| GSE144469 | 10x Genomics | ~60,000 | Colitis (colon) |
| GSE179994 | 10x Genomics | ~77,000 | PBMC (exhaustion study) |
| GSE181061 | 10x Genomics | ~31,000 | ccRCC (tumor-infiltrating) |
| GSE108989 | Smart-seq2 | ~12,000 | CRC (tumor + blood) |

+*Cell counts are pre-QC; 136,667 cells remain after quality control filtering.
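Since the table gives approximate pre-QC counts and the footnote gives the post-QC total, the fraction of cells that survived QC can be estimated with simple arithmetic. The result is only as precise as the rounded "~" figures.

```python
# Approximate pre-QC cell counts from the Training Data table (rounded "~" values).
pre_qc_cells = {
    "GSE144469": 60_000,
    "GSE179994": 77_000,
    "GSE181061": 31_000,
    "GSE108989": 12_000,
}
post_qc_cells = 136_667  # total after QC filtering, per the footnote

total_pre_qc = sum(pre_qc_cells.values())
retained = post_qc_cells / total_pre_qc
print(f"pre-QC ~{total_pre_qc:,} cells, post-QC {post_qc_cells:,} cells, retained ~{retained:.1%}")
# pre-QC ~180,000 cells, post-QC 136,667 cells, retained ~75.9%
```

So roughly a quarter of the cells (about 43,000) were removed by QC.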
+
Preprocessing: QC → normalization (scanpy) → 3,000 HVGs → Harmony batch correction → CDR3/V/J extraction via scirpy.
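The pipeline above names scanpy for normalization without giving parameters. Assuming the common scanpy recipe of `sc.pp.normalize_total(adata, target_sum=1e4)` followed by `sc.pp.log1p(adata)` (an assumption; the actual settings are not stated here), each cell's counts are rescaled to sum to 10,000 and then transformed with the natural log of 1 + x. The dependency-free sketch below shows that arithmetic on a toy dense matrix; `normalize_log1p` is a hypothetical helper, and the HVG, Harmony and scirpy steps are not shown.

```python
import math


def normalize_log1p(counts: list[list[float]], target_sum: float = 1e4) -> list[list[float]]:
    """Per-cell total-count normalization followed by natural-log log1p.

    ASSUMPTION: mirrors scanpy's normalize_total(target_sum=1e4) + log1p on a
    dense cells x genes matrix; not taken from this repository's code.
    """
    normalized = []
    for cell in counts:
        total = sum(cell)
        # Scale so this cell's counts sum to target_sum (all-zero cells stay zero).
        scale = target_sum / total if total > 0 else 0.0
        normalized.append([math.log1p(x * scale) for x in cell])
    return normalized


# Toy matrix: 2 cells x 3 genes of raw UMI counts (made-up values).
raw = [[1, 3, 0], [10, 0, 10]]
for row in normalize_log1p(raw):
    print([round(v, 3) for v in row])
```

Undoing the log with `math.expm1` recovers rows that each sum to 10,000, which is a quick way to check the normalization step.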

## Evaluation

## How to Use

+### Quick Start

```bash
+git clone https://github.com/polinavd/multimodal-tcell-classifier.git
+cd multimodal-tcell-classifier
+pip install -r requirements.txt
+python predict_report.py --input your_data.h5ad --output ./results
```

+Model weights (~300 MB) are downloaded automatically from this HuggingFace repo on first run.

Output: interactive HTML report, predictions.csv, annotated .h5ad.
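The Quick Start run writes predictions.csv alongside the HTML report. The stdlib sketch below tallies cells per predicted state from that file. The column name `predicted_state` is a hypothetical placeholder (this page does not document the CSV's columns), so check the header first.

```python
import csv
from collections import Counter
from collections.abc import Iterable


def count_predicted_states(lines: Iterable[str], label_column: str = "predicted_state") -> Counter:
    """Count cells per predicted state from CSV text (an open file object works).

    ASSUMPTION: "predicted_state" is a hypothetical column name; pass the real
    one from the header of the predictions.csv written by predict_report.py.
    """
    reader = csv.DictReader(lines)
    # Reading fieldnames consumes the header row; fail loudly if the column is missing.
    if reader.fieldnames is None or label_column not in reader.fieldnames:
        raise KeyError(f"column {label_column!r} not in CSV header {reader.fieldnames}")
    return Counter(row[label_column] for row in reader)
```

For example, assuming the file lands in the `--output` directory used in Quick Start: `with open("results/predictions.csv", newline="") as f: print(count_predicted_states(f).most_common())`.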

+### Manual Weight Download

```python
from huggingface_hub import snapshot_download

## Citation

```bibtex
+@software{shirokikh2025multimodal,
author = {Shirokikh, Polina},
title = {Multimodal T-Cell Functional State Classifier},
+year = {2025},
url = {https://github.com/polinavd/multimodal-tcell-classifier}
}
```

## License

+MIT License. See [LICENSE](https://github.com/polinavd/multimodal-tcell-classifier/blob/main/LICENSE) for details.