---
title: README
emoji: 🐨
colorFrom: purple
colorTo: blue
sdk: static
pinned: true
license: bsd-3-clause
short_description: Ensemble of experts for cell-type annotation
thumbnail: >-
  https://cdn-uploads.huggingface.co/production/uploads/63d7697f2e397d9f8e30e677/tvABibiml6K2sccfXLybG.png
---
# **popV**

Welcome to the **popV** framework. popV delivers state-of-the-art performance in cell-type label transfer using an ensemble-of-experts approach. Here we provide
pretrained models for transferring cell-type labels to your own query dataset. Defining cell types is a tedious process, and using annotated reference data can
significantly accelerate it. Because popV combines several label-transfer tools, it provides a well-calibrated certainty score that highlights cell types for which
automatic annotation is highly uncertain. We recommend checking transferred cell-type labels manually, for example by plotting marker genes or differentially expressed
genes, before trusting them. This is an open-science initiative: please contribute your own models so that the single-cell community can leverage your reference
datasets. To add your dataset, ask in our [GitHub repository](https://github.com/YosefLab/popV).

---

## **Model Overview**
popV trains up to 9 different algorithms for automatic label transfer and computes a consensus score across their predictions. It also generates an automatic report.
To learn how to apply popV to your own dataset, please refer to our [tutorial]().
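
To make the consensus idea concrete, here is a minimal sketch in plain Python. It assumes a simple majority vote in which each cell's score is the number of methods that agree with the consensus label; popV's actual consensus may differ (for example, by also taking the Cell Ontology into account). The functions `consensus_vote` and `flag_uncertain` and the method names in the usage example are hypothetical. Cells with a low score are the ones worth checking manually with marker genes.

```python
from collections import Counter
from typing import Dict, List, Tuple


def consensus_vote(predictions: Dict[str, List[str]]) -> List[Tuple[str, int]]:
    """Majority vote across methods (simplified sketch, not popV's API).

    `predictions` maps a method name to its per-cell labels. Returns one
    (consensus_label, n_agreeing_methods) pair per cell. On ties, the label
    encountered first (in method order) wins, because Counter.most_common
    keeps first-encountered order for equal counts.
    """
    if not predictions:
        return []
    methods = list(predictions)
    n_cells = len(predictions[methods[0]])
    if any(len(predictions[m]) != n_cells for m in methods):
        raise ValueError("every method must label the same number of cells")

    results = []
    for i in range(n_cells):
        # Count how many methods voted for each label on cell i.
        votes = Counter(predictions[m][i] for m in methods)
        label, n_agree = votes.most_common(1)[0]
        results.append((label, n_agree))
    return results


def flag_uncertain(results: List[Tuple[str, int]], min_agreement: int) -> List[int]:
    """Indices of cells whose consensus is supported by fewer than `min_agreement` methods."""
    return [i for i, (_, n_agree) in enumerate(results) if n_agree < min_agreement]


# Hypothetical predictions from three methods for three cells.
preds = {
    "knn_scvi": ["T cell", "B cell", "NK cell"],
    "random_forest": ["T cell", "B cell", "T cell"],
    "celltypist": ["T cell", "monocyte", "B cell"],
}
calls = consensus_vote(preds)
print(calls)                     # [('T cell', 3), ('B cell', 2), ('NK cell', 1)]
print(flag_uncertain(calls, 2))  # [2]
```

The third cell, where every method disagrees, gets a score of 1 and is flagged for manual review.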

### Algorithms

The currently implemented algorithms are:

- K-nearest neighbor classification after dataset integration with [BBKNN](https://github.com/Teichlab/bbknn)
- K-nearest neighbor classification after dataset integration with [Scanorama](https://github.com/brianhie/scanorama)
- K-nearest neighbor classification after dataset integration with [scVI](https://github.com/scverse/scvi-tools)
- K-nearest neighbor classification after dataset integration with [Harmony](https://github.com/lilab-bcb/harmony-pytorch)
- Random forest classification
- Support vector machine classification
- [OnClass](https://github.com/wangshenguiuc/OnClass) cell-type classification
- [scANVI](https://github.com/scverse/scvi-tools) label transfer
- [CellTypist](https://www.celltypist.org) cell-type classification
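
The four kNN-based methods share a final step: once reference and query cells live in a shared space, each query cell takes the majority label of its nearest reference cells. The sketch below illustrates that step in plain Python under stated assumptions. It presumes an integration method has already produced a joint embedding (BBKNN actually yields a batch-balanced neighbor graph rather than an embedding, so the sketch is a simplification there), it uses brute-force Euclidean distances, and `knn_label_transfer` with its `k=15` default is a hypothetical choice, not popV's configuration.

```python
import math
from collections import Counter
from typing import List, Sequence


def knn_label_transfer(
    ref_emb: Sequence[Sequence[float]],
    ref_labels: Sequence[str],
    query_emb: Sequence[Sequence[float]],
    k: int = 15,
) -> List[str]:
    """Label each query cell by majority vote over its k nearest reference cells.

    Brute-force sketch for illustration only; assumes reference and query cells
    are already embedded in the same integrated space.
    """
    if len(ref_emb) != len(ref_labels):
        raise ValueError("ref_emb and ref_labels must have the same length")
    if not 1 <= k <= len(ref_emb):
        raise ValueError("k must be between 1 and the number of reference cells")

    predicted = []
    for q in query_emb:
        # Euclidean distance from this query cell to every reference cell, nearest first.
        ranked = sorted(
            (math.dist(q, r), label) for r, label in zip(ref_emb, ref_labels)
        )
        neighbor_labels = [label for _, label in ranked[:k]]
        # Majority vote; on a tie, the tied label that occurs first among the
        # nearest neighbors wins, since Counter.most_common keeps
        # first-encountered order for equal counts.
        predicted.append(Counter(neighbor_labels).most_common(1)[0][0])
    return predicted


# Toy 2-D "integrated" embedding with two well-separated reference clusters.
ref = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
labels = ["T cell", "T cell", "B cell", "B cell"]
print(knn_label_transfer(ref, labels, [(0.05, 0.0), (5.05, 5.0)], k=2))
# ['T cell', 'B cell']
```

For real datasets, an optimized neighbor search such as scikit-learn's `KNeighborsClassifier` would replace the brute-force loop.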

---

## **Key Applications**
The purpose of these models is to perform cell-type label transfer.
We provide models [with cuML support](collection) for large-scale reference mapping on a GPU and [without cuML support](collection) for use when no GPU is available.
Without a GPU, popV scales well to 100k cells. popV offers three prediction modes of decreasing complexity:

- `retrain` trains all classifiers from scratch. For 50k cells, this takes up to an hour of compute time on a GPU.
- `inference` uses pretrained classifiers to annotate both query and reference cells and constructs a joint embedding with all of the integration methods listed above. For 50k cells, this takes up to half an hour of compute time on a GPU in our hands.
- `fast` uses only the methods with pretrained classifiers and annotates only the query cells. For 50k cells, this takes 5 minutes without a GPU (excluding the UMAP embedding).

---

## **Publications**
- **[Original popV paper](https://www.nature.com/articles/s41588-024-01993-3)**:
  - Published in *Nature Genetics*, this paper introduces and benchmarks popV.

## **Contact**
- GitHub: [https://github.com/YosefLab/popV](https://github.com/YosefLab/popV)
- User questions: [Discourse](https://discourse.scverse.org)