scvi-tools/tabula-sapiens-ear-scanvi

+---
+library_name: scvi-tools
+license: cc-by-4.0
+tags:
+- biology
+- genomics
+- single-cell
+- model_cls_name:SCANVI
+- scvi_version:1.4.2
+- anndata_version:0.12.7
+- modality:rna
+- tissue:various
+- annotated:True
+---
+scANVI is a variational inference model for single-cell RNA-seq data that can learn an underlying
+latent space, integrate technical batches, and impute dropouts.
+Building on scVI, scANVI is a semi-supervised model that can leverage labeled data to learn a
+cell-type classifier in the latent space and then predict the cell types of new data.
+The learned low-dimensional latent representation of the data can be used for visualization and
+clustering.
+scANVI takes as input an scRNA-seq gene expression matrix (cells by genes) and cell-type
+annotations for a subset of cells.
+We provide an extensive [user guide](https://docs.scvi-tools.org/en/stable/user_guide/models/scanvi.html).
+- See our original manuscript for further details of the model:
+[scANVI manuscript](https://www.embopress.org/doi/full/10.15252/msb.20209620).
+- See our manuscript on [scvi-hub](https://www.biorxiv.org/content/10.1101/2024.03.01.582887v2)
+for how to leverage pre-trained models.
+This model can be used for fine-tuning on new data using our Arches framework:
+[Arches tutorial](https://docs.scvi-tools.org/en/stable/tutorials/notebooks/scrna/scarches_scvi_tools.html).
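A minimal sketch of pulling this pre-trained model from the Hugging Face Hub, assuming a recent scvi-tools release that provides `scvi.hub.HubModel.pull_from_huggingface_hub`. The repository name is inferred from this page's title, and `load_pretrained_scanvi` is a hypothetical helper name, not part of scvi-tools. Because no training data URL is listed for this model, you may need to pass your own AnnData object whose genes match the model.

```python
# Repository name inferred from this page's title (assumption).
REPO_NAME = "scvi-tools/tabula-sapiens-ear-scanvi"


def load_pretrained_scanvi(adata=None, cache_dir=None):
    """Hypothetical helper: download the hub model and return the SCANVI model."""
    # Imported lazily so the sketch can be read without scvi-tools installed.
    from scvi.hub import HubModel

    hub_model = HubModel.pull_from_huggingface_hub(
        repo_name=REPO_NAME, cache_dir=cache_dir
    )
    # With adata=None, scvi-tools falls back to data attached to the hub model, if any.
    hub_model.load_model(adata=adata)
    return hub_model.model
```

Calling `load_pretrained_scanvi()` requires network access and an installed scvi-tools. The returned model can then be used, for example, with `get_latent_representation()` or `predict()`.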
+# Model Description
+Tabula Sapiens is a benchmark, first-draft human cell atlas of nearly 500,000 cells from 24 organs of 15 normal human subjects.
+# Metrics
+We provide here key performance metrics for the uploaded model, if provided by the data uploader.
+<details>
+<summary><strong>Coefficient of variation</strong></summary>
+The cell-wise coefficient of variation summarizes how well variation between different cells is
+preserved by the generated model expression. Below a squared Pearson correlation coefficient of
+0.4, we recommend not using generated data for downstream analysis, although the generated latent
+space might still be useful for analysis.
+**Cell-wise Coefficient of Variation**:
+Not provided by uploader
+The gene-wise coefficient of variation summarizes how well variation between different genes is
+preserved by the generated model expression. This value is usually quite high.
+**Gene-wise Coefficient of Variation**:
+Not provided by uploader
+</details>
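To make the coefficient-of-variation metric above concrete, here is an illustrative sketch in plain Python. It assumes the metric is the squared Pearson correlation between per-cell coefficients of variation (standard deviation divided by mean) in observed and generated expression. This is not the scvi-tools implementation; all function names and the toy matrices are hypothetical.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (0.0 if the mean is 0)."""
    mu = mean(values)
    return pstdev(values) / mu if mu else 0.0


def squared_pearson(x, y):
    """Squared Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant input
    return cov * cov / (var_x * var_y)


def cellwise_cv_r2(observed, generated):
    """Compare per-cell CVs; rows are cells, columns are genes."""
    obs_cv = [coefficient_of_variation(row) for row in observed]
    gen_cv = [coefficient_of_variation(row) for row in generated]
    return squared_pearson(obs_cv, gen_cv)


# Toy count matrices (made up for illustration): 3 cells x 3 genes.
observed = [[1, 2, 3], [2, 2, 2], [0, 4, 8]]
generated = [[1, 2, 3], [2, 2, 2], [0, 4, 8]]
r2 = cellwise_cv_r2(observed, generated)
usable_for_downstream = r2 >= 0.4  # threshold quoted in the text above
```

The gene-wise variant would apply the same comparison to the columns (genes) instead of the rows (cells).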
+<details>
+<summary><strong>Differential expression metric</strong></summary>
+The differential expression metric summarizes the differential expression analysis
+between cell types or input clusters. We provide here the F1-score, the Pearson correlation
+coefficient of log-fold changes, the Spearman correlation coefficient, and the area under the
+precision-recall curve (AUPRC) for the differential expression analysis using the Wilcoxon
+rank-sum test for each cell type.
+**Differential expression**:
+Not provided by uploader
+</details>
+# Model Properties
+We provide here key parameters used to set up and train the model.
+<details>
+<summary><strong>Model Parameters</strong></summary>
+These are the settings used to set up the original model:
+```json
+{
+    "n_hidden": 128,
+    "n_latent": 20,
+    "n_layers": 3,
+    "dropout_rate": 0.05,
+    "dispersion": "gene",
+    "gene_likelihood": "nb",
+    "use_observed_lib_size": true,
+    "linear_classifier": false,
+    "datamodule": null,
+    "latent_distribution": "normal",
+    "use_batch_norm": "none",
+    "use_layer_norm": "both",
+    "encode_covariates": true
+}
+```
+</details>
+<details>
+<summary><strong>Setup Data Arguments</strong></summary>
+Arguments passed to `setup_anndata` of the original model:
+```json
+{
+    "labels_key": "cell_type",
+    "unlabeled_category": "unknown",
+    "layer": "counts",
+    "batch_key": "donor_assay",
+    "size_factor_key": null,
+    "categorical_covariate_keys": null,
+    "continuous_covariate_keys": null,
+    "use_minified": false
+}
+```
+</details>
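A hedged sketch of registering a new AnnData object with the setup arguments listed above. It assumes your data has the same `obs` columns (`cell_type`, `donor_assay`) and a `counts` layer. `non_default_setup_kwargs` and `register_adata` are hypothetical helper names. Arguments left at `null` are dropped, and `use_minified` is omitted on the assumption that not every scvi-tools release accepts it.

```python
import json

# Setup arguments copied from the model card above.
SETUP_ARGS_JSON = """
{
    "labels_key": "cell_type",
    "unlabeled_category": "unknown",
    "layer": "counts",
    "batch_key": "donor_assay",
    "size_factor_key": null,
    "categorical_covariate_keys": null,
    "continuous_covariate_keys": null,
    "use_minified": false
}
"""


def non_default_setup_kwargs(raw_json):
    """Keep only the arguments that were explicitly set to a non-null value."""
    args = json.loads(raw_json)
    # Assumption: drop use_minified, which may not exist in every release.
    args.pop("use_minified", None)
    return {key: value for key, value in args.items() if value is not None}


def register_adata(adata):
    """Hypothetical helper: register `adata` the way the original model was set up."""
    # Imported lazily so the sketch can be read without scvi-tools installed.
    import scvi

    scvi.model.SCANVI.setup_anndata(adata, **non_default_setup_kwargs(SETUP_ARGS_JSON))
    return adata
```

Cells without a known annotation should carry the value `"unknown"` in `adata.obs["cell_type"]`, matching `unlabeled_category`.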
+<details>
+<summary><strong>Data Registry</strong></summary>
+Registry elements for AnnData manager:
+| Registry Key             | scvi-tools Location                  |
+|--------------------------|--------------------------------------|
+| X                         | adata.layers['counts']               |
+| batch                     | adata.obs['_scvi_batch']             |
+| labels                    | adata.obs['_scvi_labels']            |
+| latent_qzm                | adata.obsm['scanvi_latent_qzm']      |
+| latent_qzv                | adata.obsm['scanvi_latent_qzv']      |
+| minify_type               | adata.uns['_scvi_adata_minify_type'] |
+| observed_lib_size         | adata.obs['observed_lib_size']       |
+- **Data is Minified**: False
+</details>
+<details>
+<summary><strong>Summary Statistics</strong></summary>
+| Summary Stat Key          | Value |
+|--------------------------|-------|
+| n_batch                   | 2 |
+| n_cells                   | 3055 |
+| n_extra_categorical_covs  | 0 |
+| n_extra_continuous_covs   | 0 |
+| n_labels                  | 17 |
+| n_latent_qzm              | 20 |
+| n_latent_qzv              | 20 |
+| n_vars                    | 3000 |
+</details>
+<details>
+<summary><strong>Training</strong></summary>
+<!-- If your model is not uploaded with any data (e.g., minified data) on the Model Hub, then make
+sure to provide this field if you want users to be able to access your training data. See the
+scvi-tools documentation for details. -->
+**Training data URL**: Not provided by uploader
+For those interested in understanding or replicating the training process, the training code is
+available at the link below, if provided by the original uploader.
+**Training Code URL**: https://github.com/YosefLab/scvi-hub-models/blob/main/src/scvi_hub_models/TS_train_all_tissues.ipynb
+</details>
+# References
+The Tabula Sapiens Consortium. The Tabula Sapiens: A multiple-organ, single-cell transcriptomic atlas of humans. Science, May 2022. doi:10.1126/science.abl4896