# AIDO.Cell 3M

AIDO.Cell 3M is our smallest cellular foundation model, trained on 50 million cells over a diverse set of tissues and organs. The AIDO.Cell models are capable of handling the entire human transcriptome as input, thus learning accurate and general representations of the human cell's entire transcriptional context. AIDO.Cell achieves state-of-the-art results in tasks such as zero-shot clustering, cell-type classification, and perturbation modeling.

## Model Architectural Details

AIDO.Cell uses an auto-discretization strategy for encoding continuous gene expression values, and uses a bidirectional transformer encoder as its backbone. To learn semantically meaningful representations, we employed a BERT-style, encoder-only dense transformer architecture, with minor updates to align with current best practices, including SwiGLU and LayerNorms. Below are more details about the model architecture:

| Model | Layers | Hidden | Heads | Intermediate Hidden Size |
| ----- |:------:| ------ | ----- | ------------------------ |
| 3M    | 6      | 128    | 4     | 320                      |
| 10M   | 8      | 256    | 8     | 640                      |
| 100M  | 18     | 650    | 20    | 1664                     |
| 650M  | 32     | 1280   | 20    | 3392                     |
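The auto-discretization encoder described above can be illustrated with a minimal numpy sketch: rather than hard-binning a continuous expression value, the value is compared against a set of learnable bin anchors, and its embedding is a softmax-weighted mixture of learnable bin embeddings. All names and shapes here are illustrative assumptions, not the actual AIDO.Cell implementation.

```python
import numpy as np

def auto_discretize(expr, bin_keys, bin_embeddings):
    """Soft-bin continuous expression values (illustrative sketch).

    expr:           (num_genes,)  continuous expression values
    bin_keys:       (num_bins,)   learnable scalar anchor per bin
    bin_embeddings: (num_bins, hidden) learnable embedding per bin
    Returns (num_genes, hidden): softmax-weighted mix of bin embeddings.
    """
    # Squared distance of each value to each anchor -> softmax weights
    logits = -(expr[:, None] - bin_keys[None, :]) ** 2   # (num_genes, num_bins)
    weights = np.exp(logits - logits.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ bin_embeddings                      # (num_genes, hidden)

rng = np.random.default_rng(0)
emb = auto_discretize(
    expr=rng.exponential(1.0, size=32),
    bin_keys=np.linspace(0.0, 8.0, 16),
    bin_embeddings=rng.normal(size=(16, 128)),
)
print(emb.shape)  # (32, 128)
```

Because the weights are differentiable in both the value and the anchors, the binning itself can be learned end to end, unlike a fixed histogram.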

## Pre-training of AIDO.Cell

Here we briefly introduce the pre-training of AIDO.Cell. For more detailed information, please refer to [our paper]().

AIDO.Cell uses the Read-Depth-Aware (RDA) pretraining objective, in which a single cell's expression profile is downsampled to a low read depth, and the model learns to predict the higher-read-depth expression counts of masked genes.
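The data side of this objective can be sketched in a few lines of numpy: binomially thin each gene's count so the cell's total read depth drops, mask a subset of genes, and keep the original counts as the prediction target. Function and variable names here are illustrative assumptions, not the actual pipeline.

```python
import numpy as np

def rda_example(counts, target_depth, mask_frac, rng):
    """Build one Read-Depth-Aware training example (illustrative sketch).

    counts: (num_genes,) raw expression counts of one cell.
    Returns (model_input, target, mask): the model sees the downsampled,
    partially masked profile and predicts the original high-read-depth
    counts at the masked positions.
    """
    depth = counts.sum()
    # Binomially thin each count so the expected total is target_depth
    keep_p = min(1.0, target_depth / depth)
    low_depth = rng.binomial(counts, keep_p)
    # Mask a random subset of genes; the loss is computed only there
    mask = rng.random(counts.shape) < mask_frac
    model_input = low_depth.copy()
    model_input[mask] = -1  # sentinel for "masked" (illustrative choice)
    return model_input, counts, mask

rng = np.random.default_rng(0)
counts = rng.poisson(2.0, size=1000)
x, y, m = rda_example(counts, target_depth=500, mask_frac=0.15, rng=rng)
print(x.shape, y.shape, m.mean())
```

Training on many (low depth, high depth) pairs like this is what lets a single model handle cells sequenced at very different depths.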

### Data

AIDO.Cell was pretrained on a diverse dataset of 50 million cells from over 100 tissue types. We followed the data list curated by scFoundation in its supplementary materials. This list includes datasets from the Gene Expression Omnibus (GEO), the Deeply Integrated human Single-Cell Omics data (DISCO), the human Ensemble Cell Atlas (hECA), the Single Cell Portal, and more. After preprocessing and quality control, the training dataset contained 50 million cells, or 963 billion gene tokens in total. We partitioned the dataset to set aside 100,000 cells as our validation set.

### Training Details

We trained our models in bfloat16 precision to optimize memory usage and speed. Training ran on 256 H100 GPUs for three days for the 100M model and eight days for the 650M model.

## Evaluation of AIDO.Cell

We evaluated AIDO.Cell on a series of both zero-shot and fine-tuned tasks in single-cell genomics. For more detailed information, please refer to [our paper]().

## How to Use

### Build any downstream models from this backbone

#### Embedding
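The README does not include the loading code for the released checkpoints, so the snippet below uses a hypothetical stand-in backbone purely to show the pattern: the encoder produces one embedding per gene token, and a cell-level embedding is obtained by mean-pooling them. Every name here is an illustrative assumption, not the real AIDO.Cell API.

```python
import numpy as np

class ToyBackbone:
    """Hypothetical stand-in for the pretrained encoder: maps a
    (num_genes,) expression vector to (num_genes, hidden) token
    embeddings. Only the pooling pattern below is the point."""
    def __init__(self, hidden=128, seed=0):
        self.w = np.random.default_rng(seed).normal(size=(1, hidden)) * 0.02

    def encode(self, expr):
        # log1p-scale the counts, then project to the hidden size
        return np.log1p(expr)[:, None] * self.w   # (num_genes, hidden)

def cell_embedding(backbone, expr):
    # Mean-pool gene-token embeddings into a single cell-level vector
    return backbone.encode(expr).mean(axis=0)     # (hidden,)

expr = np.random.default_rng(1).poisson(2.0, size=1000)
emb = cell_embedding(ToyBackbone(), expr)
print(emb.shape)  # (128,)
```

The pooled vector can then feed zero-shot clustering or any downstream head.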

#### Sequence Level Classification
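A sequence-level task such as cell-type classification attaches one label to the whole cell. As a minimal sketch (shapes and names are illustrative assumptions), pool the backbone's per-gene token embeddings, then apply a small trainable linear head with a softmax:

```python
import numpy as np

def sequence_classify(token_embeddings, W, b):
    """Cell-level (sequence-level) classification head (sketch).

    token_embeddings: (num_genes, hidden) backbone output for one cell
    W: (hidden, num_classes), b: (num_classes,) -- the trainable head
    Returns one probability per class for the whole cell.
    """
    pooled = token_embeddings.mean(axis=0)   # (hidden,)
    logits = pooled @ W + b                  # (num_classes,)
    p = np.exp(logits - logits.max())        # stable softmax
    return p / p.sum()

rng = np.random.default_rng(0)
probs = sequence_classify(rng.normal(size=(1000, 128)),  # backbone output
                          rng.normal(size=(128, 10)),    # head weights
                          np.zeros(10))                  # head bias
print(probs.shape)  # (10,)
```

Only `W` and `b` need training when the backbone is kept frozen.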

#### Token Level Classification
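A token-level task makes one prediction per gene rather than per cell, e.g. predicting a discretized expression bin for each masked gene. The sketch below (again with illustrative, assumed names) simply applies the head to every token embedding:

```python
import numpy as np

def token_classify(token_embeddings, W, b):
    """Token-level head (sketch): one class distribution per gene token."""
    logits = token_embeddings @ W + b        # (num_genes, num_classes)
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    return p / p.sum(axis=1, keepdims=True)  # row-wise softmax

rng = np.random.default_rng(0)
probs = token_classify(rng.normal(size=(1000, 128)),  # backbone output
                       rng.normal(size=(128, 32)),    # head weights
                       np.zeros(32))                  # head bias
print(probs.shape)  # (1000, 32)
```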

## Citation

Please cite AIDO.Cell using the following BibTeX code:

```
@inproceedings{ho2024scaling,
  title={Scaling Dense Representations for Single Cell with Transcriptome-Scale Context},
  author={Nicholas Ho, Caleb N. Ellington, Jinyu Hou, Sohan Addagudi, Shentong Mo, Tianhua Tao, Dian Li, Yonghao Zhuang, Hongyi Wang, Xingyi Cheng, Le Song, Eric P. Xing},
  booktitle={NeurIPS 2024 Workshop on AI for New Drug Modalities},
  year={2024}
}
```

For a more detailed description, refer to the SOTA model in this collection: [genbio-ai/cellfoundation-100m](https://huggingface.co/genbio-ai/cellfoundation-100m)
For more information, visit: [Model Generator](https://github.com/genbio-ai/modelgenerator)