# UCE 4-Layer Model

## Model Information

- **Model**: Universal Cell Embeddings (UCE)
- **Variant**: 4-layer Transformer
- **Source**: https://github.com/snap-stanford/UCE
- **Paper**: [Universal Cell Embeddings: A Foundation Model for Cell Biology](https://www.biorxiv.org/content/10.1101/2023.11.28.568918v1)
## Architecture

- **Layers**: 4
- **Model Dimension**: 1280
- **Attention Heads**: 20
- **Hidden Dimension**: 5120
- **Output Dimension**: 1280
- **Token Dimension**: 5120 (ESM2 protein embeddings)
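
As a rough sanity check on these numbers, the sketch below derives the per-head dimension and an approximate encoder parameter count. It assumes each layer follows the standard Transformer encoder layout (multi-head self-attention, a two-layer feed-forward block, and two layer norms, all with biases), which this README does not confirm. The `UCEArchitecture` class and its method names are illustrative only and are not part of `perturblab` or UCE. The count covers the encoder layers alone and excludes the token projection and any output head.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UCEArchitecture:
    """Hyperparameters from the list above; the class itself is hypothetical."""

    n_layers: int = 4
    d_model: int = 1280
    n_heads: int = 20
    d_hidden: int = 5120

    @property
    def head_dim(self) -> int:
        # Each attention head works on an equal slice of the model dimension.
        if self.d_model % self.n_heads != 0:
            raise ValueError("d_model must be divisible by n_heads")
        return self.d_model // self.n_heads

    def encoder_layer_params(self) -> int:
        # Assumes a standard encoder layer with biases everywhere.
        d, h = self.d_model, self.d_hidden
        attention = 4 * d * d + 4 * d     # Q, K, V, output projections + biases
        feed_forward = 2 * d * h + h + d  # two linear layers + biases
        layer_norms = 2 * 2 * d           # two layer norms, scale + shift each
        return attention + feed_forward + layer_norms

    def encoder_params(self) -> int:
        return self.n_layers * self.encoder_layer_params()


arch = UCEArchitecture()
print(arch.head_dim)          # 64
print(arch.encoder_params())  # 78709760
```

Under these assumptions, each head attends over 64 dimensions and the four encoder layers hold roughly 79 million parameters.
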
## Usage

```python
import anndata as ad

from perturblab.model.uce import UCEModel

# Load the dataset to embed (placeholder path; substitute your own .h5ad file)
adata = ad.read_h5ad('path/to/data.h5ad')

# Load the pretrained model
model = UCEModel.from_pretrained('./weights/uce-4layer')

# Generate embeddings
result = model.predict_embeddings(
    data=adata,  # AnnData or PerturbationData
    species='human',
    batch_size=25,
)

cell_embeddings = result['cell_embeddings']  # (n_cells, 1280)
gene_embeddings = result['gene_embeddings']  # (n_cells, seq_len, 1280)
```
## Files

- `model.pt`: Model state dict
- `tokens.pt`: Token embeddings (ESM2-15B protein embeddings, matching the 5120 token dimension, plus chromosome tokens)
- `config.json`: Model configuration
- `species_chrom.csv`: Gene-to-chromosome mapping
- `species_offsets.pkl`: Per-species offsets into the token file
- `protein_embeddings/`: Protein embeddings for each species
- `README.md`: This file
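
Before loading the model, it can help to confirm that the weights directory is complete. The following is a minimal sketch using only the standard library. The helper `missing_weight_files` is hypothetical and not part of `perturblab`, and leaving `README.md` out of the check reflects an assumption that it is not needed at load time.

```python
from pathlib import Path

# File names taken from the list above. README.md is omitted on the
# assumption that the model does not need it at load time.
EXPECTED_FILES = (
    "model.pt",
    "tokens.pt",
    "config.json",
    "species_chrom.csv",
    "species_offsets.pkl",
)
EXPECTED_DIRS = ("protein_embeddings",)


def missing_weight_files(weights_dir):
    """Return the sorted names of expected entries absent from weights_dir."""
    root = Path(weights_dir)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    missing += [name + "/" for name in EXPECTED_DIRS if not (root / name).is_dir()]
    return sorted(missing)


if __name__ == "__main__":
    # './weights/uce-4layer' is the path used in the Usage example.
    missing = missing_weight_files("./weights/uce-4layer")
    print("all files present" if not missing else f"missing: {missing}")
```
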
## Citation

```bibtex
@article{rosen2023universal,
  title={Universal Cell Embeddings: A Foundation Model for Cell Biology},
  author={Rosen, Yanay and Roohani, Yusuf and Agrawal, Ayush and Samotorcan, Leon and Consortium, Tabula Sapiens and Quake, Stephen R and Leskovec, Jure},
  journal={bioRxiv},
  pages={2023--11},
  year={2023},
  publisher={Cold Spring Harbor Laboratory}
}
```
## License

MIT License (see the original repository for details).