# UCE 4LAYER Model

## Model Information

- **Model**: Universal Cell Embeddings (UCE)
- **Variant**: 4-layer Transformer
- **Source**: https://github.com/snap-stanford/UCE
- **Paper**: [Universal Cell Embeddings: A Foundation Model for Cell Biology](https://www.biorxiv.org/content/10.1101/2023.11.28.568918v1)

## Architecture

- **Layers**: 4
- **Model Dimension**: 1280
- **Attention Heads**: 20
- **Hidden Dimension**: 5120
- **Output Dimension**: 1280
- **Token Dimension**: 5120 (ESM2 protein embeddings)

## Usage

```python
from perturblab.model.uce import UCEModel

# Load pretrained model
model = UCEModel.from_pretrained('./weights/uce-4layer')

# Generate embeddings
result = model.predict_embeddings(
    data=adata,  # or PerturbationData
    species='human',
    batch_size=25
)

cell_embeddings = result['cell_embeddings']  # (n_cells, 1280)
gene_embeddings = result['gene_embeddings']  # (n_cells, seq_len, 1280)
```

## Files

- `model.pt`: Model state dict
- `tokens.pt`: Token embeddings (ESM2-650M + chromosome tokens)
- `config.json`: Model configuration
- `species_chrom.csv`: Gene to chromosome mapping
- `species_offsets.pkl`: Species offsets in token file
- `protein_embeddings/`: Protein embeddings for each species
- `README.md`: This file

## Citation

```bibtex
@article{rosen2023universal,
  title={Universal Cell Embeddings: A Foundation Model for Cell Biology},
  author={Rosen, Yanay and Roohani, Yusuf and Agrawal, Ayush and Samotorcan, Leon and Consortium, Tabula Sapiens and Quake, Stephen R and Leskovec, Jure},
  journal={bioRxiv},
  pages={2023--11},
  year={2023},
  publisher={Cold Spring Harbor Laboratory}
}
```

## License

MIT License (see original repository for details)