# UCE 4-Layer Model
## Model Information
- **Model**: Universal Cell Embeddings (UCE)
- **Variant**: 4-layer Transformer
- **Source**: https://github.com/snap-stanford/UCE
- **Paper**: [Universal Cell Embeddings: A Foundation Model for Cell Biology](https://www.biorxiv.org/content/10.1101/2023.11.28.568918v1)
## Architecture
- **Layers**: 4
- **Model Dimension**: 1280
- **Attention Heads**: 20
- **Hidden Dimension**: 5120
- **Output Dimension**: 1280
- **Token Dimension**: 5120 (ESM2 protein embeddings)
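
As a rough sanity check on the numbers above, the following sketch derives the per-head dimension and estimates the parameter count of the 4-layer encoder stack. It assumes a standard Transformer encoder layer layout (multi-head self-attention with biases, a two-layer feed-forward block, and two LayerNorms), as in PyTorch's `nn.TransformerEncoderLayer`. The helper name `encoder_layer_params` is hypothetical, and the estimate excludes the input projection from the 5120-dimensional tokens and any output head.

```python
# Hyperparameters taken from the Architecture list above
N_LAYERS = 4
D_MODEL = 1280
N_HEADS = 20
D_HIDDEN = 5120


def encoder_layer_params(d_model: int, d_hidden: int) -> int:
    """Parameter count of one standard Transformer encoder layer (assumed layout)."""
    # Q, K, V and output projections, each d_model x d_model plus a bias
    attention = 4 * (d_model * d_model + d_model)
    # Two linear layers: d_model -> d_hidden -> d_model, each with a bias
    feed_forward = (d_model * d_hidden + d_hidden) + (d_hidden * d_model + d_model)
    # Two LayerNorms, each with a weight and a bias vector
    layer_norms = 2 * (2 * d_model)
    return attention + feed_forward + layer_norms


head_dim = D_MODEL // N_HEADS
stack_params = N_LAYERS * encoder_layer_params(D_MODEL, D_HIDDEN)

print(f"head dimension: {head_dim}")
print(f"encoder stack parameters: {stack_params:,}")
```

Under these assumptions each attention head works on 64 dimensions, and the encoder stack alone holds roughly 78.7 million parameters.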
## Usage
```python
import anndata as ad
from perturblab.model.uce import UCEModel

# Load an AnnData object (the path is a placeholder for your own dataset)
adata = ad.read_h5ad('path/to/data.h5ad')

# Load the pretrained model
model = UCEModel.from_pretrained('./weights/uce-4layer')

# Generate embeddings
result = model.predict_embeddings(
    data=adata,  # or PerturbationData
    species='human',
    batch_size=25,
)
cell_embeddings = result['cell_embeddings']  # (n_cells, 1280)
gene_embeddings = result['gene_embeddings']  # (n_cells, seq_len, 1280)
```
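
Once cell embeddings are available, a common next step is comparing cells in embedding space. The sketch below is one possible way to do that, not part of the `perturblab` API: it assumes the embeddings can be converted to a NumPy array of shape `(n_cells, 1280)` (whether `predict_embeddings` returns arrays or tensors is not stated here), `nearest_cells` is a hypothetical helper, and random data stands in for real embeddings.

```python
import numpy as np


def nearest_cells(embeddings, query_index, k=5):
    """Return indices of the k cells most cosine-similar to the query cell."""
    emb = np.asarray(embeddings, dtype=np.float64)
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    unit = emb / np.clip(norms, 1e-12, None)  # guard against zero vectors
    similarity = unit @ unit[query_index]
    similarity[query_index] = -np.inf  # exclude the query cell itself
    return np.argsort(-similarity)[:k]


# Stand-in for result['cell_embeddings']: 100 cells, 1280 dimensions
rng = np.random.default_rng(0)
fake_embeddings = rng.normal(size=(100, 1280))
neighbors = nearest_cells(fake_embeddings, query_index=0, k=5)
print(neighbors)
```

With real data, `fake_embeddings` would be replaced by the `cell_embeddings` array, converted to NumPy first if it is a tensor.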
## Files
- `model.pt`: Model state dict
- `tokens.pt`: Token embeddings (5120-dimensional ESM2 protein embeddings + chromosome tokens)
- `config.json`: Model configuration
- `species_chrom.csv`: Gene to chromosome mapping
- `species_offsets.pkl`: Species offsets in token file
- `protein_embeddings/`: Protein embeddings for each species
- `README.md`: This file
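
Before loading the model, it can help to confirm that a downloaded weights directory contains everything listed above. The following is a minimal sketch using only the standard library; `missing_entries` and `load_config` are hypothetical helpers, and the keys inside `config.json` are not documented here, so the configuration is returned as a plain dictionary.

```python
import json
from pathlib import Path

# File and directory names taken from the list above
EXPECTED_FILES = [
    "model.pt",
    "tokens.pt",
    "config.json",
    "species_chrom.csv",
    "species_offsets.pkl",
]
EXPECTED_DIRS = ["protein_embeddings"]


def missing_entries(weights_dir):
    """Return the expected files and directories that are absent from weights_dir."""
    root = Path(weights_dir)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    missing += [name + "/" for name in EXPECTED_DIRS if not (root / name).is_dir()]
    return missing


def load_config(weights_dir):
    """Read config.json as a plain dictionary."""
    with open(Path(weights_dir) / "config.json", encoding="utf-8") as handle:
        return json.load(handle)


if __name__ == "__main__":
    weights_dir = "./weights/uce-4layer"  # same path as in the Usage section
    print("missing:", missing_entries(weights_dir) or "nothing")
```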
## Citation
```bibtex
@article{rosen2023universal,
  title={Universal Cell Embeddings: A Foundation Model for Cell Biology},
  author={Rosen, Yanay and Roohani, Yusuf and Agrawal, Ayush and Samotorcan, Leon and {Tabula Sapiens Consortium} and Quake, Stephen R and Leskovec, Jure},
  journal={bioRxiv},
  pages={2023--11},
  year={2023},
  publisher={Cold Spring Harbor Laboratory}
}
```
## License
MIT License (see original repository for details)