# UCE 4-Layer Model
## Model Information
- **Model**: Universal Cell Embeddings (UCE)
- **Variant**: 4-layer Transformer
- **Source**: https://github.com/snap-stanford/UCE
- **Paper**: [Universal Cell Embeddings: A Foundation Model for Cell Biology](https://www.biorxiv.org/content/10.1101/2023.11.28.568918v1)
## Architecture
- **Layers**: 4
- **Model Dimension**: 1280
- **Attention Heads**: 20
- **Hidden Dimension**: 5120
- **Output Dimension**: 1280
- **Token Dimension**: 5120 (ESM2 protein embeddings)
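As a rough sanity check on model size, the hyperparameters listed above can be turned into parameter counts for the encoder stack. This is a sketch that assumes each layer follows the standard PyTorch `nn.TransformerEncoderLayer` layout: a fused Q/K/V in-projection with biases, a two-layer feed-forward block, and two LayerNorms. It deliberately ignores the token-projection (5120 → 1280) and output layers, whose exact shapes are not documented here, so the totals cover the transformer layers only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UCEArch:
    # Values copied from the Architecture list; the layer layout below is an assumption.
    n_layers: int = 4
    d_model: int = 1280
    n_heads: int = 20
    d_hidden: int = 5120
    output_dim: int = 1280
    token_dim: int = 5120

    @property
    def head_dim(self) -> int:
        # Each attention head operates on an equal slice of the model dimension.
        if self.d_model % self.n_heads:
            raise ValueError("d_model must be divisible by n_heads")
        return self.d_model // self.n_heads

    def encoder_layer_params(self) -> int:
        d, h = self.d_model, self.d_hidden
        attention = (3 * d * d + 3 * d) + (d * d + d)  # fused Q/K/V in-projection + output projection
        feed_forward = (d * h + h) + (h * d + d)  # two linear layers with biases
        layer_norms = 2 * (2 * d)  # two LayerNorms, each with weight and bias
        return attention + feed_forward + layer_norms

    def encoder_params(self) -> int:
        # Assumes all layers are identical; excludes embedding and output heads.
        return self.n_layers * self.encoder_layer_params()


arch = UCEArch()
print(arch.head_dim)                # 64
print(arch.encoder_layer_params())  # 19677440
print(arch.encoder_params())        # 78709760
```

Under these assumptions each head attends over a 64-dimensional subspace, and the four layers hold roughly 79M parameters.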
## Usage
```python
import anndata as ad
from perturblab.model.uce import UCEModel

# Load the single-cell dataset to embed (replace the path with your own .h5ad file)
adata = ad.read_h5ad('path/to/data.h5ad')

# Load the pretrained 4-layer model from its weights directory
model = UCEModel.from_pretrained('./weights/uce-4layer')

# Generate embeddings
result = model.predict_embeddings(
    data=adata,  # or a PerturbationData object
    species='human',
    batch_size=25,
)
cell_embeddings = result['cell_embeddings']  # (n_cells, 1280)
gene_embeddings = result['gene_embeddings']  # (n_cells, seq_len, 1280)
```
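Once the embeddings are computed, a common next step is to verify their shapes and look up similar cells. The sketch below assumes the two outputs are NumPy arrays with the shapes given in the comments above (convert with `.cpu().numpy()` first if the library returns torch tensors); `check_shapes` and `nearest_cells` are hypothetical helpers, not part of `perturblab`.

```python
import numpy as np


def check_shapes(cell_emb: np.ndarray, gene_emb: np.ndarray, dim: int = 1280) -> None:
    """Raise ValueError unless the arrays match the shapes documented in Usage."""
    n_cells = cell_emb.shape[0]
    if cell_emb.shape != (n_cells, dim):
        raise ValueError(f"cell embeddings: expected (n_cells, {dim}), got {cell_emb.shape}")
    if gene_emb.ndim != 3 or gene_emb.shape[0] != n_cells or gene_emb.shape[2] != dim:
        raise ValueError(f"gene embeddings: expected ({n_cells}, seq_len, {dim}), got {gene_emb.shape}")


def nearest_cells(cell_emb: np.ndarray, query: int, k: int = 5) -> np.ndarray:
    """Indices of the k cells most cosine-similar to cell `query`, excluding the query itself."""
    norms = np.linalg.norm(cell_emb, axis=1, keepdims=True)
    normed = cell_emb / np.maximum(norms, 1e-12)  # guard against all-zero rows
    sims = normed @ normed[query]
    sims[query] = -np.inf  # never return the query cell
    return np.argsort(-sims)[:k]


# Stand-in arrays with the documented shapes; substitute the real model outputs.
rng = np.random.default_rng(0)
cells = rng.normal(size=(10, 1280))
genes = rng.normal(size=(10, 32, 1280))
check_shapes(cells, genes)
print(nearest_cells(cells, query=0, k=3))
```

If you work in scanpy, storing the cell embeddings in `adata.obsm` (for example under a key such as `X_uce`, a name chosen here for illustration) lets `sc.pp.neighbors(adata, use_rep="X_uce")` build the neighbor graph from them instead of from raw expression.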
## Files
- `model.pt`: Model state dict
- `tokens.pt`: Token embeddings (5120-dim ESM2-15B protein embeddings + chromosome tokens)
- `config.json`: Model configuration
- `species_chrom.csv`: Gene to chromosome mapping
- `species_offsets.pkl`: Per-species row offsets into the token embedding file
- `protein_embeddings/`: Protein embeddings for each species
- `README.md`: This file
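Before loading the model, it can help to confirm that a weights directory contains everything listed above. The following standard-library sketch checks for the files, loads `config.json`, and unpickles `species_offsets.pkl`. It assumes the offsets file holds a plain dict mapping species names to integer row offsets, and that a gene's row in `tokens.pt` is its species offset plus its index within that species' protein embeddings; both are assumptions about the file layout, and all function names are hypothetical.

```python
import json
import pickle
from pathlib import Path

# File names taken from the Files list.
EXPECTED_FILES = (
    "model.pt",
    "tokens.pt",
    "config.json",
    "species_chrom.csv",
    "species_offsets.pkl",
)


def missing_files(weights_dir: str) -> list[str]:
    """Return the expected entries that are absent from weights_dir."""
    root = Path(weights_dir)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    if not (root / "protein_embeddings").is_dir():
        missing.append("protein_embeddings/")
    return missing


def load_metadata(weights_dir: str) -> tuple[dict, dict]:
    """Load the JSON config and the pickled species offsets (assumed to be a dict)."""
    root = Path(weights_dir)
    with open(root / "config.json", encoding="utf-8") as f:
        config = json.load(f)
    # pickle can execute arbitrary code: only load files from a trusted source.
    with open(root / "species_offsets.pkl", "rb") as f:
        offsets = pickle.load(f)
    return config, offsets


def token_row(offsets: dict, species: str, local_index: int) -> int:
    """Map a species-local protein index to a row in tokens.pt (assumed layout)."""
    return offsets[species] + local_index
```

Calling `missing_files('./weights/uce-4layer')` before `UCEModel.from_pretrained` gives a clearer error than a failed load deep inside the library.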
## Citation
```bibtex
@article{rosen2023universal,
  title={Universal Cell Embeddings: A Foundation Model for Cell Biology},
  author={Rosen, Yanay and Roohani, Yusuf and Agrawal, Ayush and Samotorcan, Leon and {Tabula Sapiens Consortium} and Quake, Stephen R and Leskovec, Jure},
  journal={bioRxiv},
  doi={10.1101/2023.11.28.568918},
  year={2023},
  publisher={Cold Spring Harbor Laboratory}
}
```
## License
MIT License (see original repository for details)