# UCE 4-Layer Model

## Model Information

- **Model**: Universal Cell Embeddings (UCE)
- **Variant**: 4-layer Transformer
- **Source**: https://github.com/snap-stanford/UCE
- **Paper**: [Universal Cell Embeddings: A Foundation Model for Cell Biology](https://www.biorxiv.org/content/10.1101/2023.11.28.568918v1)

## Architecture

- **Layers**: 4
- **Model Dimension**: 1280
- **Attention Heads**: 20
- **Hidden Dimension**: 5120
- **Output Dimension**: 1280
- **Token Dimension**: 5120 (ESM2 protein embeddings)
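
The figures above imply a per-head attention width and a rough encoder size. The sketch below works through that arithmetic. `UCEConfigSketch` is a hypothetical container and not part of `perturblab` or the UCE repository, and the parameter estimate assumes a standard Transformer encoder layer layout (four attention projections, a two-layer feed-forward block, two LayerNorms), which may differ from the actual checkpoint.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UCEConfigSketch:
    """Hypothetical holder for the hyperparameters listed in this README."""

    n_layers: int = 4
    d_model: int = 1280
    n_heads: int = 20
    d_hidden: int = 5120
    d_output: int = 1280
    d_token: int = 5120

    @property
    def head_dim(self) -> int:
        # Multi-head attention splits d_model evenly across the heads.
        if self.d_model % self.n_heads != 0:
            raise ValueError("d_model must be divisible by n_heads")
        return self.d_model // self.n_heads

    def encoder_params(self) -> int:
        """Estimate weights in the encoder stack only (assumed standard layout)."""
        d, h = self.d_model, self.d_hidden
        attention = 4 * (d * d + d)               # Q, K, V, output projections + biases
        feed_forward = (d * h + h) + (h * d + d)  # two linear layers + biases
        layer_norms = 2 * (2 * d)                 # scale and shift for two LayerNorms
        return self.n_layers * (attention + feed_forward + layer_norms)


cfg = UCEConfigSketch()
print(cfg.head_dim)          # 64
print(cfg.encoder_params())  # 78709760
```

Under these assumptions the four encoder layers hold roughly 78.7 million weights. The estimate leaves out the token embeddings, any projection from the 5120-dimensional tokens to the 1280-dimensional model width, and the output head.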

## Usage

```python
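import anndata as ad
from perturblab.model.uce import UCEModel

# Load an AnnData object (placeholder path; substitute your own dataset)
adata = ad.read_h5ad('path/to/data.h5ad')

# Load pretrained model
model = UCEModel.from_pretrained('./weights/uce-4layer')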

# Generate embeddings
result = model.predict_embeddings(
    data=adata,  # or PerturbationData
    species='human',
    batch_size=25
)

cell_embeddings = result['cell_embeddings']  # (n_cells, 1280)
gene_embeddings = result['gene_embeddings']  # (n_cells, seq_len, 1280)
```

## Files

- `model.pt`: Model state dict
- `tokens.pt`: Token embeddings (ESM2 15B protein embeddings, 5120-dimensional, plus chromosome tokens)
- `config.json`: Model configuration
- `species_chrom.csv`: Gene to chromosome mapping
- `species_offsets.pkl`: Species offsets in token file
- `protein_embeddings/`: Protein embeddings for each species
- `README.md`: This file
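
Before calling `from_pretrained`, it can help to confirm that the weights directory is complete. The following is a minimal sketch that uses only the file names listed above. The helpers `missing_weight_files` and `load_config` are hypothetical and not part of `perturblab`, and the keys inside `config.json` are not documented here, so the sketch returns the parsed JSON without interpreting it.

```python
import json
from pathlib import Path

# Names taken from the file list in this README. README.md is left out
# because it is presumably not needed to load the model.
EXPECTED_FILES = [
    "model.pt",
    "tokens.pt",
    "config.json",
    "species_chrom.csv",
    "species_offsets.pkl",
]
EXPECTED_DIRS = ["protein_embeddings"]


def missing_weight_files(weights_dir):
    """Return the expected entries that are absent from weights_dir."""
    root = Path(weights_dir)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    missing += [name + "/" for name in EXPECTED_DIRS if not (root / name).is_dir()]
    return missing


def load_config(weights_dir):
    """Parse config.json and return its contents unchanged."""
    with open(Path(weights_dir) / "config.json", encoding="utf-8") as fh:
        return json.load(fh)


if __name__ == "__main__":
    # Same directory as in the usage example above.
    absent = missing_weight_files("./weights/uce-4layer")
    if absent:
        print("Missing:", ", ".join(absent))
    else:
        print("All expected files are present.")
```

An empty list from `missing_weight_files` means every listed file and the `protein_embeddings/` directory were found. It does not verify that the contents are valid.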

## Citation

```bibtex
@article{rosen2023universal,
  title={Universal Cell Embeddings: A Foundation Model for Cell Biology},
  author={Rosen, Yanay and Roohani, Yusuf and Agrawal, Ayush and Samotorcan, Leon and {Tabula Sapiens Consortium} and Quake, Stephen R and Leskovec, Jure},
  journal={bioRxiv},
  pages={2023--11},
  year={2023},
  publisher={Cold Spring Harbor Laboratory}
}
```

## License

MIT License (see original repository for details)