Upload scelmo-gene-ncbi - scELMo gene embeddings from GenePT (NCBI-based)
- README.md +49 -0
- config.json +10 -0
- gene_embeddings.pkl +3 -0
README.md
ADDED
@@ -0,0 +1,49 @@
+# scELMo Gene Embeddings
+
+This directory contains gene embeddings generated using scELMo methodology.
+
+## Source
+
+These embeddings are converted from the official scELMo repository:
+**https://github.com/HelloWorldLTY/scELMo**
+
+## Model Information
+
+- **Model**: genepT-ncbi
+- **Embedding Dimension**: 1536
+- **Type**: gene
+- **Aggregation Mode**: wa
+- **API Model**: text-embedding-ada-002
+
+## Files
+
+- `gene_embeddings.pkl`: Gene embeddings dictionary in PerturbLab format
+  - Format: `{'embeddings': {gene_name: embedding_array}, 'gene_list': [gene_names]}`
+- `config.json`: Model configuration
+
+## Usage
+
+```python
+from perturblab.model.scelmo import scELMoModel
+
+# Load model
+model = scELMoModel.from_pretrained('scelmo-gene-ncbi')
+
+# Use embeddings
+embeddings = model.predict_embeddings(adata, aggregation_mode='wa')
+```
+
+## Citation
+
+If you use these embeddings, please cite the original scELMo paper:
+
+```bibtex
+@article{liu2023scelmo,
+  title={scELMo: Embeddings from Language Models are Good Learners for Single-cell Data Analysis},
+  author={Liu, Tianyu and Chen, Tianqi and Zheng, Wangjie and Luo, Xiao and Zhao, Hongyu},
+  journal={Cell Patterns (in press)},
+  pages={2023--12},
+  year={2025},
+  publisher={Cell Press}
+}
+```
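The README documents the pickle layout as `{'embeddings': {gene_name: embedding_array}, 'gene_list': [gene_names]}`. For readers who want to inspect the file without PerturbLab, the following is a minimal sketch of reading that layout directly. The helper name `load_gene_embeddings` and the synthetic demo payload are illustrative assumptions, not part of the commit. The real file presumably stores NumPy arrays, in which case NumPy must be installed for unpickling to succeed.

```python
import pickle
import tempfile
from pathlib import Path


def load_gene_embeddings(path):
    """Load the pickle and check it against the layout the README documents."""
    # Unpickling can execute arbitrary code, so only load files you trust.
    with open(path, "rb") as fh:
        payload = pickle.load(fh)

    embeddings = payload["embeddings"]
    gene_list = list(payload["gene_list"])

    # Every listed gene should have a vector.
    missing = [g for g in gene_list if g not in embeddings]
    if missing:
        raise KeyError(f"{len(missing)} genes in gene_list have no embedding")

    # All vectors should share one length (1536 for the real file, per config.json).
    dims = {len(embeddings[g]) for g in gene_list}
    if len(dims) > 1:
        raise ValueError(f"inconsistent embedding lengths: {sorted(dims)}")
    return gene_list, embeddings


# Synthetic stand-in for gene_embeddings.pkl: 3 genes, 4 dimensions instead of 1536.
demo_payload = {
    "embeddings": {"TP53": [0.1] * 4, "BRCA1": [0.2] * 4, "EGFR": [0.3] * 4},
    "gene_list": ["TP53", "BRCA1", "EGFR"],
}

with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "gene_embeddings.pkl"
    demo_path.write_bytes(pickle.dumps(demo_payload))
    genes, vectors = load_gene_embeddings(demo_path)

print(genes, len(vectors["TP53"]))
```

Pointing `load_gene_embeddings` at the downloaded `gene_embeddings.pkl` should work the same way, provided the file really follows the documented layout.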
config.json
ADDED
@@ -0,0 +1,10 @@
+{
+  "model_series": "scelmo",
+  "model_name": "genepT-ncbi",
+  "model_type": "embedding_extractor",
+  "embedding_dim": 1536,
+  "aggregation_mode": "wa",
+  "api_model": "text-embedding-ada-002",
+  "source_type": "gene",
+  "description": "Gene embeddings from GenePT (NCBI-based)"
+}
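The config sets `"aggregation_mode": "wa"` without defining it. Assuming `wa` stands for an expression-weighted average of gene embeddings, one cell's embedding could be sketched as below. This is an illustration under that assumption only: the function name is hypothetical, and the exact normalization used by scELMo or PerturbLab is not stated in this commit. The code is dependency-free so the arithmetic stays visible; in practice this would be a matrix product over all cells.

```python
def weighted_average_embedding(expression, gene_embeddings):
    """Expression-weighted mean of gene embeddings for one cell.

    expression: dict mapping gene -> non-negative expression value
    gene_embeddings: dict mapping gene -> sequence of floats (equal lengths)
    """
    # Keep only genes that are expressed and have an embedding.
    shared = [g for g, x in expression.items() if x > 0 and g in gene_embeddings]
    total = sum(expression[g] for g in shared)
    if total == 0:
        raise ValueError("no expressed gene has an embedding")

    dim = len(gene_embeddings[shared[0]])
    cell = [0.0] * dim
    for g in shared:
        weight = expression[g] / total  # weights sum to 1 across shared genes
        for i, value in enumerate(gene_embeddings[g]):
            cell[i] += weight * value
    return cell


# Toy 2-dimensional embeddings; the real ones have 1536 dimensions.
toy_embeddings = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [1.0, 1.0]}
toy_cell = {"A": 3.0, "B": 1.0, "D": 5.0}  # "D" has no embedding and is skipped
cell_embedding = weighted_average_embedding(toy_cell, toy_embeddings)
print(cell_embedding)  # [0.75, 0.25]
```

Genes without an embedding are dropped before the weights are normalized, so a cell dominated by unmapped genes still yields a valid average over the genes that remain.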
gene_embeddings.pkl
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:5fea7d87bd3862c41c3e2d6622d3143c9f01bf6a3dab2343817e5682fd60096e
+size 419038659
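The three lines above are a Git LFS pointer: the repository stores only the SHA-256 digest and byte size of `gene_embeddings.pkl`, not the roughly 419 MB file itself. A downloaded copy can be checked against the pointer before unpickling it. The sketch below shows one way to do that with the standard library; the helper names and the small demo file are assumptions made for illustration.

```python
import hashlib
import tempfile
from pathlib import Path

REAL_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:5fea7d87bd3862c41c3e2d6622d3143c9f01bf6a3dab2343817e5682fd60096e
size 419038659
"""


def parse_lfs_pointer(text):
    """Return the hash algorithm, digest, and size from a Git LFS pointer."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"algo": algo, "digest": digest, "size": int(fields["size"])}


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Check a file's size and hash against a parsed pointer."""
    path = Path(path)
    if path.stat().st_size != pointer["size"]:
        return False  # cheap check first; avoids hashing a truncated download
    hasher = hashlib.new(pointer["algo"])
    with path.open("rb") as fh:
        # Stream in chunks so a large file is never held fully in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


real_pointer = parse_lfs_pointer(REAL_POINTER)

# Demo on a small temporary file, since the real file is not available here.
demo_bytes = b"not the real embeddings"
demo_pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    f"oid sha256:{hashlib.sha256(demo_bytes).hexdigest()}\n"
    f"size {len(demo_bytes)}\n"
)
with tempfile.TemporaryDirectory() as tmp:
    demo_file = Path(tmp) / "demo.bin"
    demo_file.write_bytes(demo_bytes)
    ok = matches_pointer(demo_file, demo_pointer)
    demo_file.write_bytes(b"tampered")
    tampered_ok = matches_pointer(demo_file, demo_pointer)

print(real_pointer["size"], ok, tampered_ok)
```

For the real file, `matches_pointer("gene_embeddings.pkl", real_pointer)` would be the corresponding call once the download has finished.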