Commit a73a35d (verified) by krkawzq · 1 parent: abece87

Upload scelmo-gene-ncbi - scELMo gene embeddings from GenePT (NCBI-based)

Files changed (3)
  1. README.md +49 -0
  2. config.json +10 -0
  3. gene_embeddings.pkl +3 -0
README.md ADDED
@@ -0,0 +1,49 @@
+ # scELMo Gene Embeddings
+
+ This directory contains gene embeddings generated using scELMo methodology.
+
+ ## Source
+
+ These embeddings are converted from the official scELMo repository:
+ **https://github.com/HelloWorldLTY/scELMo**
+
+ ## Model Information
+
+ - **Model**: genepT-ncbi
+ - **Embedding Dimension**: 1536
+ - **Type**: gene
+ - **Aggregation Mode**: wa
+ - **API Model**: text-embedding-ada-002
+
+ ## Files
+
+ - `gene_embeddings.pkl`: Gene embeddings dictionary in PerturbLab format
+   - Format: `{'embeddings': {gene_name: embedding_array}, 'gene_list': [gene_names]}`
+ - `config.json`: Model configuration
+
+ ## Usage
+
+ ```python
+ from perturblab.model.scelmo import scELMoModel
+
+ # Load model
+ model = scELMoModel.from_pretrained('scelmo-gene-ncbi')
+
+ # Use embeddings
+ embeddings = model.predict_embeddings(adata, aggregation_mode='wa')
+ ```
+
+ ## Citation
+
+ If you use these embeddings, please cite the original scELMo paper:
+
+ ```bibtex
+ @article{liu2023scelmo,
+ title={scELMo: Embeddings from Language Models are Good Learners for Single-cell Data Analysis},
+ author={Liu, Tianyu and Chen, Tianqi and Zheng, Wangjie and Luo, Xiao and Zhao, Hongyu},
+ journal={Cell Patterns (in press)},
+ pages={2023--12},
+ year={2025},
+ publisher={Cell Press}
+ }
+ ```
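The README describes `gene_embeddings.pkl` as a dictionary with an `embeddings` mapping and a `gene_list`. A minimal sketch of reading that layout with the standard library might look like the following. The helper `load_gene_embeddings` is hypothetical (it is not part of PerturbLab or scELMo), and a small stand-in file replaces the real 419 MB artifact so the snippet runs on its own.

```python
import pickle
import tempfile
from pathlib import Path


def load_gene_embeddings(path, expected_dim=None):
    """Load a pickle in the documented layout and run basic sanity checks.

    Hypothetical helper for illustration; not a PerturbLab or scELMo API.
    """
    # Unpickling can execute arbitrary code, so only load files you trust.
    with open(path, "rb") as fh:
        payload = pickle.load(fh)

    embeddings = payload["embeddings"]  # {gene_name: embedding_array}
    gene_list = payload["gene_list"]    # [gene_names]

    # Every listed gene should have a vector.
    missing = [g for g in gene_list if g not in embeddings]
    if missing:
        raise ValueError(f"{len(missing)} genes in gene_list have no embedding")

    # Optionally check the vector length (1536 according to the README).
    if expected_dim is not None:
        for gene, vec in embeddings.items():
            if len(vec) != expected_dim:
                raise ValueError(
                    f"{gene}: expected {expected_dim} dims, got {len(vec)}"
                )

    return embeddings, gene_list


# Stand-in data in the documented layout (3-dimensional toy vectors).
demo = {
    "embeddings": {"TP53": [0.1, 0.2, 0.3], "BRCA1": [0.4, 0.5, 0.6]},
    "gene_list": ["TP53", "BRCA1"],
}

with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "gene_embeddings.pkl"
    demo_path.write_bytes(pickle.dumps(demo))
    embeddings, gene_list = load_gene_embeddings(demo_path, expected_dim=3)

print(gene_list)
```

For the real file, `expected_dim=1536` would match the dimension stated in the README, assuming each embedding array supports `len()`.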
config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "model_series": "scelmo",
+   "model_name": "genepT-ncbi",
+   "model_type": "embedding_extractor",
+   "embedding_dim": 1536,
+   "aggregation_mode": "wa",
+   "api_model": "text-embedding-ada-002",
+   "source_type": "gene",
+   "description": "Gene embeddings from GenePT (NCBI-based)"
+ }
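As a rough illustration of how a consumer could read this configuration, the sketch below parses the JSON and checks that a few fields have the expected types. The `REQUIRED` table and `load_config` are assumptions made for this example; the diff does not show how PerturbLab itself validates the file.

```python
import json

# Contents of config.json as added in this commit.
CONFIG_TEXT = """
{
  "model_series": "scelmo",
  "model_name": "genepT-ncbi",
  "model_type": "embedding_extractor",
  "embedding_dim": 1536,
  "aggregation_mode": "wa",
  "api_model": "text-embedding-ada-002",
  "source_type": "gene",
  "description": "Gene embeddings from GenePT (NCBI-based)"
}
"""

# Hypothetical schema: field name -> expected Python type.
REQUIRED = {
    "model_series": str,
    "model_name": str,
    "embedding_dim": int,
    "aggregation_mode": str,
    "api_model": str,
    "source_type": str,
}


def load_config(text):
    """Parse the JSON text and fail early on missing or mistyped fields."""
    cfg = json.loads(text)
    for key, expected_type in REQUIRED.items():
        if not isinstance(cfg.get(key), expected_type):
            raise ValueError(f"config field {key!r} is missing or not {expected_type.__name__}")
    return cfg


cfg = load_config(CONFIG_TEXT)
print(cfg["model_name"], cfg["embedding_dim"])
```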
gene_embeddings.pkl ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5fea7d87bd3862c41c3e2d6622d3143c9f01bf6a3dab2343817e5682fd60096e
+ size 419038659
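The three lines above are a Git LFS pointer, not the pickle itself: they record the hash algorithm, the digest and the byte size of the real file. One way to confirm that a downloaded `gene_embeddings.pkl` matches the pointer is sketched below. The function names are hypothetical, and the check is demonstrated on a small stand-in file rather than the actual artifact.

```python
import hashlib
import os
import tempfile

# Pointer text as it appears in this commit.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:5fea7d87bd3862c41c3e2d6622d3143c9f01bf6a3dab2343817e5682fd60096e
size 419038659
"""


def parse_lfs_pointer(text):
    """Split 'key value' lines of a pointer file into algorithm, digest and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"algo": algo, "digest": digest, "size": int(fields["size"])}


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the pointer's size and digest."""
    if os.path.getsize(path) != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with open(path, "rb") as fh:
        # Hash in chunks so a large file is never fully held in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER_TEXT)

# Stand-in file with a pointer computed for it, to show the check end to end.
payload = b"stand-in bytes"
demo_pointer = {
    "algo": "sha256",
    "digest": hashlib.sha256(payload).hexdigest(),
    "size": len(payload),
}
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.bin")
    with open(demo_path, "wb") as fh:
        fh.write(payload)
    demo_ok = matches_pointer(demo_path, demo_pointer)

print(pointer["size"], demo_ok)
```

Calling `matches_pointer("gene_embeddings.pkl", pointer)` on the downloaded file would perform the same comparison against the values recorded in this commit.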