Troyejcan committed
Commit ab1c93e · verified · 1 Parent(s): 4675380

Upload 17 files

.cache/huggingface/.gitignore ADDED
@@ -0,0 +1 @@
+ *
.cache/huggingface/download/.gitattributes.metadata ADDED
@@ -0,0 +1,3 @@
+ 6c0d1775347791af7970f5360c610687b0a19f3d
+ a6344aac8c09253b3b630fb776ae94478aa0275b
+ 1754299690.505366
.cache/huggingface/download/README.md.metadata ADDED
@@ -0,0 +1,3 @@
+ 6c0d1775347791af7970f5360c610687b0a19f3d
+ a901d1e62b082c5c6ca8bae9ce4b4b464133e205
+ 1754299690.436856
.cache/huggingface/download/embedding_associations_age_cell_type_drugs_pathways_openai_large.parquet.metadata ADDED
@@ -0,0 +1,3 @@
+ 6c0d1775347791af7970f5360c610687b0a19f3d
+ fa4a4d748d2e9e8f056a3f24fe3d73ebf6ae871e0de53860204d3d84cb000ed2
+ 1754299788.4877183
.cache/huggingface/download/embedding_associations_age_drugs_pathways_openai_large.parquet.metadata ADDED
@@ -0,0 +1,3 @@
+ 6c0d1775347791af7970f5360c610687b0a19f3d
+ 520d91352d5c46dfc1e5639bfb3a819f84abf13f8283cafe6435f9fb3c8847a4
+ 1754299773.7203612
.cache/huggingface/download/embedding_associations_cell_type_openai_large.parquet.metadata ADDED
@@ -0,0 +1,3 @@
+ 6c0d1775347791af7970f5360c610687b0a19f3d
+ 68a6e2a8ff7f7d1968801752db92fc8c5545d395cba4644b48b2e52a8b0b635a
+ 1754299790.0907319
.cache/huggingface/download/embedding_associations_cell_type_tissue_drug_pathway_openai_large.parquet.metadata ADDED
@@ -0,0 +1,3 @@
+ 6c0d1775347791af7970f5360c610687b0a19f3d
+ 0cdb35246efdad13e8a27e2ea22849fd180b6313be1e32a7261a77a34f584539
+ 1754299792.1593375
.cache/huggingface/download/embedding_original_ada_text.parquet.metadata ADDED
@@ -0,0 +1,3 @@
+ 6c0d1775347791af7970f5360c610687b0a19f3d
+ a1ebb884cd6a4701047099cecfe528c014ef5c576bca0a0aac6897295f33ee60
+ 1754299758.271861
.cache/huggingface/download/embedding_original_large_3.parquet.metadata ADDED
@@ -0,0 +1,3 @@
+ 6c0d1775347791af7970f5360c610687b0a19f3d
+ 5b92715a6ed89d1f103492565fec6fc790514568b61656fcb920a9b767634fbb
+ 1754299822.628316
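Each `.metadata` file above holds three lines. A minimal sketch for reading one follows, assuming the lines are, in order, the source commit hash, the file's ETag, and a Unix download timestamp; that interpretation and the function name `parse_download_metadata` are assumptions for illustration, not something this commit states.

```python
from datetime import datetime, timezone


def parse_download_metadata(text: str) -> dict:
    """Split a three-line .metadata file into named fields.

    Assumed layout (not confirmed by the commit itself):
      line 1: commit hash, line 2: ETag, line 3: Unix timestamp in seconds.
    """
    commit_hash, etag, timestamp = text.strip().splitlines()
    return {
        "commit_hash": commit_hash,
        "etag": etag,
        # Interpret the float as seconds since the Unix epoch, in UTC.
        "downloaded_at": datetime.fromtimestamp(float(timestamp), tz=timezone.utc),
    }


# Contents of embedding_original_large_3.parquet.metadata from the diff above.
sample = """6c0d1775347791af7970f5360c610687b0a19f3d
5b92715a6ed89d1f103492565fec6fc790514568b61656fcb920a9b767634fbb
1754299822.628316
"""
meta = parse_download_metadata(sample)
print(meta["downloaded_at"].isoformat())
```

Under that reading, every file in this commit shares the same first line, which suggests they were all fetched from one upstream revision.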
README.md ADDED
@@ -0,0 +1,30 @@
+ ---
+ datasets:
+ - honicky/genept-composable-embeddings-source-data
+ ---
+
+ # GenePT Composable Embeddings
+
+ This model is a set of embeddings for a list of about 33K functional genes, created by using OpenAI embedding models (and possibly others in the future) to embed text about the genes. Details about the process and evaluations can be found in the paper:
+
+ Chen YT, Zou J. (2023+) GenePT: A Simple But Effective Foundation Model for Genes and Cells Built From ChatGPT. bioRxiv preprint: https://www.biorxiv.org/content/10.1101/2023.10.16.562533v2.
+
+ and on GitHub: https://github.com/yiqunchen/GenePT
+
+ In this repository, we (not the original authors) are collecting modifications of the original embeddings with the intent of creating a set of composable embeddings for genes. These embeddings will encode specific information about each gene with respect to a set of factors, such as aging, drug interactions, and pathways. The repository also contains the original embeddings.
+
+ ## Dataset
+
+ The base dataset was collected from NCBI and UniProt, and contains a set of gene descriptions. We have used `gpt-4o-mini` (and potentially other models in the future) to generate descriptions of the genes and of the other factors mentioned above. We have collected the source datasets in the `honicky/genept-composable-embeddings-source-data` Dataset repository.
+
+ ## Model
+
+ The model is used by multiplying each gene's embedding vector by that gene's expression level and summing the results (i.e., a matrix multiplication of the expression matrix with the embedding matrix). See the original paper for more details.
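As a rough illustration of the weighted-sum step in the Model section, the sketch below computes cell embeddings from toy data. The function name `cell_embeddings`, the array shapes, and the toy values are assumptions for illustration; they are not taken from this repository or from the GenePT code, and any normalization the paper applies is omitted here.

```python
import numpy as np


def cell_embeddings(expression: np.ndarray, gene_embeddings: np.ndarray) -> np.ndarray:
    """Weight each gene embedding by its expression level and sum over genes.

    expression:      (n_cells, n_genes) expression levels
    gene_embeddings: (n_genes, dim) one embedding vector per gene
    returns:         (n_cells, dim) one embedding vector per cell
    """
    if expression.shape[1] != gene_embeddings.shape[0]:
        raise ValueError("expression columns must match gene embedding rows")
    # The weighted sum over genes is exactly a matrix product.
    return expression @ gene_embeddings


# Toy example: 3 genes with 2-dimensional embeddings, 2 cells.
gene_embeddings = np.array([[1.0, 0.0],
                            [0.0, 1.0],
                            [1.0, 1.0]])
expression = np.array([[2.0, 0.0, 1.0],
                       [0.0, 3.0, 0.0]])
cells = cell_embeddings(expression, gene_embeddings)
print(cells)
```

In practice the rows of the embedding matrix would need to be aligned to the gene order of the expression matrix before multiplying.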
+
+ ## Code
+
+ The https://github.com/honicky/GenePT-tools repository contains the latest code for building and using the models, as well as some example notebooks.
+
+ ## License
+
+ The original models and data in this repository are licensed under the MIT license. The original GenePT weights are governed by the license of the original GenePT repository.
embedding_associations_age_cell_type_drugs_pathways_openai_large.parquet ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fa4a4d748d2e9e8f056a3f24fe3d73ebf6ae871e0de53860204d3d84cb000ed2
+ size 1037227120
embedding_associations_age_drugs_pathways_openai_large.parquet ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:520d91352d5c46dfc1e5639bfb3a819f84abf13f8283cafe6435f9fb3c8847a4
+ size 829238967
embedding_associations_cell_type_openai_large.parquet ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:68a6e2a8ff7f7d1968801752db92fc8c5545d395cba4644b48b2e52a8b0b635a
+ size 1037229495
embedding_associations_cell_type_tissue_drug_pathway_openai_large.parquet ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0cdb35246efdad13e8a27e2ea22849fd180b6313be1e32a7261a77a34f584539
+ size 1037217767
embedding_original_ada_text.parquet ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a1ebb884cd6a4701047099cecfe528c014ef5c576bca0a0aac6897295f33ee60
+ size 581700397
embedding_original_large_3.parquet ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5b92715a6ed89d1f103492565fec6fc790514568b61656fcb920a9b767634fbb
+ size 1387965064
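The `.parquet` entries in this commit are Git LFS pointer files rather than the data itself: each records a spec version, a SHA-256 object ID, and the size in bytes of the real file. A minimal sketch for reading one such pointer follows; the helper name `parse_lfs_pointer` and the returned field names are hypothetical.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the key/value lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        # Each line is "<key> <value>", separated by the first space.
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


# Pointer for embedding_original_large_3.parquet, copied from the diff above.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:5b92715a6ed89d1f103492565fec6fc790514568b61656fcb920a9b767634fbb
size 1387965064
"""
info = parse_lfs_pointer(pointer)
print(f"{info['size_bytes'] / 1e9:.2f} GB")  # prints "1.39 GB"
```

The digest can be compared against a SHA-256 hash of the downloaded file to confirm that the transfer completed intact.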