@@ -7,7 +7,42 @@ tags:
 - pytorch_model_hub_mixin
 ---
-This model has been pushed to the Hub using the [PytorchModelHubMixin](https://huggingface.co/docs/huggingface_hub/package_reference/mixins#huggingface_hub.PyTorchModelHubMixin) integration:
-- Code: [More Information Needed]
-- Paper: [More Information Needed]
-- Docs: [More Information Needed]

+# GSFM
+Trained on millions of gene sets automatically extracted from literature and raw RNA-seq data, GSFM learns to recover held-out genes from gene sets. The resulting model exhibits state-of-the-art performance on gene function prediction.
+## Website
+<https://gsfm.maayanlab.cloud/>
+## Usage
+```bash
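+# Install the gsfm Python library from its source on Hugging Face.
+# GIT_LFS_SKIP_SMUDGE=1 skips downloading Git LFS files (such as model weights) during the install.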
+GIT_LFS_SKIP_SMUDGE=1 pip install git+https://huggingface.co/maayanlab/gsfm
+```
+```python
+import torch
+from gsfm import Vocab, GSFM
+# load gsfm vocabulary and model weights
+vocab = Vocab.from_pretrained('maayanlab/gsfm')
+gsfm = GSFM.from_pretrained('maayanlab/gsfm')
+# convert gene symbols into token ids
+token_ids = torch.tensor(vocab(['ACE1', 'ACE2']))[None, :]
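+# use the model to predict genes missing from the set
+logits = torch.squeeze(gsfm(token_ids))
+# sort ascending by score and keep the last 10, so the highest-scoring gene comes last
+top_10 = sorted(zip(logits.tolist(), vocab.vocab))[-10:]
+top_10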
+# get the embeddings of the input genes
+gene_embeddings = gsfm.embedding(token_ids)
+gene_embeddings
+# get the gene set encoding from the model's middle layer
+gene_set_encoding = gsfm.encode(token_ids)
+gene_set_encoding
+```
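The ranking step in the usage snippet can also be written without sorting the whole vocabulary. The following is a minimal sketch using only the standard library; `top_k_genes` is a hypothetical helper (not part of the gsfm library), and the symbols and scores are toy stand-ins rather than real model output. It assumes, as the `zip` in the snippet implies, that the scores and the vocabulary entries are in the same order.

```python
import heapq


def top_k_genes(scores, symbols, k=10):
    """Return the k (symbol, score) pairs with the highest scores, best first.

    Hypothetical helper for illustration; not part of the gsfm library.
    """
    if len(scores) != len(symbols):
        raise ValueError("scores and symbols must have the same length")
    # nlargest avoids sorting the full vocabulary when k is small
    return heapq.nlargest(k, zip(symbols, scores), key=lambda pair: pair[1])


# Toy stand-ins for illustration only: these are not real GSFM outputs.
toy_symbols = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
toy_scores = [0.1, 2.5, -1.0, 0.7]

toy_top_2 = top_k_genes(toy_scores, toy_symbols, k=2)
print(toy_top_2)
```

With the snippet above, one would presumably call it as `top_k_genes(logits.tolist(), vocab.vocab)`, again assuming `vocab.vocab` lists gene symbols in the same order as the logits.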