---
tags:
- bioinformatics
- gene
- gene set
- model_hub_mixin
- pytorch_model_hub_mixin
---

# GSFM

Trained on millions of gene sets automatically extracted from literature and raw RNA-seq data, GSFM learns to recover held-out genes from gene sets. The resulting model exhibits state-of-the-art performance on gene function prediction.

**Deprecation Notice**: This repo was replaced with -- you can now access different versions of the model, stored on Hugging Face; directions are in that repository.

## Website

## Usage

```bash
# install the gsfm python library from its source on Hugging Face
# (GIT_LFS_SKIP_SMUDGE=1 skips downloading LFS files during the install)
GIT_LFS_SKIP_SMUDGE=1 pip install git+https://huggingface.co/maayanlab/gsfm
```

```python
import torch
from gsfm import Vocab, GSFM

# load gsfm vocabulary and model weights
vocab = Vocab.from_pretrained('maayanlab/gsfm')
gsfm = GSFM.from_pretrained('maayanlab/gsfm')
gsfm.eval()

# convert gene symbols into token ids, adding a batch dimension
token_ids = torch.tensor(vocab(['ACE1', 'ACE2']))[None, :]

# use the model to predict missing genes from the set
with torch.no_grad():
    logits = torch.squeeze(gsfm(token_ids))

# ten highest-scoring vocabulary entries, in ascending order of score
top_10 = sorted(zip(logits.tolist(), vocab.vocab))[-10:]
top_10

# get the gene set encoding from the model's middle layer
gene_set_encoding = gsfm.encode(token_ids)
gene_set_encoding
```
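
When ranking predictions, it can help to drop the genes that were already in the input set, so that only candidate "missing" genes are reported. The following is a minimal sketch of that post-processing step in plain Python. `top_k_genes` is a hypothetical helper and not part of the `gsfm` library, the scores and symbols are toy values, and it assumes scores and symbols are aligned one-to-one, as the `zip(logits, vocab.vocab)` call in the usage example suggests.

```python
from typing import List, Sequence, Tuple


def top_k_genes(
    scores: Sequence[float],
    symbols: Sequence[str],
    input_genes: Sequence[str],
    k: int = 10,
) -> List[Tuple[str, float]]:
    """Return the k highest-scoring genes that are not in the input set.

    Hypothetical helper for illustration only; it is not part of gsfm.
    """
    if len(scores) != len(symbols):
        raise ValueError("scores and symbols must have the same length")

    exclude = set(input_genes)

    # keep only candidates that were not part of the query gene set
    candidates = [
        (symbol, score)
        for score, symbol in zip(scores, symbols)
        if symbol not in exclude
    ]

    # highest score first
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates[:k]


# toy values standing in for model logits and the vocabulary
toy_scores = [0.1, 2.5, -1.0, 3.2, 0.7]
toy_symbols = ["ACE1", "ACE2", "REN", "AGT", "TMPRSS2"]

ranked = top_k_genes(toy_scores, toy_symbols, ["ACE1", "ACE2"], k=2)
print(ranked)
```

With real model output, the call would look something like `top_k_genes(logits.tolist(), vocab.vocab, ['ACE1', 'ACE2'])`, assuming `vocab.vocab` is an ordered sequence of symbols. If the vocabulary contains special tokens, those may need to be excluded as well.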