	---
	tags:
	- bioinformatics
	- gene
	- gene set
	- model_hub_mixin
	- pytorch_model_hub_mixin
	---

	# GSFM

	GSFM is trained on millions of gene sets automatically extracted from the literature and from raw RNA-seq data. It learns to recover held-out genes from gene sets, and the resulting model exhibits state-of-the-art performance on gene function prediction.

	Deprecation Notice: This repo has been replaced by <https://github.com/MaayanLab/gsfm>. Different versions of the model are stored on Hugging Face; see that repository for directions on accessing them.

	## Website

	<https://gsfm.maayanlab.cloud/>

	## Usage

	```bash
	# install the gsfm Python library from its source on Hugging Face
	# (GIT_LFS_SKIP_SMUDGE=1 skips downloading Git LFS files during the clone)
	GIT_LFS_SKIP_SMUDGE=1 pip install git+https://huggingface.co/maayanlab/gsfm
	```

	```python
	import torch
	from gsfm import Vocab, GSFM

	# load gsfm vocabulary and model weights
	vocab = Vocab.from_pretrained('maayanlab/gsfm')
	gsfm = GSFM.from_pretrained('maayanlab/gsfm')
	gsfm.eval()

	# convert gene symbols into token ids
	token_ids = torch.tensor(vocab(['ACE1', 'ACE2']))[None, :]

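	# use the model to predict missing genes from the set (no gradients needed for inference)
	with torch.no_grad():
	    logits = torch.squeeze(gsfm(token_ids))
	# ten highest-scoring genes as (logit, symbol) pairs, in ascending order of logit
	top_10 = sorted(zip(logits.tolist(), vocab.vocab))[-10:]
	top_10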

	# get the gene set encoding from the model's middle layer
	gene_set_encoding = gsfm.encode(token_ids)
	gene_set_encoding
	```
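
The ranking step in the usage snippet can be pulled out into a small helper. The following is a minimal sketch, not part of the `gsfm` library: `top_k_genes` is a hypothetical name, and the toy scores and symbols stand in for `logits.tolist()` and `vocab.vocab`, on the assumption that the two are aligned index by index, as the snippet implies.

```python
import heapq


def top_k_genes(scores, symbols, k=10):
    """Return the k highest-scoring (symbol, score) pairs, best first.

    Hypothetical helper for illustration; it is not provided by gsfm.
    """
    if len(scores) != len(symbols):
        raise ValueError("scores and symbols must have the same length")
    # nlargest avoids sorting the full vocabulary when k is small
    return heapq.nlargest(k, zip(symbols, scores), key=lambda pair: pair[1])


# Toy data standing in for `logits.tolist()` and `vocab.vocab`
toy_scores = [0.1, 2.5, -1.0, 0.7]
toy_symbols = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
print(top_k_genes(toy_scores, toy_symbols, k=2))
```

With the real model, the call would presumably be `top_k_genes(logits.tolist(), vocab.vocab)`. Unlike the `sorted(...)[-10:]` form, this returns the best-scoring gene first.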