Did something happen to the token_dictionary_gc95M.pkl?
I've been playing with this a lot recently, and due to our infrastructure I download and install Geneformer fresh each time I load it. For the past few days I had no problems loading the token_dictionary_gc95M file.
Now, when I try to extract embeddings, I get:
Traceback (most recent call last):
  File "", line 1, in
  File "/tmp/Geneformer/geneformer/emb_extractor.py", line 521, in __init__
    self.gene_token_dict = pickle.load(f)
_pickle.UnpicklingError: invalid load key, 'v'.
And if I try to load the .pkl directly, I get:
import pickle

token_file = "../Geneformer/geneformer/token_dictionary_gc95M.pkl"
with open(token_file, "rb") as file:
    token_dict = pickle.load(file)
Traceback (most recent call last):
  File "", line 2, in
_pickle.UnpicklingError: invalid load key, 'v'.
Am I the only one? I'm using Python 3.10.16.
Thanks for your question. This can happen when you aren't using Git LFS.
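For anyone wondering where the 'v' comes from: when a repository is cloned without Git LFS, each LFS-tracked file is left as a small text pointer whose first line begins with "version https://git-lfs.github.com/spec/v1", so pickle trips over the leading "v". Below is a minimal sketch of a loader that checks for this before unpickling. The name `load_pickle_checked` is hypothetical and not part of Geneformer; it is only meant to illustrate the check.

```python
import pickle

# First bytes of a Git LFS pointer file (pointer spec v1).
LFS_HEADER = b"version https://git-lfs.github.com/spec/v1"


def load_pickle_checked(path):
    """Hypothetical helper: unpickle `path`, but fail with a clearer
    message if the file is a Git LFS pointer instead of real data."""
    with open(path, "rb") as f:
        head = f.read(len(LFS_HEADER))
        if head == LFS_HEADER:
            raise RuntimeError(
                f"{path} is a Git LFS pointer, not the real file; "
                "install Git LFS and fetch it with `git lfs pull`."
            )
        f.seek(0)  # rewind so pickle sees the whole file
        return pickle.load(f)
```

Calling `load_pickle_checked("../Geneformer/geneformer/token_dictionary_gc95M.pkl")` on a pointer file should then raise the RuntimeError above rather than the UnpicklingError.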
Yep, totally right: our Git LFS install had a problem on respin. Once we fixed that, everything's peachy again! tyty!
After installing Git LFS, simply copy or re-download the "*.pkl" files into the geneformer folder. I faced the same issue: because of a Git LFS problem, the clone was pulling down corrupted versions of the .pkl files.
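To confirm that the re-downloaded files are the real ones, a quick scan of the folder can help. The sketch below is only an illustration: `find_lfs_pointers` is a hypothetical helper, not part of Geneformer, and it assumes that a broken file is a Git LFS pointer (a text file starting with the LFS spec line) rather than some other kind of corruption.

```python
from pathlib import Path

# First bytes of a Git LFS pointer file (pointer spec v1).
LFS_HEADER = b"version https://git-lfs.github.com/spec/v1"


def find_lfs_pointers(folder, pattern="*.pkl"):
    """Hypothetical helper: return files under `folder` matching `pattern`
    that are still Git LFS pointers rather than the real binary files."""
    pointers = []
    for path in sorted(Path(folder).rglob(pattern)):
        if not path.is_file():
            continue
        with open(path, "rb") as f:
            if f.read(len(LFS_HEADER)) == LFS_HEADER:
                pointers.append(path)
    return pointers
```

If `find_lfs_pointers("Geneformer/geneformer")` returns an empty list, none of the .pkl files in that folder are pointers. Any paths it does return still need to be fetched or copied over.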