"torch.cuda.is_available() is False", "Can't load the configuration of 'path/to/pretrained_model/" and... in hyperparam_optimiz_for_disease_classifier.py

#212
by kamisama0101 - opened

Thank you for your great work on Geneformer!
However, when I re-ran the disease classification, I ran into a series of problems:

  1. "File "/home/.../anaconda3/envs/geneformer3/lib/python3.9/site-packages/torch/serialization.py", line 166, in validate_cuda_device
    raise RuntimeError('Attempting to deserialize object on a CUDA '
    RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False."
    Another:
    "File "/home/.../anaconda3/envs/geneformer3/lib/python3.9/site-packages/ray/worker.py", line 1833, in get
    raise value
    ray.exceptions.RaySystemError: System error: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU."

It is really strange because I do have a CUDA device.

"
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.60.13 Driver Version: 525.60.13 CUDA Version: 12.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla V100-SXM2... On | 00000000:3B:00.0 Off | 0 |
| N/A 30C P0 55W / 300W | 903MiB / 32768MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 Tesla V100-SXM2... On | 00000000:86:00.0 Off | 0 |
| N/A 27C P0 51W / 300W | 25MiB / 32768MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 196221 C python 900MiB |
| 1 N/A N/A 196692 C ray::ImplicitFunc.train() 24MiB |
+-----------------------------------------------------------------------------+
"

  2. "During handling of the above exception, another exception occurred:
    ...
    File "/home/.../envs/geneformer3/lib/python3.9/site-packages/transformers/configuration_utils.py", line 693, in _get_config_dict
    raise EnvironmentError(
    OSError: Can't load the configuration of 'path/to/pretrained_model/'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'path/to/pretrained_model/' is the correct path to a directory containing a config.json file"
    It's also strange because I can load the pretrained model in the cell classification and gene classification tasks using the same path.
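For what it's worth, `from_pretrained` treats its argument either as a Hub repo id or as a local directory containing a `config.json`; the `'path/to/pretrained_model/'` in the traceback is a placeholder that was never replaced with a real directory, or the path is not visible from the worker process. A minimal sketch of loading a config from a local directory, using a throwaway directory and a hypothetical minimal `config.json` (not Geneformer's real configuration):

```python
import json
import os
import tempfile

from transformers import AutoConfig

# Create a stand-in model directory containing a minimal config.json.
model_dir = tempfile.mkdtemp()
with open(os.path.join(model_dir, "config.json"), "w") as f:
    json.dump({"model_type": "bert", "hidden_size": 256}, f)

# Passing an absolute local path avoids any ambiguity with Hub repo ids
# (which must look like 'repo_name' or 'namespace/repo_name').
config = AutoConfig.from_pretrained(os.path.abspath(model_dir))
print(config.hidden_size)  # 256
```

The third error (`HFValidationError: Repo id must be in the form ...`) is consistent with this: when the string is not an existing local directory, `transformers` falls back to interpreting it as a Hub repo id, and `'path/to/pretrained_model/'` fails that validation.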

  3. "File "/home/jdhan_pkuhpc/profiles/yeyaxuan/lustre2/anaconda3/envs/geneformer3/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py", line 158, in validate_repo_id
    raise HFValidationError(
    huggingface_hub.utils._validators.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': 'path/to/pretrained_model/'. Use repo_type argument if needed."

I wonder whether this is caused by a version conflict. I am currently using Ray version 1.13.0.
I would appreciate it if you could offer me some help!

Thank you for your interest in Geneformer! Ray Tune can be sensitive to various highly environment-specific factors, especially when jobs are distributed across multiple GPUs. If you can load the model normally but have trouble when loading it through Ray Tune, it is possible that the distributed job is not replicating the environment correctly for each parallel worker. This script was run with Ray Tune version 1.9.2 (conda-forge) at the time, so you could consider trialing that version. You could also follow up with Ray Tune to help troubleshoot, by checking their documentation and previously closed issues/discussions or by opening a new issue.

ctheodoris changed discussion status to closed

Which version of Python did you use?