Issues with usage of the repository
Hi, thank you for the publication, very interesting work here! I wanted to try a local installation of this repository and ran into a couple of issues. My setup: AWS SageMaker AI on Amazon Linux 2, with JupyterLab 4 for the notebook, on an EC2 instance of type ml.g5.4xlarge (1 GPU). I was able to clone the repository and install the packages in requirements.txt into a Python 3.10 virtual environment. Because this operating system's compiler is fairly old, I had to run mamba install -c conda-forge xgboost==3.1.2 instead of pip install xgboost. I will post the two errors I encountered below. Did I do something wrong in my usage of this repository, or is there something on your end that should be fixed?
First, after installing all the requirements, I tried to run python inference.py as requested and got the following error message:
(PeptiVerse) [ec2-user@ip-172-21-244-78 PeptiVerse]$ python inference.py
tokenizer_config.json: 100%|██████████| 95.0/95.0 [00:00<00:00, 996kB/s]
vocab.txt: 100%|██████████| 93.0/93.0 [00:00<00:00, 1.07MB/s]
special_tokens_map.json: 100%|██████████| 125/125 [00:00<00:00, 1.56MB/s]
config.json: 100%|██████████| 724/724 [00:00<00:00, 8.98MB/s]
model.safetensors: 100%|██████████| 2.61G/2.61G [00:02<00:00, 1.08GB/s]
Traceback (most recent call last):
  File "/home/ec2-user/SageMaker/fsx/Wu/PeptiVerse/inference.py", line 921, in <module>
    predictor = PeptiVersePredictor(
  File "/home/ec2-user/SageMaker/fsx/Wu/PeptiVerse/inference.py", line 661, in __init__
    self.smiles_embedder = SMILESEmbedder(self.device, clm_name=clm_name,
  File "/home/ec2-user/SageMaker/fsx/Wu/PeptiVerse/inference.py", line 428, in __init__
    self.tokenizer = SMILES_SPE_Tokenizer(vocab_path, splits_path)
  File "/home/ec2-user/SageMaker/fsx/Wu/PeptiVerse/tokenizer/my_tokenizers.py", line 74, in __init__
    raise ValueError("Can't find a vocabulary file at path '{}'.".format(vocab_file))
ValueError: Can't find a vocabulary file at path 'Classifier_Weight/tokenizer/new_vocab.txt'.
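For what it's worth, a quick way to see where the vocabulary file actually lives in a checkout is to search the repo for the filename named in the traceback. This is just a diagnostic sketch; the filename `new_vocab.txt` is taken from the error message above, and `find_files` is a hypothetical helper, not part of the repository:

```python
# Diagnostic sketch: search the repo tree for the vocab file named in the
# traceback, so you can pass the real path explicitly instead of relying
# on the default 'Classifier_Weight/tokenizer/new_vocab.txt'.
from pathlib import Path

def find_files(name: str, root: str = ".") -> list[str]:
    """Return every path under `root` whose filename matches `name`."""
    return sorted(str(p) for p in Path(root).rglob(name))

# e.g. find_files("new_vocab.txt") might show the file sits under
# "tokenizer/" rather than "Classifier_Weight/tokenizer/".
```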
I then tried to use the following script as specified:
from inference import PeptiVersePredictor
pred = PeptiVersePredictor(
manifest_path="best_models.txt", # best model list
classifier_weight_root=".", # repo root (where training_classifiers/ lives)
device="cuda", # or "cpu"
)
and got the following error message:
/home/ec2-user/anaconda3/envs/PeptiVerse/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
from .autonotebook import tqdm as notebook_tqdm
RoFormerForMaskedLM has generative capabilities, as prepare_inputs_for_generation is explicitly overwritten. However, it doesn't directly inherit from GenerationMixin. From v4.50 onwards, PreTrainedModel will NOT inherit from GenerationMixin, and this model will lose the ability to call generate and other related functions.
- If you're using trust_remote_code=True, you can get rid of this warning by loading the model with an auto class. See https://huggingface.co/docs/transformers/en/model_doc/auto#auto-classes
- If you are the owner of the model architecture code, please modify your model class such that it inherits from GenerationMixin (after PreTrainedModel, otherwise you'll get an exception).
- If you are not the owner of the model architecture class, please contact the model code owner to update it.
KeyError Traceback (most recent call last)
Cell In[1], line 3
1 from inference import PeptiVersePredictor
----> 3 pred = PeptiVersePredictor(
4 manifest_path="best_models.txt", # best model list
5 classifier_weight_root=".", # repo root (where training_classifiers/ lives)
6 device="cuda", # or "cpu"
7 )
File ~/SageMaker/fsx/Wu/PeptiVerse/inference.py:668, in PeptiVersePredictor.__init__(self, manifest_path, classifier_weight_root, esm_name, clm_name, smiles_vocab, smiles_splits, device)
665 self.models: Dict[Tuple[str, str], Any] = {}
666 self.meta: Dict[Tuple[str, str], Dict[str, Any]] = {}
--> 668 self._load_all_best_models()
File ~/SageMaker/fsx/Wu/PeptiVerse/inference.py:734, in PeptiVersePredictor._load_all_best_models(self)
731 continue
733 model_dir = self._resolve_dir(prop_key, m, mode)
--> 734 kind, obj, art = load_artifact(model_dir, self.device)
736 if kind in {"xgb", "joblib"}:
737 self.models[(prop_key, mode)] = obj
File ~/SageMaker/fsx/Wu/PeptiVerse/inference.py:146, in load_artifact(model_dir, device)
143 return "xgb", booster, art
145 if art.suffix == ".joblib":
--> 146 obj = joblib.load(art)
147 return "joblib", obj, art
149 if art.suffix == ".pt":
File ~/anaconda3/envs/PeptiVerse/lib/python3.10/site-packages/joblib/numpy_pickle.py:749, in load(filename, mmap_mode, ensure_native_byte_order)
744 return load_compatibility(fobj)
746 # A memory-mapped array has to be mapped with the endianness
747 # it has been written with. Other arrays are coerced to the
748 # native endianness of the host system.
--> 749 obj = _unpickle(
750 fobj,
751 ensure_native_byte_order=ensure_native_byte_order,
752 filename=filename,
753 mmap_mode=validated_mmap_mode,
754 )
756 return obj
File ~/anaconda3/envs/PeptiVerse/lib/python3.10/site-packages/joblib/numpy_pickle.py:626, in _unpickle(fobj, ensure_native_byte_order, filename, mmap_mode)
624 obj = None
625 try:
--> 626 obj = unpickler.load()
627 if unpickler.compat_mode:
628 warnings.warn(
629 "The file '%s' has been generated with a "
630 "joblib version less than 0.10. "
(...)
633 stacklevel=3,
634 )
File ~/anaconda3/envs/PeptiVerse/lib/python3.10/pickle.py:1213, in _Unpickler.load(self)
1211 raise EOFError
1212 assert isinstance(key, bytes_types)
-> 1213 dispatch[key[0]]()
1214 except _Stop as stopinst:
1215 return stopinst.value
KeyError: 118
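One possible cause worth ruling out (this is an assumption on my part, not a confirmed diagnosis): KeyError: 118 during pickle loading means the unpickler hit the ASCII byte 'v' (ord('v') == 118) where it expected an opcode. A .joblib file that starts with 'v' is often a Git LFS pointer file ("version https://git-lfs...") left behind when the repo was cloned without git lfs pull. The hypothetical helper below just inspects the first bytes of the artifact:

```python
# Hedged diagnostic: check whether a .joblib artifact is actually a Git LFS
# pointer file (plain text starting with "version https://git-lfs")
# rather than a real pickle. KeyError: 118 == ord('v') is consistent with
# pickle trying to interpret that text as opcodes.
from pathlib import Path

def looks_like_lfs_pointer(path: str) -> bool:
    """True if the file begins like a Git LFS pointer instead of a pickle."""
    head = Path(path).read_bytes()[:64]
    return head.startswith(b"version https://git-lfs")
```

If this returns True for the failing artifact, running `git lfs pull` in the repo should fetch the real weights.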
Hi Jason! Thanks for letting us know about this! You found some bugs that we'll fix now on our end. Expect an update soon. Thanks again!
Hello Jason,
The path issue should be fixed now: the default root is "./" (the repo root) instead of the old folder name "./Classifier_Weight".
For the second bug though, I didn't encounter issues executing
from inference import PeptiVersePredictor
pred = PeptiVersePredictor(
manifest_path="best_models.txt", # best model list
classifier_weight_root=".", # repo root (where training_classifiers/ lives)
device="cuda", # or "cpu"
)
in a Jupyter notebook. There are some warnings during initialization, but no errors. Could you try again after updating your joblib package? My version is '1.5.1'.
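Since version mismatches between a conda env and the Jupyter kernel are a common culprit here, it may help to check the version from inside the kernel itself. A small sketch (`pkg_version` is a made-up helper name, using only the standard library):

```python
# Run inside the Jupyter kernel: report which joblib version that kernel
# actually imports. Pinning a version in one conda env doesn't help if the
# notebook kernel is wired to a different env.
from importlib import metadata

def pkg_version(name: str):
    """Installed version of `name` in this environment, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None

print(pkg_version("joblib"))  # compare against the suggested '1.5.1'
```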
Best,
Yinuo
Thank you for the quick response!
Here is my current installation script. I installed xgboost manually at a pinned version, after removing it from the pip requirements, because of the aforementioned old-compiler issue on my system:
mamba create -n PeptiVerse python=3.10 -y
source ~/.bashrc
conda activate PeptiVerse
pip install ipykernel
python -m ipykernel install --user --name=PeptiVerse --display-name "PeptiVerse"
# Install dependencies
pip install -r requirements_noxgboost.txt
mamba install -y -c conda-forge xgboost==3.1.2
pip install joblib==1.5.1
# Run inference
python inference.py
After running the above, you'll see the error I get because pytorch-lightning is not installed. I can install pytorch-lightning myself if you'd like, but it doesn't currently appear in the requirements.txt file.
(PeptiVerse) [ec2-user@ip-172-21-246-168 PeptiVerse]$ pip install joblib==1.5.1
Collecting joblib==1.5.1
Downloading joblib-1.5.1-py3-none-any.whl.metadata (5.6 kB)
Downloading joblib-1.5.1-py3-none-any.whl (307 kB)
Installing collected packages: joblib
Attempting uninstall: joblib
Found existing installation: joblib 1.5.3
Uninstalling joblib-1.5.3:
Successfully uninstalled joblib-1.5.3
Successfully installed joblib-1.5.1
(PeptiVerse) [ec2-user@ip-172-21-246-168 PeptiVerse]$ python inference.py
Traceback (most recent call last):
File "/home/ec2-user/SageMaker/fsx/Wu/PeptiVerse/inference.py", line 16, in <module>
from lightning.pytorch import seed_everything
ModuleNotFoundError: No module named 'lightning'
Below is what happened after I ran the following:
from inference import PeptiVersePredictor
pred = PeptiVersePredictor(
manifest_path="best_models.txt", # best model list
classifier_weight_root=".", # repo root (where training_classifiers/ lives)
device="cuda", # or "cpu"
)
/home/ec2-user/anaconda3/envs/PeptiVerse/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
from .autonotebook import tqdm as notebook_tqdm
ModuleNotFoundError Traceback (most recent call last)
Cell In[1], line 1
----> 1 from inference import PeptiVersePredictor
3 pred = PeptiVersePredictor(
4 manifest_path="best_models.txt", # best model list
5 classifier_weight_root=".", # repo root (where training_classifiers/ lives)
6 device="cuda", # or "cpu"
7 )
File ~/SageMaker/fsx/Wu/PeptiVerse/inference.py:16
14 from transformers import EsmModel, EsmTokenizer, AutoModelForMaskedLM
15 from tokenizer.my_tokenizers import SMILES_SPE_Tokenizer
---> 16 from lightning.pytorch import seed_everything
17 seed_everything(1986)
19 # -----------------------------
20 # Manifest
21 # -----------------------------
ModuleNotFoundError: No module named 'lightning'
Hello Jason,
I believe the error occurs because, in the newest version, I added a line that sets the global seed with lightning. I will add lightning to the list of required packages.
Best,
Yinuo
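If it's useful, an alternative to hard-requiring lightning is to make the import optional, so `python inference.py` still runs on environments without it. This is only a sketch under the assumption that falling back to stdlib/numpy/torch seeding is acceptable for inference; `seed_all` is a hypothetical helper, not code from the repo:

```python
# Sketch: seed with lightning when available, otherwise fall back to
# seeding the RNGs directly. Keeps inference runnable without lightning.
import random

def seed_all(seed: int) -> None:
    try:
        from lightning.pytorch import seed_everything  # preferred, if installed
        seed_everything(seed)
        return
    except ModuleNotFoundError:
        pass
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ModuleNotFoundError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
    except ModuleNotFoundError:
        pass

seed_all(1986)  # same seed the script currently passes to seed_everything
```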