Errors installing DAM

#1
by rfbrito - opened

Hello!

I was following the directions on the DAM model page for installation. I am on a linux HPC. After cloning the repo, when I try to run

pip install -r requirements.txt

I get the following error:

"ERROR: Could not find a version that satisfies the requirement pytorch==2.6.0 (from versions: 0.1.2, 1.0.2)
ERROR: No matching distribution found for pytorch==2.6.0"

I replicated this error (both on the cluster and on my Mac) by trying to install just that version of torch in a fresh conda env, following the directions in the PyTorch documentation:

https://pytorch.org/get-started/previous-versions/ (and go to v2.6.0).

Note that PyTorch's package is installed as 'torch' rather than 'pytorch', but I get this error in both cases.

I will try a newer version of torch (as I can successfully install a current version), but I wanted to check in case there were versioning issues w.r.t. testing and in case others run into this too!

Cheers,

Rahul

Just an update! If I instead install the most recent versions of the packages in requirements.txt from PyPI, I get a Python error:

from pipeline import Pipeline
pipeline = Pipeline()
Loading weights: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 479/479 [00:00<00:00, 876.23it/s]
Loading weights: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 479/479 [00:00<00:00, 1652.40it/s]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
pipeline = Pipeline()
File "/orcd/home/002/rfbrito/modeling/dam/pipeline.py", line 25, in __init__
state_dict = torch.load(checkpoint, map_location=device)
File "/home/rfbrito/miniconda3/envs/dam/lib/python3.14/site-packages/torch/serialization.py", line 1572, in load
raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
_pickle.UnpicklingError: Weights only load failed. In PyTorch 2.6, we changed the default value of the weights_only argument in torch.load from False to True. Re-running torch.load with weights_only set to False will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
Please file an issue with the following so that we can make weights_only=True compatible with your use case: WeightsUnpickler error:

Unsupported operand 118

Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.

If I update torch.load() to add weights_only=False (not sure if this is even what you want), I get:

pipeline = Pipeline()
...
Loading weights: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 479/479 [00:00<00:00, 1272.81it/s]
Loading weights: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 479/479 [00:00<00:00, 1650.12it/s]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
pipeline = Pipeline()
File "/orcd/home/002/rfbrito/modeling/dam/pipeline.py", line 25, in __init__
state_dict = torch.load(checkpoint, map_location=device, weights_only=False)
File "/home/rfbrito/miniconda3/envs/dam/lib/python3.14/site-packages/torch/serialization.py", line 1573, in load
return _legacy_load(
opened_file, map_location, pickle_module, **pickle_load_args
)
File "/home/rfbrito/miniconda3/envs/dam/lib/python3.14/site-packages/torch/serialization.py", line 1822, in _legacy_load
magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, 'v'.
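As a side observation (a minimal standard-library sketch, not tied to DAM itself): pickle raises exactly this error whenever the stream begins with a byte that is not a valid pickle opcode, for example the letter 'v' at the start of ordinary text, so this failure mode means the file being loaded is not a pickle at all:

```python
import pickle

# "invalid load key, 'v'." means the first byte of the stream ('v' here)
# is not a recognized pickle opcode -- i.e. the file is not a pickle.
try:
    pickle.loads(b"version: not actually a pickle\n")
except pickle.UnpicklingError as e:
    print(e)  # invalid load key, 'v'.
```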

So I think I am stuck here. Thanks for your help when you get the chance!

Kintsugi org
β€’
edited 3 days ago

I can reproduce this when trying to install via pip. When I made the requirements file, I only tested it with mamba, where it still works for me via

mamba env create -n dam -f requirements.txt
mamba activate dam

Could you try installing via mamba and see if that works for you? I believe it should be possible with pip but requires pointing it at the right index. If this works I can update the docs.

I was able to get the install working and pipeline = Pipeline() to run properly after a minor tweak to the command, and the code to run after some changes to the script.

For the install, I ran this to add the channels on my cluster:

mamba env create -n dam -f requirements.txt -c conda-forge -c pytorch -c nvidia

However, I was still getting this error:

>>>pipeline = Pipeline()

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/orcd/home/002/rfbrito/modeling/dam/pipeline.py", line 25, in __init__
    state_dict = torch.load(checkpoint, map_location=device)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/orcd/home/002/rfbrito/.conda/envs/dam/lib/python3.12/site-packages/torch/serialization.py", line 1494, in load
    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
_pickle.UnpicklingError: Weights only load failed. In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
Please file an issue with the following so that we can make `weights_only=True` compatible with your use case: WeightsUnpickler error: Unsupported operand 118

Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.  

So I made the fix below, per the error (with unchanged lines before and after):

        self.device = device
        self.model = Classifier(**config)
        self.preprocessor = Preprocessor(**self.model.preprocessor_config)
        state_dict = torch.load(checkpoint, map_location=device, weights_only=False)  # change here to add this flag
        self.model.load_state_dict(state_dict)
        self.model.to(self.device)
        self.model.eval()

And I got the same error as before. After debugging with Claude, I was able to figure out that dam3.1.ckpt was just a pointer file (only 143 bytes), not the actual file. I did the below:

# Remove the pointer file
rm dam3.1.ckpt

# Download the real file directly
wget https://huggingface.co/KintsugiHealth/dam/resolve/main/dam3.1.ckpt

And then pipeline = Pipeline() worked!
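For anyone else hitting this, here is a quick stdlib check (a sketch; the header string is the documented Git LFS pointer format, and `is_lfs_pointer` is a name I made up) to tell a pointer file apart from real weights before handing it to torch.load:

```python
from pathlib import Path

def is_lfs_pointer(path: str) -> bool:
    """Heuristic: Git LFS pointer files are tiny text files that start
    with 'version https://git-lfs.github.com/spec/v1'."""
    p = Path(path)
    if p.stat().st_size > 1024:  # real checkpoints are far larger
        return False
    return p.read_bytes().startswith(b"version https://git-lfs")
```

Calling e.g. `is_lfs_pointer("dam3.1.ckpt")` before loading would have flagged the 143-byte stub immediately.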

As a side note, I do get this warning:

>>>from pipeline import Pipeline
/orcd/home/002/rfbrito/.conda/envs/dam/lib/python3.12/site-packages/sympy/external/gmpy.py:139: UserWarning: gmpy2 version is too old to use (2.0.0 or newer required)
  gmpy = import_module('gmpy2', min_module_version=_GMPY2_MIN_VERSION,


Kintsugi org

Thanks! I can add details on the mamba portion to the documentation.

Did you have git lfs and/or git xet installed? These should enable you to fetch the real file rather than the pointer without needing a separate wget command. I think you don't need to make the weights_only change once you have the real file, right?

This gmpy2 warning is weird. I'm getting it too now, but I don't remember seeing it before. I'm not sure whether this could be related to only having tested the instructions on a GPU machine. According to mamba, the version of gmpy2 installed is the latest, 2.3.0, but according to import gmpy2; print(gmpy2.__version__) it's 0.0.0. I tried a few of Google's suggestions on that with no luck. Please let me know if it seems to be a blocker and/or you find a workaround.

Thank you! I don't have git lfs; I am trying to figure out a way to install it, but I keep running into permission errors on the cluster. Trying to see if I can figure that out.

gmpy2 was not a blocker!

Fortunately I was able to get the model running (very small note, had to update pipeline.run_on_file(file, quantized=True) per the getting started steps in the data card to pipeline.run_on_file(file, quantized=True) )

Kintsugi org

You're welcome! I might be missing something obvious or there might be a copy-paste error but it looks like you wrote pipeline.run_on_file(file, quantized=True) twice. What was the change?

Oh, copy-paste error, my bad! "quantize" vs. "quantized" (the version with no d) is what I meant to say. It's different between the docs and what's in the code, unless I've truly lost my mind :)
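To illustrate the mismatch (a toy stand-in; this is not the actual DAM method, only the keyword names from this thread): if the code defines the parameter as `quantize`, calling it with the docs' `quantized` raises a TypeError:

```python
# Toy stand-in for the real method; only the keyword name matters here.
def run_on_file(file, quantize=False):
    return f"processing {file} (quantize={quantize})"

print(run_on_file("audio.wav", quantize=True))   # keyword from the code: works
try:
    run_on_file("audio.wav", quantized=True)     # keyword from the docs: fails
except TypeError as e:
    print(e)  # ... got an unexpected keyword argument 'quantized'
```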

Kintsugi org

Oh that makes sense. I've updated the model card to be clearer about these issues. Thanks for bringing them to my attention. Let us know how things go with the model!
