Fails to load model

#5
by Tofandel - opened

After installing all the CUDA packages and PyTorch, I tried to run example.py, but I got:

Traceback (most recent call last):
  File "/home/tofandel/PhpstormProjects/nemotron-ocr-v1/./example.py", line 37, in <module>
    main(
  File "/home/tofandel/PhpstormProjects/nemotron-ocr-v1/./example.py", line 11, in main
    ocr_pipeline = NemotronOCR()
                   ^^^^^^^^^^^^^
  File "/home/tofandel/.pyenv/versions/3.12.12/lib/python3.12/site-packages/nemotron_ocr/inference/pipeline.py", line 48, in __init__
    self._load_models()
  File "/home/tofandel/.pyenv/versions/3.12.12/lib/python3.12/site-packages/nemotron_ocr/inference/pipeline.py", line 55, in _load_models
    self.detector.load_state_dict(torch.load(self._model_dir / "detector.pth"), strict=True)
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tofandel/.pyenv/versions/3.12.12/lib/python3.12/site-packages/torch/serialization.py", line 1553, in load
    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
_pickle.UnpicklingError: Weights only load failed. In PyTorch 2.6, we changed the default value of the weights_only argument in torch.load from False to True. Re-running torch.load with weights_only set to False will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
Please file an issue with the following so that we can make weights_only=True compatible with your use case: WeightsUnpickler error: Unsupported operand 118

I checked the checkpoint file:

cat checkpoints/detector.pth
version https://git-lfs.github.com/spec/v1
oid sha256:8b7d50c74b2dba9acb8dd76d2fbcf75e6eeae0cb3e9688edf42c91aa5550ade1
size 181677320

From what I understand, these are Git LFS pointers to the model files, which Hugging Face normally downloads with its CLI and Xet.
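For anyone hitting the same error, here is a minimal Python sketch for spotting checkpoints that are still Git LFS pointers instead of real weights. It relies only on the fact that a pointer file begins with the `version https://git-lfs.github.com/spec/v1` line shown above. The function names and the `*.pth` pattern are my own assumptions, not part of nemotron_ocr.

```python
from pathlib import Path

# First line of a Git LFS pointer file, as seen in the `cat` output above.
LFS_HEADER = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if `path` starts with the Git LFS pointer header."""
    with open(path, "rb") as f:
        head = f.read(len(LFS_HEADER))
    return head == LFS_HEADER


def find_lfs_pointers(directory, pattern="*.pth"):
    """List files under `directory` matching `pattern` that are still pointers.

    `pattern` is an assumption; adjust it to the checkpoint extensions in use.
    """
    return sorted(p for p in Path(directory).rglob(pattern) if is_lfs_pointer(p))
```

Calling `find_lfs_pointers("checkpoints")` before loading the model should return an empty list once the real weights are in place. A non-empty list means torch.load would be reading a small text file rather than a checkpoint.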

I did run hf download nvidia/nemotron-ocr-v1, but the files in that directory didn't update.

What am I missing?

I copied the 4 files manually from the Hugging Face blobs and it works. I don't know why it didn't do that in the first place.
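When copying files by hand like this, it may be worth checking that each copied file matches the `oid sha256:...` and `size` recorded in the pointer it replaces. The sketch below assumes the pointer text was saved before it was overwritten. `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, not part of any library.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, _, digest = fields["oid"].partition(":")
    return {"algo": algo, "digest": digest, "size": int(fields["size"])}


def matches_pointer(pointer_text, file_path, chunk_size=1 << 20):
    """Return True if the file has the size and SHA-256 the pointer records."""
    info = parse_lfs_pointer(pointer_text)
    if info["algo"] != "sha256":
        raise ValueError(f"unsupported oid algorithm: {info['algo']}")
    path = Path(file_path)
    # Cheap size check first, so a wrong file is rejected without hashing it.
    if path.stat().st_size != info["size"]:
        return False
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Hash in chunks so large checkpoints are not read into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == info["digest"]
```

For the pointer quoted earlier, `parse_lfs_pointer` gives a size of 181677320 bytes, so a correctly copied detector.pth should be about 181 MB rather than a few hundred bytes.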

You probably didn't have the Git LFS extension installed and activated?
I did:
sudo apt install git-lfs
git lfs install
and then git clone https://huggingface.co/nvidia/nemotron-ocr-v1
like it says in the documentation.
I then ran docker compose up and it worked for me (after I upgraded my NVIDIA drivers to 580, as I was on 575, which is not compatible with the base image).

Hope this helps.

NVIDIA org

Yes, the likely cause was not having git-lfs to download the large model artifacts. Without it, only the code will be downloaded. Please reopen if you have further issues and I can look into it.

emelryan changed discussion status to closed
