Corrupt image in training set
#2
by ghanning - opened
Iterating over the training dataset fails with the following error:
PIL.UnidentifiedImageError: cannot identify image file <_io.BytesIO object at 0x7f4b76b54ea0>
It happens at iteration 13077/19677. Not sure which scene but it appears to contain a corrupt image file.
To reproduce:
from datasets import load_dataset

dataset = load_dataset("usm3d/hoho22k_2026_trainval", trust_remote_code=True)["train"]
for data in dataset:
    pass
Hi,
Thank you for the report! I am not sure it will be easy for us to update the training set in place, both because of HF size limits (if we just push a new version) and because of the gated access (if we delete the dataset and upload a new one under the same name). Probably the best course of action is to wrap the image access in a try/except.
I apologize for the issue and will try to fix it in the meantime.
--
Best, Dmytro
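The try/except workaround suggested above could look roughly like the sketch below. It is only an illustration under a few assumptions: the dataset is loaded without streaming, so it supports `len()` and integer indexing, and the image is decoded when a sample is accessed, so the error surfaces inside `dataset[i]`. The helper name `iter_skipping_bad_samples` and its parameters are hypothetical and not part of `datasets`. Indexing is used instead of wrapping the `for` loop because an exception raised inside an iterator usually ends the iteration. `PIL.UnidentifiedImageError` is a subclass of `OSError`, so catching `OSError` covers it.

```python
from typing import Any, Iterator, List, Tuple


def iter_skipping_bad_samples(
    dataset: Any,
    bad_indices: List[int],
    errors: Tuple[type, ...] = (OSError,),
) -> Iterator[Any]:
    """Yield dataset[i] for each index, recording indices that fail to load.

    Hypothetical helper: `dataset` only needs __len__ and __getitem__.
    """
    for i in range(len(dataset)):
        try:
            # Assumption: image decoding happens here, on access.
            sample = dataset[i]
        except errors:
            # Remember the failing index so the scene can be identified later.
            bad_indices.append(i)
            continue
        yield sample


# Usage with the dataset from the report (not run here, requires gated access):
# from datasets import load_dataset
# dataset = load_dataset("usm3d/hoho22k_2026_trainval", trust_remote_code=True)["train"]
# bad = []
# for data in iter_skipping_bad_samples(dataset, bad):
#     pass
# print("Samples that failed to decode:", bad)
```

If the images turn out to be decoded later than at sample access (for example, if the sample holds raw bytes that are opened with PIL afterwards), the same try/except would need to wrap that decoding step instead.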