Corrupt image in training set

by ghanning - opened Apr 2

Apr 2

Iterating over the training dataset fails with the following error:

PIL.UnidentifiedImageError: cannot identify image file <_io.BytesIO object at 0x7f4b76b54ea0>

It happens at iteration 13077/19677. I'm not sure which scene it is, but it appears to contain a corrupt image file.

ghanning

Apr 7

To reproduce:

from datasets import load_dataset

dataset = load_dataset("usm3d/hoho22k_2026_trainval", trust_remote_code=True)["train"]

for data in dataset:
    pass

dmytromishkin

Urban Scene Modeling Competition CVPR 2026 (Image Track) org Apr 7

Hi,

Thank you for the report! I am not sure it will be easy for us to update the training set in place, both because of HF size limits (if we just push a new version) and because of the gated access (if we delete the dataset and upload a new one under the same name). Probably the best course of action is to wrap the image loading in a try/except.
I apologize for the issue and will try to fix it in the meantime.

--
Best, Dmytro
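
For anyone hitting the same error, here is a minimal sketch of the try/except workaround suggested above. It is not an official fix. The helper name `iter_valid_samples` and its `on_error` callback are hypothetical, made up for illustration. It assumes the split is a regular (non-streaming) `datasets.Dataset` that supports `len()` and integer indexing, and that the decoding error is raised when a sample is accessed. It indexes by position rather than using `for data in dataset`, because a plain iterator cannot be resumed once an exception has escaped it. `PIL.UnidentifiedImageError` is a subclass of `OSError`, so catching `OSError` covers it.

```python
from typing import Any, Callable, Iterator, Optional, Tuple


def iter_valid_samples(
    dataset: Any,
    exceptions: Tuple[type, ...] = (OSError,),
    on_error: Optional[Callable[[int, BaseException], None]] = None,
) -> Iterator[Tuple[int, Any]]:
    """Yield (index, sample) pairs, skipping samples that fail to load.

    Hypothetical helper: `dataset` only needs len() and integer indexing.
    """
    for index in range(len(dataset)):
        try:
            # Image decoding is assumed to happen on access, so this is
            # where a corrupt file would raise.
            sample = dataset[index]
        except exceptions as err:
            # Report the failing index so the bad scene can be identified.
            if on_error is not None:
                on_error(index, err)
            continue
        yield index, sample


# Intended usage (not run here, requires access to the gated dataset):
#
# from datasets import load_dataset
# dataset = load_dataset("usm3d/hoho22k_2026_trainval", trust_remote_code=True)["train"]
# for index, data in iter_valid_samples(
#     dataset, on_error=lambda i, e: print(f"skipping sample {i}: {e}")
# ):
#     pass
```

The `on_error` callback also gives a way to record which index is corrupt, which should help narrow down the affected scene.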
