torch.OutOfMemoryError: CUDA out of memory

by leocneves - opened Nov 17, 2025

Nov 17, 2025

I am running inference on a webcam feed in real time, but after a few minutes it crashes with an out-of-memory error. Has anyone solved this?

edgetam_video_output = model(inference_session=inference_session, frame=inputs.pixel_values[0])

video_res_masks = processor.post_process_masks(
    [edgetam_video_output.pred_masks], original_sizes=inputs.original_sizes, binarize=False
)[0]

yonigozlan

Owner Nov 17, 2025

Hello @leocneves! Where are you storing the inference states and processed video frames when instantiating the inference session?
Something like this should relieve the pressure on CUDA memory, though it will be slightly slower:

inference_session = processor.init_video_session(
    video=video_frames,
    inference_device="cuda",
    inference_state_device="cpu",
    processing_device="cpu",
    video_storage_device="cpu",
    dtype=torch.bfloat16,
)

Granted, the states will still accumulate in RAM, which might cause issues for long sessions as well. As a temporary solution, you could manually delete old states from the inference session as you go, but if this is a recurring problem, we can try to add something that does this cleanly in Transformers directly. Feel free to open an issue on Transformers!
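
As a rough illustration of the manual cleanup idea, here is a minimal sketch that assumes the per-frame states live in a plain dict keyed by frame index. That layout, the function name `prune_old_frames`, and its parameters are all hypothetical; the actual inference session may organize its states differently, so treat this as the shape of the approach rather than the Transformers API. Keeping frame 0 is also an assumption, on the idea that the frame where the prompt was given is worth retaining.

```python
def prune_old_frames(frame_states, current_idx, keep_last=30, always_keep=(0,)):
    """Delete entries older than the last `keep_last` frames.

    frame_states: hypothetical dict mapping frame index -> stored state.
    always_keep:  frame indices that are never deleted (assumed here to be
                  the frame where the prompt was given).
    Returns the number of entries removed.
    """
    cutoff = current_idx - keep_last
    # Collect keys first so the dict is not modified while iterating over it.
    stale = [idx for idx in frame_states
             if idx <= cutoff and idx not in always_keep]
    for idx in stale:
        del frame_states[idx]
    return len(stale)
```

Calling something like this once per frame keeps the number of stored states bounded by `keep_last` plus the protected frames, at the cost of losing older memory.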

leocneves

Nov 22, 2025

Thanks for the help! I solved it by recreating the session every N frames.
Basically, I keep the new processor in memory and use 5 random points from the last largest mask to carry the target object over into the new session… It works like a charm!
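
For anyone wanting to try the same workaround, the point-sampling step could look roughly like the sketch below. It assumes the mask is a single 2D (height x width) array of logits, as returned with `binarize=False`, and that values above 0 count as foreground. The helper name `sample_points_from_mask` and its parameters are made up for illustration, and feeding the points into a freshly created session is left out because it depends on the processor API.

```python
import numpy as np

def sample_points_from_mask(mask, num_points=5, threshold=0.0, rng=None):
    """Pick up to `num_points` random [x, y] pixel coordinates inside a mask.

    mask:      2D array (H, W); pixels with value > threshold are foreground.
    threshold: 0.0 suits raw logits (assumption); use 0.5 for probabilities.
    Returns a list of [x, y] pairs, or an empty list if the mask is empty.
    """
    rng = np.random.default_rng() if rng is None else rng
    ys, xs = np.nonzero(np.asarray(mask) > threshold)
    if ys.size == 0:
        return []  # object lost: nothing to carry over
    count = min(num_points, ys.size)
    # Sample without replacement so no pixel is picked twice.
    chosen = rng.choice(ys.size, size=count, replace=False)
    return [[int(xs[i]), int(ys[i])] for i in chosen]
```

The returned points can then be used as positive prompts on the first frame of the new session, which is what lets tracking continue while the old session and its accumulated states are discarded.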

leocneves changed discussion status to closed Nov 22, 2025
