Is there any caching mechanism?

by xingwang1234 - opened Oct 23, 2024

edited Oct 23, 2024

When I load the model into GPU memory, total consumed GPU memory is 1318 MiB.
Between inferences I only changed the input image; the input text stayed the same.
After the 1st inference, total consumed GPU memory is 2766 MiB.
After the 2nd inference, total consumed GPU memory is 3424 MiB.
After the 3rd inference, total consumed GPU memory is 3424 MiB.
After the 4th inference, total consumed GPU memory is 3424 MiB.
After the 5th inference, total consumed GPU memory is 3424 MiB.
Why do the 1st and 2nd inferences increase GPU memory so much? Is there some caching mechanism inside the code?

When I call torch.cuda.empty_cache() to release GPU memory after each inference finishes, I get the following:
After the 1st inference, total consumed GPU memory is 1516 MiB.
After the 2nd inference, total consumed GPU memory is 1552 MiB.
After the 3rd inference, total consumed GPU memory is 1530 MiB.
After the 4th inference, total consumed GPU memory is 1516 MiB.
After the 5th inference, total consumed GPU memory is 1542 MiB.
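
A possible way to narrow this down, assuming the model runs on PyTorch (the torch.cuda.empty_cache() call suggests it does) and the totals above come from a tool such as nvidia-smi: PyTorch's CUDA caching allocator keeps freed blocks reserved for reuse, so the total seen from outside the process can stay high even when few tensors are still alive. The sketch below records both the memory held by live tensors (torch.cuda.memory_allocated) and the memory held by the allocator (torch.cuda.memory_reserved) after each call. The names run_inference, measure, gpu_memory_snapshot and to_mib are hypothetical helpers for illustration, not part of this model's code.

```python
# Optional import so the helpers can still be loaded on a machine without PyTorch.
try:
    import torch
except ImportError:
    torch = None

MIB = 1024 * 1024


def to_mib(num_bytes):
    """Convert a byte count to MiB, rounded to one decimal place."""
    return round(num_bytes / MIB, 1)


def gpu_memory_snapshot(device=0):
    """Return memory held by live tensors and by the caching allocator, in MiB."""
    if torch is None or not torch.cuda.is_available():
        # No GPU visible: report zeros instead of raising.
        return {"allocated_mib": 0.0, "reserved_mib": 0.0}
    return {
        "allocated_mib": to_mib(torch.cuda.memory_allocated(device)),
        "reserved_mib": to_mib(torch.cuda.memory_reserved(device)),
    }


def measure(run_inference, inputs):
    """Call run_inference (hypothetical) on each input and log memory afterwards."""
    rows = []
    for step, item in enumerate(inputs, start=1):
        run_inference(item)
        rows.append({"step": step, **gpu_memory_snapshot()})
    return rows


if __name__ == "__main__":
    # Placeholder callable; replace it with the real model call.
    for row in measure(lambda image: None, range(5)):
        print(row)
```

If reserved_mib climbs over the first calls and then plateaus while allocated_mib drops back to roughly the model size after each call, the growth would be allocator caching rather than the model holding on to tensors. That would also be consistent with the totals falling back to about 1500 MiB once torch.cuda.empty_cache() is called.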
