How can I use this code to perform the "temporal image classification" task described in the paper?

by chenyuming - opened Aug 20, 2025

Discussion

chenyuming

Aug 20, 2025

Hello, I did not find the inference code for the 'temporal image classification' task.
Could you tell me where it is?

Thanks very much!

fepegar

Microsoft org Aug 20, 2025

Hi, @chenyuming. Have you tried the snippet in the README?

chenyuming

Aug 23, 2025

Hi, @fepegar. The snippet in the README performs 'Temporal Sentence Similarity' analysis, which is a different task discussed in the paper. I have the following three questions and would be very grateful if you could answer them.

What I am confused about is the "zero-shot temporal image classification" task in the paper. According to the paper, this task was performed after "Fine-tuning BioViL-T for report generation." Is the currently open-sourced model the one obtained after that fine-tuning?
Is the biovil_t_image_model_proj_size_128 a single-layer linear head on the image encoder, or a multi-layer classification head attached to the BioViL-T image encoder?
Is there any evaluation code for Section F.4 “Auto-regressive prompting for zero-shot temporal image classification” on GitHub?

Thank you for your reply and for your contribution to the community.

fepegar

Microsoft org Aug 26, 2025

Maybe @shruthib and @sthyland can help.
