Thinking trace reproducibility
#3, opened by stumbledparams
Hi,
I am trying to reproduce the thinking traces reported in the paper by running inference with the VLM-R1-Qwen2.5VL-3B-OVD-0321 model for open-vocabulary detection (OVD) on the D3 dataset, but the traces I get are not as extensive as those in the paper. Is there sample code I could follow?
I am using this prompt:
def build_ovd_prompt(labels):
    # VLM-R1
    lbl = "\n- " + "\n- ".join(labels)  # paper uses a list format for targets
    q = (
        f"Please carefully check the image and detect the following objects: {lbl}. "
        "Output each detected target's bbox coordinates in JSON format."
        "The format of the bbox coordinates is:\n"
        "json\n"
        '[{"bbox_2d": [x1, y1, x2, y2], "label": "target name"},\n'
        ' {"bbox_2d": [x1, y1, x2, y2], "label": "target name"}]\n'
        "\n"
        "If there are no such targets in the image, simply respond with None."
        "Output the thinking process in <think> </think> and final answer in <answer> </answer> tags."
    )
    return q
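To compare trace lengths against the paper, it may help to separate the reasoning from the final answer in each decoded response. The following is only a minimal sketch, assuming the model wraps its reasoning in <think> </think> and its detections in <answer> </answer>; `split_trace` and the sample string are hypothetical and do not come from the VLM-R1 repository.

```python
import re


def split_trace(response: str) -> tuple[str, str]:
    """Hypothetical helper: split a decoded response into (reasoning, answer).

    Assumes the response uses <think>...</think> and <answer>...</answer> tags.
    """
    think = re.search(r"<think>(.*?)</think>", response, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", response, re.DOTALL)
    trace = think.group(1).strip() if think else ""
    # Fall back to the whole response when no <answer> block is present.
    final = answer.group(1).strip() if answer else response.strip()
    return trace, final


# Made-up response, only to illustrate the expected shape.
sample = (
    "<think>The image shows one dog on the left.</think>"
    '<answer>[{"bbox_2d": [10, 20, 110, 220], "label": "dog"}]</answer>'
)
trace, final = split_trace(sample)
print(len(trace.split()), "words of reasoning")
print(final)
```

An empty first element would indicate that the model produced no tagged reasoning at all for that image, which is different from producing a short trace.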
Thanks,