Test-time search over ordered visual tokens.
Generate object masks on images with SAM2.
Segment objects in images using text prompts or scribbles.