Different behavior between this HF model and the local model
Hi, thanks for your excellent work.
When testing Grounding DINO with the provided "how to use" code, I changed the input text from "a cat. a remote control." to "a cat. a remote control. the right cat". I found that grounding-dino (HF) tends to match all detected cat boxes to the phrase "a cat", while the locally installed code with "groundingdino_swinb_cogcoor.pth" always matches the box of the right cat to the phrase "the right cat" and ignores "a cat".
I'm curious whether these two base models are the same. If they are different, which one will perform better?
I'm having similar issues! I basically copied the script from the model card, so it does include the postprocessing.
OK, sorry, my bad. The difference was that I used only a single object category with the original checkpoint and two different categories with the HF Hub model, and apparently that changes the output (specifically the confidence scores; the box proposals seem to be the same).
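In case it helps anyone checking the same thing, here is a minimal sketch of how two runs could be compared to see whether the boxes agree while only the scores differ. It assumes each run has already been converted to a list of `(box, score, label)` tuples with boxes in `(x_min, y_min, x_max, y_max)` pixel coordinates. The names `iou` and `compare_runs` and the 0.9 IoU threshold are my own choices for illustration, not part of either codebase.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
Detection = Tuple[Box, float, str]       # (box, confidence score, matched phrase)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def compare_runs(
    run_a: List[Detection],
    run_b: List[Detection],
    iou_threshold: float = 0.9,  # assumed cutoff for "same box"
) -> List[Dict]:
    """Greedily pair each box in run_a with its best unused match in run_b.

    Returns one record per matched pair with the IoU, the score difference
    (run_b minus run_a), and both labels. Unmatched boxes are skipped.
    """
    matches = []
    used = set()
    for box_a, score_a, label_a in run_a:
        best_j, best_iou = None, iou_threshold
        for j, (box_b, _, _) in enumerate(run_b):
            if j in used:
                continue
            overlap = iou(box_a, box_b)
            if overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j is not None:
            used.add(best_j)
            _, score_b, label_b = run_b[best_j]
            matches.append(
                {
                    "iou": best_iou,
                    "score_delta": score_b - score_a,
                    "labels": (label_a, label_b),
                }
            )
    return matches
```

If most pairs come back with an IoU close to 1 but a nonzero `score_delta`, or with different labels, that would be consistent with what I saw: same box proposals, different scores or phrase assignments depending on which categories are in the prompt.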