Has anyone tried adding positional embeddings to the image patches to improve the model?

#70

by jchiu1234 - opened Mar 20, 2024

Discussion

jchiu1234

Mar 20, 2024

Was thinking about trying to get very specific location information from the model. Has anyone tried this yet?

besiktas

Mar 27, 2024

Yeah, I have tried some form of it. I'm not sure it will help (it didn't in my case) unless you have a large and diverse dataset to train further on.

The base model seems really hit-or-miss with localization (I will see it outperform other OCR tools on one sample, but on the next sample it has almost no ability), and it does not seem to train well for any downstream tasks that require localization (via box or point tags).
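
For anyone unfamiliar with the idea being discussed, here is a rough sketch of what "adding positional embeddings to the image patches" could look like. It is not this model's actual code: the function names (`sincos_1d`, `sincos_2d`, `add_positions`) are hypothetical, and it assumes fixed 2D sinusoidal embeddings added to patch vectors laid out row by row. A learned embedding table would be another option, and either way the model would likely need further training to make use of the signal.

```python
import math

def sincos_1d(pos, dim):
    """Sinusoidal encoding of one integer position into `dim` values (dim must be even)."""
    out = []
    for i in range(dim // 2):
        freq = 1.0 / (10000 ** (2 * i / dim))
        out.append(math.sin(pos * freq))
        out.append(math.cos(pos * freq))
    return out

def sincos_2d(rows, cols, dim):
    """One vector per patch: the first half encodes the row, the second half the column."""
    assert dim % 4 == 0, "dim must be divisible by 4"
    half = dim // 2
    return [sincos_1d(r, half) + sincos_1d(c, half)
            for r in range(rows) for c in range(cols)]

def add_positions(patch_embeddings, rows, cols):
    """Add a 2D position vector to each patch embedding (patches in row-major order)."""
    dim = len(patch_embeddings[0])
    pos = sincos_2d(rows, cols, dim)
    return [[x + p for x, p in zip(vec, pvec)]
            for vec, pvec in zip(patch_embeddings, pos)]

# Hypothetical input: a 2 x 3 grid of patches, each an 8-dimensional zero vector.
patches = [[0.0] * 8 for _ in range(6)]
with_pos = add_positions(patches, rows=2, cols=3)
```

With identical input patches, each output vector differs only by its grid position, which is the extra location signal the question is asking about.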

MJ-12

Oct 3, 2024

@besiktas any recommendations for other models that can achieve this?
