---
license: mit
datasets:
- lmms-lab/GQA
- dmarsili/Omni3D-Bench
- cambridgeltl/vsr_random
- snowclipsed/TallyQA
language:
- en
base_model:
- ShilongLiu/GroundingDINO
pipeline_tag: object-detection
tags:
- object-detection
- computer-vision
---

# Model Card for VALOR-GroundingDINO
This is the verifier-tuned GroundingDINO model from the paper [No Labels, No Problem: Training Visual Reasoners with Multimodal Verifiers](https://arxiv.org/abs/2512.08889).

For further information, please refer to the project webpage, paper, and repository.
## Citation
If you use VALOR in your research, please consider citing our work:
**BibTeX:**

```bibtex
@misc{marsili2025labelsproblemtrainingvisual,
  title={No Labels, No Problem: Training Visual Reasoners with Multimodal Verifiers},
  author={Damiano Marsili and Georgia Gkioxari},
  year={2025},
  eprint={2512.08889},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2512.08889},
}
```