STING-BEE 7B

STING-BEE is the first domain-aware visual AI assistant for X-ray baggage security. It unifies scene comprehension, referring threat localization, visual grounding, and visual question answering (VQA), establishing new benchmarks for multimodal learning in X-ray security research. It also demonstrates state-of-the-art generalization in cross-domain settings, outperforming existing models in real-world threat detection scenarios. It is trained on our public multimodal dataset, STCray, which features image-text pairs across 21 threat categories, including complex concealment and novel threat types such as IEDs and 3D-printed firearms.


📚 Model Sources


🔖 BibTeX

@article{velayudhan2025stingbee,
  title={STING-BEE: Towards Vision-Language Model for Real-World X-ray Baggage Security Inspection},
  author={Divya Velayudhan and Abdelfatah Ahmed and Mohamad Alansari and Neha Gour and Abderaouf Behouch and Taimur Hassan and Syed Talal Wasim and Nabil Maalej and Muzammal Naseer and Juergen Gall and Mohammed Bennamoun and Ernesto Damiani and Naoufel Werghi},
  journal={CVPR},
  year={2025}
}

Dataset used to train Divs1159/stingbee-7b: STCray