HOI-R1: Exploring the Potential of Multimodal Large Language Models for Human-Object Interaction Detection
HOI-R1 is inspired by recent advances in reinforcement learning for large language models and investigates how vision-language models can reason about and detect human-object interactions more effectively.
Overview
- Task: Human-Object Interaction Detection (HOID)
- Our Motivation: Leverage the reasoning capability of Multimodal LLMs and reinforcement-learning-style optimization to improve HOI detection performance.
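Because HOI-R1 is built on a chat-style multimodal LLM, interactions come back as generated text rather than structured detections, so a downstream step has to recover (human, action, object) triplets from the response. Below is a minimal parsing sketch; the angle-bracket `<person, ride, bicycle>` output format is an illustrative assumption, not the model's documented schema, so the regex should be adapted to the actual prompt/response format used.

```python
import re

def parse_hoi_triplets(text: str) -> list[tuple[str, str, str]]:
    """Parse hypothetical HOI triplets of the form <human, action, object>
    from a model's text response. The angle-bracket format is an assumption
    for illustration; adapt the pattern to the model's actual output schema."""
    pattern = re.compile(r"<\s*([^,<>]+?)\s*,\s*([^,<>]+?)\s*,\s*([^,<>]+?)\s*>")
    return [(h, a, o) for h, a, o in pattern.findall(text)]

# Example on a mock model response (not real model output):
response = "Detected interactions: <person, ride, bicycle> and <person, hold, umbrella>."
triplets = parse_hoi_triplets(response)
print(triplets)  # -> [('person', 'ride', 'bicycle'), ('person', 'hold', 'umbrella')]
```

Keeping the parsing in a separate helper makes it easy to swap in whatever structured format (e.g. JSON) the released checkpoint is actually prompted to emit.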
Citation
If you find this work useful, please consider citing:
@article{chen2025hoi,
  title={HOI-R1: Exploring the Potential of Multimodal Large Language Models for Human-Object Interaction Detection},
  author={Chen, Junwen and Xiong, Peilin and Yanai, Keiji},
  journal={arXiv preprint arXiv:2510.05609},
  year={2025}
}
Model tree for thxplz/HOI-R1_Qwen2.5-VL-3B-Instruct
- Base model: Qwen/Qwen2.5-VL-3B-Instruct