---
license: apache-2.0
base_model:
- Qwen/Qwen2.5-VL-3B-Instruct
pipeline_tag: image-text-to-text
---
# HOI-R1: Exploring the Potential of Multimodal Large Language Models for Human-Object Interaction Detection
[arXiv:2510.05609](https://arxiv.org/abs/2510.05609)
HOI-R1 is inspired by recent advances in reinforcement learning for large language models and investigates how vision-language models can reason about and detect human-object interactions more effectively.
---
## 🔍 Overview
- **Task**: Human-Object Interaction Detection (HOID)
- **Our Motivation**:
  Leverage the reasoning capabilities of multimodal LLMs, together with reinforcement learning–style optimization, to improve HOI detection performance.
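Since HOI-R1 is based on Qwen/Qwen2.5-VL-3B-Instruct, a query can be prepared with the standard chat-message format used by Qwen2.5-VL models. The sketch below is illustrative only: the image path is a placeholder, and the prompt wording is an assumption, not necessarily the exact prompt HOI-R1 was trained with.

```python
# Build an HOI-detection query in the transformers chat-message format
# used by Qwen2.5-VL models. The prompt text is an illustrative
# assumption; see the paper for the exact prompting scheme.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "path/to/image.jpg"},
            {
                "type": "text",
                "text": (
                    "Detect all human-object interactions in this image. "
                    "List each as (human box, object box, interaction)."
                ),
            },
        ],
    }
]

# With the model downloaded, generation would follow the usual
# Qwen2.5-VL recipe (omitted here to keep the sketch lightweight):
#   from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
#   processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-3B-Instruct")
#   model = Qwen2_5_VLForConditionalGeneration.from_pretrained(...)

# Inspect the text part of the query.
text_parts = [c["text"] for c in messages[0]["content"] if c["type"] == "text"]
print(text_parts[0])
```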
---

## 📌 Citation
If you find this work useful, please consider citing:
```bibtex
@article{chen2025hoi,
title={HOI-R1: Exploring the Potential of Multimodal Large Language Models for Human-Object Interaction Detection},
author={Chen, Junwen and Xiong, Peilin and Yanai, Keiji},
journal={arXiv preprint arXiv:2510.05609},
year={2025}
}
```