thxplz/HOI-R1_Qwen2.5-VL-3B-Instruct

Image-Text-to-Text

thxplz committed 12 days ago

Commit 012204e · verified · 1 Parent(s): 5a3a1dc

Update README.md

Files changed (1)

README.md +18 -4

@@ -7,17 +7,31 @@ pipeline_tag: image-text-to-text
 # HOI-R1: Exploring the Potential of Multimodal Large Language Models for Human-Object Interaction Detection
-[paper](https://arxiv.org/abs/2510.05609)
 ![hoi-r1-arch](https://cdn-uploads.huggingface.co/production/uploads/63119ce2fb65b9a3e2f75e3c/tHYWwrnqBAHsoo8lIOtnM.jpeg)
-## Reference
-```text
 @article{chen2025hoi,
   title={HOI-R1: Exploring the Potential of Multimodal Large Language Models for Human-Object Interaction Detection},
   author={Chen, Junwen and Xiong, Peilin and Yanai, Keiji},
   journal={arXiv preprint arXiv:2510.05609},
   year={2025}
 }
-```

 # HOI-R1: Exploring the Potential of Multimodal Large Language Models for Human-Object Interaction Detection
+[![arXiv](https://img.shields.io/badge/arXiv-2510.05609-b31b1b.svg)](https://arxiv.org/abs/2510.05609)
+This repository contains the official resources for **HOI-R1**, a research project that explores the potential of **Multimodal Large Language Models (MLLMs)** for **Human-Object Interaction (HOI) Detection**.
+HOI-R1 is inspired by recent advances in reinforcement learning for large language models and investigates how vision-language models can reason about and detect human-object interactions more effectively.
+---
+## 🔍 Overview
+- **Task**: Human-Object Interaction Detection (HOID)
+- **Our Motivation**:
+  Leverage the reasoning capability of Multimodal LLMs and reinforcement learning–style optimization to explore HOI detection performance.
+---
 ![hoi-r1-arch](https://cdn-uploads.huggingface.co/production/uploads/63119ce2fb65b9a3e2f75e3c/tHYWwrnqBAHsoo8lIOtnM.jpeg)
+## 📌 Citation
+If you find this work useful, please consider citing:
+```bibtex
 @article{chen2025hoi,
   title={HOI-R1: Exploring the Potential of Multimodal Large Language Models for Human-Object Interaction Detection},
   author={Chen, Junwen and Xiong, Peilin and Yanai, Keiji},
   journal={arXiv preprint arXiv:2510.05609},
   year={2025}
 }