openbmb/RLHF-V

 license: apache-2.0
 datasets:
 - Yirany/UniMM-Chat
+- HaoyeZhang/RLHF-V-Dataset
 language:
 - en
 library_name: transformers
 ---
+# Model Card for RLHF-V
+[Project Page](https://rlhf-v.github.io/) | [GitHub](https://github.com/RLHF-V/RLHF-V) | [Demo](http://120.92.209.146:8081/) | [Paper](https://arxiv.org/abs/2312.00849)
+RLHF-V is an open-source multimodal large language model with the **lowest hallucination rate** on both long-form instructions and short-form questions.
+RLHF-V is trained on [RLHF-V-Dataset](https://huggingface.co/datasets/HaoyeZhang/RLHF-V-Dataset), which contains **fine-grained segment-level human corrections** on diverse instructions. The base model is trained on [UniMM-Chat](https://huggingface.co/datasets/Yirany/UniMM-Chat), a high-quality, knowledge-intensive SFT dataset. We introduce a new method, **Dense Direct Preference Optimization (DDPO)**, that makes better use of the fine-grained annotations.
+For more details, please refer to our [paper](https://arxiv.org/abs/2312.00849).
+![Illustration of the RLHF-V framework](https://rlhf-v.github.io/images/rlhf-v_framework.jpg)
+## Model Details
+### Model Description
+- **Trained from model:** Based on Vicuna-13B
+- **Trained on data:** [RLHF-V-Dataset](https://huggingface.co/datasets/HaoyeZhang/RLHF-V-Dataset)
+### Model Sources
+- **Project Page:** https://rlhf-v.github.io
+- **GitHub Repository:** https://github.com/RLHF-V/RLHF-V
+- **Demo:** http://120.92.209.146:8081
+- **Paper:** https://arxiv.org/abs/2312.00849
+## Performance
+Low hallucination rate while being informative:
+![fig2](https://cdn-uploads.huggingface.co/production/uploads/6566e0c493e30c8a60048eb3/7xJEdKXeW33iKdHqJwvNN.png)
+More resistant to over-generalization, even compared to GPT-4V:
+![img](https://rlhf-v.github.io/images/over-generalization.jpg)
+## Citation
+If you find RLHF-V useful in your work, please cite it with:
+```
+@article{2023rlhf-v,
+  author      = {Tianyu Yu and Yuan Yao and Haoye Zhang and Taiwen He and Yifeng Han and Ganqu Cui and Jinyi Hu and Zhiyuan Liu and Hai-Tao Zheng and Maosong Sun and Tat-Seng Chua},
+  title       = {RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback},
+  journal     = {arXiv preprint arXiv:2312.00849},
+  year        = {2023},
+}
+```