| | --- |
| | library_name: transformers |
| | license: apache-2.0 |
| | pipeline_tag: image-text-to-text |
| | --- |
| | |
| | # Model Card: Reflective LLaVA (ReflectiVA) |
| |
|
Multimodal LLMs (MLLMs) are the natural extension of large language models to handle multimodal inputs, combining text and image data.
They have recently garnered attention due to their capability to address complex tasks involving both modalities.
However, their effectiveness is limited to the knowledge acquired during training, which restricts their practical utility.
In this work, we introduce a novel method to enhance the adaptability of MLLMs by integrating external knowledge sources.
Our proposed model, Reflective LLaVA (`ReflectiVA`), utilizes reflective tokens to dynamically determine the need for external knowledge
and to predict the relevance of information retrieved from an external database.
The tokens are trained following a two-stage, two-model training recipe. This ultimately enables the MLLM to manage external knowledge
while preserving fluency and performance on tasks where external knowledge is not needed.
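
To make this concrete, the inference flow can be sketched as follows. This is an illustrative Python sketch only: the reflective-token names (`<RET>`, `<NORET>`, `<REL>`, `<NOREL>`) and the helper methods are assumptions made for exposition, not the API of the released code; see the repository linked below for the supported implementation.

```python
def answer(model, image, question, knowledge_base):
    # 1) The MLLM first emits a reflective token signalling whether external
    #    knowledge is needed for this (image, question) pair.
    if model.generate_reflective_token(image, question) == "<NORET>":
        # No retrieval needed: answer directly, preserving standard MLLM behavior.
        return model.generate_answer(image, question)

    # 2) Retrieve candidate passages from the external knowledge base.
    candidates = knowledge_base.retrieve(image, question)

    # 3) A second family of reflective tokens marks each retrieved passage
    #    as relevant (<REL>) or not (<NOREL>).
    relevant = [
        passage for passage in candidates
        if model.generate_relevance_token(image, question, passage) == "<REL>"
    ]

    # 4) Answer conditioned only on the passages judged relevant.
    return model.generate_answer(image, question, context=relevant)
```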

We demonstrate the efficacy of `ReflectiVA` for knowledge-based visual question answering, highlighting its superior performance compared to existing methods.

In this model repository, you will find the overall model (stage-two) weights of `ReflectiVA`.
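
A minimal loading sketch with Transformers is shown below. The Hub id `aimagelab/ReflectiVA` and the use of the generic `Auto*` classes are assumptions; the ReflectiVA repository provides the supported inference code.

```python
# Minimal loading sketch (assumed Hub id and generic Auto* classes).
import torch
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "aimagelab/ReflectiVA"  # assumed Hub id

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision to reduce memory use
    device_map="auto",
)
```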

For more information, visit our [ReflectiVA repository](https://github.com/aimagelab/ReflectiVA), our [project page](https://aimagelab.github.io/ReflectiVA/), and the [dataset](https://huggingface.co/datasets/aimagelab/ReflectiVA-Data).

## Citation

If you make use of our work, please cite our paper:

```bibtex
@article{cocchi2024augmenting,
  title={{Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering}},
  author={Cocchi, Federico and Moratelli, Nicholas and Cornia, Marcella and Baraldi, Lorenzo and Cucchiara, Rita},
  journal={arXiv preprint arXiv:2411.16863},
  year={2024}
}
```

## Paper page

The paper can be found at [https://huggingface.co/papers/2411.16863](https://huggingface.co/papers/2411.16863).