---
library_name: transformers
license: mit
language:
- en
tags:
- retrieval
- multi-modal
- knowledge-based visual question answering
- FLMR
- PreFLMR
---

# FLMR model card

FLMR (Fine-grained Late-interaction Multi-modal Retrieval) is an open-source model for multimodal knowledge retrieval. It is a transformer-based retriever that combines text and image inputs in a single query and retrieves relevant documents from a large corpus via token-level late interaction.

## Model Details

### Model Description

- **Model type:** FLMRModelForRetrieval
- **Language(s) (NLP):** English
- **License:** MIT License

### Paper and resources

- **Blog post (quick overview):** https://www.jinghong-chen.net/fined-grained-late-interaction-multimodal-retrieval-flmr/
- **Paper:** https://openreview.net/forum?id=IWWWulAX7g
- **Repository:** https://github.com/LinWeizheDragon/FLMR

## Uses

### Direct Use
This model can be used directly to retrieve documents from a large corpus given a combined text and image query. Example retrieval usage can be found in the [official implementation](https://github.com/LinWeizheDragon/FLMR).
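
A minimal sketch of scoring a small set of passages against text-and-image queries follows. It assumes the `flmr` package from the [official repository](https://github.com/LinWeizheDragon/FLMR) is installed; the class names, `subfolder` arguments, and forward-pass keywords mirror the repository's examples and may differ across versions, and `path/to/checkpoint` is a placeholder for an actual FLMR checkpoint.

```python
import torch
from flmr import (
    FLMRModelForRetrieval,
    FLMRQueryEncoderTokenizer,
    FLMRContextEncoderTokenizer,
)

checkpoint = "path/to/checkpoint"  # placeholder: substitute a real FLMR checkpoint

# Query and context encoders use separate tokenizers, stored in subfolders.
query_tokenizer = FLMRQueryEncoderTokenizer.from_pretrained(checkpoint, subfolder="query_tokenizer")
context_tokenizer = FLMRContextEncoderTokenizer.from_pretrained(checkpoint, subfolder="context_tokenizer")

model = FLMRModelForRetrieval.from_pretrained(
    checkpoint,
    query_tokenizer=query_tokenizer,
    context_tokenizer=context_tokenizer,
)

# Two queries, each paired with two candidate passages.
Q = query_tokenizer([
    "Using the provided image, retrieve documents for: What is the capital of France?",
    "Using the provided image, retrieve documents for: What is the capital of China?",
])
D = context_tokenizer([
    "Paris is the capital of France.", "Beijing is the capital of China.",
    "Paris is the capital of France.", "Beijing is the capital of China.",
])
pixel_values = torch.zeros(2, 3, 224, 224)  # dummy images; use a CLIP image processor on real images

outputs = model(
    query_input_ids=Q["input_ids"],
    query_attention_mask=Q["attention_mask"],
    query_pixel_values=pixel_values,
    context_input_ids=D["input_ids"],
    context_attention_mask=D["attention_mask"],
    use_in_batch_negatives=True,
)
# `outputs` carries the contrastive loss and late-interaction scores; see the
# repository for the exact output fields.
```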

### Downstream Use
This model can be combined with a language model to build a retrieval-augmented language model. An example application to knowledge-based VQA can be found in [RAVQA](https://github.com/linweizhedragon/retrieval-augmented-visual-question-answering).
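
Schematically, the composition looks like the sketch below. This is not RAVQA's API: `retrieve_top_k` and `generate` are hypothetical stand-ins for an FLMR-based retriever and for any generative language model.

```python
from typing import Callable, List

def answer_with_retrieval(
    question: str,
    retrieve_top_k: Callable[[str, int], List[str]],  # hypothetical: wraps an FLMR searcher
    generate: Callable[[str], str],                   # hypothetical: any generative LM
    k: int = 5,
) -> str:
    """Retrieval-augmented QA: fetch supporting passages, then condition the LM on them."""
    passages = retrieve_top_k(question, k)
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    return generate(prompt)
```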

## How to Get Started with the Model
For details of training, indexing, and performing retrieval, please refer to the [official repository](https://github.com/LinWeizheDragon/FLMR).
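
As a concrete starting point, the repository ships helpers for indexing a passage collection, after which queries can be served with its `create_searcher` and `search_custom_collection` utilities. The sketch below follows the repository's examples at the time of writing; the function and argument names are assumptions that may change between versions.

```python
from flmr import index_custom_collection

# `model` is the FLMRModelForRetrieval instance loaded in the Direct Use sketch above.

# A toy passage collection; in practice, this is your knowledge corpus.
custom_collection = [
    "Paris is the capital of France.",
    "Beijing is the capital of China.",
]

# Build a late-interaction index on disk.
index_custom_collection(
    custom_collection=custom_collection,
    model=model,
    index_root_path=".",
    index_experiment_name="my_experiment",
    index_name="my_index",
    nbits=8,                  # bits used to compress token embeddings
    doc_maxlen=512,           # maximum document length in tokens
    overwrite=True,           # whether to overwrite an existing index
    use_gpu=False,            # whether to enable GPU indexing
    indexing_batch_size=64,
    model_temp_folder="tmp",
    nranks=1,                 # number of GPUs used for indexing
)
```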

## Training datasets

The model is pre-trained on:

1. Image to Text retrieval: WIT
2. Image & Question to Text retrieval: OKVQA

For details on the dataset split and conversion process, please refer to the paper [Fine-grained Late-interaction Multi-modal Retrieval for Retrieval Augmented Visual Question Answering](https://openreview.net/forum?id=IWWWulAX7g).

The processed datasets are:
- https://huggingface.co/datasets/BByrneLab/OKVQA_FLMR_preprocessed_data
- https://huggingface.co/datasets/BByrneLab/OKVQA_FLMR_preprocessed_GoogleSearch_passages

## Evaluation datasets

The model is evaluated on OKVQA, Infoseek, and FVQA.

Please find the evaluation results in [the paper](https://openreview.net/forum?id=IWWWulAX7g).

## Citation

**BibTeX:**
```
@inproceedings{lin2023finegrained,
  title={Fine-grained Late-interaction Multi-modal Retrieval for Retrieval Augmented Visual Question Answering},
  author={Weizhe Lin and Jinghong Chen and Jingbiao Mei and Alexandru Coca and Bill Byrne},
  booktitle={Thirty-seventh Conference on Neural Information Processing Systems},
  year={2023},
  url={https://openreview.net/forum?id=IWWWulAX7g}
}
```