bidiptas/PG-InstructBLIP

image-captioning


bidiptas committed on Sep 4, 2023

Commit 1c7ae7d · 1 Parent(s): 94f5f0e

Update README.md

Files changed (1)

README.md +5 -1


@@ -9,7 +9,7 @@ pipeline_tag: image-to-text
 # PG-InstructBLIP model
-Finetuned version of InstructBLIP with Flan-T5-xxl as the language model. PG-InstructBLIP was introduced in the paper [Physically Grounded Vision-Language Models for Robotic Manipulation](https://iliad.stanford.edu/pg-vlm/) by Gao et al.
 ## Model description
@@ -20,6 +20,8 @@ PG-InstructBLIP is finetuned using the [PhysObjects dataset](https://drive.googl
 This model is designed to be used with the LAVIS library. Please install [salesforce-lavis](https://pypi.org/project/salesforce-lavis/) and download this model through git-lfs or direct downloading.
 ```
 import torch
 from PIL import Image
@@ -41,6 +43,8 @@ vlm = load_model(
     device="cuda" if torch.cuda.is_available() else "cpu"
 )
 model_cls = registry.get_model_class('blip2_t5_instruct')
 model_type = 'flant5xxl'
 preprocess_cfg = OmegaConf.load(model_cls.default_config_path(model_type)).preprocess

 # PG-InstructBLIP model
+Finetuned version of InstructBLIP with Flan-T5-XXL as the language model. PG-InstructBLIP was introduced in the paper [Physically Grounded Vision-Language Models for Robotic Manipulation](https://iliad.stanford.edu/pg-vlm/) by Gao et al.
 ## Model description
 This model is designed to be used with the LAVIS library. Please install [salesforce-lavis](https://pypi.org/project/salesforce-lavis/) and download this model through git-lfs or direct downloading.
+After loading the model, you can disable the Q-Former text input to match the configuration we used for fine-tuning. However, the model still works well with it enabled, so we recommend that users experiment with both settings and choose the better one on a case-by-case basis.
 ```
 import torch
 from PIL import Image
     device="cuda" if torch.cuda.is_available() else "cpu"
 )
+vlm.qformer_text_input = False  # Optionally disable qformer text
 model_cls = registry.get_model_class('blip2_t5_instruct')
 model_type = 'flant5xxl'
 preprocess_cfg = OmegaConf.load(model_cls.default_config_path(model_type)).preprocess