Update README.md
README.md (changed)
````diff
@@ -9,7 +9,7 @@ pipeline_tag: image-to-text
 
 # PG-InstructBLIP model
 
-Finetuned version of InstructBLIP with Flan-T5-
+Finetuned version of InstructBLIP with Flan-T5-XXL as the language model. PG-InstructBLIP was introduced in the paper [Physically Grounded Vision-Language Models for Robotic Manipulation](https://iliad.stanford.edu/pg-vlm/) by Gao et al.
 
 ## Model description
 
@@ -20,6 +20,8 @@ PG-InstructBLIP is finetuned using the [PhysObjects dataset](https://drive.googl
 
 This model is designed to be used with the LAVIS library. Please install [salesforce-lavis](https://pypi.org/project/salesforce-lavis/) and download this model through git-lfs or direct downloading.
 
+After loading the model, you can disable the qformer text input to follow the same configuration we used for fine-tuning. However, the model still works well with it enabled, so we recommend users to experiment with both and choose the optimal configuration on a case-by-case basis.
+
 ```
 import torch
 from PIL import Image
@@ -41,6 +43,8 @@ vlm = load_model(
 device="cuda" if torch.cuda.is_available() else "cpu"
 )
 
+vlm.qformer_text_input = False # Optionally disable qformer text
+
 model_cls = registry.get_model_class('blip2_t5_instruct')
 model_type = 'flant5xxl'
 preprocess_cfg = OmegaConf.load(model_cls.default_config_path(model_type)).preprocess
````
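The paragraph added to the README recommends running the model with the Q-Former text input both enabled and disabled, then keeping whichever setting works better for a given task. Below is a minimal sketch of that comparison. It assumes a loaded model object that exposes the `qformer_text_input` attribute toggled in the diff, and a user-supplied `evaluate(model)` function that returns a score where higher is better. `compare_qformer_text_input`, `best_setting`, and `evaluate` are hypothetical names for illustration and are not part of LAVIS. To keep the sketch runnable without LAVIS or the model weights, the usage section swaps in a stub model and a fake scoring function.

```python
from typing import Any, Callable, Dict


def compare_qformer_text_input(
    model: Any, evaluate: Callable[[Any], float]
) -> Dict[bool, float]:
    """Score `model` with the Q-Former text input enabled and disabled.

    Hypothetical helper (not a LAVIS API). `evaluate` is an assumed,
    user-supplied function that runs the model on some validation data
    and returns a score where higher is better.
    """
    original = model.qformer_text_input  # remember the caller's setting
    scores: Dict[bool, float] = {}
    try:
        for enabled in (True, False):
            model.qformer_text_input = enabled  # same attribute as in the README diff
            scores[enabled] = evaluate(model)
    finally:
        model.qformer_text_input = original  # restore even if evaluate() raises
    return scores


def best_setting(scores: Dict[bool, float]) -> bool:
    """Return the qformer_text_input value with the highest score (ties keep True)."""
    return max(scores, key=scores.__getitem__)


if __name__ == "__main__":
    class StubModel:
        """Stand-in for a loaded model; only the attribute matters here."""
        qformer_text_input = True

    def fake_evaluate(model: Any) -> float:
        # Made-up scores for demonstration only, not real model results.
        return 0.80 if model.qformer_text_input else 0.85

    stub = StubModel()
    scores = compare_qformer_text_input(stub, fake_evaluate)
    print(scores)                   # {True: 0.8, False: 0.85}
    print(best_setting(scores))     # False
    print(stub.qformer_text_input)  # True (original setting restored)
```

The helper restores the original value in a `finally` block, so the comparison leaves the model in the state it received, even when an evaluation fails partway through.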