---
library_name: transformers
pipeline_tag: image-text-to-text
---

Osprey: Pixel Understanding with Visual Instruction Tuning

Code

Osprey is a mask-text instruction tuning approach that extends multimodal large language models (MLLMs) by incorporating pixel-wise mask regions into language instructions, enabling fine-grained visual understanding. Given an input mask region, Osprey generates a semantic description of it, either short or detailed.

Osprey integrates with SAM (Segment Anything Model) in point-prompt, box-prompt, and segment-everything modes to generate the semantics associated with specific parts or objects.
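
To make the idea of a mask region paired with a language instruction concrete, here is a minimal, dependency-free sketch. It is not Osprey's actual API or prompt format: the `box_to_mask` helper stands in for a mask that SAM would produce from a box prompt, and the `<region>` placeholder and the instruction wording are hypothetical names chosen for illustration only.

```python
from typing import List, Tuple

Mask = List[List[int]]  # row-major binary mask: 1 = pixel inside the region

# Hypothetical placeholder marking where a region's mask features would be
# spliced into the text prompt. The real token used by Osprey may differ.
REGION_TOKEN = "<region>"


def box_to_mask(height: int, width: int, box: Tuple[int, int, int, int]) -> Mask:
    """Rasterize a box (x0, y0, x1, y1), end-exclusive, into a binary mask.

    Stand-in for a segmentation model: in practice SAM would return a
    pixel-accurate mask for the prompted object, not a filled rectangle.
    """
    x0, y0, x1, y1 = box
    # Clamp the box to the image bounds.
    x0, x1 = max(0, x0), min(width, x1)
    y0, y1 = max(0, y0), min(height, y1)
    return [
        [1 if (y0 <= y < y1 and x0 <= x < x1) else 0 for x in range(width)]
        for y in range(height)
    ]


def mask_area(mask: Mask) -> int:
    """Count the pixels that belong to the region."""
    return sum(sum(row) for row in mask)


def build_instruction(num_regions: int, detailed: bool = False) -> str:
    """Compose an illustrative instruction that refers to each mask region."""
    if num_regions < 1:
        raise ValueError("at least one region is required")
    refs = ", ".join(f"region{i + 1} {REGION_TOKEN}" for i in range(num_regions))
    style = (
        "Describe each region in detail."
        if detailed
        else "Give a short description of each region."
    )
    return f"Regions in the image: {refs}. {style}"


if __name__ == "__main__":
    mask = box_to_mask(height=4, width=6, box=(1, 1, 4, 3))
    print(mask_area(mask))
    print(build_instruction(num_regions=1, detailed=True))
```

The `detailed` flag mirrors the short versus detailed description modes mentioned above; each mask in a multi-region query would get its own placeholder in the instruction.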