nielsr (HF Staff) committed
Commit ef2c5e3 · verified · 1 parent: 50b6ac5

Add model card for Osprey-7b


This PR adds a model card for the Osprey-7b model.

It includes:
- A link to the paper [Osprey: Pixel Understanding with Visual Instruction Tuning](https://huggingface.co/papers/2312.10032).
- The `pipeline_tag: image-text-to-text` metadata, ensuring the model appears in relevant searches.
- The `library_name: transformers` metadata, as the model is compatible with the Hugging Face `transformers` library.
- A link to the GitHub repository: https://github.com/CircleRadon/Osprey.
- A concise overview of the model's features.

Please review and merge this PR if it looks good.

Files changed (1)
  1. README.md +18 -0
README.md ADDED
@@ -0,0 +1,18 @@
+ ---
+ library_name: transformers
+ pipeline_tag: image-text-to-text
+ ---
+
+ # Osprey: Pixel Understanding with Visual Instruction Tuning
+
+ [Osprey: Pixel Understanding with Visual Instruction Tuning](https://huggingface.co/papers/2312.10032)
+
+ [Code](https://github.com/CircleRadon/Osprey)
+
+ Osprey is a mask-text instruction tuning approach that extends multimodal large language models (MLLMs) by incorporating pixel-wise mask regions into language instructions, enabling **fine-grained visual understanding**. Given an input mask region, Osprey generates semantic descriptions at two levels of detail: a **short description** and a **detailed description**.
+
+ Osprey can seamlessly integrate with [SAM](https://github.com/facebookresearch/segment-anything) in point-prompt, box-prompt, and segment-everything modes to generate the semantics associated with specific parts or objects.
+
+ <p align="center" width="100%">
+ <img src="https://github.com/CircleRadon/Osprey/raw/main/assets/osprey.png" width="90%">
+ </p>
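
The model card above describes Osprey as consuming pixel-wise mask regions alongside text instructions. As a minimal illustration of what such a mask region looks like as data, here is a sketch that builds a binary mask from a box prompt using NumPy. The function name and shapes are illustrative assumptions for this sketch only, not Osprey's or SAM's actual API; in the real pipeline, SAM would predict a precise object mask from the prompt.

```python
import numpy as np

def box_to_mask(height, width, box):
    """Build a binary mask for a box region (x0, y0, x1, y1).

    Illustrative only: in practice SAM predicts an object-shaped
    mask from a point or box prompt, and Osprey then pairs that
    pixel-wise mask with the language instruction.
    """
    x0, y0, x1, y1 = box
    mask = np.zeros((height, width), dtype=bool)  # one bit per pixel
    mask[y0:y1, x0:x1] = True                     # mark the region
    return mask

# A 4x6 image with a 2x2 box region selected.
mask = box_to_mask(4, 6, (1, 1, 3, 3))
print(mask.sum())  # number of pixels inside the mask region
```

A mask like this (one boolean per pixel) is what distinguishes region-level models such as Osprey from box-only approaches: the region can follow an arbitrary object outline rather than a rectangle.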