Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs
Paper • 2404.05719 • Published • 83
How to use jadechoghari/Ferret-UI-Gemma2b with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("image-text-to-text", model="jadechoghari/Ferret-UI-Gemma2b", trust_remote_code=True)
messages = [
{
"role": "user",
"content": [
{"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
{"type": "text", "text": "What animal is on the candy?"}
]
},
]
pipe(text=messages)

# Load model directly
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("jadechoghari/Ferret-UI-Gemma2b", trust_remote_code=True, dtype="auto")

How to use jadechoghari/Ferret-UI-Gemma2b with vLLM:
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "jadechoghari/Ferret-UI-Gemma2b"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "jadechoghari/Ferret-UI-Gemma2b",
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "Describe this image in one sentence."
},
{
"type": "image_url",
"image_url": {
"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
}
}
]
}
]
}'
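The same request can be sent from Python instead of curl. A minimal sketch, assuming the vLLM server above is running on localhost:8000; `build_chat_payload` is a hypothetical helper that just mirrors the JSON body of the curl call (send it with, e.g., `requests.post("http://localhost:8000/v1/chat/completions", json=payload)`):

```python
# Build the OpenAI-compatible chat payload used by the curl example above.
def build_chat_payload(model, text, image_url):
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": text},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }

payload = build_chat_payload(
    "jadechoghari/Ferret-UI-Gemma2b",
    "Describe this image in one sentence.",
    "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg",
)
print(list(payload.keys()))  # ['model', 'messages']
```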
How to use jadechoghari/Ferret-UI-Gemma2b with SGLang:
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
--model-path "jadechoghari/Ferret-UI-Gemma2b" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "jadechoghari/Ferret-UI-Gemma2b",
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "Describe this image in one sentence."
},
{
"type": "image_url",
"image_url": {
"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
}
}
]
}
]
}'

# Alternatively, start the SGLang server via Docker:
docker run --gpus all \
--shm-size 32g \
-p 30000:30000 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HF_TOKEN=<secret>" \
--ipc=host \
lmsysorg/sglang:latest \
python3 -m sglang.launch_server \
--model-path "jadechoghari/Ferret-UI-Gemma2b" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "jadechoghari/Ferret-UI-Gemma2b",
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "Describe this image in one sentence."
},
{
"type": "image_url",
"image_url": {
"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
}
}
]
}
]
}'

How to use jadechoghari/Ferret-UI-Gemma2b with Docker Model Runner:
docker model run hf.co/jadechoghari/Ferret-UI-Gemma2b
Ferret-UI is the first UI-centric multimodal large language model (MLLM) designed for referring, grounding, and reasoning tasks. Built on Gemma-2B and Llama-3-8B, it can execute complex UI tasks. This is the Gemma-2B version of Ferret-UI, introduced in the Ferret-UI paper by Apple (arXiv:2404.05719).
You will first need to download builder.py, conversation.py, inference.py, model_UI.py, and mm_utils.py locally:
wget https://huggingface.co/jadechoghari/Ferret-UI-Gemma2b/raw/main/conversation.py
wget https://huggingface.co/jadechoghari/Ferret-UI-Gemma2b/raw/main/builder.py
wget https://huggingface.co/jadechoghari/Ferret-UI-Gemma2b/raw/main/inference.py
wget https://huggingface.co/jadechoghari/Ferret-UI-Gemma2b/raw/main/model_UI.py
wget https://huggingface.co/jadechoghari/Ferret-UI-Gemma2b/raw/main/mm_utils.py
from inference import inference_and_run
image_path = "appstore_reminders.png"
prompt = "Describe the image in detail"
# Call the function without a box
inference_text = inference_and_run(image_path, prompt, conv_mode="ferret_gemma_instruct", model_path="jadechoghari/Ferret-UI-Gemma2b")
# Output processed text
print("Inference Text:", inference_text)
# Task with bounding boxes
image_path = "appstore_reminders.png"
prompt = "What's inside the selected region?"
box = [189, 906, 404, 970]
inference_text = inference_and_run(
image_path=image_path,
prompt=prompt,
conv_mode="ferret_gemma_instruct",
model_path="jadechoghari/Ferret-UI-Gemma2b",
box=box
)
# You can also pass process_image=True to get the annotated image back as well:
# processed_image, inference_text = inference_and_run(..., process_image=True)
print("Inference Text:", inference_text)
# GROUNDING PROMPTS
GROUNDING_TEMPLATES = [
'\nProvide the bounding boxes of the mentioned objects.',
'\nInclude the coordinates for each mentioned object.',
'\nLocate the objects with their coordinates.',
'\nAnswer in [x1, y1, x2, y2] format.',
'\nMention the objects and their locations using the format [x1, y1, x2, y2].',
'\nDraw boxes around the mentioned objects.',
'\nUse boxes to show where each thing is.',
'\nTell me where the objects are with coordinates.',
'\nList where each object is with boxes.',
'\nShow me the regions with boxes.'
]
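Appending one of these templates to a prompt asks the model to answer with boxes in [x1, y1, x2, y2] form. A minimal sketch of post-processing such an answer; `extract_boxes` and the example answer string are hypothetical, not part of the Ferret-UI code:

```python
import re

# Matches [x1, y1, x2, y2] coordinate groups in generated text.
BOX_RE = re.compile(r"\[(\d+),\s*(\d+),\s*(\d+),\s*(\d+)\]")

def extract_boxes(text):
    """Return all bounding boxes found in the model's answer as int tuples."""
    return [tuple(map(int, m)) for m in BOX_RE.findall(text)]

# Hypothetical grounding answer from the model:
answer = 'The "Reminders" icon is located at [189, 906, 404, 970].'
print(extract_boxes(answer))  # [(189, 906, 404, 970)]
```

Pairing a template with the prompt is just string concatenation, e.g. `prompt + GROUNDING_TEMPLATES[3]`.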