Instructions to use adept/fuyu-8b with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use adept/fuyu-8b with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="adept/fuyu-8b")

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("adept/fuyu-8b")
model = AutoModelForImageTextToText.from_pretrained("adept/fuyu-8b")
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use adept/fuyu-8b with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "adept/fuyu-8b"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "adept/fuyu-8b",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```
Use Docker
```shell
docker model run hf.co/adept/fuyu-8b
```
- SGLang
How to use adept/fuyu-8b with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "adept/fuyu-8b" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "adept/fuyu-8b",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```
Use Docker images
```shell
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "adept/fuyu-8b" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "adept/fuyu-8b",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```
- Docker Model Runner
How to use adept/fuyu-8b with Docker Model Runner:
```shell
docker model run hf.co/adept/fuyu-8b
```
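Both the vLLM and SGLang servers expose the same OpenAI-compatible /v1/completions endpoint, so the curl call can also be made from Python. The following is a minimal sketch using only the standard library; the function names build_completion_request and complete are hypothetical helpers, not part of either project. Like the curl examples, it sends a text-only prompt even though Fuyu is an image-text model, and it assumes a server is already running locally (port 8000 for vLLM, 30000 for the SGLang commands).

```python
import json
import urllib.request


def build_completion_request(base_url: str, prompt: str, max_tokens: int = 512,
                             temperature: float = 0.5) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /v1/completions endpoint."""
    payload = {
        "model": "adept/fuyu-8b",
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(base_url: str, prompt: str) -> str:
    """Send the request and return the text of the first completion choice."""
    req = build_completion_request(base_url, prompt)
    with urllib.request.urlopen(req, timeout=120) as resp:
        body = json.load(resp)
    # OpenAI-style completion responses put the generated text in choices[i]["text"].
    return body["choices"][0]["text"]


# Hypothetical usage against a running server (not executed here):
# print(complete("http://localhost:8000", "Once upon a time,"))
```

Only the base URL changes between the two servers; for example, complete("http://localhost:30000", "Once upon a time,") would target the SGLang server.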
Kernel crash
Hello, when I try to run the following script, my environment suddenly crashes. Could someone please help me?
The Kernel has become unresponsive while executing the code in the active cell or a previous cell. Please check the code in the cells to identify a possible cause of the failure. Click here for more information. For additional details, consult the Jupyter log.
How many images are there in the folder you preprocess before loading the model? What does the jupyter log say? How much system and GPU memory is being used when you run the notebook?
I would recommend you split the cell in two: one to load the model (run this first), and another one to process the images. My bet is you might be running out of system memory.
Thank you for your quick response.
There are only 2 images in the folder. Jupyter redirects me to this page: https://github.com/microsoft/vscode-jupyter/wiki/Kernel-crashes. The script still crashes after 43 seconds.
So I split the code into two parts: one that downloads and loads the model, and one that processes the images. However, it still crashes while the model is being loaded. The crash happens when memory usage reaches my 16 GB of RAM, i.e. 100% utilization.
What should I do?
Do you have a GPU? How much memory does it have? Fuyu requires ~20 GB of RAM to run in half precision, and roughly twice as much in full precision. In addition, if you install accelerate (pip install accelerate), you can load the model directly onto the GPU instead of first loading it into system memory and then moving the weights to the GPU. The following snippet uses both techniques to load the model:
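```python
from transformers import FuyuProcessor, FuyuForCausalLM
import torch

model_id = "adept/fuyu-8b"
processor = FuyuProcessor.from_pretrained(model_id)
# bfloat16 halves memory use compared with the float32 default; device_map="cuda"
# (requires accelerate) loads the weights onto the GPU instead of staging them in system RAM.
model = FuyuForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="cuda")
```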
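The precision argument comes down to bytes per parameter: float32 uses 4 bytes and bfloat16/float16 use 2. The sketch below estimates the memory needed just to hold the weights, assuming about 8 billion parameters (taken from the model name, not measured). Real usage is higher because of activations, the CUDA context and framework overhead, which is presumably why the reply quotes ~20 GB rather than the raw weight size.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Memory needed only to store the weights, in GiB (2**30 bytes)."""
    return num_params * bytes_per_param / 2**30


# Assumption: ~8e9 parameters, inferred from the name "fuyu-8b".
NUM_PARAMS = 8e9
BYTES_PER_DTYPE = {"float32": 4, "bfloat16": 2, "float16": 2}

for dtype, nbytes in BYTES_PER_DTYPE.items():
    # Prints roughly 29.8 GiB for float32 and 14.9 GiB for the 16-bit types.
    print(f"{dtype:>8}: ~{weight_memory_gib(NUM_PARAMS, nbytes):.1f} GiB")
```

Under this assumption, even the bfloat16 weights alone (~14.9 GiB) exceed an 8 GB or 12 GiB GPU and take up most of a 16 GB system, which is consistent with the crashes reported in this thread.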
Yes, I have a GPU with 8 GB of memory, and my computer has 16 GB of DDR4 RAM, so that's probably why it crashes. Do you think I should run the code in the cloud, on Google Colab for example?
Thank you in advance!
Yes, 8 GB of GPU RAM is not much for these large models.
Hi,
I have the same issue. I am trying to run this model on the GPU, but it runs out of memory. I have an RTX 3080 Ti with 12 GiB of VRAM, and my computer has 32 GiB of RAM.
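The thread does not cover this case, but with accelerate installed, one commonly used option is to load with device_map="auto" plus a max_memory budget instead of device_map="cuda". Layers that fit stay on the GPU and the rest are offloaded to CPU RAM, which avoids the out-of-memory error at the cost of slower inference. The following is a minimal sketch under that assumption; build_max_memory and load_fuyu_with_offload are hypothetical helpers, and the 2 GiB headroom per device is a guess, not a measured value.

```python
def build_max_memory(gpu_gib: int, cpu_gib: int, headroom_gib: int = 2) -> dict:
    """Build a max_memory mapping for from_pretrained(..., device_map="auto").

    Keys are GPU indices (0 = first GPU) plus "cpu"; values are size strings.
    headroom_gib is left free on each device for activations and the CUDA
    context (the default of 2 is an assumption).
    """
    if gpu_gib <= headroom_gib or cpu_gib <= headroom_gib:
        raise ValueError("not enough memory left after reserving headroom")
    return {
        0: f"{int(gpu_gib - headroom_gib)}GiB",
        "cpu": f"{int(cpu_gib - headroom_gib)}GiB",
    }


def load_fuyu_with_offload(gpu_gib: int, cpu_gib: int):
    """Load Fuyu in bfloat16, offloading layers that do not fit on the GPU to CPU RAM.

    Needs torch, transformers and accelerate; they are imported lazily so the
    helper above can be used without them.
    """
    import torch
    from transformers import FuyuForCausalLM, FuyuProcessor

    model_id = "adept/fuyu-8b"
    processor = FuyuProcessor.from_pretrained(model_id)
    model = FuyuForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,
        device_map="auto",  # let accelerate split layers across GPU and CPU
        max_memory=build_max_memory(gpu_gib, cpu_gib),
    )
    return processor, model
```

For a 12 GiB GPU and 32 GiB of RAM, build_max_memory(12, 32) gives {0: "10GiB", "cpu": "30GiB"}, so load_fuyu_with_offload(12, 32) would keep about 10 GiB of bfloat16 weights on the GPU and place the remainder in system memory.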