liuhaotian/LLaVA-CC3M-Pretrain-595K
How to use qresearch/llama-3-vision-alpha-hf with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("image-text-to-text", model="qresearch/llama-3-vision-alpha-hf", trust_remote_code=True)
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("qresearch/llama-3-vision-alpha-hf", trust_remote_code=True, dtype="auto")

How to use qresearch/llama-3-vision-alpha-hf with vLLM:
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "qresearch/llama-3-vision-alpha-hf"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "qresearch/llama-3-vision-alpha-hf",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image in one sentence."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        }
                    }
                ]
            }
        ]
    }'
# Alternatively, run the model with Docker Model Runner:
docker model run hf.co/qresearch/llama-3-vision-alpha-hf
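
The curl request above can also be sent from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `send_chat_request` are hypothetical, and it assumes a server is already listening at `base_url` and accepts the same OpenAI-compatible payload as the curl call.

```python
import json
import urllib.request


def build_chat_request(model, prompt, image_url):
    """Build the same JSON payload that the curl example sends."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def send_chat_request(payload, base_url="http://localhost:8000"):
    """POST the payload to /v1/chat/completions and return the parsed JSON reply.

    base_url is an assumption: it must point at a running server.
    """
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


payload = build_chat_request(
    "qresearch/llama-3-vision-alpha-hf",
    "Describe this image in one sentence.",
    "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg",
)
# reply = send_chat_request(payload)  # uncomment once the server is running
```

For a server on another port, such as the SGLang examples on port 30000, pass `base_url="http://localhost:30000"`. In the usual OpenAI response format the generated text would be found under `reply["choices"][0]["message"]["content"]`.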
How to use qresearch/llama-3-vision-alpha-hf with SGLang:
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
--model-path "qresearch/llama-3-vision-alpha-hf" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "qresearch/llama-3-vision-alpha-hf",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image in one sentence."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        }
                    }
                ]
            }
        ]
    }'
# Alternatively, start the SGLang server with Docker:
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "qresearch/llama-3-vision-alpha-hf" \
        --host 0.0.0.0 \
        --port 30000
How to use qresearch/llama-3-vision-alpha-hf with Docker Model Runner:
docker model run hf.co/qresearch/llama-3-vision-alpha-hf
projection module trained to add vision capabilities to Llama 3 using SigLIP. built by @yeswondwerr and @qtnx_
usable directly in Transformers
usage
pip install torch transformers pillow bitsandbytes accelerate

import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers import BitsAndBytesConfig

# Quantize the language model to 4-bit; the projector and vision model are skipped
bnb_cfg = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    llm_int8_skip_modules=["mm_projector", "vision_model"],
)

model_id = "qresearch/llama-3-vision-alpha-hf"
# trust_remote_code is needed because the model class ships with the repository
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.float16,
    quantization_config=bnb_cfg,
)

tokenizer = AutoTokenizer.from_pretrained(
    model_id,
    use_fast=True,
)

# Replace "image_path" and "question" with your own image file and prompt
image = Image.open("image_path")

print(
    tokenizer.decode(
        model.answer_question(image, "question", tokenizer),
        skip_special_tokens=True,
    )
)
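
To ask several questions about the same image, the call above can be wrapped in a small loop. This is a minimal sketch: `ask_questions` is a hypothetical helper, not part of the model's API, and it assumes `model.answer_question(image, question, tokenizer)` returns token ids that `tokenizer.decode` accepts, as in the snippet above.

```python
def ask_questions(model, tokenizer, image, questions):
    """Return a dict mapping each question to the model's decoded answer.

    Assumes `model` exposes answer_question(image, question, tokenizer)
    as used in the usage snippet; nothing else about the model is assumed.
    """
    answers = {}
    for question in questions:
        # One generation pass per question, all on the same image
        output_ids = model.answer_question(image, question, tokenizer)
        answers[question] = tokenizer.decode(output_ids, skip_special_tokens=True)
    return answers
```

With the `model`, `tokenizer`, and `image` objects loaded as above, it could be called as `ask_questions(model, tokenizer, image, ["Describe the image", "Where is the person standing? answer briefly"])`.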
examples
| Image | Examples |
|---|---|
| ![]() | **What is the title of this book? answer briefly**<br>The title of the book is "The Little Book of Deep Learning".<br><br>**Where is the person standing? answer briefly**<br>The person is standing on the balcony.<br><br>**Describe the image**<br>The image shows a person holding a book with a cityscape visible through the window behind them. The book has a cover with a title that reads "The Little Book of Deep Learning" in bold letters. |
| ![]() | **What type of food is the girl holding? answer briefly**<br>A hamburger!<br><br>**What color is the woman's hair? answer briefly**<br>It's white!<br><br>**Describe the image**<br>The image is of a young girl with short, curly hair and a sweet smile, holding a giant hamburger in her hand. She's sitting at a table with a festive dinner setting, surrounded by candles and a warm glow. Her eyes are shining with excitement and contentment as she takes a big bite of the burger. |
acknowledgements
.x+=:.
z` ^% .uef^"
.u . . <k .u . :d88E
.u@u .d88B :@8c .u .@8Ned8" .u u .d88B :@8c . `888E
.zWF8888bx ="8888f8888r ud8888. .@^%8888" ud8888. us888u. ="8888f8888r .udR88N 888E .z8k
.888 9888 4888>'88" :888'8888. x88: `)8b. :888'8888. .@88 "8888" 4888>'88" <888'888k 888E~?888L
I888 9888 4888> ' d888 '88%" 8888N=*8888 d888 '88%" 9888 9888 4888> ' 9888 'Y" 888E 888E
I888 9888 4888> 8888.+" %8" R88 8888.+" 9888 9888 4888> 9888 888E 888E
I888 9888 .d888L .+ 8888L @8Wou 9% 8888L 9888 9888 .d888L .+ 9888 888E 888E
`888Nx?888 ^"8888*" '8888c. .+ .888888P` '8888c. .+ 9888 9888 ^"8888*" ?8888u../ 888E 888E
"88" '888 "Y" "88888% ` ^"F "88888% "888*""888" "Y" "8888P' m888N= 888>
88E "YP' "YP' ^Y" ^Y' "P' `Y" 888
98> J88"
'8 @%
` :"