How to use BK-Lee/CoLLaVO-7B with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("image-text-to-text", model="BK-Lee/CoLLaVO-7B")
# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("BK-Lee/CoLLaVO-7B", dtype="auto")
How to use BK-Lee/CoLLaVO-7B with vLLM:
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "BK-Lee/CoLLaVO-7B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "BK-Lee/CoLLaVO-7B",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
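The same request can be built from Python with the standard library alone. This is a minimal sketch, not part of the model card: the helper name `build_completion_request` is hypothetical, the URL, model name, and parameters are copied from the curl call above, and it assumes a server is already listening on that port. Whether the served model handles a text-only prompt well is not confirmed by the source.

```python
import json
import urllib.request


def build_completion_request(base_url, model, prompt, max_tokens=512, temperature=0.5):
    """Build (but do not send) a POST request for the /v1/completions endpoint."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_completion_request(
    "http://localhost:8000", "BK-Lee/CoLLaVO-7B", "Once upon a time,"
)

# Sending the request needs a running server, so it is left commented out:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

For the SGLang server, the only change would be the base URL (`http://localhost:30000`).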
How to use BK-Lee/CoLLaVO-7B with SGLang:
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "BK-Lee/CoLLaVO-7B" \
  --host 0.0.0.0 \
  --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "BK-Lee/CoLLaVO-7B",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
# Alternatively, start the SGLang server with Docker, then call it with the same curl command as above:
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
  --model-path "BK-Lee/CoLLaVO-7B" \
  --host 0.0.0.0 \
  --port 30000
How to use BK-Lee/CoLLaVO-7B with Docker Model Runner:
docker model run hf.co/BK-Lee/CoLLaVO-7B
This repository contains the weights of the model presented in CoLLaVO: Crayon Large Language and Vision mOdel.
You need only the following seven steps.
# Step 1: clone the CoLLaVO code
git clone https://github.com/ByungKwanLee/CoLLaVO
# Step 2: install the dependencies
bash install
# Step 3: open the image as a tensor and set the text prompt
from PIL import Image
from torchvision.transforms import Resize
from torchvision.transforms.functional import pil_to_tensor
image_path = "figures/crayon_image.jpg"
image = Resize(size=(490, 490), antialias=False)(pil_to_tensor(Image.open(image_path)))
prompt = "Describe this image in detail."
# Step 4: load CoLLaVO and the segmentation model with their processors
from collavo.load_collavo import prepare_collavo
collavo_model, collavo_processor, seg_model, seg_processor = prepare_collavo(collavo_path='BK-Lee/CoLLaVO-7B', bits=4, dtype='fp16')
# Step 5: build the model inputs from the image and prompt
collavo_inputs = collavo_model.demo_process(image=image,
                                            prompt=prompt,
                                            processor=collavo_processor,
                                            seg_model=seg_model,
                                            seg_processor=seg_processor,
                                            device='cuda:0')
# Step 6: generate the answer tokens
import torch
with torch.inference_mode():
    generate_ids = collavo_model.generate(**collavo_inputs, do_sample=True, temperature=0.9, top_p=0.95, max_new_tokens=256, use_cache=True)
# Step 7: decode the tokens and keep the text before the first '[U'
answer = collavo_processor.batch_decode(generate_ids, skip_special_tokens=True)[0].split('[U')[0]
print(answer)
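The final decoding line keeps only the text that precedes the first `[U` in the decoded string. The sketch below isolates that truncation so its behavior is easy to check without loading the model. The helper name `truncate_at_marker` and the sample strings are made up for illustration, and the source does not say what text follows the marker in real outputs.

```python
def truncate_at_marker(decoded: str, marker: str = "[U") -> str:
    """Return the part of `decoded` before the first `marker`.

    If the marker is absent, the whole string is returned unchanged.
    """
    return decoded.split(marker)[0]


# Hypothetical decoded strings, not real model output.
with_marker = truncate_at_marker("A crayon drawing of a house.[Uxyz] leftover")
without_marker = truncate_at_marker("A crayon drawing of a house.")

print(with_marker)
print(without_marker)
```

Because `str.split` always returns at least one element, indexing `[0]` is safe even when the marker never appears.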