Instructions to use SkunkworksAI/BakLLaVA-1 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use SkunkworksAI/BakLLaVA-1 with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="SkunkworksAI/BakLLaVA-1")
```

```python
# Load model directly
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("SkunkworksAI/BakLLaVA-1", dtype="auto")
```
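Once the pipeline is created, a minimal generation call looks like the sketch below. Note the discussion further down: Transformers may not recognize the `llava_mistral` architecture, in which case loading itself fails.

```python
# Minimal usage sketch for the pipeline created above.
# Assumes the pipeline loaded successfully; see the discussion
# below if you hit KeyError: 'llava_mistral'.
output = pipe("Once upon a time,", max_new_tokens=64)
print(output[0]["generated_text"])
```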
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use SkunkworksAI/BakLLaVA-1 with vLLM:
Install from pip and serve the model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "SkunkworksAI/BakLLaVA-1"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "SkunkworksAI/BakLLaVA-1",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```
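The same endpoint can also be called from Python with the OpenAI client instead of curl. A minimal sketch; the base URL and placeholder API key follow vLLM's OpenAI-compatible server defaults:

```python
# Query the vLLM OpenAI-compatible server from Python.
# Assumes the server started above is running on localhost:8000.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
completion = client.completions.create(
    model="SkunkworksAI/BakLLaVA-1",
    prompt="Once upon a time,",
    max_tokens=512,
    temperature=0.5,
)
print(completion.choices[0].text)
```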
- SGLang
How to use SkunkworksAI/BakLLaVA-1 with SGLang:
Install from pip and serve the model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "SkunkworksAI/BakLLaVA-1" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "SkunkworksAI/BakLLaVA-1",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```

Use Docker images
```shell
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "SkunkworksAI/BakLLaVA-1" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "SkunkworksAI/BakLLaVA-1",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```
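The curl call above can equally be issued from Python with the requests library. A minimal sketch assuming the server is reachable on localhost:30000:

```python
# POST the completion request to the SGLang server,
# mirroring the curl call above.
import requests

response = requests.post(
    "http://localhost:30000/v1/completions",
    json={
        "model": "SkunkworksAI/BakLLaVA-1",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5,
    },
)
print(response.json()["choices"][0]["text"])
```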
- Docker Model Runner
How to use SkunkworksAI/BakLLaVA-1 with Docker Model Runner:
```shell
docker model run hf.co/SkunkworksAI/BakLLaVA-1
```
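A one-shot prompt can also be passed directly on the command line instead of starting an interactive chat. A sketch based on Docker Model Runner's CLI; exact behavior may vary by version:

```shell
# Run a single prompt against the model and print the completion
docker model run hf.co/SkunkworksAI/BakLLaVA-1 "Once upon a time,"
```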
Inference Endpoints throwing an error
Hi,
I'm trying to run the model through HF Inference Endpoints for a quick POC. I'm running into this particular issue:
```
2023/10/19 17:56:21 ~ INFO | No custom pipeline found at /repository/handler.py
2023/10/19 17:56:21 ~ INFO | Using device GPU
2023/10/19 17:56:21 ~ 2023-10-19 15:56:21,563 | INFO | Initializing model from directory:/repository
2023/10/19 17:56:21 ~ Traceback (most recent call last):
2023/10/19 17:56:21 ~   File "/opt/conda/lib/python3.9/site-packages/starlette/routing.py", line 705, in lifespan
2023/10/19 17:56:21 ~     async with self.lifespan_context(app) as maybe_state:
2023/10/19 17:56:21 ~   File "/opt/conda/lib/python3.9/site-packages/starlette/routing.py", line 584, in __aenter__
2023/10/19 17:56:21 ~     await self._router.startup()
2023/10/19 17:56:21 ~   File "/opt/conda/lib/python3.9/site-packages/starlette/routing.py", line 682, in startup
2023/10/19 17:56:21 ~     await handler()
2023/10/19 17:56:21 ~   File "/app/webservice_starlette.py", line 57, in some_startup_task
2023/10/19 17:56:21 ~     inference_handler = get_inference_handler_either_custom_or_default_handler(HF_MODEL_DIR, task=HF_TASK)
2023/10/19 17:56:21 ~   File "/app/huggingface_inference_toolkit/handler.py", line 45, in get_inference_handler_either_custom_or_default_handler
2023/10/19 17:56:21 ~     return HuggingFaceHandler(model_dir=model_dir, task=task)
2023/10/19 17:56:21 ~   File "/app/huggingface_inference_toolkit/handler.py", line 17, in __init__
2023/10/19 17:56:21 ~     self.pipeline = get_pipeline(model_dir=model_dir, task=task)
2023/10/19 17:56:21 ~   File "/app/huggingface_inference_toolkit/utils.py", line 261, in get_pipeline
2023/10/19 17:56:21 ~   File "/opt/conda/lib/python3.9/site-packages/transformers/pipelines/__init__.py", line 705, in pipeline
2023/10/19 17:56:21 ~     config = AutoConfig.from_pretrained(model, _from_pipeline=task, **hub_kwargs, **model_kwargs)
2023/10/19 17:56:21 ~   File "/opt/conda/lib/python3.9/site-packages/transformers/models/auto/configuration_auto.py", line 998, in from_pretrained
2023/10/19 17:56:21 ~   File "/opt/conda/lib/python3.9/site-packages/transformers/models/auto/configuration_auto.py", line 710, in __getitem__
2023/10/19 17:56:21 ~ KeyError: 'llava_mistral'
2023/10/19 17:56:21 ~ Application startup failed. Exiting.
```
Can you let me know if there's anything going wrong with my setup?
Thanks!
@kk-envision I made a notebook which you can use for inference.
I have no idea what "Inference Endpoints" is, but you can build your own API with this.
Let me know if this was helpful.
@kk-envision Simply put, Transformers doesn't have `llava_mistral` support. You have to use the BakLLaVA repository.
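For illustration, loading through that codebase looks roughly like the sketch below. It assumes the BakLLaVA/LLaVA repository is installed and provides the `llava` package; the helper names follow that codebase and aren't verified here:

```python
# Sketch: load BakLLaVA-1 through the LLaVA codebase instead of
# transformers' Auto classes. Assumes the `llava` package from the
# BakLLaVA/LLaVA repository is installed.
from llava.model.builder import load_pretrained_model
from llava.mm_utils import get_model_name_from_path

model_path = "SkunkworksAI/BakLLaVA-1"
tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path=model_path,
    model_base=None,
    model_name=get_model_name_from_path(model_path),
)
```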