I am unable to deploy the model for inference.
The logs do not show an error, but an error is thrown:
Not sure how to fix this.
Logs (from the end):
......
......
2024/04/12 00:18:35 ~ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 168.4/168.4 MB 213.0 MB/s eta 0:00:00
2024/04/12 00:18:35 ~ Downloading nvidia_curand_cu11-10.2.10.91-py3-none-manylinux1_x86_64.whl (54.6 MB)
2024/04/12 00:18:35 ~ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 54.6/54.6 MB 221.1 MB/s eta 0:00:00
2024/04/12 00:18:35 ~ Downloading nvidia_cusolver_cu11-11.4.0.1-2-py3-none-manylinux1_x86_64.whl (102.6 MB)
2024/04/12 00:18:35 ~ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 102.6/102.6 MB 208.2 MB/s eta 0:00:00
2024/04/12 00:18:35 ~ Downloading nvidia_cusparse_cu11-11.7.4.91-py3-none-manylinux1_x86_64.whl (173.2 MB)
2024/04/12 00:18:36 ~ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 173.2/173.2 MB 186.2 MB/s eta 0:00:00
2024/04/12 00:18:36 ~ Downloading nvidia_nccl_cu11-2.14.3-py3-none-manylinux1_x86_64.whl (177.1 MB)
2024/04/12 00:18:37 ~ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 177.1/177.1 MB 164.4 MB/s eta 0:00:00
2024/04/12 00:18:37 ~ Downloading nvidia_nvtx_cu11-11.7.91-py3-none-manylinux1_x86_64.whl (98 kB)
2024/04/12 00:18:37 ~ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 98.6/98.6 kB 342.7 MB/s eta 0:00:00
2024/04/12 00:18:37 ~ Downloading triton-2.0.0-1-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (63.3 MB)
2024/04/12 00:18:38 ~ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 63.3/63.3 MB 202.9 MB/s eta 0:00:00
2024/04/12 00:18:38 ~ Downloading tokenizers-0.13.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.8 MB)
2024/04/12 00:18:38 ~ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7.8/7.8 MB 143.7 MB/s eta 0:00:00
2024/04/12 00:18:38 ~ Downloading lit-18.1.3-py3-none-any.whl (96 kB)
2024/04/12 00:18:38 ~ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 96.4/96.4 kB 368.4 MB/s eta 0:00:00
2024/04/12 00:18:46 ~ Installing collected packages: tokenizers, lit, nvidia-nvtx-cu11, nvidia-nccl-cu11, nvidia-cusparse-cu11, nvidia-curand-cu11, nvidia-cufft-cu11, nvidia-cuda-runtime-cu11, nvidia-cuda-nvrtc-cu11, nvidia-cuda-cupti-cu11, nvidia-cublas-cu11, nvidia-cusolver-cu11, nvidia-cudnn-cu11, transformers, triton, torch, accelerate
2024/04/12 00:18:46 ~ Attempting uninstall: tokenizers
2024/04/12 00:18:46 ~ Found existing installation: tokenizers 0.15.2
2024/04/12 00:18:46 ~ Uninstalling tokenizers-0.15.2:
2024/04/12 00:18:46 ~ Successfully uninstalled tokenizers-0.15.2
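
The log stops right after pip uninstalls tokenizers 0.15.2 (it had downloaded tokenizers-0.13.3 just before), and the error itself is not shown, so whether this version change is related to the failure is only a guess. As a minimal sketch for narrowing it down, assuming you can run Python inside the same environment, something like the following would print which versions actually ended up installed. The package list is copied from the "Installing collected packages" line, and `installed_versions` is a hypothetical helper name, not part of any library.

```python
from importlib.metadata import PackageNotFoundError, version

# Package names taken from the "Installing collected packages" log line;
# extend the list with anything else you want to check.
PACKAGES = ["tokenizers", "transformers", "torch", "accelerate", "triton"]


def installed_versions(names):
    """Return a dict mapping each package name to its installed version,
    or to None when the package is not installed in this environment."""
    result = {}
    for name in names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in installed_versions(PACKAGES).items():
        print(f"{name}: {ver or 'not installed'}")
```

Posting this output together with the actual error message would make it easier for others to tell whether the dependency versions matter here.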
Hello,
Did you find a solution? I have the same problem.
