Deploy the model on a cloud machine
Hello all, I tried downloading the model locally. After the download finished, I tried to run the sample code and it showed an error related to the offload folder path. I did not manage to solve it; actually, I don't know what that is.
So I'm trying to deploy the model on a virtual machine with suitable specs. I am using RunPod,
and I get this error on the 6th model download:
ERROR text_generation launcher: An error occurred while downloading using hf_transfer. Consider disabling HF_HUB_ENABLE_HF_TRANSFER for better error handling.
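The error message itself suggests disabling hf_transfer. A minimal sketch of doing that from Python before fetching the files, assuming the `huggingface_hub` package is installed; `download_model` is a hypothetical helper name, and in a container deployment the same variable would instead be set in the container's environment:

```python
import os

# huggingface_hub reads this flag when it is first imported,
# so set it before any import of huggingface_hub or transformers.
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "0"


def download_model(repo_id="inception-mbzuai/jais-13b-chat", local_dir=None):
    """Download every file of the repo with the default (non-hf_transfer) downloader."""
    # Imported lazily so the environment variable above is already in place.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, local_dir=local_dir)
```

Calling `download_model()` returns the local path of the downloaded snapshot. With hf_transfer off, a failed download should at least report the underlying error more clearly.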
Can anyone help with either issue: how to use it locally (a step-by-step guide for regular laptops), or the steps to deploy it on the cloud and use it through an API?
Thanks
Hi @MazenSiraj ,
The issue with the offload folder can be solved by adding offload_folder='offload':
self.model = AutoModelForCausalLM.from_pretrained(path, device_map="auto", offload_folder='offload', trust_remote_code=True)
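As a standalone sketch of the same fix, assuming `transformers` and `accelerate` are installed (`device_map="auto"` relies on accelerate); `build_load_kwargs` and `load_model` are hypothetical helper names, not part of the model's sample code:

```python
import os


def build_load_kwargs(offload_dir="offload"):
    """Create the offload directory and return the from_pretrained keyword arguments."""
    # Weights that fit neither on the GPU nor in RAM are written to this folder.
    os.makedirs(offload_dir, exist_ok=True)
    return {
        "device_map": "auto",
        "offload_folder": offload_dir,
        "trust_remote_code": True,
    }


def load_model(path, offload_dir="offload"):
    """Load the model from a local path or repo ID with disk offload enabled."""
    # Imported lazily so the helper above can be used without transformers installed.
    from transformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(path, **build_load_kwargs(offload_dir))
```

For example, `model = load_model("inception-mbzuai/jais-13b-chat")`. Offloading to disk lets the load succeed on smaller machines, but generation is likely to be slow.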
I have submitted a pull request so that the model can be deployed on a Hugging Face Inference Endpoint:
https://huggingface.co/inception-mbzuai/jais-13b-chat/discussions/12
While the PR is being reviewed, you can check out my copy of this model, which already has those changes (see the Deploy button in the top right corner).
Please note that you will need a beefy machine to run it. I was able to run it on GPU [large] · 4x Nvidia Tesla T4, which costs $4.50 per hour; the small and medium machines were not able to run it.
https://huggingface.co/poiccard/jais-13b-chat-adn
Hi @poiccard ,
Thank you so much, I will check it. May I ask: I tried to run it on my machine and it ran, but every time I run the sample code it downloads again. Is that expected?
If you could help with the steps to run the model and use it, that would be very helpful.
Thanks

@poiccard this is what I get every time I run the sample code: it starts downloading all over again. I don't think this is how it should go, correct?
Hi,
How did you clone it? Make sure you have actually downloaded the .bin files, not just the references:
git lfs install
git clone https://huggingface.co/inception-mbzuai/jais-13b-chat
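One way to check this is to look for Git LFS pointer files. These are small text files that begin with the line `version https://git-lfs.github.com/spec/v1` and stand in for the real weights when LFS was not active during the clone. A minimal sketch, where `find_lfs_pointers` is a hypothetical helper and the `*.bin` pattern is an assumption about how the weight files are named:

```python
from pathlib import Path

# First bytes of every Git LFS pointer file, per the Git LFS pointer format.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def find_lfs_pointers(repo_dir, pattern="*.bin"):
    """Return the names of files in repo_dir that are LFS pointers, not real weights."""
    pointers = []
    for path in sorted(Path(repo_dir).glob(pattern)):
        # Only read the first few bytes; real weight files can be several GB.
        with open(path, "rb") as f:
            head = f.read(len(LFS_POINTER_PREFIX))
        if head == LFS_POINTER_PREFIX:
            pointers.append(path.name)
    return pointers
```

If `find_lfs_pointers("jais-13b-chat")` returns any names, running `git lfs pull` inside the cloned directory should fetch the actual files.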
This model is big and is divided into pieces (shards). What it does next is load those shards into memory (so it is not downloading, but loading).
You can read more here:
https://huggingface.co/docs/accelerate/v0.19.0/en/usage_guides/big_modeling
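To see which shards the loader will go through, you can read the checkpoint index. This is a minimal sketch, assuming the repository follows the usual Transformers sharded-checkpoint layout, with a `pytorch_model.bin.index.json` file whose `weight_map` maps each parameter name to a shard file; `list_shards` is a hypothetical helper:

```python
import json
from pathlib import Path


def list_shards(model_dir, index_name="pytorch_model.bin.index.json"):
    """Return the sorted, de-duplicated shard file names listed in the checkpoint index."""
    index_path = Path(model_dir) / index_name
    index = json.loads(index_path.read_text())
    # weight_map maps parameter name -> shard file; many parameters share one shard.
    return sorted(set(index["weight_map"].values()))
```

If every file returned by `list_shards` already exists on disk at full size, a progress bar shown at startup most likely reflects loading those shards rather than downloading them again.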
I was not able to launch this model on my machine, but I got in contact with the model creators, and inshallah we will be working on improvements.
In the meantime, as I mentioned previously, you can deploy my version of the model on a Hugging Face Inference Endpoint (4.5 USD per hour; you can put it to sleep when you don't need it).