Instructions to use HuggingFaceH4/zephyr-7b-beta with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use HuggingFaceH4/zephyr-7b-beta with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="HuggingFaceH4/zephyr-7b-beta")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")
model = AutoModelForCausalLM.from_pretrained("HuggingFaceH4/zephyr-7b-beta")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

- Inference
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use HuggingFaceH4/zephyr-7b-beta with vLLM:
Install from pip and serve the model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "HuggingFaceH4/zephyr-7b-beta"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "HuggingFaceH4/zephyr-7b-beta",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```

Use Docker

```shell
docker model run hf.co/HuggingFaceH4/zephyr-7b-beta
```
- SGLang
How to use HuggingFaceH4/zephyr-7b-beta with SGLang:
Install from pip and serve the model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "HuggingFaceH4/zephyr-7b-beta" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "HuggingFaceH4/zephyr-7b-beta",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```

Use Docker images
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "HuggingFaceH4/zephyr-7b-beta" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "HuggingFaceH4/zephyr-7b-beta",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```

- Docker Model Runner
How to use HuggingFaceH4/zephyr-7b-beta with Docker Model Runner:
```shell
docker model run hf.co/HuggingFaceH4/zephyr-7b-beta
```
Getting NULL response using curl or PHP
I am going through some AI learning and decided to try a practical exercise, so I created a Space with a chat template using this model.

The chat in my Space works and responds to my messages, which is great, but when I call the API with curl from the terminal, or from a PHP script on my localhost, I get null.

I followed the guide and configured the API request with the constructed data (an array with the required values: data[message, system_message, max_tokens, temperature, top_p]). I get an event_id in the response, but when I try to fetch the result using that event_id against the appropriate API URL, all I get is null.

I have tried with and without my Hugging Face token, and with and without a session ID. I interrogated the logs, which strangely suggest six values are expected in my array, so I added the app name as the sixth value, but that changed nothing. The error suggests the second parameter should be a State object, but there is no documentation on what this value should be. I also used curl from the terminal with the event_id, but I still get null. Below are the errors I get when using curl or my PHP script.
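For reference, the two-step flow described above can be sketched as follows. This is only a sketch, not the Space's confirmed API: the endpoint path (`/gradio_api/call/respond` on recent Gradio versions; older ones use `/call/respond`), the exact input order, and the empty-list placeholder for the `gr.State` chat history in position 2 are assumptions; the Space's own "Use via API" page is authoritative.

```python
import json

# Hypothetical Space endpoint and api_name -- replace with your own.
SPACE_URL = "https://your-username-your-space.hf.space"
API_NAME = "respond"

# The handler expects SIX inputs. The second one is the gr.State holding the
# chat history; an empty list is a common placeholder for a fresh session
# (an assumption -- verify the order on your Space's "Use via API" page).
payload = {
    "data": [
        "hello, its 20:29:10",                     # message (Textbox)
        [],                                        # chat history (State) -- the missing 2nd value
        "You are an AI assistant. Keep responses short and specific.",  # system message (Textbox)
        512,                                       # max_tokens (Slider)
        0.1,                                       # temperature (Slider)
        0.1,                                       # top_p (Slider)
    ]
}
body = json.dumps(payload)

# Step 1 (not executed here): POST `body` to
#   f"{SPACE_URL}/gradio_api/call/{API_NAME}"        -> {"event_id": "..."}
# Step 2: GET f"{SPACE_URL}/gradio_api/call/{API_NAME}/{event_id}"
# and read the server-sent-events stream until an "event: complete" frame
# carrying the result arrives.
print(body)
```

With only five values in `data`, the handler raises the "needed: 6, got: 5" error shown in the logs below; the app name is not one of the six inputs, which is why adding it did not help.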
HUGGINGFACE LOG:
```
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/gradio/queueing.py", line 622, in process_events
    response = await route_utils.call_process_api(
  File "/usr/local/lib/python3.10/site-packages/gradio/route_utils.py", line 323, in call_process_api
    output = await app.get_blocks().process_api(
  File "/usr/local/lib/python3.10/site-packages/gradio/blocks.py", line 2012, in process_api
    inputs = await self.preprocess_data(
  File "/usr/local/lib/python3.10/site-packages/gradio/blocks.py", line 1687, in preprocess_data
    self.validate_inputs(block_fn, inputs)
  File "/usr/local/lib/python3.10/site-packages/gradio/blocks.py", line 1669, in validate_inputs
    raise ValueError(
ValueError: An event handler (respond) didn't receive enough input values (needed: 6, got: 5).
Check if the event handler calls a Javascript function, and make sure its return value is correct.
Wanted inputs:
    [
      <gradio.components.textbox.Textbox object at 0x7f26ac926d10>,
      <gradio.components.state.State object at 0x7f26ac94f310>,
      <gradio.components.textbox.Textbox object at 0x7f26e928ffa0>,
      <gradio.components.slider.Slider object at 0x7f26ad6531c0>,
      <gradio.components.slider.Slider object at 0x7f26e928fee0>,
      <gradio.components.slider.Slider object at 0x7f26e928ff70>
    ]
Received inputs:
    ["hello, its 20:29:10", "You are an AI assistant. Keep responses short and specific.", 1, 0.1, 0.1]
```
PHP ERROR LOG:
```
Data: {"data":["hello, its 20:29:10","You are an AI assistant. Keep responses short and specific.",1,0.1,0.1]}, referer: http://localhost
Raw Response (Initial): {"event_id":"668db32d6e9e4d6e90af35848a873441"}\n, referer: http://localhost
Raw Response (Poll HTTP Code): 200, referer: http://localhost
Raw Response (Poll Body): event: error\ndata: null\n\n, referer: http://localhost
```
TERMINAL OUTPUT:
```
event: error
data: null
```
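The terminal output above is a server-sent-events (SSE) frame. A minimal parse of that frame shows the server is emitting an error event whose payload is JSON null, i.e. the handler failed before producing any output, which is consistent with the missing-input ValueError in the Space log rather than a transport problem:

```python
import json

# The exact SSE frame from the terminal output above.
raw = "event: error\ndata: null\n\n"

event, data = None, None
for line in raw.splitlines():
    if line.startswith("event:"):
        event = line.split(":", 1)[1].strip()
    elif line.startswith("data:"):
        # "null" decodes to Python None -- the handler produced no result.
        data = json.loads(line.split(":", 1)[1].strip())

# event is "error" and data is None: the call errored server-side,
# so polling the event_id returns null rather than a model response.
```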
I would greatly appreciate any suggestions on where to focus to isolate the cause.

Thanks.