Instructions to use HuggingFaceH4/zephyr-7b-beta with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use HuggingFaceH4/zephyr-7b-beta with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="HuggingFaceH4/zephyr-7b-beta")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")
model = AutoModelForCausalLM.from_pretrained("HuggingFaceH4/zephyr-7b-beta")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

- Inference
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use HuggingFaceH4/zephyr-7b-beta with vLLM:
Install from pip and serve the model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "HuggingFaceH4/zephyr-7b-beta"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "HuggingFaceH4/zephyr-7b-beta",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```

Use Docker

```shell
docker model run hf.co/HuggingFaceH4/zephyr-7b-beta
```
- SGLang
How to use HuggingFaceH4/zephyr-7b-beta with SGLang:
Install from pip and serve the model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "HuggingFaceH4/zephyr-7b-beta" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "HuggingFaceH4/zephyr-7b-beta",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```

Use Docker images
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "HuggingFaceH4/zephyr-7b-beta" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "HuggingFaceH4/zephyr-7b-beta",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```

- Docker Model Runner
How to use HuggingFaceH4/zephyr-7b-beta with Docker Model Runner:
```shell
docker model run hf.co/HuggingFaceH4/zephyr-7b-beta
```
Getting NULL response using curl or PHP
I am going through some AI learning and decided to try a practical exercise, so I created a Space with a chat template using this model.

The chat in my Space works and responds to my messages, which is great, but when I call the API with curl from the terminal, or from a PHP script on my localhost, I get null.

I followed the guide and configured the API request with the constructed data (an array with the required values: data[message, system_message, max_tokens, temperature, top_p]). I get an event_id in the response, but when I try to fetch the result using that event_id against the appropriate API URL, all I get is null.

I have tried with and without my Hugging Face token, and with and without a session ID. I interrogated the logs, which strangely suggest six values are expected in my array, so I added the app name as the sixth value, but that changed nothing. The error suggests the second parameter should be a State object, but there is no documentation on what this value should be. I also used curl from the terminal with the event_id, but I still get null. Below are the errors I get when using curl or my PHP script.
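For reference, the two-step flow described above can be sketched as follows. This is only a sketch, not the Space's confirmed API: the endpoint path (`/gradio_api/call/respond` on recent Gradio versions; older ones use `/call/respond`), the exact input order, and the empty-list placeholder for the `gr.State` chat history in position 2 are assumptions; the Space's own "Use via API" page is authoritative.

```python
import json

# Hypothetical Space endpoint and api_name -- replace with your own.
SPACE_URL = "https://your-username-your-space.hf.space"
API_NAME = "respond"

# The handler expects SIX inputs. The second one is the gr.State holding the
# chat history; an empty list is a common placeholder for a fresh session
# (an assumption -- verify the order on your Space's "Use via API" page).
payload = {
    "data": [
        "hello, its 20:29:10",                     # message (Textbox)
        [],                                        # chat history (State) -- the missing 2nd value
        "You are an AI assistant. Keep responses short and specific.",  # system message (Textbox)
        512,                                       # max_tokens (Slider)
        0.1,                                       # temperature (Slider)
        0.1,                                       # top_p (Slider)
    ]
}
body = json.dumps(payload)

# Step 1 (not executed here): POST `body` to
#   f"{SPACE_URL}/gradio_api/call/{API_NAME}"        -> {"event_id": "..."}
# Step 2: GET f"{SPACE_URL}/gradio_api/call/{API_NAME}/{event_id}"
# and read the server-sent-events stream until an "event: complete" frame
# carrying the result arrives.
print(body)
```

With only five values in `data`, the handler raises the "needed: 6, got: 5" error shown in the logs below; the app name is not one of the six inputs, which is why adding it did not help.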
HUGGINGFACE LOG:
```
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/gradio/queueing.py", line 622, in process_events
    response = await route_utils.call_process_api(
  File "/usr/local/lib/python3.10/site-packages/gradio/route_utils.py", line 323, in call_process_api
    output = await app.get_blocks().process_api(
  File "/usr/local/lib/python3.10/site-packages/gradio/blocks.py", line 2012, in process_api
    inputs = await self.preprocess_data(
  File "/usr/local/lib/python3.10/site-packages/gradio/blocks.py", line 1687, in preprocess_data
    self.validate_inputs(block_fn, inputs)
  File "/usr/local/lib/python3.10/site-packages/gradio/blocks.py", line 1669, in validate_inputs
    raise ValueError(
ValueError: An event handler (respond) didn't receive enough input values (needed: 6, got: 5).
Check if the event handler calls a Javascript function, and make sure its return value is correct.
Wanted inputs:
    [
      <gradio.components.textbox.Textbox object at 0x7f26ac926d10>,
      <gradio.components.state.State object at 0x7f26ac94f310>,
      <gradio.components.textbox.Textbox object at 0x7f26e928ffa0>,
      <gradio.components.slider.Slider object at 0x7f26ad6531c0>,
      <gradio.components.slider.Slider object at 0x7f26e928fee0>,
      <gradio.components.slider.Slider object at 0x7f26e928ff70>
    ]
Received inputs:
    ["hello, its 20:29:10", "You are an AI assistant. Keep responses short and specific.", 1, 0.1, 0.1]
```
PHP ERROR LOG:
```
Data: {"data":["hello, its 20:29:10","You are an AI assistant. Keep responses short and specific.",1,0.1,0.1]}, referer: http://localhost
Raw Response (Initial): {"event_id":"668db32d6e9e4d6e90af35848a873441"}\n, referer: http://localhost
Raw Response (Poll HTTP Code): 200, referer: http://localhost
Raw Response (Poll Body): event: error\ndata: null\n\n, referer: http://localhost
```
TERMINAL OUTPUT:
```
event: error
data: null
```
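The terminal output above is a server-sent-events (SSE) frame. A minimal parse of that frame shows the server is emitting an error event whose payload is JSON null, i.e. the handler failed before producing any output, which is consistent with the missing-input ValueError in the Space log rather than a transport problem:

```python
import json

# The exact SSE frame from the terminal output above.
raw = "event: error\ndata: null\n\n"

event, data = None, None
for line in raw.splitlines():
    if line.startswith("event:"):
        event = line.split(":", 1)[1].strip()
    elif line.startswith("data:"):
        # "null" decodes to Python None -- the handler produced no result.
        data = json.loads(line.split(":", 1)[1].strip())

# event is "error" and data is None: the call errored server-side,
# so polling the event_id returns null rather than a model response.
```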
I would greatly appreciate any suggestions on where to focus to isolate the cause.

Thanks.