Instructions to use anakin87/zephyr-7b-alpha-sharded with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use anakin87/zephyr-7b-alpha-sharded with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="anakin87/zephyr-7b-alpha-sharded")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("anakin87/zephyr-7b-alpha-sharded")
model = AutoModelForCausalLM.from_pretrained("anakin87/zephyr-7b-alpha-sharded", device_map="auto")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))


vLLM

How to use anakin87/zephyr-7b-alpha-sharded with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "anakin87/zephyr-7b-alpha-sharded"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "anakin87/zephyr-7b-alpha-sharded",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
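
The same request can also be sent from Python. The sketch below uses only the standard library and assumes the vLLM server started above is listening on localhost:8000. `build_chat_request` and `chat` are hypothetical helper names, not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "anakin87/zephyr-7b-alpha-sharded"


def build_chat_request(prompt, base_url="http://localhost:8000"):
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, base_url="http://localhost:8000"):
    """Send the prompt and return the reply text (requires a running server)."""
    with urllib.request.urlopen(build_chat_request(prompt, base_url)) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Example call, once the server is up:
# print(chat("What is the capital of France?"))
```

Passing `base_url="http://localhost:30000"` should work the same way against any other server that exposes the same OpenAI-compatible route.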

Use Docker

docker model run hf.co/anakin87/zephyr-7b-alpha-sharded

SGLang

How to use anakin87/zephyr-7b-alpha-sharded with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "anakin87/zephyr-7b-alpha-sharded" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "anakin87/zephyr-7b-alpha-sharded",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "anakin87/zephyr-7b-alpha-sharded" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "anakin87/zephyr-7b-alpha-sharded",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use anakin87/zephyr-7b-alpha-sharded with Docker Model Runner:
```
docker model run hf.co/anakin87/zephyr-7b-alpha-sharded
```

Zephyr 7B Alpha - Sharded

UPDATE: The original model (Zephyr 7B Alpha) has since been sharded as well, so you can now use the original model directly.

🧩🧩🧩 Just a sharded version of Zephyr 7B Alpha.

💻 Using this version, you can smoothly load the model on Colab and play with it!
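
For context, a sharded checkpoint is typically stored as several weight files plus a JSON index whose `weight_map` tells the loader which file holds each tensor, so the weights can be read one shard at a time rather than all at once. The sketch below groups a tiny, made-up index by shard. The tensor and file names are illustrative and are not copied from this repository.

```python
import json
from collections import defaultdict

# Hypothetical, heavily shortened index in the layout Transformers uses
# for sharded checkpoints (for example model.safetensors.index.json).
SAMPLE_INDEX = json.loads("""
{
  "metadata": {"total_size": 48},
  "weight_map": {
    "model.embed_tokens.weight": "model-00001-of-00002.safetensors",
    "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "lm_head.weight": "model-00002-of-00002.safetensors"
  }
}
""")


def tensors_by_shard(index):
    """Return {shard file name: sorted list of tensor names stored in it}."""
    shards = defaultdict(list)
    for tensor_name, shard_file in index["weight_map"].items():
        shards[shard_file].append(tensor_name)
    return {shard: sorted(names) for shard, names in shards.items()}


for shard, names in sorted(tensors_by_shard(SAMPLE_INDEX).items()):
    print(f"{shard}: {len(names)} tensor(s)")
```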

From the original model card:

Zephyr is a series of language models that are trained to act as helpful assistants. Zephyr-7B-α is the first model in the series, and is a fine-tuned version of mistralai/Mistral-7B-v0.1 that was trained on a mix of publicly available, synthetic datasets using Direct Preference Optimization (DPO). We found that removing the in-built alignment of these datasets boosted performance on MT Bench and made the model more helpful. However, this means that the model is likely to generate problematic text when prompted to do so and should only be used for educational and research purposes.

Usage

This version of the model is meant primarily to run smoothly on Colab. I suggest loading the model with 8-bit quantization, so that you have some free GPU memory left to perform inference.

However, it is perfectly fine to load the model in half-precision or with stronger quantization (4-bit).
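
As a rough guide to choosing between these options, the memory taken by the weights alone is about the parameter count times the bytes per parameter. The sketch below assumes a round 7 billion parameters and ignores activations, the KV cache, and quantization overhead, so real usage will be somewhat higher. `weight_memory_gib` is a hypothetical helper name.

```python
def weight_memory_gib(num_params, bits_per_param):
    """Approximate memory needed for the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3


NUM_PARAMS = 7e9  # assumption: a round "7B"; the exact parameter count differs slightly

for label, bits in [("half-precision (16-bit)", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: ~{weight_memory_gib(NUM_PARAMS, bits):.1f} GiB")
```

Under these assumptions the weights need roughly 13.0 GiB in half-precision, 6.5 GiB in 8-bit, and 3.3 GiB in 4-bit.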

! pip install transformers accelerate bitsandbytes

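from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, pipeline

# Load the weights in 8-bit via bitsandbytes.
# Recent Transformers versions deprecate passing load_in_8bit=True directly in favor of quantization_config.
model = AutoModelForCausalLM.from_pretrained(
    "anakin87/zephyr-7b-alpha-sharded",
    device_map="auto",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
)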
tokenizer = AutoTokenizer.from_pretrained("anakin87/zephyr-7b-alpha-sharded")

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

# We use the tokenizer's chat template to format each message - see https://huggingface.co/docs/transformers/main/en/chat_templating
messages = [
    {
        "role": "system",
        "content": "You are a friendly chatbot who always responds in the style of a rapper",
    },
    {"role": "user", "content": "What is GPU?"},
]
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])

#<|system|>
#You are a friendly chatbot who always responds in the style of a rapper</s>
#<|user|>
#What is GPU?</s>
#<|assistant|>
#Yo, what's up fam, you askin' 'bout the GPU?
#Well, let me break it down for you, it's a pretty sick dud
#It stands for Graphics Processing Unit, a tech that's quite rude
#This bad boy's the one that's in charge of all the graphics you see
#On your computer screen or your high-tech TV
#It's a powerful tool that can handle intense 3D games and movies
#And it's built to handle multiple tasks with ease
#So if you're looking to take your gaming or video editing to the next level
#Just make sure you've got a top-notch GPU to make it happen.
#Peace out!
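
The printed output above shows the prompt layout that the chat template produces: each turn is a `<|role|>` line, the content, and a closing `</s>`, followed by an open `<|assistant|>` turn. The function below is a hand-written approximation of that layout, inferred only from this output. It is not the tokenizer's actual template, and `render_zephyr_prompt` is a hypothetical name.

```python
def render_zephyr_prompt(messages, add_generation_prompt=True):
    """Format chat messages in the layout visible in the sample output."""
    parts = [f"<|{m['role']}|>\n{m['content']}</s>\n" for m in messages]
    if add_generation_prompt:
        # Leave an open assistant turn for the model to complete.
        parts.append("<|assistant|>\n")
    return "".join(parts)


print(render_zephyr_prompt([
    {"role": "system", "content": "You are a friendly chatbot who always responds in the style of a rapper"},
    {"role": "user", "content": "What is GPU?"},
]))
```

In real code, prefer `tokenizer.apply_chat_template`, which reads the template shipped with the model and stays correct if that template changes.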


Safetensors: model size 7B params, tensor type F16


Paper for anakin87/zephyr-7b-alpha-sharded: Direct Preference Optimization: Your Language Model is Secretly a Reward Model (arXiv 2305.18290, published May 29, 2023)