Instructions to use ai-agi/neural-zephyr with libraries and local apps.

Libraries

How to use ai-agi/neural-zephyr with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="ai-agi/neural-zephyr")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("ai-agi/neural-zephyr")
model = AutoModelForCausalLM.from_pretrained("ai-agi/neural-zephyr")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use ai-agi/neural-zephyr with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "ai-agi/neural-zephyr"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ai-agi/neural-zephyr",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
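
The same request can also be sent from Python. The sketch below uses only the standard library and assumes a server like the one started above is listening on localhost:8000. The helper names `build_chat_payload` and `send_chat_request` are illustrative and not part of vLLM.

```python
import json
import urllib.request


def build_chat_payload(model: str, prompt: str) -> dict:
    """Build the JSON body for an OpenAI-compatible chat completion request."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }


def send_chat_request(base_url: str, payload: dict, timeout: float = 60.0) -> dict:
    """POST the payload to <base_url>/v1/chat/completions and return the parsed JSON."""
    request = urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Requires a running server; raises urllib.error.URLError otherwise.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


payload = build_chat_payload("ai-agi/neural-zephyr", "What is the capital of France?")
```

With the server running, `send_chat_request("http://localhost:8000", payload)` would return the parsed response as a dictionary.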

Use Docker

docker model run hf.co/ai-agi/neural-zephyr

SGLang

How to use ai-agi/neural-zephyr with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "ai-agi/neural-zephyr" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ai-agi/neural-zephyr",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
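
Servers that expose the OpenAI-compatible chat completions API return the generated text under `choices[0].message.content`. A minimal sketch for extracting it follows. The helper name `extract_reply` is illustrative, and the sample response is a hand-written placeholder in that shape, not real model output.

```python
def extract_reply(response: dict) -> str:
    """Return the assistant text from an OpenAI-style chat completion response."""
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    return choices[0]["message"]["content"]


# Hand-written sample in the OpenAI response shape (placeholder, not real output).
sample_response = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "<model reply>"},
            "finish_reason": "stop",
        }
    ]
}
print(extract_reply(sample_response))
```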

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "ai-agi/neural-zephyr" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ai-agi/neural-zephyr",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use ai-agi/neural-zephyr with Docker Model Runner:
```
docker model run hf.co/ai-agi/neural-zephyr
```

Model Card for Neural-Zephyr Mistral 14B

Intel and Hugging Face developed two of the most prominent Mistral-type models released: Neural-Chat and Zephyr.

Neural-Zephyr is a hybrid transfer-learning model that combines the weights of two Mistral-type models, Neural-Chat and Zephyr. The weights are aggregated within the same layers, for a combined total of 14B parameters.
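
The card does not state which aggregation rule was used. As a rough illustration only, the sketch below assumes an element-wise weighted average of layers with matching names, which is one common way to merge two checkpoints of the same architecture. The function name `average_state_dicts`, the layer names, and the toy values are all hypothetical, and flat Python lists stand in for weight tensors.

```python
def average_state_dicts(state_a: dict, state_b: dict, weight_a: float = 0.5) -> dict:
    """Element-wise weighted average of two state dicts with identical keys and sizes."""
    if state_a.keys() != state_b.keys():
        raise ValueError("both models must expose the same layer names")
    merged = {}
    for name, values_a in state_a.items():
        values_b = state_b[name]
        if len(values_a) != len(values_b):
            raise ValueError(f"size mismatch in layer {name!r}")
        # Blend each parameter of this layer with the matching one from the other model.
        merged[name] = [
            weight_a * a + (1.0 - weight_a) * b
            for a, b in zip(values_a, values_b)
        ]
    return merged


# Toy example: flat lists stand in for the weight tensors of one layer.
neural_chat = {"layers.0.weight": [1.0, 2.0], "layers.0.bias": [0.0]}
zephyr = {"layers.0.weight": [3.0, 4.0], "layers.0.bias": [1.0]}
merged = average_state_dicts(neural_chat, zephyr)
print(merged)
```

Other aggregation schemes are possible, so treat this as one reading of "aggregated in the same layers" rather than the method actually used.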

Zephyr is a series of language models trained to act as helpful assistants. Zephyr-7B-β is the second model in the series, a fine-tuned version of mistralai/Mistral-7B-v0.1 trained on a mix of publicly available, synthetic datasets using Direct Preference Optimization (DPO). Removing the in-built alignment of these datasets made the model more helpful. However, it also means that the model is likely to generate problematic text when prompted to do so. You can find more details in the technical report.
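
For readers unfamiliar with DPO, the objective for a single preference pair can be sketched in a few lines. The code follows the loss from the DPO paper listed under Papers: the policy is rewarded for raising the log-probability of the chosen response relative to a frozen reference model, and lowering that of the rejected one. The function name, argument names, and the example numbers are illustrative, and this is not the training code used for Zephyr or Neural-Zephyr.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
) -> float:
    """DPO loss for one preference pair, given summed sequence log-probabilities."""
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) computed as a numerically stable softplus(-margin).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Made-up log-probabilities: the policy favors the chosen answer more than the reference does.
loss_improved = dpo_loss(-10.0, -12.0, -11.0, -11.0)
# When the policy equals the reference, the margin is zero and the loss is log(2).
loss_at_reference = dpo_loss(-11.0, -11.0, -11.0, -11.0)
print(loss_improved, loss_at_reference)
```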

Model description

Model type: A 14B parameter GPT-like model fine-tuned on a mix of publicly available, synthetic datasets.
Language(s) (NLP): Primarily English
License: MIT
Finetuned from model: mistralai/Mistral-7B-v0.1

Use in Transformers

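# Load model directly
import torch
from transformers import AutoTokenizer, MistralForCausalLM
from huggingface_hub import hf_hub_download

# Instantiate the model in bfloat16 and let device_map="auto" place it on the available devices
model = MistralForCausalLM.from_pretrained("ai-agi/neural-zephyr", use_cache=False, torch_dtype=torch.bfloat16, device_map="auto")
# Download the separate weights file and load it into the model
model_weights = hf_hub_download(repo_id="ai-agi/neural-zephyr", filename="model_weights.pth")
# map_location="cpu" keeps the loaded tensors in CPU RAM until they are copied into the model
state_dict = torch.load(model_weights, map_location="cpu")
model.load_state_dict(state_dict)

tokenizer = AutoTokenizer.from_pretrained("ai-agi/neural-zephyr", use_fast=True)
# Fall back to the EOS token when the tokenizer defines no padding token
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
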
Manage your GPU/CPU memory for model and weights
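
As a rough guide for memory planning, BF16 stores each parameter in 2 bytes, so the raw weights alone need about twice the parameter count in bytes. The sketch below is a back-of-the-envelope estimate, not a measurement. The function name is illustrative, and the figure excludes activations, the KV cache, and the extra copy held in `state_dict` while it is being loaded.

```python
def estimate_weight_memory_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory needed for the raw weights only, in GiB."""
    return num_params * bytes_per_param / (1024 ** 3)


# Both parameter counts that appear on this page, at 2 bytes per parameter (BF16).
for label, num_params in (("7B", 7e9), ("14B", 14e9)):
    print(f"{label} params in BF16: ~{estimate_weight_memory_gib(num_params):.1f} GiB")
```

Because the snippet above keeps a second copy of the weights in `state_dict`, peak usage during loading can be noticeably higher than this estimate. Deleting `state_dict` after `load_state_dict` frees that copy.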

Safetensors

Model size: 7B params
Tensor type: BF16

Model tree for ai-agi/neural-zephyr

Quantizations: 2 models

Papers for ai-agi/neural-zephyr

Zephyr: Direct Distillation of LM Alignment (arXiv:2310.16944, published Oct 25, 2023)

Direct Preference Optimization: Your Language Model is Secretly a Reward Model (arXiv:2305.18290, published May 29, 2023)