Instructions to use ai-agi/neural-zephyr with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use ai-agi/neural-zephyr with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("text-generation", model="ai-agi/neural-zephyr") messages = [ {"role": "user", "content": "Who are you?"}, ] pipe(messages)# Load model directly from transformers import AutoTokenizer, AutoModelForCausalLM tokenizer = AutoTokenizer.from_pretrained("ai-agi/neural-zephyr") model = AutoModelForCausalLM.from_pretrained("ai-agi/neural-zephyr") messages = [ {"role": "user", "content": "Who are you?"}, ] inputs = tokenizer.apply_chat_template( messages, add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="pt", ).to(model.device) outputs = model.generate(**inputs, max_new_tokens=40) print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:])) - Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use ai-agi/neural-zephyr with vLLM:
Install from pip and serve model
# Install vLLM from pip: pip install vllm # Start the vLLM server: vllm serve "ai-agi/neural-zephyr" # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:8000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "ai-agi/neural-zephyr", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }'Use Docker
docker model run hf.co/ai-agi/neural-zephyr
- SGLang
How to use ai-agi/neural-zephyr with SGLang:
Install from pip and serve model
# Install SGLang from pip: pip install sglang # Start the SGLang server: python3 -m sglang.launch_server \ --model-path "ai-agi/neural-zephyr" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "ai-agi/neural-zephyr", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }'Use Docker images
docker run --gpus all \ --shm-size 32g \ -p 30000:30000 \ -v ~/.cache/huggingface:/root/.cache/huggingface \ --env "HF_TOKEN=<secret>" \ --ipc=host \ lmsysorg/sglang:latest \ python3 -m sglang.launch_server \ --model-path "ai-agi/neural-zephyr" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "ai-agi/neural-zephyr", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }' - Docker Model Runner
How to use ai-agi/neural-zephyr with Docker Model Runner:
docker model run hf.co/ai-agi/neural-zephyr
Model Card for Neural-Zephyr Mistral 14B
Intel and Hugging Face developed two of the most prominent Mistral-type models released: Neural-Chat and Zephyr.
Neural-Zephyr is a hybrid Transfer Learning version joining Neural-Chat weights and Zephyr Mistral type models. The weights are aggregated in the same layers, summing up 14B parameters.
Zephyr is a series of language models that are trained to act as helpful assistants. Zephyr-7B-β is the second model in the series, and is a fine-tuned version of mistralai/Mistral-7B-v0.1 that was trained on a mix of publicly available, synthetic datasets using Direct Preference Optimization (DPO). and made the model more helpful. However, this means that model is likely to generate problematic text when prompted to do so. You can find more details in the technical report.
Model description
- Model type: A 14B parameter GPT-like model fine-tuned on a mix of publicly available, synthetic datasets.
- Language(s) (NLP): Primarily English
- License: MIT
- Finetuned from model: mistralai/Mistral-7B-v0.1
Use in Transformers
Load model directly
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, MistralForCausalLM
from huggingface_hub import hf_hub_download
model = MistralForCausalLM.from_pretrained("ai-agi/neural-zephyr", use_cache=False, torch_dtype=torch.bfloat16, device_map="auto")
model_weights = hf_hub_download(repo_id="ai-agi/neural-zephyr", filename="model_weights.pth")
state_dict = torch.load(model_weights)
model.load_state_dict(state_dict)
tokenizer = AutoTokenizer.from_pretrained("ai-agi/neural-zephyr", use_fast=True)
if tokenizer.pad_token is None:
tokenizer.pad_token = tokenizer.eos_token
Manage your GPU/CPU memory for model and weights
- Downloads last month
- 4
