zephyr-7b-alpha for DeepSparse
Usage
pip install "deepsparse-nightly[llm]"
from deepsparse import TextGeneration
model = TextGeneration(model="hf:mgoin/zephyr-7b-alpha-ds")
out = model("Once upon a time,", max_new_tokens=100)
print(out.generations[0].text)
### there was a young woman named Lily. She was a kind and gentle soul, with a heart full of love and compassion. Lily had always been fascinated by the natural world, and she spent most of her free time exploring the forests and fields around her home.\n\nOne day, as she was wandering through the woods, Lily stumbled upon a small clearing. In the center of the clearing, she saw a beautiful butterfly fluttering its wings. The butterfly was unlike any she had
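The usage snippet above passes a raw string to the pipeline. Since zephyr-7b-alpha is a chat-tuned model, a chat-style prompt may give better results for instruction-like inputs. The sketch below assumes the Zephyr prompt layout (`<|system|>`, `<|user|>`, `<|assistant|>` turns ending in `</s>`); check it against the tokenizer's chat template in the upstream repository before relying on it. The helper names `build_zephyr_prompt` and `generate_reply` are hypothetical and not part of DeepSparse.

```python
def build_zephyr_prompt(user_message: str, system_message: str = "") -> str:
    """Format a single user turn in the (assumed) Zephyr chat layout."""
    return (
        f"<|system|>\n{system_message}</s>\n"
        f"<|user|>\n{user_message}</s>\n"
        "<|assistant|>\n"
    )


def generate_reply(user_message: str, max_new_tokens: int = 100) -> str:
    """Run the chat-formatted prompt through the same pipeline as in Usage."""
    # Imported lazily so the prompt helper works without DeepSparse installed.
    from deepsparse import TextGeneration

    model = TextGeneration(model="hf:mgoin/zephyr-7b-alpha-ds")
    out = model(build_zephyr_prompt(user_message), max_new_tokens=max_new_tokens)
    return out.generations[0].text
```

Calling `generate_reply("Tell me a short story.")` should return only the assistant's continuation, since the prompt already ends with the assistant tag.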
How to export from zephyr-7b-alpha
Install SparseML with this PR
git clone https://huggingface.co/HuggingFaceH4/zephyr-7b-alpha
sparseml.transformers.export_onnx --model_path ./zephyr-7b-alpha --task text-generation --sequence_length 512 --trust_remote_code
cp deployment/model.onnx deployment/model-orig.onnx
python ~/onnx_kv_inject.py --input-file deployment/model-orig.onnx --output-file deployment/model.onnx
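The export steps above can also be scripted so they run in a fixed order and stop on the first failure. This is a minimal sketch, assuming the model has already been cloned to `./zephyr-7b-alpha`, that the export writes `deployment/model.onnx` relative to the working directory (as the `cp` step implies), and that `onnx_kv_inject.py` is in the home directory. The function names `export_commands` and `run_export` are hypothetical; the flags are copied from the commands above.

```python
import shutil
import subprocess
from pathlib import Path


def export_commands(model_dir: str, inject_script: str, deployment_dir: str = "deployment"):
    """Build the export and KV-cache injection commands as argument lists."""
    deployment = Path(deployment_dir)
    export_cmd = [
        "sparseml.transformers.export_onnx",
        "--model_path", model_dir,
        "--task", "text-generation",
        "--sequence_length", "512",
        "--trust_remote_code",
    ]
    inject_cmd = [
        "python", inject_script,
        "--input-file", str(deployment / "model-orig.onnx"),
        "--output-file", str(deployment / "model.onnx"),
    ]
    return export_cmd, inject_cmd


def run_export(model_dir: str = "./zephyr-7b-alpha", inject_script: str = "~/onnx_kv_inject.py") -> None:
    """Export to ONNX, keep a copy of the original graph, then inject the KV cache."""
    script = str(Path(inject_script).expanduser())
    export_cmd, inject_cmd = export_commands(model_dir, script)
    subprocess.run(export_cmd, check=True)  # raises CalledProcessError on failure
    shutil.copyfile("deployment/model.onnx", "deployment/model-orig.onnx")
    subprocess.run(inject_cmd, check=True)
```

Keeping command construction separate from execution makes it possible to print or inspect the commands without running the export.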