---
license: apache-2.0
---
# ZAYA1-74B-Preview
ZAYA1-74B-Preview is a mixture-of-experts language model with 4B active and 74B total parameters. This is a reasoning-base checkpoint that has not been tuned for chat and has not undergone RL post-training. ZAYA1-74B-Preview was trained end to end on AMD hardware.
Learn more on our [blog](https://www.zyphra.com/post/zaya1-74b-preview).
## Quickstart
### Prerequisites
We recommend installing the following libraries in a fresh Python environment (tested with Python 3.12).
To use ZAYA1-74B-Preview, install the `zaya1-pr` branch of our fork of the `vllm` library (this command triggers a full build of vLLM from source):
```bash
pip install "vllm @ git+https://github.com/Zyphra/vllm.git@zaya1-pr"
```
To run the model with transformers, also install the `zaya1` branch of our fork of the `transformers` library:
```bash
pip install "transformers @ git+https://github.com/Zyphra/transformers.git@zaya1"
```
### Deployment
To start the vLLM server, run the following command:
```bash
vllm serve Zyphra/ZAYA1-74B-Preview --port 8010 \
--mamba-cache-dtype float32 --dtype bfloat16 \
--reasoning-parser qwen3 --enable-auto-tool-choice --tool-call-parser zaya_xml
```
For multi-GPU deployment, we recommend data parallelism (DP) combined with expert parallelism (EP), because tensor parallelism (TP) for CCA is not supported in the branch above. When running on 8 GPUs, add the flags `-dp 8 -ep` to run with DP=EP=8.
Once the server is up, you can query the model with `curl`, as in the following example:
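As an illustration, the 8-GPU case could look like the sketch below. It simply appends the `-dp 8 -ep` flags mentioned above to the single-server command; every other flag is copied unchanged, and the assumption is that all 8 GPUs are visible on one node.

```bash
# Same flags as the single-server command, plus DP=EP=8 (assumes 8 GPUs on one node).
vllm serve Zyphra/ZAYA1-74B-Preview --port 8010 \
--mamba-cache-dtype float32 --dtype bfloat16 \
--reasoning-parser qwen3 --enable-auto-tool-choice --tool-call-parser zaya_xml \
-dp 8 -ep
```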
```bash
curl http://localhost:8010/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "Zyphra/ZAYA1-74B-Preview",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hello. How is it going?"}
]
}'
```
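Loading a model of this size can take a while, so scripts that send requests right after launching the server may want to wait until it responds. The following is a minimal sketch rather than part of vLLM: `wait_for_server` is a hypothetical helper name, and the 5-second polling interval and 600-second default timeout are arbitrary choices.

```bash
# Hypothetical helper (not part of vLLM): poll a URL until it answers with
# a successful HTTP status, or give up after a timeout in seconds.
wait_for_server() {
  local url="$1"
  local timeout="${2:-600}"   # assumed default; adjust to your hardware
  local waited=0
  # -s: silent, -f: treat HTTP errors as failures, --max-time: cap each attempt
  until curl -sf --max-time 5 -o /dev/null "$url"; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "server not ready after ${timeout}s" >&2
      return 1
    fi
    sleep 5
    waited=$((waited + 5))
  done
}
```

For example, `wait_for_server http://localhost:8010/v1/models 900` blocks until the OpenAI-compatible model listing endpoint responds, or returns a non-zero status after 15 minutes.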