Instructions to use Zyphra/Zamba2-7B-Instruct with libraries, inference providers, notebooks, and local apps.

Libraries

How to use Zyphra/Zamba2-7B-Instruct with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Zyphra/Zamba2-7B-Instruct")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-7B-Instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-7B-Instruct")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use Zyphra/Zamba2-7B-Instruct with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Zyphra/Zamba2-7B-Instruct"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Zyphra/Zamba2-7B-Instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/Zyphra/Zamba2-7B-Instruct

SGLang

How to use Zyphra/Zamba2-7B-Instruct with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Zyphra/Zamba2-7B-Instruct" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Zyphra/Zamba2-7B-Instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Zyphra/Zamba2-7B-Instruct" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Zyphra/Zamba2-7B-Instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use Zyphra/Zamba2-7B-Instruct with Docker Model Runner:
```
docker model run hf.co/Zyphra/Zamba2-7B-Instruct
```

Model type `zamba2` but Transformers does not recognize this architecture.

by hudsongouge - opened Nov 21, 2024

Discussion

hudsongouge

Nov 21, 2024

I used the example code provided here and the latest version of transformers and I am getting the following error:

ValueError: The checkpoint you are trying to load has model type `zamba2` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.
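
Since the message names an out-of-date Transformers as one possible cause, a quick first check is whether the installed package registers the `zamba2` model type at all. The following is a minimal sketch, not part of the original thread: the helper name `transformers_status` is hypothetical, and it assumes the auto-config registry `CONFIG_MAPPING` is importable from `transformers.models.auto.configuration_auto`, as it is in recent releases.

```python
import importlib.metadata


def transformers_status(model_type="zamba2"):
    """Return (installed_version, recognized) for the given model type.

    installed_version is None when transformers is not installed;
    recognized is None when the check could not be performed.
    """
    try:
        version = importlib.metadata.version("transformers")
    except importlib.metadata.PackageNotFoundError:
        return None, None
    try:
        # Registry that AutoConfig consults when mapping model_type to a config class.
        from transformers.models.auto.configuration_auto import CONFIG_MAPPING
    except Exception:
        # The package is present but could not be imported cleanly.
        return version, None
    return version, model_type in CONFIG_MAPPING


if __name__ == "__main__":
    version, recognized = transformers_status("zamba2")
    print(f"transformers version: {version}, recognizes zamba2: {recognized}")
```

If `recognized` comes back `False`, the installed build does not include the architecture, which matches the `ValueError` above.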

pglo

Nov 23, 2024

Could you first run `pip uninstall transformers` and then install the `transformers_zamba2` folder?

hudsongouge

Nov 24, 2024

> Could you first run `pip uninstall transformers` and then install the `transformers_zamba2` folder?

That wouldn't install correctly.

          if bare_metal_version >= Version("11.8"):
             ^^^^^^^^^^^^^^^^^^
      NameError: name 'bare_metal_version' is not defined

pglo

Nov 25, 2024

Can you share the traceback of this error? I.e., what's the code path that leads to it? That should give us an additional clue.

hudsongouge

Nov 25, 2024

> Can you share the traceback of this error? I.e., what's the code path that leads to it? That should give us an additional clue.

Collecting mamba-ssm==2.1.0 (from transformers==4.43.0.dev0)
  Using cached mamba_ssm-2.1.0.tar.gz (84 kB)
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error
  
  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [14 lines of output]
      /private/var/folders/mm/3p04lc0x4fsdqhnkk_3j68v80000gn/T/pip-install-jjl8_mo6/mamba-ssm_05bf2b6c5b3041ed97effd1b0dcf3a08/setup.py:119: UserWarning: mamba_ssm was requested, but nvcc was not found.  Are you sure your environment has nvcc available?  If you're installing within a container from https://hub.docker.com/r/pytorch/pytorch, only images whose names contain 'devel' will provide nvcc.
        warnings.warn(
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/private/var/folders/mm/3p04lc0x4fsdqhnkk_3j68v80000gn/T/pip-install-jjl8_mo6/mamba-ssm_05bf2b6c5b3041ed97effd1b0dcf3a08/setup.py", line 189, in <module>
          if bare_metal_version >= Version("11.8"):
             ^^^^^^^^^^^^^^^^^^
      NameError: name 'bare_metal_version' is not defined
      
      
      torch.__version__  = 2.5.1
      
      
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
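
The log above shows the `mamba-ssm` build warning that `nvcc` was not found just before the `NameError`, and the `/private/var/folders` temp path suggests the install was attempted on macOS, where a CUDA toolkit is generally not available. As a rough preflight before retrying the install, one could check for `nvcc` first. This is a sketch under assumptions: the helper names `find_nvcc` and `preflight_report` are hypothetical, and looking in `CUDA_HOME` / `CUDA_PATH` is a common convention rather than something confirmed by the thread.

```python
import os
import shutil
import sys


def find_nvcc(path=None, cuda_home=None):
    """Return the location of the nvcc executable, or None if it cannot be found."""
    # Search PATH first (or an explicit search path supplied by the caller).
    found = shutil.which("nvcc", path=path)
    if found:
        return found
    # Otherwise look under a CUDA install directory, taken from the argument
    # or from the CUDA_HOME / CUDA_PATH environment variables.
    if cuda_home is None:
        cuda_home = os.environ.get("CUDA_HOME") or os.environ.get("CUDA_PATH")
    if cuda_home:
        exe = "nvcc.exe" if os.name == "nt" else "nvcc"
        candidate = os.path.join(cuda_home, "bin", exe)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def preflight_report(path=None, cuda_home=None, platform=None):
    """Describe whether a CUDA source build looks feasible on this machine."""
    platform = platform or sys.platform
    nvcc = find_nvcc(path=path, cuda_home=cuda_home)
    if nvcc:
        return f"nvcc found at {nvcc}; a CUDA source build can at least start."
    if platform == "darwin":
        return "nvcc not found, and this is macOS; a CUDA source build is unlikely to work here."
    return "nvcc not found; install a CUDA toolkit or use an image that ships nvcc."


if __name__ == "__main__":
    print(preflight_report())
```

Running this before `pip install` only reports whether the compiler is visible; it does not guarantee that the `mamba-ssm` build itself will succeed.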
