Zen Embedding 8B GGUF

GGUF quantized 8B Zen Embedding model for high-quality semantic search on CPU.

Overview

This repository provides GGUF-quantized builds of the 8B Zen Embedding model for efficient CPU and mixed CPU/GPU inference with llama.cpp and compatible runtimes.
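As a rough illustration of why quantization matters for CPU deployment, the back-of-envelope calculation below compares on-disk sizes at 16 bits per weight (original precision) versus roughly 4.5 bits per weight, a typical average for Q4_K_M quants in llama.cpp. These are approximations, not exact file sizes:

```python
# Back-of-envelope model sizes for an 8B-parameter model.
# The bits-per-weight figures are typical approximations, not exact.
PARAMS = 8e9  # 8 billion parameters

def approx_size_gb(bits_per_weight: float) -> float:
    """Approximate on-disk size in gigabytes (1 GB = 1e9 bytes)."""
    return PARAMS * bits_per_weight / 8 / 1e9

fp16_gb = approx_size_gb(16.0)  # original 16-bit weights
q4_gb = approx_size_gb(4.5)     # Q4_K_M averages roughly 4.5 bits/weight

print(f"FP16: ~{fp16_gb:.1f} GB, Q4_K_M: ~{q4_gb:.1f} GB")
# FP16: ~16.0 GB, Q4_K_M: ~4.5 GB
```

The roughly 3.5x size reduction is what makes an 8B model practical to hold in ordinary desktop RAM.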

Developed by Hanzo AI and the Zoo Labs Foundation.

Quick Start

# Download and run with llama.cpp (embedding mode)
./llama-embedding -m zen-embedding-8B.Q4_K_M.gguf -p "Hello, how can I help you?"

# With llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="zenlm/zen-embedding-8B-GGUF",
    filename="*Q4_K_M.gguf",
    embedding=True,
)
output = llm.create_embedding("Hello, how can I help you?")
print(output["data"][0]["embedding"])
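Embedding vectors like the one returned above are typically compared with cosine similarity for semantic search. This standalone sketch uses toy vectors in place of real model output to show the ranking step:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy embeddings standing in for llm.create_embedding(...) output.
query = [0.9, 0.1, 0.0]
docs = {
    "doc_a": [0.8, 0.2, 0.1],
    "doc_b": [0.0, 0.1, 0.9],
}

# Rank documents by similarity to the query, highest first.
ranked = sorted(docs, key=lambda d: cosine_similarity(query, docs[d]), reverse=True)
print(ranked)  # ['doc_a', 'doc_b'] -- doc_a is closer to the query
```

In a real pipeline you would embed each document once, store the vectors, and embed only the query at search time.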

Model Details

| Attribute | Value |
|---|---|
| Parameters | 8B |
| Architecture | qwen3 |
| Format | GGUF (quantized) |
| Context | 8K tokens |
| License | Apache 2.0 |
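With an 8K-token context, long documents must be split before embedding. The sketch below is a minimal character-based chunker with overlap; character counts are only a rough proxy, since the real limit depends on the model's tokenizer:

```python
def chunk_text(text: str, max_chars: int = 2000, overlap: int = 200):
    """Split text into overlapping character windows -- a rough
    stand-in for token-aware chunking (exact limits depend on the
    model's tokenizer)."""
    chunks = []
    step = max_chars - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            break
    return chunks

chunks = chunk_text("x" * 5000, max_chars=2000, overlap=200)
print(len(chunks))  # 3 overlapping chunks
```

Each chunk is embedded separately, and the overlap helps avoid losing sentences that straddle a chunk boundary.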

License

Apache 2.0
