# Zen Embedding 8B GGUF
GGUF quantized 8B Zen Embedding model for high-quality semantic search on CPU.
## Overview
GGUF quantization for efficient CPU and mixed CPU/GPU inference using llama.cpp and compatible runtimes.
Developed by Hanzo AI and the Zoo Labs Foundation.
## Quick Start
```bash
# Compute an embedding with llama.cpp's embedding tool
./llama-embedding -m zen-embedding-8B.Q4_K_M.gguf -p "Hello, how can I help you?"
```
```python
# With llama-cpp-python, in embedding mode
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="zenlm/zen-embedding-8B-GGUF",
    filename="*Q4_K_M.gguf",
    embedding=True,
)
vector = llm.embed("Hello!")
print(len(vector))  # dimensionality of the embedding vector
```
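Once embeddings are computed, semantic search reduces to ranking documents by cosine similarity against the query vector. A minimal sketch of that ranking step, using toy vectors that stand in for `llm.embed(...)` outputs:

```python
import math

def cosine_similarity(a, b):
    # Dot product divided by the product of L2 norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def search(query_vec, doc_vecs):
    # Return document indices ranked by similarity to the query, best first.
    scores = [cosine_similarity(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)

# Toy vectors standing in for real embedding outputs.
query = [1.0, 0.0, 0.5]
docs = [[0.9, 0.1, 0.4], [0.0, 1.0, 0.0]]
ranking = search(query, docs)
print(ranking)
```

In practice the document vectors would be precomputed once with `llm.embed` and cached; only the query is embedded at search time.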
## Model Details
| Attribute | Value |
|---|---|
| Parameters | 8B |
| Format | GGUF (quantized) |
| Context | 8K tokens |
| License | Apache 2.0 |
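As a back-of-envelope sizing check for the quantized file, multiply parameter count by average bits per weight. The 4.8 bits-per-weight figure below is an assumed approximation for Q4_K_M (the real average varies with the tensor mix), not a value from this card:

```python
# Rough GGUF file-size estimate; bits_per_weight is an assumption.
params = 8e9               # 8B parameters (from the table above)
bits_per_weight = 4.8      # approximate average for Q4_K_M (assumed)
size_gib = params * bits_per_weight / 8 / 2**30
print(f"{size_gib:.1f} GiB")
```

This suggests the Q4_K_M file fits comfortably in the RAM of a typical 8 GB machine, leaving headroom for the 8K-token context.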
## License
Apache 2.0