How to use from llama.cpp
Install from brew
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf 754geg/kpok:IQ1_S
# Run inference directly in the terminal:
llama-cli -hf 754geg/kpok:IQ1_S
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf 754geg/kpok:IQ1_S
# Run inference directly in the terminal:
llama-cli -hf 754geg/kpok:IQ1_S
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf 754geg/kpok:IQ1_S
# Run inference directly in the terminal:
./llama-cli -hf 754geg/kpok:IQ1_S
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf 754geg/kpok:IQ1_S
# Run inference directly in the terminal:
./build/bin/llama-cli -hf 754geg/kpok:IQ1_S
Use Docker
docker model run hf.co/754geg/kpok:IQ1_S

MaziyarPanahi/WizardLM-2-8x22B-GGUF

Description

MaziyarPanahi/WizardLM-2-8x22B-GGUF contains GGUF format model files for microsoft/WizardLM-2-8x22B.

How to download

You can download only the quants you need instead of cloning the entire repository as follows:

huggingface-cli download MaziyarPanahi/WizardLM-2-8x22B-GGUF --local-dir . --include '*Q2_K*gguf'

On Windows:

huggingface-cli download MaziyarPanahi/WizardLM-2-8x22B-GGUF --local-dir . --include *Q4_K_S*gguf

Load sharded model

llama_load_model_from_file detects the number of shards and loads the remaining tensors from the other files, so pass only the first shard to -m:

llama.cpp/main -m WizardLM-2-8x22B.Q2_K-00001-of-00005.gguf -p "Building a website can be done in 10 simple steps:\nStep 1:" -n 1024 -e
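
Because only the first shard is named on the command line, a partial download is easy to miss. The following is a minimal sketch, not part of llama.cpp, that lists the shard names implied by the first file and reports any that are absent. The helper names expected_shards and missing_shards are hypothetical, and the sketch assumes the <prefix>-00001-of-000NN.gguf naming shown in the command above.

```shell
# Hypothetical helpers (not shipped with llama.cpp).
# Print every shard name implied by the first shard's file name.
# Assumes the <prefix>-00001-of-000NN.gguf naming pattern.
expected_shards() {
  first=$1
  case $first in
    *-[0-9][0-9][0-9][0-9][0-9]-of-[0-9][0-9][0-9][0-9][0-9].gguf) ;;
    *) echo "not a sharded GGUF name: $first" >&2; return 1 ;;
  esac
  stem=${first%.gguf}                  # drop the extension
  total=${stem##*-of-}                 # shard count with padding, e.g. 00005
  prefix=${stem%-*-of-*}               # everything before -00001-of-00005
  total=${total#"${total%%[!0]*}"}     # strip leading zeros so 00008 is not read as octal
  [ -n "$total" ] || return 1          # a count of 00000 is not valid
  i=1
  while [ "$i" -le "$total" ]; do
    printf '%s-%05d-of-%05d.gguf\n' "$prefix" "$i" "$total"
    i=$((i + 1))
  done
}

# Report shards missing from the current directory; non-zero exit if any are absent.
missing_shards() {
  names=$(expected_shards "$1") || return 1
  rc=0
  while IFS= read -r f; do
    [ -e "$f" ] || { echo "missing: $f"; rc=1; }
  done <<EOF
$names
EOF
  return "$rc"
}
```

Running missing_shards WizardLM-2-8x22B.Q2_K-00001-of-00005.gguf in the download directory before the command above prints one line per absent shard and exits non-zero if any are missing.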

Prompt template

{system_prompt}
USER: {prompt}
ASSISTANT: </s>

or

A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, 
detailed, and polite answers to the user's questions. USER: Hi ASSISTANT: Hello.</s>
USER: {prompt} ASSISTANT: </s>......
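
As a rough illustration of the single-line form of the template, the sketch below assembles a one-turn prompt string in the shell. The function name build_prompt and the variable DEFAULT_SYSTEM are hypothetical. The sketch also assumes that </s> is the end-of-sequence marker the model emits after its answer, so the prompt stops at "ASSISTANT:" and leaves the rest to the model.

```shell
# Default system prompt, copied from the template above.
DEFAULT_SYSTEM="A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions."

# Hypothetical helper: build a one-turn prompt.
#   $1: system prompt
#   $2: user message
# Assumption: the prompt ends at "ASSISTANT:" and the model appends its answer and </s>.
build_prompt() {
  printf '%s USER: %s ASSISTANT:' "$1" "$2"
}
```

The result can be passed to the -p option of the command in the previous section, for example -p "$(build_prompt "$DEFAULT_SYSTEM" "Hi")".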
Format: GGUF
Model size: 141B params
Architecture: llama