Text Generation
GGUF
llama.cpp
cohere2_moe
command-a
rocm
amd
mixture-of-experts
conversational
imatrix
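The patch that follows adds a Cohere Command chat format to llama.cpp, in which a completion may carry a `<|START_THINKING|>` block, a `<|START_ACTION|>` block holding a JSON array of tool calls, and a `<|START_RESPONSE|>` block. As a rough orientation only, the sketch below splits such a completion with plain string searches. It is not the PEG parser the patch builds: `extract_between`, `ends_with`, `cohere_sections`, and `split_cohere_output` are hypothetical names introduced here for illustration, and the action block is left as unparsed text.

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical helper (not part of llama.cpp): returns the text between the
// first `open` marker and the next `close` marker, or nullopt if either is missing.
static std::optional<std::string> extract_between(const std::string & s,
                                                  const std::string & open,
                                                  const std::string & close) {
    const std::size_t start = s.find(open);
    if (start == std::string::npos) {
        return std::nullopt;
    }
    const std::size_t body = start + open.size();
    const std::size_t stop = s.find(close, body);
    if (stop == std::string::npos) {
        return std::nullopt;
    }
    return s.substr(body, stop - body);
}

// Hypothetical helper: the same suffix comparison the patch uses to decide
// whether the generation prompt already ends with <|START_RESPONSE|>.
static bool ends_with(const std::string & s, const std::string & suffix) {
    return s.size() >= suffix.size() &&
           s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Hypothetical container for the three optional sections of a completion.
struct cohere_sections {
    std::optional<std::string> thinking;
    std::optional<std::string> action;   // raw JSON array text, left unparsed here
    std::optional<std::string> response;
};

static cohere_sections split_cohere_output(const std::string & out) {
    cohere_sections r;
    r.thinking = extract_between(out, "<|START_THINKING|>", "<|END_THINKING|>");
    r.action   = extract_between(out, "<|START_ACTION|>",   "<|END_ACTION|>");
    r.response = extract_between(out, "<|START_RESPONSE|>", "<|END_RESPONSE|>");
    return r;
}
```

Calling `split_cohere_output` on a completion that contains only thinking and action blocks leaves `response` empty, which corresponds to the tool-call-only case the patch handles when `tool_choice` is required. Unlike this sketch, the patch also accepts a bare response with no `<|START_RESPONSE|>` wrapper.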
diff --git a/common/chat.cpp b/common/chat.cpp
index 9639af9..971e6d6 100644
--- a/common/chat.cpp
+++ b/common/chat.cpp
static common_chat_params common_chat_params_init_deepseek_v3_2(const common_cha
     return data;
 }
+// Cohere Command (Cohere2 / command-a) tool-call + reasoning format:
+// <|START_THINKING|>...plan...<|END_THINKING|>
+// <|START_ACTION|>[{"tool_call_id":"0","tool_name":"X","parameters":{...}}]<|END_ACTION|>
+// <|START_RESPONSE|>...text...<|END_RESPONSE|>
+// The action block is a JSON array of objects keyed by tool_name/parameters, with a
+// generated integer tool_call_id, which maps directly onto standard_json_tools().
+static common_chat_params common_chat_params_init_cohere2(const common_chat_template & tmpl,
+                                                          const autoparser::generation_params & inputs) {
+    common_chat_params data;
+
+    data.prompt = common_chat_template_direct_apply_impl(tmpl, inputs);
+    data.generation_prompt = common_chat_template_generation_prompt_impl(tmpl, inputs);
+    data.format = COMMON_CHAT_FORMAT_PEG_NATIVE;
+    data.supports_thinking = true;
+    data.thinking_start_tag = "<|START_THINKING|>";
+    data.thinking_end_tag = "<|END_THINKING|>";
+    data.preserved_tokens = {
+        "<|START_THINKING|>", "<|END_THINKING|>",
+        "<|START_ACTION|>", "<|END_ACTION|>",
+        "<|START_RESPONSE|>", "<|END_RESPONSE|>",
+    };
+
+    const std::string THINK_START = "<|START_THINKING|>";
+    const std::string THINK_END = "<|END_THINKING|>";
+    const std::string ACTION_START = "<|START_ACTION|>";
+    const std::string ACTION_END = "<|END_ACTION|>";
+    const std::string RESP_START = "<|START_RESPONSE|>";
+    const std::string RESP_END = "<|END_RESPONSE|>";
+
+    const bool has_tools = inputs.tools.is_array() && !inputs.tools.empty();
+    const bool extract_reasoning = inputs.reasoning_format != COMMON_REASONING_FORMAT_NONE;
+    const bool include_grammar = has_tools && inputs.tool_choice != COMMON_CHAT_TOOL_CHOICE_NONE;
+
+    // With tools the template leaves generation open after <|CHATBOT_TOKEN|>; without tools
+    // it primes <|START_RESPONSE|> directly.
+    const bool response_primed =
+        data.generation_prompt.size() >= RESP_START.size() &&
+        data.generation_prompt.compare(data.generation_prompt.size() - RESP_START.size(),
+                                       RESP_START.size(), RESP_START) == 0;
+
+    auto parser = build_chat_peg_parser([&](common_chat_peg_builder & p) {
+        auto gen = p.literal(data.generation_prompt);
+        auto end = p.end();
+
+        if (!has_tools || inputs.tool_choice == COMMON_CHAT_TOOL_CHOICE_NONE) {
+            if (response_primed) {
+                return gen + p.content(p.until(RESP_END)) + p.literal(RESP_END) + end;
+            }
+            return gen + p.choice({
+                p.literal(RESP_START) + p.content(p.until(RESP_END)) + p.literal(RESP_END),
+                p.content(p.rest()),
+            }) + end;
+        }
+
+        auto reasoning = p.eps();
+        if (extract_reasoning && inputs.enable_thinking) {
+            reasoning = p.optional(p.literal(THINK_START) + p.reasoning(p.until(THINK_END)) + p.literal(THINK_END));
+        } else if (extract_reasoning) {
+            reasoning = p.optional(p.literal(THINK_START) + p.until(THINK_END) + p.literal(THINK_END));
+        }
+
+        auto tools_parser = p.standard_json_tools(
+            ACTION_START, ACTION_END, inputs.tools,
+            /* parallel_tool_calls = */ true,
+            /* force_tool_calls = */ inputs.tool_choice == COMMON_CHAT_TOOL_CHOICE_REQUIRED,
+            /* name_key = */ "tool_name",
+            /* args_key = */ "parameters",
+            /* array_wrapped = */ true,
+            /* function_is_key = */ false,
+            /* call_id_key = */ "",
+            /* gen_call_id_key = */ "tool_call_id",
+            /* parameters_order = */ {});
+
+        if (inputs.tool_choice == COMMON_CHAT_TOOL_CHOICE_REQUIRED) {
+            return gen + reasoning + p.space() + tools_parser + end;
+        }
+        // Content-first, then optional tool action (avoids non-backtracking choice on the
+        // tool-section trigger). The response may be <|START_RESPONSE|>-wrapped or bare;
+        // strip a leading wrapper token and let content run up to any tool action.
+        auto content_before = p.optional(p.literal(RESP_START)) + p.content(p.until(ACTION_START));
+        return gen + reasoning + p.space() + content_before + p.optional(tools_parser) + end;
+    });
+
+    data.parser = parser.save();
+
+    if (include_grammar) {
+        data.grammar_lazy = inputs.tool_choice == COMMON_CHAT_TOOL_CHOICE_AUTO;
+        data.grammar = build_grammar([&](const common_grammar_builder & builder) {
+            foreach_function(inputs.tools, [&](const json & tool) {
+                const auto & function = tool.at("function");
+                auto schema = function.contains("parameters") ? function.at("parameters") : json::object();
+                builder.resolve_refs(schema);
+            });
+            parser.build_grammar(builder, data.grammar_lazy);
+        });
+        data.grammar_triggers = {
+            { COMMON_GRAMMAR_TRIGGER_TYPE_WORD, ACTION_START },
+        };
+    }
+
+    return data;
+}
+
 namespace workaround {
 static void map_developer_role_to_system(json & messages) {
std::optional<common_chat_params> common_chat_try_specialized_template(
         return common_chat_params_init_deepseek_v3_2(tmpl, params);
     }
+    // Cohere Command (Cohere2 / command-a) format detection: thinking + action-array tool calls.
+    if (src.find("<|START_ACTION|>") != std::string::npos &&
+        src.find("<|START_THINKING|>") != std::string::npos &&
+        src.find("tool_name") != std::string::npos) {
+        return common_chat_params_init_cohere2(tmpl, params);
+    }
+
     // Gemma4 format detection
     if (src.find("'<|tool_call>call:'") != std::string::npos) {
         if (src.find("{#- OpenAI Chat Completions:") == std::string::npos) {