Instructions to use icedmoca/kcode-oss-20b-mxfp4 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use icedmoca/kcode-oss-20b-mxfp4 with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="icedmoca/kcode-oss-20b-mxfp4",
	filename="kcode-oss-20b-mxfp4.gguf",
)

llm.create_chat_completion(
	messages = "No input example has been defined for this model task."
)

Notebooks
Google Colab
Kaggle
Local Apps

llama.cpp

How to use icedmoca/kcode-oss-20b-mxfp4 with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf icedmoca/kcode-oss-20b-mxfp4
# Run inference directly in the terminal:
llama-cli -hf icedmoca/kcode-oss-20b-mxfp4

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf icedmoca/kcode-oss-20b-mxfp4
# Run inference directly in the terminal:
llama-cli -hf icedmoca/kcode-oss-20b-mxfp4

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf icedmoca/kcode-oss-20b-mxfp4
# Run inference directly in the terminal:
./llama-cli -hf icedmoca/kcode-oss-20b-mxfp4

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf icedmoca/kcode-oss-20b-mxfp4
# Run inference directly in the terminal:
./build/bin/llama-cli -hf icedmoca/kcode-oss-20b-mxfp4

Use Docker

docker model run hf.co/icedmoca/kcode-oss-20b-mxfp4

LM Studio
Jan
Ollama
How to use icedmoca/kcode-oss-20b-mxfp4 with Ollama:
```
ollama run hf.co/icedmoca/kcode-oss-20b-mxfp4
```

Unsloth Studio new

How to use icedmoca/kcode-oss-20b-mxfp4 with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for icedmoca/kcode-oss-20b-mxfp4 to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for icedmoca/kcode-oss-20b-mxfp4 to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for icedmoca/kcode-oss-20b-mxfp4 to start chatting

Pi new

How to use icedmoca/kcode-oss-20b-mxfp4 with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf icedmoca/kcode-oss-20b-mxfp4

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "icedmoca/kcode-oss-20b-mxfp4"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent new

How to use icedmoca/kcode-oss-20b-mxfp4 with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf icedmoca/kcode-oss-20b-mxfp4

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default icedmoca/kcode-oss-20b-mxfp4

Run Hermes

hermes

Docker Model Runner
How to use icedmoca/kcode-oss-20b-mxfp4 with Docker Model Runner:
```
docker model run hf.co/icedmoca/kcode-oss-20b-mxfp4
```

Lemonade

How to use icedmoca/kcode-oss-20b-mxfp4 with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull icedmoca/kcode-oss-20b-mxfp4

Run and chat with the model

lemonade run user.kcode-oss-20b-mxfp4-{{QUANT_TAG}}

List all available models

lemonade list

kcode-oss-20b-mxfp4 / README.md

icedmoca

Create README.md

fdd64bb verified 23 days ago

preview code

raw

history blame contribute delete

1.94 kB

	---
	license: mit
	language:
	- en
	base_model:
	- openai/gpt-oss-20b
	tags:
	- gguf
	- code
	- coding-agent
	- conversational
	- terminal-agent
	- tool-use
	- function-calling
	- long-context
	- context-memory
	- agentic
	- rust
	- llama-cpp
	- kcode
	---

	# kcode-oss-20b-mxfp4

	> kcode-oss-20b-mxfp4 is a GGUF MXFP4 coding-agent model built on top of GPT-OSS 20B and optimized for terminal-native software engineering workflows, structured tool use, retrieval-grounded reasoning, and long-session coding tasks.

	### The model is designed primarily for:

	repository navigation
	code editing and patch generation
	shell-oriented workflows
	structured tool calling
	retrieval-backed context restoration
	long-running agent sessions
	Architecture

	## Base architecture:

	GPT-OSS 20B
	Mixture-of-Experts (MoE)
	MXFP4 quantization
	131k context length
	GGUF runtime format

	## Model metadata:

	24 transformer blocks
	32 experts
	4 active experts per token
	GPT-4o tokenizer format
	YaRN rope scaling
	Intended Usage

	## This model is intended to be paired with the Kcode runtime and orchestration layer:

	exact-context replay
	context vault references
	dynamic tool schema expansion
	persistent memory systems
	multi-tool agent execution

	### It performs best in iterative:

	edit → test → repair

	coding workflows.

	Prompting

	Example system prompt:

	You are Kcode, a terminal-native coding agent.

	Repository state:
	<ctx ref="build_logs_14" />

	## Task:
	Fix the websocket reconnect logic without breaking auth refresh behavior.
	Runtime Compatibility

	## Optimized for:

	llama.cpp
	Ollama
	OpenAI-compatible local servers
	terminal coding agents
	structured tool runtimes
	Notes

	## kcode-oss-20b-mxfp4 is optimized more heavily for:

	coding workflows
	orchestration stability
	structured reasoning
	retrieval-grounded operation
	long-session memory behavior

	than for:

	roleplay
	creative writing
	unrestricted conversational chat
	Runtime

	# GitHub:
	https://github.com/icedmoca/kcode