Instructions to use tech-tweakers/XCT-Qwen3-4B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use tech-tweakers/XCT-Qwen3-4B with Transformers:

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("tech-tweakers/XCT-Qwen3-4B", dtype="auto")

llama-cpp-python

How to use tech-tweakers/XCT-Qwen3-4B with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="tech-tweakers/XCT-Qwen3-4B",
	filename="XCT-Qwen3-4B-v0.1.0-q2_k.gguf",
)

llm.create_chat_completion(
	messages = "No input example has been defined for this model task."
)

Notebooks
Google Colab
Kaggle
Local Apps Settings

llama.cpp

How to use tech-tweakers/XCT-Qwen3-4B with llama.cpp:

Install (macOS, Linux)

curl -LsSf https://llama.app/install.sh | sh
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf tech-tweakers/XCT-Qwen3-4B:Q2_K
# Run inference directly in the terminal:
llama cli -hf tech-tweakers/XCT-Qwen3-4B:Q2_K

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf tech-tweakers/XCT-Qwen3-4B:Q2_K
# Run inference directly in the terminal:
llama cli -hf tech-tweakers/XCT-Qwen3-4B:Q2_K

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf tech-tweakers/XCT-Qwen3-4B:Q2_K
# Run inference directly in the terminal:
./llama-cli -hf tech-tweakers/XCT-Qwen3-4B:Q2_K

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf tech-tweakers/XCT-Qwen3-4B:Q2_K
# Run inference directly in the terminal:
./build/bin/llama-cli -hf tech-tweakers/XCT-Qwen3-4B:Q2_K

Use Docker

docker model run hf.co/tech-tweakers/XCT-Qwen3-4B:Q2_K

LM Studio
Jan
Ollama
How to use tech-tweakers/XCT-Qwen3-4B with Ollama:
```
ollama run hf.co/tech-tweakers/XCT-Qwen3-4B:Q2_K
```

Unsloth Studio

How to use tech-tweakers/XCT-Qwen3-4B with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for tech-tweakers/XCT-Qwen3-4B to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for tech-tweakers/XCT-Qwen3-4B to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for tech-tweakers/XCT-Qwen3-4B to start chatting

How to use tech-tweakers/XCT-Qwen3-4B with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf tech-tweakers/XCT-Qwen3-4B:Q2_K

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "tech-tweakers/XCT-Qwen3-4B:Q2_K"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent new

How to use tech-tweakers/XCT-Qwen3-4B with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf tech-tweakers/XCT-Qwen3-4B:Q2_K

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default tech-tweakers/XCT-Qwen3-4B:Q2_K

Run Hermes

hermes

Atomic Chat new

OpenClaw new

How to use tech-tweakers/XCT-Qwen3-4B with OpenClaw:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf tech-tweakers/XCT-Qwen3-4B:Q2_K

Configure OpenClaw

# Install OpenClaw:
npm install -g openclaw@latest
# Register the local server and set it as the default model:
openclaw onboard --non-interactive --mode local \
  --auth-choice custom-api-key \
  --custom-base-url http://127.0.0.1:8080/v1 \
  --custom-model-id "tech-tweakers/XCT-Qwen3-4B:Q2_K" \
  --custom-provider-id llama-cpp \
  --custom-compatibility openai \
  --custom-text-input \
  --accept-risk \
  --skip-health

Run OpenClaw

openclaw agent --local --agent main --message "Hello from Hugging Face"

Docker Model Runner
How to use tech-tweakers/XCT-Qwen3-4B with Docker Model Runner:
```
docker model run hf.co/tech-tweakers/XCT-Qwen3-4B:Q2_K
```

Lemonade

How to use tech-tweakers/XCT-Qwen3-4B with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull tech-tweakers/XCT-Qwen3-4B:Q2_K

Run and chat with the model

lemonade run user.XCT-Qwen3-4B-Q2_K

List all available models

lemonade list

XCT-Qwen3-4B

Model Summary

XCT-Qwen3-4B is an execution-oriented, quantized language model designed to operate under the XCT Protocol — an architectural approach that enforces sovereignty inversion, determinism, and explicit authorization.

This model is not intended for conversational, creative, or exploratory use.
Its purpose is correct behavior under explicit instruction, and non-action otherwise.

It is designed as a component within a controlled system, not as an autonomous actor.

Context: What Is XCT?

XCT is not a prompt style.
It is not an agent framework layered on top of a general-purpose model.

XCT is a protocol for integrating language models into real systems without granting them executive authority.

In an XCT system:

The system owns state
The system owns execution
The system owns tools
The model only proposes

The model participates under constraint.
Authority remains external by design.

Execution Example (Kubernetes)

XCT has been tested in real Kubernetes environments, demonstrating the complete flow: Model Proposal → System Validation → Tool Execution → State Persistence

See the XCT repository for execution demos and examples.

Model Details

Developed by: Tech Tweakers
Model name: XCT-Qwen3-4B
Base model: Qwen3-4B
Architecture: Decoder-only Transformer
Parameter count: ~4B
Quantization: Q2, Q5
Precision: Quantized inference
License: Apache 2.0

This is not a stylistic fine-tune.
It is a behavioral specialization aligned with a strict execution protocol.

Intended Use

In Scope

Deterministic execution agents
Infrastructure orchestration
CI/CD and deployment automation
Tool-driven pipelines
Compliance-sensitive environments
Sovereignty-inverted AI systems

Out of Scope

Conversational assistants
Creative or generative writing
Roleplay or improvisation
Emotional or social interaction
Autonomous decision-making

In XCT systems, absence of instruction implies absence of permission.

Protocol Alignment

This model adheres to the following XCT principles:

Determinism takes precedence over creativity
One step per iteration
No tool invocation without explicit instruction
Tool outputs are authoritative
Ambiguity resolves to minimal action
Errors are treated as control signals
The system may veto any proposal

The model does not self-authorize.
The model does not infer intent.
The model does not speculate beyond instruction.

Execution Model Overview

The XCT execution loop is intentionally simple:

The system provides explicit context and instruction
The model proposes a response or action
The system validates the proposal
The system executes or rejects
System state remains external to the model

The model never mutates external state directly.
It operates strictly as a constrained proposer.

Training & Adaptation

Base weights: Qwen3-4B
Adaptation focus:
- Instruction parsing discipline
- Rule adherence
- Correct refusal behavior
- Non-speculative output
- Tool invocation restraint

No effort was made to optimize for:

Creativity
Verbosity
Conversational helpfulness
Social alignment

These characteristics are intentionally deprioritized in XCT systems.

Evaluation Philosophy

This model is not evaluated using traditional language benchmarks such as MMLU, BLEU, or preference-based metrics.

Evaluation is operational rather than aesthetic:

Stability of instruction adherence
Output determinism under identical inputs
Correct refusal under ambiguity
Tool discipline
Protocol compliance

Low performance on creativity-oriented benchmarks is expected and acceptable.

Limitations

Reduced creative reasoning by design
Conservative behavior under incomplete instruction
Not optimized for long-form prose or dialogue
Quantization may slightly affect deep reasoning capacity

The model prioritizes inaction over unsupported inference.

Ethical & Safety Considerations

XCT-Qwen3-4B restricts model autonomy as a structural safety measure, reducing:

Unauthorized execution
Implicit decision-making
Speculative behavior
Tool misuse

Responsibility for system outcomes lies with the system architect, not the model.

This approach emphasizes safety through architecture rather than post-hoc alignment.

Usage Example

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "tech-tweakers/XCT-Qwen3-4B"
)

model = AutoModelForCausalLM.from_pretrained(
    "tech-tweakers/XCT-Qwen3-4B",
    device_map="auto"
)

prompt = """
You are Polaris XCT Executor.

Environment is trusted and stable.
Do not act without explicit instruction.
One step per iteration.
"""

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=128
)

print(tokenizer.decode(outputs[0], skip_special_tokens=False))

Relationship to Other Agent Paradigms

Compared to traditional autonomous or MCP-style agent frameworks:

XCT does not embed execution authority in the model
XCT treats errors as control signals
XCT enforces explicit system veto
XCT separates reasoning from execution

This model reflects that architectural philosophy in its behavior.

Learn More About XCT

For the complete XCT protocol specification, philosophy, and reference implementations:

👉 XCT Protocol on GitHub

Polaris-Core: Production XCT Engine

XCT-Qwen3-4B is designed to run with Polaris-Core, an ultra-optimized C++ binding for llama.cpp that implements deterministic execution.

Why Polaris-Core?

55% token savings through essentialized chat templates
Deterministic execution with JSON early-stop
Intelligent batch backoff with automatic retry
GIL-aware threading for Python integration
Streaming callbacks for real-time output

Quick Start

import polaris_core as pc

# Create engine
eng = pc.Engine(
    model_path="XCT-Qwen3-4B-Q5.gguf",
    n_ctx=4096,
    n_gpu_layers=-1
)

# Execute with deterministic output
result = eng.generate(
    prompt="List all available tools",
    system_prompt="You are XCT executor",
    n_predict=256,
    temperature=0.2,
    top_p=0.9,
    repeat_penalty=1.1
)

print(result)

Full Documentation

👉 Polaris-Core Repository

Complete build instructions
Deployment guides
Performance benchmarks
Reference implementations

📋 Changelog

[v0.1.2] — 2026-02-24

Dataset curated from 501 → 238 examples (−52%) — focus on reasoning over tool catalog
Removed all 218 xct.tools examples — tool knowledge now injected via system prompt at runtime
Protocol trimmed from 81 → 63 examples (−22%), workflows from 66 → 39 (−41%)
Philosophy preserved intact (136 examples, now 57% of dataset)
Training: LoRA r=8, 5 epochs, 150 steps, final loss 2.81
Artifacts: Q5_K (2.7GB)

[v0.1.1] — 2026-01-12

+81 XCT Protocol examples — teaches the model to correctly follow the next_step/done loop
+20 complete workflows — full end-to-end iterations with error recovery
Malformed JSON fixed (line 310 of the base dataset)
Total: 501 examples / 168KB (previously 400 / 120KB)

[v0.1.0] — Initial Release - 2025-08-01

Base dataset with 218 tool examples and 136 XCT philosophy examples
JSONL format with input / output / _topic schema

📄 Full history in CHANGELOG.md

Final Note

This model is designed to operate quietly within constrained systems.

It acts only under explicit instruction. When instruction is absent or ambiguous, it waits.

This behavior is intentional and consistent with the design goals of XCT.

Downloads last month: 11

GGUF

Model size

4B params

Architecture

qwen3

Hardware compatibility

2-bit

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support