XCT-Qwen3-4B
Model Summary
XCT-Qwen3-4B is an execution-oriented, quantized language model designed to operate under the XCT Protocol, an architectural approach that enforces sovereignty inversion, determinism, and explicit authorization.
This model is not intended for conversational, creative, or exploratory use.
Its purpose is correct behavior under explicit instruction, and non-action otherwise.
It is designed as a component within a controlled system, not as an autonomous actor.
Context: What Is XCT?
XCT is not a prompt style.
It is not an agent framework layered on top of a general-purpose model.
XCT is a protocol for integrating language models into real systems without granting them executive authority.
In an XCT system:
- The system owns state
- The system owns execution
- The system owns tools
- The model only proposes
The model participates under constraint.
Authority remains external by design.
Execution Example (Kubernetes)
XCT has been tested in real Kubernetes environments, demonstrating the complete flow: Model Proposal → System Validation → Tool Execution → State Persistence
See the XCT repository for execution demos and examples.
Model Details
- Developed by: Tech Tweakers
- Model name: XCT-Qwen3-4B
- Base model: Qwen3-4B
- Architecture: Decoder-only Transformer
- Parameter count: ~4B
- Quantization: Q2_K, Q5_K (GGUF)
- Precision: Quantized inference
- License: Apache 2.0
This is not a stylistic fine-tune.
It is a behavioral specialization aligned with a strict execution protocol.
Intended Use
In Scope
- Deterministic execution agents
- Infrastructure orchestration
- CI/CD and deployment automation
- Tool-driven pipelines
- Compliance-sensitive environments
- Sovereignty-inverted AI systems
Out of Scope
- Conversational assistants
- Creative or generative writing
- Roleplay or improvisation
- Emotional or social interaction
- Autonomous decision-making
In XCT systems, absence of instruction implies absence of permission.
Protocol Alignment
This model adheres to the following XCT principles:
- Determinism takes precedence over creativity
- One step per iteration
- No tool invocation without explicit instruction
- Tool outputs are authoritative
- Ambiguity resolves to minimal action
- Errors are treated as control signals
- The system may veto any proposal
The model does not self-authorize.
The model does not infer intent.
The model does not speculate beyond instruction.
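The principles above can be read as a default-deny check that the system applies to every proposal. The following is a minimal sketch, not the XCT wire format: the proposal shape `{"tool": ..., "args": ...}` and the names `Verdict` and `validate_proposal` are assumptions made for illustration, and the tool names are hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    allowed: bool
    reason: str


def validate_proposal(raw: str, authorized_tools: set[str]) -> Verdict:
    """Default-deny validation of a single model proposal.

    Assumed (hypothetical) proposal shape: {"tool": "<name>", "args": {...}}.
    """
    try:
        proposal = json.loads(raw)
    except json.JSONDecodeError as exc:
        # A malformed proposal is a control signal, not a crash.
        return Verdict(False, f"malformed JSON: {exc.msg}")

    # One step per iteration: a list of steps is rejected outright.
    if not isinstance(proposal, dict):
        return Verdict(False, "proposal must be a single JSON object")

    tool = proposal.get("tool")
    if tool is None:
        # No tool requested: nothing to execute, nothing to authorize.
        return Verdict(True, "no action proposed")

    # No tool invocation without explicit authorization.
    if not isinstance(tool, str) or tool not in authorized_tools:
        return Verdict(False, f"tool not authorized: {tool!r}")

    if not isinstance(proposal.get("args", {}), dict):
        return Verdict(False, "args must be an object")

    return Verdict(True, "ok")


if __name__ == "__main__":
    tools = {"kubectl_get"}  # hypothetical tool name
    print(validate_proposal('{"tool": "kubectl_get", "args": {"kind": "pods"}}', tools))
    print(validate_proposal('{"tool": "kubectl_delete", "args": {}}', tools))
    print(validate_proposal('[{"tool": "kubectl_get"}, {"tool": "kubectl_get"}]', tools))
```

The check lives entirely on the system side, so a rejected proposal never reaches a tool regardless of what the model emitted.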
Execution Model Overview
The XCT execution loop is intentionally simple:
- The system provides explicit context and instruction
- The model proposes a response or action
- The system validates the proposal
- The system executes or rejects
- System state remains external to the model
The model never mutates external state directly.
It operates strictly as a constrained proposer.
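The loop described above can be sketched as follows. This is an illustration under stated assumptions, not the Polaris-Core implementation: `propose` stands in for a model call, the `next_step` / `done` proposal fields are an assumed shape, and `run_loop`, `TOOLS`, and the `echo` tool are hypothetical names.

```python
import json
from typing import Callable

# Tools live on the system side; the model can only name them.
TOOLS: dict[str, Callable[[dict], str]] = {
    "echo": lambda args: str(args.get("text", "")),  # hypothetical tool
}


def run_loop(propose: Callable[[list[str]], str], instruction: str,
             max_steps: int = 5) -> list[str]:
    """Drive a propose -> validate -> execute loop with system-owned state."""
    history = [f"instruction: {instruction}"]  # state is owned by the system
    for _ in range(max_steps):
        raw = propose(history)  # the model only proposes
        try:
            proposal = json.loads(raw)
        except json.JSONDecodeError:
            history.append("error: malformed proposal")  # error as control signal
            continue
        if not isinstance(proposal, dict):
            history.append("error: proposal must be an object")
            continue
        if proposal.get("done"):
            history.append("done")
            break
        step = proposal.get("next_step")
        name = step.get("tool") if isinstance(step, dict) else None
        if not isinstance(name, str) or name not in TOOLS:
            history.append(f"veto: unknown or missing tool {name!r}")  # system veto
            continue
        try:
            result = TOOLS[name](step.get("args", {}))  # the system executes
        except Exception as exc:
            history.append(f"error: {name} failed: {exc}")
            continue
        history.append(f"result: {name} -> {result}")
    return history


def scripted_model(history: list[str]) -> str:
    """Stand-in for the model: one echo step, then done."""
    if any(line.startswith("result:") for line in history):
        return json.dumps({"done": True})
    return json.dumps({"next_step": {"tool": "echo", "args": {"text": "hello"}}})


if __name__ == "__main__":
    for line in run_loop(scripted_model, "say hello"):
        print(line)
```

Errors and vetoes are appended to the history rather than raised, so the next proposal is made with the failure visible as context.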
Training & Adaptation
- Base weights: Qwen3-4B
- Adaptation focus:
- Instruction parsing discipline
- Rule adherence
- Correct refusal behavior
- Non-speculative output
- Tool invocation restraint
No effort was made to optimize for:
- Creativity
- Verbosity
- Conversational helpfulness
- Social alignment
These characteristics are intentionally deprioritized in XCT systems.
Evaluation Philosophy
This model is not evaluated using traditional language benchmarks such as MMLU, BLEU, or preference-based metrics.
Evaluation is operational rather than aesthetic:
- Stability of instruction adherence
- Output determinism under identical inputs
- Correct refusal under ambiguity
- Tool discipline
- Protocol compliance
Low performance on creativity-oriented benchmarks is expected and acceptable.
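One of the criteria above, output determinism under identical inputs, lends itself to a simple operational check. The sketch below is an assumption about how such a check might look, not the project's evaluation harness: `is_deterministic` is a hypothetical helper, and `generate` is any callable that wraps a model call.

```python
import hashlib
from typing import Callable


def is_deterministic(generate: Callable[[str], str], prompt: str, runs: int = 5) -> bool:
    """Return True if `generate` yields byte-identical output on every run."""
    digests = {
        hashlib.sha256(generate(prompt).encode("utf-8")).hexdigest()
        for _ in range(runs)
    }
    return len(digests) == 1


if __name__ == "__main__":
    # A pure function stands in for a model call with fixed decoding settings.
    print(is_deterministic(lambda p: p.upper(), "list pods"))
```

Passing this check for a given prompt says nothing about other prompts, so in practice it would be run over a fixed suite of inputs.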
Limitations
- Reduced creative reasoning by design
- Conservative behavior under incomplete instruction
- Not optimized for long-form prose or dialogue
- Quantization may slightly affect deep reasoning capacity
The model prioritizes inaction over unsupported inference.
Ethical & Safety Considerations
XCT-Qwen3-4B restricts model autonomy as a structural safety measure, reducing:
- Unauthorized execution
- Implicit decision-making
- Speculative behavior
- Tool misuse
Responsibility for system outcomes lies with the system architect, not the model.
This approach emphasizes safety through architecture rather than post-hoc alignment.
Usage Example
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "tech-tweakers/XCT-Qwen3-4B"
)
model = AutoModelForCausalLM.from_pretrained(
    "tech-tweakers/XCT-Qwen3-4B",
    device_map="auto"
)

prompt = """
You are Polaris XCT Executor.
Environment is trusted and stable.
Do not act without explicit instruction.
One step per iteration.
"""

# Move the input tensors to the device the model was placed on.
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=128
)
print(tokenizer.decode(outputs[0], skip_special_tokens=False))
Relationship to Other Agent Paradigms
Compared to traditional autonomous or MCP-style agent frameworks:
- XCT does not embed execution authority in the model
- XCT treats errors as control signals
- XCT enforces explicit system veto
- XCT separates reasoning from execution
This model reflects that architectural philosophy in its behavior.
Learn More About XCT
For the complete XCT protocol specification, philosophy, and reference implementations:
Polaris-Core: Production XCT Engine
XCT-Qwen3-4B is designed to run with Polaris-Core, an ultra-optimized C++ binding for llama.cpp that implements deterministic execution.
Why Polaris-Core?
- 55% token savings through essentialized chat templates
- Deterministic execution with JSON early-stop
- Intelligent batch backoff with automatic retry
- GIL-aware threading for Python integration
- Streaming callbacks for real-time output
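To illustrate what a JSON early-stop can mean in practice, the sketch below consumes a stream of text chunks and stops as soon as the first top-level JSON object closes. It is a generic illustration, not Polaris-Core's implementation; `take_first_json_object` is a hypothetical helper name.

```python
from typing import Iterable


def take_first_json_object(chunks: Iterable[str]) -> str:
    """Consume streamed text and stop once the first top-level {...} closes.

    Returns the object text, or "" if the stream ends before it closes.
    """
    buf: list[str] = []
    depth = 0
    in_string = False
    escaped = False
    for chunk in chunks:
        for ch in chunk:
            if depth == 0 and ch != "{":
                continue  # skip anything before the object starts
            buf.append(ch)
            if in_string:
                # Braces inside string literals must not affect the depth.
                if escaped:
                    escaped = False
                elif ch == "\\":
                    escaped = True
                elif ch == '"':
                    in_string = False
            elif ch == '"':
                in_string = True
            elif ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth == 0:
                    return "".join(buf)  # early stop: the rest is never read
    return ""


if __name__ == "__main__":
    stream = ['Sure: {"tool": "echo", ', '"args": {"text": "hi"}}', " trailing text"]
    print(take_first_json_object(stream))
```

Because the function returns from inside the loop, a lazy stream (such as a token generator) is not consumed past the closing brace.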
Quick Start
import polaris_core as pc

# Create engine
eng = pc.Engine(
    model_path="XCT-Qwen3-4B-Q5.gguf",
    n_ctx=4096,
    n_gpu_layers=-1
)

# Execute with deterministic output
result = eng.generate(
    prompt="List all available tools",
    system_prompt="You are XCT executor",
    n_predict=256,
    temperature=0.2,
    top_p=0.9,
    repeat_penalty=1.1
)
print(result)
Full Documentation
- Complete build instructions
- Deployment guides
- Performance benchmarks
- Reference implementations
Changelog
[v0.1.2] - 2026-02-24
- Dataset curated from 501 → 238 examples (-52%), focusing on reasoning over the tool catalog
- Removed all 218 xct.tools examples; tool knowledge is now injected via the system prompt at runtime
- Protocol trimmed from 81 → 63 examples (-22%), workflows from 66 → 39 (-41%)
- Philosophy preserved intact (136 examples, now 57% of dataset)
- Training: LoRA r=8, 5 epochs, 150 steps, final loss 2.81
- Artifacts: Q5_K (2.7GB)
[v0.1.1] β 2026-01-12
- +81 XCT Protocol examples: teaches the model to correctly follow the next_step/done loop
- +20 complete workflows: full end-to-end iterations with error recovery
- Malformed JSON fixed (line 310 of the base dataset)
- Total: 501 examples / 168KB (previously 400 / 120KB)
[v0.1.0] - Initial Release - 2025-08-01
- Base dataset with 218 tool examples and 136 XCT philosophy examples
- JSONL format with input / output / _topic schema
Full history in CHANGELOG.md
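The changelog mentions a JSONL dataset with an input / output / _topic schema and one malformed-JSON fix. A check along the following lines could catch such problems before training. This is a sketch under assumptions: only the three key names come from the changelog, the value types are not specified there and are not checked, and `validate_jsonl` is a hypothetical helper.

```python
import json

# Key names taken from the changelog; everything else here is illustrative.
REQUIRED_KEYS = {"input", "output", "_topic"}


def validate_jsonl(lines):
    """Yield (line_number, problem) for each line that fails the schema check."""
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            yield number, f"malformed JSON: {exc.msg}"
            continue
        if not isinstance(record, dict):
            yield number, "record is not an object"
            continue
        missing = REQUIRED_KEYS - record.keys()
        if missing:
            yield number, f"missing keys: {sorted(missing)}"


if __name__ == "__main__":
    sample = [
        '{"input": "a", "output": "b", "_topic": "xct.tools"}',
        '{"input": "a", "output": }',
        '{"input": "a"}',
    ]
    for number, problem in validate_jsonl(sample):
        print(number, problem)
```

The function accepts any iterable of strings, so an open file handle can be passed directly.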
Final Note
This model is designed to operate quietly within constrained systems.
It acts only under explicit instruction. When instruction is absent or ambiguous, it waits.
This behavior is intentional and consistent with the design goals of XCT.