Cortiq Coder 12B

Cortiq Coder 12B is a task-specialized coding model compiled from Qwen3-27B down to ~12B effective parameters using a proprietary dynamic neural network compression method developed by AllAIGate.

The compression is performed via the CORTIQ method, a "System and Method for Dynamic Task-Guided Neural Network Compression with Catastrophic Forgetting Prevention", covered under US Patent Application No. 19/452,464 (filed January 19, 2026).

Unlike naive pruning or quantization, CORTIQ preserves task-critical knowledge during compression by dynamically guiding the pruning process toward the target domain (code generation), while actively preventing degradation of the model's core reasoning capabilities.

Key Features

  • 🔧 Optimized for code generation: structured compression guided by coding tasks
  • 🧠 Based on Qwen3-27B: retains the strong reasoning foundation of the 27B base model at 12B scale
  • 🚀 GGUF Q4_K_M quantization: ready for efficient local inference
  • 🛡️ Catastrophic forgetting prevention: task-specific compression without degrading general capabilities
  • 📦 Compact footprint: 9.37 GB in GGUF Q4_K_M format

Files

| File | Format | Size | Description |
|---|---|---|---|
| cortiq-coder-12b-Q4_K_M.gguf | GGUF | 9.37 GB | Quantized model for llama.cpp / LM Studio / Ollama |
| cortiq-coder-12b-nvg.tar | TAR | 30.1 GB | Full native model weights |

Quick Start

llama.cpp

llama-server -hf infosave/cortiq-coder-12B:Q4_K_M
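
Once the server is running, it can be queried over its OpenAI-compatible HTTP API. The following is a minimal sketch using only the Python standard library. It assumes the server listens on llama-server's default address (`http://localhost:8080`); the helper names `build_chat_request` and `ask` are illustrative and not part of llama.cpp.

```python
import json
import urllib.request

# Assumption: llama-server is running locally on its default port (8080).
SERVER_URL = "http://localhost:8080/v1/chat/completions"


def build_chat_request(prompt, url=SERVER_URL, max_tokens=512):
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, url=SERVER_URL):
    """Send the prompt to the server and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, url)) as resp:
        body = json.load(resp)
    # The response follows the OpenAI schema: "choices" is a list.
    return body["choices"][0]["message"]["content"]
```

With the server up, `print(ask("Write a Python function to sort a list of dicts by key."))` should print the model's reply. Adjust `SERVER_URL` if the server was started with a different host or port.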

Ollama

ollama run hf.co/infosave/cortiq-coder-12B:Q4_K_M

Python (llama-cpp-python)

from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="infosave/cortiq-coder-12B",
    filename="cortiq-coder-12b-Q4_K_M.gguf",
)
response = llm.create_chat_completion(messages=[
    {"role": "user", "content": "Write a Python function to sort a list of dicts by key."}
])
# "choices" is a list; index the first completion before reading its message.
print(response["choices"][0]["message"]["content"])
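
For longer code generations it can be more pleasant to print tokens as they arrive. The sketch below assumes an `llm` object created as in the example above and relies on `create_chat_completion(..., stream=True)` yielding OpenAI-style chunks whose text lives in `choices[0]["delta"]["content"]`. The helper names `iter_stream_text` and `stream_reply` are hypothetical.

```python
def iter_stream_text(chunks):
    """Yield text pieces from OpenAI-style streaming chat chunks."""
    for chunk in chunks:
        # The first chunk usually carries only the role and the last one
        # an empty delta, so missing or empty content is skipped.
        delta = chunk["choices"][0].get("delta", {})
        text = delta.get("content")
        if text:
            yield text


def stream_reply(llm, prompt):
    """Print a reply as it is generated and return the full text."""
    chunks = llm.create_chat_completion(
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    parts = []
    for text in iter_stream_text(chunks):
        print(text, end="", flush=True)
        parts.append(text)
    print()
    return "".join(parts)
```

Calling `stream_reply(llm, "Write a Python function to sort a list of dicts by key.")` prints the answer incrementally and returns the assembled string.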

Method Reference

Patent: US Application No. 19/452,464, "System and Method for Dynamic Task-Guided Neural Network Compression with Catastrophic Forgetting Prevention", filed January 19, 2026.
Details: https://allaigate.com/ru/

License

Released under the Apache 2.0 License, consistent with the Qwen3 base model license.

GGUF details: model size 15B params, architecture qwen35, 4-bit quantization.