---
license: apache-2.0
---

North Code Quant

GGUF Code Generation

Quantized GGUF builds of Cohere's North Code model, intended for local inference with llama.cpp, LM Studio, and Ollama.

Base Model: Cohere North Code
Architecture: Cohere / Command-R
Context Length: 128K tokens
License: CC-BY-NC / Custom

⚡ Quick Start

LM Studio

Search for "North Code Quant" in the LM Studio search bar, select your preferred quantization level from the sidebar, and click Download.

llama.cpp

    ./llama-cli -m north-code-quant-Q4_K_M.gguf \
      --ctx-size 8192 \
      --threads $(nproc) \
      --prompt "def fibonacci(n):"
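Ollama

Ollama is listed as a supported runtime, so here is a minimal sketch of one way to load a local GGUF file into it. It assumes the file from the llama.cpp example is already in the current directory. The model name `north-code` and the `num_ctx` value are illustrative choices, not official settings.

```shell
# Assumption: the GGUF file has already been downloaded to the current directory.
GGUF_FILE="north-code-quant-Q4_K_M.gguf"
MODEL_NAME="north-code"   # hypothetical local name; pick any

# A Modelfile tells Ollama which weights to load and which defaults to apply.
cat > Modelfile <<EOF
FROM ./${GGUF_FILE}
PARAMETER num_ctx 8192
EOF

# Register and run the model only if Ollama is installed and the file exists.
if command -v ollama >/dev/null 2>&1 && [ -f "$GGUF_FILE" ]; then
  ollama create "$MODEL_NAME" -f Modelfile
  ollama run "$MODEL_NAME" "def fibonacci(n):"
else
  echo "ollama or ${GGUF_FILE} not found; Modelfile written, model not created"
fi
```

The guard keeps the snippet harmless on machines without Ollama: it still writes the Modelfile so you can inspect it before creating the model.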

📦 Available Quants

Files are sorted by size and quality. Q4_K_M is recommended for most users as the best balance of inference speed and output quality (measured by perplexity).

| File Name | Quant Type | Size | Description |
|---|---|---|---|
| North-Code-Quant.gguf | Q8_0 | -- GB | Near-lossless. Best quality, higher VRAM/RAM requirement. |

📝 About This Quantization

These GGUF files were converted from the official Cohere North Code weights using llama.cpp. Quantization used an importance matrix (imatrix) computed from calibration data, which helps preserve the weights that matter most to output quality.
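For readers who want to reproduce a similar build, the generic llama.cpp workflow has three stages: convert the original weights to a full-precision GGUF, compute an importance matrix, then quantize. The sketch below illustrates that workflow only. The paths, the calibration file, and the F16 intermediate are assumptions, not the exact commands used for these files, and it presumes the conversion script supports this architecture. It prints each command by default and runs them only when `DRY_RUN=0`.

```shell
# Hypothetical paths and names; adjust to your setup.
HF_MODEL_DIR="./north-code-hf"      # directory holding the original weights
CALIB_TEXT="./calibration.txt"      # plain-text calibration data for the imatrix
F16_GGUF="north-code-f16.gguf"      # full-precision intermediate
IMATRIX="imatrix.dat"
QUANT_TYPE="Q4_K_M"
OUT_GGUF="north-code-quant-${QUANT_TYPE}.gguf"
DRY_RUN="${DRY_RUN:-1}"             # 1 = print only, 0 = execute

# Print the command, and execute it only when DRY_RUN=0.
run() {
  printf '+ %s\n' "$*"
  if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# 1. Convert the original weights to an F16 GGUF.
run python3 convert_hf_to_gguf.py "$HF_MODEL_DIR" --outfile "$F16_GGUF" --outtype f16

# 2. Compute the importance matrix from the calibration text.
run ./llama-imatrix -m "$F16_GGUF" -f "$CALIB_TEXT" -o "$IMATRIX"

# 3. Quantize, using the importance matrix to guide precision allocation.
run ./llama-quantize --imatrix "$IMATRIX" "$F16_GGUF" "$OUT_GGUF" "$QUANT_TYPE"
```

Changing `QUANT_TYPE` (for example to `Q8_0`) produces a different quantization level from the same F16 intermediate and importance matrix.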

⚠️ Disclaimer: This is a quantized derivative model. While quants retain most of the base model's capabilities, lower-bit quantizations may exhibit degraded performance in edge-case code generation or multilingual tasks. Always verify generated code before execution. This model inherits the license terms of the original Cohere North Code model.