Instructions to use uaytug/ucoder-mini-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use uaytug/ucoder-mini-GGUF with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="uaytug/ucoder-mini-GGUF")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

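# Load model directly (GGUF loading requires the gguf package: pip install gguf)
from transformers import AutoTokenizer, AutoModelForCausalLM

# A GGUF repository holds several quantized files, so name the one to load.
# File name taken from the Available Files list; adjust it if the repository differs.
gguf_file = "ucoder-mini-Q4_K_M.gguf"

tokenizer = AutoTokenizer.from_pretrained("uaytug/ucoder-mini-GGUF", gguf_file=gguf_file)
model = AutoModelForCausalLM.from_pretrained("uaytug/ucoder-mini-GGUF", gguf_file=gguf_file)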

llama-cpp-python

How to use uaytug/ucoder-mini-GGUF with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="uaytug/ucoder-mini-GGUF",
	filename="uCoder-mini-IQ2_M.gguf",
)

llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)

Local Apps

llama.cpp

How to use uaytug/ucoder-mini-GGUF with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf uaytug/ucoder-mini-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf uaytug/ucoder-mini-GGUF:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf uaytug/ucoder-mini-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf uaytug/ucoder-mini-GGUF:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf uaytug/ucoder-mini-GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf uaytug/ucoder-mini-GGUF:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf uaytug/ucoder-mini-GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf uaytug/ucoder-mini-GGUF:Q4_K_M

Use Docker

docker model run hf.co/uaytug/ucoder-mini-GGUF:Q4_K_M

vLLM

How to use uaytug/ucoder-mini-GGUF with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "uaytug/ucoder-mini-GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "uaytug/ucoder-mini-GGUF",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
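
The same request can be sent from Python. The following is a minimal sketch using only the standard library; the helper names (`build_chat_payload`, `extract_reply`, `chat`) are illustrative and not part of any library, and the response shape is assumed to follow the OpenAI chat completions format that the server advertises.

```python
import json
import urllib.request

# Base URL and model name mirror the curl call above; adjust them for your server.
BASE_URL = "http://localhost:8000/v1"
MODEL = "uaytug/ucoder-mini-GGUF"


def build_chat_payload(prompt, model=MODEL):
    """Return the JSON body for a single-turn chat completion request."""
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}


def extract_reply(response):
    """Pull the assistant text out of an OpenAI-style chat completion response."""
    return response["choices"][0]["message"]["content"]


def chat(prompt, base_url=BASE_URL, timeout=60):
    """POST the prompt to /chat/completions and return the reply text."""
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_chat_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return extract_reply(json.load(reply))
```

With the server running, `print(chat("Write a function that reverses a linked list."))` prints the model's answer. Nothing is sent until `chat` is called.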

Use Docker

docker model run hf.co/uaytug/ucoder-mini-GGUF:Q4_K_M

SGLang

How to use uaytug/ucoder-mini-GGUF with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "uaytug/ucoder-mini-GGUF" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "uaytug/ucoder-mini-GGUF",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "uaytug/ucoder-mini-GGUF" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "uaytug/ucoder-mini-GGUF",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Ollama

How to use uaytug/ucoder-mini-GGUF with Ollama:
```
ollama run hf.co/uaytug/ucoder-mini-GGUF:Q4_K_M
```

Unsloth Studio

How to use uaytug/ucoder-mini-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for uaytug/ucoder-mini-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for uaytug/ucoder-mini-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for uaytug/ucoder-mini-GGUF to start chatting

Pi

How to use uaytug/ucoder-mini-GGUF with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf uaytug/ucoder-mini-GGUF:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "ucoder-mini-GGUF"
        }
      ]
    }
  }
}
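
If `models.json` already lists other providers, pasting the snippet over the file would drop them. The sketch below merges the entry instead; the function names are hypothetical, and the path and field names are copied from the snippet above rather than from Pi's documentation.

```python
import json
from pathlib import Path

# Path and field names are copied from the JSON snippet above.
MODELS_PATH = Path.home() / ".pi" / "agent" / "models.json"


def add_llama_cpp_provider(config, model_id="ucoder-mini-GGUF",
                           base_url="http://localhost:8080/v1"):
    """Return a copy of config with the llama-cpp provider added or replaced."""
    merged = dict(config)
    providers = dict(merged.get("providers", {}))
    providers["llama-cpp"] = {
        "baseUrl": base_url,
        "api": "openai-completions",
        "apiKey": "none",
        "models": [{"id": model_id}],
    }
    merged["providers"] = providers
    return merged


def update_models_file(path=MODELS_PATH):
    """Read models.json if it exists, merge the provider, and write it back."""
    existing = json.loads(path.read_text()) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(add_llama_cpp_provider(existing), indent=2) + "\n")
```

Calling `update_models_file()` once is enough; providers already present under other names are left untouched.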

Run Pi

# Start Pi in your project directory:
pi

Docker Model Runner

How to use uaytug/ucoder-mini-GGUF with Docker Model Runner:
```
docker model run hf.co/uaytug/ucoder-mini-GGUF:Q4_K_M
```

Lemonade

How to use uaytug/ucoder-mini-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull uaytug/ucoder-mini-GGUF:Q4_K_M

Run and chat with the model

lemonade run user.ucoder-mini-GGUF-Q4_K_M

List all available models

lemonade list

uCoder-mini-GGUF

Quantized GGUF models converted from uaytug/ucoder-mini.

Converted using the latest llama.cpp (CUDA-accelerated quantization).

Available Files

16-bit

ucoder-mini-BF16.gguf → Highest precision float (similar to original, ~3 GB)

8-bit

ucoder-mini-Q8_0.gguf → Near-lossless

6-bit

ucoder-mini-Q6_K.gguf

5-bit

ucoder-mini-Q5_K_S.gguf
ucoder-mini-Q5_K_M.gguf → Great quality

4-bit (most popular range)

ucoder-mini-Q4_K_M.gguf → Recommended balance
ucoder-mini-Q4_K_S.gguf
ucoder-mini-Q4_1.gguf
ucoder-mini-IQ4_XS.gguf
ucoder-mini-IQ4_NL.gguf

3-bit

ucoder-mini-Q3_K_S.gguf
ucoder-mini-Q3_K_M.gguf
ucoder-mini-IQ3_XXS.gguf

2-bit

ucoder-mini-Q2_K.gguf
ucoder-mini-IQ2_M.gguf

Original Model Information

uCoder Mini

Important: This model was trained specifically for coding and mathematical reasoning tasks (competitive programming, LeetCode, algorithm problems, etc.). It does not produce accurate, high-quality answers for general knowledge, creative writing, or other non-coding tasks, or for questions asked in languages other than English.

Overview

uCoder Mini is a 1.5B parameter dense language model fine-tuned specifically for code generation and mathematical reasoning. Built on the Qwen2 architecture, this model demonstrates that small, focused models can achieve strong performance on programming tasks when trained on high-quality, curated data.

Key Features

Specialized Focus: Trained exclusively on coding and math data for maximum performance in these domains
Efficient Size: 1.5B parameters — runs on consumer GPUs, fast inference
Extended Context: Supports up to 4096 tokens for longer code generation
Multi-Language: Handles Python, JavaScript, C++, Java, and more
Competitive Programming: Strong on algorithmic problems (LeetCode, Codeforces-style)

Model Details

Attribute	Value
Architecture	Qwen2 (Dense Transformer)
Parameters	~1.5B
Hidden Size	1536
Layers	28
Attention Heads	12
Context Length	4096 tokens
Vocabulary Size	151,936
Training Precision	bfloat16
Training Method	Supervised Fine-Tuning (SFT)
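
Two quantities follow from the table by simple arithmetic. The sketch below assumes standard multi-head attention, where the hidden size is split evenly across heads, and a token embedding matrix of shape vocabulary size by hidden size; the function names are illustrative.

```python
# Values copied from the Model Details table.
HIDDEN_SIZE = 1536
ATTENTION_HEADS = 12
VOCAB_SIZE = 151_936


def head_dim(hidden_size, num_heads):
    """Per-head dimension, assuming the hidden size splits evenly across heads."""
    if hidden_size % num_heads:
        raise ValueError("hidden size must be divisible by the number of heads")
    return hidden_size // num_heads


def embedding_params(vocab_size, hidden_size):
    """Parameter count of a vocab_size x hidden_size token embedding matrix."""
    return vocab_size * hidden_size


print(head_dim(HIDDEN_SIZE, ATTENTION_HEADS))      # 128
print(embedding_params(VOCAB_SIZE, HIDDEN_SIZE))   # 233373696
```

Under these assumptions the token embeddings alone account for roughly 0.23B of the ~1.5B parameters.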

Intended Use

Recommended for:

Competitive programming (LeetCode, Codeforces, HackerRank)
Algorithm implementation and optimization
Mathematical problem solving with code
Code debugging and explanation
Learning programming concepts

Not recommended for:

General conversation or chat
Creative writing or storytelling
Factual Q&A or knowledge retrieval
Non-English tasks
Production systems without human review

Quick Start

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "uaytug/ucoder-mini"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto"
)

messages = [
    {"role": "user", "content": "Write a Python function to find the longest palindromic substring."}
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=1024,
    temperature=0.6,
    top_p=0.95,
    do_sample=True
)

response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)

Recommended Generation Parameters

For best results on coding tasks:

generation_config = {
    "max_new_tokens": 2048,
    "temperature": 0.6,      # Use 0.6 for focused output, 1.0 for more exploration
    "top_p": 0.95,
    "do_sample": True,
}
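
One way to switch between the two temperature settings mentioned in the comment is a small helper like the one below. This is a sketch: `make_generation_config` and its `explore` flag are hypothetical names, and the values are the ones recommended above.

```python
# Recommended values from the section above.
BASE_GENERATION_CONFIG = {
    "max_new_tokens": 2048,
    "temperature": 0.6,
    "top_p": 0.95,
    "do_sample": True,
}


def make_generation_config(explore=False, **overrides):
    """Return a copy of the recommended settings.

    explore=True raises the temperature to 1.0 for more varied output;
    keyword overrides are applied last.
    """
    config = dict(BASE_GENERATION_CONFIG)
    if explore:
        config["temperature"] = 1.0
    config.update(overrides)
    return config
```

With `model` and `inputs` defined as in the Quick Start, the result can be unpacked into the call: `model.generate(**inputs, **make_generation_config())`.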

Chat Template

uCoder Mini uses the ChatML format:

<|im_start|>user
Your coding question here<|im_end|>
<|im_start|>assistant
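
For runtimes that take a raw prompt string, the format above can be assembled by hand. The following is a minimal sketch; `format_chatml` is an illustrative helper, and the tokenizer's own `apply_chat_template` remains the authoritative source, since its output may differ (for example by adding a system message).

```python
def format_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts in the ChatML layout shown above."""
    parts = [
        f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        for message in messages
    ]
    if add_generation_prompt:
        # Open the assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = format_chatml([{"role": "user", "content": "Reverse a string in Python."}])
print(prompt)
```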

Training Data

Trained on UCDS (uCoder Dataset) — a curated collection of 420K+ high-quality samples:

Source	Samples	Description
Codeforces	47K+	Competitive programming with solutions
Code-Contests-Plus	10K+	Algorithm challenges
CodeAlpaca	15K+	Instruction-following code
OpenMathInstruct	Various	Mathematical reasoning chains
Curated Sources	Various	Tool use, chain-of-thought, deduplication

Total: 420,686 deduplicated samples covering Python, JavaScript, C++, Java, and more.

Limitations

Domain-Specific: This model is trained only on code/math — it will perform poorly on general tasks
English Only: Best performance in English; other languages not supported
Knowledge Cutoff: May not know about recent libraries, frameworks, or API changes
No Verification: Always test generated code before use in production
Context Limit: 4096 tokens may not be enough for very large codebases
No Web Access: Cannot fetch documentation or search for solutions
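
The context limit can be handled before generation by capping the number of new tokens. The sketch below assumes the 4096-token window covers the prompt and the generated tokens together; `max_new_tokens_for` is a hypothetical helper name.

```python
CONTEXT_LENGTH = 4096  # from the Model Details table


def max_new_tokens_for(prompt_tokens, requested=2048, context_length=CONTEXT_LENGTH):
    """Cap the generation budget so prompt plus output fits in the context window."""
    if prompt_tokens >= context_length:
        raise ValueError("the prompt alone fills the context window")
    return min(requested, context_length - prompt_tokens)


print(max_new_tokens_for(1000))  # 2048: the full request fits
print(max_new_tokens_for(3000))  # 1096: capped by the remaining window
```

The prompt length is the token count of the templated input, for example `inputs["input_ids"].shape[1]` in the Quick Start.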

Example Outputs

Algorithm Problem

Prompt: Implement a function to find the kth largest element in an unsorted array.

import heapq

def find_kth_largest(nums, k):
    # Use a min-heap of size k
    # Time: O(n log k), Space: O(k)
    heap = []
    for num in nums:
        heapq.heappush(heap, num)
        if len(heap) > k:
            heapq.heappop(heap)
    return heap[0]

# Alternative: QuickSelect for O(n) average
def quickselect(nums, k):
    k = len(nums) - k  # Convert to kth smallest
    
    def partition(left, right, pivot_idx):
        pivot = nums[pivot_idx]
        nums[pivot_idx], nums[right] = nums[right], nums[pivot_idx]
        store_idx = left
        for i in range(left, right):
            if nums[i] < pivot:
                nums[store_idx], nums[i] = nums[i], nums[store_idx]
                store_idx += 1
        nums[right], nums[store_idx] = nums[store_idx], nums[right]
        return store_idx
    
    left, right = 0, len(nums) - 1
    while True:
        pivot_idx = partition(left, right, right)
        if pivot_idx == k:
            return nums[k]
        elif pivot_idx < k:
            left = pivot_idx + 1
        else:
            right = pivot_idx - 1

Mathematical Reasoning

Prompt: Prove that the sum of the first n odd numbers equals n².

Response:

Let's prove this by multiple methods:

Method 1: Direct Calculation

The first n odd numbers are: 1, 3, 5, ..., (2n-1)

Sum = 1 + 3 + 5 + ... + (2n-1)

This is an arithmetic sequence with:

First term a₁ = 1
Last term aₙ = 2n - 1
Number of terms = n

Sum = n × (first + last) / 2 = n × (1 + 2n - 1) / 2 = n × 2n / 2 = n²

Method 2: Induction

Base case: n=1 → Sum = 1 = 1² ✓
Assume true for k: 1 + 3 + ... + (2k-1) = k²
For k+1: k² + (2(k+1)-1) = k² + 2k + 1 = (k+1)² ✓
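
Separately from the model's response, the identity can be spot-checked numerically. This is a sanity check over a finite range, not a proof, and the function name is illustrative.

```python
def sum_of_first_n_odds(n):
    """Add the first n odd numbers: 1 + 3 + ... + (2n - 1)."""
    return sum(2 * i - 1 for i in range(1, n + 1))


# Compare against n squared for every n from 1 to 1000.
for n in range(1, 1001):
    assert sum_of_first_n_odds(n) == n * n

print(sum_of_first_n_odds(5))  # 1 + 3 + 5 + 7 + 9 = 25
```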

Debugging

Prompt: Why does this code give wrong results?

def binary_search(arr, target):
    left, right = 0, len(arr)
    while left < right:
        mid = (left + right) / 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            left = mid
        else:
            right = mid
    return -1

Response: There are several bugs:

Integer division: mid = (left + right) / 2 returns a float. Use // for integer division.
Infinite loop: left = mid can cause infinite loop when mid == left. Should be left = mid + 1.
Off-by-one: With right = len(arr), using right = mid is correct, but the initial range setup suggests you want right = len(arr) - 1 with left <= right.

Fixed version:

def binary_search(arr, target):
    left, right = 0, len(arr) - 1
    while left <= right:
        mid = (left + right) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            left = mid + 1
        else:
            right = mid - 1
    return -1

Hardware Requirements

Setup	VRAM Required	Notes
FP16/BF16	~3 GB	Full precision inference
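
The ~3 GB figure is consistent with a back-of-the-envelope estimate of weight storage alone. The sketch below assumes ~1.5B parameters and ignores the KV cache, activations, and runtime overhead; for the quantized files, the nominal bit widths are only a lower bound, because GGUF quantization formats store scales and other metadata alongside the weights.

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Approximate weight storage in GB (10^9 bytes) for a given bit width."""
    return num_params * bits_per_weight / 8 / 1e9


NUM_PARAMS = 1.5e9  # ~1.5B, from the Model Details table

print(weight_memory_gb(NUM_PARAMS, 16))  # 3.0 for FP16/BF16
print(weight_memory_gb(NUM_PARAMS, 4))   # 0.75 at a nominal 4 bits per weight
```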

Citation

@misc{ucoder-mini,
  author = {uaytug},
  title = {uCoder Mini: A Compact Language Model for Code and Math},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/uaytug/ucoder-mini}
}

Acknowledgments

Thanks to the open-source community and creators of the datasets that made UCDS possible.
