Instructions for using medzonai/medzon-1.2B-Instruct with libraries and local apps.

Libraries

How to use medzonai/medzon-1.2B-Instruct with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="medzonai/medzon-1.2B-Instruct",
	filename="medzon-1.2B-Instruct.gguf",
)

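response = llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)
# Print only the assistant's reply text.
print(response["choices"][0]["message"]["content"])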

llama.cpp

How to use medzonai/medzon-1.2B-Instruct with llama.cpp:

Install (macOS, Linux)

curl -LsSf https://llama.app/install.sh | sh
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf medzonai/medzon-1.2B-Instruct
# Run inference directly in the terminal:
llama cli -hf medzonai/medzon-1.2B-Instruct

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf medzonai/medzon-1.2B-Instruct
# Run inference directly in the terminal:
llama cli -hf medzonai/medzon-1.2B-Instruct

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf medzonai/medzon-1.2B-Instruct
# Run inference directly in the terminal:
./llama-cli -hf medzonai/medzon-1.2B-Instruct

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf medzonai/medzon-1.2B-Instruct
# Run inference directly in the terminal:
./build/bin/llama-cli -hf medzonai/medzon-1.2B-Instruct

Use Docker

docker model run hf.co/medzonai/medzon-1.2B-Instruct

vLLM

How to use medzonai/medzon-1.2B-Instruct with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "medzonai/medzon-1.2B-Instruct"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "medzonai/medzon-1.2B-Instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/medzonai/medzon-1.2B-Instruct

Ollama
How to use medzonai/medzon-1.2B-Instruct with Ollama:
```
ollama run hf.co/medzonai/medzon-1.2B-Instruct
```

Unsloth Studio

How to use medzonai/medzon-1.2B-Instruct with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for medzonai/medzon-1.2B-Instruct to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for medzonai/medzon-1.2B-Instruct to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for medzonai/medzon-1.2B-Instruct to start chatting

Pi

How to use medzonai/medzon-1.2B-Instruct with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf medzonai/medzon-1.2B-Instruct

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "medzonai/medzon-1.2B-Instruct"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use medzonai/medzon-1.2B-Instruct with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf medzonai/medzon-1.2B-Instruct

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default medzonai/medzon-1.2B-Instruct

Run Hermes

hermes

Docker Model Runner
How to use medzonai/medzon-1.2B-Instruct with Docker Model Runner:
```
docker model run hf.co/medzonai/medzon-1.2B-Instruct
```

Lemonade

How to use medzonai/medzon-1.2B-Instruct with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull medzonai/medzon-1.2B-Instruct

Run and chat with the model

lemonade run user.medzon-1.2B-Instruct-{{QUANT_TAG}}

List all available models

lemonade list

medzon-1.2B-Instruct

Proudly made in Najaf al-Ashraf

About the model

medzon-1.2B-Instruct is an Iraqi language model with 1.2 billion parameters, trained specifically for tool and function calling. The model was trained locally in the city of Najaf al-Ashraf, and we in Najaf are proud to present it as a purely Iraqi contribution to the field of artificial intelligence.

The model is designed to run efficiently on local devices and offers high accuracy in understanding instructions and generating tool calls in a structured, parseable format. We hope this work is a step toward building Arabic and Iraqi AI models with local hands.

About

medzon-1.2B-Instruct is a 1.2B-parameter instruction-tuned language model specialized for structured tool / function calling. It ships as a single f16 GGUF file for fast local inference with llama.cpp, Ollama, and any GGUF-compatible runtime.

The model is tuned to read a list of available functions from the system prompt, decide which (if any) to call, emit the call(s) in a strict, parseable format, consume the tool results, and return a natural-language answer.

Model details

Property	Value
Name	`medzon-1.2B-Instruct`
Base weights	LFM2-1.2B-Instruct by Liquid AI
Total parameters	1.17B
Layers	16 (10 double-gated LIV convolution + 6 GQA blocks)
Context length	32,768 tokens
Vocabulary size	65,536
Precision	BF16 (native) · distributed as GGUF `f16`
File	`medzon-1.2B-Instruct.gguf` (~2.34 GB)
Supported languages	English, Arabic, Chinese, French, German, Japanese, Korean, Spanish
Specialization	tool / function calling, multi-turn conversation
Origin	Iraqi local training — Najaf, Iraq

Benchmarks

Schema advantages vs other 1.2B tool-callers

Advantage	✅ medzon — bare `[...]`	⚠️ Other — control-token
Tokens per call	Fewer — no wrapper tokens	+2 special tokens every call
Duplication waste	None observed	Whole call re-emitted (~2×)
Argument integrity	Clean & well-formed every time	Control tokens leak into args
Parsing	Plain `[...]` regex	Requires special-token support
Portability	llama.cpp · Ollama · raw HTTP	Tied to token-aware backends
Output noise	Pure call, nothing to strip	Markers must be stripped first
Multi-turn cost	Savings compound per turn	Wrapper overhead repeats per turn

After the tool-call fine-tuning, function-calling performance on BFCLv3 increased relative to the base instruction model — the primary goal of this release. The bare [func(arg="value")] schema is also more token-efficient and portable: it drops the <|tool_call_start|> … <|tool_call_end|> wrapper tokens, avoids the duplicate/garbled calls seen with the control-token format, and parses with a plain regex on any runtime.

Token cost — example call [Get Weather(city="Erbil")]:

	Original (wrapper)	medzon (bare)
Typical clean call	~14 tokens (call + 2 markers)	~12 tokens
When it duplicates	~28 tokens	~12 tokens

The savings are small per call but compound across every tool turn in a multi-turn conversation.
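The compounding can be made concrete with a little arithmetic. The sketch below uses the approximate token counts from the table above; `dup_rate`, the fraction of wrapper-format calls that get duplicated, is a hypothetical parameter that the source does not report.

```python
def expected_call_tokens(clean: int, duplicated: int, dup_rate: float) -> float:
    """Average tokens per call when a fraction dup_rate of calls is duplicated."""
    return (1.0 - dup_rate) * clean + dup_rate * duplicated


def conversation_savings(turns: int, dup_rate: float) -> float:
    """Tokens saved by the bare format over `turns` tool calls."""
    # Approximate counts from the table: wrapper 14 (28 if duplicated), bare 12.
    wrapper = expected_call_tokens(clean=14, duplicated=28, dup_rate=dup_rate)
    bare = expected_call_tokens(clean=12, duplicated=12, dup_rate=dup_rate)
    return turns * (wrapper - bare)


print(conversation_savings(turns=10, dup_rate=0.0))  # 20.0
```

With no duplication the saving is two tokens per call; any nonzero duplication rate widens the gap.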

Recommended generation settings

temperature        = 0.1
top_k              = 50
top_p              = 0.1
repetition_penalty = 1.05

Low temperature matters: tool calls must be emitted exactly, and near-greedy decoding (temperature 0.1 with top_p 0.1) gives the most reliably parseable output.
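One practical detail is that runtimes name these settings differently. As a small sketch, the helper below maps the recommended values to the keyword names used by llama-cpp-python's `create_completion`, where the penalty is called `repeat_penalty`. The helper name `to_llama_cpp_kwargs` is illustrative and not part of any library.

```python
from typing import Any, Dict

# Recommended values, as listed above.
GENERATION_SETTINGS: Dict[str, Any] = {
    "temperature": 0.1,
    "top_k": 50,
    "top_p": 0.1,
    "repetition_penalty": 1.05,
}


def to_llama_cpp_kwargs(settings: Dict[str, Any]) -> Dict[str, Any]:
    """Rename repetition_penalty to repeat_penalty; leave other keys unchanged."""
    kwargs = dict(settings)  # copy so the caller's dict is not modified
    kwargs["repeat_penalty"] = kwargs.pop("repetition_penalty")
    return kwargs


print(to_llama_cpp_kwargs(GENERATION_SETTINGS))
```

The result can be passed as `llm.create_completion(prompt=..., **kwargs)`.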

Tool-calling schema

The model uses four roles — system, user, assistant, tool — wrapped in the chat markup:

<|startoftext|><|im_start|>system
{system prompt + function list}<|im_end|>
<|im_start|>user
{user message}<|im_end|>
<|im_start|>assistant
{function call(s)}<|im_end|>
<|im_start|>tool
{tool results}<|im_end|>
<|im_start|>assistant
{final natural-language answer}<|im_end|>
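
The markup above can be rendered from a list of role/content messages. The following is a minimal sketch, not the model's official chat template: `build_prompt` and its `add_bos` flag are illustrative names, and the flag exists because some runtimes prepend the `<|startoftext|>` token themselves.

```python
from typing import Dict, List

ALLOWED_ROLES = {"system", "user", "assistant", "tool"}


def build_prompt(messages: List[Dict[str, str]], add_bos: bool = True) -> str:
    """Render messages into the chat markup and open a new assistant turn."""
    parts = ["<|startoftext|>"] if add_bos else []
    for message in messages:
        role = message["role"]
        if role not in ALLOWED_ROLES:
            raise ValueError(f"unknown role: {role}")
        parts.append(f"<|im_start|>{role}\n{message['content']}<|im_end|>\n")
    # Leave the assistant turn open so the model writes the next reply.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)


print(build_prompt([
    {"role": "system", "content": "{system prompt + function list}"},
    {"role": "user", "content": "{user message}"},
]))
```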

1. System prompt — declaring functions

Pass the available functions to the system role as a JSON list. Each function declares name, description, and a parameters object (type: "dict", properties, required):

You are an expert in composing functions. You are given a question and a set of possible functions.
Based on the question, you will need to make one or more function/tool calls to achieve the purpose.
If none of the function can be used, point it out. If the given question lacks the parameters required
by the function, also point it out. You should only return the function call in tools call sections.
Here is a list of functions in JSON format that you can invoke:
[
  {
    "name": "Get Arabic Word Meaning",
    "description": "Look up the meaning and root of an Arabic word in a classical dictionary.",
    "parameters": {
      "type": "dict",
      "properties": {
        "word": {"description": "The Arabic word to look up.", "type": "string"}
      },
      "required": ["word"]
    },
    "required": null
  },
  {
    "name": "Arabic News API",
    "description": "Get the latest Arabic news headlines for a specified country and topic.",
    "parameters": {
      "type": "dict",
      "properties": {
        "topic": {
          "description": "News topic.",
          "type": "string",
          "enum": ["POLITICS", "ECONOMY", "SPORTS", "CULTURE", "TECHNOLOGY", "RELIGION"]
        },
        "country": {"description": "2-letter ISO 3166 country code.", "type": "string", "default": "iq"},
        "language": {"description": "2-letter ISO 639-1 language code.", "type": "string", "default": "ar"}
      },
      "required": ["topic"]
    },
    "required": null
  }
]
Should you decide to return the function call(s).
Put it in the format of [func1(params_name=params_value, params_name2=params_value2...), func2(params)]

NO other text MUST be included.
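
A system prompt like the one above can be assembled from a Python list of function declarations. This is a sketch under assumptions: the instruction text is copied from the example as shown, but the exact whitespace and JSON indentation used in training are not stated in the source, and `build_system_prompt` is an illustrative name.

```python
import json
from typing import Any, Dict, List

# Instruction text copied verbatim from the example system prompt.
HEADER = (
    "You are an expert in composing functions. You are given a question and a set of possible functions.\n"
    "Based on the question, you will need to make one or more function/tool calls to achieve the purpose.\n"
    "If none of the function can be used, point it out. If the given question lacks the parameters required\n"
    "by the function, also point it out. You should only return the function call in tools call sections.\n"
    "Here is a list of functions in JSON format that you can invoke:\n"
)
FOOTER = (
    "\nShould you decide to return the function call(s).\n"
    "Put it in the format of [func1(params_name=params_value, params_name2=params_value2...), func2(params)]\n"
    "\n"
    "NO other text MUST be included."
)


def build_system_prompt(functions: List[Dict[str, Any]]) -> str:
    """Embed the function declarations as a JSON list between header and footer."""
    # ensure_ascii=False keeps Arabic descriptions readable instead of \u escapes.
    return HEADER + json.dumps(functions, ensure_ascii=False, indent=2) + FOOTER
```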

2. Assistant — the function call

The model replies with the call(s) only, inside square brackets. Arguments are name=value pairs; string values are quoted. Multiple calls are comma-separated inside the same brackets:

[Arabic News API(topic="ECONOMY", country="iq")]

Single-argument call:

[Get Arabic Word Meaning(word="كتاب")]

Parallel / multiple calls:

[Arabic News API(topic="CULTURE", country="iq"), Get Arabic Word Meaning(word="نجف")]

If no function fits, or required parameters are missing, the model says so in plain text instead of fabricating a call.
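
The bracket format can be parsed with the standard library alone. The sketch below is one possible parser, not the authors' reference implementation: it handles function names with spaces, quoted strings, and simple scalar values, and returns `None` for a plain-text reply. Nested lists or dicts as argument values are not handled.

```python
import ast
import re
from typing import Any, Dict, List, Optional

_DQ = r'"(?:\\.|[^"\\])*"'   # double-quoted string with escapes
_SQ = r"'(?:\\.|[^'\\])*'"   # single-quoted string with escapes
_STR = f"(?:{_DQ}|{_SQ})"
# One call: a name (spaces allowed), then parenthesised args, then "," or end.
_CALL = re.compile(r"\s*([^\[\](),=\"']+?)\s*\(((?:" + _STR + r"|[^()\"'])*)\)\s*(?:,|$)")
# One argument: name=value, where value is a quoted string or a bare token.
_ARG = re.compile(r"\s*(\w+)\s*=\s*(" + _STR + r"|[^,]+?)\s*(?:,|$)")


def _literal(token: str) -> Any:
    """Convert "text", 3, 1.5, True and similar; keep bare words as strings."""
    try:
        return ast.literal_eval(token)
    except (ValueError, SyntaxError):
        return token


def _parse_args(raw: str) -> Dict[str, Any]:
    raw = raw.strip()
    args: Dict[str, Any] = {}
    pos = 0
    while pos < len(raw):
        match = _ARG.match(raw, pos)
        if match is None:
            raise ValueError(f"cannot parse arguments: {raw!r}")
        args[match.group(1)] = _literal(match.group(2))
        pos = match.end()
    return args


def parse_tool_calls(text: str) -> Optional[List[Dict[str, Any]]]:
    """Return a list of {"name", "arguments"} dicts, or None if no call is found."""
    text = text.strip()
    if not (text.startswith("[") and text.endswith("]")):
        return None  # plain-text reply
    inner = text[1:-1]
    calls: List[Dict[str, Any]] = []
    pos = 0
    try:
        while pos < len(inner):
            match = _CALL.match(inner, pos)
            if match is None:
                return None
            calls.append({"name": match.group(1).strip(),
                          "arguments": _parse_args(match.group(2))})
            pos = match.end()
    except ValueError:
        return None
    return calls or None


print(parse_tool_calls('[Arabic News API(topic="ECONOMY", country="iq")]'))
```

Returning `None` rather than raising lets the caller treat anything unparseable as an ordinary text answer.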

3. Tool — returning results

Send results back in the tool role as a JSON list, one object per call, echoing the function name and a results payload:

[{"name": "Arabic News API", "results": {"headlines": [{"title": "ارتفاع أسعار النفط في الأسواق العراقية", "source": "INA"}]}}]
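
A tool-role message in this shape can be produced with `json.dumps`. In this small sketch, `build_tool_message` is an illustrative helper name; `ensure_ascii=False` keeps Arabic text as written instead of `\u` escapes.

```python
import json
from typing import Any, Dict, List


def build_tool_message(results: List[Dict[str, Any]]) -> Dict[str, str]:
    """Wrap per-call results as one tool-role message with a JSON list body."""
    payload = [{"name": r["name"], "results": r["results"]} for r in results]
    return {"role": "tool", "content": json.dumps(payload, ensure_ascii=False)}


message = build_tool_message([
    {"name": "Arabic News API",
     "results": {"headlines": [{"title": "ارتفاع أسعار النفط في الأسواق العراقية",
                                "source": "INA"}]}},
])
print(message["content"])
```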

4. Assistant — final answer

The model then produces a natural-language response grounded in the tool results.
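
Steps 2 to 4 form a loop that the calling application has to drive. The sketch below shows one round of it with injected callables so that it stays runtime-agnostic. All names are assumptions: `generate` is any function that turns the message history into model text, `parse_calls` is any parser for the bracket format, and `tools` maps function names to Python callables.

```python
import json
from typing import Any, Callable, Dict, List, Optional

Message = Dict[str, str]


def run_tool_turn(
    messages: List[Message],
    generate: Callable[[List[Message]], str],
    parse_calls: Callable[[str], Optional[List[Dict[str, Any]]]],
    tools: Dict[str, Callable[..., Any]],
) -> str:
    """Generate, dispatch any tool calls, then generate the final answer."""
    reply = generate(messages)
    calls = parse_calls(reply)
    if not calls:
        return reply  # plain-text answer, nothing to dispatch
    messages.append({"role": "assistant", "content": reply})
    # An unknown function name raises KeyError here; handle it as needed.
    results = [
        {"name": call["name"], "results": tools[call["name"]](**call["arguments"])}
        for call in calls
    ]
    messages.append({"role": "tool",
                     "content": json.dumps(results, ensure_ascii=False)})
    return generate(messages)
```

The function appends the assistant call and the tool results to `messages` in place, so the caller keeps the full history for the next user turn.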

Usage

Download from Hugging Face

# CLI
huggingface-cli download medzonai/medzon-1.2B-Instruct medzon-1.2B-Instruct.gguf --local-dir .

# Python
from huggingface_hub import hf_hub_download
path = hf_hub_download("medzonai/medzon-1.2B-Instruct", "medzon-1.2B-Instruct.gguf")

Python (llama-cpp-python)

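from llama_cpp import Llama

llm = Llama(model_path="medzon-1.2B-Instruct.gguf", n_ctx=32768)

# PROMPT follows the chat markup from "Tool-calling schema". This short
# placeholder has no function list; put yours in the system turn. The BOS
# token is left out on the assumption that the runtime adds it.
PROMPT = (
    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\nWhat is the capital of Iraq?<|im_end|>\n"
    "<|im_start|>assistant\n"
)

out = llm.create_completion(
    prompt=PROMPT,
    temperature=0.1, top_k=50, top_p=0.1, repeat_penalty=1.05,
    max_tokens=1024,
    stop=["<|im_end|>"],      # end generation at the close of the assistant turn
)
print(out["choices"][0]["text"])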

Training loss

Supervised fine-tuning converged cleanly, with loss computed on assistant/tool-call completions only:

Phase	Training loss
Initial	~5.03
Early convergence	~0.60
Plateau	~0.50
Final	~0.45 – 0.49

Loss dropped sharply over the first part of training and then settled into a stable ~0.45–0.49 band, which suggests the model learned the tool-call format. Training loss alone cannot rule out overfitting; that would need a held-out evaluation set.
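
Computing loss on completions only is usually done by masking the labels of every other token. The source does not show its training code, so the following is a sketch of that common convention rather than the authors' pipeline. It uses `-100`, the default `ignore_index` of PyTorch's `CrossEntropyLoss`; the token ids and the `is_completion` mask are made-up inputs.

```python
from typing import List

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def mask_labels(token_ids: List[int], is_completion: List[bool]) -> List[int]:
    """Keep labels for completion tokens; mask prompt tokens out of the loss."""
    if len(token_ids) != len(is_completion):
        raise ValueError("token_ids and is_completion must have the same length")
    return [tok if keep else IGNORE_INDEX
            for tok, keep in zip(token_ids, is_completion)]


# Hypothetical example: three prompt tokens followed by two completion tokens.
print(mask_labels([11, 12, 13, 21, 22], [False, False, False, True, True]))
# [-100, -100, -100, 21, 22]
```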

Notes & limitations

The model emits calls only in the [func(arg="value")] bracket format — your runtime must parse this and dispatch the actual functions; the model does not execute anything itself.
Keep the function list in the system role and feed real results back in the tool role for best results.
As a 1.2B model it is optimized for routing and argument extraction; verify arguments before executing sensitive actions.
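
Argument checks can reuse the declarations already passed in the system prompt. The sketch below validates one parsed call against that schema, covering required parameters, basic types, and `enum` values. `validate_call` and the `{"name", "arguments"}` call shape are assumptions for illustration, and the type map covers only a few common type names.

```python
from typing import Any, Dict, List

# Schema type names mapped to Python types (partial, illustrative).
_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}


def validate_call(call: Dict[str, Any], functions: List[Dict[str, Any]]) -> List[str]:
    """Return a list of problems; an empty list means the call looks valid."""
    spec = next((f for f in functions if f["name"] == call["name"]), None)
    if spec is None:
        return [f"unknown function: {call['name']}"]
    params = spec["parameters"]
    props = params.get("properties", {})
    args = call["arguments"]
    errors: List[str] = []
    for name in params.get("required") or []:
        if name not in args:
            errors.append(f"missing required argument: {name}")
    for name, value in args.items():
        if name not in props:
            errors.append(f"unexpected argument: {name}")
            continue
        expected = _TYPES.get(props[name].get("type"))
        if expected is not None and not isinstance(value, expected):
            errors.append(f"wrong type for {name}: {type(value).__name__}")
        allowed = props[name].get("enum")
        if allowed is not None and value not in allowed:
            errors.append(f"value for {name} not in enum: {value!r}")
    return errors
```

Dispatching only when the returned list is empty keeps a malformed or hallucinated call from reaching a sensitive function.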
