medzon-1.2B-Instruct

Proudly made in Najaf al-Ashraf


About the model

medzon-1.2B-Instruct is an Iraqi language model with 1.2 billion parameters, trained specifically for tool and function calling. The model was trained locally in the city of Najaf al-Ashraf, and we in Najaf are proud to present it as a purely Iraqi contribution to the field of artificial intelligence.

The model is designed to run efficiently on local hardware and is distinguished by high accuracy in understanding instructions and generating tool calls in a structured, parseable format. We hope this work is a step toward building Arabic and Iraqi AI models with local hands.


About

medzon-1.2B-Instruct is a 1.2B-parameter instruction-tuned language model specialized for structured tool / function calling. It ships as a single f16 GGUF file for fast local inference with llama.cpp, Ollama, or any other GGUF-compatible runtime.

The model is tuned to read a list of available functions from the system prompt, decide which (if any) to call, emit the call(s) in a strict, parseable format, consume the tool results, and return a natural-language answer.


Model details

| Property | Value |
|---|---|
| Name | medzon-1.2B-Instruct |
| Base weights | LFM2-1.2B-Instruct by Liquid AI |
| Total parameters | 1.17B |
| Layers | 16 (10 double-gated LIV convolution + 6 GQA blocks) |
| Context length | 32,768 tokens |
| Vocabulary size | 65,536 |
| Precision | BF16 (native), distributed as GGUF f16 |
| File | medzon-1.2B-Instruct.gguf (~2.34 GB) |
| Supported languages | English, Arabic, Chinese, French, German, Japanese, Korean, Spanish |
| Specialization | Tool / function calling, multi-turn conversation |
| Origin | Iraqi local training (Najaf, Iraq) |

Benchmarks

Schema advantages vs other 1.2B tool-callers

| Advantage | ✅ medzon (bare [...]) | ⚠️ Other (control-token) |
|---|---|---|
| Tokens per call | Fewer: no wrapper tokens | +2 special tokens every call |
| Duplication waste | None observed | Whole call re-emitted (~2×) |
| Argument integrity | Clean & well-formed every time | Control tokens leak into args |
| Parsing | Plain [...] regex | Requires special-token support |
| Portability | llama.cpp · Ollama · raw HTTP | Tied to token-aware backends |
| Output noise | Pure call, nothing to strip | Markers must be stripped first |
| Multi-turn cost | Savings compound per turn | Wrapper overhead repeats per turn |

After the tool-call fine-tuning, function-calling performance on BFCLv3 increased relative to the base instruction model — the primary goal of this release. The bare [func(arg="value")] schema is also more token-efficient and portable: it drops the <|tool_call_start|> … <|tool_call_end|> wrapper tokens, avoids the duplicate/garbled calls seen with the control-token format, and parses with a plain regex on any runtime.

Token cost — example call [Get Weather(city="Erbil")]:

| Case | Original (wrapper) | medzon (bare) |
|---|---|---|
| Typical clean call | ~14 tokens (call + 2 markers) | ~12 tokens |
| When it duplicates | ~28 tokens | ~12 tokens |

The savings are small per call but compound across every tool turn in a multi-turn conversation.
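To make the compounding concrete, the sketch below multiplies the approximate per-call figures from the table by the number of tool turns. The function name and the `duplication_rate` parameter are illustrative assumptions; the token counts are the rough estimates quoted above, not measurements, and no actual duplication rate is reported.

```python
# Approximate per-call token costs taken from the table above.
WRAPPER_CLEAN = 14       # call + 2 marker tokens
WRAPPER_DUPLICATED = 28  # whole call emitted twice
BARE = 12                # bare [func(arg="value")] call


def call_tokens(turns: int, duplication_rate: float = 0.0) -> tuple[float, float]:
    """Return (wrapper_total, bare_total) over `turns` tool calls.

    `duplication_rate` is a hypothetical fraction of wrapper-format calls
    that get emitted twice; it is an assumption for illustration only.
    """
    wrapper_per_call = (
        (1 - duplication_rate) * WRAPPER_CLEAN
        + duplication_rate * WRAPPER_DUPLICATED
    )
    return turns * wrapper_per_call, turns * BARE


wrapper, bare = call_tokens(turns=10)
print(wrapper - bare)  # 20.0 tokens saved over ten clean calls
```

With these figures the gap is two tokens per clean call, so it only becomes noticeable over long conversations or when duplicated calls are frequent.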

Recommended generation settings

temperature        = 0.1
top_k              = 50
top_p              = 0.1
repetition_penalty = 1.05

Keep the temperature low: tool calls must be emitted exactly, so near-deterministic decoding gives the most reliable parsing.


Tool-calling schema

The model uses four roles — system, user, assistant, tool — wrapped in the chat markup:

<|startoftext|><|im_start|>system
{system prompt + function list}<|im_end|>
<|im_start|>user
{user message}<|im_end|>
<|im_start|>assistant
{function call(s)}<|im_end|>
<|im_start|>tool
{tool results}<|im_end|>
<|im_start|>assistant
{final natural-language answer}<|im_end|>
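One way to assemble this markup from a list of messages is sketched below. The helper name `build_prompt` and the message dictionaries are assumptions for illustration, not part of the model's tooling. Some runtimes prepend the beginning-of-text token themselves, so whether to keep `<|startoftext|>` in the string is worth checking for your runtime.

```python
def build_prompt(messages: list[dict], add_generation_prompt: bool = True) -> str:
    """Render {'role': ..., 'content': ...} messages into the chat markup above."""
    allowed = {"system", "user", "assistant", "tool"}
    parts = ["<|startoftext|>"]
    for msg in messages:
        if msg["role"] not in allowed:
            raise ValueError(f"unknown role: {msg['role']!r}")
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Leave an open assistant turn for the model to complete.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = build_prompt([
    {"role": "system", "content": "{system prompt + function list}"},
    {"role": "user", "content": "{user message}"},
])
```

After a tool call, append the assistant message and the tool message to the same list and render again to get the prompt for the final answer.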

1. System prompt — declaring functions

Pass the available functions to the system role as a JSON list. Each function declares name, description, and a parameters object (type: "dict", properties, required):

You are an expert in composing functions. You are given a question and a set of possible functions.
Based on the question, you will need to make one or more function/tool calls to achieve the purpose.
If none of the function can be used, point it out. If the given question lacks the parameters required
by the function, also point it out. You should only return the function call in tools call sections.
Here is a list of functions in JSON format that you can invoke:
[
  {
    "name": "Get Arabic Word Meaning",
    "description": "Look up the meaning and root of an Arabic word in a classical dictionary.",
    "parameters": {
      "type": "dict",
      "properties": {
        "word": {"description": "The Arabic word to look up.", "type": "string"}
      },
      "required": ["word"]
    },
    "required": null
  },
  {
    "name": "Arabic News API",
    "description": "Get the latest Arabic news headlines for a specified country and topic.",
    "parameters": {
      "type": "dict",
      "properties": {
        "topic": {
          "description": "News topic.",
          "type": "string",
          "enum": ["POLITICS", "ECONOMY", "SPORTS", "CULTURE", "TECHNOLOGY", "RELIGION"]
        },
        "country": {"description": "2-letter ISO 3166 country code.", "type": "string", "default": "iq"},
        "language": {"description": "2-letter ISO 639-1 language code.", "type": "string", "default": "ar"}
      },
      "required": ["topic"]
    },
    "required": null
  }
]
Should you decide to return the function call(s).
Put it in the format of [func1(params_name=params_value, params_name2=params_value2...), func2(params)]

NO other text MUST be included.
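The system prompt above can be generated from a Python list of function declarations, as in the following sketch. The instruction text is copied from the example as displayed, including its line breaks; the helper name `build_system_prompt` and the 2-space JSON indentation are assumptions, since the card does not say how sensitive the model is to whitespace.

```python
import json

# Instruction text copied from the example system prompt above.
PREAMBLE = (
    "You are an expert in composing functions. You are given a question and a set of possible functions.\n"
    "Based on the question, you will need to make one or more function/tool calls to achieve the purpose.\n"
    "If none of the function can be used, point it out. If the given question lacks the parameters required\n"
    "by the function, also point it out. You should only return the function call in tools call sections.\n"
    "Here is a list of functions in JSON format that you can invoke:"
)
SUFFIX = (
    "Should you decide to return the function call(s).\n"
    "Put it in the format of [func1(params_name=params_value, params_name2=params_value2...), func2(params)]\n"
    "\n"
    "NO other text MUST be included."
)


def build_system_prompt(functions: list[dict]) -> str:
    """Embed the function declarations as a JSON list between the fixed texts."""
    # ensure_ascii=False keeps Arabic descriptions readable instead of \u escapes.
    listing = json.dumps(functions, indent=2, ensure_ascii=False)
    return f"{PREAMBLE}\n{listing}\n{SUFFIX}"


functions = [{
    "name": "Get Arabic Word Meaning",
    "description": "Look up the meaning and root of an Arabic word in a classical dictionary.",
    "parameters": {
        "type": "dict",
        "properties": {"word": {"description": "The Arabic word to look up.", "type": "string"}},
        "required": ["word"],
    },
    "required": None,
}]
system_prompt = build_system_prompt(functions)
```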

2. Assistant — the function call

The model replies with the call(s) only, inside square brackets. Arguments are name=value pairs; string values are quoted. Multiple calls are comma-separated inside the same brackets:

[Arabic News API(topic="ECONOMY", country="iq")]

Single-argument call:

[Get Arabic Word Meaning(word="كتاب")]

Parallel / multiple calls:

[Arabic News API(topic="CULTURE", country="iq"), Get Arabic Word Meaning(word="نجف")]

If no function fits, or required parameters are missing, the model says so in plain text instead of fabricating a call.
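A minimal parser for this output format might look like the sketch below. It pairs one regex with a small scanner rather than relying on a single regex, so that commas or parentheses inside quoted strings do not split arguments. The names `split_top_level` and `parse_calls` and the returned dictionary shape are assumptions; values are read as Python literals, and anything that does not parse is kept as raw text.

```python
import ast
import re

# Function names may contain spaces, so match everything up to the first "(".
CALL_RE = re.compile(r"^\s*(?P<name>[^()]+?)\s*\((?P<args>.*)\)\s*$", re.DOTALL)


def split_top_level(text: str, sep: str = ",") -> list[str]:
    """Split on `sep`, ignoring separators inside quotes or brackets."""
    parts, buf, depth, quote = [], [], 0, None
    i = 0
    while i < len(text):
        ch = text[i]
        if quote:
            buf.append(ch)
            if ch == "\\" and i + 1 < len(text):
                buf.append(text[i + 1])  # keep the escaped character as-is
                i += 1
            elif ch == quote:
                quote = None
        elif ch in "\"'":
            quote = ch
            buf.append(ch)
        elif ch == sep and depth == 0:
            parts.append("".join(buf))
            buf = []
        else:
            if ch in "([{":
                depth += 1
            elif ch in ")]}":
                depth -= 1
            buf.append(ch)
        i += 1
    if "".join(buf).strip():
        parts.append("".join(buf))
    return parts


def parse_calls(output: str) -> list[dict]:
    """Parse '[f(a="x"), g(b=1)]' into [{'name': ..., 'arguments': {...}}, ...].

    Returns [] when the output is not a bracketed call list (a plain-text reply).
    """
    text = output.strip()
    if not (text.startswith("[") and text.endswith("]")):
        return []
    calls = []
    for chunk in split_top_level(text[1:-1]):
        match = CALL_RE.match(chunk)
        if match is None:
            return []  # not a well-formed call list
        arguments = {}
        for pair in split_top_level(match["args"]):
            key, _, raw = pair.partition("=")
            try:
                value = ast.literal_eval(raw.strip())
            except (ValueError, SyntaxError):
                value = raw.strip()  # keep unparsed text rather than guess
            arguments[key.strip()] = value
        calls.append({"name": match["name"].strip(), "arguments": arguments})
    return calls


print(parse_calls('[Get Arabic Word Meaning(word="كتاب")]'))
# [{'name': 'Get Arabic Word Meaning', 'arguments': {'word': 'كتاب'}}]
```

An empty list signals a plain-text reply, which the caller can show to the user directly instead of dispatching anything.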

3. Tool — returning results

Send results back in the tool role as a JSON list, one object per call, echoing the function name and a results payload:

[{"name": "Arabic News API", "results": {"headlines": [{"title": "ارتفاع أسعار النفط في الأسواق العراقية", "source": "INA"}]}}]
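The tool message can be produced by dispatching each parsed call to a local Python function and serializing the results, as sketched here. The `run_tools` helper, the `{"name", "arguments"}` call shape, the registry dictionary, and the `{"error": ...}` payload for unknown functions are all assumptions for illustration; only the outer `name` / `results` layout comes from the example above.

```python
import json
from typing import Any, Callable


def run_tools(calls: list[dict], registry: dict[str, Callable[..., Any]]) -> str:
    """Execute each call and return the JSON string for the tool turn."""
    results = []
    for call in calls:
        handler = registry.get(call["name"])
        if handler is None:
            # Hypothetical error shape; the card does not define one.
            payload = {"error": f"unknown function: {call['name']}"}
        else:
            payload = handler(**call["arguments"])
        results.append({"name": call["name"], "results": payload})
    # ensure_ascii=False keeps Arabic text readable instead of \u escapes.
    return json.dumps(results, ensure_ascii=False)


def fake_news(topic: str, country: str = "iq", language: str = "ar") -> dict:
    # Stand-in for a real news lookup; returns canned data.
    return {"headlines": [{"title": "ارتفاع أسعار النفط في الأسواق العراقية", "source": "INA"}]}


tool_message = run_tools(
    [{"name": "Arabic News API", "arguments": {"topic": "ECONOMY", "country": "iq"}}],
    {"Arabic News API": fake_news},
)
print(tool_message)
```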

4. Assistant — final answer

The model then produces a natural-language response grounded in the tool results.


Usage

Download from Hugging Face

# CLI
huggingface-cli download medzonai/medzon-1.2B-Instruct medzon-1.2B-Instruct.gguf --local-dir .

# Python
from huggingface_hub import hf_hub_download

# Returns the local path of the downloaded (cached) file.
path = hf_hub_download(repo_id="medzonai/medzon-1.2B-Instruct", filename="medzon-1.2B-Instruct.gguf")

Python (llama-cpp-python)

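from llama_cpp import Llama

# PROMPT is the full prompt string in the chat markup from "Tool-calling schema",
# ending with an open assistant turn. Replace the {...} placeholders with your text.
PROMPT = (
    "<|startoftext|><|im_start|>system\n"
    "{system prompt + function list}<|im_end|>\n"
    "<|im_start|>user\n"
    "{user message}<|im_end|>\n"
    "<|im_start|>assistant\n"
)

llm = Llama(model_path="medzon-1.2B-Instruct.gguf", n_ctx=32768)

out = llm.create_completion(
    prompt=PROMPT,
    temperature=0.1, top_k=50, top_p=0.1, repeat_penalty=1.05,
    max_tokens=1024,
)
print(out["choices"][0]["text"])  # the bracketed call(s), or plain text if no function fits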
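For Ollama, the same settings can be sent over plain HTTP. The sketch below assumes the GGUF file has already been imported into a local Ollama instance under the hypothetical name `medzon-1.2b-instruct`, and that the server listens on Ollama's default port. It uses `raw` mode so the hand-built chat markup is passed through unchanged; note that the option is called `repeat_penalty` there.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL_NAME = "medzon-1.2b-instruct"  # hypothetical name chosen when importing the GGUF


def build_payload(prompt: str) -> dict:
    """Build a non-streaming /api/generate request with the recommended settings."""
    return {
        "model": MODEL_NAME,
        "prompt": prompt,
        "raw": True,      # send the prompt as-is, skipping Ollama's own template
        "stream": False,
        "options": {
            "temperature": 0.1,
            "top_k": 50,
            "top_p": 0.1,
            "repeat_penalty": 1.05,
            "num_predict": 1024,
        },
    }


def generate(prompt: str) -> str:
    """POST the payload and return the generated text (needs a running Ollama server)."""
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

Calling `generate(PROMPT)` with a prompt in the chat markup returns the model's text, which is then parsed for bracketed calls.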

Training loss

Supervised fine-tuning converged cleanly, with loss computed on assistant/tool-call completions only:

| Phase | Training loss |
|---|---|
| Initial | ~5.03 |
| Early convergence | ~0.60 |
| Plateau | ~0.50 |
| Final | ~0.45 – 0.49 |

Loss dropped sharply over the first part of training and then settled into a stable ~0.45–0.49 band, which suggests the model learned the tool-call format. No validation loss is reported, so the curve on its own does not rule out overfitting.


Notes & limitations

  • The model emits calls only in the [func(arg="value")] bracket format — your runtime must parse this and dispatch the actual functions; the model does not execute anything itself.
  • Keep the function list in the system role and feed real results back in the tool role for best results.
  • As a 1.2B model it is optimized for routing and argument extraction; verify arguments before executing sensitive actions.
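
As one way to verify arguments before dispatch, the sketch below checks a parsed call against the function declaration from the system prompt: required names present, no undeclared names, string types, and enum membership. The `validate_call` helper and the `{"name", "arguments"}` call shape are assumptions, and only the `"string"` type seen in the examples is checked; other type names would need to be added to `TYPE_MAP`.

```python
TYPE_MAP = {"string": str}  # only the type name that appears in the examples


def validate_call(call: dict, schema: dict) -> list[str]:
    """Return a list of problems; an empty list means no issue was found."""
    params = schema["parameters"]
    properties = params.get("properties", {})
    args = call["arguments"]
    problems = []
    for name in params.get("required") or []:
        if name not in args:
            problems.append(f"missing required argument: {name}")
    for name, value in args.items():
        spec = properties.get(name)
        if spec is None:
            problems.append(f"unknown argument: {name}")
            continue
        expected = TYPE_MAP.get(spec.get("type"))
        if expected is not None and not isinstance(value, expected):
            problems.append(f"{name}: expected {spec['type']}")
        if "enum" in spec and value not in spec["enum"]:
            problems.append(f"{name}: {value!r} not in allowed values")
    return problems


news_schema = {
    "name": "Arabic News API",
    "parameters": {
        "type": "dict",
        "properties": {
            "topic": {"type": "string", "enum": ["POLITICS", "ECONOMY", "SPORTS", "CULTURE", "TECHNOLOGY", "RELIGION"]},
            "country": {"type": "string", "default": "iq"},
        },
        "required": ["topic"],
    },
}
print(validate_call({"name": "Arabic News API", "arguments": {"topic": "WEATHER"}}, news_schema))
# ["topic: 'WEATHER' not in allowed values"]
```

A schema check like this catches malformed calls but not harmful ones; sensitive actions may still need an explicit confirmation step.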