medzon-1.2B-Instruct
Proudly made in Najaf al-Ashraf
About the model
medzon-1.2B-Instruct is an Iraqi language model with 1.2 billion parameters, trained specifically for tool and function calling. The model was trained locally in the city of Najaf al-Ashraf, and we in Najaf are proud to present it as a purely Iraqi contribution to the field of artificial intelligence.
The model is designed to run efficiently on local devices, and it is notable for high accuracy in understanding instructions and generating tool calls in a structured, parseable format. We hope this work will be a step toward building Arabic and Iraqi AI models with local hands.
About
medzon-1.2B-Instruct is a 1.2B-parameter instruction-tuned language model, specialized for structured tool / function calling. It ships as a single f16 GGUF file for fast local inference with llama.cpp, Ollama, and any GGUF-compatible runtime.
The model is tuned to read a list of available functions from the system prompt, decide which (if any) to call, emit the call(s) in a strict, parseable format, consume the tool results, and return a natural-language answer.
Model details
| Property | Value |
|---|---|
| Name | medzon-1.2B-Instruct |
| Base weights | LFM2-1.2B-Instruct by Liquid AI |
| Total parameters | 1.17B |
| Layers | 16 (10 double-gated LIV convolution + 6 GQA blocks) |
| Context length | 32,768 tokens |
| Vocabulary size | 65,536 |
| Precision | BF16 (native) · distributed as GGUF f16 |
| File | medzon-1.2B-Instruct.gguf (~2.34 GB) |
| Supported languages | English, Arabic, Chinese, French, German, Japanese, Korean, Spanish |
| Specialization | tool / function calling, multi-turn conversation |
| Origin | Iraqi local training — Najaf, Iraq |
Benchmarks
Schema advantages vs other 1.2B tool-callers
After the tool-call fine-tuning, function-calling performance on BFCLv3 increased relative to the base instruction model — the primary goal of this release. The bare [func(arg="value")] schema is also more token-efficient and portable: it drops the <|tool_call_start|> … <|tool_call_end|> wrapper tokens, avoids the duplicate/garbled calls seen with the control-token format, and parses with a plain regex on any runtime.
Token cost — example call [Get Weather(city="Erbil")]:
| | Original (wrapper) | medzon (bare) |
|---|---|---|
| Typical clean call | ~14 tokens (call + 2 markers) | ~12 tokens |
| When it duplicates | ~28 tokens | ~12 tokens |
The savings are small per call but compound across every tool turn in a multi-turn conversation.
Recommended generation settings
temperature = 0.1
top_k = 50
top_p = 0.1
repetition_penalty = 1.05
Low temperature is important: tool calls must be emitted exactly, so near-deterministic decoding gives the most reliable parsing.
Tool-calling schema
The model uses four roles — system, user, assistant, tool — wrapped in the chat markup:
<|startoftext|><|im_start|>system
{system prompt + function list}<|im_end|>
<|im_start|>user
{user message}<|im_end|>
<|im_start|>assistant
{function call(s)}<|im_end|>
<|im_start|>tool
{tool results}<|im_end|>
<|im_start|>assistant
{final natural-language answer}<|im_end|>
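The layout above can be assembled in code with a small renderer. This is a minimal sketch, not the model's official tooling: `render_prompt` and the `{"role", "content"}` message shape are hypothetical names chosen for illustration, and the newline placement (one after each role name, one after each `<|im_end|>`) is an assumption read off the layout shown.

```python
BOS = "<|startoftext|>"
IM_START = "<|im_start|>"
IM_END = "<|im_end|>"
ROLES = {"system", "user", "assistant", "tool"}


def render_prompt(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts into the chat markup (hypothetical helper)."""
    parts = [BOS]
    for message in messages:
        role = message["role"]
        if role not in ROLES:
            raise ValueError(f"unsupported role: {role!r}")
        parts.append(f"{IM_START}{role}\n{message['content']}{IM_END}\n")
    if add_generation_prompt:
        # Leave an open assistant turn for the model to complete.
        parts.append(f"{IM_START}assistant\n")
    return "".join(parts)
```

Calling it with a system and a user message yields a prompt that ends in an open assistant turn, which is where the model is expected to write its function call or plain-text reply.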
1. System prompt — declaring functions
Pass the available functions to the system role as a JSON list. Each function declares name, description, and a parameters object (type: "dict", properties, required):
You are an expert in composing functions. You are given a question and a set of possible functions.
Based on the question, you will need to make one or more function/tool calls to achieve the purpose.
If none of the function can be used, point it out. If the given question lacks the parameters required
by the function, also point it out. You should only return the function call in tools call sections.
Here is a list of functions in JSON format that you can invoke:
[
  {
    "name": "Get Arabic Word Meaning",
    "description": "Look up the meaning and root of an Arabic word in a classical dictionary.",
    "parameters": {
      "type": "dict",
      "properties": {
        "word": {"description": "The Arabic word to look up.", "type": "string"}
      },
      "required": ["word"]
    },
    "required": null
  },
  {
    "name": "Arabic News API",
    "description": "Get the latest Arabic news headlines for a specified country and topic.",
    "parameters": {
      "type": "dict",
      "properties": {
        "topic": {
          "description": "News topic.",
          "type": "string",
          "enum": ["POLITICS", "ECONOMY", "SPORTS", "CULTURE", "TECHNOLOGY", "RELIGION"]
        },
        "country": {"description": "2-letter ISO 3166 country code.", "type": "string", "default": "iq"},
        "language": {"description": "2-letter ISO 639-1 language code.", "type": "string", "default": "ar"}
      },
      "required": ["topic"]
    },
    "required": null
  }
]
Should you decide to return the function call(s).
Put it in the format of [func1(params_name=params_value, params_name2=params_value2...), func2(params)]
NO other text MUST be included.
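If the function list lives in code, the system prompt can be generated rather than hand-edited. The sketch below is an illustration under assumptions: `build_system_prompt` is a hypothetical helper, the instruction text is copied from the prompt shown above, and the exact line breaks and JSON indentation the model saw during training are not stated in this card, so the single-newline joins and `indent=2` are guesses.

```python
import json

# Instruction text copied from the system prompt shown above.
PREAMBLE = "\n".join([
    "You are an expert in composing functions. You are given a question and a set of possible functions.",
    "Based on the question, you will need to make one or more function/tool calls to achieve the purpose.",
    "If none of the function can be used, point it out. If the given question lacks the parameters required",
    "by the function, also point it out. You should only return the function call in tools call sections.",
    "Here is a list of functions in JSON format that you can invoke:",
])
SUFFIX = "\n".join([
    "Should you decide to return the function call(s).",
    "Put it in the format of [func1(params_name=params_value, params_name2=params_value2...), func2(params)]",
    "NO other text MUST be included.",
])


def build_system_prompt(functions):
    """Embed a list of function declarations (plain dicts) in the instruction text."""
    # ensure_ascii=False keeps Arabic descriptions readable instead of \uXXXX escapes.
    listing = json.dumps(functions, ensure_ascii=False, indent=2)
    return f"{PREAMBLE}\n{listing}\n{SUFFIX}"
```

Python's `None` serializes to `null`, so the top-level `"required": null` field in each declaration can be written as `"required": None` in the input dicts.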
2. Assistant — the function call
The model replies with the call(s) only, inside square brackets. Arguments are name=value pairs; string values are quoted. Multiple calls are comma-separated inside the same brackets:
[Arabic News API(topic="ECONOMY", country="iq")]
Single-argument call:
[Get Arabic Word Meaning(word="كتاب")]
Parallel / multiple calls:
[Arabic News API(topic="CULTURE", country="iq"), Get Arabic Word Meaning(word="نجف")]
If no function fits, or required parameters are missing, the model says so in plain text instead of fabricating a call.
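The bracket format described above can be parsed with the standard library alone. The following is a rough sketch, not the author's parser: `parse_tool_calls` is a hypothetical name, and it assumes values are either quoted strings or simple bare literals such as numbers (nested lists or dicts as values are not handled). It returns `None` for a plain-text reply so the caller can tell a refusal apart from a call.

```python
import ast
import re

_DQ = r'"(?:[^"\\]|\\.)*"'   # double-quoted string, backslash escapes allowed
_SQ = r"'(?:[^'\\]|\\.)*'"   # single-quoted string
_STRING = f"(?:{_DQ}|{_SQ})"
# name( ...args... ) followed by a comma or the end of the input; names may contain spaces.
_CALL = re.compile(rf"\s*([^()\[\],\"']+?)\s*\(((?:{_STRING}|[^()\"'])*)\)\s*(?:,|$)")
# key=value followed by a comma or the end of the argument text.
_ARG = re.compile(rf"\s*(\w+)\s*=\s*({_STRING}|[^,()\s][^,()]*?)\s*(?:,|$)")


def _parse_args(arg_text):
    arg_text = arg_text.strip()
    args, pos = {}, 0
    while pos < len(arg_text):
        m = _ARG.match(arg_text, pos)
        if m is None:
            raise ValueError(f"malformed arguments near: {arg_text[pos:pos + 40]!r}")
        raw = m.group(2).strip()
        try:
            args[m.group(1)] = ast.literal_eval(raw)  # "iq" -> 'iq', 3 -> 3
        except (ValueError, SyntaxError):
            args[m.group(1)] = raw  # keep unrecognized bare tokens as text
        pos = m.end()
    return args


def parse_tool_calls(text):
    """Return a list of {"name", "arguments"} dicts, or None for a plain-text reply."""
    text = text.strip()
    if not (text.startswith("[") and text.endswith("]")):
        return None
    inner = text[1:-1]
    calls, pos = [], 0
    while pos < len(inner):
        m = _CALL.match(inner, pos)
        if m is None:
            raise ValueError(f"malformed call near: {inner[pos:pos + 40]!r}")
        calls.append({"name": m.group(1), "arguments": _parse_args(m.group(2))})
        pos = m.end()
    return calls
```

Quoted strings are matched as units, so commas and parentheses inside a string value do not split the call.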
3. Tool — returning results
Send results back in the tool role as a JSON list, one object per call, echoing the function name and a results payload:
[{"name": "Arabic News API", "results": {"headlines": [{"title": "ارتفاع أسعار النفط في الأسواق العراقية", "source": "INA"}]}}]
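A tool message in this shape can be produced with `json.dumps`. This is a small sketch under one assumption: `format_tool_results` is a hypothetical helper that takes `(function_name, payload)` pairs in the same order as the calls.

```python
import json


def format_tool_results(results):
    """Serialize (function_name, payload) pairs into the tool-role JSON list."""
    entries = [{"name": name, "results": payload} for name, payload in results]
    # ensure_ascii=False keeps Arabic text as-is rather than \uXXXX escapes.
    return json.dumps(entries, ensure_ascii=False)
```

The returned string is what goes between `<|im_start|>tool` and `<|im_end|>` in the chat markup.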
4. Assistant — final answer
The model then produces a natural-language response grounded in the tool results.
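Put together, the four steps form one round trip: generate, parse, execute, generate again. The sketch below shows only that control flow and makes several assumptions: `run_tool_turn` is a hypothetical function, `generate` stands for whatever callable turns a message list into model text, `parse_calls` is any parser that returns a list of `{"name", "arguments"}` dicts (or `None` for plain text), and `registry` maps function names to Python callables.

```python
import json


def run_tool_turn(messages, generate, parse_calls, registry):
    """One round trip: model call(s) -> execute -> tool message -> final answer."""
    reply = generate(messages)
    calls = parse_calls(reply)
    if not calls:
        # The model answered in plain text (no function fits, or parameters are missing).
        return reply, messages + [{"role": "assistant", "content": reply}]

    # Run each requested function and echo its name next to the payload.
    results = [
        {"name": call["name"], "results": registry[call["name"]](**call["arguments"])}
        for call in calls
    ]
    history = messages + [
        {"role": "assistant", "content": reply},
        {"role": "tool", "content": json.dumps(results, ensure_ascii=False)},
    ]
    final = generate(history)  # second pass: answer grounded in the tool results
    return final, history + [{"role": "assistant", "content": final}]
```

Passing the model and parser in as callables keeps the loop independent of any particular runtime, and makes it easy to exercise with stubs before wiring in a real model.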
Usage
Download from Hugging Face
# CLI
huggingface-cli download medzonai/medzon-1.2B-Instruct medzon-1.2B-Instruct.gguf --local-dir .
# Python
from huggingface_hub import hf_hub_download
path = hf_hub_download("medzonai/medzon-1.2B-Instruct", "medzon-1.2B-Instruct.gguf")
Python (llama-cpp-python)
from llama_cpp import Llama

# PROMPT must hold the full chat markup (system prompt with the function list,
# then the user turn), built with the schema above; it is not defined in this snippet.
llm = Llama(model_path="medzon-1.2B-Instruct.gguf", n_ctx=32768)
out = llm.create_completion(
    prompt=PROMPT,
    temperature=0.1, top_k=50, top_p=0.1, repeat_penalty=1.05,
    max_tokens=1024,
)
print(out["choices"][0]["text"])
Training loss
Supervised fine-tuning converged cleanly, with loss computed on assistant/tool-call completions only:
| Phase | Training loss |
|---|---|
| Initial | ~5.03 |
| Early convergence | ~0.60 |
| Plateau | ~0.50 |
| Final | ~0.45 – 0.49 |
Loss dropped sharply over the first part of training and then settled into a stable ~0.45–0.49 band, which suggests the model learned the tool-call format and that training stabilized rather than diverged.
Notes & limitations
- The model emits calls only in the [func(arg="value")] bracket format. Your runtime must parse this and dispatch the actual functions; the model does not execute anything itself.
- Keep the function list in the system role and feed real results back in the tool role for best results.
- As a 1.2B model it is optimized for routing and argument extraction; verify arguments before executing sensitive actions.
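One way to verify arguments before dispatch is to check each parsed call against the declaration it came from. This is a minimal sketch with assumptions: `validate_call` is a hypothetical helper, calls are `{"name", "arguments"}` dicts, and only required keys, unknown keys, basic types, and `enum` membership are checked. It is not a full schema validator.

```python
# Maps the declared "type" strings to Python types; unlisted types are not checked.
_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}


def validate_call(call, functions):
    """Return a list of problems; an empty list means the call passed these checks."""
    spec = next((f for f in functions if f["name"] == call["name"]), None)
    if spec is None:
        return [f"unknown function: {call['name']!r}"]

    params = spec["parameters"]
    props = params.get("properties", {})
    args = call["arguments"]

    problems = [
        f"missing required argument: {key}"
        for key in (params.get("required") or [])
        if key not in args
    ]
    for key, value in args.items():
        if key not in props:
            problems.append(f"unexpected argument: {key}")
            continue
        expected = _TYPES.get(props[key].get("type"))
        if expected is not None and not isinstance(value, expected):
            problems.append(f"wrong type for {key}: {type(value).__name__}")
        allowed = props[key].get("enum")
        if allowed is not None and value not in allowed:
            problems.append(f"value not allowed for {key}: {value!r}")
    return problems
```

A runtime could refuse to dispatch whenever the returned list is non-empty, and ask for human confirmation on top of that for actions with side effects.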