Farabi-4B

A 4B-parameter Kazakh / Russian / English assistant built on Qwen3-4B and strengthened for Kazakh & Russian knowledge and for grounded RAG / agentic tool use, while retaining most of the base model's function-calling ability on common call patterns. It drops into agent stacks that expect OpenAI-style function calling and emits clean Hermes tool calls.

Capabilities

  • Stronger Kazakh & Russian knowledge. Improves over the Qwen3-4B base and beats the larger ISSAI Sherkala-8B on three of the four Kazakh knowledge benchmarks and on their mean (see Benchmarks).
  • Grounded RAG. Answers from provided passages, attributes claims to the supporting text, and abstains when the evidence is insufficient.
  • Tool-calling (Hermes / OpenAI function calling). Decides when a tool is needed, asks for missing required arguments, emits valid calls, and grounds the final answer in the tool result. Competitive with the Qwen3-4B base on common call patterns.
    • Parallel tool-calling — multiple independent calls in a single turn.
    • Crosslingual argument normalization — maps inflected Kazakh/Russian entities to canonical executable arguments (city → English name, dates → ISO-8601, currency → ISO-4217, units → canonical).
    • Error recovery — retries repairable failures and reports non-repairable ones (not-found / permission-denied / empty) instead of inventing success.
  • Clean outputs — no hidden chain-of-thought; final answers and tool calls only, suitable for production serving.
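
The crosslingual normalization described above means tool arguments should arrive in canonical form, but a caller may still want to check that before executing anything. The following is a minimal sketch of such a check, not part of the model or its tooling: the argument names (`city`, `date`, `currency`) and the small currency set are assumptions chosen for illustration.

```python
from datetime import date

# Illustrative subset of ISO-4217 codes (an assumption); use a full list in practice.
KNOWN_CURRENCIES = {"KZT", "RUB", "USD", "EUR"}

def check_normalized_args(args: dict) -> list[str]:
    """Return a list of problems found in tool-call arguments (empty if none)."""
    problems = []
    if "date" in args:
        try:
            date.fromisoformat(args["date"])  # accepts YYYY-MM-DD
        except (TypeError, ValueError):
            problems.append(f"date is not ISO-8601: {args['date']!r}")
    if "currency" in args and args["currency"] not in KNOWN_CURRENCIES:
        problems.append(f"currency is not a known ISO-4217 code: {args['currency']!r}")
    # A canonical English city name is expected to be plain ASCII.
    if "city" in args and not str(args["city"]).isascii():
        problems.append(f"city is not an ASCII (English) name: {args['city']!r}")
    return problems

print(check_normalized_args({"city": "Almaty", "date": "2025-03-01", "currency": "KZT"}))
print(check_normalized_args({"city": "Алматы", "date": "01.03.2025", "currency": "теңге"}))
```

A non-empty result can be sent back to the model as a tool error so that it has a chance to repair the call.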

How to use

Serve with vLLM (OpenAI-compatible, Hermes tool parser)

vllm serve nur-dev/farabi-4b \
  --chat-template chat_template.jinja \
  --enable-auto-tool-choice --tool-call-parser hermes

Call it with the OpenAI SDK (and the OpenAI Agents SDK)

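from openai import OpenAI

# Points at the local vLLM server; the API key is a placeholder unless the server was started with --api-key.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="x")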

resp = client.chat.completions.create(
    model="nur-dev/farabi-4b",
    # Kazakh: "What is the weather like in Almaty today?"
    messages=[{"role": "user", "content": "Бүгін Алматыда ауа райы қандай?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string", "description": "Canonical English city name."}},
                "required": ["city"],
            },
        },
    }],
    tool_choice="auto",
)
# Empty or None when the model answers directly instead of calling a tool.
print(resp.choices[0].message.tool_calls)

The OpenAI Agents SDK works the same way via openai.AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="x").
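
The request above stops once the model has emitted a tool call. A rough sketch of the remaining half of the loop follows: run each requested call locally and build the `role="tool"` messages to send back. The `get_weather` stub, the `TOOLS` registry, and the `messages` / `tools` variables in the commented usage are hypothetical names introduced here; the original snippet passes those values inline.

```python
import json

def get_weather(city: str) -> dict:
    """Hypothetical stub; a real tool would query a weather service."""
    return {"city": city, "temp_c": 21, "condition": "clear"}

# Hypothetical registry mapping tool names to local callables.
TOOLS = {"get_weather": get_weather}

def tool_messages(tool_calls: list[dict]) -> list[dict]:
    """Run each requested call and build the role="tool" replies."""
    replies = []
    for call in tool_calls:
        name = call["function"]["name"]
        try:
            if name not in TOOLS:
                raise ValueError(f"unknown tool: {name}")
            # Arguments arrive as a JSON-encoded string.
            result = TOOLS[name](**json.loads(call["function"]["arguments"]))
        except Exception as exc:
            # Report the failure to the model instead of hiding it.
            result = {"error": str(exc)}
        replies.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(result, ensure_ascii=False),
        })
    return replies

# Usage with the response from the request above (needs the running server):
# msg = resp.choices[0].message
# calls = [tc.model_dump() for tc in (msg.tool_calls or [])]
# followup = messages + [msg.model_dump(exclude_none=True)] + tool_messages(calls)
# final = client.chat.completions.create(
#     model="nur-dev/farabi-4b", messages=followup, tools=tools)
```

Returning errors as tool content, rather than raising, gives the model a chance to retry a repairable failure or to report one that cannot be repaired.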

Quick chat with Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("nur-dev/farabi-4b")
model = AutoModelForCausalLM.from_pretrained(
    "nur-dev/farabi-4b", torch_dtype="bfloat16", device_map="auto")

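# Kazakh: "Which city is the capital of Kazakhstan?"
msgs = [{"role": "user", "content": "Қазақстанның астанасы қай қала?"}]
# Render the chat template and tokenize in one step.
ids = tok.apply_chat_template(msgs, add_generation_prompt=True, return_tensors="pt").to(model.device)
out = model.generate(ids, max_new_tokens=256)
# Decode only the newly generated tokens, not the prompt.
print(tok.decode(out[0, ids.shape[1]:], skip_special_tokens=True))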

The canonical chat_template.jinja ships in this repo. Use it when serving so tool calls render in the Hermes format the parser expects.
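
When generating with Transformers directly there is no server-side parser, so tool calls have to be extracted from the decoded text. The sketch below assumes the Hermes layout of a JSON object with `name` and `arguments` keys wrapped in `<tool_call>` tags; the function name `parse_hermes_tool_calls` is introduced here for illustration and is not part of any library.

```python
import json
import re

# One <tool_call>...</tool_call> block with a JSON object as its body.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

def parse_hermes_tool_calls(text: str) -> list[dict]:
    """Extract tool calls from generated text; malformed blocks are skipped."""
    calls = []
    for body in TOOL_CALL_RE.findall(text):
        try:
            obj = json.loads(body)
        except json.JSONDecodeError:
            continue  # not valid JSON, ignore this block
        if isinstance(obj, dict) and "name" in obj:
            calls.append({"name": obj["name"], "arguments": obj.get("arguments", {})})
    return calls

sample = (
    '<tool_call>\n{"name": "get_weather", "arguments": {"city": "Almaty"}}\n</tool_call>\n'
    '<tool_call>\n{"name": "get_weather", "arguments": {"city": "Astana"}}\n</tool_call>'
)
print(parse_hermes_tool_calls(sample))
```

To make the model aware of the available tools in this setting, `apply_chat_template` accepts a `tools=` argument that takes the same JSON schema list used in the OpenAI-style request, provided the chat template in use supports tools.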

Benchmarks

Kazakh knowledge — ISSAI QOLDA suite (n=250/benchmark, accuracy %)

Compared against the Qwen3-4B base (same size) and the larger ISSAI Sherkala-8B-Chat.

| Benchmark | Farabi-4B | Qwen3-4B (base) | Sherkala-8B |
|---|---|---|---|
| ARC-kk | 69.6 | 68.4 | 74.8 |
| MMLU-kk | 50.8 | 42.8 | 47.6 |
| MMLU-Pro-kk | 28.8 | 25.6 | 20.4 |
| GPQA-kk | 36.0 | 30.8 | 30.0 |
| Mean | 46.3 | 41.9 | 43.2 |

Farabi-4B improves Kazakh knowledge over its own base (+4.4 points on the mean) and beats Sherkala-8B on three of the four benchmarks and on the mean (+3.1 points) at roughly half Sherkala's size. ARC-kk is the exception: there Sherkala still leads by 5.2 points.
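
With 250 items per benchmark, each accuracy figure carries noticeable sampling uncertainty. The sketch below recomputes the column means and a rough 95% interval half-width for each Farabi-4B score. It assumes independent items and a simple binomial (Wald) approximation, which the model card does not state; treat it as a back-of-the-envelope check rather than a significance test.

```python
from math import sqrt

N = 250  # items per benchmark, as stated in the section heading

# Accuracy %, copied from the table above: (Farabi-4B, Qwen3-4B base, Sherkala-8B).
scores = {
    "ARC-kk": (69.6, 68.4, 74.8),
    "MMLU-kk": (50.8, 42.8, 47.6),
    "MMLU-Pro-kk": (28.8, 25.6, 20.4),
    "GPQA-kk": (36.0, 30.8, 30.0),
}

def half_width(acc_pct: float, n: int = N, z: float = 1.96) -> float:
    """Half-width of a Wald 95% interval, in percentage points."""
    p = acc_pct / 100
    return 100 * z * sqrt(p * (1 - p) / n)

# Unweighted mean per model across the four benchmarks.
means = [round(sum(col) / len(col), 1) for col in zip(*scores.values())]
print("means (Farabi-4B, Qwen3-4B, Sherkala-8B):", means)

for name, (farabi, _base, _sherkala) in scores.items():
    print(f"{name}: Farabi-4B {farabi} ± {half_width(farabi):.1f}")
```

Under these assumptions each individual score is uncertain by roughly six points either way, so single-benchmark gaps of a few points should be read with caution.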

Kazakh academic (additional, accuracy %)

| Belebele-kk | KazMMLU-kk | TUMLU-kk |
|---|---|---|
| 69.5 | 36.1 | 36.5 |

Russian & math (accuracy %)

| Benchmark | Farabi-4B | Qwen3-4B (base) |
|---|---|---|
| ARC-ru | 92.4 | 92.0 |
| MMLU-Pro-ru | 42.4 | 35.2 |
| GPQA-ru | 32.4 | 31.6 |
| GSM8K-ru | 84.0 | 91.6 |
| GSM8K-kk | 68.4 | 68.4 |

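Farabi-4B leads on the Russian multiple-choice knowledge benchmarks, most clearly on MMLU-Pro-ru (+7.2 points); ARC-ru and GPQA-ru are within one point of the base. The base remains stronger on Russian grade-school math (GSM8K-ru, 91.6 vs. 84.0), and the two models tie on GSM8K-kk.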

Function calling — BFCL (Berkeley Function Calling Leaderboard, V4 non-live, accuracy %)

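| Category | Farabi-4B | Qwen3-4B (base) |
|---|---|---|
| Simple AST | 77.6 | 75.8 |
| Simple AST, Python | 95.8 | 96.3 |
| Simple AST, Java | 65.0 | 61.0 |
| Simple AST, JavaScript | 72.0 | 70.0 |
| Multiple | 95.5 | 96.5 |
| Parallel | 88.5 | 91.5 |
| Parallel-Multiple | 64.0 | 87.5 |
| Irrelevance | 47.9 | 82.1 |
| Non-live overall | 81.4 | 87.8 |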

Farabi-4B stays within about three points of the Qwen3-4B base on the common call patterns: simple calls (77.6 vs. 75.8, with Farabi ahead on Java and JavaScript), multiple calls (95.5 vs. 96.5), and parallel calls (88.5 vs. 91.5). The base stays well ahead on the compositional parallel-multiple case (87.5 vs. 64.0) and on irrelevance detection (82.1 vs. 47.9), so its overall non-live score is higher (87.8 vs. 81.4). In short: Farabi-4B keeps most of the base's everyday function-calling while adding the Kazakh/Russian knowledge and grounded-RAG behavior above.
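
As a consistency check on the table, the reported aggregates can be reproduced from the category rows. The numbers are consistent with Simple AST being the unweighted mean of the three language rows and the non-live overall being the unweighted mean of the four AST categories, with Irrelevance reported separately. This is an observation from the reported figures, not a statement of how BFCL defines its aggregates.

```python
# Accuracy %, copied from the BFCL table above: (Farabi-4B, Qwen3-4B base).
simple_by_language = {
    "Python": (95.8, 96.3),
    "Java": (65.0, 61.0),
    "JavaScript": (72.0, 70.0),
}
ast_categories = {
    "Simple AST": (77.6, 75.8),
    "Multiple": (95.5, 96.5),
    "Parallel": (88.5, 91.5),
    "Parallel-Multiple": (64.0, 87.5),
}

def column_means(rows: dict) -> list[float]:
    """Unweighted mean of each model's column, rounded to one decimal."""
    return [round(sum(col) / len(col), 1) for col in zip(*rows.values())]

simple_ast = column_means(simple_by_language)
overall = column_means(ast_categories)
print("Simple AST (recomputed):", simple_ast)
print("Non-live overall (recomputed):", overall)
```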

Known limitation. The model's relative weak point is abstention / irrelevance detection — when no tool or evidence is appropriate, it tends to act (answer or call a tool) rather than decline (BFCL irrelevance 47.9%). For high-stakes or credential-bearing contexts, pair it with explicit guardrails or an output filter.
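
One simple form of output filter is to execute only tool calls that pass an explicit policy. The sketch below accepts a call only if its tool is on an allowlist and all required arguments are present; `ALLOWED_TOOLS` and `filter_tool_calls` are hypothetical names, and the input is assumed to be OpenAI-style tool-call dicts. It limits what an unnecessary call can do, but it cannot tell whether calling an allowed tool was appropriate in the first place.

```python
import json

# Hypothetical policy: tool name -> required argument names.
ALLOWED_TOOLS = {"get_weather": {"city"}}

def filter_tool_calls(tool_calls: list[dict]) -> tuple[list[dict], list[str]]:
    """Split tool calls into those safe to execute and reasons for rejection."""
    accepted, rejected = [], []
    for call in tool_calls:
        name = call["function"]["name"]
        if name not in ALLOWED_TOOLS:
            rejected.append(f"{name}: tool not allowed")
            continue
        try:
            args = json.loads(call["function"]["arguments"])
        except json.JSONDecodeError:
            rejected.append(f"{name}: arguments are not valid JSON")
            continue
        if not isinstance(args, dict):
            rejected.append(f"{name}: arguments are not a JSON object")
            continue
        missing = ALLOWED_TOOLS[name] - set(args)
        if missing:
            rejected.append(f"{name}: missing required arguments {sorted(missing)}")
            continue
        accepted.append(call)
    return accepted, rejected

calls = [
    {"id": "a", "function": {"name": "get_weather", "arguments": '{"city": "Almaty"}'}},
    {"id": "b", "function": {"name": "delete_user", "arguments": '{"id": 7}'}},
    {"id": "c", "function": {"name": "get_weather", "arguments": "{}"}},
]
accepted, rejected = filter_tool_calls(calls)
print([c["id"] for c in accepted], rejected)
```

Rejected calls can be logged for review or returned to the model as tool errors, depending on how strict the deployment needs to be.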

Serving compatibility

Works with vLLM's OpenAI-compatible server using the Hermes tool-call parser (--enable-auto-tool-choice --tool-call-parser hermes) and with the OpenAI Agents SDK via openai.AsyncOpenAI(base_url=..., api_key="x").

Languages

Kazakh (kk), Russian (ru), English (en).

License

CC BY-NC 4.0 — non-commercial use only. Released for research, education, and evaluation; commercial use is not permitted. Built on Qwen3-4B (Apache-2.0); the base-model components remain under their original Apache-2.0 terms.
