---
language:
- kk
- ru
- en
pipeline_tag: text-generation
library_name: transformers
tags:
- kazakh
- multilingual
- instruction-tuning
- tool-calling
- function-calling
- agent
- conversational
base_model: nur-dev/farabi-0.6B-base
license: apache-2.0
---

# Farabi-0.6B

**Farabi-0.6B** is a compact, multilingual instruction-tuned language model with a
primary focus on **Kazakh**, alongside strong **Russian** and **English** support.
It is designed for everyday assistant use, reasoning, retrieval-grounded answering,
and **tool / function calling** in agentic applications.

The model speaks fluent Kazakh and is intended to make high-quality conversational
AI more accessible for the Kazakh language, where well-aligned models remain scarce.

Created by **[Nurgali Kadyrbek](https://www.linkedin.com/in/nurgali-kadyrbek-504260231/)**.

It is built on **[`nur-dev/farabi-0.6B-base`](https://huggingface.co/nur-dev/farabi-0.6B-base)**,
a Kazakh-adapted base model that was itself continually pre-trained from Qwen3-0.6B, and was then
instruction-tuned to produce this assistant.

---

## Highlights

- 🇰🇿 **Kazakh-first**: the majority of the instruction data is native Kazakh, with
  Russian and English mixed in for cross-lingual robustness.
- 🧠 **Reasoning**: supports an optional step-by-step "thinking" mode that can be toggled
  on or off at request time.
- 🔧 **Tool calling**: emits Hermes-style `<tool_call>` blocks and is compatible with
  the OpenAI-style function-calling interface and agent frameworks.
- 📚 **Grounded answering**: trained to answer from provided documents and context,
  including longer inputs.
- 🪶 **Small & deployable**: 0.6B parameters, runs comfortably on a single modest GPU.
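
To make the Hermes-style output concrete, the sketch below pulls `<tool_call>` blocks out of raw generated text. It assumes each block wraps a single JSON object with `name` and `arguments` keys, which is the usual Hermes layout but is not spelled out in this card. `extract_tool_calls` is a hypothetical helper, not part of any library.

```python
import json
import re

# Assumption: every call is one JSON object between <tool_call> and </tool_call>.
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def extract_tool_calls(text: str) -> list[dict]:
    """Return the well-formed tool calls found in `text`, in order of appearance."""
    calls = []
    for match in TOOL_CALL_RE.finditer(text):
        try:
            payload = json.loads(match.group(1).strip())
        except json.JSONDecodeError:
            continue  # skip blocks whose body is not valid JSON
        if isinstance(payload, dict) and "name" in payload:
            calls.append(
                {"name": payload["name"], "arguments": payload.get("arguments", {})}
            )
    return calls


sample = (
    "Let me check.\n"
    '<tool_call>\n{"name": "get_weather", "arguments": {"city": "Almaty"}}\n</tool_call>'
)
print(extract_tool_calls(sample))
```

This manual step is only needed when working with raw generations; a server that parses tool calls returns them as structured `tool_calls` instead.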

---

## Languages

| Language | Approx. share of instruction data |
|----------|-----------------------------------|
| Kazakh (kk) | ~56% |
| English (en) | ~33% |
| Russian (ru) | ~10% |

---

## Data coverage by domain

The model was instruction-tuned on a broad, internally curated mixture. Described in
general terms (no technical specifics), the approximate domain composition is:

| Domain | Approx. share |
|--------|---------------|
| General instruction following & multi-turn conversation | ~45% |
| Reasoning & step-by-step problem solving | ~27% |
| Retrieval-grounded answering, long context & document Q&A | ~13% |
| Tool use, function calling & agentic interaction | ~7% |
| Knowledge, culture, news & encyclopedic content | ~4% |
| Mathematics, language tasks (grammar / translation), safety & appropriate refusal, device & environment control, and assistant identity | ~4% |

*Shares are approximate and reflect general domain proportions rather than exact figures.*

---

## Data provenance & acknowledgments

The training datasets were **created internally by the author**, including original
synthesis as well as additionally processed and enriched material.

Approximately **5.4%** of all data used for instruction tuning was derived (with
additional processing and enrichment) from resources of two organizations, whose
contributions to the Kazakh language are gratefully acknowledged:

1. **Институт языкознания имени А. Байтурсынова** (*Institute of Linguistics named after A. Baitursynov*)
2. **ННПЦ «Тіл-Қазына» имени Шайсултана Шаяхметова** (*Sh. Shayakhmetov National Research and Practical Center "Til-Qazyna"*)

---

## Recommended sampling parameters

A good starting point for general use, expressed as request fields for the OpenAI-compatible API:

```json
{
  "temperature": 0.15,
  "top_p": 0.95,
  "max_tokens": 1024,
  "repetition_penalty": 1.05,
  "stream": true,
  "chat_template_kwargs": {
    "enable_thinking": true
  },
  "continue_final_message": true
}
```

Set `"enable_thinking": false` to get direct answers without an explicit reasoning step.
Raise `temperature` for more creative or open-ended generation.
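
As a rough illustration of the two adjustments just described, the sketch below builds a request body from the recommended values and lets the caller toggle thinking or override the temperature. `build_payload` and `RECOMMENDED` are hypothetical names, and the default model name `farabi-0.6b` assumes the server was started with that served name. The sketch leaves out `stream` and `continue_final_message` to stay focused on the two toggles.

```python
import json

# Values copied from the recommended settings; names here are illustrative only.
RECOMMENDED = {
    "temperature": 0.15,
    "top_p": 0.95,
    "max_tokens": 1024,
    "repetition_penalty": 1.05,
}


def build_payload(messages, model="farabi-0.6b", thinking=True, temperature=None):
    """Return a chat-completions request body with the recommended defaults."""
    payload = {
        "model": model,
        "messages": messages,
        **RECOMMENDED,
        "chat_template_kwargs": {"enable_thinking": thinking},
    }
    if temperature is not None:
        payload["temperature"] = temperature  # e.g. higher for creative writing
    return payload


direct = build_payload(
    [{"role": "user", "content": "Who are you?"}],
    thinking=False,
    temperature=0.7,
)
print(json.dumps(direct, indent=2, ensure_ascii=False))
```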

---

## Serving with vLLM

Start an OpenAI-compatible server with tool calling enabled:

```bash
vllm serve nur-dev/farabi-0.6B \
  --served-model-name farabi-0.6b \
  --enable-auto-tool-choice \
  --tool-call-parser hermes
```

Query it with the standard OpenAI client, using the recommended sampling parameters:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="farabi-0.6b",
    messages=[
        # System: "You are a helpful and accurate assistant."
        {"role": "system", "content": "Сіз пайдалы әрі дәл көмекшісіз."},
        # User: "Tell me briefly about Almaty."
        {"role": "user", "content": "Алматы туралы қысқаша айтып бер."},
    ],
    temperature=0.15,
    top_p=0.95,
    max_tokens=1024,
    # Server-specific fields that are not part of the standard OpenAI schema.
    extra_body={
        "repetition_penalty": 1.05,
        "chat_template_kwargs": {"enable_thinking": True},
    },
    stream=True,
)

# Print tokens as they arrive.
for chunk in resp:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```

Tool calling works through the standard `tools=[...]` argument: the model returns
function calls, which the server parses into structured `tool_calls`.
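
A minimal sketch of that flow follows, under stated assumptions: `get_weather`, `TOOLS`, `REGISTRY`, `run_tool_call`, and `ask_with_tools` are all hypothetical names, and the tool returns canned text rather than calling a real service. `ask_with_tools` expects an OpenAI client pointed at the server and is defined but not invoked here, so the block runs without a server.

```python
import json


def get_weather(city: str) -> str:
    """Hypothetical local tool; returns canned text instead of calling a real API."""
    return f"Weather in {city}: sunny"


# Tool schema in the OpenAI function-calling format.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

REGISTRY = {"get_weather": get_weather}


def run_tool_call(name: str, arguments_json: str) -> str:
    """Execute one parsed call; the API delivers arguments as a JSON string."""
    if name not in REGISTRY:
        return f"error: unknown tool {name!r}"
    return REGISTRY[name](**json.loads(arguments_json))


def ask_with_tools(client, question: str) -> list[str]:
    """Send one request with tools=TOOLS and run whatever calls come back."""
    resp = client.chat.completions.create(
        model="farabi-0.6b",
        messages=[{"role": "user", "content": question}],
        tools=TOOLS,
    )
    calls = resp.choices[0].message.tool_calls or []  # None when no call is made
    return [run_tool_call(c.function.name, c.function.arguments) for c in calls]


print(run_tool_call("get_weather", '{"city": "Almaty"}'))
```

Checking the tool name against a registry before executing is a deliberate choice: a small model can emit a call to a tool that was never offered.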

---

## Serving with PyTorch / Transformers

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nur-dev/farabi-0.6B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    # System: "You are a helpful and accurate assistant."
    {"role": "system", "content": "Сіз пайдалы әрі дәл көмекшісіз."},
    # User: "Which city is the capital of Kazakhstan?"
    {"role": "user", "content": "Қазақстанның астанасы қай қала?"},
]

# return_dict=True yields input_ids plus attention_mask, ready for generate().
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    enable_thinking=True,  # set False for direct answers
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=1024,
    do_sample=True,
    temperature=0.15,
    top_p=0.95,
    repetition_penalty=1.05,
)

# Decode only the newly generated tokens, not the prompt.
prompt_len = inputs["input_ids"].shape[-1]
print(tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True))
```
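
With thinking enabled, the decoded text may contain the reasoning followed by the final answer. The sketch below separates the two, assuming the reasoning is delimited by `<think>` and `</think>` tags as in the Qwen3 family this model derives from, and that those tags survive decoding. This card does not confirm either point, and `split_thinking` is a hypothetical helper.

```python
def split_thinking(text: str) -> tuple[str, str]:
    """Return (reasoning, answer); reasoning is "" when no closing tag is found."""
    open_tag, close_tag = "<think>", "</think>"
    end = text.find(close_tag)
    if end == -1:
        return "", text.strip()  # no reasoning block: everything is the answer
    reasoning = text[:end]
    start = reasoning.find(open_tag)
    if start != -1:
        reasoning = reasoning[start + len(open_tag):]
    answer = text[end + len(close_tag):]
    return reasoning.strip(), answer.strip()


decoded = "<think>\nThe capital was renamed back in 2022.\n</think>\n\nAstana."
reasoning, answer = split_thinking(decoded)
print("reasoning:", reasoning)
print("answer:", answer)
```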

---

## Evaluation

> ⚠️ **Interim results.** The numbers below were measured on an early checkpoint
> (~17% through instruction tuning). They are expected to improve as training
> continues, but already show meaningful capability.

### Tool / function calling (BFCL v4)

Berkeley Function-Calling Leaderboard (v4), 1,040 cases, evaluated with the
Hugging Face backend.

| Category | Accuracy | Correct / total | What it measures |
|----------|----------|-----------------|------------------|
| Simple | 80.5% | 322/400 | one call, one tool available |
| Multiple | 71.5% | 143/200 | pick the right tool from several |
| Parallel | 65.5% | 131/200 | several calls in one turn |
| Irrelevance | 5.4% | 13/240 | abstain when no tool fits |
| **Overall** | **58.6%** | 609/1040 | all four categories pooled |
| **Function-calling avg** | **74.5%** | 596/800 | excludes irrelevance |

**Takeaways:**

- **Strong calling ability for a 0.6B model.** When a call is warranted, it is correct
  ~74.5% of the time (right tool, valid arguments, clean JSON), including 65.5% on the
  hard parallel / multi-call category.
- **The weakness is abstention, not calling.** On queries that match no available tool,
  the model still tends to emit a call: it abstains correctly in only 5.4% of irrelevance
  cases, so it over-triggers. This is the main driver of the lower overall score and the
  clearest area for improvement.
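
The two aggregate rows can be reproduced from the per-category counts. The short check below pools correct and total counts across categories; the counts are copied from the table, and `accuracy` is just an illustrative helper name.

```python
# (correct, total) per BFCL category, copied from the results table.
results = {
    "simple": (322, 400),
    "multiple": (143, 200),
    "parallel": (131, 200),
    "irrelevance": (13, 240),
}


def accuracy(categories):
    """Pooled accuracy: sum the correct and total counts, then divide."""
    correct = sum(results[c][0] for c in categories)
    total = sum(results[c][1] for c in categories)
    return correct, total, round(100 * correct / total, 1)


overall = accuracy(results)
calling = accuracy(["simple", "multiple", "parallel"])
print(overall)  # (609, 1040, 58.6)
print(calling)  # (596, 800, 74.5)
```

Both figures are pooled over cases, so the larger Simple category weighs more; an unweighted mean of the three calling percentages would give 72.5% rather than 74.5%.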

### Multilingual comprehension (4-way multiple choice)

Multiple-choice comprehension across the model's three languages (random baseline = 25%),
evaluated with the chat template and `enable_thinking=False`.

| Language | Accuracy |
|----------|----------|
| English | 53.7% ±1.7 |
| Russian | 50.0% ±1.7 |
| Kazakh | 41.8% ±1.6 |

**Takeaways:**

- Accuracy is well above the 25% random baseline in all three languages, indicating
  genuine comprehension in English, Russian, and Kazakh.
- The ordering follows resource availability (en > ru > kk), as expected; Kazakh at 41.8%
  is still far above chance.
- Evaluating with the chat template and `enable_thinking=False` adds ~5 to 6 points per
  language versus a raw prompt, which is another reason to serve the model with its chat
  template (see the serving instructions above).
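
To put "well above baseline" in numbers, the sketch below computes each language's margin over the 25% baseline, both in percentage points and in units of the reported uncertainty. It assumes the ± values are one standard error, which the card does not state, so the second number is only indicative.

```python
# (accuracy %, reported +/- value) per language, copied from the table.
scores = {"English": (53.7, 1.7), "Russian": (50.0, 1.7), "Kazakh": (41.8, 1.6)}
BASELINE = 25.0  # chance level for 4-way multiple choice


def margin(language):
    """Return (points above baseline, margin in units of the reported +/-)."""
    acc, err = scores[language]
    return round(acc - BASELINE, 1), round((acc - BASELINE) / err, 1)


for language in scores:
    points, units = margin(language)
    print(f"{language}: +{points} points, about {units}x the reported uncertainty")
```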

---

## Intended use & limitations

Farabi-0.6B is intended as a helpful general-purpose and agentic assistant, with a
focus on Kazakh-language use cases. As a small model, it can make factual mistakes,
so outputs should be verified in high-stakes or accuracy-critical applications. It
should be used responsibly and in accordance with applicable laws and the base model's
license.

---

## Citation

If you use this model, please credit the author:

> Nurgali Kadyrbek. Farabi-0.6B.
> https://www.linkedin.com/in/nurgali-kadyrbek-504260231/