---
language:
  - kk
  - ru
  - en
license: apache-2.0
base_model: Qwen/Qwen3-1.7B
tags:
  - kazakh
  - multilingual
  - instruction-tuned
  - tool-calling
  - sft
gated: auto
---

# Farabi-1.7B

> *What a model can DO matters more than what it knows. Knowledge expires; skills endure.*

Farabi-1.7B is a multilingual instruction-following model fine-tuned for Kazakh, Russian, and English. It is optimised for reasoning, instruction adherence, and structured tool calling.

| Property | Value |
|---|---|
| Parameters | 1.7B |
| Languages | Kazakh (KK) · Russian (RU) · English (EN) |
| License | Apache-2.0 |
| Architecture | Qwen3-1.7B (transformer, GQA) |

## Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nur-dev/farabi-1.7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [
    # "Tell me briefly about Kazakhstan."
    {"role": "user", "content": "Қазақстан туралы қысқаша айтып бер."}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```

## Evaluation

Scores are accuracy (%) on held-out test splits. Qolda-4.3B is included as a scale reference; it has about 2.5× the parameters of Farabi-1.7B.

### Kazakh

| Benchmark | Samples | Qwen3-1.7B | Farabi-1.7B | Qolda-4.3B |
|---|---|---|---|---|
| KazMMLU | 22,889 | 41.0% | 43.8% (+2.8) | 47.1% |
| Belebele KK | 900 | 41.7% | 54.0% (+12.3) | 80.9% |
| UNT | 14,849 | 31.3% | 37.6% (+6.3) | 39.9% |
| Dastur | 1,004 | 82.1% | 88.8% (+6.7) | 93.1% |

### Russian

| Benchmark | Samples | Qwen3-1.7B | Farabi-1.7B | Qolda-4.3B |
|---|---|---|---|---|
| ruMMLU | 14,012 | 44.8% | 45.6% (+0.8) | 58.7% |
| Belebele RU | 900 | 69.9% | 71.2% (+1.3) | 89.4% |

### English (controls)

| Benchmark | Samples | Qwen3-1.7B | Farabi-1.7B | Qolda-4.3B |
|---|---|---|---|---|
| MMLU-Pro EN | 12,032 | 29.2% | 28.5% (−0.7) | 20.7% |
| ARC-Challenge | 1,172 | 73.6% | 71.4% (−2.2) | 91.6% |
| Belebele EN | 900 | 76.9% | 76.6% (−0.3) | 92.7% |

A small English regression is expected: the model was fine-tuned primarily on KK/RU data.

## Tool Calling

| Test | Qwen3-1.7B | Farabi-1.7B | Qolda-4.3B |
|---|---|---|---|
| Weather lookup (KK) | MISS | OK | MISS |
| Currency conversion (KK) | MISS | OK | MISS |
| Search + calculator (EN) | MISS | MISS | MISS |
| No tool needed (KK) | OK | OK | OK |
| Translation tool (RU) | MISS | OK | MISS |
| **Accuracy** | 20% | 80% | 20% |
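Since Farabi-1.7B is built on Qwen3-1.7B, a reasonable assumption is that it inherits the Qwen3-style tool-calling format, where the model emits each call as a JSON object wrapped in `<tool_call>…</tool_call>` tags. Tool schemas are passed to `tokenizer.apply_chat_template(..., tools=[...])` in recent `transformers` versions. The sketch below shows a hypothetical tool definition (`get_weather` is an illustrative name, not part of this model card) and a minimal parser for the emitted tag format; verify against the model's actual chat template before relying on it.

```python
import json
import re

# Hypothetical tool schema in the JSON-schema style accepted by
# tokenizer.apply_chat_template(messages, tools=[weather_tool], ...).
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",  # illustrative name, not from the model card
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

# Qwen3-style completions wrap each call in <tool_call>...</tool_call>.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

def parse_tool_calls(text: str) -> list[dict]:
    """Extract tool-call JSON objects from a generated completion."""
    return [json.loads(block) for block in TOOL_CALL_RE.findall(text)]

# Example completion in the assumed format:
completion = (
    '<tool_call>\n'
    '{"name": "get_weather", "arguments": {"city": "Almaty"}}\n'
    '</tool_call>'
)
print(parse_tool_calls(completion))
# → [{'name': 'get_weather', 'arguments': {'city': 'Almaty'}}]
```

After parsing, the caller executes the named tool and appends the result as a `"role": "tool"` message before the next generation turn.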

## Benchmark Chart

*Benchmark chart comparing Qwen3-1.7B, Farabi-1.7B, and Qolda-4.3B across the evaluations above.*


## Base Model Note

Farabi-1.7B is built on Qwen3-1.7B, which is itself an instruction-tuned model (not a raw pretrained base). All capability improvements are measured relative to that already-capable starting point.


## Acknowledgements

We thank the Qwen team at Alibaba Cloud for releasing Qwen3-1.7B under the Apache-2.0 license, which made this work possible.