---
language:
  - kk
  - ru
  - en
license: apache-2.0
base_model: Qwen/Qwen3-1.7B
tags:
  - kazakh
  - multilingual
  - instruction-tuned
  - tool-calling
  - sft
gated: auto
---

# Farabi-1.7B

> *What a model can DO matters more than what it knows. Knowledge expires; skills endure.*

Farabi-1.7B is a multilingual instruction-following model fine-tuned for Kazakh, Russian, and English. It is optimised for reasoning, instruction adherence, and structured tool calling.

| Property | Value |
|---|---|
| Parameters | 1.7B |
| Languages | Kazakh (KK) · Russian (RU) · English (EN) |
| License | Apache-2.0 |
| Architecture | Qwen3-1.7B (transformer, GQA) |

## Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nur-dev/farabi-1.7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [
    # "Tell me briefly about Kazakhstan."
    {"role": "user", "content": "Қазақстан туралы қысқаша айтып бер."}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```

## Evaluation

Scores are accuracy (%) on held-out test splits. Qolda-4.3B is included as a scale reference; it has about 2.5× the parameters of Farabi-1.7B.

### Kazakh

| Benchmark | Samples | Qwen3-1.7B | Farabi-1.7B | Qolda-4.3B |
|---|---|---|---|---|
| KazMMLU | 22,889 | 41.0% | 43.8% (+2.8) | 47.1% |
| Belebele KK | 900 | 41.7% | 54.0% (+12.3) | 80.9% |
| UNT | 14,849 | 31.3% | 37.6% (+6.3) | 39.9% |
| Dastur | 1,004 | 82.1% | 88.8% (+6.7) | 93.1% |

### Russian

| Benchmark | Samples | Qwen3-1.7B | Farabi-1.7B | Qolda-4.3B |
|---|---|---|---|---|
| ruMMLU | 14,012 | 44.8% | 45.6% (+0.8) | 58.7% |
| Belebele RU | 900 | 69.9% | 71.2% (+1.3) | 89.4% |

### English (controls)

| Benchmark | Samples | Qwen3-1.7B | Farabi-1.7B | Qolda-4.3B |
|---|---|---|---|---|
| MMLU-Pro EN | 12,032 | 29.2% | 28.5% (−0.7) | 20.7% |
| ARC-Challenge | 1,172 | 73.6% | 71.4% (−2.2) | 91.6% |
| Belebele EN | 900 | 76.9% | 76.6% (−0.3) | 92.7% |

A small English regression is expected: the model was fine-tuned primarily on KK/RU data.

## Tool Calling

| Test | Qwen3-1.7B | Farabi-1.7B | Qolda-4.3B |
|---|---|---|---|
| Weather lookup (KK) | MISS | OK | MISS |
| Currency conversion (KK) | MISS | OK | MISS |
| Search + calculator (EN) | MISS | MISS | MISS |
| No tool needed (KK) | OK | OK | OK |
| Translation tool (RU) | MISS | OK | MISS |
| **Accuracy** | 20% | 80% | 20% |
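Since Farabi-1.7B is built on Qwen3-1.7B, a reasonable assumption is that it inherits the Qwen3-style tool-calling format, where the model emits each call as a JSON object wrapped in `<tool_call>…</tool_call>` tags. Tool schemas are passed to `tokenizer.apply_chat_template(..., tools=[...])` in recent `transformers` versions. The sketch below shows a hypothetical tool definition (`get_weather` is an illustrative name, not part of this model card) and a minimal parser for the emitted tag format; verify against the model's actual chat template before relying on it.

```python
import json
import re

# Hypothetical tool schema in the JSON-schema style accepted by
# tokenizer.apply_chat_template(messages, tools=[weather_tool], ...).
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",  # illustrative name, not from the model card
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

# Qwen3-style completions wrap each call in <tool_call>...</tool_call>.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

def parse_tool_calls(text: str) -> list[dict]:
    """Extract tool-call JSON objects from a generated completion."""
    return [json.loads(block) for block in TOOL_CALL_RE.findall(text)]

# Example completion in the assumed format:
completion = (
    '<tool_call>\n'
    '{"name": "get_weather", "arguments": {"city": "Almaty"}}\n'
    '</tool_call>'
)
print(parse_tool_calls(completion))
# → [{'name': 'get_weather', 'arguments': {'city': 'Almaty'}}]
```

After parsing, the caller executes the named tool and appends the result as a `"role": "tool"` message before the next generation turn.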

## Benchmark Chart

*Benchmark chart comparing Qwen3-1.7B, Farabi-1.7B, and Qolda-4.3B across the evaluations above.*


## Base Model Note

Farabi-1.7B is built on Qwen3-1.7B, which is itself an instruction-tuned model (not a raw pretrained base). All capability improvements are measured relative to that already-capable starting point.


## Acknowledgements

We thank the Qwen team at Alibaba Cloud for releasing Qwen3-1.7B under the Apache-2.0 license, which made this work possible.