---
language:
- kk
- ru
- en
license: apache-2.0
base_model: Qwen/Qwen3-1.7B
tags:
- kazakh
- multilingual
- instruction-tuned
- tool-calling
- sft
gated: auto
---
# Farabi-1.7B
> **What a model can DO matters more than what it knows. Knowledge expires; skills endure.**

Farabi-1.7B is a multilingual instruction-following model fine-tuned for Kazakh, Russian, and English. It is optimised for reasoning, instruction adherence, and structured tool calling.
| Property | Value |
|---|---|
| Parameters | 1.7B |
| Languages | Kazakh (KK) · Russian (RU) · English (EN) |
| License | Apache-2.0 |
| Architecture | Qwen3-1.7B (transformer, GQA) |
---
## Quick Start
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nur-dev/farabi-1.7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [
    # "Tell me briefly about Kazakhstan."
    {"role": "user", "content": "Қазақстан туралы қысқаша айтып бер."}
]

# Render the chat template and append the assistant generation prompt
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```
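To exercise the model's tool-calling ability, recent versions of `transformers` accept a `tools` argument in `apply_chat_template`. The sketch below shows a tool definition in the JSON-schema style that argument expects; `get_weather` and its parameters are hypothetical examples, not tools shipped with the model.

```python
# Hypothetical tool definition, illustrating the JSON-schema style
# accepted by tokenizer.apply_chat_template(tools=...) in recent transformers.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}

# "What is the weather like in Astana?"
messages = [{"role": "user", "content": "Астанада ауа райы қандай?"}]

# With a loaded tokenizer, the tool schema is passed alongside the messages:
# text = tokenizer.apply_chat_template(
#     messages, tools=[get_weather_tool], tokenize=False, add_generation_prompt=True
# )
```

The schema follows the common OpenAI-style function format; check the model's chat template to confirm it consumes tools in this shape.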
---
## Evaluation
Scores are accuracy (%) on held-out test splits.
**Qolda-4.3B** is included for scale reference (it has roughly 2.5× as many parameters).
### Kazakh
| Benchmark | Samples | Qwen3-1.7B | **Farabi-1.7B** | Qolda-4.3B |
|---|---|---|---|---|
| KazMMLU | 22,889 | 41.0% | **43.8%** (+2.8) | 47.1% |
| Belebele KK | 900 | 41.7% | **54.0%** (+12.3) | 80.9% |
| UNT | 14,849 | 31.3% | **37.6%** (+6.3) | 39.9% |
| Dastur | 1,004 | 82.1% | **88.8%** (+6.7) | 93.1% |
### Russian
| Benchmark | Samples | Qwen3-1.7B | **Farabi-1.7B** | Qolda-4.3B |
|---|---|---|---|---|
| ruMMLU | 14,012 | 44.8% | **45.6%** (+0.8) | 58.7% |
| Belebele RU | 900 | 69.9% | **71.2%** (+1.3) | 89.4% |
### English (controls)
| Benchmark | Samples | Qwen3-1.7B | **Farabi-1.7B** | Qolda-4.3B |
|---|---|---|---|---|
| MMLU-Pro EN | 12,032 | **29.2%** | 28.5% (−0.7) | 20.7% |
| ARC-Challenge | 1,172 | **73.6%** | 71.4% (−2.2) | 91.6% |
| Belebele EN | 900 | **76.9%** | 76.6% (−0.3) | 92.7% |
A small English regression is expected: the model was fine-tuned primarily on KK/RU data.
### Tool Calling
| Test | Qwen3-1.7B | **Farabi-1.7B** | Qolda-4.3B |
|---|---|---|---|
| Weather lookup (KK) | MISS | **OK** | MISS |
| Currency conversion (KK) | MISS | **OK** | MISS |
| Search + calculator (EN) | MISS | MISS | MISS |
| No tool needed (KK) | OK | **OK** | OK |
| Translation tool (RU) | MISS | **OK** | MISS |
| **Accuracy** | 20% | **80%** | 20% |
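Qwen3-family chat templates typically wrap emitted tool calls in `<tool_call>…</tool_call>` tags containing a JSON object with `name` and `arguments` keys. The sketch below parses that format from generated text; it is an assumption about the output shape and should be verified against this model's actual chat template. The `get_weather` call in the sample is hypothetical.

```python
import json
import re

# Matches one JSON payload per <tool_call>...</tool_call> block
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

def parse_tool_calls(text: str) -> list[dict]:
    """Extract tool-call payloads from generated text.

    Assumes the Qwen3-style format: each call is a JSON object with
    "name" and "arguments" keys inside <tool_call> tags.
    """
    calls = []
    for match in TOOL_CALL_RE.finditer(text):
        try:
            calls.append(json.loads(match.group(1)))
        except json.JSONDecodeError:
            continue  # skip malformed payloads rather than aborting
    return calls

# Example: a hypothetical weather-lookup response in Kazakh
sample = (
    'Ауа райын тексерейін.\n'  # "Let me check the weather."
    '<tool_call>{"name": "get_weather", "arguments": {"city": "Астана"}}</tool_call>'
)
print(parse_tool_calls(sample))
# → [{'name': 'get_weather', 'arguments': {'city': 'Астана'}}]
```

Plain-text responses with no `<tool_call>` block simply yield an empty list, which matches the "no tool needed" case in the table above.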
### Benchmark Chart
![Benchmark Chart](benchmark_chart.png)
---
## Base Model Note
Farabi-1.7B is built on [Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B), which is itself an
instruction-tuned model (not a raw pretrained base). All capability improvements are measured
relative to that already-capable starting point.
---
## Acknowledgements
We thank the Qwen team at Alibaba Cloud for releasing Qwen3-1.7B under the Apache-2.0 license,
which made this work possible.