---
language:
- kk
- ru
- en
license: apache-2.0
base_model: Qwen/Qwen3-1.7B
tags:
- kazakh
- multilingual
- instruction-tuned
- tool-calling
- sft
gated: auto
---
# Farabi-1.7B
> **What a model can DO matters more than what it knows. Knowledge expires; skills endure.**

Farabi-1.7B is a multilingual instruction-following model fine-tuned for Kazakh, Russian, and English. It is optimised for reasoning, instruction adherence, and structured tool calling.
| Property | Value |
|---|---|
| Parameters | 1.7B |
| Languages | Kazakh (KK) · Russian (RU) · English (EN) |
| License | Apache-2.0 |
| Architecture | Qwen3-1.7B (transformer, GQA) |
---
## Quick Start
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nur-dev/farabi-1.7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [
    # "Tell me briefly about Kazakhstan."
    {"role": "user", "content": "Қазақстан туралы қысқаша айтып бер."}
]

# Render the chat template and append the assistant generation prompt
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```
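To exercise the model's tool-calling ability, recent versions of `transformers` accept a `tools` argument in `apply_chat_template`. The sketch below shows a tool definition in the JSON-schema style that argument expects; `get_weather` and its parameters are hypothetical examples, not tools shipped with the model.

```python
# Hypothetical tool definition, illustrating the JSON-schema style
# accepted by tokenizer.apply_chat_template(tools=...) in recent transformers.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}

# "What is the weather like in Astana?"
messages = [{"role": "user", "content": "Астанада ауа райы қандай?"}]

# With a loaded tokenizer, the tool schema is passed alongside the messages:
# text = tokenizer.apply_chat_template(
#     messages, tools=[get_weather_tool], tokenize=False, add_generation_prompt=True
# )
```

The schema follows the common OpenAI-style function format; check the model's chat template to confirm it consumes tools in this shape.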
---
## Evaluation
Scores are accuracy (%) on held-out test splits.
**Qolda-4.3B** is included for scale reference (it has roughly 2.5× as many parameters).
### Kazakh
| Benchmark | Samples | Qwen3-1.7B | **Farabi-1.7B** | Qolda-4.3B |
|---|---|---|---|---|
| KazMMLU | 22,889 | 41.0% | **43.8%** (+2.8) | 47.1% |
| Belebele KK | 900 | 41.7% | **54.0%** (+12.3) | 80.9% |
| UNT | 14,849 | 31.3% | **37.6%** (+6.3) | 39.9% |
| Dastur | 1,004 | 82.1% | **88.8%** (+6.7) | 93.1% |
### Russian
| Benchmark | Samples | Qwen3-1.7B | **Farabi-1.7B** | Qolda-4.3B |
|---|---|---|---|---|
| ruMMLU | 14,012 | 44.8% | **45.6%** (+0.8) | 58.7% |
| Belebele RU | 900 | 69.9% | **71.2%** (+1.3) | 89.4% |
### English (controls)
| Benchmark | Samples | Qwen3-1.7B | **Farabi-1.7B** | Qolda-4.3B |
|---|---|---|---|---|
| MMLU-Pro EN | 12,032 | **29.2%** | 28.5% (−0.7) | 20.7% |
| ARC-Challenge | 1,172 | **73.6%** | 71.4% (−2.2) | 91.6% |
| Belebele EN | 900 | **76.9%** | 76.6% (−0.3) | 92.7% |
A small English regression is expected: the model was fine-tuned primarily on KK/RU data.
### Tool Calling
| Test | Qwen3-1.7B | **Farabi-1.7B** | Qolda-4.3B |
|---|---|---|---|
| Weather lookup (KK) | MISS | **OK** | MISS |
| Currency conversion (KK) | MISS | **OK** | MISS |
| Search + calculator (EN) | MISS | MISS | MISS |
| No tool needed (KK) | OK | **OK** | OK |
| Translation tool (RU) | MISS | **OK** | MISS |
| **Accuracy** | 20% | **80%** | 20% |
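Qwen3-family chat templates typically wrap emitted tool calls in `<tool_call>…</tool_call>` tags containing a JSON object with `name` and `arguments` keys. The sketch below parses that format from generated text; it is an assumption about the output shape and should be verified against this model's actual chat template. The `get_weather` call in the sample is hypothetical.

```python
import json
import re

# Matches one JSON payload per <tool_call>...</tool_call> block
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

def parse_tool_calls(text: str) -> list[dict]:
    """Extract tool-call payloads from generated text.

    Assumes the Qwen3-style format: each call is a JSON object with
    "name" and "arguments" keys inside <tool_call> tags.
    """
    calls = []
    for match in TOOL_CALL_RE.finditer(text):
        try:
            calls.append(json.loads(match.group(1)))
        except json.JSONDecodeError:
            continue  # skip malformed payloads rather than aborting
    return calls

# Example: a hypothetical weather-lookup response in Kazakh
sample = (
    'Ауа райын тексерейін.\n'  # "Let me check the weather."
    '<tool_call>{"name": "get_weather", "arguments": {"city": "Астана"}}</tool_call>'
)
print(parse_tool_calls(sample))
# → [{'name': 'get_weather', 'arguments': {'city': 'Астана'}}]
```

Plain-text responses with no `<tool_call>` block simply yield an empty list, which matches the "no tool needed" case in the table above.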
### Benchmark Chart
![Benchmark Chart](benchmark_chart.png)
---
## Base Model Note
Farabi-1.7B is built on [Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B), which is itself an
instruction-tuned model (not a raw pretrained base). All capability improvements are measured
relative to that already-capable starting point.
---
## Acknowledgements
We thank the Qwen team at Alibaba Cloud for releasing Qwen3-1.7B under the Apache-2.0 license,
which made this work possible.