Instructions to use nur-dev/farabi-1.7b-agent-rag with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use nur-dev/farabi-1.7b-agent-rag with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("text-generation", model="nur-dev/farabi-1.7b-agent-rag") messages = [ {"role": "user", "content": "Who are you?"}, ] pipe(messages)# Load model directly from transformers import AutoTokenizer, AutoModelForCausalLM tokenizer = AutoTokenizer.from_pretrained("nur-dev/farabi-1.7b-agent-rag") model = AutoModelForCausalLM.from_pretrained("nur-dev/farabi-1.7b-agent-rag") messages = [ {"role": "user", "content": "Who are you?"}, ] inputs = tokenizer.apply_chat_template( messages, add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="pt", ).to(model.device) outputs = model.generate(**inputs, max_new_tokens=40) print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:])) - HERMES
How to use nur-dev/farabi-1.7b-agent-rag with HERMES:
# No code snippets available yet for this library. # To use this model, check the repository files and the library's documentation. # Want to help? PRs adding snippets are welcome at: # https://github.com/huggingface/huggingface.js
- Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- vLLM
How to use nur-dev/farabi-1.7b-agent-rag with vLLM:
Install from pip and serve model
# Install vLLM from pip: pip install vllm # Start the vLLM server: vllm serve "nur-dev/farabi-1.7b-agent-rag" # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:8000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "nur-dev/farabi-1.7b-agent-rag", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }'Use Docker
docker model run hf.co/nur-dev/farabi-1.7b-agent-rag
- SGLang
How to use nur-dev/farabi-1.7b-agent-rag with SGLang:
Install from pip and serve model
# Install SGLang from pip: pip install sglang # Start the SGLang server: python3 -m sglang.launch_server \ --model-path "nur-dev/farabi-1.7b-agent-rag" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "nur-dev/farabi-1.7b-agent-rag", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }'Use Docker images
docker run --gpus all \ --shm-size 32g \ -p 30000:30000 \ -v ~/.cache/huggingface:/root/.cache/huggingface \ --env "HF_TOKEN=<secret>" \ --ipc=host \ lmsysorg/sglang:latest \ python3 -m sglang.launch_server \ --model-path "nur-dev/farabi-1.7b-agent-rag" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "nur-dev/farabi-1.7b-agent-rag", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }' - Docker Model Runner
How to use nur-dev/farabi-1.7b-agent-rag with Docker Model Runner:
docker model run hf.co/nur-dev/farabi-1.7b-agent-rag
Farabi-1.7B-agent-rag
A 1.7B Kazakh / Russian / English assistant tuned for grounded RAG and agentic tool-calling. It drops into agent stacks that expect OpenAI-style function calling and runs comfortably on a single GPU.
Built on Qwen3-1.7B and adapted for Kazakh, Russian, and English.
Capabilities
- Grounded RAG. Answers strictly from provided passages, attributes claims to the supporting text, and abstains when the evidence is insufficient instead of fabricating an answer.
- Tool-calling (Hermes / OpenAI function calling). Decides when a tool is needed,
asks for missing required arguments, and grounds the final answer in the tool result.
- Parallel tool-calling — issues multiple independent calls in a single turn.
- Crosslingual argument normalization — maps inflected Kazakh/Russian entities to canonical executable arguments (city → English name, dates → ISO-8601, currency → ISO-4217, units → canonical).
- Error recovery — retries repairable failures, and reports non-repairable ones (not-found / permission-denied / empty) honestly instead of inventing success.
- Prompt-injection resistance. Treats retrieved documents and tool outputs as untrusted data, not instructions; ignores embedded directives, prefers least-privilege tools, and refuses to exfiltrate secrets found in context.
- Text workbench. Spelling / grammar / formality / clarity / concision edits, rewriting, translation, and summarization across kk / ru / en.
- No hidden chain-of-thought in trainable outputs — clean final answers and tool calls, suitable for production serving.
Benchmarks
Agentic & RAG capabilities (held-out probe)
| Capability | Score |
|---|---|
| Prompt-injection resistance (overall) | 96% |
| • instruction-in-retrieved-chunk | 100% |
| • tool-output injection | 100% |
| • least-privilege tool use | 100% |
| • secret / data-exfiltration refusal | 82% |
| Parallel tool-calling | 94% |
| Crosslingual argument normalization | 91% |
| Text editing / workbench | 86% |
Note: secret-exfiltration refusal (82%) is the model's weakest safety dimension — for credential-bearing contexts, pair the model with an output filter.
Academic (ISSAI Kazakh/Russian QOLDA suite, n=250/bench; RAGBench = chrF)
Accuracy (%), compared with same-size and larger models for context. AVG is the mean of the 10 accuracy benchmarks.
| Model | Size | ARC-kk | ARC-ru | MMLU-kk | GPQA-kk | GPQA-ru | GSM8k-kk | GSM8k-ru | PolyMath-kk | MMLU-Pro-kk | MMLU-Pro-ru | RAGBench (chrF) | AVG |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Farabi-1.7B-agent-rag | 1.7B | 58.8 | 74.4 | 35.2 | 28.8 | 24.0 | 32.4 | 50.4 | 14.0 | 14.8 | 22.4 | 25.4 | 35.5 |
| ISSAI foggen-1.7B | 1.7B | 45.6 | 77.6 | 31.6 | 31.2 | 22.8 | 35.2 | 68.4 | 20.0 | 11.6 | 24.0 | 33.5 | 36.8 |
| Qwen3-1.7B | 1.7B | 47.6 | 78.4 | 31.6 | 26.4 | 14.4 | 40.4 | 72.8 | 14.4 | 12.8 | 14.4 | 36.0 | 35.3 |
| ISSAI Sherkala-8B-Chat | 8B | 74.8 | 78.4 | 47.6 | 30.0 | 25.6 | 68.8 | 80.0 | 20.4 | 20.4 | 22.4 | 41.0 | 46.8 |
Farabi-1.7B is competitive with the same-size ISSAI foggen-1.7B and Qwen3-1.7B on the QOLDA average, and leads its size class on the Kazakh knowledge benchmarks (ARC-kk, MMLU-kk, MMLU-Pro-kk, GPQA-ru). Sherkala-8B is shown as a larger-model reference point.
Translation (FLORES-200, BLEU)
| Direction | BLEU |
|---|---|
| ru → en | 24.4 |
| en → ru | 18.5 |
| kk → en | 17.3 |
| kk → ru | 8.3 |
| en → kk | 8.2 |
| ru → kk | 7.7 |
Serving
Works with vLLM's OpenAI-compatible server using the Hermes tool-call parser:
vllm serve nur-dev/farabi-1.7b-agent-rag \
--chat-template chat_template.jinja \
--enable-auto-tool-choice --tool-call-parser hermes
Then call it with the OpenAI SDK (and the OpenAI Agents SDK):
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8000/v1", api_key="x")
resp = client.chat.completions.create(
model="nur-dev/farabi-1.7b-agent-rag",
messages=[{"role": "user", "content": "Бүгін Алматыда ауа райы қандай?"}],
tools=[{
"type": "function",
"function": {
"name": "get_weather",
"description": "Current weather for a city.",
"parameters": {
"type": "object",
"properties": {"city": {"type": "string", "description": "Canonical English city name."}},
"required": ["city"],
},
},
}],
tool_choice="auto",
)
print(resp.choices[0].message.tool_calls)
Languages
Kazakh (kk), Russian (ru), English (en).
License
CC BY-NC 4.0 — non-commercial use only. The model weights are released for research, education, and evaluation; commercial use is not permitted. Built on Qwen3-1.7B (Apache-2.0); the base-model components remain under their original Apache-2.0 terms.
- Downloads last month
- 151