---
language:
- tr
- en
license: apache-2.0
library_name: transformers
base_model: Qwen/Qwen3.5-9B
tags:
- turkish
- instruct
- fine-tuned
- lora
- gguf
- llama-cpp
- text-generation
- conversational
- qwen3.5
pipeline_tag: text-generation
model-index:
- name: lale-9b-2603
  results:
  - task:
      type: text-generation
      name: Turkish Language Understanding
    dataset:
      name: terazi
      type: custom
    metrics:
    - name: core
      type: accuracy
      value: 0.516
    - name: tool
      type: accuracy
      value: 0.444
    - name: fin
      type: accuracy
      value: 0.454
    - name: legal
      type: accuracy
      value: 0.376
---
# lale-9b-2603
**lale** (Turkish for "tulip") is a Turkish instruction-following language model fine-tuned from [Qwen3.5-9B](https://huggingface.co/Qwen/Qwen3.5-9B). It aims to be the best Turkish language model in its size class, with strong performance across general knowledge, reasoning, tool use, grammar, finance, and legal domains.
## Model Details
| Property | Value |
|---|---|
| Base model | Qwen/Qwen3.5-9B |
| Method | LoRA SFT (r=32, alpha=32, bf16) |
| Training data | 118,355 Turkish instruction examples (~113M tokens) |
| Epochs | 3 |
| Final loss | 0.282 |
| Training time | ~120 hours on 1x RTX 4090 |
| Parameters | 9.5B total, 58M trainable (0.61%) |
## Available Formats
| Format | Size | Use case |
|---|---|---|
| `merged/` | 18 GB | Full bf16 for further fine-tuning or vLLM serving |
| `gguf/lale-9b-q8_0.gguf` | 8.9 GB | High quality inference with llama.cpp / Ollama |
| `gguf/lale-9b-q4_k_m.gguf` | 5.3 GB | Fast inference on consumer hardware |
| `adapter/` | 242 MB | LoRA adapter to apply on base Qwen3.5-9B |
## Training Data
The training data consists of 118,355 synthetic Turkish instruction-response pairs generated using Claude Opus 4.6 and Claude Sonnet 4.6 via AWS Bedrock, across 21 categories in 3 rounds:
**Round 1 (Sonnet, 61.6K examples):** general, reasoning, tool_use, tool_use_advanced, finance, legal, code, translation
**Round 2 (Opus, 37.1K examples):** math, math_cot, multi_turn, tool_use_mcp, distill_reasoning, conversation_persona, reasoning_v2, code_v2
**Round 3 (Opus+Sonnet, 19.7K examples):** multi_step_tool, grammar_drill, error_recovery, legal_terms, translation_pro
All data was filtered for format validity, length bounds, exact deduplication, and tool-use message normalization.
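The filtering steps above can be sketched roughly as follows. This is an illustrative Python sketch, not the actual pipeline: the field names, character-based length bounds, and exact thresholds are assumptions.

```python
import json

# Remap non-standard tool roles to the standard chat roles
# (the normalization described in "Technical Notes").
ROLE_MAP = {"tool_call": "assistant", "tool_result": "tool"}

def normalize_roles(messages):
    return [{**m, "role": ROLE_MAP.get(m["role"], m["role"])} for m in messages]

def filter_examples(examples, min_chars=10, max_chars=8000):
    seen, kept = set(), []
    for ex in examples:
        msgs = normalize_roles(ex.get("messages", []))
        # Format validity: every message needs a role; content may be
        # None (null) for assistant messages that only carry tool calls.
        if not msgs or not all(m.get("role") for m in msgs):
            continue
        # Length bounds (character-based here; the real pipeline may
        # count tokens instead).
        text = "\n".join(m.get("content") or "" for m in msgs)
        if not (min_chars <= len(text) <= max_chars):
            continue
        # Exact deduplication on the serialized conversation.
        key = json.dumps(msgs, ensure_ascii=False, sort_keys=True)
        if key in seen:
            continue
        seen.add(key)
        kept.append({**ex, "messages": msgs})
    return kept
```

The exact-dedup key is the serialized conversation, so two examples survive only if they differ in at least one message.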
## Benchmark Results (terazi)
Evaluated using the [terazi](https://github.com/selimozten/terazi) Turkish language model benchmark suite.
### lale-9b-2602 vs lale-9b-2603
| Category | 2602 (98K data) | 2603 (118K data) | Change |
|---|---|---|---|
| **core** | 0.511 | **0.516** | +1.0% |
| common_sense | 0.970 | **0.980** | +1.0% |
| reading_comp | 0.535 | 0.512 | -4.3% |
| grammar | 0.288 | **0.337** | **+17.0%** |
| translation | 0.342 | 0.333 | -2.6% |
| summarization | 0.421 | 0.417 | -1.0% |
| **tool** | 0.411 | **0.444** | **+8.0%** |
| api_call | 0.557 | **0.586** | +5.2% |
| multi_step | 0.075 | **0.168** | **+124%** |
| param_extraction | 0.506 | 0.482 | -4.7% |
| error_recovery | 0.229 | 0.215 | -6.1% |
| **fin** | 0.492 | 0.454 | -7.7% |
| sentiment | 0.744 | 0.592 | -20.4% |
| numerical_reasoning | 0.524 | **0.557** | +6.3% |
| term_understanding | 0.226 | **0.252** | +11.5% |
| **legal** | n/a | **0.376** | new |
### Key Improvements
- **multi_step tool use: +124%** -- from targeted R3 multi_step_tool training data
- **grammar: +17%** -- from R3 grammar_drill exercises (vowel harmony, suffix ordering, conjugation)
- **tool use overall: +8%** -- from additional tool_use_mcp and multi_step_tool categories
- **numerical_reasoning: +6.3%** -- from math and math_cot data
- **term_understanding: +11.5%** -- from legal_terms and fin_analysis data
## Usage
### With llama.cpp
```bash
llama-server -m lale-9b-q8_0.gguf -ngl 99 --reasoning-budget 0 -c 4096
```
Note: `--reasoning-budget 0` disables Qwen3.5's thinking mode, which would otherwise place the output in `reasoning_content` instead of `content`.
### With Ollama
Create a Modelfile:
```
FROM ./lale-9b-q8_0.gguf
PARAMETER num_ctx 4096
```
```bash
ollama create lale -f Modelfile
ollama run lale
```
### With transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "comarproject/lale-9b-2603",
    subfolder="merged",
    torch_dtype="bfloat16",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(
    "comarproject/lale-9b-2603",
    subfolder="merged",
)

messages = [{"role": "user", "content": "Türkiye'nin başkenti neresidir?"}]
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, not the echoed prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```
## Technical Notes
- Qwen3.5-9B is a unified VLM (vision-language model) with Mamba/hybrid layers. We train only the language components.
- Training data includes normalized tool-use formats: `tool_call`/`tool_result` roles are remapped to standard `assistant`/`tool`, and `content: null` is allowed for OpenAI-style function calling messages.
- LoRA targets: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- Optimizer: AdamW 8-bit, cosine LR schedule, warmup 10%
- Sample packing enabled (required patching Unsloth's VLM detection for Qwen3.5)
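For reference, the cosine LR schedule with 10% warmup can be sketched as below. The shape matches standard warmup-plus-cosine schedulers; the peak learning rate shown is purely illustrative, as the actual value is not stated on this card.

```python
import math

def lr_at(step, total_steps, peak_lr=2e-4, warmup_frac=0.1):
    """Learning rate at a given step: linear warmup, then cosine decay to 0."""
    warmup_steps = int(total_steps * warmup_frac)
    if step < warmup_steps:
        # Linear ramp from 0 to peak_lr over the warmup phase.
        return peak_lr * step / max(1, warmup_steps)
    # Cosine decay from peak_lr to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

The LR peaks exactly at the end of warmup (10% of total steps) and decays smoothly to zero at the final step.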
## Limitations
- Trained primarily on synthetic data from Claude models; may reflect Claude's style and biases
- Context window limited to 2048 tokens during training (base model supports 128K)
- Sentiment analysis regressed from 2602 (-20%) -- may need targeted data for this subcategory
- Some long legal/financial prompts may exceed the trained context length
## License
Apache 2.0
## Citation
```bibtex
@misc{lale-9b-2603,
  title={lale-9b-2603: Turkish Instruction Model Distilled from Frontier Models},
  author={Selim Ozten},
  year={2026},
  url={https://huggingface.co/comarproject/lale-9b-2603}
}
```