---
language:
  - tr
  - en
license: apache-2.0
library_name: transformers
base_model: Qwen/Qwen3.5-9B
tags:
  - turkish
  - instruct
  - fine-tuned
  - lora
  - gguf
  - llama-cpp
  - text-generation
  - conversational
  - qwen3.5
pipeline_tag: text-generation
model-index:
  - name: lale-9b-2603
    results:
      - task:
          type: text-generation
          name: Turkish Language Understanding
        dataset:
          name: terazi
          type: custom
        metrics:
          - name: core
            type: accuracy
            value: 0.516
          - name: tool
            type: accuracy
            value: 0.444
          - name: fin
            type: accuracy
            value: 0.454
          - name: legal
            type: accuracy
            value: 0.376
---

# lale-9b-2603

**lale** (Turkish for "tulip") is a Turkish instruction-following language model fine-tuned from [Qwen3.5-9B](https://huggingface.co/Qwen/Qwen3.5-9B). It aims to be a strong Turkish model for its size class, covering general knowledge, reasoning, tool use, grammar, finance, and legal tasks.

## Model Details

| Property | Value |
|---|---|
| Base model | Qwen/Qwen3.5-9B |
| Method | LoRA SFT (r=32, alpha=32, bf16) |
| Training data | 118,355 Turkish instruction examples (~113M tokens) |
| Epochs | 3 |
| Final loss | 0.282 |
| Training time | ~120 hours on 1x RTX 4090 |
| Parameters | 9.5B total, 58M trainable (0.61%) |
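
The ratios implied by the table can be checked with a few lines of arithmetic. This is only a sanity check on the rounded figures above, not output from the training run; the variable names are illustrative.

```python
# Figures copied from the Model Details table (all rounded in the source).
total_params = 9.5e9       # "9.5B total"
trainable_params = 58e6    # "58M trainable"
num_examples = 118_355     # training examples
total_tokens = 113e6       # "~113M tokens", approximate

# Share of parameters updated by the LoRA adapter.
trainable_pct = 100 * trainable_params / total_params

# Average example length, useful to compare against the training context length.
avg_tokens_per_example = total_tokens / num_examples

print(f"trainable share: {trainable_pct:.2f}%")
print(f"avg tokens per example: {avg_tokens_per_example:.0f}")
```

The average example length comes out well under the 2048-token training context noted under Limitations.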

## Available Formats

| Format | Size | Use case |
|---|---|---|
| `merged/` | 18 GB | Full bf16 for further fine-tuning or vLLM serving |
| `gguf/lale-9b-q8_0.gguf` | 8.9 GB | High quality inference with llama.cpp / Ollama |
| `gguf/lale-9b-q4_k_m.gguf` | 5.3 GB | Fast inference on consumer hardware |
| `adapter/` | 242 MB | LoRA adapter to apply on base Qwen3.5-9B |

## Training Data

The training data consists of 118,355 synthetic Turkish instruction-response pairs generated using Claude Opus 4.6 and Claude Sonnet 4.6 via AWS Bedrock, across 21 categories in 3 rounds:

**Round 1 (Sonnet, 61.6K examples):** general, reasoning, tool_use, tool_use_advanced, finance, legal, code, translation

**Round 2 (Opus, 37.1K examples):** math, math_cot, multi_turn, tool_use_mcp, distill_reasoning, conversation_persona, reasoning_v2, code_v2

**Round 3 (Opus+Sonnet, 19.7K examples):** multi_step_tool, grammar_drill, error_recovery, legal_terms, translation_pro

All data was checked for format validity and length bounds, deduplicated by exact match, and normalized to a consistent tool-use message format.
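
A minimal sketch of what the validity, length, and exact-deduplication checks could look like. The actual pipeline is not published here, so the `messages` record layout, the character-based length bounds, and all function names below are assumptions made for illustration.

```python
import hashlib
import json


def is_valid(example, min_chars=1, max_chars=8000):
    """Format and length checks. The bounds are placeholder values."""
    messages = example.get("messages")
    if not isinstance(messages, list) or not messages:
        return False
    total = 0
    for msg in messages:
        if not isinstance(msg, dict) or "role" not in msg:
            return False
        content = msg.get("content")
        # content may be None for function-calling messages.
        if content is not None and not isinstance(content, str):
            return False
        total += len(content or "")
    return min_chars <= total <= max_chars


def dedup_key(example):
    """Stable hash of the serialized messages, used for exact deduplication."""
    blob = json.dumps(example["messages"], sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


def filter_examples(examples):
    """Keep the first occurrence of every valid example."""
    seen = set()
    kept = []
    for ex in examples:
        if not is_valid(ex):
            continue
        key = dedup_key(ex)
        if key in seen:
            continue
        seen.add(key)
        kept.append(ex)
    return kept
```

Hashing the serialized conversation catches only byte-identical duplicates, which matches "exact deduplication"; near-duplicate detection would need a different technique.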

## Benchmark Results (terazi)

Evaluated using the [terazi](https://github.com/selimozten/terazi) Turkish language model benchmark suite.

### lale-9b-2602 vs lale-9b-2603

| Category | 2602 (98K data) | 2603 (118K data) | Relative change |
|---|---|---|---|
| **core** | 0.511 | **0.516** | +1.0% |
| common_sense | 0.970 | **0.980** | +1.0% |
| reading_comp | 0.535 | 0.512 | -4.3% |
| grammar | 0.288 | **0.337** | **+17.0%** |
| translation | 0.342 | 0.333 | -2.6% |
| summarization | 0.421 | 0.417 | -1.0% |
| **tool** | 0.411 | **0.444** | **+8.0%** |
| api_call | 0.557 | **0.586** | +5.2% |
| multi_step | 0.075 | **0.168** | **+124%** |
| param_extraction | 0.506 | 0.482 | -4.7% |
| error_recovery | 0.229 | 0.215 | -6.1% |
| **fin** | 0.492 | 0.454 | -7.7% |
| sentiment | 0.744 | 0.592 | -20.4% |
| numerical_reasoning | 0.524 | **0.557** | +6.3% |
| term_understanding | 0.226 | **0.252** | +11.5% |
| **legal** | n/a | **0.376** | new |
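
The percentages in the last column are relative changes with respect to the 2602 score, not absolute point differences. For example, multi_step gains 0.093 in absolute terms, which is 124% of its 2602 value. A short sketch of the calculation (the helper name is illustrative):

```python
def relative_change_pct(old, new):
    """Relative change from old to new, in percent of the old score."""
    return 100.0 * (new - old) / old


# (2602 score, 2603 score) pairs taken from the table above.
scores = {
    "grammar": (0.288, 0.337),
    "multi_step": (0.075, 0.168),
    "sentiment": (0.744, 0.592),
}

for name, (old, new) in scores.items():
    print(f"{name}: {relative_change_pct(old, new):+.1f}%")
```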

### Key Improvements
- **multi_step tool use: +124%** -- from targeted R3 multi_step_tool training data
- **grammar: +17%** -- from R3 grammar_drill exercises (vowel harmony, suffix ordering, conjugation)
- **tool use overall: +8%** -- from additional tool_use_mcp and multi_step_tool categories
- **numerical_reasoning: +6.3%** -- from math and math_cot data
- **term_understanding: +11.5%** -- from legal_terms and fin_analysis data

## Usage

### With llama.cpp

```bash
llama-server -m lale-9b-q8_0.gguf -ngl 99 --reasoning-budget 0 -c 4096
```

Note: `--reasoning-budget 0` disables Qwen3.5's thinking mode. With thinking mode enabled, the model's output is returned in `reasoning_content` instead of `content`.
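
Once the server is running, it can be queried through its OpenAI-compatible chat endpoint. The sketch below uses only the Python standard library and assumes the server listens on llama-server's default address `127.0.0.1:8080`; the helper names are illustrative. `extract_text` falls back to `reasoning_content` in case the server was started without `--reasoning-budget 0`.

```python
import json
import urllib.request

# Assumption: llama-server is running locally on its default port.
URL = "http://127.0.0.1:8080/v1/chat/completions"


def build_payload(prompt, max_tokens=512):
    """Build a single-turn chat completion request body."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def extract_text(response):
    """Return the reply text, falling back to reasoning_content if content is empty."""
    message = response["choices"][0]["message"]
    return message.get("content") or message.get("reasoning_content") or ""


def ask(prompt, url=URL):
    """Send the prompt to the server and return the reply text."""
    request = urllib.request.Request(
        url,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as resp:
        return extract_text(json.load(resp))
```

With the server up, `print(ask("Türkiye'nin başkenti neresidir?"))` sends one Turkish question and prints the reply.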

### With Ollama

Create a Modelfile:
```
FROM ./lale-9b-q8_0.gguf
PARAMETER num_ctx 4096
```

```bash
ollama create lale -f Modelfile
ollama run lale
```

### With transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "comarproject/lale-9b-2603",
    subfolder="merged",
    torch_dtype="bfloat16",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(
    "comarproject/lale-9b-2603",
    subfolder="merged",
)

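# Turkish prompt: "What is the capital of Türkiye?"
messages = [{"role": "user", "content": "Türkiye'nin başkenti neresidir?"}]
# Render the conversation with the model's chat template and open the assistant turn.
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
# The decoded string contains the prompt followed by the generated reply.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))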
```

## Technical Notes

- Qwen3.5-9B is a unified VLM (vision-language model) with Mamba/hybrid layers. We train only the language components.
- Training data includes normalized tool-use formats: `tool_call`/`tool_result` roles are remapped to standard `assistant`/`tool`, and `content: null` is allowed for OpenAI-style function calling messages.
- LoRA targets: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- Optimizer: AdamW 8-bit, cosine LR schedule, warmup 10%
- Sample packing enabled (required patching Unsloth's VLM detection for Qwen3.5)

## Limitations

- Trained primarily on synthetic data from Claude models; may reflect Claude's style and biases
- Context window limited to 2048 tokens during training (base model supports 128K)
- Sentiment analysis regressed from 2602 (-20%) -- may need targeted data for this subcategory
- Some long legal/financial prompts may exceed the trained context length
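
Because of the 2048-token training context, it can be useful to check a request against that budget before generating. A minimal sketch, assuming the prompt and the generated reply together should stay within the trained length; the function name is illustrative, and the token count would come from the tokenizer, for example `len(tokenizer(text)["input_ids"])`.

```python
TRAINED_CONTEXT = 2048  # tokens per sequence during fine-tuning


def fits_trained_context(prompt_tokens, max_new_tokens, limit=TRAINED_CONTEXT):
    """True if the prompt plus the generation budget stays within the trained length."""
    return prompt_tokens + max_new_tokens <= limit


# A 1,400-token prompt with a 512-token reply budget stays inside the budget,
# while a 1,800-token prompt with the same reply budget does not.
print(fits_trained_context(1400, 512))
print(fits_trained_context(1800, 512))
```

Requests over the budget may still work, since the base model supports longer contexts, but they fall outside what the fine-tuning data covered.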

## License

Apache 2.0

## Citation

```bibtex
@misc{lale-9b-2603,
  title={lale-9b-2603: Turkish Instruction Model Distilled from Frontier Models},
  author={Selim Ozten},
  year={2026},
  url={https://huggingface.co/comarproject/lale-9b-2603}
}
```