	---
	language:
	- tr
	- en
	license: apache-2.0
	library_name: transformers
	base_model: Qwen/Qwen3.5-9B
	tags:
	- turkish
	- instruct
	- fine-tuned
	- lora
	- gguf
	- llama-cpp
	- text-generation
	- conversational
	- qwen3.5
	pipeline_tag: text-generation
	model-index:
	- name: lale-9b-2603
	  results:
	  - task:
	      type: text-generation
	      name: Turkish Language Understanding
	    dataset:
	      name: terazi
	      type: custom
	    metrics:
	    - name: core
	      type: accuracy
	      value: 0.516
	    - name: tool
	      type: accuracy
	      value: 0.444
	    - name: fin
	      type: accuracy
	      value: 0.454
	    - name: legal
	      type: accuracy
	      value: 0.376
	---

	# lale-9b-2603

	lale (Turkish for "tulip") is a Turkish instruction-following language model fine-tuned from [Qwen3.5-9B](https://huggingface.co/Qwen/Qwen3.5-9B). It aims to be the best Turkish language model in its size class, with training focused on general knowledge, reasoning, tool use, grammar, finance, and legal domains.

	## Model Details

	| Property | Value |
	|---|---|
	| Base model | Qwen/Qwen3.5-9B |
	| Method | LoRA SFT (r=32, alpha=32, bf16) |
	| Training data | 118,355 Turkish instruction examples (~113M tokens) |
	| Epochs | 3 |
	| Final loss | 0.282 |
	| Training time | ~120 hours on 1x RTX 4090 |
	| Parameters | 9.5B total, 58M trainable (0.61%) |
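
The LoRA figures in the table can be sanity-checked with a little arithmetic. The sketch below assumes the standard LoRA scaling factor `alpha / r` (not a rank-stabilized variant) and the usual count of `r * (d_in + d_out)` added parameters per adapted weight matrix. The 4096 x 4096 layer shape is purely illustrative and is not taken from the Qwen3.5-9B config.

```python
def lora_scaling(r, alpha):
    """Scaling applied to the low-rank update in standard LoRA."""
    return alpha / r


def lora_param_count(d_in, d_out, r):
    """Parameters added by one adapter: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


def trainable_fraction(trainable, total):
    """Share of parameters that receive gradients."""
    return trainable / total


# Values from the table above: r=32, alpha=32, 58M trainable of 9.5B total.
print(lora_scaling(32, 32))
print(f"{trainable_fraction(58e6, 9.5e9):.2%}")

# Hypothetical square projection, for illustration only.
print(lora_param_count(4096, 4096, 32))
```

With `r` equal to `alpha`, the low-rank update is applied unscaled, and 58M of 9.5B parameters works out to the 0.61% quoted in the table.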

	## Available Formats

	| Format | Size | Use case |
	|---|---|---|
	| `merged/` | 18 GB | Full bf16 for further fine-tuning or vLLM serving |
	| `gguf/lale-9b-q8_0.gguf` | 8.9 GB | High quality inference with llama.cpp / Ollama |
	| `gguf/lale-9b-q4_k_m.gguf` | 5.3 GB | Fast inference on consumer hardware |
	| `adapter/` | 242 MB | LoRA adapter to apply on base Qwen3.5-9B |

	## Training Data

	The training data consists of 118,355 synthetic Turkish instruction-response pairs generated using Claude Opus 4.6 and Claude Sonnet 4.6 via AWS Bedrock, across 21 categories in 3 rounds:

	Round 1 (Sonnet, 61.6K examples): general, reasoning, tool_use, tool_use_advanced, finance, legal, code, translation

	Round 2 (Opus, 37.1K examples): math, math_cot, multi_turn, tool_use_mcp, distill_reasoning, conversation_persona, reasoning_v2, code_v2

	Round 3 (Opus+Sonnet, 19.7K examples): multi_step_tool, grammar_drill, error_recovery, legal_terms, translation_pro

	All data was checked for format validity, filtered by length bounds, deduplicated by exact match, and normalized so that tool-use messages share one format.
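
The filtering steps above could look roughly like the following. This is a minimal sketch, not the project's actual pipeline: the role remapping follows the Technical Notes in this card, but the character-based length bounds (`MIN_CHARS`, `MAX_CHARS`), the function names, and the assumption that each example is a list of message dicts are all hypothetical.

```python
import hashlib
import json

# Role remapping as described in the Technical Notes of this card.
ROLE_MAP = {"tool_call": "assistant", "tool_result": "tool"}
VALID_ROLES = {"system", "user", "assistant", "tool"}

# Hypothetical bounds in characters; the real thresholds are not published.
MIN_CHARS, MAX_CHARS = 20, 8000


def normalize(messages):
    """Remap non-standard tool roles to the standard assistant/tool roles."""
    return [{**m, "role": ROLE_MAP.get(m.get("role"), m.get("role"))} for m in messages]


def is_valid(messages):
    """Format check: non-empty, known roles, content is a string or None."""
    if not messages:
        return False
    for m in messages:
        if m.get("role") not in VALID_ROLES:
            return False
        content = m.get("content")
        # content may be None for OpenAI-style function-calling messages
        if content is not None and not isinstance(content, str):
            return False
    return True


def total_chars(messages):
    """Total length of all message contents, treating None as empty."""
    return sum(len(m.get("content") or "") for m in messages)


def filter_examples(examples):
    """Yield normalized, valid, length-bounded examples without exact duplicates."""
    seen = set()
    for messages in examples:
        messages = normalize(messages)
        if not is_valid(messages):
            continue
        if not MIN_CHARS <= total_chars(messages) <= MAX_CHARS:
            continue
        # Exact deduplication on a hash of the canonical JSON form.
        key = hashlib.sha256(
            json.dumps(messages, sort_keys=True, ensure_ascii=False).encode("utf-8")
        ).hexdigest()
        if key in seen:
            continue
        seen.add(key)
        yield messages
```

Normalizing before validating means examples that use the `tool_call`/`tool_result` roles are kept rather than rejected as malformed.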

	## Benchmark Results (terazi)

	Evaluated using the [terazi](https://github.com/selimozten/terazi) Turkish language model benchmark suite.

	### lale-9b-2602 vs lale-9b-2603

	| Category | 2602 (98K data) | 2603 (118K data) | Relative change |
	|---|---|---|---|
	| core | 0.511 | 0.516 | +1.0% |
	| common_sense | 0.970 | 0.980 | +1.0% |
	| reading_comp | 0.535 | 0.512 | -4.3% |
	| grammar | 0.288 | 0.337 | +17.0% |
	| translation | 0.342 | 0.333 | -2.6% |
	| summarization | 0.421 | 0.417 | -1.0% |
	| tool | 0.411 | 0.444 | +8.0% |
	| api_call | 0.557 | 0.586 | +5.2% |
	| multi_step | 0.075 | 0.168 | +124% |
	| param_extraction | 0.506 | 0.482 | -4.7% |
	| error_recovery | 0.229 | 0.215 | -6.1% |
	| fin | 0.492 | 0.454 | -7.7% |
	| sentiment | 0.744 | 0.592 | -20.4% |
	| numerical_reasoning | 0.524 | 0.557 | +6.3% |
	| term_understanding | 0.226 | 0.252 | +11.5% |
	| legal | n/a | 0.376 | new |
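
The change figures in the table are relative to the 2602 score, not absolute differences in accuracy. A short sketch of the calculation, using three rows from the table (the function name is illustrative):

```python
def relative_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


# (2602 score, 2603 score) pairs copied from the table above.
scores = {
    "grammar": (0.288, 0.337),
    "multi_step": (0.075, 0.168),
    "sentiment": (0.744, 0.592),
}

for name, (old, new) in scores.items():
    print(f"{name}: {relative_change(old, new):+.1f}% relative, {new - old:+.3f} absolute")
```

Relative figures look large when the starting score is low: the +124% for multi_step corresponds to an absolute gain of 0.093.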

	### Key Improvements
	- multi_step tool use: +124% -- from targeted Round 3 multi_step_tool training data
	- grammar: +17% -- from Round 3 grammar_drill exercises (vowel harmony, suffix ordering, conjugation)
	- tool use overall: +8% -- from the additional tool_use_mcp and multi_step_tool categories
	- numerical_reasoning: +6.3% -- from math and math_cot data
	- term_understanding: +11.5% -- from legal_terms and fin_analysis data

	## Usage

	### With llama.cpp

	```bash
	llama-server -m lale-9b-q8_0.gguf -ngl 99 --reasoning-budget 0 -c 4096
	```

	Note: `--reasoning-budget 0` disables Qwen3.5's thinking mode. With thinking mode enabled, the server puts the output in `reasoning_content` instead of `content`.
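
Once the server is running, it can be queried through llama-server's OpenAI-compatible chat endpoint. The sketch below uses only the standard library and assumes the server listens on its default address, `http://localhost:8080`. The helper names are illustrative, and the fallback to `reasoning_content` is a defensive assumption for servers started with thinking mode left on.

```python
import json
import urllib.request

# Assumption: llama-server is running locally on its default port.
SERVER_URL = "http://localhost:8080/v1/chat/completions"


def build_payload(prompt, max_tokens=512):
    """Build an OpenAI-style chat completion request body."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def extract_answer(response):
    """Return the answer text from a chat completion response dict."""
    message = response["choices"][0]["message"]
    # With thinking disabled the answer is in `content`; fall back to
    # `reasoning_content` in case thinking mode was left enabled.
    return message.get("content") or message.get("reasoning_content", "")


def ask(prompt, url=SERVER_URL):
    """Send one prompt to the server and return the answer text."""
    request = urllib.request.Request(
        url,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as reply:
        return extract_answer(json.load(reply))


# Example (requires a running server):
# print(ask("Türkiye'nin başkenti neresidir?"))
```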

	### With Ollama

	Create a Modelfile:
	```
	FROM ./lale-9b-q8_0.gguf
	PARAMETER num_ctx 4096
	```

	```bash
	ollama create lale -f Modelfile
	ollama run lale
	```

	### With transformers

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer

	# The merged bf16 weights live in the `merged/` subfolder of the repo.
	model = AutoModelForCausalLM.from_pretrained(
	    "comarproject/lale-9b-2603",
	    subfolder="merged",
	    torch_dtype="bfloat16",
	    device_map="auto",
	)
	tokenizer = AutoTokenizer.from_pretrained(
	    "comarproject/lale-9b-2603",
	    subfolder="merged",
	)

	# "What is the capital of Türkiye?"
	messages = [{"role": "user", "content": "Türkiye'nin başkenti neresidir?"}]
	text = tokenizer.apply_chat_template(
	    messages, tokenize=False, add_generation_prompt=True
	)
	inputs = tokenizer(text, return_tensors="pt").to(model.device)
	outputs = model.generate(**inputs, max_new_tokens=512)

	# Decode only the newly generated tokens, not the echoed prompt.
	new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
	print(tokenizer.decode(new_tokens, skip_special_tokens=True))
	```

	## Technical Notes

	- Qwen3.5-9B is a unified VLM (vision-language model) with Mamba/hybrid layers. We train only the language components.
	- Training data includes normalized tool-use formats: `tool_call`/`tool_result` roles are remapped to standard `assistant`/`tool`, and `content: null` is allowed for OpenAI-style function calling messages.
	- LoRA targets: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
	- Optimizer: AdamW 8-bit, cosine LR schedule, warmup 10%
	- Sample packing enabled (required patching Unsloth's VLM detection for Qwen3.5)
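
For reference, a cosine learning-rate schedule with 10% warmup can be sketched as follows. This is a generic illustration, not the training script: it assumes linear warmup from zero and decay to zero, and the peak learning rate of `2e-4` in the example is a hypothetical value, since this card does not state the one used.

```python
import math


def lr_at(step, total_steps, peak_lr, warmup_frac=0.10):
    """Learning rate at `step`: linear warmup, then cosine decay to zero."""
    warmup_steps = max(1, int(total_steps * warmup_frac))
    if step < warmup_steps:
        # Linear ramp from 0 up to peak_lr over the warmup phase.
        return peak_lr * step / warmup_steps
    # Fraction of the decay phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Hypothetical run: 1000 optimizer steps, peak LR 2e-4.
for step in (0, 50, 100, 550, 1000):
    print(step, lr_at(step, total_steps=1000, peak_lr=2e-4))
```

The rate peaks at the end of warmup (step 100 here) and reaches half the peak midway through the decay phase.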

	## Limitations

	- Trained primarily on synthetic data from Claude models; may reflect Claude's style and biases
	- Context window limited to 2048 tokens during training (base model supports 128K)
	- Sentiment analysis regressed relative to 2602 (0.744 to 0.592, -20.4%); this subcategory may need targeted data
	- Some long legal/financial prompts may exceed the trained context length

	## License

	Apache 2.0

	## Citation

	```bibtex
	@misc{lale-9b-2603,
	title={lale-9b-2603: Turkish Instruction Model Distilled from Frontier Models},
	author={Selim Ozten},
	year={2026},
	url={https://huggingface.co/comarproject/lale-9b-2603}
	}
	```