Instructions to use comarproject/lale-9b-2603 with libraries, inference providers, notebooks, and local apps.

Libraries

How to use comarproject/lale-9b-2603 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="comarproject/lale-9b-2603")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("comarproject/lale-9b-2603", dtype="auto")

llama-cpp-python

How to use comarproject/lale-9b-2603 with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="comarproject/lale-9b-2603",
	filename="gguf/lale-9b-q4_k_m.gguf",
)

llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)

Local Apps

llama.cpp

How to use comarproject/lale-9b-2603 with llama.cpp:

Install (macOS, Linux)

curl -LsSf https://llama.app/install.sh | sh
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf comarproject/lale-9b-2603:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf comarproject/lale-9b-2603:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf comarproject/lale-9b-2603:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf comarproject/lale-9b-2603:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf comarproject/lale-9b-2603:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf comarproject/lale-9b-2603:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf comarproject/lale-9b-2603:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf comarproject/lale-9b-2603:Q4_K_M

Use Docker

docker model run hf.co/comarproject/lale-9b-2603:Q4_K_M

vLLM

How to use comarproject/lale-9b-2603 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "comarproject/lale-9b-2603"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "comarproject/lale-9b-2603",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/comarproject/lale-9b-2603:Q4_K_M

SGLang

How to use comarproject/lale-9b-2603 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "comarproject/lale-9b-2603" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "comarproject/lale-9b-2603",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "comarproject/lale-9b-2603" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "comarproject/lale-9b-2603",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Ollama
How to use comarproject/lale-9b-2603 with Ollama:
```
ollama run hf.co/comarproject/lale-9b-2603:Q4_K_M
```

Unsloth Studio

How to use comarproject/lale-9b-2603 with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for comarproject/lale-9b-2603 to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for comarproject/lale-9b-2603 to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for comarproject/lale-9b-2603 to start chatting

Pi

How to use comarproject/lale-9b-2603 with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf comarproject/lale-9b-2603:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "comarproject/lale-9b-2603:Q4_K_M"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use comarproject/lale-9b-2603 with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf comarproject/lale-9b-2603:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default comarproject/lale-9b-2603:Q4_K_M

Run Hermes

hermes

OpenClaw

How to use comarproject/lale-9b-2603 with OpenClaw:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf comarproject/lale-9b-2603:Q4_K_M

Configure OpenClaw

# Install OpenClaw:
npm install -g openclaw@latest
# Register the local server and set it as the default model:
openclaw onboard --non-interactive --mode local \
  --auth-choice custom-api-key \
  --custom-base-url http://127.0.0.1:8080/v1 \
  --custom-model-id "comarproject/lale-9b-2603:Q4_K_M" \
  --custom-provider-id llama-cpp \
  --custom-compatibility openai \
  --custom-text-input \
  --accept-risk \
  --skip-health

Run OpenClaw

openclaw agent --local --agent main --message "Hello from Hugging Face"

Docker Model Runner
How to use comarproject/lale-9b-2603 with Docker Model Runner:
```
docker model run hf.co/comarproject/lale-9b-2603:Q4_K_M
```

Lemonade

How to use comarproject/lale-9b-2603 with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull comarproject/lale-9b-2603:Q4_K_M

Run and chat with the model

lemonade run user.lale-9b-2603-Q4_K_M

List all available models

lemonade list

lale-9b-2603

lale (Turkish for "tulip") is a Turkish instruction-following language model fine-tuned from Qwen3.5-9B. It is designed to be the best Turkish language model in its size class, with strong performance in general knowledge, reasoning, tool use, grammar, finance, and legal domains.

Model Details

| Property | Value |
|---|---|
| Base model | Qwen/Qwen3.5-9B |
| Method | LoRA SFT (r=32, alpha=32, bf16) |
| Training data | 118,355 Turkish instruction examples (~113M tokens) |
| Epochs | 3 |
| Final loss | 0.282 |
| Training time | ~120 hours on 1x RTX 4090 |
| Parameters | 9.5B total, 58M trainable (0.61%) |
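
A few derived quantities follow directly from the figures in this table. The sketch below recomputes the trainable fraction and the average example length, and shows the LoRA scaling factor under the assumption that standard LoRA scaling (alpha / r) was used; the card does not say whether a variant such as rank-stabilized scaling was applied.

```python
# Figures copied from the Model Details table.
total_params = 9.5e9        # "9.5B total"
trainable_params = 58e6     # "58M trainable"
num_examples = 118_355      # training examples
num_tokens = 113e6          # "~113M tokens" (approximate)
lora_r, lora_alpha = 32, 32

trainable_pct = 100 * trainable_params / total_params
avg_tokens_per_example = num_tokens / num_examples
# Assumption: standard LoRA scaling, i.e. the update is multiplied by alpha / r.
lora_scale = lora_alpha / lora_r

print(f"trainable: {trainable_pct:.2f}%")                          # trainable: 0.61%
print(f"avg tokens per example: {avg_tokens_per_example:.0f}")     # avg tokens per example: 955
print(f"LoRA scale: {lora_scale}")                                 # LoRA scale: 1.0
```

An average of roughly 955 tokens per example sits comfortably inside the 2048-token training context listed under Limitations.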

Available Formats

| Format | Size | Use case |
|---|---|---|
| `merged/` | 18 GB | Full bf16 for further fine-tuning or vLLM serving |
| `gguf/lale-9b-q8_0.gguf` | 8.9 GB | High quality inference with llama.cpp / Ollama |
| `gguf/lale-9b-q4_k_m.gguf` | 5.3 GB | Fast inference on consumer hardware |
| `adapter/` | 242 MB | LoRA adapter to apply on base Qwen3.5-9B |

Training Data

The training data consists of 118,355 synthetic Turkish instruction-response pairs generated using Claude Opus 4.6 and Claude Sonnet 4.6 via AWS Bedrock, across 21 categories in 3 rounds:

Round 1 (Sonnet, 61.6K examples): general, reasoning, tool_use, tool_use_advanced, finance, legal, code, translation

Round 2 (Opus, 37.1K examples): math, math_cot, multi_turn, tool_use_mcp, distill_reasoning, conversation_persona, reasoning_v2, code_v2

Round 3 (Opus+Sonnet, 19.7K examples): multi_step_tool, grammar_drill, error_recovery, legal_terms, translation_pro

All data was checked for format validity and length bounds, deduplicated by exact match, and normalized so that tool-use messages follow a single format.
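
The filtering steps can be sketched roughly as follows. This is not the project's actual pipeline: the character bounds (`MIN_CHARS`, `MAX_CHARS`), the function names, and the choice to measure length in characters rather than tokens are all assumptions made for illustration. The role remapping follows the description in Technical Notes (tool_call/tool_result become assistant/tool, and `content: null` is allowed).

```python
import hashlib
import json

# Hypothetical bounds; the card does not state the real thresholds or units.
MIN_CHARS = 10
MAX_CHARS = 8000

ROLE_MAP = {"tool_call": "assistant", "tool_result": "tool"}
STANDARD_ROLES = {"system", "user", "assistant", "tool"}
ACCEPTED_ROLES = STANDARD_ROLES | set(ROLE_MAP)


def is_valid(messages):
    """Format check: a non-empty list of dicts with a known role and str/None content."""
    if not isinstance(messages, list) or not messages:
        return False
    for m in messages:
        if not isinstance(m, dict) or m.get("role") not in ACCEPTED_ROLES:
            return False
        # content may be None for OpenAI-style function-calling messages
        if not isinstance(m.get("content"), (str, type(None))):
            return False
    return True


def normalize(messages):
    """Remap non-standard tool roles onto the standard assistant/tool roles."""
    return [{**m, "role": ROLE_MAP.get(m["role"], m["role"])} for m in messages]


def filter_examples(examples):
    """Validate, normalize, length-filter, and exact-deduplicate conversations."""
    seen, kept = set(), []
    for messages in examples:
        if not is_valid(messages):
            continue
        messages = normalize(messages)
        n_chars = sum(len(m["content"] or "") for m in messages)
        if not (MIN_CHARS <= n_chars <= MAX_CHARS):
            continue
        # Exact deduplication: hash a canonical JSON serialization.
        blob = json.dumps(messages, sort_keys=True, ensure_ascii=False)
        key = hashlib.sha256(blob.encode("utf-8")).hexdigest()
        if key in seen:
            continue
        seen.add(key)
        kept.append(messages)
    return kept
```

Normalizing before hashing means two conversations that differ only in `tool_call` versus `assistant` role naming count as duplicates, which is one reasonable reading of "exact deduplication" but not the only one.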

Benchmark Results (terazi)

Evaluated using the terazi Turkish language model benchmark suite.

lale-9b-2602 vs lale-9b-2603

| Category | 2602 (98K data) | 2603 (118K data) | Change |
|---|---|---|---|
| core | 0.511 | 0.516 | +1.0% |
| common_sense | 0.970 | 0.980 | +1.0% |
| reading_comp | 0.535 | 0.512 | -4.3% |
| grammar | 0.288 | 0.337 | +17.0% |
| translation | 0.342 | 0.333 | -2.6% |
| summarization | 0.421 | 0.417 | -1.0% |
| tool | 0.411 | 0.444 | +8.0% |
| api_call | 0.557 | 0.586 | +5.2% |
| multi_step | 0.075 | 0.168 | +124% |
| param_extraction | 0.506 | 0.482 | -4.7% |
| error_recovery | 0.229 | 0.215 | -6.1% |
| fin | 0.492 | 0.454 | -7.7% |
| sentiment | 0.744 | 0.592 | -20.4% |
| numerical_reasoning | 0.524 | 0.557 | +6.3% |
| term_understanding | 0.226 | 0.252 | +11.5% |
| legal | n/a | 0.376 | new |
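
The Change column appears to be the relative change between the two scores, not the absolute difference in points. A short check on three rows copied from the table (the function name is just illustrative):

```python
def relative_change_pct(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return 100.0 * (new - old) / old


# (2602 score, 2603 score) for a few rows of the table.
scores = {
    "grammar": (0.288, 0.337),
    "multi_step": (0.075, 0.168),
    "sentiment": (0.744, 0.592),
}

for name, (old, new) in scores.items():
    print(f"{name}: {relative_change_pct(old, new):+.1f}%")
# grammar: +17.0%
# multi_step: +124.0%
# sentiment: -20.4%
```

Read this way, the +124% on multi_step reflects a low starting score (0.075 to 0.168) rather than a large gain in absolute terms.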

Key Improvements

multi_step tool use: +124% -- from targeted R3 multi_step_tool training data
grammar: +17% -- from R3 grammar_drill exercises (vowel harmony, suffix ordering, conjugation)
tool use overall: +8% -- from additional tool_use_mcp and multi_step_tool categories
numerical_reasoning: +6.3% -- from math and math_cot data
term_understanding: +11.5% -- from legal_terms and fin_analysis data

Usage

With llama.cpp

llama-server -m lale-9b-q8_0.gguf -ngl 99 --reasoning-budget 0 -c 4096

Note: --reasoning-budget 0 disables Qwen3.5's thinking mode. With thinking mode enabled, the model's output is returned in reasoning_content instead of content.
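
A minimal client sketch for the server started above, using only the standard library. It assumes llama-server is listening on its default port 8080 and exposes the OpenAI-compatible `/v1/chat/completions` route; the `model` value is a placeholder. The `extract_text` helper falls back to `reasoning_content` when `content` is empty, which covers the case where the server was started without `--reasoning-budget 0`.

```python
import json
import urllib.request

# Assumption: llama-server running locally on its default port.
BASE_URL = "http://localhost:8080/v1"


def extract_text(response: dict) -> str:
    """Return the assistant text, falling back to reasoning_content if content is empty."""
    message = response["choices"][0]["message"]
    return message.get("content") or message.get("reasoning_content") or ""


def chat(prompt: str, base_url: str = BASE_URL) -> str:
    """Send a single-turn chat request and return the reply text."""
    payload = {
        "model": "lale-9b-q8_0",  # placeholder name (assumption)
        "messages": [{"role": "user", "content": prompt}],
    }
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as resp:
        return extract_text(json.load(resp))
```

With the server running, `chat("Turkiye'nin baskenti neresidir?")` should return the model's answer as a plain string.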

With Ollama

Create a Modelfile:

FROM ./lale-9b-q8_0.gguf
PARAMETER num_ctx 4096

ollama create lale -f Modelfile
ollama run lale
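
Once the `lale` model has been created, it can also be queried from Python through Ollama's local REST API. The sketch below assumes Ollama is serving on its default address (`localhost:11434`) and uses the non-streaming form of `/api/chat`; the helper names are illustrative.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # Ollama's default local address


def build_chat_request(prompt: str, model: str = "lale") -> urllib.request.Request:
    """Build a non-streaming chat request for the model created from the Modelfile."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # return one JSON object instead of a stream of chunks
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        return json.load(resp)["message"]["content"]
```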

With transformers

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "comarproject/lale-9b-2603",
    subfolder="merged",
    torch_dtype="bfloat16",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(
    "comarproject/lale-9b-2603",
    subfolder="merged",
)

messages = [{"role": "user", "content": "Turkiye'nin baskenti neresidir?"}]
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Technical Notes

Qwen3.5-9B is a unified VLM (vision-language model) with Mamba/hybrid layers. We train only the language components.
Training data includes normalized tool-use formats: tool_call/tool_result roles are remapped to standard assistant/tool, and content: null is allowed for OpenAI-style function calling messages.
LoRA targets: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Optimizer: AdamW 8-bit, cosine LR schedule, warmup 10%
Sample packing enabled (required patching Unsloth's VLM detection for Qwen3.5)
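
For readers unfamiliar with the schedule named above, here is a small sketch of a cosine learning-rate schedule with 10% warmup. The card gives neither the peak learning rate nor the shape of the warmup, so the linear warmup, the decay to zero, and the example peak of 2e-4 are assumptions, not the training configuration.

```python
import math


def lr_at(step: int, total_steps: int, peak_lr: float, warmup_ratio: float = 0.10) -> float:
    """Linear warmup to peak_lr over the first warmup_ratio of steps, then cosine decay to 0."""
    warmup_steps = max(1, int(total_steps * warmup_ratio))
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Hypothetical values for illustration only.
total_steps, peak_lr = 1000, 2e-4
for step in (0, 50, 100, 550, 1000):
    print(step, f"{lr_at(step, total_steps, peak_lr):.2e}")
```

The rate climbs linearly for the first 100 of 1000 steps, peaks at step 100, passes through half the peak at the midpoint of the decay phase, and reaches zero at the final step.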

Limitations

Trained primarily on synthetic data from Claude models; may reflect Claude's style and biases
Context window limited to 2048 tokens during training (base model supports 128K)
Sentiment analysis regressed from 2602 (-20%) -- may need targeted data for this subcategory
Some long legal/financial prompts may exceed the trained context length
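
One way to work within the 2048-token training context is to trim older turns before sending a conversation. The sketch below is a hypothetical helper, not part of the model's tooling: `count_tokens` is any callable that returns a token count for a string (with the tokenizer loaded in the Usage section, something like `lambda s: len(tokenizer.encode(s))`), and the per-message overhead added by the chat template is ignored.

```python
from typing import Callable

TRAINED_CONTEXT = 2048  # tokens, per the Limitations list


def trim_history(
    messages: list,
    count_tokens: Callable[[str], int],
    max_new_tokens: int = 512,
    budget: int = TRAINED_CONTEXT,
) -> list:
    """Drop the oldest non-system messages until prompt plus generation fits the budget."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]

    def size(msgs):
        return sum(count_tokens(m["content"] or "") for m in msgs)

    # Always keep at least the most recent message, even if it alone is too long.
    while len(rest) > 1 and size(system + rest) + max_new_tokens > budget:
        rest.pop(0)
    return system + rest
```

A single prompt that exceeds the budget on its own is returned unchanged, so callers still need to decide whether to truncate or reject it.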

License

Apache 2.0

Citation

@misc{lale-9b-2603,
  title={lale-9b-2603: Turkish Instruction Model Distilled from Frontier Models},
  author={Selim Ozten},
  year={2026},
  url={https://huggingface.co/comarproject/lale-9b-2603}
}
