Qwen3.5-9B + metro-v23 LoRA

Domain-specialised tool-using agent for transit-kiosk tasks: routing, fare calculation, disruption advisories, accessibility, multilingual cultural notes, multi-turn context tracking, and policy adaptation across 6 metro systems (MARTA, BART, CTA, Doha, Taipei MRT, Beijing Subway).

QLoRA r=16 fine-tune of Qwen/Qwen3.5-9B on 790 distilled traces from Qwen3.5-27B and Qwen3.5-35B-A3B teachers (filtered to tier1 โ‰ฅ 90% per case, deduplicated by case_id, evaluated on the MetroLLM-Bench v23 harness).

Files

File Purpose
Qwen3.5-9B-metro-v23-Q4_K_M.gguf (5.3 GB) Runtime artifact for llama.cpp / Ollama
adapter/ Raw LoRA adapter (use with PEFT + base Qwen3.5-9B)
training_summary.json Hyperparameters, seed, dataset version

Eval (v23, 6 systems, Haiku judge for Tier 2)

Cross-system average: Tier-1 92.4, Composite 90.0 (+2.2 T1 / +1.4 Comp vs base Qwen3.5-9B)

System Tier-1 %
MARTA 94.0
BART 90.7
CTA 93.4
DOHA 93.1
TAIPEI 92.6
BEIJING 90.7

Quickstart (llama.cpp)

huggingface-cli download continker/Qwen3.5-9B-metro-v23 \
  Qwen3.5-9B-metro-v23-Q4_K_M.gguf --local-dir ./models

llama-server -m ./models/Qwen3.5-9B-metro-v23-Q4_K_M.gguf \
  --port 8080 --ctx-size 32768 --n-gpu-layers 999

Quickstart (PEFT adapter, Python)

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3.5-9B", torch_dtype="bfloat16")
model = PeftModel.from_pretrained(base, "continker/Qwen3.5-9B-metro-v23", subfolder="adapter")
tokenizer = AutoTokenizer.from_pretrained("continker/Qwen3.5-9B-metro-v23", subfolder="adapter")

Training

  • Base: Qwen/Qwen3.5-9B
  • Method: QLoRA, rank=16, alpha=32, dropout=0.05
  • Targets: q/k/v/o + gate/up/down projections
  • Optimizer: AdamW, lr=2e-4, cosine, warmup 5%
  • Epochs: 3, effective batch 8 (per_device_train_batch_size=2 ร— grad_accum=4)
  • Max sequence length: 4096
  • Seed: 42 (default; multi-seed CI in progress for 27B)
  • Dataset: 790 distilled examples, see continker/metrollm-bench-train-data-v23

Limitations

  • Trained on 6 metro systems; generalisation to other systems untested.
  • Tool-use schema is specific to the MetroLLM-Bench mock server (route_planner, fare_calculator, station_info, disruption_feed, knowledge_base, submit_assistant_state).
  • Quantised to 4-bit (Q4_K_M); for full-precision behaviour use the adapter on bf16 base weights.

Citation

@misc{metrollm-bench-2026,
  title={MetroLLM-Bench: Evaluating LLMs as Prompt-Driven Transit Kiosk Agents},
  author={Hendriks, Remco and contributors},
  year={2026},
  publisher={HuggingFace},
  howpublished={\url{https://huggingface.co/continker}}
}
Downloads last month
7
GGUF
Model size
9B params
Architecture
qwen35
Hardware compatibility
Log In to add your hardware

4-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Model tree for continker/Qwen3.5-9B-metro-v23

Finetuned
Qwen/Qwen3.5-9B
Adapter
(216)
this model

Space using continker/Qwen3.5-9B-metro-v23 1