HopCoder-Mini-9B Native Tool-Call LoRA (H200)

Overview

A LoRA fine-tune of HopCoder-Mini-9B (a Qwen3.5-based multimodal model) that teaches the model to emit native tool-call blocks in a compact XML-like format instead of JSON-based function-calling schemas. It was trained in BF16 precision on an NVIDIA H200 GPU on Modal, using a blend of xLAM and Hermes function-calling data plus 1,920 unique targeted CLI-tool examples (8 tools x 240 examples, each included twice in the training mix).

Benchmark Results

Metric	Score
Overall accuracy	92.5% (37/40)
Targeted CLI tools	93.3% (28/30)
General tool-call (xLAM-style)	90.0% (9/10)
Average latency	2.73 s/case
Total generation time	109.04 s

Per-tool breakdown

Tool	Accuracy	Avg latency
`ask_user_question`	5/5 (100%)	4.25 s
`todo_write`	5/5 (100%)	4.19 s
`glob`	5/5 (100%)	1.59 s
`run_shell_command`	5/5 (100%)	2.08 s
`grep_search`	4/5 (80%)	1.95 s
`edit`	4/5 (80%)	2.65 s
`get_weather`	1/1 (100%)	1.34 s
`search_flights`	1/1 (100%)	2.45 s
`calculate_mortgage`	1/1 (100%)	3.02 s
`send_email`	1/1 (100%)	2.74 s
`book_restaurant`	1/1 (100%)	3.59 s
`get_stock_price`	1/1 (100%)	1.39 s
`create_event`	0/1 (0%)	3.30 s
`translate_text`	1/1 (100%)	2.65 s
`get_directions`	1/1 (100%)	2.90 s
`set_reminder`	1/1 (100%)	2.10 s
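
The headline scores can be re-derived from the per-tool rows. A minimal sketch, with the pass counts copied from the table above; the dictionary and function names are illustrative only:

```python
# (passed, total) per tool, copied from the per-tool breakdown table.
targeted = {
    "ask_user_question": (5, 5),
    "todo_write": (5, 5),
    "glob": (5, 5),
    "run_shell_command": (5, 5),
    "grep_search": (4, 5),
    "edit": (4, 5),
}
general = {
    "get_weather": (1, 1),
    "search_flights": (1, 1),
    "calculate_mortgage": (1, 1),
    "send_email": (1, 1),
    "book_restaurant": (1, 1),
    "get_stock_price": (1, 1),
    "create_event": (0, 1),
    "translate_text": (1, 1),
    "get_directions": (1, 1),
    "set_reminder": (1, 1),
}

def accuracy(rows):
    """Sum a {tool: (passed, total)} dict into (passed, total, percent)."""
    passed = sum(p for p, _ in rows.values())
    total = sum(t for _, t in rows.values())
    return passed, total, round(100 * passed / total, 1)

print(accuracy(targeted))                 # (28, 30, 93.3)
print(accuracy(general))                  # (9, 10, 90.0)
print(accuracy({**targeted, **general}))  # (37, 40, 92.5)
```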

Failure analysis (3 cases)

Case	Expected	Got	Error
16	`grep_search`	`search_code`	Wrong function selected
24	`edit`	`read_file`	Wrong function selected
37	`create_event`	`create_event`	Missing `title` parameter

Native Tool-Call Format

The adapter emits tool calls in a compact XML-like format using special tokens:

<tool_call>
<function=FUNCTION_NAME>
<parameter=KEY>VALUE</parameter>
<parameter=KEY>VALUE</parameter>
</function>
</tool_call>

Key characteristics:

No JSON wrapping — parameters are individual XML-like tags, not a JSON object
No markdown fences — output is pure tool-call blocks
Balanced tags — every opening tag has a matching closing tag
Arrays and objects — JSON is used only for complex parameter values (arrays/objects)
Multiple calls — the model can emit multiple blocks in sequence

Example output

<tool_call>
<function=glob>
<parameter=pattern>**/*.py</parameter>
</function>
</tool_call>

Parsing the output

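import json
import re

# Matches one complete <tool_call> block: group 1 is the function name,
# group 2 is the body that holds the <parameter=...> tags.
FUNCTION_RE = re.compile(
    r"<tool_call>\s*<function=([A-Za-z_][A-Za-z0-9_]*)>\s*"
    r"(.*?)\s*</function>\s*</tool_call>",
    flags=re.DOTALL,
)
# Matches one parameter inside a function body: group 1 is the key, group 2 the value.
PARAMETER_RE = re.compile(
    r"<parameter=([A-Za-z_][A-Za-z0-9_]*)>\s*"
    r"(.*?)\s*</parameter>",
    flags=re.DOTALL,
)

def parse_tool_calls(text):
    """Return one {"function": name, "parameters": dict} entry per block in text."""
    calls = []
    for match in FUNCTION_RE.finditer(text):
        function_name = match.group(1)
        body = match.group(2)
        params = {}
        for key, value in PARAMETER_RE.findall(body):
            value = value.strip()
            if value.startswith(("[", "{")):
                # Arrays and objects are JSON. Keep the raw string when the value
                # only looks like JSON (e.g. a regex pattern such as [a-z]+).
                try:
                    value = json.loads(value)
                except json.JSONDecodeError:
                    pass
            params[key] = value
        calls.append({"function": function_name, "parameters": params})
    return calls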
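
For building test fixtures or few-shot examples, the opposite direction can be useful. The following is a minimal sketch that renders a call in the block layout described under Native Tool-Call Format; format_tool_call is a hypothetical helper that is not part of this repository, and rendering non-container values as plain text is an assumption:

```python
import json

def format_tool_call(function, parameters):
    """Render one call as a native tool-call block (hypothetical helper)."""
    lines = ["<tool_call>", f"<function={function}>"]
    for key, value in parameters.items():
        # Arrays and objects are serialized as JSON; other values as plain text.
        if isinstance(value, (list, dict)):
            value = json.dumps(value, ensure_ascii=False)
        lines.append(f"<parameter={key}>{value}</parameter>")
    lines += ["</function>", "</tool_call>"]
    return "\n".join(lines)

print(format_tool_call("glob", {"pattern": "**/*.py"}))
```

The printed block for the glob call has the same shape as the example output above, so the helper can be used to generate inputs for round-trip tests of a parser.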

Model Details

Property	Value
Base model	`TaimoorSiddiqui/Hopcoder-Mini-9B`
Architecture	Qwen3.5ForConditionalGeneration (multimodal)
Model loader	`AutoModelForImageTextToText`
Precision	BF16
PEFT type	LoRA
LoRA rank (r)	16
LoRA alpha	32
LoRA dropout	0.05
Target modules	`q_proj, k_proj, v_proj, o_proj, in_proj_qkv, in_proj_z, in_proj_b, out_proj` (excludes vision tower)
Trainable parameters	0.12% of total (LoRA only)
Max sequence length	1024 tokens

Training Details

Hardware

Property	Value
GPU	NVIDIA H200 (Modal cloud)
CPU	16 physical cores
RAM	64 GiB
Training time	~45 min (453 steps)

Training data

Dataset	Source	Samples
xLAM function-calling	Salesforce/xlam-function-calling-60k	3,500
Hermes function-calling	NousResearch/hermes-function-calling-v1 (config: `func_calling_singleturn`)	1,900
Targeted CLI examples	8 tools x 240 examples x 2 repeats	3,840
Total training examples		5,400 (xLAM + Hermes) plus the targeted CLI examples

Targeted CLI tools (8)

These are the real CLI agent tools the adapter was specifically trained on:

ask_user_question — Show interactive questions in the CLI
todo_write — Create or update a structured task list
read_file — Read a UTF-8 text file
search_code — Search source files for a text or regex pattern
glob — Find files by glob pattern
grep_search — Search file contents for a regex pattern
edit — Replace text in a file with new content
run_shell_command — Execute a shell command and return output
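
Once a call has been parsed, an agent loop has to route it to an implementation. The sketch below shows one hypothetical way to do that for two of the tools. The handler bodies, the TOOL_HANDLERS registry, the `path` parameter name for read_file, and the {"function", "parameters"} call shape are all assumptions for illustration, not the author's runtime:

```python
from pathlib import Path

def glob_tool(pattern, path="."):
    """Hypothetical handler: sorted paths under `path` that match `pattern`."""
    return sorted(str(p) for p in Path(path).glob(pattern))

def read_file_tool(path):
    """Hypothetical handler: read a UTF-8 text file."""
    return Path(path).read_text(encoding="utf-8")

# Registry mapping tool names to handlers (only two tools shown).
TOOL_HANDLERS = {"glob": glob_tool, "read_file": read_file_tool}

def dispatch(call):
    """Route one parsed call to its handler; return a result or an error string."""
    handler = TOOL_HANDLERS.get(call["function"])
    if handler is None:
        return f"error: unknown tool {call['function']!r}"
    try:
        return handler(**call["parameters"])
    except TypeError as exc:
        # Typically raised when a required parameter is missing or unexpected.
        return f"error: bad parameters for {call['function']}: {exc}"
```

Returning an error string instead of raising keeps the loop alive, so the message can be fed back to the model as a tool result.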

Training hyperparameters

Parameter	Value
Learning rate	1e-4
Epochs	1.0
Train batch size	4
Eval batch size	4
Gradient accumulation	4
Effective batch size	16
LR scheduler	Cosine
Warmup ratio	0.05
Weight decay	0.01
Max grad norm	1.0
Optimizer	AdamW (fused)
Precision	BF16 + TF32
Gradient checkpointing	Disabled
Dataloader workers	16
Seed	42
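
To make the schedule concrete, the sketch below evaluates linear warmup followed by cosine decay for the values in the table (peak learning rate 1e-4, warmup ratio 0.05) over the 453 steps reported for this run. It assumes the warmup length is the ceiling of ratio times total steps and a single half-cosine down to zero; the actual trainer may round or schedule differently, and lr_at is an illustrative name:

```python
import math

PEAK_LR = 1e-4
TOTAL_STEPS = 453
WARMUP_STEPS = math.ceil(0.05 * TOTAL_STEPS)  # 23 under the ceiling assumption

def lr_at(step):
    """Learning rate at optimizer step `step` under warmup plus cosine decay."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 to the peak learning rate.
        return PEAK_LR * step / WARMUP_STEPS
    # Fraction of the decay phase completed, in [0, 1].
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

for step in (0, WARMUP_STEPS, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(step, f"{lr_at(step):.2e}")
```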

Training metrics

Metric	Value
Total steps	453
Train loss	0.0034
Eval loss	0.0237

Tool prompt format

The system prompt uses compact tool signatures instead of verbose JSON schemas:

<available_tools>
<tool name="glob" args="pattern:path, path?">Find files by glob pattern (e.g., **/*.py).</tool>
<tool name="grep_search" args="pattern:path, path?, glob?">Search file contents for a regex pattern.</tool>
</available_tools>

Required parameters have no suffix; optional parameters are marked with ?
Type annotations are compact (e.g., array[object{label,description}])
Descriptions are truncated to 120 characters (tools) / 60 characters (parameters)
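
A block like the one above can be generated from structured tool specs. This is a minimal sketch under stated assumptions: render_tools and the spec layout (a list of name, type, required tuples) are hypothetical, truncation is a plain slice with no ellipsis, and parameter descriptions are left out because the example signature does not show where they go:

```python
def render_tools(tools, max_description=120):
    """Render tool specs as a compact <available_tools> block (hypothetical helper).

    Each spec is a dict with "name", "description", and "params", where
    "params" is a list of (name, type_or_None, required) tuples.
    """
    lines = ["<available_tools>"]
    for tool in tools:
        args = []
        for name, type_name, required in tool["params"]:
            arg = f"{name}:{type_name}" if type_name else name
            # Optional parameters get a trailing "?"; required ones have no suffix.
            args.append(arg if required else f"{arg}?")
        args_text = ", ".join(args)
        description = tool["description"][:max_description]
        lines.append(f'<tool name="{tool["name"]}" args="{args_text}">{description}</tool>')
    lines.append("</available_tools>")
    return "\n".join(lines)

glob_spec = {
    "name": "glob",
    "description": "Find files by glob pattern (e.g., **/*.py).",
    "params": [("pattern", "path", True), ("path", None, False)],
}
print(render_tools([glob_spec]))
```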

How to Use

Installation

pip install torch transformers peft accelerate

Quick start

import torch
from transformers import AutoModelForImageTextToText, AutoProcessor
from peft import PeftModel

MODEL_ID = "TaimoorSiddiqui/Hopcoder-Mini-9B"
ADAPTER_ID = "TaimoorSiddiqui/Hopcoder-Mini-9B-Native-ToolCall-LoRA-H200"

processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
tokenizer = processor.tokenizer
if tokenizer.pad_token_id is None:
    tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForImageTextToText.from_pretrained(
    MODEL_ID,
    trust_remote_code=True,
    dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, ADAPTER_ID)
model.eval()

Generating a tool call

SYSTEM_PROMPT = (
    "Use the provided tools whenever the request requires one.\n\n"
    "For a tool request, emit only complete native tool-call blocks. "
    "Never emit a function name as a top-level tag. Never leave unmatched "
    "parameter, function, or tool_call tags. Arrays and objects inside "
    "parameter blocks must be valid JSON. Do not use Markdown fences.\n\n"
)

TOOLS_XML = (
    "<available_tools>\n"
    "<tool name=\"glob\" args=\"pattern:path, path?\">"
    "Find files by glob pattern (e.g., **/*.py).</tool>\n"
    "</available_tools>"
)

messages = [
    {"role": "system", "content": SYSTEM_PROMPT + TOOLS_XML},
    {"role": "user", "content": "Find all Python files in the project."},
]

prompt = processor.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,
)

inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)

with torch.inference_mode():
    outputs = model.generate(
        **inputs,
        max_new_tokens=384,
        do_sample=False,
        repetition_penalty=1.05,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.pad_token_id,
    )

generated = outputs[0, inputs["input_ids"].shape[1]:]
print(tokenizer.decode(generated, skip_special_tokens=True).strip())

Generation parameters

Parameter	Value
`max_new_tokens`	384
`do_sample`	False (greedy)
`repetition_penalty`	1.05
`enable_thinking`	False

Benchmark

The benchmark evaluates the adapter on 40 cases:

30 targeted cases — 5 per CLI tool x 6 tools (ask_user_question, todo_write, glob, grep_search, edit, run_shell_command)
10 general cases — xLAM-style queries with 10 different tools (get_weather, search_flights, calculate_mortgage, send_email, book_restaurant, get_stock_price, create_event, translate_text, get_directions, set_reminder)

Validation criteria

Each generated tool call is validated on:

Complete tool-call block — must contain at least one valid block
No extra prose — no text outside tool-call blocks
No markdown fences — no code blocks
Balanced tags — matching counts of opening/closing tags for tool_call, function, and parameter
Correct function name — the called function matches the expected one
Valid JSON — array/object parameter values must be valid JSON
Required parameters — all required parameters must be present
No top-level function tags — function name must not appear as a standalone XML tag
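
The criteria above can be approximated with the standard library. This is a sketch of the checks as described, not the logic in hopcoder_benchmark.py; validate, its arguments, and the error strings are hypothetical, and the caller is assumed to know which parameters are required and which hold arrays or objects:

```python
import json
import re

BLOCK_RE = re.compile(
    r"<tool_call>\s*<function=([A-Za-z_][A-Za-z0-9_]*)>(.*?)</function>\s*</tool_call>",
    re.DOTALL,
)
PARAM_RE = re.compile(r"<parameter=([A-Za-z_][A-Za-z0-9_]*)>(.*?)</parameter>", re.DOTALL)

def validate(text, expected_function, required=(), json_params=()):
    """Return a list of failure strings; an empty list means the output passed."""
    errors = []
    blocks = BLOCK_RE.findall(text)
    if not blocks:
        errors.append("no complete tool-call block")
    if BLOCK_RE.sub("", text).strip():
        errors.append("text outside tool-call blocks")
    if "```" in text:
        errors.append("markdown fence present")
    # Opening tags look like <tool_call>, <function=...>, <parameter=...>.
    for tag in ("tool_call", "function", "parameter"):
        opens = len(re.findall(rf"<{tag}[=>]", text))
        closes = text.count(f"</{tag}>")
        if opens != closes:
            errors.append(f"unbalanced <{tag}> tags")
    # A function name must not be used directly as a tag, e.g. <glob>.
    if re.search(rf"<{re.escape(expected_function)}[\s>/]", text):
        errors.append("function name used as a top-level tag")
    for name, body in blocks:
        params = dict(PARAM_RE.findall(body))
        if name != expected_function:
            errors.append(f"expected {expected_function}, got {name}")
        for key in required:
            if key not in params:
                errors.append(f"missing required parameter {key}")
        for key in json_params:
            if key in params:
                try:
                    json.loads(params[key].strip())
                except json.JSONDecodeError:
                    errors.append(f"parameter {key} is not valid JSON")
    return errors
```

For instance, a create_event block that omits title fails only the required-parameter check when called as validate(text, "create_event", required=("title",)), which mirrors case 37 in the failure analysis.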

Running the benchmark

The benchmark runs on Modal with an H200 GPU:

python -m modal run --detach hopcoder_benchmark.py

The benchmark script (hopcoder_benchmark.py) is included in this repository.

Limitations

Function confusion — The model occasionally confuses similar tools (e.g., search_code vs grep_search, read_file vs edit)
Missing parameters — Rare cases of omitting required parameters for complex tools
Single-turn only — The adapter was trained on single-turn examples; multi-turn conversations may require additional fine-tuning
CLI-focused — The 8 targeted tools are CLI agent tools; the adapter has not been tested with real-world API tools
Compact schema format — The system prompt uses a compact XML-like tool signature format, not standard JSON schemas. This may not be compatible with all tool-calling frameworks

Training Infrastructure

Component	Value
Platform	Modal
GPU	NVIDIA H200
Image	Debian slim (Python 3.12)
Key libraries	torch 2.10.0, transformers 5.12.1, peft 0.19.1, datasets 5.0.0, accelerate 1.14.0
HF cache	Modal volume (`hopcoder-hf-cache`)
Training output	Modal volume (`hopcoder-training`)

Files

File	Description
`adapter_config.json`	LoRA configuration (r=16, alpha=32, dropout=0.05)
`adapter_model.safetensors`	Trained LoRA weights
`chat_template.jinja`	Chat template for the base model
`processor_config.json`	Processor configuration
`tokenizer.json`	Tokenizer data
`tokenizer_config.json`	Tokenizer configuration
`hopcoder_benchmark.py`	Benchmark script (40 cases, 6 targeted + 10 general tools)

Citation

@misc{hopcoder-mini-9b-native-toolcall-lora-h200,
  author = {Taimoor Siddiqui},
  title = {HopCoder-Mini-9B Native Tool-Call LoRA Adapter (H200)},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/TaimoorSiddiqui/Hopcoder-Mini-9B-Native-ToolCall-LoRA-H200}
}

License

Apache 2.0

Model tree for TaimoorSiddiqui/Hopcoder-Mini-9B-Native-ToolCall-LoRA-H200

Base model: Qwen/Qwen3.5-9B-Base
Finetuned: Qwen/Qwen3.5-9B
Finetuned: empero-ai/Qwythos-9B-Claude-Mythos-5-1M
Finetuned: TaimoorSiddiqui/Hopcoder-Mini-9B
Adapter: this model