HopCoder-Mini-9B Native Tool-Call LoRA (H200)

Overview

A LoRA fine-tune of HopCoder-Mini-9B (a Qwen3.5 multimodal model) that teaches the model to emit native tool-call blocks in a compact XML-like format instead of JSON-based function-calling schemas. Trained on a Modal H200 GPU in BF16 precision using a blend of xLAM and Hermes function-calling data plus 1,920 targeted CLI-tool examples.

Benchmark Results

| Metric | Score |
| --- | --- |
| Overall accuracy | 92.5% (37/40) |
| Targeted CLI tools | 93.3% (28/30) |
| General tool-call (xLAM-style) | 90.0% (9/10) |
| Average latency | 2.73 s/case |
| Total generation time | 109.04 s |

Per-tool breakdown

| Tool | Accuracy | Avg latency |
| --- | --- | --- |
| ask_user_question | 5/5 (100%) | 4.25 s |
| todo_write | 5/5 (100%) | 4.19 s |
| glob | 5/5 (100%) | 1.59 s |
| run_shell_command | 5/5 (100%) | 2.08 s |
| grep_search | 4/5 (80%) | 1.95 s |
| edit | 4/5 (80%) | 2.65 s |
| get_weather | 1/1 (100%) | 1.34 s |
| search_flights | 1/1 (100%) | 2.45 s |
| calculate_mortgage | 1/1 (100%) | 3.02 s |
| send_email | 1/1 (100%) | 2.74 s |
| book_restaurant | 1/1 (100%) | 3.59 s |
| get_stock_price | 1/1 (100%) | 1.39 s |
| create_event | 0/1 (0%) | 3.30 s |
| translate_text | 1/1 (100%) | 2.65 s |
| get_directions | 1/1 (100%) | 2.90 s |
| set_reminder | 1/1 (100%) | 2.10 s |
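The headline numbers can be recomputed from the per-tool rows. The sketch below is only a consistency check: the rows are copied from the breakdown above, and the `TARGETED` set and the `summarize` helper are names introduced here for illustration, not part of the benchmark script.

```python
# (tool, correct, total, avg_latency_s), copied from the per-tool breakdown.
RESULTS = [
    ("ask_user_question", 5, 5, 4.25),
    ("todo_write", 5, 5, 4.19),
    ("glob", 5, 5, 1.59),
    ("run_shell_command", 5, 5, 2.08),
    ("grep_search", 4, 5, 1.95),
    ("edit", 4, 5, 2.65),
    ("get_weather", 1, 1, 1.34),
    ("search_flights", 1, 1, 2.45),
    ("calculate_mortgage", 1, 1, 3.02),
    ("send_email", 1, 1, 2.74),
    ("book_restaurant", 1, 1, 3.59),
    ("get_stock_price", 1, 1, 1.39),
    ("create_event", 0, 1, 3.30),
    ("translate_text", 1, 1, 2.65),
    ("get_directions", 1, 1, 2.90),
    ("set_reminder", 1, 1, 2.10),
]

# The six CLI tools that have five benchmark cases each.
TARGETED = {
    "ask_user_question", "todo_write", "glob",
    "run_shell_command", "grep_search", "edit",
}


def summarize(rows):
    """Aggregate accuracy and case-weighted latency over a set of rows."""
    correct = sum(c for _, c, _, _ in rows)
    total = sum(n for _, _, n, _ in rows)
    time_s = sum(n * latency for _, _, n, latency in rows)
    return {
        "correct": correct,
        "total": total,
        "accuracy_pct": round(100 * correct / total, 1),
        "total_time_s": round(time_s, 2),
        "avg_latency_s": round(time_s / total, 2),
    }


overall = summarize(RESULTS)
targeted = summarize([r for r in RESULTS if r[0] in TARGETED])
general = summarize([r for r in RESULTS if r[0] not in TARGETED])
print(overall, targeted, general, sep="\n")
```

Because the per-tool latencies are already rounded to two decimals, the recomputed total can differ from the reported 109.04 s in the last digit.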

Failure analysis (3 cases)

| Case | Expected | Got | Error |
| --- | --- | --- | --- |
| 16 | grep_search | search_code | Wrong function selected |
| 24 | edit | read_file | Wrong function selected |
| 37 | create_event | create_event | Missing title parameter |

Native Tool-Call Format

The adapter emits tool calls in a compact XML-like format using special tokens:

<tool_call>
<function=FUNCTION_NAME>
<parameter=KEY>VALUE</parameter>
<parameter=KEY>VALUE</parameter>
</function>
</tool_call>

Key characteristics:

  • No JSON wrapping — parameters are individual XML-like tags, not a JSON object
  • No markdown fences — output is pure tool-call blocks
  • Balanced tags — every opening tag has a matching closing tag
  • Arrays and objects — JSON is used only for complex parameter values (arrays/objects)
  • Multiple calls — the model can emit multiple blocks in sequence

Example output

<tool_call>
<function=glob>
<parameter=pattern>**/*.py</parameter>
</function>
</tool_call>
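For building test fixtures or training targets, the format can also be produced from a plain dictionary. The following is a minimal sketch, not the author's data pipeline: `format_tool_call` is a hypothetical helper, and rendering scalars with `str()` is an assumption, since the card only specifies JSON for arrays and objects.

```python
import json


def format_tool_call(function_name, parameters):
    """Serialize one call into the native block format described above."""
    lines = ["<tool_call>", f"<function={function_name}>"]
    for key, value in parameters.items():
        if isinstance(value, (list, dict)):
            # Arrays and objects are the only values written as JSON.
            rendered = json.dumps(value, ensure_ascii=False)
        else:
            # Assumption: scalar values are written as their plain text form.
            rendered = str(value)
        lines.append(f"<parameter={key}>{rendered}</parameter>")
    lines += ["</function>", "</tool_call>"]
    return "\n".join(lines)


print(format_tool_call("glob", {"pattern": "**/*.py"}))
```

Called with the `glob` arguments shown, this reproduces the example output block line for line.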

Parsing the output

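import json
import re

# Matches one complete <tool_call> block; group 1 is the function name,
# group 2 is the raw parameter body.
FUNCTION_RE = re.compile(
    r"<tool_call>\s*<function=([A-Za-z_][A-Za-z0-9_]*)>\s*"
    r"(.*?)\s*</function>\s*</tool_call>",
    flags=re.DOTALL,
)
# Matches one <parameter=KEY>VALUE</parameter> pair inside a function body.
PARAMETER_RE = re.compile(
    r"<parameter=([A-Za-z_][A-Za-z0-9_]*)>\s*"
    r"(.*?)\s*</parameter>",
    flags=re.DOTALL,
)


def parse_tool_calls(text):
    """Return a list of {"function": name, "parameters": dict}, one per block."""
    calls = []
    for match in FUNCTION_RE.finditer(text):
        function_name, body = match.group(1), match.group(2)
        params = {}
        for key, value in PARAMETER_RE.findall(body):
            value = value.strip()
            # Arrays and objects are JSON-encoded; everything else is a string.
            if value.startswith(("[", "{")):
                try:
                    params[key] = json.loads(value)
                except json.JSONDecodeError:
                    # Plain strings can also start with "[" or "{" (for example
                    # the glob [a-z]*.py), so keep the raw text instead of failing.
                    params[key] = value
            else:
                params[key] = value
        calls.append({"function": function_name, "parameters": params})
    return calls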
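Once calls are parsed into `{"function": ..., "parameters": ...}` dictionaries, they still have to be routed to real implementations. The sketch below shows one possible dispatcher; the `dispatch` function, the registry, and the `create_event` handler are hypothetical and only illustrate the shape of the data.

```python
def dispatch(calls, registry):
    """Run each parsed call against a name -> callable registry."""
    results = []
    for call in calls:
        name = call["function"]
        handler = registry.get(name)
        if handler is None:
            results.append({"function": name, "ok": False,
                            "error": f"unknown tool: {name}"})
            continue
        try:
            output = handler(**call["parameters"])
        except TypeError as exc:
            # Raised for missing or unexpected arguments; note that a
            # TypeError from inside the handler is reported the same way.
            results.append({"function": name, "ok": False, "error": str(exc)})
        else:
            results.append({"function": name, "ok": True, "output": output})
    return results


# Hypothetical handler used only for this example.
registry = {"create_event": lambda title, date: f"{title} on {date}"}

calls = [
    {"function": "create_event", "parameters": {"title": "Demo", "date": "2026-01-01"}},
    {"function": "create_event", "parameters": {"date": "2026-01-01"}},  # no title
    {"function": "delete_event", "parameters": {}},  # not registered
]
results = dispatch(calls, registry)
for result in results:
    print(result)
```

Reporting errors per call, rather than raising, keeps a missing required parameter or an unregistered function from aborting the remaining calls in a multi-call response.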

Model Details

| Property | Value |
| --- | --- |
| Base model | TaimoorSiddiqui/Hopcoder-Mini-9B |
| Architecture | Qwen3.5ForConditionalGeneration (multimodal) |
| Model loader | AutoModelForImageTextToText |
| Precision | BF16 |
| PEFT type | LoRA |
| LoRA rank (r) | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0.05 |
| Target modules | q_proj, k_proj, v_proj, o_proj, in_proj_qkv, in_proj_z, in_proj_b, out_proj (excludes vision tower) |
| Trainable parameters | 0.12% of total (LoRA only) |
| Max sequence length | 1024 tokens |
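To make the rank and alpha values concrete, the sketch below works through the standard LoRA arithmetic: the update is scaled by alpha / r, and each adapted linear layer adds r × (d_in + d_out) trainable weights. It assumes plain LoRA scaling (not rank-stabilized), and the 4096 × 4096 layer shape is a hypothetical example rather than a dimension taken from the base model.

```python
def lora_scaling(alpha, r):
    """Multiplier applied to the low-rank update B @ A in standard LoRA."""
    return alpha / r


def lora_params_per_linear(d_in, d_out, r):
    """Trainable weights added to one linear layer: A is r x d_in, B is d_out x r."""
    return r * (d_in + d_out)


R, ALPHA = 16, 32          # values from the table above
D_IN = D_OUT = 4096        # hypothetical layer shape, for illustration only

added = lora_params_per_linear(D_IN, D_OUT, R)
frozen = D_IN * D_OUT
print("scaling:", lora_scaling(ALPHA, R))
print("added weights:", added)
print("share of this layer:", f"{100 * added / frozen:.2f}%")
```

The share for a single adapted layer is higher than the 0.12% reported for the whole model, because most of the model's weights, including the vision tower, carry no adapter.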

Training Details

Hardware

| Property | Value |
| --- | --- |
| GPU | NVIDIA H200 (Modal cloud) |
| CPU | 16 physical cores |
| RAM | 64 GiB |
| Training time | ~45 min (453 steps) |

Training data

| Dataset | Source | Samples |
| --- | --- | --- |
| xLAM function-calling | Salesforce/xlam-function-calling-60k | 3,500 |
| Hermes function-calling | NousResearch/hermes-function-calling-v1 (config: func_calling_singleturn) | 1,900 |
| Targeted CLI examples | 8 tools x 240 examples (1,920 unique) x 2 repeats | 3,840 |
| Total training examples | | ~5,400 general (xLAM + Hermes), plus the targeted CLI examples |

Targeted CLI tools (8)

These are the real CLI agent tools the adapter was specifically trained on:

  1. ask_user_question — Show interactive questions in the CLI
  2. todo_write — Create or update a structured task list
  3. read_file — Read a UTF-8 text file
  4. search_code — Search source files for a text or regex pattern
  5. glob — Find files by glob pattern
  6. grep_search — Search file contents for a regex pattern
  7. edit — Replace text in a file with new content
  8. run_shell_command — Execute a shell command and return output

Training hyperparameters

| Parameter | Value |
| --- | --- |
| Learning rate | 1e-4 |
| Epochs | 1.0 |
| Train batch size | 4 |
| Eval batch size | 4 |
| Gradient accumulation | 4 |
| Effective batch size | 16 |
| LR scheduler | Cosine |
| Warmup ratio | 0.05 |
| Weight decay | 0.01 |
| Max grad norm | 1.0 |
| Optimizer | AdamW (fused) |
| Precision | BF16 + TF32 |
| Gradient checkpointing | Disabled |
| Dataloader workers | 16 |
| Seed | 42 |

Training metrics

| Metric | Value |
| --- | --- |
| Total steps | 453 |
| Train loss | 0.0034 |
| Eval loss | 0.0237 |
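The learning-rate settings above imply a schedule that can be sketched in a few lines. This is an illustration of a linear-warmup, cosine-decay schedule under stated assumptions, not the training script: it assumes the warmup length is the ceiling of warmup ratio × total steps, that the cosine decays to zero at the last step, and that training ran on a single GPU so the effective batch size is batch size × gradient accumulation.

```python
import math

BASE_LR = 1e-4
TOTAL_STEPS = 453
WARMUP_RATIO = 0.05
# Assumption: warmup length is rounded up to a whole step.
WARMUP_STEPS = math.ceil(WARMUP_RATIO * TOTAL_STEPS)

# Assumption: one GPU, so no data-parallel multiplier.
EFFECTIVE_BATCH = 4 * 4  # train batch size x gradient accumulation


def lr_at(step):
    """Learning rate at an optimizer step: linear warmup, then cosine decay."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, WARMUP_STEPS, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(step, f"{lr_at(step):.2e}")
```

Under these assumptions the peak rate of 1e-4 is reached after 23 warmup steps and falls to half that value midway through the decay phase.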

Tool prompt format

The system prompt uses compact tool signatures instead of verbose JSON schemas:

<available_tools>
<tool name="glob" args="pattern:path, path?">Find files by glob pattern (e.g., **/*.py).</tool>
<tool name="grep_search" args="pattern:path, path?, glob?">Search file contents for a regex pattern.</tool>
</available_tools>

  • Required parameters have no suffix; optional parameters are marked with ?
  • Type annotations are compact (e.g., array[object{label,description}])
  • Descriptions are truncated to 120 characters (tools) / 60 characters (parameters)
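The rules above can be turned into a small renderer for the `<available_tools>` block. This is a sketch under assumptions, not the generator used for training: `render_tool` and `render_available_tools` are hypothetical names, each argument is given as a `(name, type_or_None, required)` tuple, the `?` marker is placed after the name and type, and truncation is a plain cut at 120 characters without an ellipsis.

```python
from xml.sax.saxutils import escape, quoteattr


def render_tool(name, description, args):
    """Render one compact <tool> signature line."""
    parts = []
    for arg_name, arg_type, required in args:
        part = arg_name if arg_type is None else f"{arg_name}:{arg_type}"
        # Assumption: optional arguments get a trailing "?".
        parts.append(part if required else part + "?")
    short = description[:120]  # tool descriptions are capped at 120 characters
    return (
        f"<tool name={quoteattr(name)} args={quoteattr(', '.join(parts))}>"
        f"{escape(short)}</tool>"
    )


def render_available_tools(tools):
    """Wrap rendered tool lines in the <available_tools> container."""
    body = "\n".join(render_tool(**tool) for tool in tools)
    return f"<available_tools>\n{body}\n</available_tools>"


tools = [
    {
        "name": "glob",
        "description": "Find files by glob pattern (e.g., **/*.py).",
        "args": [("pattern", "path", True), ("path", None, False)],
    },
]
print(render_available_tools(tools))
```

For the `glob` entry this yields the same signature line as the example above; for other tools, the prompt should be compared with the examples in this card before relying on it, since the adapter was trained on one specific rendering.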

How to Use

Installation

pip install torch transformers peft accelerate

Quick start

import torch
from transformers import AutoModelForImageTextToText, AutoProcessor
from peft import PeftModel

MODEL_ID = "TaimoorSiddiqui/Hopcoder-Mini-9B"
ADAPTER_ID = "TaimoorSiddiqui/Hopcoder-Mini-9B-Native-ToolCall-LoRA-H200"

processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
tokenizer = processor.tokenizer
if tokenizer.pad_token_id is None:
    tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForImageTextToText.from_pretrained(
    MODEL_ID,
    trust_remote_code=True,
    dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, ADAPTER_ID)
model.eval()

Generating a tool call

SYSTEM_PROMPT = (
    "Use the provided tools whenever the request requires one.\n\n"
    "For a tool request, emit only complete native tool-call blocks. "
    "Never emit a function name as a top-level tag. Never leave unmatched "
    "parameter, function, or tool_call tags. Arrays and objects inside "
    "parameter blocks must be valid JSON. Do not use Markdown fences.\n\n"
)

TOOLS_XML = (
    "<available_tools>\n"
    "<tool name=\"glob\" args=\"pattern:path, path?\">"
    "Find files by glob pattern (e.g., **/*.py).</tool>\n"
    "</available_tools>"
)

messages = [
    {"role": "system", "content": SYSTEM_PROMPT + TOOLS_XML},
    {"role": "user", "content": "Find all Python files in the project."},
]

prompt = processor.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,
)

inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)

with torch.inference_mode():
    outputs = model.generate(
        **inputs,
        max_new_tokens=384,
        do_sample=False,
        repetition_penalty=1.05,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.pad_token_id,
    )

generated = outputs[0, inputs["input_ids"].shape[1]:]
print(tokenizer.decode(generated, skip_special_tokens=True).strip())

Generation parameters

| Parameter | Value |
| --- | --- |
| max_new_tokens | 384 |
| do_sample | False (greedy) |
| repetition_penalty | 1.05 |
| enable_thinking | False |

Benchmark

The benchmark evaluates the adapter on 40 cases:

  • 30 targeted cases — 5 per CLI tool x 6 tools (ask_user_question, todo_write, glob, grep_search, edit, run_shell_command)
  • 10 general cases — xLAM-style queries with 10 different tools (get_weather, search_flights, calculate_mortgage, send_email, book_restaurant, get_stock_price, create_event, translate_text, get_directions, set_reminder)

Validation criteria

Each generated tool call is validated on:

  1. Complete tool-call block — must contain at least one valid block
  2. No extra prose — no text outside tool-call blocks
  3. No markdown fences — no code blocks
  4. Balanced tags — matching counts of opening/closing tags for tool_call, function, and parameter
  5. Correct function name — the called function matches the expected one
  6. Valid JSON — array/object parameter values must be valid JSON
  7. Required parameters — all required parameters must be present
  8. No top-level function tags — function name must not appear as a standalone XML tag
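The structural checks in this list (criteria 1 to 4) can be approximated without any knowledge of the expected tool. The sketch below is an independent illustration, not an excerpt from hopcoder_benchmark.py; `validate_structure` and its error strings are hypothetical, and criteria 5 to 8 are left out because they need the expected function name and parameter schema.

```python
import re

BLOCK_RE = re.compile(
    r"<tool_call>\s*<function=[A-Za-z_][A-Za-z0-9_]*>.*?</function>\s*</tool_call>",
    flags=re.DOTALL,
)
FENCE = "`" * 3  # Markdown code-fence marker


def validate_structure(text):
    """Return a list of structural problems; an empty list means the output passed."""
    errors = []
    if not BLOCK_RE.search(text):
        errors.append("no complete tool-call block")
    if BLOCK_RE.sub("", text).strip():
        errors.append("text outside tool-call blocks")
    if FENCE in text:
        errors.append("markdown fence present")
    for tag in ("tool_call", "function", "parameter"):
        opens = len(re.findall(rf"<{tag}[=>]", text))
        closes = text.count(f"</{tag}>")
        if opens != closes:
            errors.append(f"unbalanced <{tag}> tags")
    return errors


good = "<tool_call>\n<function=glob>\n<parameter=pattern>**/*.py</parameter>\n</function>\n</tool_call>"
bad = "Sure!\n<tool_call>\n<function=glob>\n<parameter=pattern>**/*.py\n</function>\n</tool_call>"
print(validate_structure(good))
print(validate_structure(bad))
```

Returning every problem found, instead of stopping at the first one, makes it easier to see which criteria a failing case violates.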

Running the benchmark

The benchmark runs on Modal with an H200 GPU:

python -m modal run --detach hopcoder_benchmark.py

The benchmark script (hopcoder_benchmark.py) is included in this repository.


Limitations

  1. Function confusion — The model occasionally confuses similar tools (e.g., search_code vs grep_search, read_file vs edit)
  2. Missing parameters — Rare cases of omitting required parameters for complex tools
  3. Single-turn only — The adapter was trained on single-turn examples; multi-turn conversations may require additional fine-tuning
  4. CLI-focused — The 8 targeted tools are CLI agent tools; the adapter has not been tested with real-world API tools
  5. Compact schema format — The system prompt uses a compact XML-like tool signature format, not standard JSON schemas. This may not be compatible with all tool-calling frameworks

Training Infrastructure

| Component | Value |
| --- | --- |
| Platform | Modal |
| GPU | NVIDIA H200 |
| Image | Debian slim (Python 3.12) |
| Key libraries | torch 2.10.0, transformers 5.12.1, peft 0.19.1, datasets 5.0.0, accelerate 1.14.0 |
| HF cache | Modal volume (hopcoder-hf-cache) |
| Training output | Modal volume (hopcoder-training) |

Files

| File | Description |
| --- | --- |
| adapter_config.json | LoRA configuration (r=16, alpha=32, dropout=0.05) |
| adapter_model.safetensors | Trained LoRA weights |
| chat_template.jinja | Chat template for the base model |
| processor_config.json | Processor configuration |
| tokenizer.json | Tokenizer data |
| tokenizer_config.json | Tokenizer configuration |
| hopcoder_benchmark.py | Benchmark script (40 cases, 6 targeted + 10 general tools) |

Citation

@misc{hopcoder-mini-9b-native-toolcall-lora-h200,
  author = {Taimoor Siddiqui},
  title = {HopCoder-Mini-9B Native Tool-Call LoRA Adapter (H200)},
  year = {2026},
  publisher = {HuggingFace},
  url = {https://huggingface.co/TaimoorSiddiqui/Hopcoder-Mini-9B-Native-ToolCall-LoRA-H200}
}

License

Apache 2.0
