HopCoder-Mini-9B Native Tool-Call LoRA (H200)
Overview
A LoRA fine-tune of HopCoder-Mini-9B (a Qwen3.5 multimodal model) that teaches the model to emit native tool-call blocks in a compact XML-like format instead of JSON-based function-calling schemas. Trained on a Modal H200 GPU in BF16 precision using a blend of xLAM and Hermes function-calling data plus 1,920 targeted CLI-tool examples.
Benchmark Results
| Metric | Score |
|---|---|
| Overall accuracy | 92.5% (37/40) |
| Targeted CLI tools | 93.3% (28/30) |
| General tool-call (xLAM-style) | 90.0% (9/10) |
| Average latency | 2.73 s/case |
| Total generation time | 109.04 s |
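The headline figures can be re-derived from the raw counts in the table. The short check below uses only the numbers reported above; the variable names are illustrative.

```python
# Raw pass counts and total generation time, copied from the results table.
results = {
    "overall": (37, 40),
    "targeted_cli": (28, 30),
    "general": (9, 10),
}
total_generation_s = 109.04


def accuracy_pct(passed: int, total: int) -> float:
    """Return accuracy as a percentage rounded to one decimal place."""
    return round(100 * passed / total, 1)


accuracies = {name: accuracy_pct(p, t) for name, (p, t) in results.items()}

# Average latency is total generation time divided by the number of cases.
avg_latency_s = round(total_generation_s / results["overall"][1], 2)

print(accuracies)
print(avg_latency_s)
```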
Per-tool breakdown
| Tool | Accuracy | Avg latency |
|---|---|---|
| ask_user_question | 5/5 (100%) | 4.25 s |
| todo_write | 5/5 (100%) | 4.19 s |
| glob | 5/5 (100%) | 1.59 s |
| run_shell_command | 5/5 (100%) | 2.08 s |
| grep_search | 4/5 (80%) | 1.95 s |
| edit | 4/5 (80%) | 2.65 s |
| get_weather | 1/1 (100%) | 1.34 s |
| search_flights | 1/1 (100%) | 2.45 s |
| calculate_mortgage | 1/1 (100%) | 3.02 s |
| send_email | 1/1 (100%) | 2.74 s |
| book_restaurant | 1/1 (100%) | 3.59 s |
| get_stock_price | 1/1 (100%) | 1.39 s |
| create_event | 0/1 (0%) | 3.30 s |
| translate_text | 1/1 (100%) | 2.65 s |
| get_directions | 1/1 (100%) | 2.90 s |
| set_reminder | 1/1 (100%) | 2.10 s |
Failure analysis (3 cases)
| Case | Expected | Got | Error |
|---|---|---|---|
| 16 | grep_search | search_code | Wrong function selected |
| 24 | edit | read_file | Wrong function selected |
| 37 | create_event | create_event | Missing title parameter |
Native Tool-Call Format
The adapter emits tool calls in a compact XML-like format using special tokens:
<tool_call>
<function=FUNCTION_NAME>
<parameter=KEY>VALUE</parameter>
<parameter=KEY>VALUE</parameter>
</function>
</tool_call>
Key characteristics:
- No JSON wrapping — parameters are individual XML-like tags, not a JSON object
- No markdown fences — output is pure tool-call blocks
- Balanced tags — every opening tag has a matching closing tag
- Arrays and objects — JSON is used only for complex parameter values (arrays/objects)
- Multiple calls — the model can emit multiple blocks in sequence
Example output
<tool_call>
<function=glob>
<parameter=pattern>**/*.py</parameter>
</function>
</tool_call>
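For building test fixtures or few-shot examples, it can help to produce the same layout programmatically. The sketch below is a hypothetical helper (`render_tool_call` is not part of this repository). It follows the rules listed above and assumes that scalar values are written with `str()`, which the card does not specify.

```python
import json
from typing import Any, Mapping


def render_tool_call(function_name: str, parameters: Mapping[str, Any]) -> str:
    """Render one call in the native tool-call layout described above."""
    lines = ["<tool_call>", f"<function={function_name}>"]
    for key, value in parameters.items():
        if isinstance(value, (list, dict)):
            # Arrays and objects are the only values encoded as JSON.
            text = json.dumps(value)
        else:
            # Assumption: scalars are written as plain text via str().
            text = str(value)
        lines.append(f"<parameter={key}>{text}</parameter>")
    lines.extend(["</function>", "</tool_call>"])
    return "\n".join(lines)


print(render_tool_call("glob", {"pattern": "**/*.py"}))
```

Called with the `glob` arguments shown, this reproduces the example output block above line for line.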
Parsing the output
import json
import re

# Matches one complete <tool_call> block; captures the function name and its body.
FUNCTION_RE = re.compile(
    r"<tool_call>\s*<function=([A-Za-z_][A-Za-z0-9_]*)>\s*"
    r"(.*?)\s*</function>\s*</tool_call>",
    flags=re.DOTALL,
)
# Matches one <parameter=KEY>VALUE</parameter> pair inside a function body.
PARAMETER_RE = re.compile(
    r"<parameter=([A-Za-z_][A-Za-z0-9_]*)>\s*"
    r"(.*?)\s*</parameter>",
    flags=re.DOTALL,
)

def parse_tool_calls(text):
    """Return a list of {"function": name, "parameters": dict} entries found in text."""
    calls = []
    for match in FUNCTION_RE.finditer(text):
        function_name = match.group(1)
        body = match.group(2)
        params = {}
        for key, value in PARAMETER_RE.findall(body):
            value = value.strip()
            # Arrays and objects are JSON-encoded; everything else stays a string.
            if value.startswith(("[", "{")):
                params[key] = json.loads(value)
            else:
                params[key] = value
        calls.append({"function": function_name, "parameters": params})
    return calls
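Once calls are parsed into the `{"function": ..., "parameters": ...}` shape returned by `parse_tool_calls`, an agent loop still has to route each one to real code. The following is a minimal sketch of that step, not part of this repository. The `dispatch` function, the `registry` mapping, and the `fake_glob` handler are all hypothetical names used for illustration.

```python
from typing import Any, Callable


def dispatch(calls: list[dict], registry: dict[str, Callable[..., Any]]) -> list[dict]:
    """Run each parsed call against a name -> handler registry."""
    results = []
    for call in calls:
        name = call["function"]
        handler = registry.get(name)
        if handler is None:
            # The model named a tool the host does not provide.
            results.append({"function": name, "error": "unknown tool"})
            continue
        try:
            output = handler(**call["parameters"])
        except TypeError as exc:
            # Raised when parameters are missing or unexpected for the handler.
            results.append({"function": name, "error": str(exc)})
        else:
            results.append({"function": name, "result": output})
    return results


def fake_glob(pattern: str, path: str = ".") -> str:
    """Stand-in handler; a real one would search the filesystem."""
    return f"{path}:{pattern}"


parsed = [{"function": "glob", "parameters": {"pattern": "**/*.py"}}]
print(dispatch(parsed, {"glob": fake_glob}))
```

Catching `TypeError` at the call site gives a cheap runtime check for the missing-parameter failure mode, at the cost of also catching a `TypeError` raised inside a handler.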
Model Details
| Property | Value |
|---|---|
| Base model | TaimoorSiddiqui/Hopcoder-Mini-9B |
| Architecture | Qwen3.5ForConditionalGeneration (multimodal) |
| Model loader | AutoModelForImageTextToText |
| Precision | BF16 |
| PEFT type | LoRA |
| LoRA rank (r) | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0.05 |
| Target modules | q_proj, k_proj, v_proj, o_proj, in_proj_qkv, in_proj_z, in_proj_b, out_proj (excludes vision tower) |
| Trainable parameters | 0.12% of total (LoRA only) |
| Max sequence length | 1024 tokens |
Training Details
Hardware
| Property | Value |
|---|---|
| GPU | NVIDIA H200 (Modal cloud) |
| CPU | 16 physical cores |
| RAM | 64 GiB |
| Training time | ~45 min (453 steps) |
Training data
| Dataset | Source | Samples |
|---|---|---|
| xLAM function-calling | Salesforce/xlam-function-calling-60k | 3,500 |
| Hermes function-calling | NousResearch/hermes-function-calling-v1 (config: func_calling_singleturn) | 1,900 |
| Targeted CLI examples | 8 tools x 240 examples x 2 repeats | 3,840 |
| Total training examples | | ~5,400+ |
Targeted CLI tools (8)
These are the real CLI agent tools the adapter was specifically trained on:
- ask_user_question: Show interactive questions in the CLI
- todo_write: Create or update a structured task list
- read_file: Read a UTF-8 text file
- search_code: Search source files for a text or regex pattern
- glob: Find files by glob pattern
- grep_search: Search file contents for a regex pattern
- edit: Replace text in a file with new content
- run_shell_command: Execute a shell command and return output
Training hyperparameters
| Parameter | Value |
|---|---|
| Learning rate | 1e-4 |
| Epochs | 1.0 |
| Train batch size | 4 |
| Eval batch size | 4 |
| Gradient accumulation | 4 |
| Effective batch size | 16 |
| LR scheduler | Cosine |
| Warmup ratio | 0.05 |
| Weight decay | 0.01 |
| Max grad norm | 1.0 |
| Optimizer | AdamW (fused) |
| Precision | BF16 + TF32 |
| Gradient checkpointing | Disabled |
| Dataloader workers | 16 |
| Seed | 42 |
Training metrics
| Metric | Value |
|---|---|
| Total steps | 453 |
| Train loss | 0.0034 |
| Eval loss | 0.0237 |
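The batch and step figures in the two tables above relate through simple arithmetic. The check below assumes a single GPU and that every optimizer step consumes a full effective batch, so the sequence count is an upper bound. How the trainer rounds the warmup length is not stated here, so it is left unrounded.

```python
# Values copied from the hyperparameter and metrics tables.
train_batch_size = 4
gradient_accumulation = 4
total_steps = 453
warmup_ratio = 0.05

num_gpus = 1  # assumption: the card describes a single H200

# Sequences contributing to one optimizer update.
effective_batch = train_batch_size * gradient_accumulation * num_gpus

# Upper bound on sequences consumed during the one-epoch run.
max_sequences_seen = effective_batch * total_steps

# Unrounded warmup length implied by the warmup ratio.
approx_warmup_steps = warmup_ratio * total_steps

print(effective_batch, max_sequences_seen, approx_warmup_steps)
```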
Tool prompt format
The system prompt uses compact tool signatures instead of verbose JSON schemas:
<available_tools>
<tool name="glob" args="pattern:path, path?">Find files by glob pattern (e.g., **/*.py).</tool>
<tool name="grep_search" args="pattern:path, path?, glob?">Search file contents for a regex pattern.</tool>
</available_tools>
- Required parameters have no suffix; optional parameters are marked with ?
- Type annotations are compact (e.g., array[object{label,description}])
- Descriptions are truncated to 120 characters (tools) / 60 characters (parameters)
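A prompt builder for this signature format might look like the sketch below. It is not taken from the training code. The `Tool` and `Param` dataclasses are hypothetical, and writing an optional typed parameter as `name?:type` is an assumption, since the example above only shows `name:type` and `name?`. No attribute escaping is done.

```python
from dataclasses import dataclass, field

MAX_TOOL_DESCRIPTION = 120  # tool description limit stated above


@dataclass
class Param:
    name: str
    type: str = ""          # compact type annotation; may be empty
    required: bool = True


@dataclass
class Tool:
    name: str
    description: str
    params: list[Param] = field(default_factory=list)


def render_arg(param: Param) -> str:
    """Render one parameter; optional ones get a '?' suffix."""
    suffix = "" if param.required else "?"
    # Assumption: '?' precedes the type annotation when both are present.
    annotation = f":{param.type}" if param.type else ""
    return f"{param.name}{suffix}{annotation}"


def render_tools(tools: list[Tool]) -> str:
    """Render the <available_tools> block for the system prompt."""
    lines = ["<available_tools>"]
    for tool in tools:
        args = ", ".join(render_arg(p) for p in tool.params)
        description = tool.description[:MAX_TOOL_DESCRIPTION]
        lines.append(f'<tool name="{tool.name}" args="{args}">{description}</tool>')
    lines.append("</available_tools>")
    return "\n".join(lines)


glob_tool = Tool(
    name="glob",
    description="Find files by glob pattern (e.g., **/*.py).",
    params=[Param("pattern", "path"), Param("path", required=False)],
)
print(render_tools([glob_tool]))
```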
How to Use
Installation
pip install torch transformers peft
Quick start
import torch
from transformers import AutoModelForImageTextToText, AutoProcessor
from peft import PeftModel

MODEL_ID = "TaimoorSiddiqui/Hopcoder-Mini-9B"
ADAPTER_ID = "TaimoorSiddiqui/Hopcoder-Mini-9B-Native-ToolCall-LoRA-H200"

# The base model is multimodal, so load the processor and take its tokenizer.
processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
tokenizer = processor.tokenizer
if tokenizer.pad_token_id is None:
    tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForImageTextToText.from_pretrained(
    MODEL_ID,
    trust_remote_code=True,
    dtype=torch.bfloat16,
    device_map="auto",
)
# Attach the LoRA adapter on top of the base weights.
model = PeftModel.from_pretrained(model, ADAPTER_ID)
model.eval()
Generating a tool call
SYSTEM_PROMPT = (
    "Use the provided tools whenever the request requires one.\n\n"
    "For a tool request, emit only complete native tool-call blocks. "
    "Never emit a function name as a top-level tag. Never leave unmatched "
    "parameter, function, or tool_call tags. Arrays and objects inside "
    "parameter blocks must be valid JSON. Do not use Markdown fences.\n\n"
)
TOOLS_XML = (
    "<available_tools>\n"
    "<tool name=\"glob\" args=\"pattern:path, path?\">"
    "Find files by glob pattern (e.g., **/*.py).</tool>\n"
    "</available_tools>"
)
messages = [
    {"role": "system", "content": SYSTEM_PROMPT + TOOLS_XML},
    {"role": "user", "content": "Find all Python files in the project."},
]

# Render the chat template to text with thinking disabled.
prompt = processor.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,
)
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)

with torch.inference_mode():
    outputs = model.generate(
        **inputs,
        max_new_tokens=384,
        do_sample=False,
        repetition_penalty=1.05,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.pad_token_id,
    )

# Drop the prompt tokens and decode only the newly generated ones.
generated = outputs[0, inputs["input_ids"].shape[1]:]
print(tokenizer.decode(generated, skip_special_tokens=True).strip())
Generation parameters
| Parameter | Value |
|---|---|
| max_new_tokens | 384 |
| do_sample | False (greedy) |
| repetition_penalty | 1.05 |
| enable_thinking | False |
Benchmark
The benchmark evaluates the adapter on 40 cases:
- 30 targeted cases — 5 per CLI tool x 6 tools (ask_user_question, todo_write, glob, grep_search, edit, run_shell_command)
- 10 general cases — xLAM-style queries with 10 different tools (get_weather, search_flights, calculate_mortgage, send_email, book_restaurant, get_stock_price, create_event, translate_text, get_directions, set_reminder)
Validation criteria
Each generated tool call is validated on:
- Complete tool-call block: must contain at least one valid block
- No extra prose: no text outside tool-call blocks
- No markdown fences: no code blocks
- Balanced tags: matching counts of opening/closing tags for tool_call, function, and parameter
- Correct function name: the called function matches the expected one
- Valid JSON: array/object parameter values must be valid JSON
- Required parameters: all required parameters must be present
- No top-level function tags: function name must not appear as a standalone XML tag
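The structural criteria in this list can be approximated with a few string checks. The sketch below is not the logic in hopcoder_benchmark.py. It is a hypothetical `check_output` helper covering only the first four criteria (block present, no extra prose, no fences, balanced tags). The name, JSON, and required-parameter checks are omitted because they need the expected tool definition.

```python
import re

FENCE = "`" * 3  # Markdown code-fence marker, built here to avoid a literal fence
BLOCK_RE = re.compile(r"<tool_call>.*?</tool_call>", flags=re.DOTALL)
TAGS = ("tool_call", "function", "parameter")


def check_output(text: str) -> dict[str, bool]:
    """Run four structural checks on raw model output."""
    blocks = BLOCK_RE.findall(text)
    # Opening tags look like <tool_call>, <function=NAME>, <parameter=KEY>.
    balanced = all(
        len(re.findall(rf"<{tag}[=>]", text)) == text.count(f"</{tag}>")
        for tag in TAGS
    )
    # Whatever remains after removing complete blocks counts as extra prose.
    leftover = BLOCK_RE.sub("", text).strip()
    return {
        "has_block": bool(blocks),
        "no_extra_prose": leftover == "",
        "no_markdown_fences": FENCE not in text,
        "balanced_tags": balanced,
    }


good = (
    "<tool_call>\n<function=glob>\n"
    "<parameter=pattern>**/*.py</parameter>\n"
    "</function>\n</tool_call>"
)
print(check_output(good))
```

Counting tags is a coarse test. It will misjudge a parameter value that itself contains tag-like text, which a stricter validator would need to handle.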
Running the benchmark
The benchmark runs on Modal with an H200 GPU:
python -m modal run --detach hopcoder_benchmark.py
The benchmark script (hopcoder_benchmark.py) is included in this repository.
Limitations
- Function confusion: The model occasionally confuses similar tools (e.g., search_code vs grep_search, read_file vs edit)
- Missing parameters: Rare cases of omitting required parameters for complex tools
- Single-turn only: The adapter was trained on single-turn examples; multi-turn conversations may require additional fine-tuning
- CLI-focused: The 8 targeted tools are CLI agent tools; the adapter has not been tested with real-world API tools
- Compact schema format: The system prompt uses a compact XML-like tool signature format, not standard JSON schemas. This may not be compatible with all tool-calling frameworks
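One way to work around the compatibility limitation is to convert parsed calls into the JSON tool-call shape used by OpenAI-compatible chat APIs. The adapter below is a sketch, not something shipped with this repository. It assumes the input is the `{"function": ..., "parameters": ...}` dict produced by the parser earlier in this card, and the generated `id` is an arbitrary placeholder.

```python
import json
import uuid


def to_openai_tool_call(call: dict) -> dict:
    """Convert one parsed native call into an OpenAI-style tool_call entry."""
    return {
        # Placeholder identifier; callers may substitute their own scheme.
        "id": f"call_{uuid.uuid4().hex[:24]}",
        "type": "function",
        "function": {
            "name": call["function"],
            # That format carries arguments as a JSON-encoded string.
            "arguments": json.dumps(call["parameters"]),
        },
    }


converted = to_openai_tool_call(
    {"function": "glob", "parameters": {"pattern": "**/*.py"}}
)
print(json.dumps(converted, indent=2))
```

Note that native scalar parameters arrive as strings, so numeric or boolean arguments would need to be coerced against the tool schema before a strict framework accepts them.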
Training Infrastructure
| Component | Value |
|---|---|
| Platform | Modal |
| GPU | NVIDIA H200 |
| Image | Debian slim (Python 3.12) |
| Key libraries | torch 2.10.0, transformers 5.12.1, peft 0.19.1, datasets 5.0.0, accelerate 1.14.0 |
| HF cache | Modal volume (hopcoder-hf-cache) |
| Training output | Modal volume (hopcoder-training) |
Files
| File | Description |
|---|---|
| adapter_config.json | LoRA configuration (r=16, alpha=32, dropout=0.05) |
| adapter_model.safetensors | Trained LoRA weights |
| chat_template.jinja | Chat template for the base model |
| processor_config.json | Processor configuration |
| tokenizer.json | Tokenizer data |
| tokenizer_config.json | Tokenizer configuration |
| hopcoder_benchmark.py | Benchmark script (40 cases, 6 targeted + 10 general tools) |
Citation
@misc{hopcoder-mini-9b-native-toolcall-lora-h200,
  author = {Taimoor Siddiqui},
  title = {HopCoder-Mini-9B Native Tool-Call LoRA Adapter (H200)},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/TaimoorSiddiqui/Hopcoder-Mini-9B-Native-ToolCall-LoRA-H200}
}
License
Apache 2.0
Model tree for TaimoorSiddiqui/Hopcoder-Mini-9B-Native-ToolCall-LoRA-H200
Base model
Qwen/Qwen3.5-9B-Base