🛠️ Qwen 2.5-0.5B Tool Call Fine-Tuning (Phase 1: Synthetic SFT)

Fine-tunes Qwen 2.5-0.5B with QLoRA on a small synthetic dataset so that it emits structured JSON tool calls.

This is Phase 1 of a 3-phase experiment. See the full experiment repo for SFT + GRPO on real data.


📌 What This Does

Trains a 0.5B model to respond to natural language with structured tool calls:

{"tool": "get_stock_price", "arguments": {"ticker": "MSFT"}}
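As a minimal sketch of how a response in this format could be checked, the snippet below parses a model reply and returns the call only if it is a JSON object with a string "tool" and a dict "arguments". The helper name `parse_tool_call` is hypothetical and is not taken from the repo's scripts.

```python
import json
from typing import Any, Dict, Optional


def parse_tool_call(text: str) -> Optional[Dict[str, Any]]:
    """Return the tool call as a dict, or None if the text is not a well-formed call."""
    try:
        call = json.loads(text.strip())
    except json.JSONDecodeError:
        return None  # plain-text answers end up here
    if not isinstance(call, dict):
        return None  # valid JSON, but not an object
    if not isinstance(call.get("tool"), str):
        return None  # missing or non-string tool name
    if not isinstance(call.get("arguments"), dict):
        return None  # missing or non-dict arguments
    return call


call = parse_tool_call('{"tool": "get_stock_price", "arguments": {"ticker": "MSFT"}}')
plain = parse_tool_call("The answer is 4.")
print(call)
print(plain)
```

A parser like this only checks the shape of the reply; whether the tool name and arguments are the right ones is a separate question.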

🤖 Setup

| Setting | Value |
|---|---|
| Model | Qwen/Qwen2.5-0.5B-Instruct |
| Method | QLoRA (4-bit, LoRA rank 16) |
| Dataset | 14 hand-written synthetic examples |
| Tools | get_weather, search_web, calculator, get_stock_price |
| Platform | Google Colab (T4 GPU) |
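For readers unfamiliar with QLoRA, the setup listed above might translate into configuration objects roughly like the following. Only the model ID, 4-bit loading, and LoRA rank 16 come from this card; the quantization type, compute dtype, alpha, dropout, and target modules are assumptions for illustration, and the actual training script may differ. The imports are placed inside the function so the sketch can be read without the GPU stack installed.

```python
MODEL_ID = "Qwen/Qwen2.5-0.5B-Instruct"  # from the setup above
LORA_RANK = 16                            # from the setup above


def build_qlora_configs():
    """Build 4-bit quantization and LoRA configs (needs torch, transformers, peft)."""
    import torch
    from peft import LoraConfig
    from transformers import BitsAndBytesConfig

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,                     # 4-bit base weights, as stated above
        bnb_4bit_quant_type="nf4",             # assumption: NF4 quantization
        bnb_4bit_compute_dtype=torch.float16,  # assumption: fp16, since the T4 has no native bfloat16
    )
    lora_config = LoraConfig(
        r=LORA_RANK,
        lora_alpha=32,                         # assumption
        lora_dropout=0.05,                     # assumption
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption
        task_type="CAUSAL_LM",
    )
    return bnb_config, lora_config
```

In a typical workflow the first object would be passed as `quantization_config` when loading the base model, and the second as the adapter config for the SFT trainer.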

📊 Results

Evaluated on 10 queries (5 seen during training, 5 unseen):

| Metric | Score |
|---|---|
| JSON Valid | 30% |
| Correct Tool Name | 20% |
| Correct Arguments | 10% |
| Full Match (strict) | 0% |
| In-distribution | 0/5 |
| Out-of-distribution | 0/5 |
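The metrics above could be computed along the following lines. This is a sketch under assumptions, not the repo's evaluation script: it treats tool name and arguments as independent exact-match checks, and the three sample responses are hypothetical illustrations of the failure modes, not real model outputs. The "city" argument key is also assumed.

```python
import json
from typing import Dict, List


def score_response(text: str, expected: dict) -> Dict[str, bool]:
    """Score one model response against the expected tool call."""
    try:
        call = json.loads(text)
    except json.JSONDecodeError:
        call = None
    json_valid = isinstance(call, dict)
    tool_ok = json_valid and call.get("tool") == expected["tool"]
    args_ok = json_valid and call.get("arguments") == expected["arguments"]
    return {
        "json_valid": json_valid,
        "correct_tool": tool_ok,
        "correct_arguments": args_ok,
        "full_match": tool_ok and args_ok,  # strict: both must match exactly
    }


def aggregate(scores: List[Dict[str, bool]]) -> Dict[str, float]:
    """Turn per-query booleans into percentages per metric."""
    return {k: 100.0 * sum(s[k] for s in scores) / len(scores) for k in scores[0]}


# Hypothetical (response, expected) pairs for illustration only.
samples = [
    ('{"tool": "weather", "arguments": {"city": "Paris"}}',
     {"tool": "get_weather", "arguments": {"city": "Paris"}}),
    ('{"tool": "get_stock_price", "arguments": {"ticker_symbol": "MSFT"}}',
     {"tool": "get_stock_price", "arguments": {"ticker": "MSFT"}}),
    ("The answer is 4.",
     {"tool": "calculator", "arguments": {"expression": "2 + 2"}}),
]
scores = [score_response(text, expected) for text, expected in samples]
summary = aggregate(scores)
print(summary)
```

Because the tool and argument checks are independent, a response can score on one and still fail the strict full match, which is consistent with non-zero partial scores alongside a 0% full-match rate.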

Key observations:

  • The model learned the concept of JSON tool calls but hallucinated tool names ("weather" instead of "get_weather")
  • It used wrong argument keys ("ticker_symbol" instead of "ticker")
  • For calculator queries, it answered directly in plain text instead of calling the tool
  • 14 examples is too few: the model neither reproduces the training examples reliably nor generalises
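The three failure modes in the observations above can be told apart mechanically by checking each response against a registry of known tools. The sketch below assumes a hypothetical schema: only the "ticker" key for get_stock_price appears in this card, and the other argument names are invented for illustration.

```python
import json
from typing import Dict, List, Set

# Hypothetical schema. Only "ticker" is confirmed by the card; the rest are assumed.
TOOL_ARGS: Dict[str, Set[str]] = {
    "get_weather": {"city"},
    "search_web": {"query"},
    "calculator": {"expression"},
    "get_stock_price": {"ticker"},
}


def diagnose(text: str) -> List[str]:
    """Return a list of problems found in a response; an empty list means none."""
    try:
        call = json.loads(text)
    except json.JSONDecodeError:
        return ["not_json"]  # e.g. a direct plain-text answer
    if not isinstance(call, dict):
        return ["not_json"]
    tool = call.get("tool")
    if not isinstance(tool, str) or tool not in TOOL_ARGS:
        return [f"unknown_tool:{tool}"]  # hallucinated tool name
    args = call.get("arguments")
    keys = set(args) if isinstance(args, dict) else set()
    if keys != TOOL_ARGS[tool]:
        return [f"bad_argument_keys:{sorted(keys)}"]  # wrong argument names
    return []


print(diagnose('{"tool": "weather", "arguments": {"city": "Paris"}}'))
print(diagnose('{"tool": "get_stock_price", "arguments": {"ticker_symbol": "MSFT"}}'))
print(diagnose("2 + 2 = 4"))
```

Tagging errors this way makes it easier to see which failure dominates before deciding whether more data or a different training signal is the better fix.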

📁 Files

| File | Description |
|---|---|
| qwen25_tool_call_finetune.py | Training script: SFT with synthetic data |
| Qwen2.5-0.5B_tool_call_finetune_eval.py | Evaluation script: scores the model on 10 queries |

💡 Lessons Learned

  • 14 examples is not enough: the model fails to generalise even to similar queries
  • Exact argument matching is strict: "ticker_symbol" vs "ticker" counts as a failure
  • A bigger dataset and RL are needed: see Phases 2 and 3 for improvements

🔧 Stack

Qwen2.5-0.5B • QLoRA • SFT • trl • peft • Google Colab T4
