{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "ii5Zkit6eSqU"
   },
   "source": [
    "# Teaching Tool Calling with Supervised Fine-Tuning (SFT) using TRL on a Free Colab Notebook\n",
    "\n",
    "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/trl/blob/main/examples/notebooks/sft_tool_calling.ipynb)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "gJVcVKOteSqV"
   },
   "source": [
    "![trl banner](https://huggingface.co/datasets/trl-lib/documentation-images/resolve/main/trl_banner_dark.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "hzt0BrvoeSqW"
   },
   "source": [
    "Learn how to teach a language model to perform **tool calling** using **Supervised Fine-Tuning (SFT)** with **LoRA/QLoRA** and the [**TRL**](https://github.com/huggingface/trl) library.\n",
    "\n",
    "The model used in this notebook does not have native tool-calling support. We extend its Jinja2 chat template (via `tiny_aya_chat_template.jinja`) to serialize tool schemas into the system preamble and render tool calls as structured `<tool_call>` XML inside the model's native `<|START_RESPONSE|>` / `<|END_RESPONSE|>` delimiters. The modified template is saved with the tokenizer, making inference reproducible: just load the tokenizer from the output directory and call `apply_chat_template` with `tools=TOOLS`.\n",
    "\n",
    "- [TRL GitHub Repository](https://github.com/huggingface/trl) — star us to support the project!\n",
    "- [Official TRL Examples](https://huggingface.co/docs/trl/example_overview)\n",
    "- [Community Tutorials](https://huggingface.co/docs/trl/community_tutorials)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "3PfX1aj5eSqW"
   },
   "source": [
    "## Key concepts\n",
    "\n",
    "- **SFT**: Trains a model on example input-output pairs to align its behavior with a desired task.\n",
    "- **Tool Calling**: The ability of a model to respond with a structured function call instead of free-form text.\n",
    "- **LoRA**: Freezes the base weights and trains a small set of low-rank adapter matrices, reducing training cost and memory usage.\n",
    "- **QLoRA**: Applies LoRA on top of a 4-bit quantized base model, which enables fine-tuning larger models on limited hardware.\n",
    "- **TRL**: The Hugging Face library for post-training language models with methods such as SFT and reinforcement learning."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "QDMcKeoEeSqW"
   },
   "source": [
    "## Install dependencies\n",
    "\n",
    "We'll install **TRL** with the **PEFT** extra, which brings in all main dependencies such as **Transformers** and **PEFT** (parameter-efficient fine-tuning). We also install **trackio** for experiment logging, **bitsandbytes** for 4-bit quantization, and **liger-kernel** for optimized Triton training kernels."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "Ey-TuYPrXTLG",
    "outputId": "a4fd8cfe-624e-4185-ab59-e6901514cb96"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.0/1.0 MB\u001b[0m \u001b[31m17.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m60.7/60.7 MB\u001b[0m \u001b[31m42.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m24.2/24.2 MB\u001b[0m \u001b[31m109.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m56.0/56.0 kB\u001b[0m \u001b[31m6.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m9.9/9.9 MB\u001b[0m \u001b[31m131.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m540.5/540.5 kB\u001b[0m \u001b[31m44.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25h"
     ]
    }
   ],
   "source": [
    "!pip install -Uq \"trl[peft]\" trackio bitsandbytes liger-kernel"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Aw8_T-Z0eSqW"
   },
   "source": [
    "### Log in to Hugging Face\n",
    "\n",
    "Log in to your Hugging Face account to push the fine-tuned model to the Hub and access gated models. You can find your access token on your [account settings page](https://huggingface.co/settings/tokens)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "_qaeDZwXXTLG"
   },
   "outputs": [],
   "source": [
    "from huggingface_hub import notebook_login\n",
    "\n",
    "notebook_login()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "XPnDpJgIeSqX"
   },
   "source": [
    "## Load Dataset\n",
    "\n",
    "We load the [**bebechien/SimpleToolCalling**](https://huggingface.co/datasets/bebechien/SimpleToolCalling) dataset, which contains user queries paired with the correct tool call to handle each request. Each sample provides a `user_content`, a `tool_name`, and `tool_arguments`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "zfJY_8AzXTLG"
   },
   "outputs": [],
   "source": [
    "from datasets import load_dataset\n",
    "\n",
    "dataset_name = \"bebechien/SimpleToolCalling\"\n",
    "dataset = load_dataset(dataset_name, split=\"train\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "ygeMXzKGXTLH",
    "outputId": "a1ed3a8b-f515-4cda-eeb2-db0355ed2c02"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Dataset({\n",
       "    features: ['user_content', 'tool_name', 'tool_arguments'],\n",
       "    num_rows: 40\n",
       "})"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dataset"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "O_GkvqtReSqX"
   },
   "source": [
    "## Prepare Tool-Calling Data\n",
    "\n",
    "We define two tools: `search_knowledge_base` for internal company documents and `search_google` for public information. We then use a custom Jinja2 chat template (`tiny_aya_chat_template.jinja`) that extends the model's default template with two additions:\n",
    "\n",
    "1. A **Tool Use** section is appended to the system preamble when `tools` is passed to `apply_chat_template`.\n",
    "2. Assistant turns with `tool_calls` render the call as structured `<tool_call>` inside the model's existing `<|START_RESPONSE|>` / `<|END_RESPONSE|>` delimiters.\n",
    "\n",
    "Each training sample uses the standard `tool_calls` message format with a `tools` key — SFTTrainer passes these to `apply_chat_template` automatically."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "jaAgXeWtXTLH"
   },
   "outputs": [],
   "source": [
    "import json\n",
    "\n",
    "# These are the tool schemas that are used in the dataset\n",
    "TOOLS = [\n",
    "    {\n",
    "        \"type\": \"function\",\n",
    "        \"function\": {\n",
    "            \"name\": \"search_knowledge_base\",\n",
    "            \"description\": \"Search internal company documents, policies and project data.\",\n",
    "            \"parameters\": {\n",
    "                \"type\": \"object\",\n",
    "                \"properties\": {\"query\": {\"type\": \"string\", \"description\": \"query string\"}},\n",
    "                \"required\": [\"query\"],\n",
    "            },\n",
    "            \"return\": {\"type\": \"string\"},\n",
    "        },\n",
    "    },\n",
    "    {\n",
    "        \"type\": \"function\",\n",
    "        \"function\": {\n",
    "            \"name\": \"search_google\",\n",
    "            \"description\": \"Search public information.\",\n",
    "            \"parameters\": {\n",
    "                \"type\": \"object\",\n",
    "                \"properties\": {\"query\": {\"type\": \"string\", \"description\": \"query string\"}},\n",
    "                \"required\": [\"query\"],\n",
    "            },\n",
    "            \"return\": {\"type\": \"string\"},\n",
    "        },\n",
    "    },\n",
    "]\n",
    "\n",
    "def create_conversation(sample):\n",
    "    return {\n",
    "        \"prompt\": [{\"role\": \"user\", \"content\": sample[\"user_content\"]}],\n",
    "        \"completion\": [\n",
    "            {\n",
    "                \"role\": \"assistant\",\n",
    "                \"tool_calls\": [\n",
    "                    {\n",
    "                        \"type\": \"function\",\n",
    "                        \"function\": {\n",
    "                            \"name\": sample[\"tool_name\"],\n",
    "                            \"arguments\": json.loads(sample[\"tool_arguments\"]),\n",
    "                        },\n",
    "                    }\n",
    "                ],\n",
    "            },\n",
    "        ],\n",
    "        \"tools\": TOOLS,\n",
    "    }"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "32p512R2XTLH"
   },
   "outputs": [],
   "source": [
    "dataset = dataset.map(create_conversation, remove_columns=dataset.features)\n",
    "\n",
    "# Split dataset into 50% training samples and 50% test samples\n",
    "dataset = dataset.train_test_split(test_size=0.5, shuffle=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Plnjef-PeSqX"
   },
   "source": [
    "Let's inspect an example from the training set to verify the format:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "f4QI6wJjXTLH",
    "outputId": "2156adb4-7bed-4e29-84c5-54e6d45e5500"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'messages': [{'content': 'How do I configure the VPN for the New York office?',\n",
       "   'role': 'user',\n",
       "   'tool_calls': None},\n",
       "  {'content': None,\n",
       "   'role': 'assistant',\n",
       "   'tool_calls': [{'function': {'arguments': {'query': 'VPN configuration guide New York office'},\n",
       "      'name': 'search_knowledge_base'},\n",
       "     'type': 'function'}]}],\n",
       " 'tools': [{'function': {'description': 'Search internal company documents, policies and project data.',\n",
       "    'name': 'search_knowledge_base',\n",
       "    'parameters': {'properties': {'query': {'description': 'query string',\n",
       "       'type': 'string'}},\n",
       "     'required': ['query'],\n",
       "     'type': 'object'},\n",
       "    'return': {'type': 'string'}},\n",
       "   'type': 'function'},\n",
       "  {'function': {'description': 'Search public information.',\n",
       "    'name': 'search_google',\n",
       "    'parameters': {'properties': {'query': {'description': 'query string',\n",
       "       'type': 'string'}},\n",
       "     'required': ['query'],\n",
       "     'type': 'object'},\n",
       "    'return': {'type': 'string'}},\n",
       "   'type': 'function'}]}"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dataset['train'][0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "fBIGKl_UXTLH",
    "outputId": "edd8e968-c7e4-418d-b9e9-26773aee1366"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "DatasetDict({\n",
       "    train: Dataset({\n",
       "        features: ['messages', 'tools'],\n",
       "        num_rows: 20\n",
       "    })\n",
       "    test: Dataset({\n",
       "        features: ['messages', 'tools'],\n",
       "        num_rows: 20\n",
       "    })\n",
       "})"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dataset"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "aud6U3c2eSqX"
   },
   "source": [
    "## Load Model and Configure LoRA/QLoRA\n",
    "\n",
    "Choose the model you want to fine-tune. This notebook uses [`CohereLabs/tiny-aya-global`](https://huggingface.co/CohereLabs/tiny-aya-global) by default."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "_j_LF12IXTLH"
   },
   "outputs": [],
   "source": [
    "model_id, output_dir = \"CohereLabs/tiny-aya-global\", \"tiny-aya-global-SFT\"     # ✅ ~9.1 GB VRAM"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "gpTZHjpJeSqX"
   },
   "source": [
    "Load the model with 4-bit quantization using `BitsAndBytesConfig` (QLoRA). To use standard LoRA without quantization, comment out the `quantization_config` parameter. We also download the custom chat template file, which is passed to the trainer later through `chat_template_path`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "referenced_widgets": [
      "680888237b78477ea653adb2ecea7fa8"
     ]
    },
    "id": "jGpTDV6sXTLH",
    "outputId": "fc33f7a6-bfd0-4228-80cd-e0aeb67bbd42"
   },
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "680888237b78477ea653adb2ecea7fa8",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Loading weights:   0%|          | 0/290 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "import torch\n",
    "from transformers import AutoModelForCausalLM, BitsAndBytesConfig\n",
    "\n",
    "model = AutoModelForCausalLM.from_pretrained(\n",
    "    model_id,\n",
    "    attn_implementation=\"sdpa\",                   # Change to Flash Attention if GPU has support\n",
    "    dtype=torch.float16,                          # Change to bfloat16 if GPU has support\n",
    "    use_cache=True,                               # Cache key/value states to speed up generation\n",
    "    quantization_config=BitsAndBytesConfig(\n",
    "        load_in_4bit=True,                        # Load the model in 4-bit precision to save memory\n",
    "        bnb_4bit_compute_dtype=torch.float16,     # Data type used for internal computations in quantization\n",
    "        bnb_4bit_use_double_quant=True,           # Also quantize the quantization constants to save more memory\n",
    "        bnb_4bit_quant_type=\"nf4\"                 # Type of quantization. \"nf4\" is recommended for recent LLMs\n",
    "    )\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "zMt6lzeVXTLH"
   },
   "outputs": [],
   "source": [
    "!wget https://raw.githubusercontent.com/huggingface/trl/refs/heads/main/examples/scripts/tiny_aya_chat_template.jinja"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "z_Rek3WueSqX"
   },
   "source": [
    "Configure LoRA. Instead of updating the model's original weights, we fine-tune a lightweight **LoRA adapter**. The `target_modules` specify which layers receive the adapter — update these if using a different model architecture."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "2zCetOerXTLH"
   },
   "outputs": [],
   "source": [
    "from peft import LoraConfig\n",
    "\n",
    "# You may need to update `target_modules` depending on the architecture of your chosen model.\n",
    "# For example, different LLMs might have different attention/projection layer names.\n",
    "peft_config = LoraConfig(\n",
    "    r=32,\n",
    "    lora_alpha=32,\n",
    "    target_modules=[\"q_proj\", \"k_proj\", \"v_proj\", \"o_proj\", \"gate_proj\", \"up_proj\", \"down_proj\"],\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Bb3onre7eSqX"
   },
   "source": [
    "## Train Model\n",
    "\n",
    "Configure the training run with `SFTConfig`. The settings below are tuned for low memory usage. For full details on available parameters, see the [TRL SFTConfig documentation](https://huggingface.co/docs/trl/sft_trainer#trl.SFTConfig)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "ehnCG4PCXTLH"
   },
   "outputs": [],
   "source": [
    "from trl import SFTConfig\n",
    "\n",
    "training_args = SFTConfig(\n",
    "    # Training schedule / optimization\n",
    "    per_device_train_batch_size=1,        # Batch size per GPU\n",
    "    gradient_accumulation_steps=4,        # Effective batch size = 1 * 4 = 4\n",
    "    warmup_steps=5,                       # Number of learning-rate warmup steps\n",
    "    learning_rate=2e-4,                   # Learning rate for the optimizer\n",
    "    optim=\"paged_adamw_8bit\",             # Optimizer\n",
    "    chat_template_path=\"tiny_aya_chat_template.jinja\",  # Use the tool-aware chat template\n",
    "\n",
    "    # Logging / reporting\n",
    "    logging_steps=1,                      # Log training metrics every N steps\n",
    "    report_to=\"trackio\",                  # Experiment tracking tool\n",
    "    trackio_space_id=output_dir,          # HF Space where the experiment tracking will be saved\n",
    "    output_dir=output_dir,                # Where to save model checkpoints and logs\n",
    "\n",
    "    max_length=1024,                      # Maximum input sequence length\n",
    "    activation_offloading=True,           # Offload activations to CPU to reduce GPU memory usage\n",
    "\n",
    "    # Hub integration\n",
    "    push_to_hub=True,                     # Automatically push the trained model to the Hugging Face Hub\n",
    "                                          # The model will be saved under your Hub account in the repository named `output_dir`\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "referenced_widgets": [
      "08b8f06974874b15b030cea99ad74e25",
      "93ab294914624fe08ebf220fa2db5e8c",
      "f13449d4661e42d0a1e745495f03a35e",
      "d390249e870b44e3a6fa02abeadcc779"
     ]
    },
    "id": "LM-zo-ERXTLH",
    "outputId": "935f5fd7-e626-4b4b-9d2e-f476da296cd4"
   },
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "08b8f06974874b15b030cea99ad74e25",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Tokenizing train dataset:   0%|          | 0/20 [00:00<?, ? examples/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "93ab294914624fe08ebf220fa2db5e8c",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Truncating train dataset:   0%|          | 0/20 [00:00<?, ? examples/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "f13449d4661e42d0a1e745495f03a35e",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Tokenizing eval dataset:   0%|          | 0/20 [00:00<?, ? examples/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "d390249e870b44e3a6fa02abeadcc779",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Truncating eval dataset:   0%|          | 0/20 [00:00<?, ? examples/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "from trl import SFTTrainer\n",
    "\n",
    "trainer = SFTTrainer(\n",
    "    model=model,\n",
    "    args=training_args,\n",
    "    train_dataset=dataset['train'],\n",
    "    peft_config=peft_config\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "2qH3gx_peSqY"
   },
   "source": [
    "Show memory stats before training:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "ZZ0NHq3eXTLI",
    "outputId": "762c046d-a08f-4eb5-b582-62d938daea2c"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "GPU = NVIDIA A100-SXM4-40GB. Max memory = 39.494 GB.\n",
      "4.648 GB of memory reserved.\n"
     ]
    }
   ],
   "source": [
    "gpu_stats = torch.cuda.get_device_properties(0)\n",
    "start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)\n",
    "max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)\n",
    "\n",
    "print(f\"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.\")\n",
    "print(f\"{start_gpu_memory} GB of memory reserved.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "ADbht0eqeSqY"
   },
   "source": [
    "And train!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "1WROkArkXTLI",
    "outputId": "c0c25c4d-0a1c-41f0-8cac-566d5a517ee2"
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'eos_token_id': 6}.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "* Trackio project initialized: huggingface\n",
      "* Trackio metrics will be synced to Hugging Face Dataset: sergiopaniego/tiny-aya-global-SFT-dataset\n",
      "* Creating new space: https://huggingface.co/spaces/sergiopaniego/tiny-aya-global-SFT\n",
      "* View dashboard by going to: https://sergiopaniego-tiny-aya-global-SFT.hf.space/\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div><iframe src=\"https://sergiopaniego-tiny-aya-global-SFT.hf.space/\" width=\"100%\" height=\"1000px\" allow=\"autoplay; camera; microphone; clipboard-read; clipboard-write;\" frameborder=\"0\" allowfullscreen></iframe></div>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "* GPU detected, enabling automatic GPU metrics logging\n",
      "* Created new run: sergiopaniego-1771428231\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "\n",
       "    <div>\n",
       "      \n",
       "      <progress value='15' max='15' style='width:300px; height:20px; vertical-align: middle;'></progress>\n",
       "      [15/15 00:52, Epoch 3/3]\n",
       "    </div>\n",
       "    <table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       " <tr style=\"text-align: left;\">\n",
       "      <th>Step</th>\n",
       "      <th>Training Loss</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>1</td>\n",
       "      <td>3.095131</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>2</td>\n",
       "      <td>3.083373</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>3</td>\n",
       "      <td>2.951535</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>4</td>\n",
       "      <td>2.625918</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>5</td>\n",
       "      <td>2.254464</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>6</td>\n",
       "      <td>1.939976</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>7</td>\n",
       "      <td>1.694891</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>8</td>\n",
       "      <td>1.558982</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>9</td>\n",
       "      <td>1.430660</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>10</td>\n",
       "      <td>1.305176</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>11</td>\n",
       "      <td>1.192725</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>12</td>\n",
       "      <td>1.120383</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>13</td>\n",
       "      <td>1.052859</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>14</td>\n",
       "      <td>0.985858</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>15</td>\n",
       "      <td>0.970833</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table><p>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "* Run finished. Uploading logs to Trackio (please wait...)\n"
     ]
    }
   ],
   "source": [
    "trainer_stats = trainer.train()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "4MGKFi1-eSqY"
   },
   "source": [
    "Show memory stats after training:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "3f68GA6TXTLI",
    "outputId": "321e90ee-757a-41fc-c6a2-4ba40a6e6b3c"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "59.2841 seconds used for training.\n",
      "0.99 minutes used for training.\n",
      "Peak reserved memory = 11.928 GB.\n",
      "Peak reserved memory for training = 7.28 GB.\n",
      "Peak reserved memory % of max memory = 30.202 %.\n",
      "Peak reserved memory for training % of max memory = 18.433 %.\n"
     ]
    }
   ],
   "source": [
    "used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)\n",
    "used_memory_for_lora = round(used_memory - start_gpu_memory, 3)\n",
    "used_percentage = round(used_memory / max_memory * 100, 3)\n",
    "lora_percentage = round(used_memory_for_lora / max_memory * 100, 3)\n",
    "\n",
    "print(f\"{trainer_stats.metrics['train_runtime']} seconds used for training.\")\n",
    "print(f\"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training.\")\n",
    "print(f\"Peak reserved memory = {used_memory} GB.\")\n",
    "print(f\"Peak reserved memory for training = {used_memory_for_lora} GB.\")\n",
    "print(f\"Peak reserved memory % of max memory = {used_percentage} %.\")\n",
    "print(f\"Peak reserved memory for training % of max memory = {lora_percentage} %.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "ONWy4NOAeSqY"
   },
   "source": [
    "## Save the Fine-Tuned Model\n",
    "\n",
    "Save the trained LoRA adapter locally and push it to the Hugging Face Hub."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "referenced_widgets": [
      "4951424bb90e4dbbaea8c9b88c592872",
      "1669afd0e52443d090adab0fbe663c66",
      "5ee5f6b74e7246eea99c0d84c2a27bc0",
      "15b4a13592c14102af0d3f8a999f3d36",
      "7216a7d56c364a0d92e079d9848946d3",
      "83e8f73e00004b718cbba7be0ecc45e1",
      "05db21352f614288864f88c1ba794ee9",
      "d7821b8cd21f4fb78237479ff081511b",
      "00c9f462a4584b22b1e38dfcc5f86af3",
      "22cacea841ba48c29b7a74ea17a50b4e"
     ]
    },
    "id": "9qz-fRZyXTLI",
    "outputId": "9ff41250-0786-4ec6-fe41-dfc6b611d0b5"
   },
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "4951424bb90e4dbbaea8c9b88c592872",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Processing Files (0 / 0)      : |          |  0.00B /  0.00B            "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "1669afd0e52443d090adab0fbe663c66",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "New Data Upload               : |          |  0.00B /  0.00B            "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "5ee5f6b74e7246eea99c0d84c2a27bc0",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "  ...bal-SFT/training_args.bin: 100%|##########| 5.58kB / 5.58kB            "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "15b4a13592c14102af0d3f8a999f3d36",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "  ...global-SFT/tokenizer.json: 100%|##########| 21.4MB / 21.4MB            "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "7216a7d56c364a0d92e079d9848946d3",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "  ...adapter_model.safetensors:  35%|###4      | 41.9MB /  121MB            "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "83e8f73e00004b718cbba7be0ecc45e1",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Processing Files (0 / 0)      : |          |  0.00B /  0.00B            "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "05db21352f614288864f88c1ba794ee9",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "New Data Upload               : |          |  0.00B /  0.00B            "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "d7821b8cd21f4fb78237479ff081511b",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "  ...bal-SFT/training_args.bin: 100%|##########| 5.58kB / 5.58kB            "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "00c9f462a4584b22b1e38dfcc5f86af3",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "  ...adapter_model.safetensors:  35%|###4      | 41.9MB /  121MB            "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "22cacea841ba48c29b7a74ea17a50b4e",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "  ...global-SFT/tokenizer.json: 100%|##########| 21.4MB / 21.4MB            "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.google.colaboratory.intrinsic+json": {
       "type": "string"
      },
      "text/plain": [
       "CommitInfo(commit_url='https://huggingface.co/sergiopaniego/tiny-aya-global-SFT/commit/c59baa62c6bb5a3c3be2d33b482522a00783a5b4', commit_message='End of training', commit_description='', oid='c59baa62c6bb5a3c3be2d33b482522a00783a5b4', pr_url=None, repo_url=RepoUrl('https://huggingface.co/sergiopaniego/tiny-aya-global-SFT', endpoint='https://huggingface.co', repo_type='model', repo_id='sergiopaniego/tiny-aya-global-SFT'), pr_revision=None, pr_num=None)"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "trainer.save_model(output_dir)\n",
    "trainer.push_to_hub(dataset_name=dataset_name)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "wNA4AIE4SiUg"
   },
   "source": [
    "## Load the Fine-Tuned Model and Run Inference\n",
    "\n",
    "Load the trained LoRA adapter on top of the base model and merge it into the weights for efficient inference."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "referenced_widgets": [
      "9d6a109e605d440ab2c115d969796859"
     ]
    },
    "id": "b5CmxYtpXTLI",
    "outputId": "10ebe012-9ffe-4096-f155-648af855aa80"
   },
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "9d6a109e605d440ab2c115d969796859",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Loading weights:   0%|          | 0/290 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "Cohere2ForCausalLM(\n",
       "  (model): Cohere2Model(\n",
       "    (embed_tokens): Embedding(262144, 2048, padding_idx=0)\n",
       "    (layers): ModuleList(\n",
       "      (0-35): 36 x Cohere2DecoderLayer(\n",
       "        (self_attn): Cohere2Attention(\n",
       "          (q_proj): Linear(in_features=2048, out_features=2048, bias=False)\n",
       "          (k_proj): Linear(in_features=2048, out_features=512, bias=False)\n",
       "          (v_proj): Linear(in_features=2048, out_features=512, bias=False)\n",
       "          (o_proj): Linear(in_features=2048, out_features=2048, bias=False)\n",
       "        )\n",
       "        (mlp): Cohere2MLP(\n",
       "          (gate_proj): Linear(in_features=2048, out_features=11008, bias=False)\n",
       "          (up_proj): Linear(in_features=2048, out_features=11008, bias=False)\n",
       "          (down_proj): Linear(in_features=11008, out_features=2048, bias=False)\n",
       "          (act_fn): SiLUActivation()\n",
       "        )\n",
       "        (input_layernorm): Cohere2LayerNorm()\n",
       "      )\n",
       "    )\n",
       "    (norm): Cohere2LayerNorm()\n",
       "    (rotary_emb): Cohere2RotaryEmbedding()\n",
       "  )\n",
       "  (lm_head): Linear(in_features=2048, out_features=262144, bias=False)\n",
       ")"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import torch\n",
    "from transformers import AutoTokenizer, AutoModelForCausalLM\n",
    "from peft import PeftModel\n",
    "\n",
    "# Load from output_dir to get the tokenizer with the updated chat template\n",
    "tokenizer = AutoTokenizer.from_pretrained(output_dir)\n",
    "\n",
    "base_model = AutoModelForCausalLM.from_pretrained(\n",
    "    model_id,\n",
    "    attn_implementation=\"sdpa\",\n",
    "    dtype=torch.float16,\n",
    "    device_map=\"auto\",\n",
    ")\n",
    "\n",
    "# Attach the trained LoRA adapter, then fold its weights into the base model\n",
    "model = PeftModel.from_pretrained(base_model, output_dir)\n",
    "model = model.merge_and_unload()\n",
    "model.eval()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "597CRjB332N1",
    "outputId": "2d44e300-fa14-4c07-f90c-3a383272f59a"
   },
   "source": [
    "Define a prediction function that uses `apply_chat_template` with `tools=TOOLS` to construct the prompt. The model generates a `<tool_call>` XML block inside its native response delimiters; `skip_special_tokens=True` strips those delimiters, leaving just the tool-call string."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "AcG1y25fXTLI"
   },
   "outputs": [],
   "source": [
    "def generate_prediction(prompt):\n",
    "    # Render the conversation and the tool schemas into a single prompt string\n",
    "    text = tokenizer.apply_chat_template(\n",
    "        prompt, tools=TOOLS, tokenize=False, add_generation_prompt=True\n",
    "    )\n",
    "    model_inputs = tokenizer([text], return_tensors=\"pt\").to(model.device)\n",
    "\n",
    "    generated_ids = model.generate(\n",
    "        **model_inputs,\n",
    "        max_new_tokens=512,\n",
    "    )\n",
    "    # Drop the prompt tokens so only the newly generated text is decoded\n",
    "    output_ids = generated_ids[0][len(model_inputs.input_ids[0]):]\n",
    "    return tokenizer.decode(output_ids, skip_special_tokens=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "EXIimS8iSiUi"
   },
   "source": [
    "Let's test the fine-tuned model on an example from the test set:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "iMC9YFy6XTLI",
    "outputId": "0f84c976-aa1c-49a1-ed17-329bfb3fd0e8"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "User Query: [{'content': 'What is the latest version of Node.js?', 'role': 'user'}]\n",
      "Predicted Output: <tool_call>\n",
      "<function=search_google>\n",
      "<parameter=query>node.js latest version\n",
      "</parameter>\n",
      "</function>\n",
      "</tool_call>\n"
     ]
    }
   ],
   "source": [
    "sample_test_data = dataset[\"test\"][0] # Get a sample from the test set\n",
    "\n",
    "user_content = sample_test_data[\"prompt\"]\n",
    "\n",
    "print(f\"User Query: {user_content}\")\n",
    "\n",
    "predicted_output = generate_prediction(user_content)\n",
    "print(f\"Predicted Output: {predicted_output}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "-r85c-aa7C7k"
   },
   "source": [
    "The fine-tuned model retains the base model's multilingual capabilities. Here it receives a Spanish prompt and still emits a tool call:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "UGePqQGVXTLI",
    "outputId": "adcd21ca-ca45-43d5-a3cc-02a47377e51b"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "User Query: [{'role': 'user', 'content': \"Explica en español qué significa la palabra japonesa 'ikigai' y da un ejemplo práctico.\"}]\n",
      "Predicted Output: <tool_call>\n",
      "<function=search_google>\n",
      "<parameter=query>ikigai significado y ejemplo\n",
      "</parameter>\n",
      "</function>\n",
      "</tool_call>\n"
     ]
    }
   ],
   "source": [
    "user_content = \"Explica en español qué significa la palabra japonesa 'ikigai' y da un ejemplo práctico.\" # Spanish question\n",
    "user_content = [{\"role\": \"user\", \"content\": user_content}]\n",
    "\n",
    "print(f\"User Query: {user_content}\")\n",
    "\n",
    "predicted_output = generate_prediction(user_content)\n",
    "print(f\"Predicted Output: {predicted_output}\")"
   ]
  }
 ],
 "metadata": {
  "accelerator": "GPU",
  "colab": {
   "gpuType": "T4",
   "provenance": []
  },
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}