Multi-agent router fine-tuned model

Files changed (3)

README.md +35 -340
evaluation_results.json +68 -71
training_analysis_interactive.html +0 -0

README.md CHANGED

@@ -1,363 +1,58 @@
 ---
-language:
-- en
-license: gemma
 library_name: transformers
 tags:
-- function-calling
-- multi-agent
-- router
-- gemma
-- fine-tuned
-- customer-support
-base_model: google/functiongemma-270m-it
-datasets:
-- bhaiyahnsingh45/multiagent-router-finetuning
-metrics:
-- accuracy
-pipeline_tag: text-generation
-widget:
-- text: "My app keeps crashing when I upload large files"
-  example_title: "Technical Issue"
-- text: "I need a refund for my subscription"
-  example_title: "Billing Request"
-- text: "What integrations do you support?"
-  example_title: "Product Info"
 ---
-# Multi-Agent Router (Fine-tuned FunctionGemma 270M)
-<div align="center">
-  <img src="https://huggingface.co/datasets/huggingface/brand-assets/resolve/main/hf-logo.png" alt="Hugging Face" width="100"/>
-  **Intelligent routing model for multi-agent customer support systems**
-  [![License: Gemma](https://img.shields.io/badge/License-Gemma-blue.svg)](https://ai.google.dev/gemma/terms)
-  [![Model: FunctionGemma](https://img.shields.io/badge/Model-FunctionGemma-orange.svg)](https://huggingface.co/google/functiongemma-270m-it)
-  [![Dataset](https://img.shields.io/badge/Dataset-Available-green.svg)](https://huggingface.co/datasets/bhaiyahnsingh45/multiagent-router-finetuning)
-</div>
-## 📋 Model Description
-This model is a **fine-tuned version of Google's FunctionGemma 270M** specifically trained for intelligent routing in multi-agent customer support systems. It learns to:
-1. **Classify user intent** from natural language queries
-2. **Route to the appropriate specialist agent**
-3. **Extract relevant parameters** (priority, urgency, category)
-### 🤖 Supported Agents
-The model routes queries to three specialized agents:
-| Agent | Handles | Parameters |
-|-------|---------|------------|
-| 🔧 **Technical Support** | Crashes, bugs, API errors, authentication issues | `issue_type`, `priority` |
-| 💰 **Billing** | Payments, refunds, subscriptions, invoices | `request_type`, `urgency` |
-| 📊 **Product Info** | Features, integrations, plans, compliance | `query_type`, `category` |
-## 🎯 Training Details
-### Base Model
-- **Model**: `google/functiongemma-270m-it`
-- **Parameters**: 270 Million
-- **Architecture**: Gemma with function calling capabilities
-### Fine-tuning Configuration
-- **Training Samples**: 92
-- **Test Samples**: 23
-- **Epochs**: 15
-- **Batch Size**: 4
-- **Learning Rate**: 5e-05
-- **GPU**: NVIDIA T4 (Google Colab Free Tier)
-- **Training Time**: ~5-8 minutes
-### Dataset
-Fine-tuned on [bhaiyahnsingh45/multiagent-router-finetuning](https://huggingface.co/datasets/bhaiyahnsingh45/multiagent-router-finetuning) containing 85 realistic customer support queries across three categories.
-## 📊 Performance
-| Metric | Before Training | After Training | Improvement |
-|--------|----------------|----------------|-------------|
-| **Accuracy** | 4.3% | 60.9% | **+56.5%** |
-| **Correct Predictions** | 1/23 | 14/23 | +13 |
-### Per-Agent Performance
-- **Technical Support**: High accuracy on crash reports, API errors, authentication issues
-- **Billing**: Excellent routing for refunds, payments, subscription management
-- **Product Info**: Strong performance on feature queries, integrations, compliance questions
-## 🚀 Quick Start
-### Installation
-```bash
-pip install transformers torch
-```
-### Basic Usage
-```python
-from transformers import AutoTokenizer, AutoModelForCausalLM
-import re
-import json
-# Load model and tokenizer
-model_name = "bhaiyahnsingh45/functiongemma-multiagent-router"
-tokenizer = AutoTokenizer.from_pretrained(model_name)
-model = AutoModelForCausalLM.from_pretrained(
-    model_name,
-    device_map="auto",
-    torch_dtype="auto"
-)
-# Define your agent tools
-from transformers.utils import get_json_schema
-def technical_support_agent(issue_type: str, priority: str) -> str:
-    """
-    Routes technical issues to specialized support team.
-    Args:
-        issue_type: Type of technical issue (crash, authentication, performance, api_error, etc.)
-        priority: Priority level (low, medium, high)
-    """
-    return f"Routing to Technical Support: {issue_type} with {priority} priority"
-def billing_agent(request_type: str, urgency: str) -> str:
-    """
-    Routes billing and payment queries.
-    Args:
-        request_type: Type of request (refund, invoice, upgrade, cancellation, etc.)
-        urgency: How urgent (low, medium, high)
-    """
-    return f"Routing to Billing: {request_type} with {urgency} urgency"
-def product_info_agent(query_type: str, category: str) -> str:
-    """
-    Routes product information queries.
-    Args:
-        query_type: Type of query (features, comparison, integrations, limits, etc.)
-        category: Category (plans, storage, mobile, security, etc.)
-    """
-    return f"Routing to Product Info: {query_type} about {category}"
-# Get tool schemas
-AGENT_TOOLS = [
-    get_json_schema(technical_support_agent),
-    get_json_schema(billing_agent),
-    get_json_schema(product_info_agent)
-]
-# System message
-SYSTEM_MSG = "You are an intelligent routing agent that directs customer queries to the appropriate specialized agent."
-# Function to route queries
-def route_query(user_query: str):
-    """Route a user query to the appropriate agent"""
-    messages = [
-        {"role": "developer", "content": SYSTEM_MSG},
-        {"role": "user", "content": user_query}
-    ]
-    # Format prompt
-    inputs = tokenizer.apply_chat_template(
-        messages,
-        tools=AGENT_TOOLS,
-        add_generation_prompt=True,
-        return_dict=True,
-        return_tensors="pt"
-    )
-    # Generate
-    outputs = model.generate(
-        **inputs.to(model.device),
-        max_new_tokens=128,
-        pad_token_id=tokenizer.eos_token_id
-    )
-    # Decode
-    result = tokenizer.decode(
-        outputs[0][len(inputs["input_ids"][0]):],
-        skip_special_tokens=False
-    )
-    return result
-# Example usage
-query = "My app crashes when I try to upload large files"
-result = route_query(query)
-print(f"Query: {query}")
-print(f"Routing: {result}")
-```
-### Expected Output Format
-```
-<start_function_call>call:technical_support_agent{issue_type:crash,priority:high}<end_function_call>
-```
-## 💡 Usage Examples
-### Example 1: Technical Issue
-```python
-query = "I'm getting a 500 error when calling the API"
-result = route_query(query)
-# Output: technical_support_agent(issue_type="api_error", priority="high")
-```
-### Example 2: Billing Request
-```python
-query = "I need a refund for my annual subscription"
-result = route_query(query)
-# Output: billing_agent(request_type="refund", urgency="medium")
-```
-### Example 3: Product Question
-```python
-query = "What integrations do you support for project management?"
-result = route_query(query)
-# Output: product_info_agent(query_type="integrations", category="project_management")
-```
-## 🔧 Advanced Usage: Parse Function Calls
 ```python
-def parse_function_call(output: str) -> dict:
-    """Extract function name and arguments from model output"""
-    pattern = r'<start_function_call>call:(\w+)\{([^}]+)\}<end_function_call>'
-    match = re.search(pattern, output)
-    if match:
-        func_name = match.group(1)
-        params_str = match.group(2)
-        # Parse parameters
-        params = {}
-        param_pattern = r'(\w+):(?:<escape>(.*?)<escape>|([^,{}]+))'
-        for p_match in re.finditer(param_pattern, params_str):
-            key = p_match.group(1)
-            val = p_match.group(2) or p_match.group(3).strip()
-            params[key] = val
-        return {
-            "agent": func_name,
-            "parameters": params
-        }
-    return {"agent": "unknown", "parameters": {}}
-# Use it
-query = "I was charged twice this month"
-result = route_query(query)
-parsed = parse_function_call(result)
-print(parsed)
-# Output: {'agent': 'billing_agent', 'parameters': {'request_type': 'dispute', 'urgency': 'high'}}
 ```
-## 🏗️ Integration Example
-```python
-class MultiAgentRouter:
-    def __init__(self, model_name: str):
-        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
-        self.model = AutoModelForCausalLM.from_pretrained(
-            model_name,
-            device_map="auto",
-            torch_dtype="auto"
-        )
-        self.system_msg = "You are an intelligent routing agent..."
-    def route(self, query: str) -> dict:
-        """Route query and return agent + parameters"""
-        messages = [
-            {"role": "developer", "content": self.system_msg},
-            {"role": "user", "content": query}
-        ]
-        inputs = self.tokenizer.apply_chat_template(
-            messages,
-            tools=AGENT_TOOLS,
-            add_generation_prompt=True,
-            return_dict=True,
-            return_tensors="pt"
-        )
-        outputs = self.model.generate(
-            **inputs.to(self.model.device),
-            max_new_tokens=128,
-            pad_token_id=self.tokenizer.eos_token_id
-        )
-        result = self.tokenizer.decode(
-            outputs[0][len(inputs["input_ids"][0]):],
-            skip_special_tokens=False
-        )
-        return parse_function_call(result)
-# Usage
-router = MultiAgentRouter("bhaiyahnsingh45/functiongemma-multiagent-router")
-routing = router.route("My payment failed but I don't know why")
-print(f"Route to: {routing['agent']}")
-print(f"Parameters: {routing['parameters']}")
-```
-## 📈 Evaluation
-The model was evaluated on a held-out test set of 23 queries:
-- **Routing Accuracy**: 60.9%
-- **False Positive Rate**: 39.1%
-- **Average Inference Time**: ~50ms on T4 GPU
-## ⚠️ Limitations
-1. **Language**: Currently supports English only
-2. **Domain**: Optimized for customer support; may need fine-tuning for other domains
-3. **Agents**: Limited to 3 agent types (can be extended with additional training)
-4. **Context**: Works best with single-turn queries; multi-turn conversations may need context handling
-5. **Edge Cases**: Ambiguous queries may require fallback logic
-## 🔮 Future Improvements
-- [ ] Add support for more languages
-- [ ] Expand to 5+ agent types (sales, feedback, onboarding)
-- [ ] Handle multi-turn conversations
-- [ ] Add confidence scores for routing decisions
-- [ ] Support for compound queries requiring multiple agents
-## 📝 Citation
 ```bibtex
-@misc{functiongemma_multiagent_router,
-  author = {Bhaiya Singh},
-  title = {Multi-Agent Router: Fine-tuned FunctionGemma for Customer Support},
-  year = {2025},
-  publisher = {Hugging Face},
-  howpublished = {\url{https://huggingface.co/bhaiyahnsingh45/functiongemma-multiagent-router}}
 }
-```
-## 📄 License
-This model inherits the [Gemma License](https://ai.google.dev/gemma/terms) from the base model.
-## 🙏 Acknowledgments
-- Base model: [google/functiongemma-270m-it](https://huggingface.co/google/functiongemma-270m-it)
-- Training framework: [Hugging Face TRL](https://github.com/huggingface/trl)
-- Dataset: [bhaiyahnsingh45/multiagent-router-finetuning](https://huggingface.co/datasets/bhaiyahnsingh45/multiagent-router-finetuning)
-## 📧 Contact
-For questions, issues, or collaboration opportunities:
-- Open an issue on the [model repository](https://huggingface.co/bhaiyahnsingh45/functiongemma-multiagent-router)
-- Dataset issues: [dataset repository](https://huggingface.co/datasets/bhaiyahnsingh45/multiagent-router-finetuning)
----
-**Built with ❤️ using FunctionGemma and Hugging Face Transformers**

 ---
+base_model: google/functiongemma-270m-it
 library_name: transformers
+model_name: functiongemma-multiagent-router
 tags:
+- generated_from_trainer
+- sft
+- trl
+licence: license
 ---
+# Model Card for functiongemma-multiagent-router
+This model is a fine-tuned version of [google/functiongemma-270m-it](https://huggingface.co/google/functiongemma-270m-it).
+It has been trained using [TRL](https://github.com/huggingface/trl).
+## Quick start
 ```python
+from transformers import pipeline
+question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
+generator = pipeline("text-generation", model="bhaiyahnsingh45/functiongemma-multiagent-router", device="cuda")
+output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
+print(output["generated_text"])
 ```
+## Training procedure
+This model was trained with SFT.
+### Framework versions
+- TRL: 0.26.2
+- Transformers: 4.57.3
+- Pytorch: 2.9.0+cu126
+- Datasets: 4.0.0
+- Tokenizers: 0.22.1
+## Citations
+Cite TRL as:
 ```bibtex
+@misc{vonwerra2022trl,
+	title        = {{TRL: Transformer Reinforcement Learning}},
+	author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
+	year         = 2020,
+	journal      = {GitHub repository},
+	publisher    = {GitHub},
+	howpublished = {\url{https://github.com/huggingface/trl}}
 }
+```
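The rewritten model card no longer documents the function-call output format, but the `raw_output` strings recorded in `evaluation_results.json` still follow it: `<start_function_call>call:AGENT{key:<escape>value<escape>,...}<end_function_call>`. The sketch below shows one way to turn such a string into the `predicted_agent` / `predicted_arguments` pair seen in the results file. The function name `parse_raw_output` and both regular expressions are illustrative assumptions, not the evaluation script used for this commit; the `"NONE"` fallback mirrors how the results file records a free-text reply.

```python
import re

# Hypothetical helper: these patterns are an assumption based on the
# raw_output strings in evaluation_results.json, not the author's code.
CALL_RE = re.compile(
    r"<start_function_call>call:(\w+)\{(.*?)\}<end_function_call>", re.DOTALL
)
# A value is either wrapped in <escape>...<escape> or written bare.
ARG_RE = re.compile(r"(\w+):(?:<escape>(.*?)<escape>|([^,{}]+))", re.DOTALL)


def parse_raw_output(raw_output: str) -> dict:
    """Extract the agent name and its arguments from one model output."""
    match = CALL_RE.search(raw_output)
    if match is None:
        # Free-text replies (for example a refusal) contain no function call.
        return {"predicted_agent": "NONE", "predicted_arguments": {}}

    arguments = {}
    for arg in ARG_RE.finditer(match.group(2)):
        escaped, bare = arg.group(2), arg.group(3)
        arguments[arg.group(1)] = escaped if escaped is not None else bare.strip()
    return {"predicted_agent": match.group(1), "predicted_arguments": arguments}


if __name__ == "__main__":
    sample = (
        "<start_function_call>call:billing_agent{request_type:<escape>upgrade"
        "<escape>,urgency:<escape>medium<escape>}<end_function_call>"
        "<start_function_response>"
    )
    print(parse_raw_output(sample))
```

Checking `escaped is not None` rather than relying on truthiness keeps an empty escaped value from falling through to the bare-value branch.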

evaluation_results.json CHANGED

@@ -1,7 +1,7 @@
 {
   "metadata": {
     "base_model": "google/functiongemma-270m-it",
-    "training_date": "2026-01-05 16:17:51",
     "num_train_samples": 92,
     "num_test_samples": 23,
     "num_epochs": 15,
@@ -15,13 +15,13 @@
       "total": 23
     },
     "after_training": {
-      "accuracy": 60.86956521739131,
-      "correct": 14,
       "total": 23
     },
     "improvement": {
-      "accuracy_gain": 56.52173913043479,
-      "additional_correct": 13
     }
   },
   "detailed_results": {
@@ -410,13 +410,13 @@
           "request_type": null,
           "urgency": null
         },
-        "predicted_agent": "product_info_agent",
         "predicted_arguments": {
-          "category": "API_REGIONAL",
-          "query_type": "API_CONNECT"
         },
-        "raw_output": "<start_function_call>call:product_info_agent{category:<escape>API_REGIONAL<escape>,query_type:<escape>API_CONNECT<escape>}<end_function_call><start_function_response>",
-        "correct": false
       },
       {
         "query": "I need to change the billing email from old@company.com to new@company.com",
@@ -432,9 +432,9 @@
         "predicted_agent": "billing_agent",
         "predicted_arguments": {
           "request_type": "change_billing_email",
-          "urgency": "low"
         },
-        "raw_output": "<start_function_call>call:billing_agent{request_type:<escape>change_billing_email<escape>,urgency:<escape>low<escape>}<end_function_call><start_function_response>",
         "correct": true
       },
       {
@@ -448,12 +448,9 @@
           "request_type": "tax",
           "urgency": "low"
         },
-        "predicted_agent": "product_info_agent",
-        "predicted_arguments": {
-          "category": "tax_exempt",
-          "query_type": "free/nonprofit"
-        },
-        "raw_output": "<start_function_call>call:product_info_agent{category:<escape>tax_exempt<escape>,query_type:<escape>free/nonprofit<escape>}<end_function_call><start_function_response>",
         "correct": false
       },
       {
@@ -470,9 +467,9 @@
         "predicted_agent": "technical_support_agent",
         "predicted_arguments": {
           "issue_type": "crash",
-          "priority": "medium"
         },
-        "raw_output": "<start_function_call>call:technical_support_agent{issue_type:<escape>crash<escape>,priority:<escape>medium<escape>}<end_function_call><start_function_response>",
         "correct": true
       },
       {
@@ -488,10 +485,10 @@
         },
         "predicted_agent": "product_info_agent",
         "predicted_arguments": {
-          "category": "storage",
-          "query_type": "comparison"
         },
-        "raw_output": "<start_function_call>call:product_info_agent{category:<escape>storage<escape>,query_type:<escape>comparison<escape>}<end_function_call><start_function_response>",
         "correct": true
       },
       {
@@ -526,10 +523,10 @@
         },
         "predicted_agent": "product_info_agent",
         "predicted_arguments": {
-          "category": "pricing",
           "query_type": "features"
         },
-        "raw_output": "<start_function_call>call:product_info_agent{category:<escape>pricing<escape>,query_type:<escape>features<escape>}<end_function_call><start_function_response>",
         "correct": true
       },
       {
@@ -545,10 +542,10 @@
         },
         "predicted_agent": "product_info_agent",
         "predicted_arguments": {
-          "category": "cloud",
           "query_type": "comparison"
         },
-        "raw_output": "<start_function_call>call:product_info_agent{category:<escape>cloud<escape>,query_type:<escape>comparison<escape>}<end_function_call><start_function_response>",
         "correct": true
       },
       {
@@ -565,9 +562,9 @@
         "predicted_agent": "billing_agent",
         "predicted_arguments": {
           "request_type": "GDPR",
-          "urgency": "high"
         },
-        "raw_output": "<start_function_call>call:billing_agent{request_type:<escape>GDPR<escape>,urgency:<escape>high<escape>}<end_function_call><start_function_response>",
         "correct": false
       },
       {
@@ -600,13 +597,13 @@
           "request_type": null,
           "urgency": null
         },
-        "predicted_agent": "technical_support_agent",
         "predicted_arguments": {
-          "issue_type": "integration",
-          "priority": "high"
         },
-        "raw_output": "<start_function_call>call:technical_support_agent{issue_type:<escape>integration<escape>,priority:<escape>high<escape>}<end_function_call><start_function_response>",
-        "correct": false
       },
       {
         "query": "I want to upgrade from Basic to Premium, what's the price difference?",
@@ -619,13 +616,13 @@
           "request_type": "upgrade",
           "urgency": "medium"
         },
-        "predicted_agent": "technical_support_agent",
         "predicted_arguments": {
-          "issue_type": "upgrade",
-          "priority": "high"
         },
-        "raw_output": "<start_function_call>call:technical_support_agent{issue_type:<escape>upgrade<escape>,priority:<escape>high<escape>}<end_function_call><start_function_response>",
-        "correct": false
       },
       {
         "query": "What's your uptime SLA for production environments?",
@@ -638,13 +635,13 @@
           "request_type": null,
           "urgency": null
         },
-        "predicted_agent": "technical_support_agent",
         "predicted_arguments": {
-          "issue_type": "performance_exception",
-          "priority": "high"
         },
-        "raw_output": "<start_function_call>call:technical_support_agent{issue_type:<escape>performance_exception<escape>,priority:<escape>high<escape>}<end_function_call><start_function_response>",
-        "correct": false
       },
       {
         "query": "I see two charges of $99 on my credit card for this month",
@@ -659,10 +656,10 @@
         },
         "predicted_agent": "billing_agent",
         "predicted_arguments": {
-          "request_type": "billing_charge",
-          "urgency": "low"
         },
-        "raw_output": "<start_function_call>call:billing_agent{request_type:<escape>billing_charge<escape>,urgency:<escape>low<escape>}<end_function_call><start_function_response>",
         "correct": true
       },
       {
@@ -676,13 +673,13 @@
           "request_type": "discount",
           "urgency": "low"
         },
-        "predicted_agent": "product_info_agent",
         "predicted_arguments": {
-          "category": "plans",
-          "query_type": "annual_details"
         },
-        "raw_output": "<start_function_call>call:product_info_agent{category:<escape>plans<escape>,query_type:<escape>annual_details<escape>}<end_function_call><start_function_response>",
-        "correct": false
       },
       {
         "query": "The dashboard takes forever to load, sometimes over 30 seconds",
@@ -697,10 +694,10 @@
         },
         "predicted_agent": "technical_support_agent",
         "predicted_arguments": {
-          "issue_type": "api_error",
           "priority": "high"
         },
-        "raw_output": "<start_function_call>call:technical_support_agent{issue_type:<escape>api_error<escape>,priority:<escape>high<escape>}<end_function_call><start_function_response>",
         "correct": true
       },
       {
@@ -716,10 +713,10 @@
         },
         "predicted_agent": "billing_agent",
         "predicted_arguments": {
-          "request_type": "cancel_renewal",
           "urgency": "medium"
         },
-        "raw_output": "<start_function_call>call:billing_agent{request_type:<escape>cancel_renewal<escape>,urgency:<escape>medium<escape>}<end_function_call><start_function_response>",
         "correct": true
       },
       {
@@ -735,10 +732,10 @@
         },
         "predicted_agent": "billing_agent",
         "predicted_arguments": {
-          "request_type": "credit_card_expiration",
-          "urgency": "medium"
         },
-        "raw_output": "<start_function_call>call:billing_agent{request_type:<escape>credit_card_expiration<escape>,urgency:<escape>medium<escape>}<end_function_call><start_function_response>",
         "correct": true
       },
       {
@@ -754,10 +751,10 @@
         },
         "predicted_agent": "billing_agent",
         "predicted_arguments": {
-          "request_type": "payment",
-          "urgency": "low"
         },
-        "raw_output": "<start_function_call>call:billing_agent{request_type:<escape>payment<escape>,urgency:<escape>low<escape>}<end_function_call><start_function_response>",
         "correct": true
       },
       {
@@ -773,10 +770,10 @@
         },
         "predicted_agent": "technical_support_agent",
         "predicted_arguments": {
-          "issue_type": "premium",
           "priority": "high"
         },
-        "raw_output": "<start_function_call>call:technical_support_agent{issue_type:<escape>premium<escape>,priority:<escape>high<escape>}<end_function_call><start_function_response>",
         "correct": false
       },
       {
@@ -792,10 +789,10 @@
         },
         "predicted_agent": "technical_support_agent",
         "predicted_arguments": {
-          "issue_type": "error",
-          "priority": "high"
         },
-        "raw_output": "<start_function_call>call:technical_support_agent{issue_type:<escape>error<escape>,priority:<escape>high<escape>}<end_function_call><start_function_response>",
         "correct": true
       },
       {
@@ -811,10 +808,10 @@
         },
         "predicted_agent": "technical_support_agent",
         "predicted_arguments": {
-          "issue_type": "error",
           "priority": "high"
         },
-        "raw_output": "<start_function_call>call:technical_support_agent{issue_type:<escape>error<escape>,priority:<escape>high<escape>}<end_function_call><start_function_response>",
         "correct": true
       },
       {
@@ -831,9 +828,9 @@
         "predicted_agent": "billing_agent",
         "predicted_arguments": {
           "request_type": "cancel_subscription",
-          "urgency": "medium"
         },
-        "raw_output": "<start_function_call>call:billing_agent{request_type:<escape>cancel_subscription<escape>,urgency:<escape>medium<escape>}<end_function_call><start_function_response>",
         "correct": true
       }
     ]

 {
   "metadata": {
     "base_model": "google/functiongemma-270m-it",
+    "training_date": "2026-01-05 16:24:06",
     "num_train_samples": 92,
     "num_test_samples": 23,
     "num_epochs": 15,
       "total": 23
     },
     "after_training": {
+      "accuracy": 82.6086956521739,
+      "correct": 19,
       "total": 23
     },
     "improvement": {
+      "accuracy_gain": 78.26086956521739,
+      "additional_correct": 18
     }
   },
   "detailed_results": {
           "request_type": null,
           "urgency": null
         },
+        "predicted_agent": "technical_support_agent",
         "predicted_arguments": {
+          "issue_type": "API_ERROR",
+          "priority": "medium"
         },
+        "raw_output": "<start_function_call>call:technical_support_agent{issue_type:<escape>API_ERROR<escape>,priority:<escape>medium<escape>}<end_function_call><start_function_response>",
+        "correct": true
       },
       {
         "query": "I need to change the billing email from old@company.com to new@company.com",
         "predicted_agent": "billing_agent",
         "predicted_arguments": {
           "request_type": "change_billing_email",
+          "urgency": "high"
         },
+        "raw_output": "<start_function_call>call:billing_agent{request_type:<escape>change_billing_email<escape>,urgency:<escape>high<escape>}<end_function_call><start_function_response>",
         "correct": true
       },
       {
           "request_type": "tax",
           "urgency": "low"
         },
+        "predicted_agent": "NONE",
+        "predicted_arguments": {},
+        "raw_output": "I am sorry, but I cannot assist with providing tax exemption forms. My current capabilities are limited to routing technical issues to the specialized technical support team.<end_of_turn>",
         "correct": false
       },
       {
         "predicted_agent": "technical_support_agent",
         "predicted_arguments": {
           "issue_type": "crash",
+          "priority": "high"
         },
+        "raw_output": "<start_function_call>call:technical_support_agent{issue_type:<escape>crash<escape>,priority:<escape>high<escape>}<end_function_call><start_function_response>",
         "correct": true
       },
       {
         },
         "predicted_agent": "product_info_agent",
         "predicted_arguments": {
+          "category": "plans",
+          "query_type": "features"
         },
+        "raw_output": "<start_function_call>call:product_info_agent{category:<escape>plans<escape>,query_type:<escape>features<escape>}<end_function_call><start_function_response>",
         "correct": true
       },
       {
         },
         "predicted_agent": "product_info_agent",
         "predicted_arguments": {
+          "category": "volumes",
           "query_type": "features"
         },
+        "raw_output": "<start_function_call>call:product_info_agent{category:<escape>volumes<escape>,query_type:<escape>features<escape>}<end_function_call><start_function_response>",
         "correct": true
       },
       {
         },
         "predicted_agent": "product_info_agent",
         "predicted_arguments": {
+          "category": "versions",
           "query_type": "comparison"
         },
+        "raw_output": "<start_function_call>call:product_info_agent{category:<escape>versions<escape>,query_type:<escape>comparison<escape>}<end_function_call><start_function_response>",
         "correct": true
       },
       {
         "predicted_agent": "billing_agent",
         "predicted_arguments": {
           "request_type": "GDPR",
+          "urgency": "medium"
         },
+        "raw_output": "<start_function_call>call:billing_agent{request_type:<escape>GDPR<escape>,urgency:<escape>medium<escape>}<end_function_call><start_function_response>",
         "correct": false
       },
       {
           "request_type": null,
           "urgency": null
         },
+        "predicted_agent": "product_info_agent",
         "predicted_arguments": {
+          "category": "security",
+          "query_type": "features"
         },
+        "raw_output": "<start_function_call>call:product_info_agent{category:<escape>security<escape>,query_type:<escape>features<escape>}<end_function_call><start_function_response>",
+        "correct": true
       },
       {
         "query": "I want to upgrade from Basic to Premium, what's the price difference?",
           "request_type": "upgrade",
           "urgency": "medium"
         },
+        "predicted_agent": "billing_agent",
         "predicted_arguments": {
+          "request_type": "upgrade",
+          "urgency": "medium"
         },
+        "raw_output": "<start_function_call>call:billing_agent{request_type:<escape>upgrade<escape>,urgency:<escape>medium<escape>}<end_function_call><start_function_response>",
+        "correct": true
       },
       {
         "query": "What's your uptime SLA for production environments?",
           "request_type": null,
           "urgency": null
         },
+        "predicted_agent": "product_info_agent",
         "predicted_arguments": {
+          "category": "volumes",
+          "query_type": "availability"
         },
+        "raw_output": "<start_function_call>call:product_info_agent{category:<escape>volumes<escape>,query_type:<escape>availability<escape>}<end_function_call><start_function_response>",
+        "correct": true
       },
       {
         "query": "I see two charges of $99 on my credit card for this month",
         },
         "predicted_agent": "billing_agent",
         "predicted_arguments": {
+          "request_type": "charge_request",
+          "urgency": "high"
         },
+        "raw_output": "<start_function_call>call:billing_agent{request_type:<escape>charge_request<escape>,urgency:<escape>high<escape>}<end_function_call><start_function_response>",
         "correct": true
       },
       {
           "request_type": "discount",
           "urgency": "low"
         },
+        "predicted_agent": "billing_agent",
         "predicted_arguments": {
+          "request_type": "annual_plans",
+          "urgency": "low"
         },
+        "raw_output": "<start_function_call>call:billing_agent{request_type:<escape>annual_plans<escape>,urgency:<escape>low<escape>}<end_function_call><start_function_response>",
+        "correct": true
       },
       {
         "query": "The dashboard takes forever to load, sometimes over 30 seconds",
         },
         "predicted_agent": "technical_support_agent",
         "predicted_arguments": {
+          "issue_type": "dashboard",
           "priority": "high"
         },
+        "raw_output": "<start_function_call>call:technical_support_agent{issue_type:<escape>dashboard<escape>,priority:<escape>high<escape>}<end_function_call><start_function_response>",
         "correct": true
       },
       {
         },
         "predicted_agent": "billing_agent",
         "predicted_arguments": {
+          "request_type": "billing_upgrade",
           "urgency": "medium"
         },
+        "raw_output": "<start_function_call>call:billing_agent{request_type:<escape>billing_upgrade<escape>,urgency:<escape>medium<escape>}<end_function_call><start_function_response>",
         "correct": true
       },
       {
         },
         "predicted_agent": "billing_agent",
         "predicted_arguments": {
+          "request_type": "expiry",
+          "urgency": "low"
         },
+        "raw_output": "<start_function_call>call:billing_agent{request_type:<escape>expiry<escape>,urgency:<escape>low<escape>}<end_function_call><start_function_response>",
         "correct": true
       },
       {
         },
         "predicted_agent": "billing_agent",
         "predicted_arguments": {
+          "request_type": "refund",
+          "urgency": "medium"
         },
+        "raw_output": "<start_function_call>call:billing_agent{request_type:<escape>refund<escape>,urgency:<escape>medium<escape>}<end_function_call><start_function_response>",
         "correct": true
       },
       {
         },
         "predicted_agent": "technical_support_agent",
         "predicted_arguments": {
+          "issue_type": "pricing",
           "priority": "high"
         },
+        "raw_output": "<start_function_call>call:technical_support_agent{issue_type:<escape>pricing<escape>,priority:<escape>high<escape>}<end_function_call><start_function_response>",
         "correct": false
       },
       {
         },
         "predicted_agent": "technical_support_agent",
         "predicted_arguments": {
+          "issue_type": "notifications",
+          "priority": "medium"
         },
+        "raw_output": "<start_function_call>call:technical_support_agent{issue_type:<escape>notifications<escape>,priority:<escape>medium<escape>}<end_function_call><start_function_response>",
         "correct": true
       },
       {
         },
         "predicted_agent": "technical_support_agent",
         "predicted_arguments": {
+          "issue_type": "performance",
           "priority": "high"
         },
+        "raw_output": "<start_function_call>call:technical_support_agent{issue_type:<escape>performance<escape>,priority:<escape>high<escape>}<end_function_call><start_function_response>",
         "correct": true
       },
       {
         "predicted_agent": "billing_agent",
         "predicted_arguments": {
           "request_type": "cancel_subscription",
+          "urgency": "low"
         },
+        "raw_output": "<start_function_call>call:billing_agent{request_type:<escape>cancel_subscription<escape>,urgency:<escape>low<escape>}<end_function_call><start_function_response>",
         "correct": true
       }
     ]
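The summary fields in this file can be cross-checked from the raw counts. The sketch below recomputes `accuracy`, `accuracy_gain`, and `additional_correct` for both runs in the diff (14/23 before this commit, 19/23 after). The pre-training count of 1 correct out of 23 is taken from the removed README table and is consistent with the `additional_correct` values here. The function name `summarize` and the `accuracy_before` key are illustrative assumptions, not part of the original evaluation code.

```python
def summarize(correct_before: int, correct_after: int, total: int) -> dict:
    """Recompute the percentage metrics stored in evaluation_results.json."""
    accuracy_before = 100.0 * correct_before / total
    accuracy_after = 100.0 * correct_after / total
    return {
        "accuracy_before": accuracy_before,
        "accuracy": accuracy_after,
        # The gain is a difference in percentage points, not a relative change.
        "accuracy_gain": accuracy_after - accuracy_before,
        "additional_correct": correct_after - correct_before,
    }


if __name__ == "__main__":
    # Previous run (removed lines) and new run (added lines) of this diff.
    for label, correct in (("previous", 14), ("new", 19)):
        stats = summarize(correct_before=1, correct_after=correct, total=23)
        print(label, {key: round(value, 2) for key, value in stats.items()})
```

With only 23 test queries, each prediction moves accuracy by about 4.3 percentage points, so the change between the two runs corresponds to five additional correct routings.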

training_analysis_interactive.html CHANGED

The diff for this file is too large to render.