bhaiyahnsingh45 committed on
Commit a36a613 · verified · 1 Parent(s): 32fc795

Multi-agent router fine-tuned model

README.md CHANGED
@@ -1,363 +1,58 @@
  ---
- language:
- - en
- license: gemma
  library_name: transformers
  tags:
- - function-calling
- - multi-agent
- - router
- - gemma
- - fine-tuned
- - customer-support
- base_model: google/functiongemma-270m-it
- datasets:
- - bhaiyahnsingh45/multiagent-router-finetuning
- metrics:
- - accuracy
- pipeline_tag: text-generation
- widget:
- - text: "My app keeps crashing when I upload large files"
-   example_title: "Technical Issue"
- - text: "I need a refund for my subscription"
-   example_title: "Billing Request"
- - text: "What integrations do you support?"
-   example_title: "Product Info"
  ---

- # Multi-Agent Router (Fine-tuned FunctionGemma 270M)
-
- <div align="center">
- <img src="https://huggingface.co/datasets/huggingface/brand-assets/resolve/main/hf-logo.png" alt="Hugging Face" width="100"/>
-
- **Intelligent routing model for multi-agent customer support systems**
-
- [![License: Gemma](https://img.shields.io/badge/License-Gemma-blue.svg)](https://ai.google.dev/gemma/terms)
- [![Model: FunctionGemma](https://img.shields.io/badge/Model-FunctionGemma-orange.svg)](https://huggingface.co/google/functiongemma-270m-it)
- [![Dataset](https://img.shields.io/badge/Dataset-Available-green.svg)](https://huggingface.co/datasets/bhaiyahnsingh45/multiagent-router-finetuning)
- </div>
-
- ## 📋 Model Description
-
- This model is a **fine-tuned version of Google's FunctionGemma 270M** specifically trained for intelligent routing in multi-agent customer support systems. It learns to:
-
- 1. **Classify user intent** from natural language queries
- 2. **Route to the appropriate specialist agent**
- 3. **Extract relevant parameters** (priority, urgency, category)
-
- ### 🤖 Supported Agents
-
- The model routes queries to three specialized agents:
-
- | Agent | Handles | Parameters |
- |-------|---------|------------|
- | 🔧 **Technical Support** | Crashes, bugs, API errors, authentication issues | `issue_type`, `priority` |
- | 💰 **Billing** | Payments, refunds, subscriptions, invoices | `request_type`, `urgency` |
- | 📊 **Product Info** | Features, integrations, plans, compliance | `query_type`, `category` |
-
- ## 🎯 Training Details
-
- ### Base Model
- - **Model**: `google/functiongemma-270m-it`
- - **Parameters**: 270 Million
- - **Architecture**: Gemma with function calling capabilities
-
- ### Fine-tuning Configuration
- - **Training Samples**: 92
- - **Test Samples**: 23
- - **Epochs**: 15
- - **Batch Size**: 4
- - **Learning Rate**: 5e-05
- - **GPU**: NVIDIA T4 (Google Colab Free Tier)
- - **Training Time**: ~5-8 minutes
-
- ### Dataset
- Fine-tuned on [bhaiyahnsingh45/multiagent-router-finetuning](https://huggingface.co/datasets/bhaiyahnsingh45/multiagent-router-finetuning) containing 85 realistic customer support queries across three categories.
-
- ## 📊 Performance
-
- | Metric | Before Training | After Training | Improvement |
- |--------|----------------|----------------|-------------|
- | **Accuracy** | 4.3% | 60.9% | **+56.5%** |
- | **Correct Predictions** | 1/23 | 14/23 | +13 |
-
- ### Per-Agent Performance
- - **Technical Support**: High accuracy on crash reports, API errors, authentication issues
- - **Billing**: Excellent routing for refunds, payments, subscription management
- - **Product Info**: Strong performance on feature queries, integrations, compliance questions
-
- ## 🚀 Quick Start
-
- ### Installation
-
- ```bash
- pip install transformers torch
- ```
-
- ### Basic Usage
-
- ```python
- from transformers import AutoTokenizer, AutoModelForCausalLM
- import re
- import json
-
- # Load model and tokenizer
- model_name = "bhaiyahnsingh45/functiongemma-multiagent-router"
- tokenizer = AutoTokenizer.from_pretrained(model_name)
- model = AutoModelForCausalLM.from_pretrained(
-     model_name,
-     device_map="auto",
-     torch_dtype="auto"
- )
-
- # Define your agent tools
- from transformers.utils import get_json_schema
-
- def technical_support_agent(issue_type: str, priority: str) -> str:
-     """
-     Routes technical issues to specialized support team.
-
-     Args:
-         issue_type: Type of technical issue (crash, authentication, performance, api_error, etc.)
-         priority: Priority level (low, medium, high)
-     """
-     return f"Routing to Technical Support: {issue_type} with {priority} priority"
-
- def billing_agent(request_type: str, urgency: str) -> str:
-     """
-     Routes billing and payment queries.
-
-     Args:
-         request_type: Type of request (refund, invoice, upgrade, cancellation, etc.)
-         urgency: How urgent (low, medium, high)
-     """
-     return f"Routing to Billing: {request_type} with {urgency} urgency"
-
- def product_info_agent(query_type: str, category: str) -> str:
-     """
-     Routes product information queries.
-
-     Args:
-         query_type: Type of query (features, comparison, integrations, limits, etc.)
-         category: Category (plans, storage, mobile, security, etc.)
-     """
-     return f"Routing to Product Info: {query_type} about {category}"
-
- # Get tool schemas
- AGENT_TOOLS = [
-     get_json_schema(technical_support_agent),
-     get_json_schema(billing_agent),
-     get_json_schema(product_info_agent)
- ]
-
- # System message
- SYSTEM_MSG = "You are an intelligent routing agent that directs customer queries to the appropriate specialized agent."
-
- # Function to route queries
- def route_query(user_query: str):
-     """Route a user query to the appropriate agent"""
-
-     messages = [
-         {"role": "developer", "content": SYSTEM_MSG},
-         {"role": "user", "content": user_query}
-     ]
-
-     # Format prompt
-     inputs = tokenizer.apply_chat_template(
-         messages,
-         tools=AGENT_TOOLS,
-         add_generation_prompt=True,
-         return_dict=True,
-         return_tensors="pt"
-     )
-
-     # Generate
-     outputs = model.generate(
-         **inputs.to(model.device),
-         max_new_tokens=128,
-         pad_token_id=tokenizer.eos_token_id
-     )
-
-     # Decode
-     result = tokenizer.decode(
-         outputs[0][len(inputs["input_ids"][0]):],
-         skip_special_tokens=False
-     )

-     return result

- # Example usage
- query = "My app crashes when I try to upload large files"
- result = route_query(query)
- print(f"Query: {query}")
- print(f"Routing: {result}")
- ```
-
- ### Expected Output Format
-
- ```
- <start_function_call>call:technical_support_agent{issue_type:crash,priority:high}<end_function_call>
- ```
-
- ## 💡 Usage Examples
-
- ### Example 1: Technical Issue
- ```python
- query = "I'm getting a 500 error when calling the API"
- result = route_query(query)
- # Output: technical_support_agent(issue_type="api_error", priority="high")
- ```
-
- ### Example 2: Billing Request
- ```python
- query = "I need a refund for my annual subscription"
- result = route_query(query)
- # Output: billing_agent(request_type="refund", urgency="medium")
- ```
-
- ### Example 3: Product Question
- ```python
- query = "What integrations do you support for project management?"
- result = route_query(query)
- # Output: product_info_agent(query_type="integrations", category="project_management")
- ```
-
- ## 🔧 Advanced Usage: Parse Function Calls

  ```python
- def parse_function_call(output: str) -> dict:
-     """Extract function name and arguments from model output"""
-
-     pattern = r'<start_function_call>call:(\w+)\{([^}]+)\}<end_function_call>'
-     match = re.search(pattern, output)
-
-     if match:
-         func_name = match.group(1)
-         params_str = match.group(2)
-
-         # Parse parameters
-         params = {}
-         param_pattern = r'(\w+):(?:<escape>(.*?)<escape>|([^,{}]+))'
-         for p_match in re.finditer(param_pattern, params_str):
-             key = p_match.group(1)
-             val = p_match.group(2) or p_match.group(3).strip()
-             params[key] = val
-
-         return {
-             "agent": func_name,
-             "parameters": params
-         }

-     return {"agent": "unknown", "parameters": {}}
-
- # Use it
- query = "I was charged twice this month"
- result = route_query(query)
- parsed = parse_function_call(result)
- print(parsed)
- # Output: {'agent': 'billing_agent', 'parameters': {'request_type': 'dispute', 'urgency': 'high'}}
  ```

- ## 🏗️ Integration Example
-
- ```python
- class MultiAgentRouter:
-     def __init__(self, model_name: str):
-         self.tokenizer = AutoTokenizer.from_pretrained(model_name)
-         self.model = AutoModelForCausalLM.from_pretrained(
-             model_name,
-             device_map="auto",
-             torch_dtype="auto"
-         )
-         self.system_msg = "You are an intelligent routing agent..."
-
-     def route(self, query: str) -> dict:
-         """Route query and return agent + parameters"""
-         messages = [
-             {"role": "developer", "content": self.system_msg},
-             {"role": "user", "content": query}
-         ]
-
-         inputs = self.tokenizer.apply_chat_template(
-             messages,
-             tools=AGENT_TOOLS,
-             add_generation_prompt=True,
-             return_dict=True,
-             return_tensors="pt"
-         )
-
-         outputs = self.model.generate(
-             **inputs.to(self.model.device),
-             max_new_tokens=128,
-             pad_token_id=self.tokenizer.eos_token_id
-         )
-
-         result = self.tokenizer.decode(
-             outputs[0][len(inputs["input_ids"][0]):],
-             skip_special_tokens=False
-         )
-
-         return parse_function_call(result)
-
- # Usage
- router = MultiAgentRouter("bhaiyahnsingh45/functiongemma-multiagent-router")
- routing = router.route("My payment failed but I don't know why")
- print(f"Route to: {routing['agent']}")
- print(f"Parameters: {routing['parameters']}")
- ```

- ## 📈 Evaluation

- The model was evaluated on a held-out test set of 23 queries:

- - **Routing Accuracy**: 60.9%
- - **False Positive Rate**: 39.1%
- - **Average Inference Time**: ~50ms on T4 GPU

- ## ⚠️ Limitations

- 1. **Language**: Currently supports English only
- 2. **Domain**: Optimized for customer support; may need fine-tuning for other domains
- 3. **Agents**: Limited to 3 agent types (can be extended with additional training)
- 4. **Context**: Works best with single-turn queries; multi-turn conversations may need context handling
- 5. **Edge Cases**: Ambiguous queries may require fallback logic

- ## 🔮 Future Improvements

- - [ ] Add support for more languages
- - [ ] Expand to 5+ agent types (sales, feedback, onboarding)
- - [ ] Handle multi-turn conversations
- - [ ] Add confidence scores for routing decisions
- - [ ] Support for compound queries requiring multiple agents

- ## 📝 Citation

  ```bibtex
- @misc{functiongemma_multiagent_router,
- author = {Bhaiya Singh},
- title = {Multi-Agent Router: Fine-tuned FunctionGemma for Customer Support},
- year = {2025},
- publisher = {Hugging Face},
- howpublished = {\url{https://huggingface.co/bhaiyahnsingh45/functiongemma-multiagent-router}}
  }
- ```
-
- ## 📄 License
-
- This model inherits the [Gemma License](https://ai.google.dev/gemma/terms) from the base model.
-
- ## 🙏 Acknowledgments
-
- - Base model: [google/functiongemma-270m-it](https://huggingface.co/google/functiongemma-270m-it)
- - Training framework: [Hugging Face TRL](https://github.com/huggingface/trl)
- - Dataset: [bhaiyahnsingh45/multiagent-router-finetuning](https://huggingface.co/datasets/bhaiyahnsingh45/multiagent-router-finetuning)
-
- ## 📧 Contact
-
- For questions, issues, or collaboration opportunities:
- - Open an issue on the [model repository](https://huggingface.co/bhaiyahnsingh45/functiongemma-multiagent-router)
- - Dataset issues: [dataset repository](https://huggingface.co/datasets/bhaiyahnsingh45/multiagent-router-finetuning)
-
- ---
-
- **Built with ❤️ using FunctionGemma and Hugging Face Transformers**
 
  ---
+ base_model: google/functiongemma-270m-it
  library_name: transformers
+ model_name: functiongemma-multiagent-router
  tags:
+ - generated_from_trainer
+ - sft
+ - trl
+ licence: license
  ---

+ # Model Card for functiongemma-multiagent-router

+ This model is a fine-tuned version of [google/functiongemma-270m-it](https://huggingface.co/google/functiongemma-270m-it).
+ It has been trained using [TRL](https://github.com/huggingface/trl).

+ ## Quick start

  ```python
+ from transformers import pipeline

+ question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
+ generator = pipeline("text-generation", model="bhaiyahnsingh45/functiongemma-multiagent-router", device="cuda")
+ output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
+ print(output["generated_text"])
  ```

+ ## Training procedure

+


+ This model was trained with SFT.

+ ### Framework versions

+ - TRL: 0.26.2
+ - Transformers: 4.57.3
+ - Pytorch: 2.9.0+cu126
+ - Datasets: 4.0.0
+ - Tokenizers: 0.22.1

+ ## Citations



+ Cite TRL as:
+
  ```bibtex
+ @misc{vonwerra2022trl,
+ title = {{TRL: Transformer Reinforcement Learning}},
+ author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
+ year = 2020,
+ journal = {GitHub repository},
+ publisher = {GitHub},
+ howpublished = {\url{https://github.com/huggingface/trl}}
  }
+ ```
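The new card's Quick start sends a general-knowledge question through `pipeline` and no longer shows how to turn the router's reply into a routing decision. The `raw_output` strings logged in `evaluation_results.json` follow the form `<start_function_call>call:NAME{key:<escape>value<escape>,...}<end_function_call>`, and after this commit at least one query (the tax-exemption request) gets a plain-text refusal instead of a call. A minimal sketch of a parser for that format follows; `parse_route` is a hypothetical helper, not part of Transformers or of this repository, and it assumes the call layout seen in these logs while also accepting the bare `key:value` form shown in the removed card's expected-output example.

```python
import re
from typing import Optional

# One FunctionGemma-style call, e.g.
# <start_function_call>call:billing_agent{request_type:<escape>refund<escape>,urgency:<escape>medium<escape>}<end_function_call>
# The lazy body group stops at the first "}" that is directly followed by <end_function_call>.
CALL_RE = re.compile(
    r"<start_function_call>call:(\w+)\{(.*?)\}<end_function_call>", re.DOTALL
)

# One argument: key:<escape>value<escape>, or a bare key:value without <escape> tokens.
ARG_RE = re.compile(r"(\w+):(?:<escape>(.*?)<escape>|([^,}]*))", re.DOTALL)


def parse_route(raw_output: str) -> Optional[dict]:
    """Hypothetical helper: extract the first routing call from a raw model reply.

    Returns {"agent": name, "arguments": {...}}, or None when the reply
    contains no function call (for example a plain-text refusal).
    """
    call = CALL_RE.search(raw_output)
    if call is None:
        return None
    agent, body = call.group(1), call.group(2)
    arguments = {}
    for arg in ARG_RE.finditer(body):
        key = arg.group(1)
        # group(2) holds an <escape>-wrapped value; group(3) the bare fallback.
        value = arg.group(2) if arg.group(2) is not None else arg.group(3).strip()
        arguments[key] = value
    return {"agent": agent, "arguments": arguments}


# Usage with a raw_output string from this commit's evaluation_results.json
routed = parse_route(
    "<start_function_call>call:billing_agent{request_type:<escape>refund<escape>,"
    "urgency:<escape>medium<escape>}<end_function_call><start_function_response>"
)
print(routed)  # {'agent': 'billing_agent', 'arguments': {'request_type': 'refund', 'urgency': 'medium'}}

# A shortened copy of the plain-text refusal logged for the tax-exemption query
refusal = parse_route("I am sorry, but I cannot assist with providing tax exemption forms.<end_of_turn>")
print(refusal)  # None
```

Returning `None` instead of a placeholder agent such as the removed card's `"unknown"` forces the caller to choose an explicit fallback for replies that contain no call.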
 
evaluation_results.json CHANGED
@@ -1,7 +1,7 @@
  {
  "metadata": {
  "base_model": "google/functiongemma-270m-it",
- "training_date": "2026-01-05 16:17:51",
+ "training_date": "2026-01-05 16:24:06",
  "num_train_samples": 92,
  "num_test_samples": 23,
  "num_epochs": 15,
@@ -15,13 +15,13 @@
  "total": 23
  },
  "after_training": {
- "accuracy": 60.86956521739131,
- "correct": 14,
+ "accuracy": 82.6086956521739,
+ "correct": 19,
  "total": 23
  },
  "improvement": {
- "accuracy_gain": 56.52173913043479,
- "additional_correct": 13
+ "accuracy_gain": 78.26086956521739,
+ "additional_correct": 18
  }
  },
  "detailed_results": {
@@ -410,13 +410,13 @@
  "request_type": null,
  "urgency": null
  },
- "predicted_agent": "product_info_agent",
+ "predicted_agent": "technical_support_agent",
  "predicted_arguments": {
- "category": "API_REGIONAL",
- "query_type": "API_CONNECT"
+ "issue_type": "API_ERROR",
+ "priority": "medium"
  },
- "raw_output": "<start_function_call>call:product_info_agent{category:<escape>API_REGIONAL<escape>,query_type:<escape>API_CONNECT<escape>}<end_function_call><start_function_response>",
- "correct": false
+ "raw_output": "<start_function_call>call:technical_support_agent{issue_type:<escape>API_ERROR<escape>,priority:<escape>medium<escape>}<end_function_call><start_function_response>",
+ "correct": true
  },
  {
  "query": "I need to change the billing email from old@company.com to new@company.com",
@@ -432,9 +432,9 @@
  "predicted_agent": "billing_agent",
  "predicted_arguments": {
  "request_type": "change_billing_email",
- "urgency": "low"
+ "urgency": "high"
  },
- "raw_output": "<start_function_call>call:billing_agent{request_type:<escape>change_billing_email<escape>,urgency:<escape>low<escape>}<end_function_call><start_function_response>",
+ "raw_output": "<start_function_call>call:billing_agent{request_type:<escape>change_billing_email<escape>,urgency:<escape>high<escape>}<end_function_call><start_function_response>",
  "correct": true
  },
  {
@@ -448,12 +448,9 @@
  "request_type": "tax",
  "urgency": "low"
  },
- "predicted_agent": "product_info_agent",
- "predicted_arguments": {
- "category": "tax_exempt",
- "query_type": "free/nonprofit"
- },
- "raw_output": "<start_function_call>call:product_info_agent{category:<escape>tax_exempt<escape>,query_type:<escape>free/nonprofit<escape>}<end_function_call><start_function_response>",
+ "predicted_agent": "NONE",
+ "predicted_arguments": {},
+ "raw_output": "I am sorry, but I cannot assist with providing tax exemption forms. My current capabilities are limited to routing technical issues to the specialized technical support team.<end_of_turn>",
  "correct": false
  },
  {
@@ -470,9 +467,9 @@
  "predicted_agent": "technical_support_agent",
  "predicted_arguments": {
  "issue_type": "crash",
- "priority": "medium"
+ "priority": "high"
  },
- "raw_output": "<start_function_call>call:technical_support_agent{issue_type:<escape>crash<escape>,priority:<escape>medium<escape>}<end_function_call><start_function_response>",
+ "raw_output": "<start_function_call>call:technical_support_agent{issue_type:<escape>crash<escape>,priority:<escape>high<escape>}<end_function_call><start_function_response>",
  "correct": true
  },
  {
@@ -488,10 +485,10 @@
  },
  "predicted_agent": "product_info_agent",
  "predicted_arguments": {
- "category": "storage",
- "query_type": "comparison"
+ "category": "plans",
+ "query_type": "features"
  },
- "raw_output": "<start_function_call>call:product_info_agent{category:<escape>storage<escape>,query_type:<escape>comparison<escape>}<end_function_call><start_function_response>",
+ "raw_output": "<start_function_call>call:product_info_agent{category:<escape>plans<escape>,query_type:<escape>features<escape>}<end_function_call><start_function_response>",
  "correct": true
  },
  {
@@ -526,10 +523,10 @@
  },
  "predicted_agent": "product_info_agent",
  "predicted_arguments": {
- "category": "pricing",
+ "category": "volumes",
  "query_type": "features"
  },
- "raw_output": "<start_function_call>call:product_info_agent{category:<escape>pricing<escape>,query_type:<escape>features<escape>}<end_function_call><start_function_response>",
+ "raw_output": "<start_function_call>call:product_info_agent{category:<escape>volumes<escape>,query_type:<escape>features<escape>}<end_function_call><start_function_response>",
  "correct": true
  },
  {
@@ -545,10 +542,10 @@
  },
  "predicted_agent": "product_info_agent",
  "predicted_arguments": {
- "category": "cloud",
+ "category": "versions",
  "query_type": "comparison"
  },
- "raw_output": "<start_function_call>call:product_info_agent{category:<escape>cloud<escape>,query_type:<escape>comparison<escape>}<end_function_call><start_function_response>",
+ "raw_output": "<start_function_call>call:product_info_agent{category:<escape>versions<escape>,query_type:<escape>comparison<escape>}<end_function_call><start_function_response>",
  "correct": true
  },
  {
@@ -565,9 +562,9 @@
  "predicted_agent": "billing_agent",
  "predicted_arguments": {
  "request_type": "GDPR",
- "urgency": "high"
+ "urgency": "medium"
  },
- "raw_output": "<start_function_call>call:billing_agent{request_type:<escape>GDPR<escape>,urgency:<escape>high<escape>}<end_function_call><start_function_response>",
+ "raw_output": "<start_function_call>call:billing_agent{request_type:<escape>GDPR<escape>,urgency:<escape>medium<escape>}<end_function_call><start_function_response>",
  "correct": false
  },
  {
@@ -600,13 +597,13 @@
  "request_type": null,
  "urgency": null
  },
- "predicted_agent": "technical_support_agent",
+ "predicted_agent": "product_info_agent",
  "predicted_arguments": {
- "issue_type": "integration",
- "priority": "high"
+ "category": "security",
+ "query_type": "features"
  },
- "raw_output": "<start_function_call>call:technical_support_agent{issue_type:<escape>integration<escape>,priority:<escape>high<escape>}<end_function_call><start_function_response>",
- "correct": false
+ "raw_output": "<start_function_call>call:product_info_agent{category:<escape>security<escape>,query_type:<escape>features<escape>}<end_function_call><start_function_response>",
+ "correct": true
  },
  {
  "query": "I want to upgrade from Basic to Premium, what's the price difference?",
@@ -619,13 +616,13 @@
  "request_type": "upgrade",
  "urgency": "medium"
  },
- "predicted_agent": "technical_support_agent",
+ "predicted_agent": "billing_agent",
  "predicted_arguments": {
- "issue_type": "upgrade",
- "priority": "high"
+ "request_type": "upgrade",
+ "urgency": "medium"
  },
- "raw_output": "<start_function_call>call:technical_support_agent{issue_type:<escape>upgrade<escape>,priority:<escape>high<escape>}<end_function_call><start_function_response>",
- "correct": false
+ "raw_output": "<start_function_call>call:billing_agent{request_type:<escape>upgrade<escape>,urgency:<escape>medium<escape>}<end_function_call><start_function_response>",
+ "correct": true
  },
  {
  "query": "What's your uptime SLA for production environments?",
@@ -638,13 +635,13 @@
  "request_type": null,
  "urgency": null
  },
- "predicted_agent": "technical_support_agent",
+ "predicted_agent": "product_info_agent",
  "predicted_arguments": {
- "issue_type": "performance_exception",
- "priority": "high"
+ "category": "volumes",
+ "query_type": "availability"
  },
- "raw_output": "<start_function_call>call:technical_support_agent{issue_type:<escape>performance_exception<escape>,priority:<escape>high<escape>}<end_function_call><start_function_response>",
- "correct": false
+ "raw_output": "<start_function_call>call:product_info_agent{category:<escape>volumes<escape>,query_type:<escape>availability<escape>}<end_function_call><start_function_response>",
+ "correct": true
  },
  {
  "query": "I see two charges of $99 on my credit card for this month",
@@ -659,10 +656,10 @@
  },
  "predicted_agent": "billing_agent",
  "predicted_arguments": {
- "request_type": "billing_charge",
- "urgency": "low"
+ "request_type": "charge_request",
+ "urgency": "high"
  },
- "raw_output": "<start_function_call>call:billing_agent{request_type:<escape>billing_charge<escape>,urgency:<escape>low<escape>}<end_function_call><start_function_response>",
+ "raw_output": "<start_function_call>call:billing_agent{request_type:<escape>charge_request<escape>,urgency:<escape>high<escape>}<end_function_call><start_function_response>",
  "correct": true
  },
  {
@@ -676,13 +673,13 @@
  "request_type": "discount",
  "urgency": "low"
  },
- "predicted_agent": "product_info_agent",
+ "predicted_agent": "billing_agent",
  "predicted_arguments": {
- "category": "plans",
- "query_type": "annual_details"
+ "request_type": "annual_plans",
+ "urgency": "low"
  },
- "raw_output": "<start_function_call>call:product_info_agent{category:<escape>plans<escape>,query_type:<escape>annual_details<escape>}<end_function_call><start_function_response>",
- "correct": false
+ "raw_output": "<start_function_call>call:billing_agent{request_type:<escape>annual_plans<escape>,urgency:<escape>low<escape>}<end_function_call><start_function_response>",
+ "correct": true
  },
  {
  "query": "The dashboard takes forever to load, sometimes over 30 seconds",
@@ -697,10 +694,10 @@
  },
  "predicted_agent": "technical_support_agent",
  "predicted_arguments": {
- "issue_type": "api_error",
+ "issue_type": "dashboard",
  "priority": "high"
  },
- "raw_output": "<start_function_call>call:technical_support_agent{issue_type:<escape>api_error<escape>,priority:<escape>high<escape>}<end_function_call><start_function_response>",
+ "raw_output": "<start_function_call>call:technical_support_agent{issue_type:<escape>dashboard<escape>,priority:<escape>high<escape>}<end_function_call><start_function_response>",
  "correct": true
  },
  {
@@ -716,10 +713,10 @@
  },
  "predicted_agent": "billing_agent",
  "predicted_arguments": {
- "request_type": "cancel_renewal",
+ "request_type": "billing_upgrade",
  "urgency": "medium"
  },
- "raw_output": "<start_function_call>call:billing_agent{request_type:<escape>cancel_renewal<escape>,urgency:<escape>medium<escape>}<end_function_call><start_function_response>",
+ "raw_output": "<start_function_call>call:billing_agent{request_type:<escape>billing_upgrade<escape>,urgency:<escape>medium<escape>}<end_function_call><start_function_response>",
  "correct": true
  },
  {
@@ -735,10 +732,10 @@
  },
  "predicted_agent": "billing_agent",
  "predicted_arguments": {
- "request_type": "credit_card_expiration",
- "urgency": "medium"
+ "request_type": "expiry",
+ "urgency": "low"
  },
- "raw_output": "<start_function_call>call:billing_agent{request_type:<escape>credit_card_expiration<escape>,urgency:<escape>medium<escape>}<end_function_call><start_function_response>",
+ "raw_output": "<start_function_call>call:billing_agent{request_type:<escape>expiry<escape>,urgency:<escape>low<escape>}<end_function_call><start_function_response>",
  "correct": true
  },
  {
@@ -754,10 +751,10 @@
  },
  "predicted_agent": "billing_agent",
  "predicted_arguments": {
- "request_type": "payment",
- "urgency": "low"
+ "request_type": "refund",
+ "urgency": "medium"
  },
- "raw_output": "<start_function_call>call:billing_agent{request_type:<escape>payment<escape>,urgency:<escape>low<escape>}<end_function_call><start_function_response>",
+ "raw_output": "<start_function_call>call:billing_agent{request_type:<escape>refund<escape>,urgency:<escape>medium<escape>}<end_function_call><start_function_response>",
  "correct": true
  },
  {
@@ -773,10 +770,10 @@
  },
  "predicted_agent": "technical_support_agent",
  "predicted_arguments": {
- "issue_type": "premium",
+ "issue_type": "pricing",
  "priority": "high"
  },
- "raw_output": "<start_function_call>call:technical_support_agent{issue_type:<escape>premium<escape>,priority:<escape>high<escape>}<end_function_call><start_function_response>",
+ "raw_output": "<start_function_call>call:technical_support_agent{issue_type:<escape>pricing<escape>,priority:<escape>high<escape>}<end_function_call><start_function_response>",
  "correct": false
  },
  {
@@ -792,10 +789,10 @@
  },
  "predicted_agent": "technical_support_agent",
  "predicted_arguments": {
- "issue_type": "error",
- "priority": "high"
+ "issue_type": "notifications",
+ "priority": "medium"
  },
- "raw_output": "<start_function_call>call:technical_support_agent{issue_type:<escape>error<escape>,priority:<escape>high<escape>}<end_function_call><start_function_response>",
+ "raw_output": "<start_function_call>call:technical_support_agent{issue_type:<escape>notifications<escape>,priority:<escape>medium<escape>}<end_function_call><start_function_response>",
  "correct": true
  },
  {
@@ -811,10 +808,10 @@
  },
  "predicted_agent": "technical_support_agent",
  "predicted_arguments": {
- "issue_type": "error",
+ "issue_type": "performance",
  "priority": "high"
  },
- "raw_output": "<start_function_call>call:technical_support_agent{issue_type:<escape>error<escape>,priority:<escape>high<escape>}<end_function_call><start_function_response>",
+ "raw_output": "<start_function_call>call:technical_support_agent{issue_type:<escape>performance<escape>,priority:<escape>high<escape>}<end_function_call><start_function_response>",
  "correct": true
  },
  {
@@ -831,9 +828,9 @@
  "predicted_agent": "billing_agent",
  "predicted_arguments": {
  "request_type": "cancel_subscription",
- "urgency": "medium"
+ "urgency": "low"
  },
- "raw_output": "<start_function_call>call:billing_agent{request_type:<escape>cancel_subscription<escape>,urgency:<escape>medium<escape>}<end_function_call><start_function_response>",
+ "raw_output": "<start_function_call>call:billing_agent{request_type:<escape>cancel_subscription<escape>,urgency:<escape>low<escape>}<end_function_call><start_function_response>",
  "correct": true
  }
  ]
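The summary block in `evaluation_results.json` can be checked by hand. `accuracy` is `correct / total` expressed as a percentage, and `accuracy_gain` is a difference in percentage points against the pre-training baseline of 1/23 that the removed model card reports. The detailed-results hunks account for the whole change: five entries flip `correct` from `false` to `true`, and none flip back. A minimal sketch of that arithmetic, with the counts copied from this diff and each flip keyed by the line number of its `"correct"` field in the new file:

```python
def accuracy_pct(correct: int, total: int) -> float:
    """Accuracy as a percentage of the held-out test set."""
    return 100 * correct / total


TOTAL = 23     # num_test_samples
BASELINE = 1   # correct before fine-tuning (1/23), as reported in the removed model card
PREVIOUS = 14  # after_training.correct before this commit
CURRENT = 19   # after_training.correct after this commit

# (old, new) values of every "correct" flag this diff changes, keyed by the
# line number of that "correct" field in the new evaluation_results.json.
FLIPS = {
    419: (False, True),
    606: (False, True),
    625: (False, True),
    644: (False, True),
    682: (False, True),
}

net_flips = sum(int(new) - int(old) for old, new in FLIPS.values())
assert PREVIOUS + net_flips == CURRENT  # 14 + 5 == 19

for label, correct in (("previous run", PREVIOUS), ("this commit", CURRENT)):
    acc = accuracy_pct(correct, TOTAL)
    # accuracy_gain is a difference in percentage points, not a relative change
    gain = acc - accuracy_pct(BASELINE, TOTAL)
    print(f"{label}: {correct}/{TOTAL} = {acc:.2f}% "
          f"(+{gain:.2f} points, +{correct - BASELINE} correct)")
```

Running it prints `previous run: 14/23 = 60.87% (+56.52 points, +13 correct)` and `this commit: 19/23 = 82.61% (+78.26 points, +18 correct)`, matching the removed and added summary values above.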
training_analysis_interactive.html CHANGED
The diff for this file is too large to render. See raw diff