---
library_name: transformers
license: mit
task_categories:
- text-generation
language:
- en
tags:
- agent
- Agentic Learning
- tool use
- BFCL
---

[Model Collection](https://huggingface.co/collections/prem-research/funcdex) | [Dataset](https://huggingface.co/datasets/prem-research/Funcdex-MT-Function-Calling) | [Synthesizer Code](https://github.com/prem-research/Funcdex-Synthesizer) | [Prem AI](https://www.premai.io/)

# Funcdex-0.6B-gmail_googlecalendar

<div align="center">
<img src="assets/funcdex_hero.png" alt="Funcdex Hero" width="70%">
</div>

Funcdex-0.6B is a research-preview model by Prem Labs. It is a LoRA finetune of Qwen3-0.6B (with thinking disabled), trained on a mix of [Funcdex-MT-Function-Calling](https://huggingface.co/datasets/prem-research/Funcdex-MT-Function-Calling), instruction-following, and single-turn function-calling datasets.

This model excels at multi-turn function calling with tools from `gmail` and `googlecalendar`.

The code used to generate the dataset is available [here](https://github.com/prem-research/Funcdex-Synthesizer).

# Evaluation

<div align="center">
<img src="assets/line_plot.png" alt="Line Plot" width="80%">
</div>

## Results

### BFCL v3
- We filtered the BFCL v3 examples relevant to our toolkits/bundles and report performance on that subset.
- The filtered set contains only 83 examples, further emphasizing the need for workflow/toolkit-specialized models.

<table border="1" class="dataframe">
<thead>
<tr style="text-align: center;">
<th>LLM</th>
<th>Accuracy</th>
</tr>
</thead>
<tbody>
<tr style="text-align: center;">
<td>GPT-5 Mini<br>(medium)</td>
<td>0.71</td>
</tr>
<tr style="text-align: center;">
<td>Qwen3-1.7B</td>
<td>0.82</td>
</tr>
<tr style="text-align: center;">
<td><strong><a href="https://huggingface.co/prem-research/Funcdex-1.7B">Funcdex-1.7B</a></strong></td>
<td><strong>0.86</strong></td>
</tr>
</tbody>
</table>

### Funcdex-MT: Overall Performance

<table border="1" class="dataframe">
<thead>
<tr style="text-align: center;">
<th>LLM</th>
<th>Exact Match</th>
<th>String Ratio</th>
<th>Total Cost ($)</th>
</tr>
</thead>
<tbody>
<tr style="text-align: center;">
<td>GPT-OSS-120B<br>(medium)</td>
<td>0.35</td>
<td>0.51</td>
<td>9.32</td>
</tr>
<tr style="text-align: center;">
<td>GPT-5 Mini<br>(medium)</td>
<td>0.35</td>
<td>0.58</td>
<td>99.71</td>
</tr>
<tr style="text-align: center;">
<td>GPT-5<br>(minimal)</td>
<td>0.18</td>
<td>0.59</td>
<td>205.45</td>
</tr>
<tr style="text-align: center;">
<td>Qwen3-0.6B</td>
<td>0.27</td>
<td>0.59</td>
<td>2.83</td>
</tr>
<tr style="text-align: center;">
<td>Qwen3-1.7B</td>
<td>0.27</td>
<td>0.69</td>
<td>5.73</td>
</tr>
<tr style="text-align: center;">
<td><strong><a href="https://huggingface.co/collections/prem-research/funcdex">Funcdex-0.6B</a></strong></td>
<td><strong>0.39</strong></td>
<td><strong>0.70</strong></td>
<td><strong>0.19</strong></td>
</tr>
<tr style="text-align: center;">
<td><strong><a href="https://huggingface.co/prem-research/Funcdex-1.7B">Funcdex-1.7B</a></strong></td>
<td><strong>0.43</strong></td>
<td><strong>0.81</strong></td>
<td>5.64</td>
</tr>
</tbody>
</table>

### Funcdex-MT: Toolkit-Level Performance

<table border="1" class="dataframe">
<thead>
<tr style="text-align: center;">
<th rowspan="2">Toolkit</th>
<th colspan="2">GPT-OSS-120B<br>(medium)</th>
<th colspan="2">GPT-5<br>(minimal)</th>
<th colspan="2">GPT-5 Mini<br>(medium)</th>
<th colspan="2">Qwen3-0.6B</th>
<th colspan="3">Funcdex-0.6B</th>
<th colspan="2">Qwen3-1.7B</th>
<th colspan="3">Funcdex-1.7B</th>
</tr>
<tr style="text-align: center;">
<th>EM</th>
<th>SR</th>
<th>EM</th>
<th>SR</th>
<th>EM</th>
<th>SR</th>
<th>EM</th>
<th>SR</th>
<th>EM</th>
<th>SR</th>
<th>LoRA Checkpoint</th>
<th>EM</th>
<th>SR</th>
<th>EM</th>
<th>SR</th>
<th>LoRA Checkpoint</th>
</tr>
</thead>
<tbody>
<tr style="text-align: center;">
<td><img src="assets/icons/asana.png" width="20" height="20" style="vertical-align: middle;"/> Asana</td>
<td>0.38</td>
<td>0.47</td>
<td>0.12</td>
<td>0.68</td>
<td>0.49</td>
<td>0.71</td>
<td>0.33</td>
<td>0.63</td>
<td>0.46</td>
<td>0.69</td>
<td><a href="https://huggingface.co/prem-research/Funcdex-0.6B-asana">🤗</a></td>
<td>0.30</td>
<td>0.79</td>
<td>0.52</td>
<td>0.82</td>
<td rowspan="10"><a href="https://huggingface.co/prem-research/Funcdex-1.7B">🤗</a></td>
</tr>
<tr style="text-align: center;">
<td><img src="assets/icons/calendly.png" width="20" height="20" style="vertical-align: middle;"/> Calendly</td>
<td>0.47</td>
<td>0.56</td>
<td>0.41</td>
<td>0.63</td>
<td>0.41</td>
<td>0.56</td>
<td>0.44</td>
<td>0.66</td>
<td>0.54</td>
<td>0.78</td>
<td><a href="https://huggingface.co/prem-research/Funcdex-0.6B-calendly">🤗</a></td>
<td>0.47</td>
<td>0.74</td>
<td>0.54</td>
<td>0.86</td>
</tr>
<tr style="text-align: center;">
<td><img src="assets/icons/gmail.png" width="20" height="20" style="vertical-align: middle;"/> Gmail</td>
<td>0.48</td>
<td>0.70</td>
<td>0.24</td>
<td>0.69</td>
<td>0.50</td>
<td>0.73</td>
<td>0.27</td>
<td>0.61</td>
<td>0.47</td>
<td>0.72</td>
<td><a href="https://huggingface.co/prem-research/Funcdex-0.6B-gmail">🤗</a></td>
<td>0.31</td>
<td>0.73</td>
<td>0.53</td>
<td>0.83</td>
</tr>
<tr style="text-align: center;">
<td><img src="assets/icons/google-calendar.png" width="20" height="20" style="vertical-align: middle;"/> Calendar</td>
<td>0.27</td>
<td>0.52</td>
<td>0.20</td>
<td>0.50</td>
<td>0.21</td>
<td>0.51</td>
<td>0.21</td>
<td>0.53</td>
<td>0.39</td>
<td>0.74</td>
<td><a href="https://huggingface.co/prem-research/Funcdex-0.6B-googlecalendar">🤗</a></td>
<td>0.23</td>
<td>0.64</td>
<td>0.47</td>
<td>0.83</td>
</tr>
<tr style="text-align: center;">
<td><img src="assets/icons/docs.png" width="20" height="20" style="vertical-align: middle;"/> Docs</td>
<td>0.19</td>
<td>0.38</td>
<td>0.07</td>
<td>0.49</td>
<td>0.18</td>
<td>0.46</td>
<td>0.07</td>
<td>0.58</td>
<td>0.13</td>
<td>0.64</td>
<td><a href="https://huggingface.co/prem-research/Funcdex-0.6B-googledocs">🤗</a></td>
<td>0.11</td>
<td>0.62</td>
<td>0.18</td>
<td>0.79</td>
</tr>
<tr style="text-align: center;">
<td><img src="assets/icons/google-drive.png" width="20" height="20" style="vertical-align: middle;"/> Drive</td>
<td>0.34</td>
<td>0.52</td>
<td>0.19</td>
<td>0.61</td>
<td>0.38</td>
<td>0.58</td>
<td>0.26</td>
<td>0.65</td>
<td>0.40</td>
<td>0.75</td>
<td><a href="https://huggingface.co/prem-research/Funcdex-0.6B-googledrive">🤗</a></td>
<td>0.26</td>
<td>0.73</td>
<td>0.48</td>
<td>0.82</td>
</tr>
<tr style="text-align: center;">
<td><img src="assets/icons/jira.png" width="20" height="20" style="vertical-align: middle;"/> Jira</td>
<td>0.47</td>
<td>0.53</td>
<td>0.17</td>
<td>0.65</td>
<td>0.47</td>
<td>0.66</td>
<td>0.51</td>
<td>0.69</td>
<td>0.58</td>
<td>0.76</td>
<td><a href="https://huggingface.co/prem-research/Funcdex-0.6B-jira">🤗</a></td>
<td>0.47</td>
<td>0.76</td>
<td>0.59</td>
<td>0.83</td>
</tr>
<tr style="text-align: center;">
<td><img src="assets/icons/stripe.png" width="20" height="20" style="vertical-align: middle;"/> Stripe</td>
<td>0.15</td>
<td>0.37</td>
<td>0.10</td>
<td>0.46</td>
<td>0.12</td>
<td>0.39</td>
<td>0.08</td>
<td>0.50</td>
<td>0.17</td>
<td>0.71</td>
<td><a href="https://huggingface.co/prem-research/Funcdex-0.6B-stripe">🤗</a></td>
<td>0.09</td>
<td>0.56</td>
<td>0.16</td>
<td>0.80</td>
</tr>
<tr style="text-align: center;">
<td><img src="assets/icons/to-do-list.png" width="20" height="20" style="vertical-align: middle;"/> Todoist</td>
<td>0.65</td>
<td>0.74</td>
<td>0.19</td>
<td>0.72</td>
<td>0.64</td>
<td>0.79</td>
<td>0.57</td>
<td>0.87</td>
<td>0.65</td>
<td>0.88</td>
<td><a href="https://huggingface.co/prem-research/Funcdex-0.6B-todoist">🤗</a></td>
<td>0.55</td>
<td>0.91</td>
<td>0.72</td>
<td>0.94</td>
</tr>
<tr style="text-align: center;">
<td><img src="assets/icons/whatsapp.png" width="20" height="20" style="vertical-align: middle;"/> WhatsApp</td>
<td>0.23</td>
<td>0.39</td>
<td>0.13</td>
<td>0.47</td>
<td>0.24</td>
<td>0.43</td>
<td>0.20</td>
<td>0.43</td>
<td>0.28</td>
<td>0.64</td>
<td><a href="https://huggingface.co/prem-research/Funcdex-0.6B-whatsapp">🤗</a></td>
<td>0.26</td>
<td>0.55</td>
<td>0.31</td>
<td>0.71</td>
</tr>
</tbody>
</table>

- Funcdex-0.6B is a family of toolkit-specialized models; each reported number is the average performance of the corresponding specialized model on its own subset.

### Funcdex-MT: Bundle/Multi-toolkit Performance

<table border="1" class="dataframe">
<thead>
<tr style="text-align: center;">
<th rowspan="2">Bundle</th>
<th colspan="2">GPT-OSS-120B<br>(medium)</th>
<th colspan="2">GPT-5<br>(minimal)</th>
<th colspan="2">GPT-5 Mini<br>(medium)</th>
<th colspan="2">Qwen3-0.6B</th>
<th colspan="3">Funcdex-0.6B</th>
<th colspan="2">Qwen3-1.7B</th>
<th colspan="3">Funcdex-1.7B</th>
</tr>
<tr style="text-align: center;">
<th>EM</th>
<th>SR</th>
<th>EM</th>
<th>SR</th>
<th>EM</th>
<th>SR</th>
<th>EM</th>
<th>SR</th>
<th>EM</th>
<th>SR</th>
<th>LoRA Checkpoint</th>
<th>EM</th>
<th>SR</th>
<th>EM</th>
<th>SR</th>
<th>LoRA Checkpoint</th>
</tr>
</thead>
<tbody>
<tr style="text-align: center;">
<td><img src="assets/icons/gmail.png" width="20" height="20" style="vertical-align: middle;"/>Gmail<img src="assets/icons/google-calendar.png" width="20" height="20" style="vertical-align: middle;"/>Calendar</td>
<td>0.28</td>
<td>0.53</td>
<td>0.15</td>
<td>0.54</td>
<td>0.22</td>
<td>0.56</td>
<td>0.19</td>
<td>0.51</td>
<td>0.26</td>
<td>0.54</td>
<td><a href="https://huggingface.co/prem-research/Funcdex-0.6B-gmail_googlecalendar">🤗</a></td>
<td>0.17</td>
<td>0.61</td>
<td>0.32</td>
<td>0.71</td>
<td rowspan="5"><a href="https://huggingface.co/prem-research/Funcdex-1.7B">🤗</a></td>
</tr>
<tr style="text-align: center;">
<td><img src="assets/icons/google-drive.png" width="20" height="20" style="vertical-align: middle;"/>Drive <img src="assets/icons/calendly.png" width="20" height="20" style="vertical-align: middle;"/> Calendly <img src="assets/icons/google-calendar.png" width="20" height="20" style="vertical-align: middle;"/> Calendar</td>
<td>0.32</td>
<td>0.45</td>
<td>0.17</td>
<td>0.52</td>
<td>0.35</td>
<td>0.47</td>
<td>0.19</td>
<td>0.49</td>
<td>0.35</td>
<td>0.60</td>
<td><a href="https://huggingface.co/prem-research/Funcdex-0.6B-googledrive_calendly_googlecalendar">🤗</a></td>
<td>0.15</td>
<td>0.66</td>
<td>0.40</td>
<td>0.78</td>
</tr>
<tr style="text-align: center;">
<td><img src="assets/icons/google-drive.png" width="20" height="20" style="vertical-align: middle;"/>Drive <img src="assets/icons/docs.png" width="20" height="20" style="vertical-align: middle;"/> Docs</td>
<td>0.28</td>
<td>0.37</td>
<td>0.12</td>
<td>0.50</td>
<td>0.33</td>
<td>0.47</td>
<td>0.18</td>
<td>0.54</td>
<td>0.34</td>
<td>0.70</td>
<td><a href="https://huggingface.co/prem-research/Funcdex-0.6B-googledrive_googledocs">🤗</a></td>
<td>0.19</td>
<td>0.68</td>
<td>0.43</td>
<td>0.76</td>
</tr>
<tr style="text-align: center;">
<td><img src="assets/icons/jira.png" width="20" height="20" style="vertical-align: middle;"/>Jira <img src="assets/icons/gmail.png" width="20" height="20" style="vertical-align: middle;"/> Gmail</td>
<td>0.42</td>
<td>0.60</td>
<td>0.18</td>
<td>0.66</td>
<td>0.36</td>
<td>0.66</td>
<td>0.29</td>
<td>0.61</td>
<td>0.39</td>
<td>0.71</td>
<td><a href="https://huggingface.co/prem-research/Funcdex-0.6B-jira_gmail">🤗</a></td>
<td>0.28</td>
<td>0.72</td>
<td>0.44</td>
<td>0.82</td>
</tr>
<tr style="text-align: center;">
<td><img src="assets/icons/whatsapp.png" width="20" height="20" style="vertical-align: middle;"/>WhatsApp <img src="assets/icons/to-do-list.png" width="20" height="20" style="vertical-align: middle;"/> Todoist</td>
<td>0.32</td>
<td>0.58</td>
<td>0.19</td>
<td>0.66</td>
<td>0.35</td>
<td>0.69</td>
<td>0.26</td>
<td>0.50</td>
<td>0.41</td>
<td>0.70</td>
<td><a href="https://huggingface.co/prem-research/Funcdex-0.6B-whatsapp_todoist">🤗</a></td>
<td>0.27</td>
<td>0.68</td>
<td>0.39</td>
<td>0.77</td>
</tr>
</tbody>
</table>

## Inference

- Given a conversation, we extract all tuples `(context_messages, function_calls)` and use them to generate predictions. We ignore the `content` field and evaluate only the `function_calls` generated by the LLM.
- We serve the models with vLLM using `tool_choice="auto"`.
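
The tuple extraction above can be sketched as follows. This is an illustrative sketch, not the evaluation harness itself; the message schema (`role`, `content`, `tool_calls` keys) and the helper name `extract_eval_tuples` are assumptions.

```python
# Sketch: extract (context_messages, function_calls) evaluation tuples from a
# conversation. Each assistant turn that contains tool calls yields one tuple.

def extract_eval_tuples(conversation):
    tuples = []
    for i, msg in enumerate(conversation):
        if msg.get("role") == "assistant" and msg.get("tool_calls"):
            # Everything before this turn is the model's context; the turn's
            # tool_calls are the reference function calls to predict.
            tuples.append((conversation[:i], msg["tool_calls"]))
    return tuples

conversation = [
    {"role": "user", "content": "Fetch the email with ID 'msg_1'."},
    {"role": "assistant", "tool_calls": [
        {"name": "FETCH_MESSAGE_BY_MESSAGE_ID",
         "arguments": {"message_id": "msg_1", "user_id": "me"}}]},
    {"role": "tool", "content": "{\"subject\": \"Hello\"}"},
    {"role": "assistant", "content": "Fetched it."},
]

pairs = extract_eval_tuples(conversation)
print(len(pairs))  # 1: only the first assistant turn makes function calls
```

Note that the final assistant turn produces no tuple, matching the rule above that plain `content` is not evaluated.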

## Metrics

Given a list of predicted and reference function calls, we report two metrics:
- **Function-Call String Match (SR)**: We greedily pair each predicted call with its best-matching reference call and report the match quality from `difflib.SequenceMatcher.ratio`. The reported number is the average string ratio.
- **Exact Match (EM)**: Same pairing as above, but using exact string matching instead. The reported number is the EM F1 score.

EM is a strict metric: it penalizes string arguments in function calls that may be "okay", e.g. `"email_content": "This is an example."` vs. `"email_content": "This is an Example."`, which differ only in the case of a single letter.
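
A minimal sketch of both metrics for a single example follows; the exact greedy-pairing and aggregation details used in the benchmark are assumptions here.

```python
from difflib import SequenceMatcher

def greedy_ratios(predicted, reference):
    """Greedily pair each predicted call with its best-matching remaining
    reference call; return the per-pair SequenceMatcher ratios."""
    remaining = list(reference)
    ratios = []
    for pred in predicted:
        if not remaining:
            break
        best = max(remaining, key=lambda ref: SequenceMatcher(None, pred, ref).ratio())
        ratios.append(SequenceMatcher(None, pred, best).ratio())
        remaining.remove(best)
    return ratios

def em_f1(predicted, reference):
    # A pair counts as an exact match only when its string ratio is 1.0.
    tp = sum(r == 1.0 for r in greedy_ratios(predicted, reference))
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(reference) if reference else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

pred = ['CREATE_EVENT(summary="Sync", start_time="10:00", end_time="11:00")']
ref  = ['CREATE_EVENT(summary="Sync", start_time="10:00", end_time="11:00")']
print(sum(greedy_ratios(pred, ref)) / len(ref))  # 1.0: identical calls
print(em_f1(pred, ref))                          # 1.0
```

With a one-character difference (e.g. `example` vs. `Example`), the string ratio stays close to 1.0 while EM drops to 0, which is exactly the strictness discussed above.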

# Quickstart

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load model and tokenizer
base_model_name = "ojus1/Qwen3-0.6B-Instruct"
model_name = "prem-research/Funcdex-0.6B-gmail_googlecalendar"

tokenizer = AutoTokenizer.from_pretrained(model_name)

base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    torch_dtype="auto",
    device_map="auto"
)

model = PeftModel.from_pretrained(
    base_model,
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# Define tools (Gmail + Calendar combined)
tools = [
    {
        "type": "function",
        "function": {
            "name": "FETCH_MESSAGE_BY_MESSAGE_ID",
            "description": "Fetch a specific email by message ID",
            "parameters": {
                "type": "object",
                "properties": {
                    "message_id": {"type": "string", "description": "Message ID"},
                    "user_id": {"type": "string", "description": "User ID"}
                },
                "required": ["message_id", "user_id"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "FORWARD_EMAIL_MESSAGE",
            "description": "Forward an email message",
            "parameters": {
                "type": "object",
                "properties": {
                    "message_id": {"type": "string", "description": "Message ID to forward"},
                    "recipient_email": {"type": "string", "description": "Recipient email"},
                    "additional_text": {"type": "string", "description": "Additional message text"}
                },
                "required": ["message_id", "recipient_email"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "CREATE_EVENT",
            "description": "Create a calendar event",
            "parameters": {
                "type": "object",
                "properties": {
                    "summary": {"type": "string", "description": "Event title"},
                    "start_time": {"type": "string", "description": "Start time"},
                    "end_time": {"type": "string", "description": "End time"}
                },
                "required": ["summary", "start_time", "end_time"]
            }
        }
    }
]

# Define conversation
messages = [
    {"role": "system", "content": "You are a helpful assistant that can help with tasks by using tools."},
    {"role": "user", "content": "Fetch the email with message ID 'msg_12345' for user 'me'."}
]

# Apply chat template with tools
formatted_input = tokenizer.apply_chat_template(
    messages,
    tools=tools,
    tokenize=False,
    add_generation_prompt=True
)

# Tokenize and generate
input_tokens = tokenizer(formatted_input, return_tensors="pt").to(model.device)
output = model.generate(**input_tokens, max_new_tokens=256, do_sample=False)
response = tokenizer.decode(output[0][input_tokens['input_ids'].shape[1]:], skip_special_tokens=True)

print("Response:", response)
```
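
The generated text wraps each function call in `<tool_call>` tags containing a JSON object, following Qwen3's chat-template convention. A minimal parser, run here on an illustrative sample string rather than real model output:

```python
import json
import re

# Illustrative sample of a Qwen3-style tool-call response (not real output).
response = (
    '<tool_call>\n'
    '{"name": "FETCH_MESSAGE_BY_MESSAGE_ID", '
    '"arguments": {"message_id": "msg_12345", "user_id": "me"}}\n'
    '</tool_call>'
)

# Pull out each JSON object between <tool_call> tags and decode it.
calls = [
    json.loads(block)
    for block in re.findall(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", response, re.DOTALL)
]
print(calls[0]["name"])  # FETCH_MESSAGE_BY_MESSAGE_ID
```
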

## Deployment with vLLM

`vllm serve ojus1/Qwen3-0.6B-Instruct --enable-lora --lora-modules prem-research/Funcdex-0.6B=prem-research/Funcdex-0.6B-gmail_googlecalendar --enable-auto-tool-choice --tool-call-parser hermes`

For best results, provide a detailed system prompt to steer the tool-use behaviour.
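
Once the server is running, it exposes vLLM's OpenAI-compatible API (by default at `http://localhost:8000/v1`), and the `model` field of a request must match the LoRA module name registered via `--lora-modules`. A sketch of the request body, assuming the default host/port and abbreviating the tool schema from the Quickstart:

```python
import json

# Request body for POST http://localhost:8000/v1/chat/completions.
# "model" must match the LoRA module name passed to --lora-modules.
payload = {
    "model": "prem-research/Funcdex-0.6B",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant that can help with tasks by using tools."},
        {"role": "user", "content": "Fetch the email with message ID 'msg_12345' for user 'me'."},
    ],
    "tools": [{
        "type": "function",
        "function": {
            "name": "FETCH_MESSAGE_BY_MESSAGE_ID",
            "description": "Fetch a specific email by message ID",
            "parameters": {
                "type": "object",
                "properties": {
                    "message_id": {"type": "string"},
                    "user_id": {"type": "string"},
                },
                "required": ["message_id", "user_id"],
            },
        },
    }],
    "tool_choice": "auto",
}

body = json.dumps(payload)  # send with any HTTP client or the openai SDK
print(payload["model"])
```
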

# License

The models, code, and dataset are licensed under the MIT License.