Update README.md

README.md CHANGED

@@ -33,20 +33,6 @@ The code used to generate the dataset can be found [here](https://github.com/pre
 <img src="assets/line_plot.png" alt="Line Plot" width="80%">
 </div>
 
-
-## Inference
-
-- Given a conversation, we extract all `(context_messages, function_calls)` tuples and use them to generate predictions. We ignore the `content` field and evaluate only the `function_calls` generated by the LLM.
-- We use a vLLM deployment with `tool_choice="auto"`.
-
-## Metrics
-
-Given a list of predicted and reference function calls, we report two metrics:
-- **Function Call String Match (SR)**: We perform a greedy match and report the best-matched string ratio using `difflib.SequenceMatcher.ratio`. The reported number is the average string ratio.
-- **Exact Match (EM)**: Same as above, but with an exact string match instead. The reported number is the EM F1 score.
-
-EM is a strict metric: it penalizes string arguments in function calls that may be "okay", e.g. `"email_content": "This is an example."` vs. `"email_content": "This is an Example."`, which differ by only one letter.
-
 ## Results
 
 ### BFCL v3

@@ -483,6 +469,20 @@ EM is a strict metric, and penalizes string arguments in function calls that may
 </table>
 
 
+## Inference
+
+- Given a conversation, we extract all `(context_messages, function_calls)` tuples and use them to generate predictions. We ignore the `content` field and evaluate only the `function_calls` generated by the LLM.
+- We use a vLLM deployment with `tool_choice="auto"` (see the sketch below).
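
For illustration, a minimal sketch of this inference step against an OpenAI-compatible vLLM endpoint. The base URL, API key, model name, and the `context_messages`/`tools` variables are assumptions for the sketch, not values taken from this repository:

```python
# Sketch: one prediction step under the setup described above.
# Assumes vLLM is serving an OpenAI-compatible API at localhost:8000 and that
# `context_messages` (chat history) and `tools` (function schemas) were
# extracted from a conversation. All names here are hypothetical.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

def predict_function_calls(context_messages, tools, model="placeholder-model"):
    """Return the model's predicted tool calls for one context."""
    response = client.chat.completions.create(
        model=model,
        messages=context_messages,
        tools=tools,
        tool_choice="auto",  # the model decides whether and what to call
    )
    message = response.choices[0].message
    # The `content` field is ignored; only the tool calls are evaluated.
    return [
        (call.function.name, call.function.arguments)
        for call in (message.tool_calls or [])
    ]
```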
+
+## Metrics
+
+Given a list of predicted and reference function calls, we report two metrics, sketched below:
+- **Function Call String Match (SR)**: We perform a greedy match and report the best-matched string ratio using `difflib.SequenceMatcher.ratio`. The reported number is the average string ratio.
+- **Exact Match (EM)**: Same as above, but with an exact string match instead. The reported number is the EM F1 score.
+
+EM is a strict metric: it penalizes string arguments in function calls that may be "okay", e.g. `"email_content": "This is an example."` vs. `"email_content": "This is an Example."`, which differ by only one letter.
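
Below is a rough sketch of both metrics. The greedy pairing and the choice to average over `max(len(predicted), len(reference))` are assumptions; the repository's exact normalization may differ:

```python
# Sketch of SR (average best-match string ratio) and EM F1 over serialized
# function-call strings. Pairing is greedy: best-scoring pairs are taken first.
from difflib import SequenceMatcher

def call_match_metrics(predicted: list[str], reference: list[str]) -> dict:
    # Score every predicted/reference pair, then pair greedily from the top.
    scored = sorted(
        ((SequenceMatcher(None, p, r).ratio(), i, j)
         for i, p in enumerate(predicted)
         for j, r in enumerate(reference)),
        reverse=True,
    )
    used_pred, used_ref, ratios, exact = set(), set(), [], 0
    for ratio, i, j in scored:
        if i in used_pred or j in used_ref:
            continue
        used_pred.add(i)
        used_ref.add(j)
        ratios.append(ratio)
        exact += int(predicted[i] == reference[j])
    # SR: unmatched calls contribute 0 to the average (assumption).
    denom = max(len(predicted), len(reference), 1)
    sr = sum(ratios) / denom
    # EM F1 from exact-match counts.
    precision = exact / len(predicted) if predicted else 0.0
    recall = exact / len(reference) if reference else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"SR": sr, "EM_F1": f1}
```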
+
+
 # Quickstart
 
 ```python