bfcl-trending-models


dvilasuero (HF Staff) committed 10 days ago

Commit f4977c9 (verified) · 1 parent: 005831e

Upload README.md with huggingface_hub

Files changed (1)

README.md +27 -7

@@ -8,16 +8,36 @@ sdk_version: "latest"
 pinned: false
 ---
-# Inspect Evals/bfcl
-Live log viewer for eval results stored in [dvilasuero/bfcl-trending-models](https://huggingface.co/dvilasuero/bfcl-trending-models).
-This Space runs `inspect view` to display real-time evaluation logs from the dataset.
-## View Logs
-Logs are automatically displayed from: `hf://datasets/dvilasuero/bfcl-trending-models/logs`
-## Dataset
-Results are stored in: [dvilasuero/bfcl-trending-models](https://huggingface.co/dvilasuero/bfcl-trending-models)
+# bfcl
+This eval was run using [evaljobs](https://github.com/dvsrepo/evaljobs).
+## Command
+```bash
+evaljobs inspect_evals/bfcl \
+  --model hf-inference-providers/moonshotai/Kimi-K2-Thinking,hf-inference-providers/meta-llama/Llama-3.1-8B-Instruct,hf-inference-providers/openai/gpt-oss-20b,hf-inference-providers/zai-org/GLM-4.6,hf-inference-providers/openai/gpt-oss-120b,hf-inference-providers/deepseek-ai/DeepSeek-V3.2-Exp,hf-inference-providers/meta-llama/Llama-3.2-3B-Instruct,hf-inference-providers/Qwen/Qwen2.5-7B-Instruct,hf-inference-providers/Qwen/Qwen3-4B-Instruct-2507,hf-inference-providers/deepseek-ai/DeepSeek-R1 \
+  --name bfcl-trending-models
+```
+## Run with other models
+To run this eval with a different model, use:
+```bash
+evaljobs inspect_evals/bfcl \
+  --model <your-model> \
+  --name <your-name> \
+  --flavor cpu-basic
+```
+## Inspect eval command
+The eval was executed with:
+```bash
+inspect eval-set inspect_evals/bfcl \
+  --model hf-inference-providers/moonshotai/Kimi-K2-Thinking,hf-inference-providers/meta-llama/Llama-3.1-8B-Instruct,hf-inference-providers/openai/gpt-oss-20b,hf-inference-providers/zai-org/GLM-4.6,hf-inference-providers/openai/gpt-oss-120b,hf-inference-providers/deepseek-ai/DeepSeek-V3.2-Exp,hf-inference-providers/meta-llama/Llama-3.2-3B-Instruct,hf-inference-providers/Qwen/Qwen2.5-7B-Instruct,hf-inference-providers/Qwen/Qwen3-4B-Instruct-2507,hf-inference-providers/deepseek-ai/DeepSeek-R1 \
+  --log-shared \
+  --log-buffer 100
+```
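
The `--model` value in the commands above is a single comma-separated string of provider-prefixed model IDs, which is easy to mistype by hand. A minimal sketch of one way to assemble it from a list follows. It is not part of the commit; the variable names and the run name `my-bfcl-run` are hypothetical, and the script only prints the command rather than running it.

```bash
#!/usr/bin/env bash
# Sketch: build the comma-separated --model argument from an array.
# Variable names and the run name below are illustrative assumptions.
set -euo pipefail

provider="hf-inference-providers"
models=(
  "moonshotai/Kimi-K2-Thinking"
  "openai/gpt-oss-20b"
  "Qwen/Qwen3-4B-Instruct-2507"
)

# Prefix each model ID with the provider and join the results with commas.
model_arg=""
for m in "${models[@]}"; do
  model_arg+="${model_arg:+,}${provider}/${m}"
done

# Print the command instead of executing it, so the sketch has no side effects.
printf 'evaljobs inspect_evals/bfcl --model %s --name %s\n' "$model_arg" "my-bfcl-run"
```

Once the printed command looks right, it can be pasted into a terminal, or the `printf` line can be swapped for a direct `evaljobs` call.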