dvilasuero HF Staff committed on
Commit f4977c9 · verified · 1 Parent(s): 005831e

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +27 -7
README.md CHANGED
@@ -8,16 +8,36 @@ sdk_version: "latest"
  pinned: false
  ---
 
- # Inspect Evals/bfcl
+ # bfcl
 
- Live log viewer for eval results stored in [dvilasuero/bfcl-trending-models](https://huggingface.co/dvilasuero/bfcl-trending-models).
+ This eval was run using [evaljobs](https://github.com/dvsrepo/evaljobs).
 
- This Space runs `inspect view` to display real-time evaluation logs from the dataset.
+ ## Command
 
- ## View Logs
+ ```bash
+ evaljobs inspect_evals/bfcl \
+ --model hf-inference-providers/moonshotai/Kimi-K2-Thinking,hf-inference-providers/meta-llama/Llama-3.1-8B-Instruct,hf-inference-providers/openai/gpt-oss-20b,hf-inference-providers/zai-org/GLM-4.6,hf-inference-providers/openai/gpt-oss-120b,hf-inference-providers/deepseek-ai/DeepSeek-V3.2-Exp,hf-inference-providers/meta-llama/Llama-3.2-3B-Instruct,hf-inference-providers/Qwen/Qwen2.5-7B-Instruct,hf-inference-providers/Qwen/Qwen3-4B-Instruct-2507,hf-inference-providers/deepseek-ai/DeepSeek-R1 \
+ --name bfcl-trending-models
+ ```
 
- Logs are automatically displayed from: `hf://datasets/dvilasuero/bfcl-trending-models/logs`
+ ## Run with other models
 
- ## Dataset
+ To run this eval with a different model, use:
 
- Results are stored in: [dvilasuero/bfcl-trending-models](https://huggingface.co/dvilasuero/bfcl-trending-models)
+ ```bash
+ evaljobs inspect_evals/bfcl \
+ --model <your-model> \
+ --name <your-name> \
+ --flavor cpu-basic
+ ```
+
+ ## Inspect eval command
+
+ The eval was executed with:
+
+ ```bash
+ inspect eval-set inspect_evals/bfcl \
+ --model hf-inference-providers/moonshotai/Kimi-K2-Thinking,hf-inference-providers/meta-llama/Llama-3.1-8B-Instruct,hf-inference-providers/openai/gpt-oss-20b,hf-inference-providers/zai-org/GLM-4.6,hf-inference-providers/openai/gpt-oss-120b,hf-inference-providers/deepseek-ai/DeepSeek-V3.2-Exp,hf-inference-providers/meta-llama/Llama-3.2-3B-Instruct,hf-inference-providers/Qwen/Qwen2.5-7B-Instruct,hf-inference-providers/Qwen/Qwen3-4B-Instruct-2507,hf-inference-providers/deepseek-ai/DeepSeek-R1 \
+ --log-shared \
+ --log-buffer 100
+ ```
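
Reviewer note: the `--model` value in the new README is one long comma-separated string, which is hard to read and edit by hand. A minimal sketch, assuming bash, of assembling that string from one model ID per line. The names `prefix`, `models`, and `join_models` are illustrative and not part of evaljobs or Inspect; only three of the ten models are listed to keep it short, and the script prints the command for review rather than running it.

```bash
#!/usr/bin/env bash
# Sketch: build the comma-separated --model value from an array of model IDs.
set -euo pipefail

# Provider prefix as it appears in the README command.
prefix="hf-inference-providers"

# A subset of the models from the original run, one per array element.
models=(
  "moonshotai/Kimi-K2-Thinking"
  "meta-llama/Llama-3.1-8B-Instruct"
  "openai/gpt-oss-20b"
)

# join_models PREFIX MODEL... prints "PREFIX/MODEL1,PREFIX/MODEL2,..."
# (hypothetical helper, defined here for illustration only).
join_models() {
  local p="$1"; shift
  local out="" m
  for m in "$@"; do
    # Add a comma before every entry except the first.
    out+="${out:+,}${p}/${m}"
  done
  printf '%s' "$out"
}

model_arg="$(join_models "$prefix" "${models[@]}")"

# Print the command instead of executing it, so it can be checked first.
printf 'evaljobs inspect_evals/bfcl --model %s --name %s\n' \
  "$model_arg" "bfcl-trending-models"
```

Keeping one model per line also makes later diffs of the model list easier to review than a change inside a single long line.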