Elron committed on
Commit 99e4642 · 0 Parent(s)

Duplicate from open-agent-leaderboard/smolagent

.eval_results/open_agent_leaderboard_openai_aws_claude-opus-4-5.yaml ADDED
@@ -0,0 +1,56 @@
+ - dataset:
+     id: open-agent-leaderboard/results
+     task_id: overall
+   value: 0.6633
+   source:
+     url: https://www.exgentic.ai
+     name: Open Agent Leaderboard
+   notes: 'model: Claude Opus 4.5'
+ - dataset:
+     id: open-agent-leaderboard/results
+     task_id: appworld
+   value: 0.7
+   source:
+     url: https://www.exgentic.ai
+     name: Open Agent Leaderboard
+   notes: 'model: Claude Opus 4.5'
+ - dataset:
+     id: open-agent-leaderboard/results
+     task_id: browsecomp_plus
+   value: 0.61
+   source:
+     url: https://www.exgentic.ai
+     name: Open Agent Leaderboard
+   notes: 'model: Claude Opus 4.5'
+ - dataset:
+     id: open-agent-leaderboard/results
+     task_id: swebench
+   value: 0.65
+   source:
+     url: https://www.exgentic.ai
+     name: Open Agent Leaderboard
+   notes: 'model: Claude Opus 4.5'
+ - dataset:
+     id: open-agent-leaderboard/results
+     task_id: taubench_airline
+   value: 0.72
+   source:
+     url: https://www.exgentic.ai
+     name: Open Agent Leaderboard
+   notes: 'model: Claude Opus 4.5'
+ - dataset:
+     id: open-agent-leaderboard/results
+     task_id: taubench_retail
+   value: 0.78
+   source:
+     url: https://www.exgentic.ai
+     name: Open Agent Leaderboard
+   notes: 'model: Claude Opus 4.5'
+ - dataset:
+     id: open-agent-leaderboard/results
+     task_id: taubench_telecom
+   value: 0.58
+   source:
+     url: https://www.exgentic.ai
+     name: Open Agent Leaderboard
+   notes: 'model: Claude Opus 4.5'
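The file above is a YAML list of seven entries that share one eight-line shape and differ only in `task_id` and `value`. A minimal sketch that regenerates such a file with the Python standard library, assuming the nesting of the Hugging Face Hub eval-results schema (`id` and `task_id` under `dataset`, `url` and `name` under `source`, `value` and `notes` at the entry level). The helper `render_eval_results` is hypothetical and not part of any Hub tooling.

```python
# Hypothetical helper: renders leaderboard scores as eval-result entries
# shaped like the .eval_results/*.yaml files in this commit.
# Assumption: nesting follows the Hub eval-results schema described above.
DATASET_ID = "open-agent-leaderboard/results"
SOURCE_URL = "https://www.exgentic.ai"
SOURCE_NAME = "Open Agent Leaderboard"


def render_eval_results(model: str, scores: dict[str, float]) -> str:
    """Return YAML text with one 8-line entry per (task_id, value) pair."""
    lines = []
    for task_id, value in scores.items():
        lines += [
            "- dataset:",
            f"    id: {DATASET_ID}",
            f"    task_id: {task_id}",
            f"  value: {value}",
            "  source:",
            f"    url: {SOURCE_URL}",
            f"    name: {SOURCE_NAME}",
            # Single quotes stop the colon inside the note from being read
            # as a nested mapping; assumes the model name has no single quote.
            f"  notes: 'model: {model}'",
        ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    claude = {
        "overall": 0.6633,
        "appworld": 0.7,
        "browsecomp_plus": 0.61,
        "swebench": 0.65,
        "taubench_airline": 0.72,
        "taubench_retail": 0.78,
        "taubench_telecom": 0.58,
    }
    print(render_eval_results("Claude Opus 4.5", claude), end="")
```

Building the text by hand avoids a PyYAML dependency, but it only guards the `notes` field; a real exporter would be safer using a YAML library so that arbitrary model names are escaped correctly.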
.eval_results/open_agent_leaderboard_openai_azure_deepseek-v3.2.yaml ADDED
@@ -0,0 +1,56 @@
+ - dataset:
+     id: open-agent-leaderboard/results
+     task_id: overall
+   value: 0.4092
+   source:
+     url: https://www.exgentic.ai
+     name: Open Agent Leaderboard
+   notes: 'model: DeepSeek V3.2'
+ - dataset:
+     id: open-agent-leaderboard/results
+     task_id: appworld
+   value: 0.13
+   source:
+     url: https://www.exgentic.ai
+     name: Open Agent Leaderboard
+   notes: 'model: DeepSeek V3.2'
+ - dataset:
+     id: open-agent-leaderboard/results
+     task_id: browsecomp_plus
+   value: 0.21
+   source:
+     url: https://www.exgentic.ai
+     name: Open Agent Leaderboard
+   notes: 'model: DeepSeek V3.2'
+ - dataset:
+     id: open-agent-leaderboard/results
+     task_id: swebench
+   value: 0.56
+   source:
+     url: https://www.exgentic.ai
+     name: Open Agent Leaderboard
+   notes: 'model: DeepSeek V3.2'
+ - dataset:
+     id: open-agent-leaderboard/results
+     task_id: taubench_airline
+   value: 0.6
+   source:
+     url: https://www.exgentic.ai
+     name: Open Agent Leaderboard
+   notes: 'model: DeepSeek V3.2'
+ - dataset:
+     id: open-agent-leaderboard/results
+     task_id: taubench_retail
+   value: 0.77
+   source:
+     url: https://www.exgentic.ai
+     name: Open Agent Leaderboard
+   notes: 'model: DeepSeek V3.2'
+ - dataset:
+     id: open-agent-leaderboard/results
+     task_id: taubench_telecom
+   value: 0.84
+   source:
+     url: https://www.exgentic.ai
+     name: Open Agent Leaderboard
+   notes: 'model: DeepSeek V3.2'
.eval_results/open_agent_leaderboard_openai_azure_gpt-5.2-2025-12-11.yaml ADDED
@@ -0,0 +1,56 @@
+ - dataset:
+     id: open-agent-leaderboard/results
+     task_id: overall
+   value: 0.3796
+   source:
+     url: https://www.exgentic.ai
+     name: Open Agent Leaderboard
+   notes: 'model: GPT-5.2'
+ - dataset:
+     id: open-agent-leaderboard/results
+     task_id: appworld
+   value: 0.07
+   source:
+     url: https://www.exgentic.ai
+     name: Open Agent Leaderboard
+   notes: 'model: GPT-5.2'
+ - dataset:
+     id: open-agent-leaderboard/results
+     task_id: browsecomp_plus
+   value: 0.26
+   source:
+     url: https://www.exgentic.ai
+     name: Open Agent Leaderboard
+   notes: 'model: GPT-5.2'
+ - dataset:
+     id: open-agent-leaderboard/results
+     task_id: swebench
+   value: 0.5253
+   source:
+     url: https://www.exgentic.ai
+     name: Open Agent Leaderboard
+   notes: 'model: GPT-5.2'
+ - dataset:
+     id: open-agent-leaderboard/results
+     task_id: taubench_airline
+   value: 0.6
+   source:
+     url: https://www.exgentic.ai
+     name: Open Agent Leaderboard
+   notes: 'model: GPT-5.2'
+ - dataset:
+     id: open-agent-leaderboard/results
+     task_id: taubench_retail
+   value: 0.68
+   source:
+     url: https://www.exgentic.ai
+     name: Open Agent Leaderboard
+   notes: 'model: GPT-5.2'
+ - dataset:
+     id: open-agent-leaderboard/results
+     task_id: taubench_telecom
+   value: 0.71
+   source:
+     url: https://www.exgentic.ai
+     name: Open Agent Leaderboard
+   notes: 'model: GPT-5.2'
.eval_results/open_agent_leaderboard_openai_azure_kimi-k2.5.yaml ADDED
@@ -0,0 +1,56 @@
+ - dataset:
+     id: open-agent-leaderboard/results
+     task_id: overall
+   value: 0.42
+   source:
+     url: https://www.exgentic.ai
+     name: Open Agent Leaderboard
+   notes: 'model: Kimi K2.5'
+ - dataset:
+     id: open-agent-leaderboard/results
+     task_id: appworld
+   value: 0.11
+   source:
+     url: https://www.exgentic.ai
+     name: Open Agent Leaderboard
+   notes: 'model: Kimi K2.5'
+ - dataset:
+     id: open-agent-leaderboard/results
+     task_id: browsecomp_plus
+   value: 0.33
+   source:
+     url: https://www.exgentic.ai
+     name: Open Agent Leaderboard
+   notes: 'model: Kimi K2.5'
+ - dataset:
+     id: open-agent-leaderboard/results
+     task_id: swebench
+   value: 0.5761
+   source:
+     url: https://www.exgentic.ai
+     name: Open Agent Leaderboard
+   notes: 'model: Kimi K2.5'
+ - dataset:
+     id: open-agent-leaderboard/results
+     task_id: taubench_airline
+   value: 0.56
+   source:
+     url: https://www.exgentic.ai
+     name: Open Agent Leaderboard
+   notes: 'model: Kimi K2.5'
+ - dataset:
+     id: open-agent-leaderboard/results
+     task_id: taubench_retail
+   value: 0.7245
+   source:
+     url: https://www.exgentic.ai
+     name: Open Agent Leaderboard
+   notes: 'model: Kimi K2.5'
+ - dataset:
+     id: open-agent-leaderboard/results
+     task_id: taubench_telecom
+   value: 0.7071
+   source:
+     url: https://www.exgentic.ai
+     name: Open Agent Leaderboard
+   notes: 'model: Kimi K2.5'
.eval_results/open_agent_leaderboard_openai_gcp_gemini-3-pro-preview.yaml ADDED
@@ -0,0 +1,56 @@
+ - dataset:
+     id: open-agent-leaderboard/results
+     task_id: overall
+   value: 0.5569
+   source:
+     url: https://www.exgentic.ai
+     name: Open Agent Leaderboard
+   notes: 'model: Gemini 3 Pro'
+ - dataset:
+     id: open-agent-leaderboard/results
+     task_id: appworld
+   value: 0.13
+   source:
+     url: https://www.exgentic.ai
+     name: Open Agent Leaderboard
+   notes: 'model: Gemini 3 Pro'
+ - dataset:
+     id: open-agent-leaderboard/results
+     task_id: browsecomp_plus
+   value: 0.57
+   source:
+     url: https://www.exgentic.ai
+     name: Open Agent Leaderboard
+   notes: 'model: Gemini 3 Pro'
+ - dataset:
+     id: open-agent-leaderboard/results
+     task_id: swebench
+   value: 0.7576
+   source:
+     url: https://www.exgentic.ai
+     name: Open Agent Leaderboard
+   notes: 'model: Gemini 3 Pro'
+ - dataset:
+     id: open-agent-leaderboard/results
+     task_id: taubench_airline
+   value: 0.68
+   source:
+     url: https://www.exgentic.ai
+     name: Open Agent Leaderboard
+   notes: 'model: Gemini 3 Pro'
+ - dataset:
+     id: open-agent-leaderboard/results
+     task_id: taubench_retail
+   value: 0.75
+   source:
+     url: https://www.exgentic.ai
+     name: Open Agent Leaderboard
+   notes: 'model: Gemini 3 Pro'
+ - dataset:
+     id: open-agent-leaderboard/results
+     task_id: taubench_telecom
+   value: 0.88
+   source:
+     url: https://www.exgentic.ai
+     name: Open Agent Leaderboard
+   notes: 'model: Gemini 3 Pro'
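Each of the five eval-result files reports an `overall` value next to six per-task scores. A flat mean of the six does not reproduce it (for Claude Opus 4.5 it gives about 0.6733, not 0.6633). The reported values are, however, consistent to within 0.0001 with first averaging the three `taubench_*` domains into one score and then taking an equal-weight mean of `appworld`, `browsecomp_plus`, `swebench` and that tau-bench score. This is an inference from the numbers, not the leaderboard's documented formula; the sketch checks it against scores transcribed from the files, and `recomputed_overall` is a hypothetical name.

```python
# Per-task scores transcribed from the five .eval_results/*.yaml files.
SCORES = {
    "Claude Opus 4.5": {
        "overall": 0.6633, "appworld": 0.7, "browsecomp_plus": 0.61,
        "swebench": 0.65, "taubench_airline": 0.72,
        "taubench_retail": 0.78, "taubench_telecom": 0.58,
    },
    "DeepSeek V3.2": {
        "overall": 0.4092, "appworld": 0.13, "browsecomp_plus": 0.21,
        "swebench": 0.56, "taubench_airline": 0.6,
        "taubench_retail": 0.77, "taubench_telecom": 0.84,
    },
    "GPT-5.2": {
        "overall": 0.3796, "appworld": 0.07, "browsecomp_plus": 0.26,
        "swebench": 0.5253, "taubench_airline": 0.6,
        "taubench_retail": 0.68, "taubench_telecom": 0.71,
    },
    "Kimi K2.5": {
        "overall": 0.42, "appworld": 0.11, "browsecomp_plus": 0.33,
        "swebench": 0.5761, "taubench_airline": 0.56,
        "taubench_retail": 0.7245, "taubench_telecom": 0.7071,
    },
    "Gemini 3 Pro": {
        "overall": 0.5569, "appworld": 0.13, "browsecomp_plus": 0.57,
        "swebench": 0.7576, "taubench_airline": 0.68,
        "taubench_retail": 0.75, "taubench_telecom": 0.88,
    },
}

TAU_DOMAINS = ("taubench_airline", "taubench_retail", "taubench_telecom")
OTHER_TASKS = ("appworld", "browsecomp_plus", "swebench")


def recomputed_overall(scores: dict[str, float]) -> float:
    """Equal-weight mean over four benchmarks, with tau-bench counted once.

    Assumption inferred from the numbers, not a documented leaderboard rule:
    the three tau-bench domains are averaged into a single score first.
    """
    tau = sum(scores[d] for d in TAU_DOMAINS) / len(TAU_DOMAINS)
    parts = [scores[t] for t in OTHER_TASKS] + [tau]
    return sum(parts) / len(parts)


if __name__ == "__main__":
    for model, scores in SCORES.items():
        print(f"{model:<16} reported={scores['overall']:.4f} "
              f"recomputed={recomputed_overall(scores):.4f}")
```

GPT-5.2 shows the largest gap (0.3797 recomputed against 0.3796 reported), which is small enough to come from rounding in the published per-task scores.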
.gitattributes ADDED
@@ -0,0 +1,35 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
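Every pattern in the `.gitattributes` file above sends matching files through Git LFS (`filter=lfs`, with matching `diff` and `merge` drivers), and `-text` turns off git's text handling such as end-of-line conversion. A rough Python approximation of which paths a subset of these patterns would catch; `is_lfs_tracked` is a hypothetical helper, and because `fnmatch` lets `*` cross `/`, it is looser than git's own matching.

```python
import fnmatch
import posixpath

# A subset of the patterns in the .gitattributes above; each one sets
# filter=lfs, so a match means the file would be stored via Git LFS.
LFS_PATTERNS = (
    "*.bin", "*.parquet", "*.safetensors", "*.tar.*",
    "*.zip", "*tfevents*", "saved_model/**/*",
)


def is_lfs_tracked(path: str, patterns=LFS_PATTERNS) -> bool:
    """Approximate gitattributes matching for a repo-relative POSIX path.

    As in git, a pattern without a slash is matched against the basename
    at any depth, and a pattern with a slash against the whole path.
    Unlike git, fnmatch's "*" also matches "/", so this is only a sketch.
    """
    name = posixpath.basename(path)
    for pattern in patterns:
        target = path if "/" in pattern else name
        if fnmatch.fnmatchcase(target, pattern):
            return True
    return False


if __name__ == "__main__":
    for p in ("model.safetensors", "README.md",
              ".eval_results/open_agent_leaderboard_openai_azure_kimi-k2.5.yaml"):
        print(p, is_lfs_tracked(p))
```

The `.eval_results/*.yaml` files in this commit match none of the patterns, so they are stored as ordinary git blobs. For an authoritative answer inside a checkout, `git check-attr filter -- <path>` reports the attribute git actually applies.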
README.md ADDED
@@ -0,0 +1,16 @@
+ ---
+ tags:
+ - agent
+ - agent-evaluation
+ - agent-card
+ ---
+
+ # Smolagent
+
+ This is a tracking repo for [Smolagent](https://github.com/huggingface/smolagents), used by the [Open Agent Leaderboard](https://www.exgentic.ai) to report evaluation results on Hugging Face.
+
+ The smolagents library is Hugging Face's lightweight agent framework, built around code-based actions rather than JSON tool calls.
+
+ - **Framework**: [smolagents](https://github.com/huggingface/smolagents)
+ - **Leaderboard**: [Open Agent Leaderboard](https://www.exgentic.ai)
+ - **Paper**: [arXiv:2602.22953](https://arxiv.org/abs/2602.22953)
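The README summarizes the framework as using code-based actions rather than JSON tool calls. The toy below, written with the standard library only, illustrates that contrast: a JSON call names one tool per model step, while a single code action can compose several tool calls. The tools and both runner functions are hypothetical illustrations, not the smolagents API, and the `exec`-based runner is not a security sandbox.

```python
import json


# Two hypothetical toy tools; a real agent's tools would call external services.
def get_price(item: str) -> float:
    return {"apple": 0.5, "bread": 2.25}[item]


def multiply(a: float, b: float) -> float:
    return a * b


TOOLS = {"get_price": get_price, "multiply": multiply}


def run_json_tool_call(call_json: str):
    """JSON style: the model emits one {"name", "arguments"} object per step."""
    call = json.loads(call_json)
    return TOOLS[call["name"]](**call["arguments"])


def run_code_action(code: str):
    """Code style: the model emits a snippet that may chain several tools.

    Toy only: exec() with empty builtins is NOT a sandbox. The snippet is
    expected to store its answer in a variable named `result`.
    """
    namespace = {}
    exec(code, {"__builtins__": {}, **TOOLS}, namespace)
    return namespace.get("result")


if __name__ == "__main__":
    # JSON style needs two steps, because the second call uses the first result.
    price = run_json_tool_call('{"name": "get_price", "arguments": {"item": "bread"}}')
    total = run_json_tool_call(json.dumps(
        {"name": "multiply", "arguments": {"a": price, "b": 3}}))
    # One code action composes both calls in a single step.
    total_code = run_code_action("result = multiply(get_price('bread'), 3)")
    print(total, total_code)
```

The design point is composition: in the JSON style the orchestrator must feed each result back to the model before the next call, whereas the code action expresses the whole chain at once.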