Elron committed
Commit f428ee9 · 0 Parent(s)

Duplicate from open-agent-leaderboard/react-shortlisting

.eval_results/open_agent_leaderboard_openai_aws_claude-opus-4-5.yaml ADDED
@@ -0,0 +1,56 @@
+- dataset:
+    id: open-agent-leaderboard/results
+    task_id: overall
+  value: 0.6173
+  source:
+    url: https://www.exgentic.ai
+    name: Open Agent Leaderboard
+  notes: 'model: Claude Opus 4.5'
+- dataset:
+    id: open-agent-leaderboard/results
+    task_id: appworld
+  value: 0.64
+  source:
+    url: https://www.exgentic.ai
+    name: Open Agent Leaderboard
+  notes: 'model: Claude Opus 4.5'
+- dataset:
+    id: open-agent-leaderboard/results
+    task_id: browsecomp_plus
+  value: 0.49
+  source:
+    url: https://www.exgentic.ai
+    name: Open Agent Leaderboard
+  notes: 'model: Claude Opus 4.5'
+- dataset:
+    id: open-agent-leaderboard/results
+    task_id: swebench
+  value: 0.6061
+  source:
+    url: https://www.exgentic.ai
+    name: Open Agent Leaderboard
+  notes: 'model: Claude Opus 4.5'
+- dataset:
+    id: open-agent-leaderboard/results
+    task_id: taubench_airline
+  value: 0.66
+  source:
+    url: https://www.exgentic.ai
+    name: Open Agent Leaderboard
+  notes: 'model: Claude Opus 4.5'
+- dataset:
+    id: open-agent-leaderboard/results
+    task_id: taubench_retail
+  value: 0.78
+  source:
+    url: https://www.exgentic.ai
+    name: Open Agent Leaderboard
+  notes: 'model: Claude Opus 4.5'
+- dataset:
+    id: open-agent-leaderboard/results
+    task_id: taubench_telecom
+  value: 0.76
+  source:
+    url: https://www.exgentic.ai
+    name: Open Agent Leaderboard
+  notes: 'model: Claude Opus 4.5'
.eval_results/open_agent_leaderboard_openai_azure_deepseek-v3.2.yaml ADDED
@@ -0,0 +1,56 @@
+- dataset:
+    id: open-agent-leaderboard/results
+    task_id: overall
+  value: 0.446
+  source:
+    url: https://www.exgentic.ai
+    name: Open Agent Leaderboard
+  notes: 'model: DeepSeek V3.2'
+- dataset:
+    id: open-agent-leaderboard/results
+    task_id: appworld
+  value: 0.04
+  source:
+    url: https://www.exgentic.ai
+    name: Open Agent Leaderboard
+  notes: 'model: DeepSeek V3.2'
+- dataset:
+    id: open-agent-leaderboard/results
+    task_id: browsecomp_plus
+  value: 0.36
+  source:
+    url: https://www.exgentic.ai
+    name: Open Agent Leaderboard
+  notes: 'model: DeepSeek V3.2'
+- dataset:
+    id: open-agent-leaderboard/results
+    task_id: swebench
+  value: 0.6875
+  source:
+    url: https://www.exgentic.ai
+    name: Open Agent Leaderboard
+  notes: 'model: DeepSeek V3.2'
+- dataset:
+    id: open-agent-leaderboard/results
+    task_id: taubench_airline
+  value: 0.56
+  source:
+    url: https://www.exgentic.ai
+    name: Open Agent Leaderboard
+  notes: 'model: DeepSeek V3.2'
+- dataset:
+    id: open-agent-leaderboard/results
+    task_id: taubench_retail
+  value: 0.82
+  source:
+    url: https://www.exgentic.ai
+    name: Open Agent Leaderboard
+  notes: 'model: DeepSeek V3.2'
+- dataset:
+    id: open-agent-leaderboard/results
+    task_id: taubench_telecom
+  value: 0.71
+  source:
+    url: https://www.exgentic.ai
+    name: Open Agent Leaderboard
+  notes: 'model: DeepSeek V3.2'
.eval_results/open_agent_leaderboard_openai_azure_gpt-5.2-2025-12-11.yaml ADDED
@@ -0,0 +1,56 @@
+- dataset:
+    id: open-agent-leaderboard/results
+    task_id: overall
+  value: 0.4625
+  source:
+    url: https://www.exgentic.ai
+    name: Open Agent Leaderboard
+  notes: 'model: GPT-5.2'
+- dataset:
+    id: open-agent-leaderboard/results
+    task_id: appworld
+  value: 0.22
+  source:
+    url: https://www.exgentic.ai
+    name: Open Agent Leaderboard
+  notes: 'model: GPT-5.2'
+- dataset:
+    id: open-agent-leaderboard/results
+    task_id: browsecomp_plus
+  value: 0.46
+  source:
+    url: https://www.exgentic.ai
+    name: Open Agent Leaderboard
+  notes: 'model: GPT-5.2'
+- dataset:
+    id: open-agent-leaderboard/results
+    task_id: swebench
+  value: 0.57
+  source:
+    url: https://www.exgentic.ai
+    name: Open Agent Leaderboard
+  notes: 'model: GPT-5.2'
+- dataset:
+    id: open-agent-leaderboard/results
+    task_id: taubench_airline
+  value: 0.54
+  source:
+    url: https://www.exgentic.ai
+    name: Open Agent Leaderboard
+  notes: 'model: GPT-5.2'
+- dataset:
+    id: open-agent-leaderboard/results
+    task_id: taubench_retail
+  value: 0.73
+  source:
+    url: https://www.exgentic.ai
+    name: Open Agent Leaderboard
+  notes: 'model: GPT-5.2'
+- dataset:
+    id: open-agent-leaderboard/results
+    task_id: taubench_telecom
+  value: 0.53
+  source:
+    url: https://www.exgentic.ai
+    name: Open Agent Leaderboard
+  notes: 'model: GPT-5.2'
.eval_results/open_agent_leaderboard_openai_azure_kimi-k2.5.yaml ADDED
@@ -0,0 +1,56 @@
+- dataset:
+    id: open-agent-leaderboard/results
+    task_id: overall
+  value: 0.4276
+  source:
+    url: https://www.exgentic.ai
+    name: Open Agent Leaderboard
+  notes: 'model: Kimi K2.5'
+- dataset:
+    id: open-agent-leaderboard/results
+    task_id: appworld
+  value: 0.1
+  source:
+    url: https://www.exgentic.ai
+    name: Open Agent Leaderboard
+  notes: 'model: Kimi K2.5'
+- dataset:
+    id: open-agent-leaderboard/results
+    task_id: browsecomp_plus
+  value: 0.34
+  source:
+    url: https://www.exgentic.ai
+    name: Open Agent Leaderboard
+  notes: 'model: Kimi K2.5'
+- dataset:
+    id: open-agent-leaderboard/results
+    task_id: swebench
+  value: 0.5714
+  source:
+    url: https://www.exgentic.ai
+    name: Open Agent Leaderboard
+  notes: 'model: Kimi K2.5'
+- dataset:
+    id: open-agent-leaderboard/results
+    task_id: taubench_airline
+  value: 0.62
+  source:
+    url: https://www.exgentic.ai
+    name: Open Agent Leaderboard
+  notes: 'model: Kimi K2.5'
+- dataset:
+    id: open-agent-leaderboard/results
+    task_id: taubench_retail
+  value: 0.6465
+  source:
+    url: https://www.exgentic.ai
+    name: Open Agent Leaderboard
+  notes: 'model: Kimi K2.5'
+- dataset:
+    id: open-agent-leaderboard/results
+    task_id: taubench_telecom
+  value: 0.83
+  source:
+    url: https://www.exgentic.ai
+    name: Open Agent Leaderboard
+  notes: 'model: Kimi K2.5'
.eval_results/open_agent_leaderboard_openai_gcp_gemini-3-pro-preview.yaml ADDED
@@ -0,0 +1,56 @@
+- dataset:
+    id: open-agent-leaderboard/results
+    task_id: overall
+  value: 0.6225
+  source:
+    url: https://www.exgentic.ai
+    name: Open Agent Leaderboard
+  notes: 'model: Gemini 3 Pro'
+- dataset:
+    id: open-agent-leaderboard/results
+    task_id: appworld
+  value: 0.55
+  source:
+    url: https://www.exgentic.ai
+    name: Open Agent Leaderboard
+  notes: 'model: Gemini 3 Pro'
+- dataset:
+    id: open-agent-leaderboard/results
+    task_id: browsecomp_plus
+  value: 0.48
+  source:
+    url: https://www.exgentic.ai
+    name: Open Agent Leaderboard
+  notes: 'model: Gemini 3 Pro'
+- dataset:
+    id: open-agent-leaderboard/results
+    task_id: swebench
+  value: 0.71
+  source:
+    url: https://www.exgentic.ai
+    name: Open Agent Leaderboard
+  notes: 'model: Gemini 3 Pro'
+- dataset:
+    id: open-agent-leaderboard/results
+    task_id: taubench_airline
+  value: 0.7
+  source:
+    url: https://www.exgentic.ai
+    name: Open Agent Leaderboard
+  notes: 'model: Gemini 3 Pro'
+- dataset:
+    id: open-agent-leaderboard/results
+    task_id: taubench_retail
+  value: 0.82
+  source:
+    url: https://www.exgentic.ai
+    name: Open Agent Leaderboard
+  notes: 'model: Gemini 3 Pro'
+- dataset:
+    id: open-agent-leaderboard/results
+    task_id: taubench_telecom
+  value: 0.73
+  source:
+    url: https://www.exgentic.ai
+    name: Open Agent Leaderboard
+  notes: 'model: Gemini 3 Pro'
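
Reviewer note on the five result files above: the commit does not say how `overall` is derived, but the reported values appear consistent with averaging the three `taubench_*` domains into one score and then averaging that with `appworld`, `browsecomp_plus`, and `swebench`. The Python sketch below checks this inferred rule against the numbers copied from the files. The aggregation rule and the helper name `macro_overall` are assumptions made for illustration, not something the leaderboard documents here.

```python
from statistics import mean

# Per-task values copied from the .eval_results files added in this commit.
RESULTS = {
    "Claude Opus 4.5": {
        "overall": 0.6173, "appworld": 0.64, "browsecomp_plus": 0.49,
        "swebench": 0.6061, "taubench_airline": 0.66,
        "taubench_retail": 0.78, "taubench_telecom": 0.76,
    },
    "DeepSeek V3.2": {
        "overall": 0.446, "appworld": 0.04, "browsecomp_plus": 0.36,
        "swebench": 0.6875, "taubench_airline": 0.56,
        "taubench_retail": 0.82, "taubench_telecom": 0.71,
    },
    "GPT-5.2": {
        "overall": 0.4625, "appworld": 0.22, "browsecomp_plus": 0.46,
        "swebench": 0.57, "taubench_airline": 0.54,
        "taubench_retail": 0.73, "taubench_telecom": 0.53,
    },
    "Kimi K2.5": {
        "overall": 0.4276, "appworld": 0.1, "browsecomp_plus": 0.34,
        "swebench": 0.5714, "taubench_airline": 0.62,
        "taubench_retail": 0.6465, "taubench_telecom": 0.83,
    },
    "Gemini 3 Pro": {
        "overall": 0.6225, "appworld": 0.55, "browsecomp_plus": 0.48,
        "swebench": 0.71, "taubench_airline": 0.7,
        "taubench_retail": 0.82, "taubench_telecom": 0.73,
    },
}


def macro_overall(scores: dict) -> float:
    """Assumed rule: average the tau-bench domains, then average four benchmarks."""
    tau = mean(v for k, v in scores.items() if k.startswith("taubench_"))
    return mean([scores["appworld"], scores["browsecomp_plus"], scores["swebench"], tau])


for model, scores in RESULTS.items():
    recomputed = macro_overall(scores)
    # Reported values carry at most four decimals, so allow for rounding.
    assert abs(recomputed - scores["overall"]) < 1e-4, model
    print(f"{model}: reported={scores['overall']} recomputed={recomputed:.4f}")
```

If the rule holds, a plain mean over all six task scores would overweight tau-bench, which may be why the three domains are collapsed first.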
.gitattributes ADDED
@@ -0,0 +1,35 @@
+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ckpt filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tar filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,16 @@
+---
+tags:
+- agent
+- agent-evaluation
+- agent-card
+---
+
+# React + Shortlisting
+
+This is a tracking repo for [React + Shortlisting](https://github.com/Exgentic/exgentic), used by the [Open Agent Leaderboard](https://www.exgentic.ai) to report evaluation results on HuggingFace.
+
+ReAct agent with tool shortlisting: it dynamically filters the available tools at each step to reduce context and improve accuracy.
+
+- **Framework**: [litellm + exgentic](https://github.com/Exgentic/exgentic)
+- **Leaderboard**: [Open Agent Leaderboard](https://www.exgentic.ai)
+- **Paper**: [arXiv:2602.22953](https://arxiv.org/abs/2602.22953)
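
Reviewer note on the README above: it describes tool shortlisting only in one sentence, so the following Python sketch illustrates the general idea of filtering tools per step before they are placed in the agent's context. It is a toy version that ranks tools by word overlap with the current step's query. The `Tool` class, the `shortlist_tools` function, and the overlap scoring are all hypothetical and are not taken from the exgentic codebase, which may use a different selection method.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    """Hypothetical tool record: a name plus a natural-language description."""
    name: str
    description: str


def _tokens(text: str) -> set[str]:
    # Lowercase alphanumeric tokens; underscores in tool names act as separators.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def shortlist_tools(query: str, tools: list[Tool], k: int = 2) -> list[Tool]:
    """Return up to k tools ranked by word overlap with the query (toy scoring)."""
    q = _tokens(query)
    scored = [(len(q & _tokens(f"{t.name} {t.description}")), t) for t in tools]
    # sorted() is stable, so tools with equal scores keep their original order.
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    # Drop tools with no overlap so irrelevant schemas never reach the prompt.
    return [tool for score, tool in ranked[:k] if score > 0]


TOOLS = [
    Tool("search_flights", "find flights between two airports"),
    Tool("cancel_order", "cancel a retail order by id"),
    Tool("get_weather", "look up the weather forecast for a city"),
]

picked = shortlist_tools("cancel my order 1234", TOOLS, k=2)
print([t.name for t in picked])
```

In a ReAct loop, a function like this would run before each model call so that only the shortlisted tool schemas are serialized into the prompt. A real system would more likely score tools with embeddings or a model call than with word overlap.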