PeterKruger committed on
Commit 511127b · verified · 1 Parent(s): 687a524

minor typo

Files changed (1)
  1. README.md +14 -15
README.md CHANGED
@@ -12,7 +12,7 @@ license: apache-2.0
12
 
13
  ## Organization Description
14
 
15
- **AutoBench** is the premier LLM evaluation and routing infrastructure for the Agentic Era. We are dedicated to solving the LLM evaluation crisis by moving the industry beyond static, domain-rigid, easily gameable text prompts and building the first open LLM-based API Router for the agentic era.
16
 
17
  Pioneering the **"Collective-LLM-as-a-Judge"** methodology, AutoBench uses massive pools of LLMs to dynamically generate tasks, execute multi-turn workflows, and granularly evaluate performance across the AI ecosystem. Today, AutoBench provides fully automated, highly correlated, and strictly un-gameable benchmarking. Furthermore, we leverage the massive synthetic execution datasets generated by our benchmarks to train next-generation **Agentic LLM Routers**, helping agent developers and enterprises optimize for both absolute quality and unit economics.
18
 
@@ -57,20 +57,6 @@ Our methodology is scientifically validated and continuously peer-reviewed. We e
57
  * **eZecute:** The venture builder for enabling the industrialization and scaling of this platform.
58
  * **AWS Startups:** For compute credits.
59
 
60
- ### Citation
61
- If you use AutoBench in your research, please cite our validation paper:
62
-
63
- ```bibtex
64
- @misc{autobench2025,
65
- title={AutoBench: Automating LLM Evaluation through Reciprocal Peer Assessment},
66
- author={AutoBench},
67
- year={2025},
68
- eprint={2510.22593},
69
- archivePrefix={arXiv},
70
- primaryClass={cs.CL},
71
- url={https://arxiv.org/abs/2510.22593},
72
- }
73
-
74
  ## Explore, Connect, and Contribute
75
 
76
  Whether you are an AI researcher, a prompt engineer, or an enterprise IT architect deploying autonomous agents, AutoBench has the data you need to stop flying blind.
@@ -84,3 +70,16 @@ Whether you are an AI researcher, a prompt engineer, or an enterprise IT archite
84
 
85
  *Inference Support: Running a compute-intensive benchmark like AutoBench can be expensive. We welcome all inference API providers to support us with free inference credits to expand the scope of our evaluations.*
86
 
12
 
13
  ## Organization Description
14
 
15
+ **[AutoBench](https://autobench.org/)** is the premier LLM evaluation and routing infrastructure for the Agentic Era. We are dedicated to solving the LLM evaluation crisis by moving the industry beyond static, domain-rigid, easily gameable text prompts and building the first open LLM-based API Router for the agentic era.
16
 
17
  Pioneering the **"Collective-LLM-as-a-Judge"** methodology, AutoBench uses massive pools of LLMs to dynamically generate tasks, execute multi-turn workflows, and granularly evaluate performance across the AI ecosystem. Today, AutoBench provides fully automated, highly correlated, and strictly un-gameable benchmarking. Furthermore, we leverage the massive synthetic execution datasets generated by our benchmarks to train next-generation **Agentic LLM Routers**, helping agent developers and enterprises optimize for both absolute quality and unit economics.
18
 
 
57
  * **eZecute:** The venture builder for enabling the industrialization and scaling of this platform.
58
  * **AWS Startups:** For compute credits.
59
 
60
  ## Explore, Connect, and Contribute
61
 
62
  Whether you are an AI researcher, a prompt engineer, or an enterprise IT architect deploying autonomous agents, AutoBench has the data you need to stop flying blind.
 
70
 
71
  *Inference Support: Running a compute-intensive benchmark like AutoBench can be expensive. We welcome all inference API providers to support us with free inference credits to expand the scope of our evaluations.*
72
 
73
+ ### Citation
74
+ If you use AutoBench in your research, please cite our validation paper:
75
+
76
+ ```bibtex
77
+ @misc{autobench2025,
78
+ title={AutoBench: Automating LLM Evaluation through Reciprocal Peer Assessment},
79
+ author={AutoBench},
80
+ year={2025},
81
+ eprint={2510.22593},
82
+ archivePrefix={arXiv},
83
+ primaryClass={cs.CL},
84
+ url={https://arxiv.org/abs/2510.22593},
85
+ }