minor typo
README.md
CHANGED
|
@@ -12,7 +12,7 @@ license: apache-2.0
|
|
| 12 |
|
| 13 |
## Organization Description
|
| 14 |
|
| 15 |
-
**AutoBench** is the premier LLM evaluation and routing infrastructure for the Agentic Era. We are dedicated to solving the LLM evaluation crisis by moving the industry beyond static, domian-rigid, easily gameable text prompts and build the first open LLM-based API Router for the agentic era.
|
| 16 |
|
| 17 |
Pioneering the **"Collective-LLM-as-a-Judge"** methodology, AutoBench uses massive pools of LLMs to dynamically generate tasks, execute multi-turn workflows, and granularly evaluate performance across the AI ecosystem. Today, AutoBench provides fully automated, highly correlated, and strictly un-gameable benchmarking. Furthermore, we leverage the massive synthetic execution datasets generated by our benchmarks to train next-generation **Agentic LLM Routers**, helping agent developers and enterprises optimize for both absolute quality and unit economics.
|
| 18 |
|
|
@@ -57,20 +57,6 @@ Our methodology is scientifically validated and continuously peer-reviewed. We e
|
|
| 57 |
* **eZecute:** The venture builder for enabling the industrialization and scaling of this platform.
|
| 58 |
* **AWS Startups:** For compute credits.
|
| 59 |
|
| 60 |
-
### Citation
|
| 61 |
-
If you use AutoBench in your research, please cite our validation paper:
|
| 62 |
-
|
| 63 |
-
```bibtex
|
| 64 |
-
@misc{autobench2025,
|
| 65 |
-
title={AutoBench: Automating LLM Evaluation through Reciprocal Peer Assessment},
|
| 66 |
-
author={AutoBench},
|
| 67 |
-
year={2025},
|
| 68 |
-
eprint={2510.22593},
|
| 69 |
-
archivePrefix={arXiv},
|
| 70 |
-
primaryClass={cs.CL},
|
| 71 |
-
url={[https://arxiv.org/abs/2510.22593](https://arxiv.org/abs/2510.22593)},
|
| 72 |
-
}
|
| 73 |
-
|
| 74 |
## Explore, Connect, and Contribute
|
| 75 |
|
| 76 |
Whether you are an AI researcher, a prompt engineer, or an enterprise IT architect deploying autonomous agents, AutoBench has the data you need to stop flying blind.
|
|
@@ -84,3 +70,16 @@ Whether you are an AI researcher, a prompt engineer, or an enterprise IT archite
|
|
| 84 |
|
| 85 |
*Inference Support: Running a compute-intensive benchmark like AutoBench can be expensive. We welcome all inference API providers to support us with free inference credits to expand the scope of our evaluations.*
|
| 86 |
|
|
|
|
| 12 |
|
| 13 |
## Organization Description
|
| 14 |
|
| 15 |
+
**[AutoBench](https://autobench.org/)** is the premier LLM evaluation and routing infrastructure for the Agentic Era. We are dedicated to solving the LLM evaluation crisis by moving the industry beyond static, domain-rigid, easily gameable text prompts and building the first open LLM-based API Router for the agentic era.
|
| 16 |
|
| 17 |
Pioneering the **"Collective-LLM-as-a-Judge"** methodology, AutoBench uses massive pools of LLMs to dynamically generate tasks, execute multi-turn workflows, and granularly evaluate performance across the AI ecosystem. Today, AutoBench provides fully automated, highly correlated, and strictly un-gameable benchmarking. Furthermore, we leverage the massive synthetic execution datasets generated by our benchmarks to train next-generation **Agentic LLM Routers**, helping agent developers and enterprises optimize for both absolute quality and unit economics.
|
| 18 |
|
|
|
|
| 57 |
* **eZecute:** The venture builder for enabling the industrialization and scaling of this platform.
|
| 58 |
* **AWS Startups:** For compute credits.
|
| 59 |
|
| 60 |
## Explore, Connect, and Contribute
|
| 61 |
|
| 62 |
Whether you are an AI researcher, a prompt engineer, or an enterprise IT architect deploying autonomous agents, AutoBench has the data you need to stop flying blind.
|
|
|
|
| 70 |
|
| 71 |
*Inference Support: Running a compute-intensive benchmark like AutoBench can be expensive. We welcome all inference API providers to support us with free inference credits to expand the scope of our evaluations.*
|
| 72 |
|
| 73 |
+
### Citation
|
| 74 |
+
If you use AutoBench in your research, please cite our validation paper:
|
| 75 |
+
|
| 76 |
+
```bibtex
|
| 77 |
+
@misc{autobench2025,
|
| 78 |
+
title={AutoBench: Automating LLM Evaluation through Reciprocal Peer Assessment},
|
| 79 |
+
author={AutoBench},
|
| 80 |
+
year={2025},
|
| 81 |
+
eprint={2510.22593},
|
| 82 |
+
archivePrefix={arXiv},
|
| 83 |
+
primaryClass={cs.CL},
|
| 84 |
+
url={https://arxiv.org/abs/2510.22593},
|
| 85 |
+
}
|