Spaces: AutoBench / README
PeterKruger committed 16 days ago

Commit 511127b (verified) · 1 parent: 687a524

minor typo

Files changed (1)

README.md +14 -15

@@ -12,7 +12,7 @@ license: apache-2.0
 ## Organization Description
-**AutoBench** is the premier LLM evaluation and routing infrastructure for the Agentic Era. We are dedicated to solving the LLM evaluation crisis by moving the industry beyond static, domian-rigid, easily gameable text prompts and build the first open LLM-based API Router for the agentic era.
+**[AutoBench](https://autobench.org/)** is the premier LLM evaluation and routing infrastructure for the Agentic Era. We are dedicated to solving the LLM evaluation crisis by moving the industry beyond static, domian-rigid, easily gameable text prompts and build the first open LLM-based API Router for the agentic era.
 Pioneering the **"Collective-LLM-as-a-Judge"** methodology, AutoBench uses massive pools of LLMs to dynamically generate tasks, execute multi-turn workflows, and granularly evaluate performance across the AI ecosystem. Today, AutoBench provides fully automated, highly correlated, and strictly un-gameable benchmarking. Furthermore, we leverage the massive synthetic execution datasets generated by our benchmarks to train next-generation **Agentic LLM Routers**, helping agent developers and enterprises optimize for both absolute quality and unit economics.
@@ -57,20 +57,6 @@ Our methodology is scientifically validated and continuously peer-reviewed. We e
 * **eZecute:** The venture builder for enabling the industrialization and scaling of this platform.
 * **AWS Startups:** For compute credits.
-### Citation
-If you use AutoBench in your research, please cite our validation paper:
-```bibtex
-@misc{autobench2025,
-      title={AutoBench: Automating LLM Evaluation through Reciprocal Peer Assessment},
-      author={AutoBench},
-      year={2025},
-      eprint={2510.22593},
-      archivePrefix={arXiv},
-      primaryClass={cs.CL},
-      url={[https://arxiv.org/abs/2510.22593](https://arxiv.org/abs/2510.22593)},
-}
-```
 ## Explore, Connect, and Contribute
 Whether you are an AI researcher, a prompt engineer, or an enterprise IT architect deploying autonomous agents, AutoBench has the data you need to stop flying blind.
@@ -84,3 +70,16 @@ Whether you are an AI researcher, a prompt engineer, or an enterprise IT archite
 *Inference Support: Running a compute-intensive benchmark like AutoBench can be expensive. We welcome all inference API providers to support us with free inference credits to expand the scope of our evaluations.*
+### Citation
+If you use AutoBench in your research, please cite our validation paper:
+```bibtex
+@misc{autobench2025,
+      title={AutoBench: Automating LLM Evaluation through Reciprocal Peer Assessment},
+      author={AutoBench},
+      year={2025},
+      eprint={2510.22593},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL},
+      url={[https://arxiv.org/abs/2510.22593](https://arxiv.org/abs/2510.22593)},
+}
+```