| Rank | Algorithm | #Features | Mean F1 | Mean AUC | Time (s) |
|---|---|---|---|---|---|
AlpacaEval is an LLM-based automatic evaluation that is fast, cheap, and reliable. It is based on the AlpacaFarm evaluation set, which tests the ability of models to follow general user instructions. Model responses on this set are then compared to reference responses (Davinci003 for AlpacaEval, GPT-4 Preview for AlpacaEval 2.0) by the provided GPT-4-based auto-annotators, which results in the win rates presented above. AlpacaEval displays a high agreement rate with ground-truth human annotations, and leaderboard rankings on AlpacaEval are highly correlated with leaderboard rankings based on human annotators. Please see our documentation for more details on our analysis.
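To make the scoring concrete, a win rate is just the fraction of pairwise comparisons in which the auto-annotator preferred the model's response over the reference. The sketch below is illustrative only: the annotation representation (a flat list of preference labels, with ties as 0.5) is an assumption for exposition, not AlpacaEval's internal format.

```python
def win_rate(preferences):
    """Percentage of pairwise comparisons won by the candidate model.

    `preferences` is a list of labels: 1.0 if the candidate's response
    was preferred over the reference, 0.0 otherwise; a tie can be
    modeled as the literal 0.5. (Assumed representation, for
    illustration only.)
    """
    if not preferences:
        raise ValueError("no annotations to aggregate")
    return 100.0 * sum(preferences) / len(preferences)


# A model preferred in 3 of 4 comparisons scores a 75% win rate.
print(win_rate([1, 1, 0, 1]))  # -> 75.0
```

The leaderboard aggregates exactly this kind of per-comparison preference over the whole evaluation set, which is why agreement between the auto-annotator and human annotators translates directly into agreement between the two leaderboards.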
We welcome new model contributions to the leaderboard from the community! To do so, please follow the steps in the contributions section. Specifically, you'll need to run the model on the evaluation set, auto-annotate the outputs, and submit a PR with the model config and leaderboard results. We've also set up a Discord for community support and discussion.
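When preparing a submission, the model outputs are shared as per-example records. The sketch below validates that each record carries the fields a submission is assumed to need; the field names (`instruction`, `output`, `generator`) follow the public example outputs but should be treated as assumptions, and `my-model-v1` is a hypothetical model name.

```python
# Minimal sketch of checking submission records before opening a PR.
# Field names are assumptions based on the public example outputs.
REQUIRED_KEYS = {"instruction", "output", "generator"}


def validate_outputs(records):
    """Return the record count if every record has the expected fields."""
    for i, rec in enumerate(records):
        missing = REQUIRED_KEYS - rec.keys()
        if missing:
            raise ValueError(f"record {i} missing fields: {sorted(missing)}")
    return len(records)


sample = [
    {
        "instruction": "What are the names of some famous actors?",
        "output": "Some famous actors include ...",
        "generator": "my-model-v1",  # hypothetical model name
    }
]
print(validate_outputs(sample))  # -> 1
```

A quick check like this catches malformed records locally, before the auto-annotation step or the PR review does.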
We also welcome contributions of new evaluators and new eval sets! For making new evaluators, we release our ground-truth human annotations and comparison metrics. We also release a rough guide to follow for making new eval sets. We specifically encourage contributions of harder instruction distributions and of safety testing for LLMs.