INTRODUCTION_TEXT = """
A **Coding Agent** is more than just a model - it's the combination of a **Model** and a **Harness** (the tool/framework driving the model). This leaderboard tracks how these components work together, because the same model can perform very differently depending on the harness it's paired with.
"""

LLM_BENCHMARKS_TEXT = """
## What is a Coding Agent?

A coding agent is a system that autonomously solves software engineering tasks - reading code, reasoning about bugs, and writing patches. Its performance depends on two components:

- **Model** - The underlying language model (e.g. Claude Opus 4.7, Qwen3.6-35B)
- **Harness** - The framework or tool that orchestrates the model's actions (e.g. Claude Code, OpenCode, Pi)

## How to Read the Table

| Column | Description |
|--------|-------------|
| **Benchmark** | The benchmark used for evaluation (e.g. SWE-bench Verified - 500 real GitHub issues) |
| **Harness** | The agent framework driving the model |
| **Model** | The language model being evaluated |
| **Skills** | The set of instructions guiding the agent's behavior |
| **Score** | Outcome of the benchmark, often the fraction of tasks solved correctly (higher is better) |
| **Precision** | Model weight format (e.g. bf16, fp4) - affects speed, memory footprint, and quality |

## Key Concepts

- **FOSS vs Proprietary** - Filters let you compare fully open-source agents against proprietary ones. When both the model and the harness are FOSS, anyone can reproduce the result.
- **Skills** - Some harnesses augment the model with extra capabilities (tools, retrieval, etc.). These are listed in the **Skills** column when present.
- **Internal results (`*`)** - Benchmarks run by the model provider where the harness and environment were not made public. These are useful reference points but are not independently reproducible.

## Learn More

Visit the [GitHub repo](https://github.com/redhat-et/coding_agent_bench) for details about the project, methodology, and how to submit your own results.
"""