INTRODUCTION_TEXT = """
A **Coding Agent** is more than just a model - it's the combination of a **Model** and a **Harness** (the tool/framework driving the model). This leaderboard tracks how these components work together, because the same model can perform very differently depending on the harness and skills it's paired with.
"""

LLM_BENCHMARKS_TEXT = """
## What is a Coding Agent?

A coding agent is a system that autonomously solves software engineering tasks - reading code, reasoning about bugs, and writing patches. Its performance depends on two components:

- **Model** - The underlying language model (e.g. Claude Opus 4.7, Qwen3.6-35B)
- **Harness** - The framework or tool that orchestrates the model's actions (e.g. Claude Code, OpenCode, Pi)

## How to Read the Table

| Column | Description |
|--------|-------------|
| **Dataset** | The benchmark used for evaluation (e.g. SWE-bench Verified - 500 real GitHub issues) |
| **Harness** | The agent framework driving the model. Entries marked with `*` are **internal** - the provider ran the benchmark but did not publish the harness or environment |
| **Model** | The language model being evaluated |
| **Skills** | The set of instructions guiding the agent's behavior |
| **Environment** | The benchmark runtime. Also marked `*` when internal |
| **Score** | Outcome of the benchmark, often the fraction of tasks solved correctly (higher is better) |
| **Precision** | Model weight format (e.g. bf16, fp4) - affects speed, memory footprint, and quality |

## Key Concepts

- **FOSS vs Proprietary** - Filters let you compare fully open-source agents against proprietary ones. A FOSS model with a FOSS harness means anyone can reproduce the result
- **Skills** - Some harnesses augment the model with extra capabilities (tools, retrieval, etc.). Listed in the "skills" column when present
- **Internal results (`*`)** - Benchmarks run by the model provider where the harness and environment were not made public. These are useful reference points but are not independently reproducible

## Learn More

Visit the [GitHub repo](https://github.com/redhat-et/coding_agent_bench) for details about the project, methodology, and how to submit your own results.
"""

CITATION_BUTTON_TEXT = "TBD"
CITATION_BUTTON_LABEL = "Citation"