| INTRODUCTION_TEXT = """ |
| A **Coding Agent** is more than just a model - it's the combination of a **Model** and a **Harness** (the tool/framework driving the model). |
| This leaderboard tracks how these components work together, because the same model can perform very differently depending on the harness it's paired with. |
| """ |
|
|
| LLM_BENCHMARKS_TEXT = """ |
| ## What is a Coding Agent? |
| |
| A coding agent is a system that autonomously solves software engineering tasks - reading code, reasoning about bugs, and writing patches. Its performance depends on two components: |
| |
| - **Model** - The underlying language model (e.g. Claude Opus 4.7, Qwen3.6-35B) |
| - **Harness** - The framework or tool that orchestrates the model's actions (e.g. Claude Code, OpenCode, Pi) |
| |
| ## How to Read the Table |
| |
| | Column | Description | |
| |--------|-------------| |
| | **Benchmark** | The benchmark used for evaluation (e.g. SWE-bench Verified - 500 real GitHub issues) | |
| | **Harness** | The agent framework driving the model. | |
| | **Model** | The language model being evaluated | |
| | **Skills** | The set of instructions guiding the agent's behavior | |
| | **Score** | Outcome of the benchmark, often the fraction of tasks solved correctly (higher is better) | |
| | **Precision** | Model weight format (e.g. bf16, fp4) - affects speed, memory footprint, and quality | |
| |
| ## Key Concepts |
| |
| - **FOSS vs Proprietary** - Filters let you compare fully open-source agents against proprietary ones. A FOSS model with a FOSS harness means anyone can reproduce the result |
| - **Skills** - Some harnesses augment the model with extra capabilities (tools, retrieval, etc.). Listed in the "skills" column when present |
| - **Internal results (`*`)** - Benchmarks run by the model provider where the harness and environment were not made public. These are useful reference points but are not independently reproducible |
| |
| ## Learn More |
| |
| Visit the [GitHub repo](https://github.com/redhat-et/coding_agent_bench) for details about the project, methodology, and how to submit your own results. |
| """ |
|
|