coding-agent-leaderboard / src /display /text_blocks.py


	INTRODUCTION_TEXT = """
	A Coding Agent is more than just a model - it's the combination of a Model and a Harness (the tool/framework driving the model).
	This leaderboard tracks how these components work together, because the same model can perform very differently depending on the harness it's paired with.
	"""

	LLM_BENCHMARKS_TEXT = """
	## What is a Coding Agent?

	A coding agent is a system that autonomously solves software engineering tasks - reading code, reasoning about bugs, and writing patches. Its performance depends on two components:

	- Model - The underlying language model (e.g. Claude Opus 4.7, Qwen3.6-35B)
	- Harness - The framework or tool that orchestrates the model's actions (e.g. Claude Code, OpenCode, Pi)

	## How to Read the Table

	| Column | Description |
	|--------|-------------|
	| Benchmark | The benchmark used for evaluation (e.g. SWE-bench Verified - 500 real GitHub issues) |
	| Harness | The agent framework driving the model |
	| Model | The language model being evaluated |
	| Skills | The set of instructions guiding the agent's behavior |
	| Score | Outcome of the benchmark, often the fraction of tasks solved correctly (higher is better) |
	| Precision | Model weight format (e.g. bf16, fp4) - affects speed, memory footprint, and quality |

	## Key Concepts

	- FOSS vs Proprietary - Filters let you compare fully open-source agents against proprietary ones. A FOSS model with a FOSS harness means anyone can reproduce the result.
	- Skills - Some harnesses augment the model with extra capabilities (tools, retrieval, etc.). These are listed in the "Skills" column when present.
	- Internal results - Benchmarks run by the model provider where the harness and environment were not made public. These are useful reference points but are not independently reproducible.

	## Learn More

	Visit the [GitHub repo](https://github.com/redhat-et/coding_agent_bench) for details about the project, methodology, and how to submit your own results.
	"""
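
The Score and FOSS concepts described in the text above can be illustrated with a small sketch. Nothing here comes from the leaderboard's own code: the `Entry` class, its field names, the `best_harness_per_model` helper, and all sample rows (names and task counts) are hypothetical placeholders. It assumes a score is simply solved tasks divided by total tasks, which the text describes as the common case rather than a universal rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One hypothetical leaderboard row; field names are illustrative only."""
    benchmark: str
    harness: str
    model: str
    solved: int           # tasks solved correctly
    total: int            # tasks attempted
    model_is_foss: bool
    harness_is_foss: bool

    @property
    def score(self) -> float:
        # Assumed definition: fraction of tasks solved (higher is better).
        return self.solved / self.total

    @property
    def fully_foss(self) -> bool:
        # Reproducible by anyone only if both components are open source.
        return self.model_is_foss and self.harness_is_foss


def best_harness_per_model(entries):
    """Return the highest-scoring entry for each model."""
    best = {}
    for entry in entries:
        current = best.get(entry.model)
        if current is None or entry.score > current.score:
            best[entry.model] = entry
    return best


# Made-up placeholder rows, not real benchmark results.
ENTRIES = [
    Entry("example-bench", "harness-x", "model-a", 250, 500, True, True),
    Entry("example-bench", "harness-y", "model-a", 300, 500, True, False),
    Entry("example-bench", "harness-x", "model-b", 100, 500, False, True),
]

FULLY_FOSS = [e for e in ENTRIES if e.fully_foss]
BEST = best_harness_per_model(ENTRIES)

if __name__ == "__main__":
    for model, entry in BEST.items():
        print(f"{model}: best with {entry.harness} (score {entry.score:.2f})")
```

In the placeholder data, the same model scores differently under two harnesses, which is the effect the leaderboard is meant to surface; the FOSS filter keeps only rows where both the model and the harness are open.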
