burtenshaw committed on
Commit ·
7e23461
1
Parent(s): e8d0aea
docs: update swarm sweeper article copy
app/src/content/article.mdx
CHANGED
|
@@ -22,7 +22,7 @@ template: "article"
|
|
| 22 |
showPdf: false
|
| 23 |
tableOfContentsAutoCollapse: true
|
| 24 |
licence: >
|
| 25 |
-
Text and diagrams are licensed under <a href="https://creativecommons.org/licenses/by/4.0/" target="_blank" rel="noopener noreferrer">CC-BY 4.0</a> with the source available on <a href="https://huggingface.co/spaces/burtenshaw/
|
| 26 |
tags:
|
| 27 |
- open-source
|
| 28 |
- agents
|
|
|
|
| 22 |
showPdf: false
|
| 23 |
tableOfContentsAutoCollapse: true
|
| 24 |
licence: >
|
| 25 |
+
Text and diagrams are licensed under <a href="https://creativecommons.org/licenses/by/4.0/" target="_blank" rel="noopener noreferrer">CC-BY 4.0</a> with the source available on <a href="https://huggingface.co/spaces/burtenshaw/swarm-sweeper-blog" target="_blank" rel="noopener noreferrer">Hugging Face</a>.
|
| 26 |
tags:
|
| 27 |
- open-source
|
| 28 |
- agents
|
app/src/content/chapters/slopfarmer/content.mdx
CHANGED
|
@@ -17,23 +17,31 @@ The monthly PR rate nearly quadrupled over the period we studied. In Q3 2025, th
|
|
| 17 |
|
| 18 |
<HtmlEmbed src="d3-pr-timeline.html" title="PR volume and composition over time" desc="Monthly PR rate nearly quadrupled from ~44/mo in Q3 2025 to ~167/mo in April 2026. Feature PRs grew from 31% to 43%. Documentation dropped from 24% to 5%." />
|
| 19 |
|
|
|
|
|
|
|
| 20 |
When we discussed this at Hugging Face, we first considered simple heuristics that block bad actors, for example a minimum account age or a record of successfully merged PRs. Though potentially effective, these come with the high price of excluding new contributors.
|
| 21 |
|
| 22 |
*We all remember the feeling of contributing to open source projects for the first time, and nothing should take that away from people. Agent or not.*
|
| 23 |
|
| 24 |
-
|
|
|
|
|
|
|
| 25 |
|
| 26 |
*Thanks for listening, community!*
|
| 27 |
|
|
|
|
|
|
|
| 28 |
Many of the first-time contributors are real people (CS students, junior developers) who think they are being helpful by fixing a bug with an agent. They lack the domain knowledge or project overview to tell whether their agent's output is correct. Blocking them outright will probably just deter them from our projects or, worse, from open source contribution generally. But reviewing their PRs takes the same time as reviewing anyone else's, and they now account for the majority of incoming contributions.
|
| 29 |
|
| 30 |
-
We are quickly moving to a world where most code will be written by agents, and what we want is not to block good contributions made consciously by people steering the agents, but block low-effort completely autonomous contributions from making on to main.
|
| 31 |
|
| 32 |
The core problem is that the people submitting low-effort PRs do not know they are low-effort.
|
| 33 |
|
|
|
|
|
|
|
| 34 |
That said, there is some value in the duplicated PRs and incorrect fixes. In effect, they highlight that (according to the agent) there is an underlying problem in the code base which may need fixing. If many agents identify a single issue, there's a stronger chance that the issue is genuine. At the very least, then, low-quality agent PRs may contain some signal in their noise.
|
| 35 |
|
| 36 |
-
One cluster makes the duplication problem concrete. Issue #43979 asked for a mechanical refactor: migrate model output tracing to standardized decorators. Between PR 43996 and PR 44722, thirty-nine separate contributors submitted PRs applying this pattern to different model files. The PRs are nearly identical in structure. Each one touches a single model, applies the same decorator swap, and references the same issue. A maintainer reviewing them individually would do the same cognitive work thirty-nine times. A single combined PR could replace all of them.
|
| 37 |
|
| 38 |
<HtmlEmbed src="d3-pr-convergence.html" title="39 duplicate PRs → 1 combined PR" desc="Issue #43979 generated 39 near-identical PRs, each applying the same output tracing decorator pattern to a different model file. All could be replaced by a single combined PR." />
|
| 39 |
|
|
@@ -49,9 +57,9 @@ The project had two parts. First, build tooling to cluster, deduplicate, and ass
|
|
| 49 |
|
| 50 |
To make this work, we found and built several experimental tools. They each approach the same problem from different angles and at different layers. None of them solves the triage problem alone, but they can be combined into custom pipelines.
|
| 51 |
|
| 52 |
-
**
|
| 53 |
|
| 54 |
-
**pr-search-cli** ([huggingface/pr-search-cli](https://github.com/huggingface/pr-search-cli)) is a command-line frontend to
|
| 55 |
|
| 56 |
**GHReplica and PRTags** ([dutifuldev/ghreplica](https://github.com/dutifuldev/ghreplica), [dutifuldev/prtags](https://github.com/dutifuldev/prtags)) are Onur Solmaz's GitHub API cache and tagging manager. GHReplica mirrors repository data through webhooks and backfill, serving it over the same API as GitHub but without rate limits. PRTags manages cluster assignments on top of that data and can automatically post comments on PRs linking them to their duplicates.
|
| 57 |
|
|
@@ -82,11 +90,19 @@ The combined PR containing the merged results is at [evalstate/transformers#42](
|
|
| 82 |
|
| 83 |
## What we found
|
| 84 |
|
| 85 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
| 86 |
|
| 87 |
The duplicate rate is high. When a visible bug is filed as an issue, it is common to see ten or twenty PRs appear within hours, all attempting the same fix. Most of these PRs are close enough in content that an agent can combine them. The combined version is usually better than any individual submission because it picks the cleanest implementation from the group.
|
| 88 |
|
| 89 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
| 90 |
|
| 91 |
To measure the result of the experiment, we evaluated a subset of models end-to-end. We ran the merged fork through [lighteval](https://github.com/huggingface/lighteval) on three small models across three standard benchmarks. The point was not to improve scores. It was to confirm that bulk-merging hundreds of agent PRs did not break inference.
|
| 92 |
|
|
@@ -120,10 +136,9 @@ Open source projects that want to stay open to contributions will need tooling l
|
|
| 120 |
|
| 121 |
### Repositories
|
| 122 |
|
| 123 |
-
- [huggingface/
|
| 124 |
-
- [huggingface/pr-search-cli](https://github.com/huggingface/pr-search-cli) - CLI frontend to
|
| 125 |
- [huggingface/pr-merger](https://github.com/huggingface/pr-merger) - ACPX merge workflows
|
| 126 |
-
- [huggingface/swarm-sweeper](https://github.com/huggingface/swarm-sweeper) - Swarm Sweeper workflow repository
|
| 127 |
- [openclaw/acpx](https://github.com/openclaw/acpx) - Agent automation framework
|
| 128 |
- [openclaw/gitcrawl](https://github.com/openclaw/gitcrawl) - GitHub data mirror and clustering (Go)
|
| 129 |
- [openclaw/clawsweeper](https://github.com/openclaw/clawsweeper) - Brute-force issue analysis
|
|
|
|
| 17 |
|
| 18 |
<HtmlEmbed src="d3-pr-timeline.html" title="PR volume and composition over time" desc="Monthly PR rate nearly quadrupled from ~44/mo in Q3 2025 to ~167/mo in April 2026. Feature PRs grew from 31% to 43%. Documentation dropped from 24% to 5%." />
|
| 19 |
|
| 20 |
+
### Heuristic gates
|
| 21 |
+
|
| 22 |
When we discussed this at Hugging Face, we first considered simple heuristics that block bad actors, for example a minimum account age or a record of successfully merged PRs. Though potentially effective, these come with the high price of excluding new contributors.
|
| 23 |
|
| 24 |
*We all remember the feeling of contributing to open source projects for the first time, and nothing should take that away from people. Agent or not.*
|
| 25 |
|
| 26 |
+
### Contributing guidelines
|
| 27 |
+
|
| 28 |
+
Next, we discussed honor codes in `CONTRIBUTING.md`: clear warnings in the contributing guide, detailed explanations of the rules, and a commitment to progressive enforcement. We implemented this, and it has considerably eased the agent contribution problem.
|
| 29 |
|
| 30 |
*Thanks for listening, community!*
|
| 31 |
|
| 32 |
+
### The real contributors
|
| 33 |
+
|
| 34 |
Many of the first-time contributors are real people (CS students, junior developers) who think they are being helpful by fixing a bug with an agent. They lack the domain knowledge or project overview to tell whether their agent's output is correct. Blocking them outright will probably just deter them from our projects or, worse, from open source contribution generally. But reviewing their PRs takes the same time as reviewing anyone else's, and they now account for the majority of incoming contributions.
|
| 35 |
|
| 36 |
+
We are quickly moving to a world where most code will be written by agents. What we want is not to block good contributions made consciously by people steering the agents, but to block low-effort, completely autonomous contributions from making it onto main.
|
| 37 |
|
| 38 |
The core problem is that the people submitting low-effort PRs do not know they are low-effort.
|
| 39 |
|
| 40 |
+
### Signal in the noise
|
| 41 |
+
|
| 42 |
That said, there is some value in the duplicated PRs and incorrect fixes. In effect, they highlight that (according to the agent) there is an underlying problem in the code base which may need fixing. If many agents identify a single issue, there's a stronger chance that the issue is genuine. At the very least, then, low-quality agent PRs may contain some signal in their noise.
|
| 43 |
|
| 44 |
+
One cluster makes the duplication problem concrete. [Issue #43979](https://github.com/huggingface/transformers/issues/43979) asked for a mechanical refactor: migrate model output tracing to standardized decorators. Between [PR #43996](https://github.com/huggingface/transformers/pull/43996) and [PR #44722](https://github.com/huggingface/transformers/pull/44722), thirty-nine separate contributors submitted PRs applying this pattern to different model files. The PRs are nearly identical in structure. Each one touches a single model, applies the same decorator swap, and references the same issue. A maintainer reviewing them individually would do the same cognitive work thirty-nine times. A single combined PR could replace all of them.
|
| 45 |
|
| 46 |
<HtmlEmbed src="d3-pr-convergence.html" title="39 duplicate PRs → 1 combined PR" desc="Issue #43979 generated 39 near-identical PRs, each applying the same output tracing decorator pattern to a different model file. All could be replaced by a single combined PR." />
|
| 47 |
|
|
|
|
| 57 |
|
| 58 |
To make this work, we found and built several experimental tools. They each approach the same problem from different angles and at different layers. None of them solves the triage problem alone, but they can be combined into custom pipelines.
|
| 59 |
|
| 60 |
+
**Swarm Sweeper** ([huggingface/swarm-sweeper](https://github.com/huggingface/swarm-sweeper)) scrapes PRs, issues, and contributor profiles from GitHub into a Hugging Face dataset. It computes code similarity using IDF-weighted search and runs a clustering algorithm to group related contributions. It can publish an API server to a Space or format its output for browsable search or a dashboard.
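
To give a feel for what IDF weighting buys in this setting, here is a minimal sketch. It is not Swarm Sweeper's implementation: the token sets in `DOCS`, the smoothed IDF formula, and the weighted-overlap score are all assumptions chosen for illustration. The idea is only that a token shared by every PR (such as `import`) should count for less than a token shared by a few.

```python
import math

# Hypothetical toy corpus: PR id -> tokens taken from its diff (illustrative only).
DOCS = {
    "pr_a": {"import", "capture_outputs", "bert"},
    "pr_b": {"import", "capture_outputs", "gpt2"},
    "pr_c": {"import", "tokenizer", "unicode"},
    "pr_d": {"import", "readme"},
}


def idf_weights(docs):
    """Smoothed inverse document frequency for every token in the corpus."""
    n = len(docs)
    df = {}
    for tokens in docs.values():
        for token in set(tokens):
            df[token] = df.get(token, 0) + 1
    # Tokens that appear in every document get the minimum weight of 1.0.
    return {t: math.log((1 + n) / (1 + count)) + 1.0 for t, count in df.items()}


def idf_similarity(a, b, idf):
    """IDF-weighted overlap: weight of shared tokens over weight of all tokens.

    Assumes every token was seen when `idf` was built.
    """
    a, b = set(a), set(b)
    total = sum(idf[t] for t in a | b)
    shared = sum(idf[t] for t in a & b)
    return shared / total if total else 0.0
```

With this toy corpus, `pr_a` and `pr_b` score higher than `pr_a` and `pr_c`, because the first pair shares the rarer `capture_outputs` token while the second pair shares only the ubiquitous `import`.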
|
| 61 |
|
| 62 |
+
**pr-search-cli** ([huggingface/pr-search-cli](https://github.com/huggingface/pr-search-cli)) is a command-line frontend to Swarm Sweeper's output. It is packaged for `uvx` so there is no setup: `uvx pr-search-cli@latest issues list` returns clusters immediately. The clustering uses both hard edges (PRs that reference the same issue) and soft edges (PRs whose code changes or descriptions look similar).
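
The hard-edge and soft-edge idea can be sketched as connected components over a small graph. This is an illustration under stated assumptions, not the actual pr-search-cli or Swarm Sweeper code: the `PRS` data, the Jaccard score used for soft edges, and the `soft_threshold` parameter are all hypothetical.

```python
from itertools import combinations

# Hypothetical toy data: PR number -> linked issue (or None) and diff tokens.
PRS = {
    101: {"issue": 7, "tokens": {"decorator", "tracing", "bert"}},
    102: {"issue": 7, "tokens": {"decorator", "tracing", "gpt2"}},
    103: {"issue": None, "tokens": {"decorator", "tracing", "gpt2", "t5"}},
    104: {"issue": 9, "tokens": {"tokenizer", "unicode"}},
}


def jaccard(a, b):
    """Plain set overlap, standing in for whatever similarity score is used."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(prs, soft_threshold=0.5):
    """Group PRs connected by hard edges (same issue) or soft edges (similar diff)."""
    parent = {number: number for number in prs}

    def find(x):
        # Walk to the root of x's component, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[max(root_a, root_b)] = min(root_a, root_b)

    for a, b in combinations(sorted(prs), 2):
        hard = prs[a]["issue"] is not None and prs[a]["issue"] == prs[b]["issue"]
        soft = jaccard(prs[a]["tokens"], prs[b]["tokens"]) >= soft_threshold
        if hard or soft:
            union(a, b)

    groups = {}
    for number in sorted(prs):
        groups.setdefault(find(number), []).append(number)
    return sorted(groups.values())
```

In this toy data, PR 103 references no issue, so only a soft edge to PR 102 can pull it into the cluster that PRs 101 and 102 form through their shared issue. Raising `soft_threshold` drops that edge and leaves 103 on its own, which is one way such approaches can disagree on edge cases.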
|
| 63 |
|
| 64 |
**GHReplica and PRTags** ([dutifuldev/ghreplica](https://github.com/dutifuldev/ghreplica), [dutifuldev/prtags](https://github.com/dutifuldev/prtags)) are Onur Solmaz's GitHub API cache and tagging manager. GHReplica mirrors repository data through webhooks and backfill, serving it over the same API as GitHub but without rate limits. PRTags manages cluster assignments on top of that data and can automatically post comments on PRs linking them to their duplicates.
|
| 65 |
|
|
|
|
| 90 |
|
| 91 |
## What we found
|
| 92 |
|
| 93 |
+
### Clustering works
|
| 94 |
+
|
| 95 |
+
The clustering works. Both the embedding-based approach (GitCrawl, ClownFish) and the hard/soft-edge approach (Swarm Sweeper, pr-search-cli) identify genuine duplicates. They disagree on edge cases, and neither is complete on its own. Running an agent with access to both produces better groupings than either alone, but it consumes a very large number of tokens.
|
| 96 |
+
|
| 97 |
+
### Duplication is the norm
|
| 98 |
|
| 99 |
The duplicate rate is high. When a visible bug is filed as an issue, it is common to see ten or twenty PRs appear within hours, all attempting the same fix. Most of these PRs are close enough in content that an agent can combine them. The combined version is usually better than any individual submission because it picks the cleanest implementation from the group.
|
| 100 |
|
| 101 |
+
### The bottleneck is humans, then tokens
|
| 102 |
+
|
| 103 |
+
The triage bottleneck is real, and it is not primarily a technology problem. The long-standing issue is a lack of human resources, but that is a given and not something an agent can or should solve. The next bottleneck is token consumption. Running a quality filter over every incoming PR depletes API subscriptions in hours. The tooling helps with prioritization and deduplication, but someone still has to review the output.
|
| 104 |
+
|
| 105 |
+
### Benchmarking the merge
|
| 106 |
|
| 107 |
To measure the result of the experiment, we evaluated a subset of models end-to-end. We ran the merged fork through [lighteval](https://github.com/huggingface/lighteval) on three small models across three standard benchmarks. The point was not to improve scores. It was to confirm that bulk-merging hundreds of agent PRs did not break inference.
|
| 108 |
|
|
|
|
| 136 |
|
| 137 |
### Repositories
|
| 138 |
|
| 139 |
+
- [huggingface/swarm-sweeper](https://github.com/huggingface/swarm-sweeper) - GitHub scraper, clustering, dataset publishing
|
| 140 |
+
- [huggingface/pr-search-cli](https://github.com/huggingface/pr-search-cli) - CLI frontend to Swarm Sweeper output
|
| 141 |
- [huggingface/pr-merger](https://github.com/huggingface/pr-merger) - ACPX merge workflows
|
|
|
|
| 142 |
- [openclaw/acpx](https://github.com/openclaw/acpx) - Agent automation framework
|
| 143 |
- [openclaw/gitcrawl](https://github.com/openclaw/gitcrawl) - GitHub data mirror and clustering (Go)
|
| 144 |
- [openclaw/clawsweeper](https://github.com/openclaw/clawsweeper) - Brute-force issue analysis
|