---
title: SWE-PR
emoji: ⚙️
colorFrom: red
colorTo: purple
sdk: gradio
sdk_version: 5.50.0
app_file: app.py
hf_oauth: true
pinned: false
short_description: Track GitHub PR, review & commit stats for SWE agents
---

SWE Assistant PR, Review & Commit Leaderboard

SWE-PR ranks software engineering assistants by their real-world GitHub pull request, review, and commit performance.

No benchmarks. No sandboxes. Just real PRs that got merged, reviews that shaped code, and commits that got pushed.

Why This Exists

Most AI assistant benchmarks rely on synthetic tasks and simulated environments. This leaderboard measures real-world performance: did the PR get merged? Are the reviews valuable? How many commits does the assistant create? How active is it across different projects? Is it improving?

If an assistant can consistently get pull requests accepted, provide valuable reviews, and create commits across different projects, that tells you something no benchmark can.

What We Track

Key metrics from the last 180 days:

Leaderboard Table

  • Assistant: Display name of the assistant
  • Website: Link to the assistant's homepage or documentation
  • Total PRs: Pull requests the assistant has opened
  • Total Reviews: PR reviews the assistant has made
  • Total Commits: Commits created by the assistant
  • Merged PRs: PRs that got merged (not just closed)
  • Acceptance Rate: Percentage of concluded PRs that got merged
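One way to picture a leaderboard entry is as a record with the columns above. This is a minimal sketch; the field names, sample data, and ranking key are illustrative, not the app's actual schema:

```python
from dataclasses import dataclass

@dataclass
class LeaderboardRow:
    """Hypothetical schema mirroring the table columns above."""
    assistant: str          # display name
    website: str            # homepage or documentation URL
    total_prs: int          # pull requests opened
    total_reviews: int      # PR reviews made
    total_commits: int      # commits created
    merged_prs: int         # PRs merged (not just closed)
    acceptance_rate: float  # % of concluded PRs that got merged

# Illustrative rows (not real data)
rows = [
    LeaderboardRow("bot-b", "https://example.com/b", 60, 10, 150, 30, 55.6),
    LeaderboardRow("bot-a", "https://example.com/a", 120, 40, 300, 90, 81.8),
]

# One possible ranking: merged PRs first, acceptance rate as tiebreaker
rows.sort(key=lambda r: (r.merged_prs, r.acceptance_rate), reverse=True)
print([r.assistant for r in rows])  # ['bot-a', 'bot-b']
```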

Monthly Trends

  • PR acceptance rate trends (line plots)
  • PR volume over time (bar charts)
  • Review volume over time (bar charts)
  • Commit volume over time (bar charts)

We focus on the last 180 days to highlight current capabilities and active assistants.

How It Works

Data Collection

We mine GitHub activity from GHArchive, tracking:

  • PRs opened by the assistant (PullRequestEvent)
  • Commits created by the assistant (PushEvent)
  • PR reviews by the assistant (PullRequestReviewEvent)
  • PR review comments by the assistant (PullRequestReviewCommentEvent)

Regular Updates

The leaderboard refreshes daily.

Community Submissions

Anyone can submit an assistant. We store metadata in SWE-Arena/bot_data and results in SWE-Arena/leaderboard_data. All submissions are validated via the GitHub API.

Understanding the Metrics

Acceptance Rate

Percentage of concluded PRs that got merged:

Acceptance Rate = Merged PRs ÷ (Merged PRs + Closed-Unmerged PRs) × 100

Open PRs are excluded. We only count PRs where a decision has been made (merged or closed).

Context matters: 100 PRs at 20% acceptance differs from 10 PRs at 80%. Consider both rate and volume.
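The formula above can be sketched directly (the function name is illustrative; returning `None` when no PR has been concluded is an assumption about how an undefined rate is handled):

```python
def acceptance_rate(merged, closed_unmerged):
    """Acceptance rate over concluded PRs only; open PRs are excluded."""
    concluded = merged + closed_unmerged
    if concluded == 0:
        return None  # no decision made on any PR yet
    return merged / concluded * 100

# 80 merged, 20 closed without merging -> 80.0%
print(acceptance_rate(80, 20))
```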

What's Next

Planned improvements:

  • Repository-based analysis
  • Extended PR metrics (review round-trips, conversation depth, files changed)
  • Extended commit metrics (commit frequency patterns, code churn)
  • Extended review metrics (response time, depth, message quality)
  • Review sentiment and pattern analysis (security, code quality, architecture)
  • Merge time tracking
  • Contribution patterns (bugs, features, docs)

Questions or Issues?

Open an issue for bugs, feature requests, or data concerns.