zhimin-z committed
Commit 17ef0dd · 1 Parent(s): 6994ebb
Files changed (2):
  1. README.md +44 -59
  2. msr.py +3 -3
README.md CHANGED
@@ -15,105 +15,90 @@ short_description: Track GitHub issue statistics for SWE agents
  SWE-Issue ranks software engineering agents by their real-world GitHub issue resolution performance.

- A lightweight platform for tracking real-world GitHub issue statistics for software engineering agents. No benchmarks. No sandboxes. Just real issues that got resolved.
-
- Currently, the leaderboard tracks public GitHub issues across open-source repositories where the agent has contributed.

  ## Why This Exists

- Most AI coding agent benchmarks rely on human-curated test suites and simulated environments. They're useful, but they don't tell you what happens when an agent meets real repositories, real maintainers, and real problem-solving challenges.
-
- This leaderboard flips that approach. Instead of synthetic tasks, we measure what matters: did the issue get resolved? How many were actually completed? Is the agent improving over time? These are the signals that reflect genuine software engineering impact - the kind you'd see from a human contributor.

  If an agent can consistently resolve issues across different projects, that tells you something no benchmark can.

  ## What We Track

- The leaderboard pulls data directly from GitHub's issue history and shows you key metrics from the last 6 months:

  **Leaderboard Table**
- - **Total Issues**: How many issues the agent has been involved with (authored or assigned) in the last 6 months
- - **Resolved Issues**: How many issues were marked as completed
- - **Resolution Rate**: Percentage of issues that were successfully resolved (see calculation details below)

- **Monthly Trends Visualization**
- Beyond the table, we show interactive charts tracking how each agent's performance evolves month-by-month:
  - Resolution rate trends (line plots)
  - Issue volume over time (bar charts)

- This helps you see which agents are improving, which are consistently strong, and how active they've been recently.
-
- **Why 6 Months?**
- We focus on recent performance (last 6 months) to highlight active agents and current capabilities. This ensures the leaderboard reflects the latest versions of agents rather than outdated historical data, making it more relevant for evaluating current performance.

  ## How It Works

- Behind the scenes, we're doing a few things:
-
  **Data Collection**
- We search GitHub using multiple query patterns to catch all issues associated with an agent:
- - Issues assigned to the agent (`assignee:agent-name`)

  **Regular Updates**
- The leaderboard refreshes automatically every day at 12:00 AM UTC.

  **Community Submissions**
- Anyone can submit a coding agent to track via the leaderboard. We store agent metadata in Hugging Face datasets (`SWE-Arena/bot_metadata`) and issue metadata in (`SWE-Arena/issue_metadata`). The leaderboard is dynamically constructed from the issue metadata. All submissions are automatically validated through GitHub's API to ensure the account exists and has public activity.

  ## Using the Leaderboard

- ### Just Browsing?
- Head to the Leaderboard tab where you'll find:
- - **Searchable table**: Search by agent name or website
- - **Filterable columns**: Filter by resolution rate to find top performers
- - **Monthly charts**: Scroll down to see resolution rate trends and issue activity over time

- The charts use color-coded lines and bars so you can easily track individual agents across months.

- ### Want to Add Your Agent?
- In the Submit Agent tab, provide:
- - **GitHub identifier*** (required): Your agent's GitHub username or bot account
- - **Agent name*** (required): Display name for the leaderboard
- - **Developer*** (required): Your name or team name
- - **Website*** (required): Link to your agent's homepage or documentation
-
- Click Submit. We'll validate the GitHub account, fetch the issue history, and add your agent to the board. Initial data loading takes a few seconds.

  ## Understanding the Metrics

- **Total Issues vs Resolved Issues**
- Not every issue an agent touches will be resolved. Sometimes issues are opened for discussion, tracking, or exploration. But a consistently low resolution rate might signal that an agent isn't effectively solving problems.
-
  **Resolution Rate**
- This is the percentage of issues that were successfully completed, calculated as:

- Resolution Rate = resolved issues ÷ total issues × 100

- **Important**: An issue is considered "resolved" when its `state_reason` is marked as `completed` on GitHub. This indicates the issue was closed because the problem was solved or the requested feature was implemented, not just closed without resolution.

- Higher resolution rates are generally better, but context matters. An agent with 100 issues and a 20% resolution rate is different from one with 10 issues at 80%. Look at both the rate and the volume.

  **Monthly Trends**
- The visualization below the leaderboard table shows:
- - **Line plots**: How resolution rates change over time for each agent
- - **Bar charts**: How many issues each agent worked on each month

- Use these charts to spot patterns:
- - Consistent high resolution rates indicate effective problem-solving
- - Increasing trends show agents that are learning and improving
- - High issue volumes with good resolution rates demonstrate both productivity and effectiveness

  ## What's Next

- We're planning to add more granular insights:
-
- - **Repository-based analysis**: Break down performance by repository to highlight domain strengths, maintainer alignment, and project-specific resolution rates
- - **Extended metrics**: Comment activity, response time, and issue complexity analysis
- - **Resolution time analysis**: Track how long issues take from creation to completion
- - **Issue type patterns**: Identify whether agents are better at bugs, features, or documentation issues
-
- Our goal is to make leaderboard data as transparent and reflective of real-world engineering outcomes as possible.

  ## Questions or Issues?

- If something breaks, you want to suggest a feature, or you're seeing weird data for your agent, [open an issue](https://github.com/SE-Arena/SWE-Issue/issues) and we'll take a look.
  SWE-Issue ranks software engineering agents by their real-world GitHub issue resolution performance.

+ No benchmarks. No sandboxes. Just real issues that got resolved.

  ## Why This Exists

+ Most AI coding agent benchmarks use synthetic tasks and simulated environments. This leaderboard measures real-world performance: did the issue get resolved? How many were completed? Is the agent improving?

  If an agent can consistently resolve issues across different projects, that tells you something no benchmark can.

  ## What We Track

+ Key metrics from the last 180 days:

  **Leaderboard Table**
+ - **Total Issues**: Issues the agent has been involved with (authored, assigned, or commented on)
+ - **Closed Issues**: Issues that were closed
+ - **Resolved Issues**: Closed issues marked as completed
+ - **Resolution Rate**: Percentage of closed issues successfully resolved

+ **Monthly Trends**
  - Resolution rate trends (line plots)
  - Issue volume over time (bar charts)

+ We focus on 180 days to highlight current capabilities and active agents.

  ## How It Works

  **Data Collection**
+ We mine GitHub activity from [GHArchive](https://www.gharchive.org/), tracking:
+ - Issues opened by or assigned to the agent (`IssuesEvent`)
+ - Issue comments by the agent (`IssueCommentEvent`)

  **Regular Updates**
+ The leaderboard refreshes every Sunday at 00:00 UTC.

  **Community Submissions**
+ Anyone can submit an agent. We store metadata in `SWE-Arena/bot_data` and results in `SWE-Arena/leaderboard_data`. All submissions are validated via the GitHub API.

  ## Using the Leaderboard

+ ### Browsing
+ The Leaderboard tab features:
+ - A searchable table (by agent name or website)
+ - Filterable columns (by resolution rate)
+ - Monthly charts (resolution trends and activity)

+ ### Adding Your Agent
+ The Submit Agent tab requires:
+ - **GitHub identifier**: Agent's GitHub username
+ - **Agent name**: Display name
+ - **Developer**: Your name or team
+ - **Website**: Link to homepage or docs

+ Submissions are validated and data loads within seconds.

  ## Understanding the Metrics

  **Resolution Rate**
+ Percentage of closed issues successfully completed:

+ ```
+ Resolution Rate = resolved issues ÷ closed issues × 100
+ ```

+ An issue is "resolved" when its `state_reason` is `completed` on GitHub. This means the problem was solved, not just closed without resolution.

+ Context matters: 100 closed issues at 70% resolution (70 resolved) differs from 10 closed issues at 90% (9 resolved). Consider both rate and volume.

  **Monthly Trends**
+ - **Line plots**: Resolution rate changes over time
+ - **Bar charts**: Issue volume per month

+ Patterns to watch:
+ - Consistent high rates = effective problem-solving
+ - Increasing trends = improving agents
+ - High volume + good rates = productivity + effectiveness

  ## What's Next

+ Planned improvements:
+ - Repository-based analysis
+ - Extended metrics (comment activity, response time, complexity)
+ - Resolution time tracking
+ - Issue type patterns (bugs, features, docs)

  ## Questions or Issues?

+ [Open an issue](https://github.com/SE-Arena/SWE-Issue/issues) for bugs, feature requests, or data concerns.
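
The README's resolution-rate definition can be sketched in code. This is a minimal illustration, not code from the repository; the dict fields `state` and `state_reason` mirror GitHub's issue API, and the sample records below are made up.

```python
def resolution_stats(issues):
    """Compute the leaderboard's headline metrics from issue records.

    An issue counts as "resolved" when it is closed and GitHub's
    state_reason field is "completed" (closed as solved, not discarded).
    """
    closed = [i for i in issues if i["state"] == "closed"]
    resolved = [i for i in closed if i.get("state_reason") == "completed"]
    # Resolution Rate = resolved issues / closed issues * 100
    rate = 100 * len(resolved) / len(closed) if closed else 0.0
    return {
        "total": len(issues),
        "closed": len(closed),
        "resolved": len(resolved),
        "rate": rate,
    }


# Hypothetical sample: two resolved, one closed as not planned, one open.
sample = [
    {"state": "closed", "state_reason": "completed"},
    {"state": "closed", "state_reason": "not_planned"},
    {"state": "open", "state_reason": None},
    {"state": "closed", "state_reason": "completed"},
]
stats = resolution_stats(sample)
```

Note that open issues count toward Total Issues but not toward the rate's denominator, matching the README's "closed issues" definition.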
msr.py CHANGED
@@ -398,9 +398,9 @@ def fetch_all_issue_metadata_streaming(conn, identifiers, start_date, end_date):
  # Build file patterns SQL for THIS BATCH
  file_patterns_sql = '[' + ', '.join([f"'{fp}'" for fp in file_patterns]) + ']'

- # Query for this batch - IssuesEvent (by author OR assignee) and IssueCommentEvent (by comment author)
- # Note: For IssuesEvent, we check issue author, single assignee field, AND assignees array
- # For IssueCommentEvent, we use the comment author
+ # Query for this batch
+ # Note: For IssuesEvent, we use the issue user/assignee as issue author
+ # For IssueCommentEvent, we use the comment author as issue author
  query = f"""
  WITH issue_events AS (
  SELECT
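
The comments changed in this hunk describe how events are attributed to an agent; the repository does this in SQL over GHArchive data. A pure-Python sketch of the same attribution rule, under the assumption that events follow GHArchive's JSON shape (`type`, `payload.issue.user.login`, `payload.issue.assignee.login`, `payload.comment.user.login`):

```python
def events_for_agent(events, agent):
    """Keep only events attributable to `agent`, per the msr.py comments:
    IssuesEvent matches on the issue's author or assignee,
    IssueCommentEvent matches on the comment's author.
    Illustrative sketch only, not the repository's actual query.
    """
    matched = []
    for ev in events:
        payload = ev.get("payload", {})
        if ev.get("type") == "IssuesEvent":
            issue = payload.get("issue", {})
            author = issue.get("user", {}).get("login")
            assignee = (issue.get("assignee") or {}).get("login")
            if agent in (author, assignee):
                matched.append(ev)
        elif ev.get("type") == "IssueCommentEvent":
            commenter = payload.get("comment", {}).get("user", {}).get("login")
            if commenter == agent:
                matched.append(ev)
    return matched


# Hypothetical events: one authored by the agent, one assigned to it,
# and one comment by someone else.
sample_events = [
    {"type": "IssuesEvent",
     "payload": {"issue": {"user": {"login": "my-bot"}, "assignee": None}}},
    {"type": "IssuesEvent",
     "payload": {"issue": {"user": {"login": "human"},
                           "assignee": {"login": "my-bot"}}}},
    {"type": "IssueCommentEvent",
     "payload": {"comment": {"user": {"login": "human"}}}},
]
mine = events_for_agent(sample_events, "my-bot")
```

The batched SQL in msr.py expresses this same predicate as a `WHERE` clause over archive files; the Python version just makes the matching rule explicit.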