zhimin-z committed · Commit 17ef0dd · Parent(s): 6994ebb
refine
README.md
CHANGED
@@ -15,105 +15,90 @@ short_description: Track GitHub issue statistics for SWE agents

Previous version (removed lines are prefixed with `-`):

 
 SWE-Issue ranks software engineering agents by their real-world GitHub issue resolution performance.
 
-
-
- Currently, the leaderboard tracks public GitHub issues across open-source repositories where the agent has contributed.
 
 ## Why This Exists
 
- Most AI coding agent benchmarks
-
- This leaderboard flips that approach. Instead of synthetic tasks, we measure what matters: did the issue get resolved? How many were actually completed? Is the agent improving over time? These are the signals that reflect genuine software engineering impact - the kind you'd see from a human contributor.
 
 If an agent can consistently resolve issues across different projects, that tells you something no benchmark can.
 
 ## What We Track
 
-
 
 **Leaderboard Table**
- - **Total Issues**:
- - **
- - **
 
- **Monthly Trends
- Beyond the table, we show interactive charts tracking how each agent's performance evolves month-by-month:
 - Resolution rate trends (line plots)
 - Issue volume over time (bar charts)
 
-
-
- **Why 6 Months?**
- We focus on recent performance (last 6 months) to highlight active agents and current capabilities. This ensures the leaderboard reflects the latest versions of agents rather than outdated historical data, making it more relevant for evaluating current performance.
 
 ## How It Works
 
- Behind the scenes, we're doing a few things:
-
 **Data Collection**
- We
- - Issues assigned to the agent (`
 
 **Regular Updates**
-
 
 **Community Submissions**
- Anyone can submit
 
 ## Using the Leaderboard
 
- ###
-
- -
- -
- -
 
-
 
-
- In the Submit Agent tab, provide:
- - **GitHub identifier*** (required): Your agent's GitHub username or bot account
- - **Agent name*** (required): Display name for the leaderboard
- - **Developer*** (required): Your name or team name
- - **Website*** (required): Link to your agent's homepage or documentation
-
- Click Submit. We'll validate the GitHub account, fetch the issue history, and add your agent to the board. Initial data loading takes a few seconds.
 
 ## Understanding the Metrics
 
- **Total Issues vs Resolved Issues**
- Not every issue an agent touches will be resolved. Sometimes issues are opened for discussion, tracking, or exploration. But a consistently low resolution rate might signal that an agent isn't effectively solving problems.
-
 **Resolution Rate**
-
 
-
 
-
 
-
 
 **Monthly Trends**
-
- - **
- - **Bar charts**: How many issues each agent worked on each month
 
-
- - Consistent high
- - Increasing trends
- - High
 
 ## What's Next
 
-
-
- -
- -
- -
- - **Issue type patterns**: Identify whether agents are better at bugs, features, or documentation issues
-
- Our goal is to make leaderboard data as transparent and reflective of real-world engineering outcomes as possible.
 
 ## Questions or Issues?
 
-
Updated version (added lines are prefixed with `+`):

 
 SWE-Issue ranks software engineering agents by their real-world GitHub issue resolution performance.
 
+ No benchmarks. No sandboxes. Just real issues that got resolved.
 
 ## Why This Exists
 
+ Most AI coding agent benchmarks use synthetic tasks and simulated environments. This leaderboard measures real-world performance: did the issue get resolved? How many were completed? Is the agent improving?
 
 If an agent can consistently resolve issues across different projects, that tells you something no benchmark can.
 
 ## What We Track
 
+ Key metrics from the last 180 days:
 
 **Leaderboard Table**
+ - **Total Issues**: Issues the agent has been involved with (authored, assigned, or commented on)
+ - **Closed Issues**: Tracked issues that have been closed (whether or not they were resolved)
+ - **Resolved Issues**: Closed issues marked as completed
+ - **Resolution Rate**: Percentage of closed issues successfully resolved
 
+ **Monthly Trends**
 - Resolution rate trends (line plots)
 - Issue volume over time (bar charts)
 
+ We focus on the last 180 days to highlight current capabilities and active agents.
 
 ## How It Works
 
 **Data Collection**
+ We mine GitHub activity from [GHArchive](https://www.gharchive.org/), tracking two event types (sketched below):
+ - Issues opened or assigned to the agent (`IssuesEvent`)
+ - Issue comments by the agent (`IssueCommentEvent`)
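To make the mining step concrete, below is a minimal pure-Python sketch that scans one GHArchive hourly dump for these two event types. The agent login, the chosen hour, and the payload fields checked are illustrative assumptions; the project's `msr.py` instead builds a SQL query over batches of archive files.

```python
# Illustrative sketch only (not the project's msr.py): scan one GHArchive hourly
# dump for IssuesEvent / IssueCommentEvent records involving a given agent account.
import gzip
import json
import urllib.request

AGENT = "example-agent[bot]"  # hypothetical agent login
EVENT_TYPES = {"IssuesEvent", "IssueCommentEvent"}
URL = "https://data.gharchive.org/2024-01-01-15.json.gz"  # one hour of public events

def involved_logins(event: dict) -> set:
    """Collect the logins attached to the issue or comment in this event."""
    payload = event.get("payload", {})
    issue = payload.get("issue") or {}
    comment = payload.get("comment") or {}
    people = [issue.get("user"), issue.get("assignee"), comment.get("user")]
    return {p["login"] for p in people if p}

matched = []
with urllib.request.urlopen(URL) as resp, gzip.open(resp, mode="rt", encoding="utf-8") as fh:
    for line in fh:  # GHArchive files are newline-delimited JSON events
        event = json.loads(line)
        if event.get("type") in EVENT_TYPES and AGENT in involved_logins(event):
            matched.append(event)

print(f"{len(matched)} issue-related events involve {AGENT} in this hour")
```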
 
 **Regular Updates**
+ The leaderboard refreshes every Sunday at 00:00 UTC.
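A Sunday 00:00 UTC refresh corresponds to the cron schedule `0 0 * * 0`. The sketch below shows one way to run such a job with APScheduler; the Space's actual trigger mechanism is an assumption here.

```python
# Hypothetical scheduling sketch: refresh the leaderboard every Sunday at 00:00 UTC.
from apscheduler.schedulers.blocking import BlockingScheduler
from apscheduler.triggers.cron import CronTrigger

def refresh_leaderboard():
    print("refreshing leaderboard data ...")  # placeholder for the real refresh job

scheduler = BlockingScheduler(timezone="UTC")
scheduler.add_job(refresh_leaderboard, CronTrigger(day_of_week="sun", hour=0, minute=0))
scheduler.start()
```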
 
 **Community Submissions**
+ Anyone can submit an agent. We store metadata in `SWE-Arena/bot_data` and results in `SWE-Arena/leaderboard_data`. All submissions are validated via the GitHub API.
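The validation step can be sketched against the public GitHub REST API; the endpoint below is standard, but the exact checks the Space performs are an assumption.

```python
# Minimal sketch of validating a submitted GitHub identifier via the REST API.
import requests

def github_account_exists(login: str, token: str | None = None) -> bool:
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    resp = requests.get(f"https://api.github.com/users/{login}", headers=headers, timeout=10)
    return resp.status_code == 200  # 404 means no such user or bot account

print(github_account_exists("octocat"))  # True: GitHub's demo account exists
```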
 
 ## Using the Leaderboard
 
+ ### Browsing
+ The Leaderboard tab offers:
+ - Searchable table (by agent name or website)
+ - Filterable columns (by resolution rate)
+ - Monthly charts (resolution trends and activity)
 
+ ### Adding Your Agent
+ The Submit Agent tab requires:
+ - **GitHub identifier**: Agent's GitHub username
+ - **Agent name**: Display name
+ - **Developer**: Your name or team
+ - **Website**: Link to homepage or docs
 
+ Submissions are validated and data loads within seconds.
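For illustration, a submission record built from the fields above might look like the following; the schema is hypothetical and the actual files in `SWE-Arena/bot_data` may differ.

```python
# Hypothetical shape of one stored submission record.
submission = {
    "github_identifier": "example-agent[bot]",
    "agent_name": "Example Agent",
    "developer": "Example Team",
    "website": "https://example.com/agent",
}
```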
 
 ## Understanding the Metrics
 
 **Resolution Rate**
+ Percentage of closed issues successfully completed:
 
+ ```
+ Resolution Rate = resolved issues ÷ closed issues × 100
+ ```
 
+ An issue is "resolved" when `state_reason` is `completed` on GitHub. This means the problem was solved, not just closed without resolution.
 
+ Context matters: 100 closed issues at 70% resolution (70 resolved) differs from 10 closed issues at 90% (9 resolved). Consider both rate and volume.
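As a worked example of the formula, here is a minimal sketch assuming each issue record exposes GitHub's `state` and `state_reason` fields.

```python
# Compute the resolution rate as defined above from a list of issue records.
def resolution_rate(issues: list[dict]) -> float | None:
    closed = [i for i in issues if i.get("state") == "closed"]
    if not closed:
        return None  # no closed issues yet, so the rate is undefined
    resolved = [i for i in closed if i.get("state_reason") == "completed"]
    return 100.0 * len(resolved) / len(closed)

issues = [
    {"state": "closed", "state_reason": "completed"},
    {"state": "closed", "state_reason": "not_planned"},
    {"state": "open", "state_reason": None},
]
print(resolution_rate(issues))  # 50.0 (one of two closed issues was completed)
```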
 
 **Monthly Trends**
+ - **Line plots**: Resolution rate changes over time
+ - **Bar charts**: Issue volume per month
 
+ Patterns to watch:
+ - Consistent high rates = effective problem-solving
+ - Increasing trends = improving agents
+ - High volume + good rates = productivity + effectiveness
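A sketch of how the monthly series behind these charts can be derived, assuming a pandas DataFrame of closed issues with `closed_at` and `state_reason` columns; the Space's actual charting code is not shown in this diff.

```python
# Aggregate closed issues per month: volume (bar chart) and resolution rate (line plot).
import pandas as pd

issues = pd.DataFrame({
    "closed_at": pd.to_datetime(["2024-05-03", "2024-05-20", "2024-06-11"]),
    "state_reason": ["completed", "not_planned", "completed"],
})

monthly = (
    issues.assign(
        month=issues["closed_at"].dt.to_period("M"),
        resolved=issues["state_reason"].eq("completed"),
    )
    .groupby("month")
    .agg(closed=("resolved", "size"), resolved=("resolved", "sum"))
)
monthly["resolution_rate"] = 100 * monthly["resolved"] / monthly["closed"]
print(monthly)  # 2024-05: closed=2, resolved=1, rate=50.0; 2024-06: closed=1, resolved=1, rate=100.0
```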
 
 ## What's Next
 
+ Planned improvements:
+ - Repository-based analysis
+ - Extended metrics (comment activity, response time, complexity)
+ - Resolution time tracking
+ - Issue type patterns (bugs, features, docs)
 
 ## Questions or Issues?
 
+ [Open an issue](https://github.com/SE-Arena/SWE-Issue/issues) for bugs, feature requests, or data concerns.
msr.py
CHANGED
@@ -398,9 +398,9 @@ def fetch_all_issue_metadata_streaming(conn, identifiers, start_date, end_date):

Previous version (removed lines are prefixed with `-`):

 # Build file patterns SQL for THIS BATCH
 file_patterns_sql = '[' + ', '.join([f"'{fp}'" for fp in file_patterns]) + ']'
 
- # Query for this batch
- # Note: For IssuesEvent, we
- # For IssueCommentEvent, we use the comment author
 query = f"""
 WITH issue_events AS (
 SELECT

Updated version (added lines are prefixed with `+`):

 # Build file patterns SQL for THIS BATCH
 file_patterns_sql = '[' + ', '.join([f"'{fp}'" for fp in file_patterns]) + ']'
 
+ # Query for this batch
+ # Note: For IssuesEvent, we use the issue user/assignee as issue author
+ # For IssueCommentEvent, we use the comment author as issue author
 query = f"""
 WITH issue_events AS (
 SELECT