sdk: static
pinned: false
---

<div align="center">

# Software Engineering Arena

</div>

[Software Engineering Arena](https://huggingface.co/SWE-Arena) is an open-source initiative to transparently evaluate and track AI coding agents across real-world software engineering workflows. We provide interactive platforms, tracking systems, and novel metrics to advance the field of AI-assisted software development.

**We welcome collaboration from research labs, independent contributors, and the broader SE community!**

## 🏟️ [SWE-Model-Arena](https://github.com/Software-Engineering-Arena/SWE-Model-Arena): Interactive Multi-Round Model Evaluation

An interactive platform for evaluating foundation models through **pairwise comparisons** in multi-round conversational workflows. Unlike static benchmarks, SWE-Model-Arena enables:

- **Multi-round dialogues** reflecting real-world SE interactions
- **Repository-aware context** via RepoChat for authentic evaluations
- **Novel metrics** including model consistency score and conversation efficiency index
- **Transparent, open-source leaderboard** with advanced ranking algorithms
- **Code execution** across multiple languages in sandboxed environments

Perfect for researchers and engineers seeking nuanced, context-aware assessments of AI models on software engineering tasks.

[![SWE-Model-Arena](https://img.shields.io/badge/🏟️-Try%20SWE--Model--Arena-blue?style=for-the-badge)](https://huggingface.co/spaces/SWE-Arena/Software-Engineering-Arena)
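The leaderboard aggregates pairwise comparison outcomes into rankings. As an illustrative sketch only (the platform's actual ranking algorithm may differ), an Elo-style update after a single comparison could look like:

```python
def elo_update(rating_a, rating_b, outcome, k=32.0):
    """Update two ratings after one pairwise comparison.

    outcome: 1.0 if A is preferred, 0.0 if B is preferred, 0.5 for a tie.
    k controls how strongly a single comparison moves the ratings.
    """
    # Expected score for A under the logistic Elo model.
    expected_a = 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))
    delta = k * (outcome - expected_a)
    # The update is zero-sum: A gains exactly what B loses.
    return rating_a + delta, rating_b - delta

# Example: two equally rated models, A wins the comparison.
a, b = elo_update(1200.0, 1200.0, 1.0)
print(a, b)  # 1216.0 1184.0
```

Systems of this kind converge toward a ranking as votes accumulate; real leaderboards often use a Bradley-Terry fit over all comparisons instead of sequential updates.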

## 📊 GitHub-Based Agent Tracking Suite

Evaluate AI coding agents through their actual GitHub activity with our comprehensive tracking systems:

### [SWE-Commit](https://github.com/Software-Engineering-Arena/SWE-Commit)
Track and analyze AI coding agents by their **GitHub commits**—measuring code quality, consistency, and contribution patterns.

[![SWE-Commit](https://img.shields.io/badge/🏟️-Try%20SWE--Commit-red?style=for-the-badge)](https://huggingface.co/spaces/SWE-Arena/SWE-Commit)

### [SWE-PR](https://github.com/Software-Engineering-Arena/SWE-PR)
Assess AI agents via their **pull request workflows**—examining merge success rates, discussion quality, and iterative improvements.

[![SWE-PR](https://img.shields.io/badge/🏟️-Try%20SWE--PR-purple?style=for-the-badge)](https://huggingface.co/spaces/SWE-Arena/SWE-PR)

### [SWE-Review](https://github.com/Software-Engineering-Arena/SWE-Review)
Evaluate AI agents through their **code review activity**—assessing feedback quality, issue identification, and collaborative capabilities.

[![SWE-Review](https://img.shields.io/badge/🏟️-Try%20SWE--Review-green?style=for-the-badge)](https://huggingface.co/spaces/SWE-Arena/SWE-Review)

### [SWE-Issue](https://github.com/Software-Engineering-Arena/SWE-Issue)
Monitor how AI agents handle **issue tracking**—from bug reports to feature requests and documentation.

[![SWE-Issue](https://img.shields.io/badge/🏟️-Try%20SWE--Issue-yellow?style=for-the-badge)](https://huggingface.co/spaces/SWE-Arena/SWE-Issue)
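These trackers build on public GitHub activity data. As a minimal, hypothetical sketch (not the suite's actual implementation), one such signal, the merge success rate mentioned under SWE-PR, can be computed from pull-request records shaped like the GitHub REST API's closed-PR listing, where a merged PR has a non-null `merged_at` timestamp:

```python
def merge_rate(pulls):
    """Share of closed pull requests that were merged.

    `pulls` is a list of dicts shaped like items from the GitHub REST
    API's `GET /repos/{owner}/{repo}/pulls?state=closed` response.
    """
    closed = [p for p in pulls if p.get("state") == "closed"]
    if not closed:
        return 0.0
    merged = sum(1 for p in closed if p.get("merged_at"))
    return merged / len(closed)

# Toy sample mimicking the API's shape: 2 of 3 closed PRs were merged.
sample = [
    {"state": "closed", "merged_at": "2025-01-02T03:04:05Z"},
    {"state": "closed", "merged_at": None},
    {"state": "closed", "merged_at": "2025-01-05T06:07:08Z"},
    {"state": "open", "merged_at": None},
]
print(round(merge_rate(sample), 3))  # 0.667
```

In practice the trackers would combine many such signals (review latency, issue resolution, commit churn) rather than any single ratio.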

## 🎯 Our Mission

Software engineering extends far beyond code generation—it encompasses requirements engineering, collaborative design, code review, debugging, and project management. Current evaluation frameworks often focus narrowly on code completion or generation.

**Software Engineering Arena** provides:

- ✅ **Holistic evaluation** across diverse SE activities
- ✅ **Multi-turn interactions** matching real-world workflows
- ✅ **Transparent methodologies** for reproducible research
- ✅ **Open-source tools** for community-driven innovation
- ✅ **Rich datasets** to advance AI-assisted software development

## 🤝 Get Involved

We're actively seeking collaborators, whether you're a:

- 🔬 **Researcher** developing new evaluation metrics
- 🛠️ **Engineer** building AI coding tools
- 📊 **Data scientist** analyzing model performance
- 🌐 **Open source contributor** improving our platforms

**Ways to contribute:**

- Submit PRs to enhance our evaluation platforms
- Propose new metrics or tracking methodologies
- Share datasets or evaluation results
- Report issues and suggest improvements
- Join discussions in our repositories

## 📚 Learn More

- 📄 **Paper**: [SWE Arena: An Interactive Platform for Evaluating Foundation Models in Software Engineering](https://arxiv.org/abs/2502.01860)
- 🌐 **Platform**: [Try SWE-Model-Arena on Hugging Face](https://huggingface.co/spaces/SWE-Arena/SWE-Model-Arena)

## 📄 License

All projects under Software Engineering Arena are licensed under the **Apache 2.0 License**. Data we collect and open-source is released under the same license.

<div align="center">

**Building the future of AI-assisted software engineering, one evaluation at a time.**

</div>