---
sdk: static
pinned: false
---

<div align="center">

# Software Engineering Arena

</div>

[Software Engineering Arena](https://huggingface.co/SWE-Arena) is an open-source initiative to transparently evaluate and track AI coding agents across real-world software engineering workflows. We provide interactive platforms, tracking systems, and novel metrics to advance the field of AI-assisted software development.

**Welcome collaboration from research labs, independent contributors, and the broader SE community!**

## 🏟️ [SWE-Model-Arena](https://github.com/Software-Engineering-Arena/SWE-Model-Arena): Interactive Multi-Round Model Evaluation

An interactive platform for evaluating foundation models through **pairwise comparisons** in multi-round conversational workflows. Unlike static benchmarks, SWE-Model-Arena enables:

- **Multi-round dialogues** reflecting real-world SE interactions
- **Repository-aware context** via RepoChat for authentic evaluations
- **Novel metrics** including model consistency score and conversation efficiency index
- **Transparent, open-source leaderboard** with advanced ranking algorithms
- **Code execution** across multiple languages in sandboxed environments

Perfect for researchers and engineers seeking nuanced, context-aware assessments of AI models on software engineering tasks.
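
The leaderboard is built from pairwise votes. As intuition for how such votes can become a ranking, here is a minimal Elo-style sketch. This is an illustration only, not the arena's actual ranking algorithm, and the model names in the example are hypothetical.

```python
from collections import defaultdict

def elo_update(r_a, r_b, score_a, k=32):
    """Update two Elo ratings from one pairwise comparison.
    score_a is 1.0 if A wins, 0.0 if B wins, 0.5 for a tie."""
    expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))
    delta = k * (score_a - expected_a)
    return r_a + delta, r_b - delta

def rank_models(battles, initial=1000.0):
    """battles: iterable of (model_a, model_b, score_a) votes.
    Returns (model, rating) pairs, highest rating first."""
    ratings = defaultdict(lambda: initial)
    for a, b, score_a in battles:
        ratings[a], ratings[b] = elo_update(ratings[a], ratings[b], score_a)
    return sorted(ratings.items(), key=lambda kv: -kv[1])
```

Production arenas often prefer order-independent fits such as Bradley-Terry over all votes, since Elo ratings depend on the order in which votes arrive.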

[Try the arena on Hugging Face Spaces](https://huggingface.co/spaces/SWE-Arena/Software-Engineering-Arena)

## 📊 GitHub-Based Agent Tracking Suite

Evaluate AI coding agents through their actual GitHub activity with our comprehensive tracking systems:

### [SWE-Commit](https://github.com/Software-Engineering-Arena/SWE-Commit)

Track and analyze AI coding agents by their **GitHub commits**, measuring code quality, consistency, and contribution patterns.

[SWE-Commit leaderboard on Hugging Face Spaces](https://huggingface.co/spaces/SWE-Arena/SWE-Commit)

### [SWE-PR](https://github.com/Software-Engineering-Arena/SWE-PR)

Assess AI agents via their **pull request workflows**, examining merge success rates, discussion quality, and iterative improvements.

[SWE-PR leaderboard on Hugging Face Spaces](https://huggingface.co/spaces/SWE-Arena/SWE-PR)

### [SWE-Review](https://github.com/Software-Engineering-Arena/SWE-Review)

Evaluate AI agents through their **code review activity**, assessing feedback quality, issue identification, and collaborative capabilities.

[SWE-Review leaderboard on Hugging Face Spaces](https://huggingface.co/spaces/SWE-Arena/SWE-Review)

### [SWE-Issue](https://github.com/Software-Engineering-Arena/SWE-Issue)

Monitor how AI agents handle **issue tracking**, from bug reports to feature requests and documentation.

[SWE-Issue leaderboard on Hugging Face Spaces](https://huggingface.co/spaces/SWE-Arena/SWE-Issue)
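
As a sketch of how commit-based tracking can work, the snippet below pulls a repository's recent commits from GitHub's public REST API (`GET /repos/{owner}/{repo}/commits`) and tallies them per author. This illustrates the general approach, not SWE-Commit's actual pipeline; unauthenticated requests are subject to GitHub's rate limits.

```python
import json
from collections import Counter
from urllib.request import urlopen

def commits_by_author(commits):
    """Tally commit objects (as returned by GitHub's
    /repos/{owner}/{repo}/commits endpoint) per author."""
    counts = Counter()
    for c in commits:
        # Prefer the GitHub login; fall back to the git author name
        # when the commit is not linked to a GitHub account.
        author = (c.get("author") or {}).get("login") or c["commit"]["author"]["name"]
        counts[author] += 1
    return counts

def fetch_recent_commits(owner, repo, per_page=100):
    """Fetch a repository's most recent commits (unauthenticated)."""
    url = f"https://api.github.com/repos/{owner}/{repo}/commits?per_page={per_page}"
    with urlopen(url) as resp:
        return json.load(resp)
```

`commits_by_author` is a pure function, so per-agent statistics can be tested and recomputed offline from cached API responses.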

## 🎯 Our Mission

Software engineering extends far beyond code generation: it encompasses requirements engineering, collaborative design, code review, debugging, and project management. Current evaluation frameworks often focus narrowly on code completion or generation.

**Software Engineering Arena** provides:

- ✅ **Holistic evaluation** across diverse SE activities
- ✅ **Multi-turn interactions** matching real-world workflows
- ✅ **Transparent methodologies** for reproducible research
- ✅ **Open-source tools** for community-driven innovation
- ✅ **Rich datasets** to advance AI-assisted software development

## 🤝 Get Involved

We're actively seeking collaborators! Whether you're a:

- 🔬 **Researcher** developing new evaluation metrics
- 🛠️ **Engineer** building AI coding tools
- 📊 **Data scientist** analyzing model performance
- 🌐 **Open-source contributor** improving our platforms

**Ways to contribute:**

- Submit PRs to enhance our evaluation platforms
- Propose new metrics or tracking methodologies
- Share datasets or evaluation results
- Report issues and suggest improvements
- Join discussions in our repositories

## 📚 Learn More

- 📄 **Paper**: [SWE Arena: An Interactive Platform for Evaluating Foundation Models in Software Engineering](https://arxiv.org/abs/2502.01860)
- 🌐 **Platform**: [Try SWE-Model-Arena on Hugging Face](https://huggingface.co/spaces/SWE-Arena/SWE-Model-Arena)

## 📄 License

All projects under Software Engineering Arena are licensed under the **Apache 2.0 License**. All data we collect and open-source is released under the same license.

<div align="center">

**Building the future of AI-assisted software engineering, one evaluation at a time.**

</div>