title: SWE-Chatbot-Arena
emoji: 🎯
colorFrom: green
colorTo: blue
sdk: gradio
sdk_version: 5.50.0
app_file: app.py
hf_oauth: true
pinned: false
short_description: Chatbot arena for software engineering tasks
SWE-Chatbot-Arena
An open-source arena for evaluating LLMs on real software engineering tasks: multi-turn, context-rich, and repo-aware.
Unlike general-purpose arenas (e.g., LMArena), this platform focuses on iterative SE workflows: debugging, code review, refactoring, and more.
How It Works
- Sign in with your Hugging Face account
- Enter an SE task (optionally include a repo URL for automatic context injection via RepoChat)
- Two anonymous models respond; compare them over multiple rounds
- Vote for the better model
Try it: SWE-Chatbot-Arena on HF Spaces
Key Capabilities
- RepoChat: injects repo context (issues, commits, PRs) into conversations
- Multi-round evaluation: tests contextual understanding across turns
- Rich metrics: Elo, PageRank, eigenvector centrality, modularity clustering, self-play consistency, efficiency index
- Guardrails: gpt-oss-safeguard-20b filters non-SE requests
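To illustrate how pairwise votes can turn into a ranking, here is a minimal sketch of a standard Elo update. It is not taken from this project's code: the function names, the vote tuple format, the starting rating of 1000, and the K-factor of 32 are all assumptions chosen for illustration.

```python
from collections import defaultdict


def expected_score(r_a, r_b):
    """Probability that a player rated r_a beats one rated r_b (standard Elo)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def update_elo(ratings, model_a, model_b, outcome, k=32.0):
    """Apply one vote in place.

    outcome: 1.0 if model_a won, 0.0 if model_b won, 0.5 for a tie.
    k: hypothetical K-factor; the arena's real value is not stated in the README.
    """
    delta = k * (outcome - expected_score(ratings[model_a], ratings[model_b]))
    ratings[model_a] += delta
    ratings[model_b] -= delta


def rate(votes, initial=1000.0, k=32.0):
    """Replay (model_a, model_b, outcome) votes in order and return final ratings."""
    ratings = defaultdict(lambda: initial)
    for model_a, model_b, outcome in votes:
        update_elo(ratings, model_a, model_b, outcome, k)
    return dict(ratings)


# Hypothetical votes; the model names are placeholders.
votes = [
    ("model-x", "model-y", 1.0),
    ("model-y", "model-z", 0.5),
    ("model-x", "model-z", 1.0),
]
ratings = rate(votes)
```

Each update moves the same number of points from one model to the other, so the total rating mass stays constant. Sequential Elo depends on vote order, which is one reason an arena might complement it with graph-based metrics such as PageRank.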
Contributing
Submit SE tasks, report bugs, or send PRs. Open an issue to get started.
Citation
@inproceedings{zhao2025se,
title={SE Arena: An Interactive Platform for Evaluating Foundation Models in Software Engineering},
author={Zhao, Zhimin},
booktitle={2025 IEEE/ACM Second International Conference on AI Foundation Models and Software Engineering (Forge)},
pages={78--81},
year={2025},
organization={IEEE}
}