Your Name committed on
Commit · 58ca26f
Parent(s): 991240b
polish: fix leaderboard data, update links, correct app_file paths, fix openenv.yaml
- README.md +11 -10
- openenv.yaml +1 -1
- pyproject.toml +3 -3
README.md
CHANGED
@@ -4,7 +4,7 @@
 colorFrom: blue
 colorTo: green
 sdk: docker
-app_file: app.py
+app_file: server/app.py
 pinned: false
 ---
 <div align="center">
@@ -15,7 +15,7 @@ pinned: false
 
 [](https://github.com/openenv)
 [](https://python.org)
-[](https://huggingface.co/spaces)
+[](https://huggingface.co/spaces/PrakashCider/teamforge)
 [](https://docker.com)
 [](LICENSE)
 
@@ -78,16 +78,17 @@ Current benchmarks (HumanEval, SWE-bench, MBPP) treat code generation as a **sin
 
 ## Leaderboard
 
-*
+*Results are from agentic evaluation runs via the OpenEnv Hackathon scoring pipeline.*
+*3 runs per (model × task) · best run counts · weighted by task difficulty (Easy 20% / Medium 35% / Hard 45%)*
 
 | Rank | Model | TeamForge Score | Easy (20%) | Medium (35%) | Hard (45%) | Avg Steps |
 |:----:|-------|:--------------:|:----------:|:------------:|:----------:|:---------:|
-
-
-| 🥉 | `mixtral-8x7b-32768` | **0.4812** | 0.780 | 0.410 | 0.332 | 33.1 |
-| 4 | `gemma2-9b-it` | **0.3521** | 0.620 | 0.290 | 0.211 | 37.8 |
+| - | `llama3-8b-8192` *(baseline)* | *pending Phase 2* | - | - | - | - |
+| - | `llama3-70b-8192` | *pending Phase 2* | - | - | - | - |
 
-**Submit your model** - run `python evaluation.py --model <name> --runs 3` and open a PR with `results/<model>/eval_<timestamp>.json`
+> **Submit your model score** - run `python evaluation.py --model <name> --runs 3` and open a PR with `results/<model>/eval_<timestamp>.json`
+
+> Phase 2 agentic evaluation scores will be filled in when the hackathon pipeline completes.
 
 ---
 
@@ -244,7 +245,7 @@ The delta-based test bonus provides a smooth gradient toward correctness - cri
 ### No API key needed
 ```bash
 # 1. Clone
-git clone https://github.com/
+git clone https://github.com/Prakash-codeMaker/teamforge.git
 cd teamforge
 
 # 2. Install
@@ -330,7 +331,7 @@ docker run teamforge pytest tests/test_environment.py -v
 ```bash
 # 1. Create a new Gradio Space on huggingface.co/spaces
 # 2. Clone your Space
-git clone https://huggingface.co/spaces/
+git clone https://huggingface.co/spaces/PrakashCider/teamforge
 cd teamforge
 
 # 3. Copy project files
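The leaderboard note added in this diff states the scoring rule only in words: three runs per (model × task), the best run counts, and tasks are weighted Easy 20% / Medium 35% / Hard 45%. The rule can be sketched in Python as below. The function name `teamforge_score` and the input layout are hypothetical and not taken from `evaluation.py`, so the real pipeline may aggregate differently. For instance, the removed `mixtral-8x7b-32768` row does not follow this formula from its per-tier columns (0.2·0.780 + 0.35·0.410 + 0.45·0.332 ≈ 0.449, not 0.4812).

```python
# Hypothetical sketch: the names and data layout are assumptions,
# not the project's actual evaluation.py.
WEIGHTS = {"easy": 0.20, "medium": 0.35, "hard": 0.45}


def teamforge_score(runs_by_difficulty):
    """Weighted sum of the best run per difficulty tier.

    runs_by_difficulty maps "easy" / "medium" / "hard" to a list of run
    scores in [0, 1] (three runs each in the README's description).
    """
    missing = set(WEIGHTS) - set(runs_by_difficulty)
    if missing:
        raise ValueError(f"missing difficulty tiers: {sorted(missing)}")
    # "Best run counts": take the maximum score within each tier,
    # then combine the tiers with the difficulty weights.
    return sum(
        weight * max(runs_by_difficulty[tier])
        for tier, weight in WEIGHTS.items()
    )


if __name__ == "__main__":
    example = {
        "easy": [0.70, 0.80, 0.75],
        "medium": [0.40, 0.35, 0.50],
        "hard": [0.20, 0.30, 0.25],
    }
    print(round(teamforge_score(example), 4))
```

Because the weights sum to 1.0, a model that scores 1.0 on every tier gets a TeamForge Score of 1.0 under this sketch.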
openenv.yaml
CHANGED
@@ -171,7 +171,7 @@ inference:
 deployment:
   dockerfile: Dockerfile
   huggingface_spaces: true
-  gradio_app: app.py
+  gradio_app: server/app.py
 
 # ── API Endpoints (for OpenEnv validator) ──────────────────────────────────────
 api:
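This commit changes both `app_file` in the README front matter and `gradio_app` in `openenv.yaml` from `app.py` to `server/app.py`. A small check along the following lines could catch the two settings drifting apart or pointing at a missing file. It is a sketch under assumptions: the helper names `read_key` and `check_app_paths` are hypothetical, and it uses a plain line scan rather than a YAML parser, so it only handles simple `key: value` lines.

```python
from pathlib import Path


def read_key(text, key):
    """Return the value of the first `key: value` line, or None.

    Plain line scan (no YAML parser): indentation is ignored and only
    surrounding quotes are stripped from the value.
    """
    prefix = key + ":"
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith(prefix):
            return stripped[len(prefix):].strip().strip("\"'")
    return None


def check_app_paths(repo_root):
    """Return a list of problems with the two app path settings."""
    root = Path(repo_root)
    readme = (root / "README.md").read_text(encoding="utf-8")
    openenv = (root / "openenv.yaml").read_text(encoding="utf-8")
    settings = {
        "README.md app_file": read_key(readme, "app_file"),
        "openenv.yaml gradio_app": read_key(openenv, "gradio_app"),
    }
    problems = []
    for name, value in settings.items():
        if value is None:
            problems.append(f"{name} is not set")
        elif not (root / value).is_file():
            problems.append(f"{name} points at missing file {value}")
    # Both settings should name the same entry point.
    if len(set(settings.values())) > 1:
        problems.append(f"settings disagree: {settings}")
    return problems


if __name__ == "__main__":
    for problem in check_app_paths("."):
        print(problem)
```

Run from the repository root, the script prints nothing when both settings agree and the target file exists.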
pyproject.toml
CHANGED
@@ -34,9 +34,9 @@ teamforge-benchmark = "benchmark:main"
 server = "server.app:main"
 
 [project.urls]
-Homepage = "https://github.com/
-Documentation = "https://github.com/
-Issues = "https://github.com/
+Homepage = "https://github.com/Prakash-codeMaker/teamforge"
+Documentation = "https://github.com/Prakash-codeMaker/teamforge#readme"
+Issues = "https://github.com/Prakash-codeMaker/teamforge/issues"
 
 [tool.ruff]
 line-length = 88
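The `[project.urls]` entries updated here are easy to leave as a bare host or a placeholder. The sketch below flags any entry that does not point at a specific page, assuming the intent is that every URL carries a repository path. The function name `incomplete_urls` is hypothetical. The loading step uses `tomllib`, which is in the standard library from Python 3.11.

```python
from urllib.parse import urlparse


def incomplete_urls(urls):
    """Return the names of URL entries that lack a scheme, host, or path.

    `urls` is a mapping such as the [project.urls] table of pyproject.toml.
    """
    bad = []
    for name, url in urls.items():
        parts = urlparse(url)
        has_scheme = parts.scheme in ("http", "https")
        has_path = parts.path.strip("/") != ""
        if not (has_scheme and parts.netloc and has_path):
            bad.append(name)
    return bad


if __name__ == "__main__":
    import tomllib  # Python 3.11+

    with open("pyproject.toml", "rb") as fh:
        project = tomllib.load(fh).get("project", {})
    for name in incomplete_urls(project.get("urls", {})):
        print(f"[project.urls] {name} looks incomplete")
```

The three URLs added in this commit all pass this check, while a value such as `https://github.com/` would be reported.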