Your Name committed on
Commit 58ca26f · 1 Parent(s): 991240b

polish: fix leaderboard data, update links, correct app_file paths, fix openenv.yaml

Files changed (3)
  1. README.md +11 -10
  2. openenv.yaml +1 -1
  3. pyproject.toml +3 -3
README.md CHANGED
@@ -4,7 +4,7 @@ emoji: 🏗️
 colorFrom: blue
 colorTo: green
 sdk: docker
-app_file: app.py
+app_file: server/app.py
 pinned: false
 ---
 <div align="center">
@@ -15,7 +15,7 @@ pinned: false
 
 [![OpenEnv Compliant](https://img.shields.io/badge/OpenEnv-✓%20Compliant-2563eb?style=for-the-badge)](https://github.com/openenv)
 [![Python 3.11+](https://img.shields.io/badge/Python-3.11+-16a34a?style=for-the-badge)](https://python.org)
-[![HF Spaces](https://img.shields.io/badge/🤗-Live%20Demo-ff9d00?style=for-the-badge)](https://huggingface.co/spaces)
+[![HF Spaces](https://img.shields.io/badge/🤗-Live%20Demo-ff9d00?style=for-the-badge)](https://huggingface.co/spaces/PrakashCider/teamforge)
 [![Docker](https://img.shields.io/badge/Docker-Ready-0ea5e9?style=for-the-badge)](https://docker.com)
 [![License MIT](https://img.shields.io/badge/License-MIT-8b5cf6?style=for-the-badge)](LICENSE)
 
@@ -78,16 +78,17 @@ Current benchmarks (HumanEval, SWE-bench, MBPP) treat code generation as a **sin
 
 ## 🏆 Leaderboard
 
-*3 runs per (model × task) · best run counts · weighted by task difficulty*
+*Results are from agentic evaluation runs via the OpenEnv Hackathon scoring pipeline.*
+*3 runs per (model × task) · best run counts · weighted by task difficulty (Easy 20% / Medium 35% / Hard 45%)*
 
 | Rank | Model | TeamForge Score | Easy (20%) | Medium (35%) | Hard (45%) | Avg Steps |
 |:----:|-------|:--------------:|:----------:|:------------:|:----------:|:---------:|
-| 🥇 | `llama3-70b-8192` | **0.7841** | 0.970 | 0.762 | 0.621 | 22.3 |
-| 🥈 | `llama3-8b-8192` | **0.5934** | 0.890 | 0.541 | 0.412 | 28.7 |
-| 🥉 | `mixtral-8x7b-32768` | **0.4812** | 0.780 | 0.410 | 0.332 | 33.1 |
-| 4 | `gemma2-9b-it` | **0.3521** | 0.620 | 0.290 | 0.211 | 37.8 |
+| — | `llama3-8b-8192` *(baseline)* | *pending Phase 2* | — | — | — | — |
+| — | `llama3-70b-8192` | *pending Phase 2* | — | — | — | — |
 
-**Submit your model** → run `python evaluation.py --model <name> --runs 3` and open a PR with `results/<model>/eval_<timestamp>.json`
+> 📬 **Submit your model score** → run `python evaluation.py --model <name> --runs 3` and open a PR with `results/<model>/eval_<timestamp>.json`
+
+> ⚙️ Phase 2 agentic evaluation scores will be filled in when the hackathon pipeline completes.
 
 ---
 
@@ -244,7 +245,7 @@ The delta-based test bonus provides a smooth gradient toward correctness — cri
 ### No API key needed
 ```bash
 # 1. Clone
-git clone https://github.com/YOUR_USERNAME/teamforge.git
+git clone https://github.com/Prakash-codeMaker/teamforge.git
 cd teamforge
 
 # 2. Install
@@ -330,7 +331,7 @@ docker run teamforge pytest tests/test_environment.py -v
 ```bash
 # 1. Create a new Gradio Space on huggingface.co/spaces
 # 2. Clone your Space
-git clone https://huggingface.co/spaces/YOUR_USERNAME/teamforge
+git clone https://huggingface.co/spaces/PrakashCider/teamforge
 cd teamforge
 
 # 3. Copy project files
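The leaderboard caption in this diff describes the scoring rule only in words: 3 runs per (model × task), best run counts, weighted Easy 20% / Medium 35% / Hard 45%. A minimal sketch of that rule follows, assuming the TeamForge Score is a plain weighted sum of the best run per difficulty tier. The diff does not confirm the formula, and the function name and run values are hypothetical.

```python
# Difficulty weights as stated in the leaderboard caption.
WEIGHTS = {"easy": 0.20, "medium": 0.35, "hard": 0.45}


def teamforge_score(runs: dict) -> float:
    """Weighted sum of the best run per tier (assumed formula, not confirmed).

    `runs` maps a tier name to the list of scores from its evaluation runs.
    """
    missing = WEIGHTS.keys() - runs.keys()
    if missing:
        raise ValueError(f"missing tiers: {sorted(missing)}")
    # "Best run counts": keep only the maximum score within each tier.
    return sum(weight * max(runs[tier]) for tier, weight in WEIGHTS.items())


# Hypothetical scores for three runs per tier (illustrative values only).
example = {
    "easy": [0.90, 0.95, 1.00],
    "medium": [0.40, 0.60, 0.50],
    "hard": [0.20, 0.10, 0.30],
}
score = teamforge_score(example)
print(round(score, 4))
```

Applied to the removed `llama3-70b-8192` row, this assumed formula gives 0.2 × 0.970 + 0.35 × 0.762 + 0.45 × 0.621 = 0.74015, not the listed 0.7841, so the actual pipeline may aggregate differently.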
openenv.yaml CHANGED
@@ -171,7 +171,7 @@ inference:
 deployment:
 dockerfile: Dockerfile
 huggingface_spaces: true
-gradio_app: app.py
+gradio_app: server/app.py
 
 # ── API Endpoints (for OpenEnv validator) ──────────────────────────────────────
 api:
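This commit changes the same path in two places: `app_file` in the README front matter and `gradio_app` in `openenv.yaml`. The sketch below shows one way to catch the two drifting apart again. It assumes both keys should name the same existing file, which the diff implies but does not state. The helper names are hypothetical, and the line-based key lookup is a stand-in for a real YAML parser.

```python
import re
import tempfile
from pathlib import Path
from typing import List, Optional


def read_key(text: str, key: str) -> Optional[str]:
    """Return the value of the first `key: value` line, ignoring indentation."""
    pattern = rf"^\s*{re.escape(key)}:\s*(\S+)\s*$"
    match = re.search(pattern, text, flags=re.MULTILINE)
    return match.group(1) if match else None


def check_app_paths(readme_text: str, openenv_text: str, repo_root: Path) -> List[str]:
    """List mismatches between the two app path settings (hypothetical helper)."""
    problems = []
    app_file = read_key(readme_text, "app_file")
    gradio_app = read_key(openenv_text, "gradio_app")
    if app_file != gradio_app:
        problems.append(f"app_file={app_file!r} but gradio_app={gradio_app!r}")
    for name, value in (("app_file", app_file), ("gradio_app", gradio_app)):
        if value is None:
            problems.append(f"{name} is not set")
        elif not (repo_root / value).is_file():
            problems.append(f"{name} points to missing file {value}")
    return problems


# Demo against a throwaway directory that mimics the post-commit layout.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "server").mkdir()
    (root / "server" / "app.py").write_text("# placeholder\n")
    readme = "sdk: docker\napp_file: server/app.py\npinned: false\n"
    new_yaml = "deployment:\n  gradio_app: server/app.py\n"
    old_yaml = "deployment:\n  gradio_app: app.py\n"
    ok = check_app_paths(readme, new_yaml, root)
    stale = check_app_paths(readme, old_yaml, root)

print(ok)
print(stale)
```

With the new values the check reports nothing. With the pre-commit `gradio_app: app.py` it reports both the mismatch and the missing file.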
pyproject.toml CHANGED
@@ -34,9 +34,9 @@ teamforge-benchmark = "benchmark:main"
 server = "server.app:main"
 
 [project.urls]
-Homepage = "https://github.com/yourname/teamforge"
-Documentation = "https://github.com/yourname/teamforge#readme"
-Issues = "https://github.com/yourname/teamforge/issues"
+Homepage = "https://github.com/Prakash-codeMaker/teamforge"
+Documentation = "https://github.com/Prakash-codeMaker/teamforge#readme"
+Issues = "https://github.com/Prakash-codeMaker/teamforge/issues"
 
 [tool.ruff]
 line-length = 88
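Most of this commit replaces template placeholders (`YOUR_USERNAME`, `yourname`) with real account names. A small scan like the one below could flag any that remain. It is only a sketch: the placeholder tokens are the two seen in this diff, the function name is hypothetical, and the sample strings are copied from the changed lines rather than read from the repository.

```python
import re

# Placeholder tokens that appear on the removed lines of this diff.
PLACEHOLDER = re.compile(r"YOUR_USERNAME|yourname")


def find_placeholders(files: dict) -> list:
    """Return (filename, line number, line) for each line with a placeholder."""
    hits = []
    for name, text in files.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            if PLACEHOLDER.search(line):
                hits.append((name, lineno, line.strip()))
    return hits


# Lines as they read before and after the commit.
before = {
    "pyproject.toml": 'Homepage = "https://github.com/yourname/teamforge"\n',
    "README.md": "git clone https://github.com/YOUR_USERNAME/teamforge.git\n",
}
after = {
    "pyproject.toml": 'Homepage = "https://github.com/Prakash-codeMaker/teamforge"\n',
    "README.md": "git clone https://github.com/Prakash-codeMaker/teamforge.git\n",
}

for hit in find_placeholders(before):
    print(hit)
print(find_placeholders(after))
```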