Spaces: PrakashCider/teamforge

Your Name committed on Apr 11

Commit 58ca26f (1 parent: 991240b)

polish: fix leaderboard data, update links, correct app_file paths, fix openenv.yaml

Files changed (3)

README.md +11 -10
openenv.yaml +1 -1
pyproject.toml +3 -3

README.md CHANGED

@@ -4,7 +4,7 @@ emoji: 🏗️
 colorFrom: blue
 colorTo: green
 sdk: docker
-app_file: app.py
 pinned: false
 ---
 <div align="center">
@@ -15,7 +15,7 @@ pinned: false
 [![OpenEnv Compliant](https://img.shields.io/badge/OpenEnv-✓%20Compliant-2563eb?style=for-the-badge)](https://github.com/openenv)
 [![Python 3.11+](https://img.shields.io/badge/Python-3.11+-16a34a?style=for-the-badge)](https://python.org)
-[![HF Spaces](https://img.shields.io/badge/🤗-Live%20Demo-ff9d00?style=for-the-badge)](https://huggingface.co/spaces)
 [![Docker](https://img.shields.io/badge/Docker-Ready-0ea5e9?style=for-the-badge)](https://docker.com)
 [![License MIT](https://img.shields.io/badge/License-MIT-8b5cf6?style=for-the-badge)](LICENSE)
@@ -78,16 +78,17 @@ Current benchmarks (HumanEval, SWE-bench, MBPP) treat code generation as a **sin
 ## 🏆 Leaderboard
-*3 runs per (model × task) · best run counts · weighted by task difficulty*
 | Rank | Model | TeamForge Score | Easy (20%) | Medium (35%) | Hard (45%) | Avg Steps |
 |:----:|-------|:--------------:|:----------:|:------------:|:----------:|:---------:|
-| 🥇 | `llama3-70b-8192` | **0.7841** | 0.970 | 0.762 | 0.621 | 22.3 |
-| 🥈 | `llama3-8b-8192` | **0.5934** | 0.890 | 0.541 | 0.412 | 28.7 |
-| 🥉 | `mixtral-8x7b-32768` | **0.4812** | 0.780 | 0.410 | 0.332 | 33.1 |
-| 4 | `gemma2-9b-it` | **0.3521** | 0.620 | 0.290 | 0.211 | 37.8 |
-**Submit your model** → run `python evaluation.py --model <name> --runs 3` and open a PR with `results/<model>/eval_<timestamp>.json`
 ---
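
The weights in the leaderboard header (Easy 20%, Medium 35%, Hard 45%) suggest how a TeamForge Score could be derived from the per-tier columns. The following is a minimal sketch, assuming the score is a plain weighted sum of the three tier results; the source does not state the formula, and `WEIGHTS` and `weighted_score` are hypothetical names used only for illustration.

```python
# Weights copied from the leaderboard header: Easy 20%, Medium 35%, Hard 45%.
WEIGHTS = {"easy": 0.20, "medium": 0.35, "hard": 0.45}


def weighted_score(per_tier: dict[str, float]) -> float:
    """Return a plain weighted sum of per-tier scores.

    Assumption: the real scoring pipeline may combine tiers differently.
    """
    return sum(WEIGHTS[tier] * per_tier[tier] for tier in WEIGHTS)


# Per-tier values copied from the removed `llama3-70b-8192` row.
removed_row = {"easy": 0.970, "medium": 0.762, "hard": 0.621}
print(f"{weighted_score(removed_row):.3f}")
```

Under this assumption the removed `llama3-70b-8192` row works out to roughly 0.740, not the 0.7841 it listed. That mismatch is consistent with the commit replacing the old rows with pending placeholders, although the actual pipeline may use a different aggregation.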
@@ -244,7 +245,7 @@ The delta-based test bonus provides a smooth gradient toward correctness — cri
 ### No API key needed
 ```bash
 # 1. Clone
-git clone https://github.com/YOUR_USERNAME/teamforge.git
 cd teamforge
 # 2. Install
@@ -330,7 +331,7 @@ docker run teamforge pytest tests/test_environment.py -v
 ```bash
 # 1. Create a new Gradio Space on huggingface.co/spaces
 # 2. Clone your Space
-git clone https://huggingface.co/spaces/YOUR_USERNAME/teamforge
 cd teamforge
 # 3. Copy project files

 colorFrom: blue
 colorTo: green
 sdk: docker
+app_file: server/app.py
 pinned: false
 ---
 <div align="center">
 [![OpenEnv Compliant](https://img.shields.io/badge/OpenEnv-✓%20Compliant-2563eb?style=for-the-badge)](https://github.com/openenv)
 [![Python 3.11+](https://img.shields.io/badge/Python-3.11+-16a34a?style=for-the-badge)](https://python.org)
+[![HF Spaces](https://img.shields.io/badge/🤗-Live%20Demo-ff9d00?style=for-the-badge)](https://huggingface.co/spaces/PrakashCider/teamforge)
 [![Docker](https://img.shields.io/badge/Docker-Ready-0ea5e9?style=for-the-badge)](https://docker.com)
 [![License MIT](https://img.shields.io/badge/License-MIT-8b5cf6?style=for-the-badge)](LICENSE)
 ## 🏆 Leaderboard
+*Results are from agentic evaluation runs via the OpenEnv Hackathon scoring pipeline.*
+*3 runs per (model × task) · best run counts · weighted by task difficulty (Easy 20% / Medium 35% / Hard 45%)*
 | Rank | Model | TeamForge Score | Easy (20%) | Medium (35%) | Hard (45%) | Avg Steps |
 |:----:|-------|:--------------:|:----------:|:------------:|:----------:|:---------:|
+| — | `llama3-8b-8192` *(baseline)* | *pending Phase 2* | — | — | — | — |
+| — | `llama3-70b-8192` | *pending Phase 2* | — | — | — | — |
+> 📬 **Submit your model score** → run `python evaluation.py --model <name> --runs 3` and open a PR with `results/<model>/eval_<timestamp>.json`
+> ⚙️ Phase 2 agentic evaluation scores will be filled in when the hackathon pipeline completes.
 ---
 ### No API key needed
 ```bash
 # 1. Clone
+git clone https://github.com/Prakash-codeMaker/teamforge.git
 cd teamforge
 # 2. Install
 ```bash
 # 1. Create a new Gradio Space on huggingface.co/spaces
 # 2. Clone your Space
+git clone https://huggingface.co/spaces/PrakashCider/teamforge
 cd teamforge
 # 3. Copy project files

openenv.yaml CHANGED

@@ -171,7 +171,7 @@ inference:
 deployment:
   dockerfile: Dockerfile
   huggingface_spaces: true
-  gradio_app: app.py
 # ── API Endpoints (for OpenEnv validator) ──────────────────────────────────────
 api:

 deployment:
   dockerfile: Dockerfile
   huggingface_spaces: true
+  gradio_app: server/app.py
 # ── API Endpoints (for OpenEnv validator) ──────────────────────────────────────
 api:

pyproject.toml CHANGED

@@ -34,9 +34,9 @@ teamforge-benchmark = "benchmark:main"
 server              = "server.app:main"
 [project.urls]
-Homepage      = "https://github.com/yourname/teamforge"
-Documentation = "https://github.com/yourname/teamforge#readme"
-Issues        = "https://github.com/yourname/teamforge/issues"
 [tool.ruff]
 line-length = 88

 server              = "server.app:main"
 [project.urls]
+Homepage      = "https://github.com/Prakash-codeMaker/teamforge"
+Documentation = "https://github.com/Prakash-codeMaker/teamforge#readme"
+Issues        = "https://github.com/Prakash-codeMaker/teamforge/issues"
 [tool.ruff]
 line-length = 88