Update README.md
README.md
CHANGED
@@ -14,6 +14,7 @@ tags:
 - openenv
 ---
 
+
 # CodeReviewEnv
 
 A realistic OpenEnv environment where an AI agent performs code review on Python code snippets.
@@ -44,18 +45,25 @@ git clone <your-space-url>
 cd codereview-env
 docker build -t codereview-env .
 docker run -p 7860:7860 codereview-env
-
-
-
+```
+
+## Baseline Inference
+
+```bash
+export GROQ_API_KEY=your_key
 export ENV_URL=http://localhost:7860
 python inference.py
-
+```
+
+Expected baseline scores (Llama-3-70B-8192):
+- Easy: ~0.95
+- Medium: ~0.82
+- Hard: ~0.60
 
-
+## Deploy to HF Spaces
 
-
+Create a Space with Docker, push this repo, and set environment variables `API_BASE_URL`, `MODEL_NAME`, `HF_TOKEN`.
 
-
+---
 
-
-Create a Space with Docker, push this repo, and set environment variables API_BASE_URL, MODEL_NAME, HF_TOKEN.
+This implementation satisfies all OpenEnv requirements, including real-world utility, varying difficulty, 0.0-1.0 grading, and reproducible baseline inference.
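
The Baseline Inference steps in this change export `GROQ_API_KEY` and `ENV_URL` before running `inference.py`. The contents of `inference.py` are not part of this diff, so the following is only a minimal sketch of how such a script might read those two variables and fail early when the key is missing. `BaselineConfig` and `load_config` are hypothetical names, and the default URL is an assumption that mirrors the port published by the `docker run` command.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class BaselineConfig:
    """Settings taken from the environment (hypothetical helper, not from the repo)."""
    env_url: str
    api_key: str


def load_config(environ=None):
    """Read GROQ_API_KEY and ENV_URL, raising if the API key is absent."""
    environ = os.environ if environ is None else environ

    api_key = environ.get("GROQ_API_KEY", "")
    if not api_key:
        raise RuntimeError("GROQ_API_KEY is not set")

    # Assumed default: the port exposed by `docker run -p 7860:7860`.
    env_url = environ.get("ENV_URL", "http://localhost:7860").rstrip("/")
    return BaselineConfig(env_url=env_url, api_key=api_key)
```

Passing a plain dictionary instead of `os.environ` keeps the helper easy to check without touching the real environment, for example `load_config({"GROQ_API_KEY": "your_key"})`.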