Commit df58858 (verified), committed by SyamSashank · 1 Parent(s): d934356

Update README.md

Files changed (1): README.md (+17 -9)
@@ -14,6 +14,7 @@ tags:
 - openenv
 ---
 
+
 # CodeReviewEnv
 
 A realistic OpenEnv environment where an AI agent performs code review on Python code snippets.
@@ -44,18 +45,25 @@ git clone <your-space-url>
 cd codereview-env
 docker build -t codereview-env .
 docker run -p 7860:7860 codereview-env
-Baseline Inference
-bash
-export OPENAI_API_KEY=your_key
+```
+
+## Baseline Inference
+
+```bash
+export GROQ_API_KEY=your_key
 export ENV_URL=http://localhost:7860
 python inference.py
-Expected baseline scores (GPT-4o-mini):
+```
+
+Expected baseline scores (Llama-3-70B-8192):
+- Easy: ~0.95
+- Medium: ~0.82
+- Hard: ~0.60
 
-Easy: ~0.92
+## Deploy to HF Spaces
 
-Medium: ~0.78
+Create a Space with Docker, push this repo, and set environment variables `API_BASE_URL`, `MODEL_NAME`, `HF_TOKEN`.
 
-Hard: ~0.54
+---
 
-Deploy to HF Spaces
-Create a Space with Docker, push this repo, and set environment variables API_BASE_URL, MODEL_NAME, HF_TOKEN.
+This implementation satisfies all OpenEnv requirements, including real-world utility, varying difficulty, 0.0-1.0 grading, and reproducible baseline inference.
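
Note on the changed setup: this commit switches the baseline from `OPENAI_API_KEY` to `GROQ_API_KEY`, but `inference.py` itself is not part of the diff, so how it reads its configuration is not shown here. The following is only a minimal sketch of how a script could pick up `GROQ_API_KEY` and `ENV_URL` and keep a score within the 0.0-1.0 range the README mentions. The names `InferenceConfig`, `load_config`, and `clamp_score` are hypothetical, and the fallback URL is an assumption copied from the README example.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class InferenceConfig:
    """Hypothetical container for the two variables the README exports."""
    api_key: str
    env_url: str


def load_config(environ=None) -> InferenceConfig:
    """Read GROQ_API_KEY and ENV_URL from a mapping (defaults to os.environ)."""
    env = os.environ if environ is None else environ
    api_key = env.get("GROQ_API_KEY", "")
    if not api_key:
        # Fail early rather than sending unauthenticated requests.
        raise RuntimeError("GROQ_API_KEY is not set")
    # Assumed default: the local URL used in the README's docker example.
    env_url = env.get("ENV_URL", "http://localhost:7860").rstrip("/")
    return InferenceConfig(api_key=api_key, env_url=env_url)


def clamp_score(raw: float) -> float:
    """Keep a grader score inside the 0.0-1.0 range."""
    return max(0.0, min(1.0, raw))
```

Passing a plain dictionary instead of relying on `os.environ` keeps the sketch easy to check without exporting real credentials.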