SyamSashank committed on
Commit
abbca25
·
verified ·
1 Parent(s): 6e7ce30

Update README.md

Files changed (1)
  1. README.md +24 -16
README.md CHANGED
@@ -1,3 +1,18 @@
  # CodeReviewEnv
 
  A realistic OpenEnv environment where an AI agent performs code review on Python code snippets.
@@ -28,25 +43,18 @@ git clone <your-space-url>
  cd codereview-env
  docker build -t codereview-env .
  docker run -p 7860:7860 codereview-env
- ```
-
- ## Baseline Inference
-
- ```bash
- export GROQ_API_KEY=your_key
  export ENV_URL=http://localhost:7860
  python inference.py
- ```
 
- Expected baseline scores (Llama-3-70B-8192):
- - Easy: ~0.95
- - Medium: ~0.82
- - Hard: ~0.60
 
- ## Deploy to HF Spaces
 
- Create a Space with Docker, push this repo, and set environment variables `API_BASE_URL`, `MODEL_NAME`, `HF_TOKEN`.
-
- ---
 
- This implementation satisfies all OpenEnv requirements, including real-world utility, varying difficulty, 0.0-1.0 grading, and reproducible baseline inference.
 
 
+ ---
+ title: Codereview Env
+ emoji: 🔥
+ colorFrom: red
+ colorTo: green
+ sdk: docker
+ sdk_version: "1.0"
+ app_file: app.py
+ pinned: false
+ license: mit
+ short_description: OpenEnv environment for code review tasks.
+ tags:
+ - openenv
+ ---
+
  # CodeReviewEnv
 
  A realistic OpenEnv environment where an AI agent performs code review on Python code snippets.
 
43
  cd codereview-env
44
  docker build -t codereview-env .
45
  docker run -p 7860:7860 codereview-env
46
+ Baseline Inference
47
+ bash
48
+ export OPENAI_API_KEY=your_key
 
 
 
49
  export ENV_URL=http://localhost:7860
50
  python inference.py
51
+ Expected baseline scores (GPT-4o-mini):
52
 
53
+ Easy: ~0.92
 
 
 
54
 
55
+ Medium: ~0.78
56
 
57
+ Hard: ~0.54
 
 
58
 
59
+ Deploy to HF Spaces
60
+ Create a Space with Docker, push this repo, and set environment variables API_BASE_URL, MODEL_NAME, HF_TOKEN.