AIMLxDIV committed on
Commit cae4a95 · 0 Parent(s)

first commit

Files changed (1)
  1. README.md +56 -0
README.md ADDED
@@ -0,0 +1,56 @@
+ # AgentOrg CodeReview Environment
+
+ AI Senior Code Reviewer evaluation environment.
+
+ ## Tasks
+ 1. **Bug Detection**: Identify logical errors and edge cases.
+ 2. **Security Audit**: Detect vulnerabilities (OWASP Top 10).
+ 3. **Architectural Review**: Evaluate design patterns and system constraints.
+
+ ## Installation
+
+ ```bash
+ python3 -m venv venv
+ source venv/bin/activate
+ pip install -r requirements.txt
+ ```
+
+ ## Running the Environment
+
+ ### 1. Start the API Server
+ ```bash
+ PYTHONPATH=. python3 app.py
+ ```
+ *Server runs on port **7860** (Hugging Face standard).*
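Once the server is listening on port 7860, a client could subscribe to the `/ws/events` stream that the README lists under Features. The sketch below is an illustration only: `events_url` and `print_events` are hypothetical helper names, the third-party `websockets` package is an assumed dependency (it may not be in `requirements.txt`), and the message payload format is not documented in this commit, so messages are printed raw.

```python
from urllib.parse import urlsplit, urlunsplit


def events_url(base_url: str, path: str = "/ws/events") -> str:
    """Turn an http(s) base URL into the matching ws(s) URL for the event stream."""
    parts = urlsplit(base_url)
    scheme = "wss" if parts.scheme == "https" else "ws"
    return urlunsplit((scheme, parts.netloc, path, "", ""))


async def print_events(base_url: str, limit: int = 5) -> None:
    """Print up to `limit` raw messages from the stream, then disconnect."""
    # Imported lazily so the URL helper works without the package installed.
    import websockets  # assumed dependency: pip install websockets

    async with websockets.connect(events_url(base_url)) as ws:
        received = 0
        async for message in ws:
            print(message)
            received += 1
            if received >= limit:
                break
```

With the server running, `asyncio.run(print_events("http://localhost:7860"))` would print the first few events and exit.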
25
+
26
+ ### 2. Run Baseline Agent
27
+ ```bash
28
+ PYTHONPATH=. python3 scripts/baseline.py --url http://localhost:7860
29
+ ```
30
+
+ ## Features
+ - **Deterministic Grading**: MoE-inspired confidence-weighted matching.
+ - **Noise Budget**: Penalizes false positives to prevent gaming the system.
+ - **WebSocket Stream**: Real-time event broadcasting on `/ws/events`.
+ - **Leaderboard**: In-memory tracking of top agent performances.
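The first two features above can be made concrete with a small sketch. This is not the repository's grader: `Finding`, `grade`, `noise_budget`, and `penalty` are hypothetical names and defaults chosen for illustration. It assumes each reported finding carries a confidence in [0, 1], that matched findings contribute their confidence to the score, and that false positives beyond a fixed budget each subtract a flat penalty.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    issue_id: str      # hypothetical identifier for a reported issue
    confidence: float  # agent's self-reported confidence, expected in [0, 1]


def grade(findings, expected_ids, noise_budget=1, penalty=0.25):
    """Confidence-weighted recall minus a false-positive penalty, clamped to [0, 1]."""
    expected = set(expected_ids)
    if not expected:
        raise ValueError("expected_ids must not be empty")

    # Keep the highest confidence per issue so duplicates cannot inflate the score.
    best = {}
    for finding in findings:
        conf = min(max(finding.confidence, 0.0), 1.0)
        best[finding.issue_id] = max(best.get(finding.issue_id, 0.0), conf)

    matched = sum(conf for issue, conf in best.items() if issue in expected)
    false_positives = sum(1 for issue in best if issue not in expected)

    # False positives within the budget are free; each extra one costs `penalty`.
    over_budget = max(0, false_positives - noise_budget)
    raw = matched / len(expected) - penalty * over_budget
    return min(max(raw, 0.0), 1.0)
```

The function has no randomness or external state, so the same submission always receives the same score. Because every finding outside the expected set counts against the budget, an agent that reports many speculative issues scores lower than one that reports only what it can support.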
+
+ ## Verification
+
+ Run the full test suite to ensure everything is functional:
+
+ ```bash
+ PYTHONPATH=. pytest tests/
+ ```
+
+ Individual component tests:
+ - `tests/test_graders.py`: Scoring logic unit tests.
+ - `tests/test_env.py`: State machine integration tests.
+ - `tests/test_api.py`: FastAPI contract tests.
+
+ ## Roadmap & Progress
+ The environment is currently **Production Ready** and follows the standard OpenEnv specification.
+ - [x] 30 Synthetic Scenarios (Bug, Security, Architecture)
+ - [x] Deterministic specialized graders
+ - [x] Thin FastAPI gateway with WebSocket event streaming
+ - [x] Comprehensive test coverage
+ # open-ev-code-handler