# OpenEnv Environment Specification
# This file describes the Code Security Review environment for the Meta PyTorch OpenEnv Hackathon.
# Metadata section details the environment's identity.
name: code-security-review
version: "1.0.0"
description: >
  An RL environment for training AI agents to perform code security review.
  Agents analyze code snippets from production pull requests and identify bugs,
  vulnerabilities, and security issues.
author: Inmodel Labs
# Tasks section defines the core challenges in the environment.
# Each task has a unique ID, name, description, and difficulty level.
tasks:
  - id: python-off-by-one
    name: "Python Off-by-One Error"
    description: "Identify an off-by-one index error in a Python finance batch processor"
    difficulty: easy
    max_steps: 2
    reward_range: [0.0, 1.0]
  - id: js-idor-auth
    name: "JavaScript IDOR Authorization Bypass"
    description: "Identify a horizontal privilege escalation (IDOR) in a Node.js REST profile endpoint"
    difficulty: medium
    max_steps: 2
    reward_range: [0.0, 1.0]
  - id: python-pickle-deserialization
    name: "Python Pickle Deserialization"
    description: "Identify an insecure deserialization vulnerability using pickle in a background worker"
    difficulty: hard
    max_steps: 2
    reward_range: [0.0, 1.0]
# The Action space defines the format of the agent's response.
# Each field is scored by the grader to provide partial progress signals.
action_space:
  type: object
  description: >
    Two-phase action space. Phase 1: submit {"request_file": true} to unlock
    the code snippet (+0.20 reward). Phase 2: submit a full review JSON.
  properties:
    request_file: { type: boolean, description: "Phase 1: Request the hidden file contents" }
    bug_identified: { type: boolean, description: "Boolean: true if a bug exists" }
    bug_location: { type: string, description: "String: Pinpoint the bug's location in code" }
    bug_type: { type: string, description: "String: off-by-one | logic-error | insecure-deserialization | none" }
    bug_description: { type: string, description: "String: Detailed analysis of the vulnerability" }
    severity: { type: string, enum: [none, low, medium, high, critical], description: "String: none | low | medium | high | critical" }
    suggested_fix: { type: string, description: "String: How to fix the identified bug" }
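# Illustrative payloads for the two phases. This is only a sketch built from
# the property names above; every value is a hypothetical placeholder and is
# not taken from any task's reference answer.
#   Phase 1: {"request_file": true}
#   Phase 2: {"bug_identified": true, "bug_location": "<function or line>",
#             "bug_type": "off-by-one", "bug_description": "<analysis>",
#             "severity": "medium", "suggested_fix": "<proposed change>"}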
# The Observation space defines what the agent sees at each step.
# It uses a structured context to help the agent understand the code's purpose.
observation_space:
  type: object
  properties:
    task_id: { type: string, description: "Unique task identifier" }
    language: { type: string, description: "Source code language" }
    difficulty: { type: string, enum: [easy, medium, hard], description: "Task complexity (easy/medium/hard)" }
    code_snippet: { type: string, description: "The source code to be reviewed" }
    context: { type: string, description: "Real-world context (e.g., API description)" }
    pr_title: { type: string, description: "Pull Request title for additional intent context" }
    file_path: { type: string, description: "Relative path to the file in the repository" }
# Reward structure for evaluating agent performance.
reward:
  min: 0.0
  max: 1.0
  description: >
    Step 1 (file request): +0.20 (flat, always granted).
    Step 2 (bug review): partial rewards for bug identification (0.20),
    correct bug type (0.20), precise location (0.10), description quality (0.25,
    keyword density), fix quality (0.15), correct severity (0.10).
    Episode total is clamped to [0.0, 1.0]. Grader penalizes keyword stuffing.
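# Worked arithmetic for the weights above: the Step 2 components sum to
# 0.20 + 0.20 + 0.10 + 0.25 + 0.15 + 0.10 = 1.00, so with the Step 1 bonus
# the raw maximum is 0.20 + 1.00 = 1.20, which the clamp reduces to 1.0.
# As an example, a review credited only for identification, bug type, and
# severity would total 0.20 + 0.20 + 0.20 + 0.10 = 0.70, assuming each
# component is granted in full and no keyword-stuffing penalty applies.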
endpoints:
  health: GET /
  reset: POST /reset
  step: POST /step
  state: GET /state
  tasks: GET /tasks
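# A possible call sequence for one episode. This is a sketch under two
# assumptions that this file does not state: a hypothetical local server at
# http://localhost:7860, and /step accepting the action JSON as its request
# body. The request and response shapes may differ in the real service.
#   curl -X POST http://localhost:7860/reset
#   curl -X POST http://localhost:7860/step \
#        -H "Content-Type: application/json" -d '{"request_file": true}'
#   curl http://localhost:7860/state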