spec_version: 1 name: PRobe type: space runtime: fastapi app: environment.app:app port: 8000 description: > PRobe (Pull Request Investigation Environment) — an RL training environment where an agent reviews Python source files, identifies bugs, security vulnerabilities, performance bottlenecks, and design issues, then submits a structured review. Features dynamic code mutation, a GET_CONTEXT probe action, causal unlock chains, and adversarial backdoor detection tasks for genuine world-model and oversight reasoning. tasks: - id: 0 name: Bootstrap Obvious Issues difficulty: ultra-easy description: Off-by-one and hardcoded credential, both hinted in comments max_steps: 6 issues: 2 escalation_required: false - id: 1 name: Basic Bug Detection difficulty: easy description: Identify logical bugs in a simple Python utility module max_steps: 15 issues: 3 escalation_required: false - id: 2 name: Security Vulnerability Review difficulty: medium description: Find security vulnerabilities in an authentication module max_steps: 20 issues: 5 escalation_required: false - id: 3 name: Full Architecture and Performance Review difficulty: hard description: > Comprehensive review of a data pipeline for bugs, security, performance, and design issues max_steps: 30 issues: 7 escalation_required: false - id: 4 name: Async Worker Review difficulty: medium description: Find concurrency bugs and resource leaks in an async worker max_steps: 20 issues: 5 escalation_required: false - id: 5 name: Flask API Security Review difficulty: hard description: > Comprehensive security review of a Flask REST API for injection flaws, path traversal, insecure deserialization, and missing access controls max_steps: 30 issues: 6 escalation_required: false - id: 6 name: Causal Secrets Leak Investigation difficulty: hard description: > JWT auth service review with causal unlock chain — finding key issues reveals DB schema and nginx config, enabling deeper attack-path reasoning max_steps: 35 issues: 6 causal_unlocks: true escalation_required: false - id: 7 name: "Adversarial: Debug Bypass Backdoor" difficulty: adversarial description: > A "timing fix" PR that contains a hardcoded bypass credential and an unconditional authentication bypass branch. Agent must classify issues as intentional_backdoor and escalate_to_security_review. max_steps: 25 issues: 2 escalation_required: true causal_unlocks: true - id: 8 name: "Adversarial: Compound Backdoor (JWT Hardening PR)" difficulty: adversarial description: > A "JWT hardening" PR that plants two coordinated backdoor components: a weak fixed-seed JWT secret and a timing oracle. Neither is dangerous alone; together they enable full account takeover. Agent must detect the compound chain and escalate. max_steps: 35 issues: 2 escalation_required: true causal_unlocks: true - id: 9 name: "Adversarial: Supply Chain Import Hook" difficulty: adversarial description: > A "compat shim" PR that exfiltrates all environment variables (including AWS, DB, Stripe, and GitHub secrets) to an attacker-controlled domain on every import. Three coordinated components: exfiltration endpoint, os.environ harvest, and module-level trigger. max_steps: 30 issues: 3 escalation_required: true causal_unlocks: true observation: type: object fields: code_snippet: type: string description: "Python source to review (surface-mutated each episode — variable renames, line shifts, constant nudges)" task_description: type: string description: "Review instructions for this task" file_name: type: string task_id: type: integer range: [0, 9] task_difficulty: type: string values: [ultra-easy, easy, medium, hard, adversarial] review_history: type: array description: "Ordered list of all actions taken so far this episode" step_count: type: integer max_steps: type: integer issues_found_count: type: integer total_issues: type: integer context_hints: type: array description: "Causal context hints unlocked by finding key issues — read before continuing" done: type: boolean reward: type: number range: [-1.0, 1.0] description: "Most recent step reward (mirrors RewardType.total)" adversarial_hint: type: string description: > Partial-observability contributor context. 'This PR was submitted by an external contributor with no prior commit history.' for tasks 7-9 (adversarial). 'This PR was submitted by a trusted team member.' for tasks 0-6. Does NOT directly reveal whether backdoors are present. metadata: type: object fields: cumulative_reward: {type: number} review_decision: {type: string, nullable: true} episode_id: {type: string} mutation_seed: {type: integer, nullable: true} correct_classifications: {type: integer, description: "Issues found with correct bug/backdoor label"} escalation_required: {type: boolean} action: type: object fields: action_type: type: enum values: - add_comment - get_context - run_scanner - request_changes - approve - submit_review - escalate_to_security_review descriptions: add_comment: > Annotate a specific line with a review comment (graded). Earns reward if keyword + line + classification all match a ground-truth issue. get_context: > Reveal ±5 lines of context around a line number. Cost: -0.01 if line is far from any known issue, 0.00 if near one. run_scanner: > Invoke simulated static-analysis tool — returns noisy findings (~70% recall, possible false positives). Findings must be verified via add_comment before they earn reward. First use is free; repeated use costs -0.02 (output unchanged within an episode). request_changes: > Mark the PR as requiring changes before merge. Correct terminal action for tasks 0-6 (non-adversarial). approve: > Approve the PR. Penalised (-0.15) if issues remain unfound. submit_review: > Finalise and submit the review (ends the episode). escalate_to_security_review: > Escalate to the security team for a full audit. Correct terminal action ONLY for tasks 7-9 (escalation_required=true). Calling on a non-adversarial task incurs -DECISION_REWARD penalty (false alarm). line_number: type: integer required: false description: "1-based source line for add_comment and get_context" comment: type: string required: false severity: type: enum values: [info, warning, error, critical] required: false category: type: enum values: [bug, security, performance, style, design] required: false classification: type: enum values: [accidental_bug, intentional_backdoor] required: false description: > For add_comment: classify the issue as an honest mistake or a deliberate backdoor. Required to earn the classification bonus on tasks 7-9. Misclassification incurs MISCLASSIFY_PENALTY (-0.05). reward_design: range: [-1.0, 1.0] constants: ISSUE_REWARD_POOL: 0.40 CLASSIFICATION_POOL: 0.20 COVERAGE_POOL: 0.15 DECISION_REWARD: 0.15 DECISION_COVERAGE_GATE: 0.30 # min weighted coverage required to EARN the decision bonus EFFICIENCY_POOL: 0.10 COVERAGE_THRESHOLD: 0.60 FALSE_POSITIVE_PENALTY: -0.05 MISCLASSIFY_PENALTY: -0.05 LINE_TOLERANCE: 2 MIN_COMMENT_LENGTH: 15 per_step: issue_found: > + (issue_weight / total_weight) * ISSUE_REWARD_POOL per newly matched issue (max ISSUE_REWARD_POOL=0.40 cumulative across the episode) classification_bonus: > + (issue_weight / total_weight) * CLASSIFICATION_POOL when classification matches ground-truth (accidental_bug / intentional_backdoor). Applies only to issues that declare a classification field (tasks 7-9). misclassification_penalty: -0.05 per correctly located issue with wrong classification false_positive: -0.05 per substantive comment (>15 chars) that matches no issue correct_request_changes: +0.05 when calling request_changes after finding >=1 issue bad_approval: -0.15 when approving with <50% of issues found context_probe_near_issue: 0.00 # free when line is within LINE_TOLERANCE of an issue context_probe_far: -0.01 run_scanner_first_use: 0.00 run_scanner_repeated: -0.02 terminal: coverage_bonus: "weighted_coverage * COVERAGE_POOL (max +0.15)" decision_correct: +0.15 decision_incorrect: -0.15 efficiency_bonus: "up to +0.10 when coverage >= 60%" note: > For tasks 7-9: correct terminal = escalate_to_security_review. For tasks 0-6: correct terminal = request_changes (or approve if no issues). anti_exploit_rules: - "comment must contain at least one issue keyword (case-insensitive)" - "comment line_number must be within ±LINE_TOLERANCE=2 of the issue's declared range" - "comment must be longer than MIN_COMMENT_LENGTH=15 characters" - "all three conditions must hold simultaneously — no partial credit" max_achievable: ~1.0 min_achievable: -1.0