Spaces: Krish-05/OpenEnv

krishnachoudhary-hclguvi committed on Apr 8

Commit

2182d10

unverified ·

1 Parent(s): eaac76b

Deploy OpenEnv Code Review to HF Spaces

Files changed (7)

Dockerfile +29 -0
README.md +59 -11
app.py +27 -0
checklist.md +30 -0
code_review_env.py +88 -0
inference.py +77 -0
requirements.txt +7 -0

Dockerfile ADDED

	@@ -0,0 +1,29 @@

+# Use an official Python runtime as a parent image
+FROM python:3.10-slim
+# Set environment variables to avoid writing .pyc files and buffer stdout
+ENV PYTHONDONTWRITEBYTECODE=1
+ENV PYTHONUNBUFFERED=1
+# Hugging Face Spaces requires running as a non-root user
+RUN useradd -m -u 1000 user
+USER user
+# Set up the home directory and path
+ENV HOME=/home/user \
+    PATH=/home/user/.local/bin:$PATH
+# Set the working directory inside the container
+WORKDIR $HOME/app
+# Copy the current directory contents into the container and set ownership
+COPY --chown=user:user . $HOME/app
+# Install any needed packages specified in requirements.txt
+RUN pip install --no-cache-dir -r requirements.txt
+# Expose port 7860 (Hugging Face Spaces default)
+EXPOSE 7860
+# Command to run the Gradio UI
+CMD ["python", "app.py"]

README.md CHANGED

@@ -1,11 +1,59 @@
----
-title: OpenEnv
-emoji: 🌖
-colorFrom: purple
-colorTo: pink
-sdk: docker
-pinned: false
-license: mit
----
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

+# OpenEnv Environment Submission
+This repository contains the submission for the **Meta PyTorch OpenEnv Hackathon — Round 1**.
+## Overview
+This project implements an RL-style environment that follows the OpenEnv framework by Meta and Hugging Face. The environment exposes tasks, actions, step execution, and reward scoring.
+**Domain:** Custom Domain (e.g. Email triage, Scheduling, Code Review)
+## Project Structure
+```
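+openEnv/
+├── app.py              # Gradio UI that runs inference.py and shows its log
+├── code_review_env.py  # CodeReviewEnv: observations, actions, step execution, rewards
+├── inference.py        # Main execution script emitting required [START], [STEP], [END] logs.
+├── Dockerfile          # Docker build for Hugging Face Spaces
+├── requirements.txt    # Project dependencies
+├── README.md           # This file
+├── spec.md             # Full Hackathon Specification
+└── checklist.md        # Submission Verification Checklist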
+```
+## Setup & Execution
+### Prerequisites
+- Python 3.9+
+- OpenAI Python client (`openai>=1.0.0`)
+### Installation
+```bash
+pip install -r requirements.txt
+```
+### Environment Variables
+The inference script reads the following environment variables:
+- `HF_TOKEN`: Required. Hugging Face Access Token.
+- `API_BASE_URL`: Base URL for OpenAI client (Default: `https://api.openai.com/v1`)
+- `MODEL_NAME`: The Language Model name (Default: `gpt-3.5-turbo`)
+- `OPENAI_API_KEY`: API Key if hitting OpenAI directly or external OpenAI-compatible APIs.
+```bash
+export HF_TOKEN="your_hf_token"
+export OPENAI_API_KEY="your_api_key"
+```
+### Run
+Run the script and make sure its logs go to `stdout`, where the metrics collection reads them:
+```bash
+python inference.py
+```
+### Output Formatting
+The script outputs logs specifically formatted for the autograder:
+- `[START] task=xyz env=abc model=mymodel`
+- `[STEP] step=1 action=abc reward=0.00 done=false error=null`
+- `[END] success=true steps=5 rewards=0.00,1.00`
+## Hugging Face Spaces Deployment
+*URL: `https://huggingface.co/spaces/YOUR_USER_ID/YOUR_SPACE_NAME`*
+This project is configured to run on Hugging Face Spaces within the **2 vCPU & 8 GB RAM** limit and builds from the included Dockerfile.
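
The output format described in the README can be sketched as three small formatting helpers. This is a minimal sketch based only on the example lines shown above. The helper names (`fmt_bool`, `format_start`, `format_step`, `format_end`) are hypothetical and do not appear in the commit, and the full hackathon spec may impose rules that the README examples do not show.

```python
from typing import Optional, Sequence


def fmt_bool(value: bool) -> str:
    """Render a boolean as lowercase true/false, as the README examples do."""
    return "true" if value else "false"


def format_start(task: str, env: str, model: str) -> str:
    return f"[START] task={task} env={env} model={model}"


def format_step(step: int, action: str, reward: float, done: bool,
                error: Optional[str] = None) -> str:
    # Rewards use exactly two decimal places; a missing error is printed as null.
    error_str = error if error else "null"
    return (f"[STEP] step={step} action={action} reward={reward:.2f} "
            f"done={fmt_bool(done)} error={error_str}")


def format_end(success: bool, steps: int, rewards: Sequence[float]) -> str:
    rewards_str = ",".join(f"{r:.2f}" for r in rewards)
    return f"[END] success={fmt_bool(success)} steps={steps} rewards={rewards_str}"


if __name__ == "__main__":
    print(format_start("xyz", "abc", "mymodel"))
    print(format_step(1, "abc", 0.0, False))
    print(format_end(True, 5, [0.0, 1.0]))
```

Keeping the formatting in one place means the two-decimal and lowercase-boolean rules are applied the same way on the normal path and on the error path.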

app.py ADDED

	@@ -0,0 +1,27 @@

+import gradio as gr
+import subprocess
+import sys
+def run_inference():
+    try:
+        # Run inference.py with the same interpreter as the UI and capture its exact output
+        result = subprocess.run([sys.executable, 'inference.py'], capture_output=True, text=True, timeout=30)
+        return result.stdout + "\n" + result.stderr
+    except subprocess.TimeoutExpired:
+        return "Process timed out after 30 seconds."
+    except Exception as e:
+        return str(e)
+with gr.Blocks(title="OpenEnv Code Review Hackathon", theme=gr.themes.Soft()) as app:
+    gr.Markdown("# OpenEnv Environment: Code Review")
+    gr.Markdown("This interface runs the `inference.py` backend and displays the `[START]`, `[STEP]`, `[END]` output strictly required by the hackathon spec.")
+    with gr.Row():
+        run_btn = gr.Button("Run Inference Agent")
+    with gr.Row():
+        output_display = gr.Textbox(label="Agent Output Log", lines=15, interactive=False)
+    run_btn.click(fn=run_inference, outputs=output_display)
+if __name__ == "__main__":
+    app.launch(server_name="0.0.0.0", server_port=7860)
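
The subprocess pattern used by `run_inference` can be tried in isolation, without Gradio. The following is a minimal sketch under stated assumptions: `run_script` is a hypothetical helper, not part of the commit, and it launches the child with `sys.executable` so that the child uses the same interpreter and installed packages as the caller.

```python
import subprocess
import sys
from typing import Sequence


def run_script(args: Sequence[str], timeout: float = 30.0) -> str:
    """Run a Python child process and return its stdout, followed by stderr if any."""
    try:
        result = subprocess.run(
            [sys.executable, *args],
            capture_output=True,  # collect stdout and stderr instead of inheriting them
            text=True,            # decode bytes to str
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return f"Process timed out after {timeout:g} seconds."
    output = result.stdout
    if result.stderr:
        # Label stderr so it is not mistaken for autograder log lines.
        output += "\n[stderr]\n" + result.stderr
    return output


if __name__ == "__main__":
    print(run_script(["-c", "print('[START] task=demo env=demo model=demo')"]))
```

Labelling the stderr portion is a design choice for display only; the autograder format in the README concerns `stdout`.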

checklist.md ADDED

	@@ -0,0 +1,30 @@

+# Submission Verification Checklist
+Before submitting your project, double-check that all constraints and formats are satisfied.
+### Required Files
+- [ ] `inference.py` exists in the project root
+- [ ] `requirements.txt` is updated and working
+- [ ] `README.md` features clear instructions and your Demo URL
+- [ ] Demo script/video (if applicable)
+### Environment & Integrations
+- [ ] `API_BASE_URL` reads properly and falls back to a default value
+- [ ] `MODEL_NAME` reads properly and falls back to a default value
+- [ ] `HF_TOKEN` is verified and successfully read
+- [ ] The OpenAI Python Client SDK is strictly used for all LLM calls (no `requests` module directly)
+### Evaluation Constraints
+- [ ] Exact output format for `[START]` is used
+- [ ] Exact output format for `[STEP]` is used
+- [ ] Exact output format for `[END]` is used (always emitted)
+- [ ] Rewards are logged with exactly `2` decimal places (e.g. `1.00`, not `1.0` or `1`)
+- [ ] Booleans printed strictly as lowercase `true` or `false` (e.g., `success=true`, `done=false`)
+### Hugging Face Space & Operations
+- [ ] Hugging Face Space is Public and deployed in a 'Running' state
+- [ ] Unused Spaces are paused or turned off
+- [ ] The Space/inference runs cleanly within `2 vCPU` and `8 GB RAM` limits
+- [ ] The dockerization / environment does not rely on unpublished local-only dependencies
+Good luck on Round 1!
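
The format items in the checklist can be checked mechanically before submitting. The sketch below assumes the log grammar is exactly what the README examples show (space-separated `key=value` fields, two-decimal rewards, lowercase booleans). The regular expressions and the `check_log` helper are hypothetical, and the real autograder may be stricter or looser.

```python
import re
from typing import List, Sequence

# Patterns inferred from the README examples only; the official spec may differ.
START_RE = re.compile(r"^\[START\] task=\S+ env=\S+ model=\S+$")
STEP_RE = re.compile(
    r"^\[STEP\] step=\d+ action=.+ reward=-?\d+\.\d{2} done=(true|false) error=.+$"
)
END_RE = re.compile(
    r"^\[END\] success=(true|false) steps=\d+ rewards=(-?\d+\.\d{2}(,-?\d+\.\d{2})*)?$"
)


def check_log(lines: Sequence[str]) -> List[str]:
    """Return a list of problems found in a captured log; an empty list means it passed."""
    problems = []
    if not lines or not START_RE.match(lines[0]):
        problems.append("first line is not a valid [START] line")
    if not lines or not END_RE.match(lines[-1]):
        problems.append("last line is not a valid [END] line")
    # Everything between the first and last line should be a [STEP] line.
    for number, line in enumerate(lines[1:-1], start=2):
        if not STEP_RE.match(line):
            problems.append(f"line {number} is not a valid [STEP] line")
    return problems


if __name__ == "__main__":
    sample = [
        "[START] task=xyz env=abc model=mymodel",
        "[STEP] step=1 action=abc reward=0.00 done=false error=null",
        "[END] success=true steps=5 rewards=0.00,1.00",
    ]
    print(check_log(sample))
```

A check like this catches the two mistakes the checklist calls out by name: rewards such as `1.0` and Python-style booleans such as `False`.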

code_review_env.py ADDED

	@@ -0,0 +1,88 @@

+import os
+import sys
+import json
+from datasets import load_dataset
+class CodeReviewEnv:
+    def __init__(self, dataset_name="Krish-05/krish-bug-detect-fix", split="train"):
+        self.task_name = "code_review_task"
+        self.benchmark_name = "krish_bug_detect_benchmark"
+        self.dataset_name = dataset_name
+        self.split = split
+        self.steps_taken = 0
+        self.rewards = []
+        self.current_sample = None
+        self.max_steps = 5
+        self._load_dataset()
+    def _load_dataset(self):
+        try:
+            self.dataset = load_dataset(self.dataset_name, split=self.split)
+            self.current_idx = 0
+        except Exception as e:
+            # Report on stderr so the message does not pollute the autograder log on stdout
+            print(f"Error loading dataset: {e}", file=sys.stderr)
+            self.dataset = None
+    def reset(self):
+        self.steps_taken = 0
+        self.rewards = []
+        if self.dataset is None:
+            return "Error: Dataset not loaded."
+        self.current_sample = self.dataset[self.current_idx]
+        self.current_idx = (self.current_idx + 1) % len(self.dataset)
+        # NOTE: Adjust these keys ('instruction', 'input', 'output' or similar)
+        # to match the actual schema of Krish-05/krish-bug-detect-fix
+        buggy_code = self.current_sample.get('buggy_code', self.current_sample.get('input', 'No code found'))
+        observation = f"""You are a senior code reviewer. Please review the following code:
+{buggy_code}
+Available actions:
+1. COMMENT <line_number> <issue_description>
+2. APPROVE
+3. REQUEST_CHANGES
+"""
+        return observation
+    def step(self, action):
+        self.steps_taken += 1
+        done = False
+        reward = 0.0
+        action = action.strip()
+        if action.startswith("COMMENT"):
+            # Acknowledge comment but typically delay final reward until the end
+            reward = 0.5  # Intermediate reward for finding something to comment on
+            obs = "Comment recorded. Any other issues, or are you ready to APPROVE / REQUEST_CHANGES?"
+        elif action == "APPROVE":
+            # If the code had bugs but the agent approved, negative reward. Let's assume there's always a bug in this dataset.
+            reward = -1.0
+            done = True
+            obs = "You approved flawed code."
+        elif action == "REQUEST_CHANGES":
+            # Good job, they rejected buggy code
+            reward = 1.0
+            done = True
+            obs = "Changes requested successfully."
+        else:
+            reward = -0.1
+            obs = "Invalid action format. Use COMMENT <line> <text>, APPROVE, or REQUEST_CHANGES."
+        # End the episode once the step budget is used up, whatever the action was
+        if self.steps_taken >= self.max_steps:
+            done = True
+        self.rewards.append(reward)
+        formatted_reward = f"{reward:.2f}"
+        return obs, formatted_reward, done, None
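
The reward scheme in `step` can be explored without downloading the dataset. The sketch below mirrors the reward values and the `max_steps` cutoff from the diff above in two standalone functions. `score_action` and `episode_return` are hypothetical names introduced for illustration and are not part of `CodeReviewEnv`.

```python
from typing import Iterable, Tuple


def score_action(action: str) -> Tuple[float, bool]:
    """Return (reward, done) for one action, using the values from CodeReviewEnv.step."""
    action = action.strip()
    if action.startswith("COMMENT"):
        return 0.5, False   # intermediate reward, episode continues
    if action == "APPROVE":
        return -1.0, True   # the environment assumes every sample contains a bug
    if action == "REQUEST_CHANGES":
        return 1.0, True
    return -0.1, False      # unrecognized action


def episode_return(actions: Iterable[str], max_steps: int = 5) -> float:
    """Sum rewards until a terminal action or the step budget is reached."""
    total = 0.0
    for steps_taken, action in enumerate(actions, start=1):
        reward, done = score_action(action)
        total += reward
        if done or steps_taken >= max_steps:
            break
    return total


if __name__ == "__main__":
    print(episode_return(["COMMENT 3 possible off-by-one", "REQUEST_CHANGES"]))
    print(episode_return(["COMMENT 1 a"] * 4 + ["REQUEST_CHANGES"]))
```

Running it shows one property of the scheme as written: because any string starting with `COMMENT` earns 0.5 regardless of content, four comments followed by `REQUEST_CHANGES` yields a higher return than requesting changes immediately.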

inference.py ADDED

	@@ -0,0 +1,77 @@

+import os
+import sys
+import traceback
+from openai import OpenAI
+from code_review_env import CodeReviewEnv
+# -------------------------------------------------------------------
+# Configuration & Environment Variables
+# -------------------------------------------------------------------
+API_BASE_URL = os.getenv("API_BASE_URL", "https://api.openai.com/v1")
+MODEL_NAME = os.getenv("MODEL_NAME", "gpt-3.5-turbo")
+HF_TOKEN = os.getenv("HF_TOKEN")
+def validate_environment():
+    """Ensure required environment variables like HF_TOKEN are present."""
+    if not HF_TOKEN:
+        print("[STEP] step=0 action=init reward=0.00 done=true error=HF_TOKEN_missing")
+        print("[END] success=false steps=0 rewards=")
+        sys.exit(1)
+# -------------------------------------------------------------------
+# Main Inference Loop
+# -------------------------------------------------------------------
+def main():
+    validate_environment()
+    # Initialize OpenAI Client (per requirements, use OpenAI Python client)
+    client = OpenAI(
+        base_url=API_BASE_URL,
+        api_key=os.getenv("OPENAI_API_KEY", "dummy_if_not_needed_for_custom_endpoint")
+    )
+    env = CodeReviewEnv()
+    # [START] Output
+    print(f"[START] task={env.task_name} env={env.benchmark_name} model={MODEL_NAME}")
+    success = False
+    try:
+        obs = env.reset()
+        done = False
+        while not done:
+            # Ask the LLM for the next action using the standard OpenAI client
+            response = client.chat.completions.create(
+                model=MODEL_NAME,
+                messages=[
+                    {"role": "system", "content": "You are a precise code reviewer. Your ONLY allowed outputs are: 'COMMENT <line> <text>', 'APPROVE', or 'REQUEST_CHANGES'."},
+                    {"role": "user", "content": obs}
+                ],
+                max_tokens=100
+            )
+            # content can be None; collapse whitespace so the [STEP] log stays on one line
+            action_str = " ".join((response.choices[0].message.content or "").split())
+            obs, reward_str, done, error = env.step(action_str)
+            error_str = error if error else "null"
+            done_str = "true" if done else "false"
+            # [STEP] Output
+            print(f"[STEP] step={env.steps_taken} action={action_str} reward={reward_str} done={done_str} error={error_str}")
+        success = True
+    except Exception as e:
+        error_msg = str(e).replace('\n', ' ')
+        print(f"[STEP] step={env.steps_taken} action=error reward=0.00 done=true error={error_msg}")
+        success = False
+    finally:
+        # [END] Output MUST ALWAYS be emitted, even on exceptions
+        success_str = "true" if success else "false"
+        rewards_str = ",".join([f"{r:.2f}" for r in env.rewards])
+        print(f"[END] success={success_str} steps={env.steps_taken} rewards={rewards_str}")
+if __name__ == "__main__":
+    main()
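
Model replies can span several lines or be empty, and either case would break a one-line-per-step log. The helper below is a minimal sketch of one way to flatten a reply before it is printed in a `[STEP]` line. `to_single_line`, its `max_len` limit, and the `"null"` placeholder for empty replies are all assumptions made for illustration and are not part of the commit or the hackathon spec.

```python
from typing import Optional


def to_single_line(text: Optional[str], max_len: int = 200, empty: str = "null") -> str:
    """Collapse all whitespace runs to single spaces and cap the length."""
    flat = " ".join((text or "").split())  # split() with no args also removes newlines and tabs
    if len(flat) > max_len:
        flat = flat[: max_len - 3] + "..."
    # Assumed placeholder so that the action field is never blank.
    return flat or empty


if __name__ == "__main__":
    reply = "COMMENT 3 loop bound is wrong\nshould be range(n)\n"
    print(f"[STEP] step=1 action={to_single_line(reply)} reward=0.50 done=false error=null")
```

The flattened string is only needed for logging; whether the environment should receive the raw or the flattened reply is a separate choice.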

requirements.txt ADDED

	@@ -0,0 +1,7 @@

+openai>=1.0.0
+python-dotenv
+datasets
+gradio
+# Add any required OpenEnv or domain-specific packages below:
+# openenv