---
title: Code Debugging Challenge
emoji: 🐛
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
license: apache-2.0
tags:
  - openenv
  - reinforcement-learning
  - code-debugging
  - agentic-ai
---
# 🐛 Code Debugging Challenge - OpenEnv Environment

A production-ready OpenEnv environment where AI agents learn to debug Python code.

## 🎯 Overview

This environment challenges AI agents to identify and fix bugs in Python code snippets using the official **OpenEnv framework** from Meta-PyTorch and Hugging Face.
**Key Features:**

- ✅ Built with the official OpenEnv library
- ✅ WebSocket-based client-server architecture
- ✅ Docker-containerized for isolation
- ✅ Compatible with TRL, Torchforge, and other RL frameworks
- ✅ Production-ready with proper session management
## 🏗️ Environment Details

- **Action Space**: 4 discrete actions (analyze, fix, test, submit)
- **Observation Space**: Structured observations with code, errors, and feedback
- **Reward Structure**:
  - +1.0 for a successful fix
  - -0.2 to -0.5 for failed attempts
  - +0.1 for analysis actions
  - -1.0 for premature submission
- **Episode Length**: Max 5 attempts per bug
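The reward structure above can be sketched as a single shaping function. This is a hypothetical helper for illustration only, not the environment's actual implementation; the `-0.2` to `-0.5` range for failed attempts is assumed here to scale with how many of the 5 attempts have been used:

```python
# Hypothetical reward shaping mirroring the documented reward structure.
# Not the environment's actual implementation.
def compute_reward(action_type: str, fix_succeeded: bool,
                   attempts_used: int, max_attempts: int = 5) -> float:
    if action_type == "analyze":
        return 0.1                      # small bonus for analysis actions
    if action_type == "submit" and not fix_succeeded:
        return -1.0                     # premature submission
    if action_type in ("fix", "test"):
        if fix_succeeded:
            return 1.0                  # successful fix
        # failed attempts scale from -0.2 toward -0.5 as attempts run out
        return -0.2 - 0.3 * (attempts_used / max_attempts)
    return 0.0
```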
## 🐛 Bug Types Included

1. **Argument Count Errors** - Wrong number of function arguments
2. **Logic Errors** - Incorrect loop variables and conditions
3. **Exception Handling** - Missing error handling for edge cases
4. **Index Errors** - Array/string index out of bounds
5. **Infinite Recursion** - Recursive calls without base-case reduction
6. **Type Errors** - String/integer concatenation issues
7. **Key Errors** - Missing dictionary keys
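To make one of these categories concrete, here is an illustrative example of bug type 5 (infinite recursion). It is not drawn from the environment's actual bug set; the function names are invented for this sketch:

```python
# Bug type 5 (infinite recursion): the recursive call never moves
# toward the base case, so the recursion limit is eventually hit.
def countdown_buggy(n):
    if n == 0:
        return []
    return [n] + countdown_buggy(n)      # bug: n never decreases

# Fixed version: each call reduces n, guaranteeing termination.
def countdown_fixed(n):
    if n == 0:
        return []
    return [n] + countdown_fixed(n - 1)  # fix: step toward the base case
```

An agent in this environment would be expected to spot that the recursive argument is unchanged and apply the one-token fix.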
## 🚀 Quick Start

### Using Docker (Recommended)

```python
from code_debug_env.client import DebugEnv
from code_debug_env.models import DebugAction

# Automatically starts the Docker container and connects
env = DebugEnv.from_hub("openenv/code-debug-env")

# Reset to get the first challenge
result = env.reset()
print(result.observation.buggy_code)
print(f"Expected output: {result.observation.expected_output}")

# Take an action
action = DebugAction(action_type="test")
result = env.step(action)
print(f"Reward: {result.reward}")

# Cleanup
env.close()
```
## 🔧 Integration with RL Frameworks

### With TRL (Transformer Reinforcement Learning)

```python
from trl import OnlineDPOConfig, OnlineDPOTrainer
from code_debug_env.client import DebugEnv

config = OnlineDPOConfig(...)
trainer = OnlineDPOTrainer(
    config=config,
    env=DebugEnv.from_hub("openenv/code-debug-env"),
    # ... other args
)
trainer.train()
```
## 🏆 OpenEnv Challenge Submission

This environment is submitted to the **OpenEnv Challenge: SOTA Environments to Drive General Intelligence** (UC Berkeley AgentBeats Competition).

## 📄 License

Apache 2.0