
title: Jira-To-Code Agent Environment
emoji: 🛠️
colorFrom: indigo
colorTo: blue
sdk: docker
pinned: false
license: mit
app_port: 7860

🚀 Jira-To-Code: The Ultimate RL Coding Environment

Jira-To-Code is a high-fidelity Reinforcement Learning (RL) environment designed for the Meta/Hugging Face OpenEnv Hackathon. It challenges AI agents to act as Senior Software Engineers by resolving real-world Jira tickets within a secure, sandboxed codebase.

🌟 Key Features

⚡ ReAct Ready: Built-in support for Thought-Action-Observation loops.
🧠 Episodic Memory: Maintains full conversational history for multi-turn reasoning.
📊 22 Diverse Tasks: From simple bug fixes to complex architecture, concurrency, and security.
📈 Rich Reward Shaping: Partial credit for passing tests, step penalties for efficiency, and shaping rewards for active coding.
🛡️ Robust Parsing: Resilient JSON extraction and self-correction prompt injection.
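
As an illustration of what resilient JSON extraction can look like, the sketch below pulls the first valid JSON object out of a model reply that may also contain free-form thoughts or broken fragments. `extract_action` is a hypothetical helper written for this README, not the parser shipped in the repository.

```python
import json


def extract_action(reply: str):
    """Return the first JSON object found in `reply`, or None if there is none."""
    decoder = json.JSONDecoder()
    start = reply.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and
            # ignores whatever text follows it.
            obj, _ = decoder.raw_decode(reply, start)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict):
            return obj
        # Not valid JSON at this brace; try the next candidate.
        start = reply.find("{", start + 1)
    return None
```

When the helper returns `None`, a caller could append a short correction message to the conversation asking the model to answer with valid JSON, which is one way to read "self-correction prompt injection".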

🏗️ Environment Architecture

Agents interact with the environment via a standardized FastAPI interface:

| Action | Description |
|---|---|
| `list_files` | Explore the workspace directory structure. |
| `read_file` | Read the content of a specific file. |
| `write_file` | Create or overwrite code in the workspace. |
| `run_tests` | Execute `pytest` and receive detailed traceback output. |
| `submit` | Finalize the task and receive the definitive score. |
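
To make the file-oriented actions concrete, here is a minimal in-process sketch of how three of them could be dispatched against a workspace directory. The payload shape (`action`, `path`, `content`) and the `handle_action` function are assumptions for illustration; the real environment exposes its actions over FastAPI and its request schema may differ. `run_tests` and `submit` are left out of this sketch.

```python
import tempfile
from pathlib import Path


def handle_action(workspace: Path, payload: dict) -> str:
    """Dispatch a hypothetical action payload and return a text observation."""
    root = workspace.resolve()
    action = payload.get("action")
    if action == "list_files":
        files = sorted(
            p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
        )
        return "\n".join(files)
    if action in ("read_file", "write_file"):
        rel = payload.get("path")
        if not rel:
            return "error: missing path"
        target = (root / rel).resolve()
        # Refuse paths that escape the workspace (e.g. "../secrets.txt").
        if not target.is_relative_to(root):
            return "error: path outside workspace"
        if action == "read_file":
            return target.read_text() if target.is_file() else "error: file not found"
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(payload.get("content", ""))
        return f"wrote {target.relative_to(root).as_posix()}"
    return f"error: unsupported action {action!r}"
```

Returning errors as observations rather than raising lets an agent read the message and recover within the same episode.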

🎯 Available Tasks (22 Total)

| Task ID | Level | Objective |
|---|---|---|
| `easy` | Easy | Fix off-by-one bug in `calculator.add()`. |
| `easy_2` | Easy | Fix case-sensitivity bug in `string_utils.count_vowels()`. |
| `easy_3` | Easy | API KeyError: use `.get()` with fallback for missing `phone_number`. |
| `easy_4` | Easy | Off-by-One Pagination: Fix math index logic in `get_page_bounds`. |
| `easy_5` | Easy | FastAPI Route Typo: Align `user_id` route param with function arg. |
| `medium` | Medium | Implement `format_user_data()` dictionary mapping specs. |
| `medium_2` | Medium | Implement complex `Email` and `Password` validation logic. |
| `medium_3` | Medium | Missing Auth Middleware: Apply `@require_auth` to `/api/billing`. |
| `medium_4` | Medium | ORM N+1 Problem: Rewrite fetches to use JOINs (`select_related`). |
| `medium_5` | Medium | Regex Validation: Fix email regex to allow plus sign (`+`). |
| `medium_6` | Medium | Error Handling: Add try/except fallback for currency rate timeouts. |
| `medium_7` | Medium | Stale Cache: Add Redis invalidation to `update_user_profile`. |
| `medium_8` | Medium | Timezone Naive: Make naive datetimes UTC aware. |
| `medium_9` | Medium | State Machine: Add transition guards (CANCELLED -> SHIPPED). |
| `medium_10` | Medium | Config Merge: Fix recursion logic for nested dict merges. |
| `hard` | Hard | Implement `LRUCache` with $O(1)$ time complexity. |
| `hard_2` | Hard | Implement `DirectedGraph` with BFS/DFS and Topological Sort. |
| `hard_3` | Hard | Circular Dependency: Refactor `models/utils/config` via `base.py`. |
| `hard_4` | Hard | Race Condition: Refactor threaded worker to use `queue.Queue`. |
| `hard_5` | Hard | OOM Generator: Rewrite `readlines()` loop to use `yield` generators. |
| `hard_6` | Hard | Implementation: Code `StripeGateway` matching `PaymentGateway` ABC. |
| `hard_7` | Hard | Async Deadlock: Fix lock release safety using async context managers. |
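
For a sense of what the harder tickets ask for, the O(1) requirement in the `hard` task is commonly met by pairing a hash map with an ordering structure. The sketch below uses `collections.OrderedDict`. The `get`/`put` method names and the `-1` miss value are assumptions, since the actual interface is defined by the task's docstrings and hidden tests.

```python
from collections import OrderedDict


class LRUCache:
    """Least-recently-used cache with O(1) get and put (assumed interface)."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return -1  # assumed miss value
        # Mark the key as most recently used.
        self._items.move_to_end(key)
        return self._items[key]

    def put(self, key, value) -> None:
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            # Evict the least recently used entry (the oldest one).
            self._items.popitem(last=False)
```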

🚀 Getting Started

1. Local Setup

# Clone the repository
git clone https://huggingface.co/spaces/Navigam/jira-to-code
cd jira-to-code

# Create and activate environment
uv venv
source .venv/bin/activate  # Or .venv\Scripts\activate on Windows

# Install dependencies
uv pip install -e .

2. Run Inference

# Run the full baseline agent against all tasks
uv run python inference.py

# Run specific tasks (comma-separated task IDs)
uv run python inference.py --tasks easy_2,medium

3. Docker Deployment

docker build -t jira-to-code .
docker run -p 7860:7860 jira-to-code

🛠️ Deep Dive: Design & Rubric Alignment

🎨 Creativity & Novelty

Real-World Software Engineering Domain: While most RL environments focus on games or simplified logic, Jira-To-Code provides a high-stakes, documentation-driven coding domain. Agents are forced to interpret edge cases from docstrings (e.g., case-insensitivity in vowel counting) just like real developers.
Non-Sparse Reward Mechanics: We move away from binary "Pass/Fail" signals. The environment rewards "Progress Toward Solution" by parsing intermediate test results.
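
One way to turn intermediate test results into a progress signal is to read the counts from pytest's final summary line. The sketch below is an assumption about how such parsing could work, not the repository's actual parser; `parse_pytest_summary` is a hypothetical helper, and it ignores skipped or xfailed tests.

```python
import re

# Matches fragments such as "3 passed", "1 failed", "2 errors".
_COUNT = re.compile(r"(\d+) (passed|failed|errors?)\b")


def parse_pytest_summary(output: str):
    """Return (passed, total) from the last output line that contains counts."""
    for line in reversed(output.splitlines()):
        matches = _COUNT.findall(line)
        if not matches:
            continue
        counts = {"passed": 0, "failed": 0, "error": 0}
        for number, label in matches:
            key = "error" if label.startswith("error") else label
            counts[key] += int(number)
        return counts["passed"], sum(counts.values())
    # No summary found, e.g. pytest crashed before collecting tests.
    return 0, 0
```

The resulting pair feeds directly into a `passed / total` ratio, with the `(0, 0)` case needing a guard against division by zero.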

📈 Reward Signal Design

The environment provides a dense, informative reward signal to guide agent learning, and every step score stays within strict bounds:

Strict Bounds (0.01 to 0.99): To comply with grading requirements, every step reward is mapped into the range 0.01 to 0.99. The literal values 0.0 and 1.0 are never emitted.
Action & Thinking Weightage: The first 3 orientation steps (e.g., listing/reading files, thinking) receive a +0.02 shaping bonus to reward early planning.
Efficiency Penalty: Every step beyond the 3rd incurs a -0.01 penalty, which lowers the reward for agents that take excessively long.
Linear Partial Credit: Intermediate run_tests and the final submit rewards are calculated proportionally as (passed_tests / total_tests).
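
The rules above can be combined into a single step-reward function. The sketch below is one plausible reading: the README does not state how the shaping terms are combined with the partial credit, so the additive combination, the function name `step_reward`, and its parameters are all assumptions.

```python
MIN_REWARD, MAX_REWARD = 0.01, 0.99
ORIENTATION_STEPS = 3


def step_reward(step: int, action: str, passed: int = 0, total: int = 0) -> float:
    """Compute a bounded reward for a 1-indexed step (assumed additive shaping)."""
    reward = 0.0
    # Linear partial credit applies only to test-running actions.
    if action in ("run_tests", "submit") and total > 0:
        reward = passed / total
    if step <= ORIENTATION_STEPS:
        reward += 0.02  # early-planning bonus
    else:
        reward -= 0.01  # efficiency penalty
    # Clamp so that 0.0 and 1.0 are never returned.
    return round(min(MAX_REWARD, max(MIN_REWARD, reward)), 4)
```

Under these assumptions, running tests at step 6 with 3 of 4 passing yields 0.75 - 0.01 = 0.74, while a fully passing `submit` is capped at 0.99.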

🧱 Episode & Workspace Design

Isolation & Reset: Every reset() call creates a fresh, uniquely named temporary directory. This ensures the agent starts with a "Clean Slate" and prevents cross-contamination between tasks or episodes.
Atomic Boundaries: An episode concludes when the agent calls submit or reaches MAX_STEPS.
Deterministic Grading: Graders are based on hidden unit tests (pytest) that are immutable within the environment container, ensuring 100% reproducible scoring.
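
A minimal sketch of the isolation idea, assuming each task's starter files live in a template directory that is copied into a fresh temporary directory on every reset. `Workspace`, `template_dir`, and `max_steps` are hypothetical names; the real environment's `reset()` and `MAX_STEPS` handling may differ.

```python
import shutil
import tempfile
from pathlib import Path


class Workspace:
    def __init__(self, template_dir: Path, max_steps: int = 20) -> None:
        self.template_dir = template_dir
        self.max_steps = max_steps  # hypothetical stand-in for MAX_STEPS
        self.root = None
        self.steps = 0
        self.done = False

    def reset(self) -> Path:
        # Each episode gets its own directory with a randomly generated name.
        self.root = Path(tempfile.mkdtemp(prefix="jira_task_"))
        shutil.copytree(self.template_dir, self.root, dirs_exist_ok=True)
        self.steps = 0
        self.done = False
        return self.root

    def step(self, action: str) -> bool:
        """Advance one step; return True once the episode has ended."""
        if self.done:
            raise RuntimeError("episode finished; call reset()")
        self.steps += 1
        # The episode ends on submit or when the step budget is exhausted.
        self.done = action == "submit" or self.steps >= self.max_steps
        return self.done
```

Because edits made in one episode live only in that episode's directory, the next `reset()` always starts from the unmodified template.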

🏆 Scoring Rubric Alignment

This environment is optimized for high marks in the OpenEnv Hackathon:

Real-world Utility: Models a developer's daily workflow.
Task/Grader Quality: Deterministic pytest grading with partial credit.
Environment Design: Gymnasium-style API with comprehensive observation space.
Code Quality: Passes openenv validate and follows strict Pydantic typing.

📜 License

This project is licensed under the MIT License - see the LICENSE file for details.

Built with ❤️ for the OpenEnv Hackathon by Navigam