doosaganesh's picture
Upload README.md
8595a08 verified
---
title: Git Conflict Resolver
emoji: πŸ”€
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
tags: ["openenv"]
---
# πŸ”€ Git Conflict Resolver β€” OpenEnv Environment
An RL environment where AI agents learn to resolve Git merge conflicts.
Built for the **OpenEnv Hackathon** (Meta Γ— Hugging Face Γ— Scaler).
---
## 🧠 Environment Description & Motivation
Merge conflicts are a daily reality for every software team. Resolving them correctly requires understanding both the syntactic structure of code **and** the semantic intent of diverging branches β€” a genuinely hard task for AI agents.
This environment presents agents with real Python files containing `<<<<<<< HEAD`, `=======`, and `>>>>>>>` conflict markers. The agent must produce a clean, fully resolved file β€” no markers, valid syntax, and correct logic.
Unlike toy environments, this simulates:
- **Easy conflicts**: Accept an obvious incoming change (e.g. updated timeout value)
- **Medium conflicts**: Apply different resolution strategies per conflict block
- **Hard conflicts**: Combine additive changes from *both* branches (not just pick one)
---
## πŸ“ Action & Observation Space
### Observation Space (structured JSON)
| Field | Type | Description |
|-------|------|-------------|
| `task_name` | string | Current task identifier |
| `task_description` | string | Natural language instructions for the agent |
| `filename` | string | Name of the file being resolved |
| `file_language` | string | Language of the file (`python`, `text`) |
| `conflicted_content` | string | Full file content with conflict markers |
| `branch_ours` | string | Name of the HEAD (current) branch |
| `branch_theirs` | string | Name of the incoming branch |
| `num_conflicts` | integer | Number of `<<<<<<<` blocks in the file |
| `last_attempt` | string \| null | Agent's previous resolution (for retry) |
| `last_error` | string \| null | Grading feedback from last step |
| `step` | integer | Current step number |
| `max_steps` | integer | Maximum allowed steps (10) |
| `done` | boolean | Whether the episode is finished |
### Action Space
```json
{
"resolved_content": "<full resolved file as a string>"
}
```
The agent outputs the **complete file content** with **all conflict markers removed**.
---
## πŸ“‹ Task Descriptions
### Task 1: `single_conflict` β€” Easy
- **File:** `config.py`
- **Conflicts:** 1 block
- **Description:** A timeout value was changed from 30s to 60s on a feature branch. The agent must accept the incoming change.
- **Expected difficulty:** Any capable LLM should solve this in 1–2 steps.
### Task 2: `multi_conflict` β€” Medium
- **File:** `user_service.py`
- **Conflicts:** 3 blocks
- **Description:** Authentication was refactored. Each block requires a different resolution: accept new import, keep original constant, accept new function implementation.
- **Expected difficulty:** Requires reading context across blocks.
### Task 3: `logic_conflict` β€” Hard
- **File:** `data_pipeline.py`
- **Conflicts:** 2 blocks
- **Description:** Both branches added valid, additive features. The agent **must combine** them β€” not simply pick one side. Requires understanding code semantics.
- **Expected difficulty:** Frontier models (GPT-4, Qwen-72B) score ~0.5–0.7 without specific tuning.
---
## πŸ† Reward Function
The reward is **shaped** β€” agents get feedback at every step, not just at the end.
| Signal | Value | Trigger |
|--------|-------|---------|
| Improvement bonus | `+0.75 Γ— score_delta` | Score improves over previous step |
| Marker-free bonus | `+0.10` | First time no markers remain |
| Perfect match bonus | `+0.25` | Score reaches 1.0 |
| Stagnation penalty | `-0.10` | Identical submission as previous step |
| Step cost | `-0.01 Γ— (step/max_steps)` | Every step |
### Grading Breakdown (per step)
| Component | Score | Criterion |
|-----------|-------|-----------|
| `no_markers` | 0.25 | No `<<<<<<<`, `=======`, `>>>>>>>` in output |
| `valid_syntax` | 0.25 | File parses as valid Python (AST check) |
| `similarity` | 0.25 | Fuzzy match ratio vs. expected resolution |
| `exact_match` | 0.25 | Character-exact match with expected output |
---
## πŸš€ Setup & Usage
### Local Setup
```bash
# 1. Install dependencies
pip install -r server/requirements.txt
# 2. Start the server
uvicorn server.main:app --host 0.0.0.0 --port 7860 --app-dir server
# 3. Test the endpoints
curl -X POST http://localhost:7860/reset \
-H "Content-Type: application/json" \
-d '{"task": "single_conflict"}'
curl -X POST http://localhost:7860/step \
-H "Content-Type: application/json" \
-d '{"resolved_content": "# your resolved content here"}'
curl http://localhost:7860/state
```
### Docker
```bash
# Build
docker build -t git-conflict-resolver .
# Run
docker run -p 7860:7860 git-conflict-resolver
# Verify
curl http://localhost:7860/health
```
### Run Baseline Inference
```bash
export HF_TOKEN=your_token_here
export API_BASE_URL=https://router.huggingface.co/v1
export MODEL_NAME=Qwen/Qwen2.5-72B-Instruct
export ENV_URL=http://localhost:7860
python inference.py
```
---
## πŸ“Š Baseline Scores
> Scores obtained using `Qwen/Qwen2.5-72B-Instruct` via HF Inference API.
| Task | Score | Steps | Success |
|------|-------|-------|---------|
| `single_conflict` | 1.00 | 1 | βœ… |
| `multi_conflict` | 0.75 | 3 | ❌ |
| `logic_conflict` | 0.50 | 5 | ❌ |
| **Average** | **0.75** | β€” | β€” |
---
## πŸ“ Project Structure
```
openenv_hackathon/
β”œβ”€β”€ server/
β”‚ β”œβ”€β”€ main.py # FastAPI server β€” /reset /step /state /health
β”‚ β”œβ”€β”€ env.py # Core environment logic (reset/step/state/close)
β”‚ β”œβ”€β”€ models.py # Pydantic Observation, Action, Reward models
β”‚ β”œβ”€β”€ tasks.py # Task definitions (3 tasks with conflict content)
β”‚ β”œβ”€β”€ graders.py # Deterministic graders (marker, AST, similarity, exact)
β”‚ β”œβ”€β”€ reward.py # Shaped reward function
β”‚ └── requirements.txt
β”œβ”€β”€ inference.py # Baseline inference script (root β€” required)
β”œβ”€β”€ openenv.yaml # OpenEnv metadata (required for openenv validate)
β”œβ”€β”€ Dockerfile # Container build (port 7860 for HF Spaces)
└── README.md
```
---
## πŸ”§ API Reference
### `POST /reset`
Start a new episode.
```json
{ "task": "single_conflict" }
```
Returns: `ConflictObservation`
### `POST /step`
Submit a conflict resolution.
```json
{ "resolved_content": "# full resolved file..." }
```
Returns: `{ observation, reward, done, info }`
### `GET /state`
Returns current episode state (step, total_reward, history).
### `GET /health`
Returns `{ "status": "ok" }` β€” used for HF Space validation.
### `GET /tasks`
Returns `{ "tasks": ["single_conflict", "multi_conflict", "logic_conflict"] }`
---
## πŸ‘₯ Team
**Agent Smith** β€” OpenEnv Hackathon, April 2026
- Ganesh Doosa (Team Lead)
- Gajula Akanksha
- Yashwanth Kumar