
title: Git Conflict Resolver
emoji: 🔀
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
tags:
  - openenv

🔀 Git Conflict Resolver — OpenEnv Environment

An RL environment where AI agents learn to resolve Git merge conflicts.
Built for the OpenEnv Hackathon (Meta × Hugging Face × Scaler).

🧠 Environment Description & Motivation

Merge conflicts are a daily reality for every software team. Resolving them correctly requires understanding both the syntactic structure of code and the semantic intent of diverging branches — a genuinely hard task for AI agents.

This environment presents agents with real Python files containing <<<<<<< HEAD, =======, and >>>>>>> conflict markers. The agent must produce a clean, fully resolved file — no markers, valid syntax, and correct logic.

Unlike toy environments, it covers three tiers of realistic conflicts:

- Easy conflicts: Accept an obvious incoming change (e.g. an updated timeout value)
- Medium conflicts: Apply a different resolution strategy to each conflict block
- Hard conflicts: Combine additive changes from both branches (not just pick one)
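
To make the marker layout concrete, here is a minimal sketch of how conflict blocks could be located in a file. It is not the environment's own parser: `ConflictBlock`, `find_conflicts`, the sample file, and the branch name `feature/timeout` are hypothetical, and it only handles the two-way marker style described above.

```python
from dataclasses import dataclass


@dataclass
class ConflictBlock:
    ours: list[str]    # lines between "<<<<<<<" and "======="
    theirs: list[str]  # lines between "=======" and ">>>>>>>"


def find_conflicts(text: str) -> list[ConflictBlock]:
    """Collect every two-way conflict block found in `text`."""
    blocks: list[ConflictBlock] = []
    ours: list[str] = []
    theirs: list[str] = []
    state = "outside"  # outside -> ours -> theirs -> outside
    for line in text.splitlines():
        if line.startswith("<<<<<<<"):
            state, ours, theirs = "ours", [], []
        elif line.startswith("=======") and state == "ours":
            state = "theirs"
        elif line.startswith(">>>>>>>") and state == "theirs":
            blocks.append(ConflictBlock(ours, theirs))
            state = "outside"
        elif state == "ours":
            ours.append(line)
        elif state == "theirs":
            theirs.append(line)
    return blocks


# Hypothetical file in the spirit of the easy task (not the real task content).
sample = """import os
<<<<<<< HEAD
TIMEOUT = 30
=======
TIMEOUT = 60
>>>>>>> feature/timeout
RETRIES = 3
"""

blocks = find_conflicts(sample)
print(len(blocks), blocks[0].ours, blocks[0].theirs)
# 1 ['TIMEOUT = 30'] ['TIMEOUT = 60']
```

The number of blocks returned corresponds to what the observation reports as `num_conflicts`, assuming the environment counts `<<<<<<<` lines the same way.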

📐 Action & Observation Space

Observation Space (structured JSON)

| Field | Type | Description |
|---|---|---|
| `task_name` | string | Current task identifier |
| `task_description` | string | Natural language instructions for the agent |
| `filename` | string | Name of the file being resolved |
| `file_language` | string | Language of the file (`python`, `text`) |
| `conflicted_content` | string | Full file content with conflict markers |
| `branch_ours` | string | Name of the HEAD (current) branch |
| `branch_theirs` | string | Name of the incoming branch |
| `num_conflicts` | integer | Number of `<<<<<<<` blocks in the file |
| `last_attempt` | string \| null | Agent's previous resolution (for retry) |
| `last_error` | string \| null | Grading feedback from last step |
| `step` | integer | Current step number |
| `max_steps` | integer | Maximum allowed steps (10) |
| `done` | boolean | Whether the episode is finished |

Action Space

{
  "resolved_content": "<full resolved file as a string>"
}

The agent outputs the complete file content with all conflict markers removed.
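
As a rough illustration of how an agent might consume the observation and emit an action, the sketch below mirrors the documented fields in a `TypedDict`. The names `Observation`, `build_prompt`, and `build_action` are hypothetical helpers, not part of the environment (whose real models are Pydantic classes in `server/models.py`), and the prompt wording is only an assumption.

```python
import json
from typing import Optional, TypedDict


class Observation(TypedDict):
    """Mirror of the documented observation fields (hypothetical client-side type)."""
    task_name: str
    task_description: str
    filename: str
    file_language: str
    conflicted_content: str
    branch_ours: str
    branch_theirs: str
    num_conflicts: int
    last_attempt: Optional[str]
    last_error: Optional[str]
    step: int
    max_steps: int
    done: bool


def build_prompt(obs: Observation) -> str:
    """Turn an observation into a text prompt, including retry feedback if any."""
    parts = [
        obs["task_description"],
        f"File: {obs['filename']} ({obs['file_language']})",
        f"Branches: ours={obs['branch_ours']}, theirs={obs['branch_theirs']}",
        f"Conflict blocks: {obs['num_conflicts']}",
        "Conflicted file:",
        obs["conflicted_content"],
    ]
    if obs["last_error"] is not None:
        parts.append(f"Feedback on your previous attempt: {obs['last_error']}")
    return "\n".join(parts)


def build_action(resolved_content: str) -> str:
    """Serialize the single-field action as a JSON request body."""
    return json.dumps({"resolved_content": resolved_content})
```

Feeding `last_error` back into the prompt is one way to use the retry fields; whether it helps depends on the model.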

📋 Task Descriptions

Task 1: `single_conflict` — Easy

File: config.py
Conflicts: 1 block
Description: A timeout value was changed from 30s to 60s on a feature branch. The agent must accept the incoming change.
Expected difficulty: Any capable LLM should solve this in 1–2 steps.

Task 2: `multi_conflict` — Medium

File: user_service.py
Conflicts: 3 blocks
Description: Authentication was refactored. Each block requires a different resolution: accept new import, keep original constant, accept new function implementation.
Expected difficulty: Requires reading context across blocks.

Task 3: `logic_conflict` — Hard

File: data_pipeline.py
Conflicts: 2 blocks
Description: Both branches added valid, additive features. The agent must combine them — not simply pick one side. Requires understanding code semantics.
Expected difficulty: Frontier models (GPT-4, Qwen-72B) score ~0.5–0.7 without specific tuning.
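
The three tasks differ mainly in which side each block should take. The sketch below applies one strategy per block: `"ours"`, `"theirs"`, or `"both"`. It is an illustrative baseline, not the expected solution: `resolve`, the sample file, and the branch name are hypothetical, and `"both"` simply concatenates the two sides (ours first), which is only a naive stand-in for the semantic merging the hard task calls for.

```python
import ast


def resolve(text: str, strategies: list[str]) -> str:
    """Resolve the i-th conflict block using strategies[i]."""
    out: list[str] = []
    ours: list[str] = []
    theirs: list[str] = []
    state = "outside"
    index = 0
    for line in text.splitlines():
        if line.startswith("<<<<<<<"):
            state, ours, theirs = "ours", [], []
        elif line.startswith("=======") and state == "ours":
            state = "theirs"
        elif line.startswith(">>>>>>>") and state == "theirs":
            choice = strategies[index]
            if choice == "ours":
                out.extend(ours)
            elif choice == "theirs":
                out.extend(theirs)
            elif choice == "both":
                out.extend(ours + theirs)  # naive union, ours first
            else:
                raise ValueError(f"unknown strategy: {choice!r}")
            index += 1
            state = "outside"
        elif state == "ours":
            ours.append(line)
        elif state == "theirs":
            theirs.append(line)
        else:
            out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical additive conflict: each branch appended a different pipeline step.
sample = """STEPS = [
    "load",
<<<<<<< HEAD
    "dedupe",
=======
    "validate",
>>>>>>> feature/validate
    "save",
]
"""

merged = resolve(sample, ["both"])
ast.parse(merged)  # raises SyntaxError if the merge broke the file
print(merged)
```

For a file shaped like the medium task, the call might look like `resolve(text, ["theirs", "ours", "theirs"])`. Plain concatenation works here only because the two sides are independent list entries; real additive conflicts often need the sides interleaved or rewritten.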

🏆 Reward Function

The reward is shaped — agents get feedback at every step, not just at the end.

| Signal | Value | Trigger |
|---|---|---|
| Improvement bonus | `+0.75 × score_delta` | Score improves over previous step |
| Marker-free bonus | `+0.10` | First time no markers remain |
| Perfect match bonus | `+0.25` | Score reaches 1.0 |
| Stagnation penalty | `-0.10` | Submission identical to the previous step |
| Step cost | `-0.01 × (step/max_steps)` | Every step |
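
One plausible reading of the table is that the signals are summed at each step. The sketch below implements that reading; it is not the code in `server/reward.py`. The function name, its parameters, and the assumption that the signals are purely additive are all hypothetical.

```python
def shaped_reward(
    score: float,
    prev_score: float,
    step: int,
    max_steps: int,
    markers_free_now: bool,
    markers_free_before: bool,
    same_as_previous: bool,
) -> float:
    """Sum the documented reward signals for one step (assumed additive)."""
    reward = 0.0
    delta = score - prev_score
    if delta > 0:
        reward += 0.75 * delta             # improvement bonus
    if markers_free_now and not markers_free_before:
        reward += 0.10                     # first marker-free submission
    if score >= 1.0:
        reward += 0.25                     # perfect match bonus
    if same_as_previous:
        reward -= 0.10                     # stagnation penalty
    reward -= 0.01 * (step / max_steps)    # step cost grows with step count
    return reward


# Step 1: score jumps from 0.0 to 0.75 and the markers disappear.
first = shaped_reward(0.75, 0.0, 1, 10, True, False, False)
print(round(first, 4))  # 0.6615

# Step 2: the agent resubmits the same file.
second = shaped_reward(0.75, 0.75, 2, 10, True, True, True)
print(round(second, 4))  # -0.102
```

Under this reading, the first step earns 0.75 × 0.75 + 0.10 − 0.001, and a repeated submission costs 0.10 plus the step cost.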

Grading Breakdown (per step)

| Component | Score | Criterion |
|---|---|---|
| `no_markers` | 0.25 | No `<<<<<<<`, `=======`, `>>>>>>>` in output |
| `valid_syntax` | 0.25 | File parses as valid Python (AST check) |
| `similarity` | 0.25 | Fuzzy match ratio vs. expected resolution |
| `exact_match` | 0.25 | Character-exact match with expected output |
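
The four components can be approximated with the standard library alone. This is a sketch under stated assumptions, not the grader in `server/graders.py`: it assumes markers are detected at the start of a line, that the fuzzy ratio comes from `difflib.SequenceMatcher`, and that the similarity component scales linearly with that ratio. The `grade` function and sample strings are hypothetical.

```python
import ast
import difflib

MARKERS = ("<<<<<<<", "=======", ">>>>>>>")


def grade(resolved: str, expected: str) -> dict[str, float]:
    """Score a submission on the four documented components."""
    no_markers = not any(
        line.startswith(MARKERS) for line in resolved.splitlines()
    )
    try:
        ast.parse(resolved)
        valid_syntax = True
    except (SyntaxError, ValueError):
        valid_syntax = False
    # Assumption: similarity is difflib's ratio scaled to the 0.25 weight.
    ratio = difflib.SequenceMatcher(None, resolved, expected).ratio()
    parts = {
        "no_markers": 0.25 if no_markers else 0.0,
        "valid_syntax": 0.25 if valid_syntax else 0.0,
        "similarity": 0.25 * ratio,
        "exact_match": 0.25 if resolved == expected else 0.0,
    }
    parts["total"] = sum(parts.values())
    return parts


expected = "TIMEOUT = 60\n"
unresolved = "<<<<<<< HEAD\nTIMEOUT = 30\n=======\nTIMEOUT = 60\n>>>>>>> feature\n"

print(grade(expected, expected)["total"])  # 1.0
print(grade(unresolved, expected))
```

An unresolved file fails both the marker and syntax checks, since a line starting with `<<<<<<<` is not valid Python, and so earns only its partial similarity score.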

🚀 Setup & Usage

Local Setup

# 1. Install dependencies
pip install -r server/requirements.txt

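# 2. Start the server from the repository root
#    (python -m keeps the root on sys.path so that server.main resolves)
python -m uvicorn server.main:app --host 0.0.0.0 --port 7860 --app-dir server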

# 3. Test the endpoints
curl -X POST http://localhost:7860/reset \
  -H "Content-Type: application/json" \
  -d '{"task": "single_conflict"}'

curl -X POST http://localhost:7860/step \
  -H "Content-Type: application/json" \
  -d '{"resolved_content": "# your resolved content here"}'

curl http://localhost:7860/state

Docker

# Build
docker build -t git-conflict-resolver .

# Run
docker run -p 7860:7860 git-conflict-resolver

# Verify
curl http://localhost:7860/health

Run Baseline Inference

export HF_TOKEN=your_token_here
export API_BASE_URL=https://router.huggingface.co/v1
export MODEL_NAME=Qwen/Qwen2.5-72B-Instruct
export ENV_URL=http://localhost:7860

python inference.py

📊 Baseline Scores

Scores obtained using Qwen/Qwen2.5-72B-Instruct via HF Inference API.

| Task | Score | Steps | Success |
|---|---|---|---|
| `single_conflict` | 1.00 | 1 | ✅ |
| `multi_conflict` | 0.75 | 3 | ❌ |
| `logic_conflict` | 0.50 | 5 | ❌ |
| Average | 0.75 | n/a | n/a |

📁 Project Structure

openenv_hackathon/
├── server/
│   ├── main.py          # FastAPI server: /reset /step /state /health /tasks
│   ├── env.py           # Core environment logic (reset/step/state/close)
│   ├── models.py        # Pydantic Observation, Action, Reward models
│   ├── tasks.py         # Task definitions (3 tasks with conflict content)
│   ├── graders.py       # Deterministic graders (marker, AST, similarity, exact)
│   ├── reward.py        # Shaped reward function
│   └── requirements.txt
├── inference.py         # Baseline inference script (root — required)
├── openenv.yaml         # OpenEnv metadata (required for openenv validate)
├── Dockerfile           # Container build (port 7860 for HF Spaces)
└── README.md

🔧 API Reference

`POST /reset`

Start a new episode.

{ "task": "single_conflict" }

Returns: ConflictObservation

`POST /step`

Submit a conflict resolution.

{ "resolved_content": "# full resolved file..." }

Returns: { observation, reward, done, info }

`GET /state`

Returns current episode state (step, total_reward, history).

`GET /health`

Returns { "status": "ok" } — used for HF Space validation.

`GET /tasks`

Returns { "tasks": ["single_conflict", "multi_conflict", "logic_conflict"] }
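
The endpoints above can be driven from a small client loop. The sketch below uses only the standard library and is not the project's `inference.py`: `post_json`, `run_episode`, and the `policy` callback are hypothetical names. It assumes `/reset` returns the observation directly and `/step` returns the `{ observation, reward, done, info }` object described above. Rewards are collected as returned, without assuming their exact shape.

```python
import json
import urllib.request
from typing import Any, Callable


def post_json(url: str, payload: dict) -> dict:
    """POST a JSON body and decode the JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def run_episode(
    base_url: str,
    task: str,
    policy: Callable[[dict], str],
    post: Callable[[str, dict], dict] = post_json,
) -> list[Any]:
    """Run one episode; `policy` maps an observation to resolved file content."""
    obs = post(f"{base_url}/reset", {"task": task})
    rewards: list[Any] = []
    # Bound the loop by max_steps in case `done` is never reported.
    for _ in range(obs["max_steps"]):
        if obs["done"]:
            break
        result = post(f"{base_url}/step", {"resolved_content": policy(obs)})
        rewards.append(result["reward"])
        obs = result["observation"]
        if result["done"]:
            break
    return rewards
```

With the server running locally, a call might look like `run_episode("http://localhost:7860", "single_conflict", my_policy)`, where `my_policy` wraps an LLM call. The injectable `post` argument lets the loop be tested without a live server.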

👥 Team

Agent Smith — OpenEnv Hackathon, April 2026

Ganesh Doosa (Team Lead)
Gajula Akanksha
Yashwanth Kumar