Upload README.md with huggingface_hub
README.md CHANGED
@@ -39,22 +39,72 @@ Bug reports in the wild are often poorly written — missing steps, ambiguous de
 | `GET` | `/health` | Health check |
 | `GET` | `/docs` | Interactive API documentation |
 
-## Action
 
-The agent submits a structured bug report as JSON:
 
 ```json
 {
-"
-
-
-
-
-
-
 }
 ```
 
 ## Scoring
 
 Reports are graded on 7 dimensions (each 0.0–1.0):

 | `GET` | `/health` | Health check |
 | `GET` | `/docs` | Interactive API documentation |
 
+## Action Space
 
+The agent submits a structured bug report as a JSON object via `POST /step`:
 
 ```json
 {
+  "action": {
+    "title": "Clear, concise bug title",
+    "steps_to_reproduce": "1. Step one\n2. Step two\n...",
+    "expected_behavior": "What should happen",
+    "actual_behavior": "What actually happens",
+    "severity": "low|medium|high|critical",
+    "environment": "OS, browser, version info",
+    "additional_notes": "Any other relevant details"
+  }
 }
 ```
 
+| Field | Type | Description |
+|-------|------|-------------|
+| `title` | string | Clear, concise summary of the bug |
+| `steps_to_reproduce` | string | Numbered step-by-step reproduction instructions |
+| `expected_behavior` | string | What the correct behavior should be |
+| `actual_behavior` | string | What actually happens (the bug) |
+| `severity` | string | One of: `low`, `medium`, `high`, `critical` |
+| `environment` | string | OS, browser, version, platform details |
+| `additional_notes` | string | Any other relevant information |
+
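
As a rough illustration of the action format in the added README text, the sketch below builds the JSON body for `POST /step` from the seven documented fields and rejects severities outside the listed set. The names `BugReportAction` and `build_step_payload` are hypothetical and not part of the project; treating every field as required is also an assumption, since the README does not say which fields are optional.

```python
import json
from dataclasses import asdict, dataclass

# Severity values listed in the README's field table.
ALLOWED_SEVERITIES = ("low", "medium", "high", "critical")


@dataclass
class BugReportAction:
    """Hypothetical container mirroring the seven documented action fields."""

    title: str
    steps_to_reproduce: str
    expected_behavior: str
    actual_behavior: str
    severity: str
    environment: str
    additional_notes: str


def build_step_payload(action: BugReportAction) -> str:
    """Serialize an action into the {"action": {...}} body shown for POST /step."""
    if action.severity not in ALLOWED_SEVERITIES:
        raise ValueError(f"severity must be one of {ALLOWED_SEVERITIES}")
    return json.dumps({"action": asdict(action)})


example = BugReportAction(
    title="Login button unresponsive on mobile",
    steps_to_reproduce="1. Open the login page\n2. Tap 'Log in'",
    expected_behavior="The dashboard opens",
    actual_behavior="Nothing happens",
    severity="high",
    environment="Android 14, Chrome 124",
    additional_notes="Reproduced on two devices",
)
payload = build_step_payload(example)
```

The resulting string can be sent as the request body with any HTTP client; the call itself is omitted here because the server's base URL and response handling are not described in this diff.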
+## Observation Space
+
+After each `reset()` or `step()`, the environment returns an observation:
+
+```json
+{
+  "raw_report": "The messy, unstructured bug report text...",
+  "feedback": "Grading feedback explaining the score",
+  "score": 0.85,
+  "field_scores": {
+    "title": 1.0,
+    "steps_to_reproduce": 0.75,
+    "expected_behavior": 0.5,
+    "actual_behavior": 0.8,
+    "severity": 1.0,
+    "environment": 1.0,
+    "format": 0.83
+  },
+  "done": false,
+  "reward": 0.85,
+  "step_count": 1,
+  "task_id": "easy",
+  "max_steps": 3
+}
+```
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `raw_report` | string | The original messy bug report to structure |
+| `feedback` | string | Human-readable grading feedback |
+| `score` | float | Overall score from 0.0 to 1.0 |
+| `field_scores` | dict | Per-field scores (0.0–1.0 each) |
+| `done` | bool | Whether the episode is complete |
+| `reward` | float | Reward signal for this step |
+| `step_count` | int | Current step number |
+| `task_id` | string | Current task identifier |
+| `max_steps` | int | Maximum steps allowed |
+
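
One way an agent might use the observation is to look at `field_scores` and decide which fields to revise on the next step. The sketch below parses the sample observation from the added README text. The helper names `weakest_fields` and `can_retry` and the 0.8 threshold are hypothetical, and reading `step_count < max_steps` as "another step is allowed" is an assumption about the environment's semantics.

```python
import json
from typing import List

# Sample observation copied from the README's Observation Space example.
SAMPLE_OBSERVATION = """
{
  "raw_report": "The messy, unstructured bug report text...",
  "feedback": "Grading feedback explaining the score",
  "score": 0.85,
  "field_scores": {
    "title": 1.0,
    "steps_to_reproduce": 0.75,
    "expected_behavior": 0.5,
    "actual_behavior": 0.8,
    "severity": 1.0,
    "environment": 1.0,
    "format": 0.83
  },
  "done": false,
  "reward": 0.85,
  "step_count": 1,
  "task_id": "easy",
  "max_steps": 3
}
"""


def weakest_fields(observation: dict, threshold: float = 0.8) -> List[str]:
    """Return names of fields scoring strictly below `threshold`, lowest first."""
    scores = observation["field_scores"]
    low = [name for name, value in scores.items() if value < threshold]
    return sorted(low, key=lambda name: scores[name])


def can_retry(observation: dict) -> bool:
    """Assumed rule: retry only if the episode is not done and steps remain."""
    return (not observation["done"]
            and observation["step_count"] < observation["max_steps"])


observation = json.loads(SAMPLE_OBSERVATION)
to_revise = weakest_fields(observation)
retry_allowed = can_retry(observation)
```

For the sample above, `to_revise` lists `expected_behavior` before `steps_to_reproduce`, and `retry_allowed` is true under the assumed rule.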
 ## Scoring
 
 Reports are graded on 7 dimensions (each 0.0–1.0):