Siteshcodes committed on
Commit e44a740 · 1 Parent(s): 6a0c34c

fix: remove all 0.0/1.0 references, update reward ranges throughout

  1. README.md +22 -7
README.md CHANGED
@@ -34,7 +34,7 @@ Every software team triages dozens of bug reports weekly. Getting prioritization
  | `bug_report` | BugReport | Title, body, author, comments |
  | `task_id` | string | Current difficulty: easy / medium / hard |
  | `score` | float | Cumulative score this episode |
- | `reward` | float | Reward from last action (0.0–1.0) |
  | `feedback` | string | Human-readable grader feedback |
  | `done` | bool | Episode complete flag |
 
@@ -42,25 +42,29 @@ Every software team triages dozens of bug reports weekly. Getting prioritization

  ### Task 1 - Easy (Priority labeling)
  Agent assigns a single P0–P3 priority to a bug report.
- - Grader: exact match = 1.0, one level off = 0.5, else 0.0
  - Grader weight: priority 100%

  ### Task 2 - Medium (Priority + labels + team)
  Agent assigns priority, category labels, and team routing.
  - Grader: priority 45% + label Jaccard similarity 40% + team routing 15%
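
The Task 2 weighting can be sketched as below. This is a minimal illustration, not the code in `task.py`: the names `jaccard` and `grade_medium`, the dictionary layout, and the choice to score priority and team as exact matches are all assumptions (the real grader may well give partial credit for a near-miss priority, as Task 1 does).

```python
def jaccard(predicted, expected):
    """Jaccard similarity |A & B| / |A | B|; treated as 1.0 when both sets are empty."""
    a, b = set(predicted), set(expected)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def grade_medium(pred, gold):
    """Weighted score: priority 45% + label Jaccard 40% + team routing 15%.

    Priority and team are scored as exact matches here (an assumption).
    """
    priority_score = 1.0 if pred["priority"] == gold["priority"] else 0.0
    label_score = jaccard(pred["labels"], gold["labels"])
    team_score = 1.0 if pred["team"] == gold["team"] else 0.0
    return 0.45 * priority_score + 0.40 * label_score + 0.15 * team_score


# Hypothetical example: right priority, one of two labels right, wrong team.
gold = {"priority": "P1", "labels": ["bug", "crash"], "team": "backend"}
pred = {"priority": "P1", "labels": ["bug", "ui"], "team": "frontend"}
print(round(grade_medium(pred, gold), 4))
```

In this example the label sets share one of three distinct labels, so the label term contributes 0.40 × 1/3 and the team term contributes nothing.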
 
  ### Task 3 - Hard (Full triage)
  Agent must assign priority, labels, team, and milestone. Security escalation failures are penalized.
  - Grader: priority 35% + labels 30% + team 20% + milestone 15%
  - Penalty: −0.15 for missing security escalation
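
A rough sketch of how the Task 3 weights and the security penalty might combine. The names `grade_hard`, `security`, and `escalated` are hypothetical, exact-match scoring for priority, team, and milestone is an assumption, and any clamping of the final value is left out; the actual logic lives in `task.py` and may differ.

```python
WEIGHTS = {"priority": 0.35, "labels": 0.30, "team": 0.20, "milestone": 0.15}
SECURITY_PENALTY = 0.15


def label_overlap(predicted, expected):
    """Jaccard similarity of two label collections (1.0 if both are empty)."""
    a, b = set(predicted), set(expected)
    return 1.0 if not (a or b) else len(a & b) / len(a | b)


def grade_hard(pred, gold):
    """Weighted component score minus a flat penalty for a missed escalation."""
    parts = {
        "priority": float(pred["priority"] == gold["priority"]),
        "labels": label_overlap(pred["labels"], gold["labels"]),
        "team": float(pred["team"] == gold["team"]),
        "milestone": float(pred["milestone"] == gold["milestone"]),
    }
    score = sum(WEIGHTS[name] * value for name, value in parts.items())
    # Hypothetical flags: the bug is security-sensitive but the agent did not escalate.
    if gold.get("security") and not pred.get("escalated"):
        score -= SECURITY_PENALTY
    return score


gold = {"priority": "P0", "labels": ["security"], "team": "infra",
        "milestone": "hotfix", "security": True}
perfect = dict(gold, escalated=True)
missed = dict(gold, escalated=False)
print(round(grade_hard(perfect, gold), 2), round(grade_hard(missed, gold), 2))
```

Under these assumptions, an otherwise perfect triage that skips the escalation loses exactly the 0.15 penalty.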
 
  ## Reward function

  Rewards are provided at every step (not just end of episode):
- - Partial credit for close-but-not-exact priority (0.5 vs 0.0 vs 1.0)
  - Label overlap via Jaccard similarity (continuous signal)
  - Team routing accuracy (binary, but weighted)
  - Security escalation penalty discourages ignoring critical signals

  ## Setup
 
@@ -103,13 +107,24 @@ Evaluated with `meta-llama/Llama-3.3-70B-Instruct` via HuggingFace router (tempe

  | Task | Score |
  |------------|-------|
- | Easy | 1.000 |
  | Medium | 0.500 |
- | Hard | 1.000 |
- | **Avg** | **0.833** |

  Scores vary per run due to random bug sampling from a pool of 5 bugs per task.

  ## Project structure

  ```
@@ -119,7 +134,7 @@ bug-triage-env/
  │ ├── environment.py # BugTriageEnvironment core logic
  │ ├── task.py # Bug reports + graders
  │ └── requirements.txt
- ├── model.py # Dataclass models
  ├── client.py # HTTP client
  ├── baseline.py # Groq development script
  ├── inference.py # OpenAI client submission script
 
  | `bug_report` | BugReport | Title, body, author, comments |
  | `task_id` | string | Current difficulty: easy / medium / hard |
  | `score` | float | Cumulative score this episode |
+ | `reward` | float | Reward from last action (0.05–0.95) |
  | `feedback` | string | Human-readable grader feedback |
  | `done` | bool | Episode complete flag |
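
For orientation, the fields visible in this part of the table could be modelled roughly as follows. The project's `model.py` is described as using Pydantic; this sketch uses standard-library dataclasses only to stay dependency-free. The class names `BugReport` and `Observation` and the type of `comments` are assumptions, and the real models may carry more fields than this hunk shows.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BugReport:
    title: str
    body: str
    author: str
    # Assumed to be a list of comment strings; the README only says "comments".
    comments: List[str] = field(default_factory=list)


@dataclass
class Observation:
    bug_report: BugReport
    task_id: str    # current difficulty: "easy", "medium", or "hard"
    score: float    # cumulative score this episode
    reward: float   # reward from the last action
    feedback: str   # human-readable grader feedback
    done: bool      # episode complete flag


obs = Observation(
    bug_report=BugReport(title="Crash on login", body="Stack trace attached", author="alice"),
    task_id="easy",
    score=0.95,
    reward=0.95,
    feedback="Exact priority match",
    done=True,
)
print(obs.task_id, obs.reward, obs.done)
```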
 
 
  ### Task 1 - Easy (Priority labeling)
  Agent assigns a single P0–P3 priority to a bug report.
+ - Grader: exact match = 0.95, one level off = 0.5, else 0.05
  - Grader weight: priority 100%
+ - Reward range: 0.05–0.95
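
The updated Task 1 rule maps the distance between predicted and expected priority to one of three rewards. A minimal sketch, assuming priorities are the strings `P0` to `P3` and using a hypothetical function name `grade_priority` (not necessarily what `task.py` calls it):

```python
PRIORITIES = ["P0", "P1", "P2", "P3"]


def grade_priority(predicted, expected):
    """Return 0.95 for an exact match, 0.5 for one level off, otherwise 0.05."""
    # list.index raises ValueError for a label outside P0..P3.
    distance = abs(PRIORITIES.index(predicted) - PRIORITIES.index(expected))
    if distance == 0:
        return 0.95
    if distance == 1:
        return 0.5
    return 0.05


for guess in PRIORITIES:
    print(guess, grade_priority(guess, "P1"))
```

With an expected priority of `P1`, the loop prints 0.5 for `P0` and `P2`, 0.95 for `P1`, and 0.05 for `P3`.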

  ### Task 2 - Medium (Priority + labels + team)
  Agent assigns priority, category labels, and team routing.
  - Grader: priority 45% + label Jaccard similarity 40% + team routing 15%
+ - Reward range: 0.05–0.95

  ### Task 3 - Hard (Full triage)
  Agent must assign priority, labels, team, and milestone. Security escalation failures are penalized.
  - Grader: priority 35% + labels 30% + team 20% + milestone 15%
  - Penalty: −0.15 for missing security escalation
+ - Reward range: 0.05–0.95

  ## Reward function

  Rewards are provided at every step (not just end of episode):
+ - Partial credit for close-but-not-exact priority (0.5 vs 0.05 vs 0.95)
  - Label overlap via Jaccard similarity (continuous signal)
  - Team routing accuracy (binary, but weighted)
  - Security escalation penalty discourages ignoring critical signals
+ - All scores clamped strictly to (0.05, 0.95)
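
The clamping step can be illustrated with a one-line helper. The name `clamp_reward` is hypothetical, and the bounds are treated as inclusive here, on the assumption that 0.05 and 0.95 are reachable values since the Task 1 grader returns them directly.

```python
REWARD_MIN, REWARD_MAX = 0.05, 0.95


def clamp_reward(raw):
    """Limit a raw grader score to the range REWARD_MIN..REWARD_MAX (inclusive)."""
    return max(REWARD_MIN, min(REWARD_MAX, raw))


# A perfect raw score, a mid-range score, and a penalty-driven negative score.
for raw in (1.0, 0.5, -0.15):
    print(raw, "->", clamp_reward(raw))
```

A raw 1.0 comes out as 0.95, a raw 0.5 is unchanged, and a negative raw score is lifted to 0.05.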

  ## Setup
 
 
  | Task | Score |
  |------------|-------|
+ | Easy | 0.950 |
  | Medium | 0.500 |
+ | Hard | 0.850 |
+ | **Avg** | **0.767** |

  Scores vary per run due to random bug sampling from a pool of 5 bugs per task.
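
The average row is the unweighted mean of the three task scores, which can be checked quickly (the dictionary below just restates the table):

```python
from statistics import mean

scores = {"Easy": 0.950, "Medium": 0.500, "Hard": 0.850}
average = mean(list(scores.values()))
print(f"{average:.3f}")  # 0.767
```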

+ ## API Endpoints
+
+ | Method | Endpoint | Description |
+ |--------|------------------|------------------------------------|
+ | GET | `/` | Health check |
+ | POST | `/reset` | Start new episode for a task |
+ | POST | `/step` | Submit triage action |
+ | GET | `/state` | Get current episode state |
+ | GET | `/tasks` | List all tasks with grader info |
+ | GET | `/tasks/{id}` | Get specific task metadata |
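
As a rough illustration of how a client might address these endpoints, the sketch below builds (but does not send) requests with the standard library. The base URL, the helper name `build_request`, and every payload field are assumptions; the README excerpt does not specify request bodies, so check `client.py` for the real shapes.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:8000"  # assumption: address of a locally running server


def build_request(method, path, payload=None):
    """Build a JSON request for one of the endpoints in the table; nothing is sent."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return Request(BASE_URL + path, data=data, headers=headers, method=method)


# Payload field names below are hypothetical placeholders.
reset_req = build_request("POST", "/reset", {"task_id": "easy"})
step_req = build_request("POST", "/step", {"priority": "P1"})
state_req = build_request("GET", "/state")

for req in (reset_req, step_req, state_req):
    print(req.get_method(), req.full_url)
```

Each request could then be passed to `urllib.request.urlopen` once a server is actually running.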
+
  ## Project structure

  ```
 
  │ ├── environment.py # BugTriageEnvironment core logic
  │ ├── task.py # Bug reports + graders
  │ └── requirements.txt
+ ├── model.py # Pydantic models
  ├── client.py # HTTP client
  ├── baseline.py # Groq development script
  ├── inference.py # OpenAI client submission script