RoyAalekh committed on
Commit 909f5b5 · 1 parent: c92a716

docs: Add comprehensive enhancement plan for bug fixes


Code analysis identified the following issues:

Priority 0 (Critical):
- Override state pollution - flags persist across runs
- Ripeness defaults to RIPE - optimistic bias risks scheduling unready cases
- Override auditability - in-place mutations lose rejection tracking

Priority 1 (High):
- Re-enable case inflow for realistic long-term simulations
- Configurable ripeness re-evaluation frequency
- Comprehensive test coverage

Priority 2 (Medium):
- Judge availability blocking
- Per-case gap overrides for urgent cases
- Dynamic courtroom capacity

4-week implementation roadmap with clear success criteria.
Addresses state management bugs, ripeness detection weaknesses,
and simulation realism issues.

Files changed (1)
  1. docs/ENHANCEMENT_PLAN.md +233 -0
docs/ENHANCEMENT_PLAN.md ADDED
@@ -0,0 +1,233 @@
# Court Scheduling System - Bug Fixes & Enhancements

## 1. Fix State Management Bugs (P0 - Critical)

### 1.1 Fix Override State Pollution
**Problem**: Override flags persist across runs because priority overrides are never cleared
**Impact**: Cases keep their boosted priority in subsequent schedules

**Solution**:
- Add a `clear_overrides()` method to the `Case` class
- Call it after each scheduling day or at simulation reset
- Store overrides in a separate tracking dict instead of mutating `Case` objects
- Alternative: pass an immutable override context to the scheduler

**Files**:
- scheduler/core/case.py (add clear method)
- scheduler/control/overrides.py (refactor to non-mutating approach)
- scheduler/simulation/engine.py (call clear after scheduling)

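The following is a minimal Python sketch of both options. It assumes a simplified `Case` with a single `priority_override` field; the real class in scheduler/core/case.py is not shown here. The names `effective_priority`, `OverrideContext`, and `rank_cases` are hypothetical. In the second option, overrides live in a read-only mapping keyed by case ID. Each day gets a fresh (possibly empty) context, so no state can leak into the next run.

```python
from dataclasses import dataclass, field
from types import MappingProxyType
from typing import Mapping, Optional


@dataclass
class Case:
    # Hypothetical, simplified Case: only the fields this sketch needs.
    case_id: str
    base_priority: float
    priority_override: Optional[float] = None

    def clear_overrides(self) -> None:
        """Option A: reset per-day override state so it cannot leak into the next run."""
        self.priority_override = None

    @property
    def effective_priority(self) -> float:
        if self.priority_override is not None:
            return self.priority_override
        return self.base_priority


@dataclass(frozen=True)
class OverrideContext:
    """Option B (hypothetical): overrides kept outside Case objects, keyed by case_id."""
    priority: Mapping[str, float] = field(default_factory=dict)

    @classmethod
    def from_dict(cls, priority: dict[str, float]) -> "OverrideContext":
        # Copy the input and wrap it in a read-only view so nobody can mutate it later.
        return cls(priority=MappingProxyType(dict(priority)))

    def priority_for(self, case: Case) -> float:
        return self.priority.get(case.case_id, case.base_priority)


def rank_cases(cases: list[Case], ctx: OverrideContext) -> list[str]:
    """Order case IDs by priority for one day without modifying any Case."""
    return [c.case_id for c in sorted(cases, key=ctx.priority_for, reverse=True)]


cases = [Case("A", base_priority=1.0), Case("B", base_priority=2.0)]
print(rank_cases(cases, OverrideContext.from_dict({"A": 5.0})))  # day 1: ['A', 'B']
print(rank_cases(cases, OverrideContext()))                      # day 2: ['B', 'A']
```

With option B, the scheduler never writes override state onto shared objects, so there is nothing left to clear after a scheduling day.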
### 1.2 Preserve Override Auditability
**Problem**: Invalid overrides are removed in place from the input list
**Impact**: The caller loses the original override list and cannot audit rejections

**Solution**:
- Validate into separate collections: `valid_overrides`, `rejected_overrides`
- Return a structured result: `OverrideResult(applied, rejected_with_reasons)`
- Leave the original override list unmodified
- Log every rejection with a clear error message

**Files**:
- scheduler/control/overrides.py (refactor apply_overrides)
- scheduler/core/algorithm.py (update override handling)

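Here is a sketch of non-mutating validation. It assumes a hypothetical `Override` record and a pluggable validator that returns `None` for a valid override, or a rejection reason string otherwise. Only `OverrideResult(applied, rejected_with_reasons)` comes from the plan. The field types, the validator signature, and the example rules (a set of known case IDs and a 0-10 priority range) are illustrative.

```python
import logging
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

logger = logging.getLogger(__name__)


@dataclass(frozen=True)
class Override:
    # Hypothetical shape of an override request.
    override_id: str
    case_id: str
    new_priority: float


@dataclass(frozen=True)
class OverrideResult:
    applied: tuple[Override, ...]
    rejected_with_reasons: tuple[tuple[Override, str], ...]


# Returns None if the override is acceptable, otherwise a human-readable reason.
Validator = Callable[[Override], Optional[str]]


def apply_overrides(overrides: Sequence[Override], validate: Validator) -> OverrideResult:
    """Partition overrides into applied/rejected; the input sequence is only read."""
    applied: list[Override] = []
    rejected: list[tuple[Override, str]] = []
    for ov in overrides:
        reason = validate(ov)
        if reason is None:
            applied.append(ov)
        else:
            logger.warning("Rejected override %s: %s", ov.override_id, reason)
            rejected.append((ov, reason))
    return OverrideResult(tuple(applied), tuple(rejected))


KNOWN_CASES = {"C1", "C2"}  # illustrative data


def example_validator(ov: Override) -> Optional[str]:
    if ov.case_id not in KNOWN_CASES:
        return f"unknown case {ov.case_id}"
    if not 0.0 <= ov.new_priority <= 10.0:  # assumed valid range
        return "priority out of range"
    return None
```

Because the input list is never modified, the caller can compare it with `result.applied` or inspect `result.rejected_with_reasons` when auditing.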
### 1.3 Track Override Outcomes Explicitly
**Problem**: Applied overrides are returned in a list, while rejected ones appear only as `None` in the unscheduled output
**Impact**: It is hard to distinguish "not selected" from "override rejected"

**Solution**:
- Create an `OverrideAudit` dataclass with fields `override_id`, `status`, `reason`, `timestamp`
- Return the audit log from `schedule_day` as `result.override_audit`
- Track separately: `cases_not_selected`, `overrides_accepted`, `overrides_rejected`

**Files**:
- scheduler/core/algorithm.py (add audit tracking)
- scheduler/control/overrides.py (structured audit log)

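One possible shape for the audit trail is sketched below. `OverrideStatus`, `ScheduleResult`, and its `record` helper are hypothetical. Only the `OverrideAudit` fields and the names `override_audit`, `cases_not_selected`, `overrides_accepted`, and `overrides_rejected` come from the plan. The accepted and rejected views are derived from a single audit list, so they cannot drift apart.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class OverrideStatus(Enum):
    ACCEPTED = "accepted"
    REJECTED = "rejected"


@dataclass(frozen=True)
class OverrideAudit:
    override_id: str
    status: OverrideStatus
    reason: Optional[str]
    timestamp: datetime


@dataclass
class ScheduleResult:
    # Hypothetical return type of schedule_day.
    scheduled: list[str] = field(default_factory=list)
    cases_not_selected: list[str] = field(default_factory=list)  # no override involved
    override_audit: list[OverrideAudit] = field(default_factory=list)

    def record(self, override_id: str, reason: Optional[str] = None) -> None:
        """Append an audit entry; passing a reason marks the override as rejected."""
        status = OverrideStatus.ACCEPTED if reason is None else OverrideStatus.REJECTED
        self.override_audit.append(
            OverrideAudit(override_id, status, reason, datetime.now(timezone.utc))
        )

    @property
    def overrides_accepted(self) -> list[str]:
        return [a.override_id for a in self.override_audit
                if a.status is OverrideStatus.ACCEPTED]

    @property
    def overrides_rejected(self) -> list[tuple[str, Optional[str]]]:
        return [(a.override_id, a.reason) for a in self.override_audit
                if a.status is OverrideStatus.REJECTED]


result = ScheduleResult()
result.record("o1")                          # accepted
result.record("o2", reason="case not ripe")  # rejected, with an illustrative reason
result.cases_not_selected.append("C7")       # simply not picked; no override involved
```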
## 2. Strengthen Ripeness Detection (P0 - Critical)

### 2.1 Require Positive Evidence for RIPE
**Problem**: Defaults to RIPE when signals are ambiguous
**Impact**: Schedules cases that may not be ready

**Solution**:
- Add an `UNKNOWN` status to the `RipenessStatus` enum
- Require explicit RIPE signals: stage progression, document check, age threshold
- Default to UNKNOWN (not RIPE) when data is insufficient
- Add a confidence score: `ripeness_confidence: float` (0.0-1.0)

**Files**:
- scheduler/core/ripeness.py (add UNKNOWN, confidence scoring)
- scheduler/simulation/engine.py (filter UNKNOWN cases)

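The sketch below shows evidence-first classification. It assumes each of the three signals named above is available as `True`, `False`, or `None` (missing). The `RipenessEvidence` type, the `classify` and `schedulable` functions, and the rule "confidence = share of signals positively confirmed" are illustrative assumptions, not the existing ripeness.py logic. The rules are:

- Any explicit negative signal yields UNRIPE.
- RIPE requires reaching `min_confidence`.
- Everything else is UNKNOWN.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class RipenessStatus(Enum):
    RIPE = "ripe"
    UNRIPE = "unripe"
    UNKNOWN = "unknown"  # new: not enough evidence either way


@dataclass(frozen=True)
class RipenessEvidence:
    # Hypothetical inputs: True/False when observed, None when data is missing.
    stage_progressed: Optional[bool]
    documents_complete: Optional[bool]
    past_age_threshold: Optional[bool]


def classify(ev: RipenessEvidence, min_confidence: float = 1.0) -> tuple[RipenessStatus, float]:
    """Return (status, ripeness_confidence) without ever defaulting to RIPE."""
    signals = [ev.stage_progressed, ev.documents_complete, ev.past_age_threshold]
    confidence = sum(1 for s in signals if s is True) / len(signals)
    if any(s is False for s in signals):
        return RipenessStatus.UNRIPE, confidence
    if confidence >= min_confidence:
        return RipenessStatus.RIPE, confidence
    return RipenessStatus.UNKNOWN, confidence  # missing data is not evidence of ripeness


def schedulable(statuses: dict[str, RipenessStatus]) -> list[str]:
    """Engine-side filter: only explicitly RIPE cases are eligible."""
    return [cid for cid, st in statuses.items() if st is RipenessStatus.RIPE]
```

With the default `min_confidence=1.0`, all three signals must be confirmed. Lowering it schedules more cases at the cost of a higher false-positive risk.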
### 2.2 Enrich Ripeness Signals
**Problem**: Relies only on keyword search and basic stage checks
**Impact**: Misses subtler bottlenecks

**Solution**:
- Add signals:
  - Filing age relative to the case-type median
  - Adjournment reason history (e.g., recurring "summons pending")
  - Outstanding task list (if available in the data)
  - Party/lawyer attendance rate
  - Document submission completeness
- Multi-signal scoring: weighted combination of the signals
- Configurable thresholds per signal

**Files**:
- scheduler/core/ripeness.py (add signal extraction)
- scheduler/data/config.py (ripeness thresholds)

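Weighted scoring with a threshold per signal could look like the sketch below. The signal names, weights, and thresholds are placeholders, since the plan leaves their values to scheduler/data/config.py. The outstanding-task signal is omitted because its data may not exist. A missing signal contributes nothing, so sparse data lowers the score rather than raising it.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class SignalRule:
    weight: float
    threshold: float
    higher_is_riper: bool = True  # False when a lower value indicates readiness

    def passes(self, value: float) -> bool:
        if self.higher_is_riper:
            return value >= self.threshold
        return value <= self.threshold


# Placeholder rules (hypothetical); real weights/thresholds would come from config.
DEFAULT_RULES: dict[str, SignalRule] = {
    "filing_age_ratio": SignalRule(weight=0.2, threshold=1.0),        # age / case-type median
    # Count of "summons pending" adjournments; at most one is tolerated.
    "summons_pending_adjournments": SignalRule(weight=0.3, threshold=1, higher_is_riper=False),
    "attendance_rate": SignalRule(weight=0.25, threshold=0.8),         # party/lawyer attendance
    "document_completeness": SignalRule(weight=0.25, threshold=0.9),  # share of documents filed
}


def ripeness_score(signals: Mapping[str, Optional[float]],
                   rules: Mapping[str, SignalRule] = DEFAULT_RULES) -> float:
    """Weighted share of rules whose signal is present and passes its threshold (0.0-1.0)."""
    total = sum(rule.weight for rule in rules.values())
    earned = 0.0
    for name, rule in rules.items():
        value = signals.get(name)
        if value is not None and rule.passes(value):  # missing data earns nothing
            earned += rule.weight
    return earned / total if total > 0 else 0.0
```

A score like this could serve as the `ripeness_confidence` value from 2.1.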
### 2.3 Add Learning Feedback Loop
**Problem**: Static heuristics do not improve over time
**Impact**: Classification errors persist

**Solution** (Future Enhancement):
- Track ripeness predictions against actual outcomes
  - Cases marked RIPE but adjourned → false positive
  - Cases marked UNRIPE but later heard successfully → false negative
- Adjust thresholds based on historical accuracy
- Log classification performance metrics

**Files**:
- scheduler/monitoring/ripeness_metrics.py (new)
- scheduler/core/ripeness.py (adaptive thresholds)

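Below is a sketch of the feedback metrics plus a simple threshold update. The false positive rate is computed as the share of RIPE predictions that ended in adjournment, matching the Success Criteria definition ("marked RIPE but adjourned"). The following are assumptions:

- the `Outcome` record
- reusing the 0.15 target for false negatives
- the fixed step size
- checking false positives first, which favours avoiding premature scheduling

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    # Hypothetical record pairing a prediction with what actually happened.
    predicted_ripe: bool  # False covers UNRIPE (and UNKNOWN, if used)
    heard: bool           # False means the hearing was adjourned


def classification_metrics(outcomes: list[Outcome]) -> dict[str, float]:
    ripe = [o for o in outcomes if o.predicted_ripe]
    not_ripe = [o for o in outcomes if not o.predicted_ripe]
    false_pos = sum(1 for o in ripe if not o.heard)  # RIPE but adjourned
    false_neg = sum(1 for o in not_ripe if o.heard)  # not RIPE but heard
    return {
        "false_positive_rate": false_pos / len(ripe) if ripe else 0.0,
        "false_negative_rate": false_neg / len(not_ripe) if not_ripe else 0.0,
    }


def adjust_threshold(threshold: float, metrics: dict[str, float],
                     target: float = 0.15, step: float = 0.05) -> float:
    """Nudge the RIPE score threshold toward the target error rate, clamped to [0, 1]."""
    if metrics["false_positive_rate"] > target:
        threshold += step  # too many RIPE cases adjourned: be stricter
    elif metrics["false_negative_rate"] > target:
        threshold -= step  # ready cases held back: be more lenient
    return min(1.0, max(0.0, threshold))
```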
## 3. Re-enable Simulation Inflow (P1 - High)

### 3.1 Parameterize Case Filing
**Problem**: New-filing generation is commented out, so the caseload never grows
**Impact**: Long-term simulations are unrealistic

**Solution**:
- Add `enable_inflow: bool` to `CourtSimConfig`
- Add `filing_rate_multiplier: float` (default 1.0 = historical rate)
- Expose inflow controls in the pipeline config
- Surface inflow metrics in simulation results

**Files**:
- scheduler/simulation/engine.py (uncomment filings and gate them on `enable_inflow`)
- court_scheduler_rl.py (add config parameters)

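The following sketches gated inflow. It assumes daily filings follow a Poisson process whose mean is the historical daily rate times `filing_rate_multiplier`. The Poisson assumption, the `historical_daily_rate` input, the default of `enable_inflow=False`, and the helper names are all illustrative; the plan only specifies the two config fields. Sampling uses Knuth's multiplication method on the standard library's `random`, which is adequate for modest daily rates.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class CourtSimConfig:
    # Hypothetical subset of the config: only the inflow fields from the plan.
    enable_inflow: bool = False          # assumed default: off, matching current behavior
    filing_rate_multiplier: float = 1.0  # 1.0 = historical filing rate


def sample_poisson(rng: random.Random, lam: float) -> int:
    """Knuth's multiplication method; O(lam) per draw, fine for modest daily rates."""
    if lam <= 0:
        return 0
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def new_filings_for_day(cfg: CourtSimConfig, historical_daily_rate: float,
                        rng: random.Random) -> int:
    """Number of new cases filed on one simulated day (0 when inflow is disabled)."""
    if not cfg.enable_inflow:
        return 0
    return sample_poisson(rng, historical_daily_rate * cfg.filing_rate_multiplier)


def daily_inflow(cfg: CourtSimConfig, historical_daily_rate: float,
                 days: int, seed: int = 0) -> list[int]:
    """Per-day filing counts for a run, reproducible via `seed`; usable as an inflow metric."""
    rng = random.Random(seed)
    return [new_filings_for_day(cfg, historical_daily_rate, rng) for _ in range(days)]
```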
### 3.2 Make Ripeness Re-evaluation Configurable
**Problem**: The fixed 7-day re-evaluation interval may be too infrequent
**Impact**: Stale classifications drive scheduling for several days between re-evaluations

**Solution**:
- Add `ripeness_eval_frequency_days: int` to the config (default 7)
- Consider an adaptive frequency: re-evaluate more often when the backlog is high
- Log ripeness re-evaluation events

**Files**:
- scheduler/simulation/engine.py (configurable frequency)
- scheduler/data/config.py (add parameter)

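Here is a sketch of a configurable, optionally adaptive re-evaluation schedule. `ripeness_eval_frequency_days` comes from the plan. The `adaptive`, `high_backlog_threshold`, and `high_backlog_frequency_days` knobs, their values, and the `reevaluation_days` driver are hypothetical ways to realize "more frequent when backlog is high".

```python
import logging
from dataclasses import dataclass
from typing import Callable, Optional

logger = logging.getLogger(__name__)


@dataclass(frozen=True)
class RipenessEvalConfig:
    ripeness_eval_frequency_days: int = 7
    # Hypothetical adaptive knobs (values are placeholders):
    adaptive: bool = False
    high_backlog_threshold: int = 5000
    high_backlog_frequency_days: int = 3


def eval_interval(cfg: RipenessEvalConfig, backlog: int) -> int:
    if cfg.adaptive and backlog >= cfg.high_backlog_threshold:
        # Never make the interval longer than the base setting.
        return min(cfg.ripeness_eval_frequency_days, cfg.high_backlog_frequency_days)
    return cfg.ripeness_eval_frequency_days


def reevaluation_days(cfg: RipenessEvalConfig, days: int,
                      backlog_on: Callable[[int], int]) -> list[int]:
    """Return the days that trigger a ripeness re-evaluation, logging each event."""
    eval_days: list[int] = []
    last: Optional[int] = None
    for day in range(days):
        backlog = backlog_on(day)
        if last is None or day - last >= eval_interval(cfg, backlog):
            logger.info("Ripeness re-evaluation on day %d (backlog=%d)", day, backlog)
            eval_days.append(day)
            last = day
    return eval_days
```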
## 4. Enhanced Scheduling Constraints (P2 - Medium)

### 4.1 Judge Blocking & Availability
**Problem**: There is no way to record per-judge blocked dates
**Impact**: Hearings can be scheduled when the judge is unavailable

**Solution**:
- Add `blocked_dates: list[date]` to the `Judge` entity
- Add `availability_override: dict[date, bool]` for one-time changes
- Filter eligible courtrooms by judge availability

**Files**:
- scheduler/core/judge.py (add availability fields)
- scheduler/core/algorithm.py (check availability)

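The sketch below implements the availability check with the two fields from the plan. The `Judge` and `Courtroom` shapes and `eligible_courtrooms` are hypothetical. The precedence rule is also an assumption: a one-time override beats the blocked list, so a judge can be made available on a normally blocked date, or unavailable on an ordinary one.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Judge:
    judge_id: str
    blocked_dates: list[date] = field(default_factory=list)
    availability_override: dict[date, bool] = field(default_factory=dict)

    def is_available(self, day: date) -> bool:
        # Assumed precedence: a one-time override wins over the blocked list.
        if day in self.availability_override:
            return self.availability_override[day]
        return day not in self.blocked_dates


@dataclass
class Courtroom:
    # Hypothetical: one presiding judge per courtroom.
    courtroom_id: str
    judge: Judge


def eligible_courtrooms(courtrooms: list[Courtroom], day: date) -> list[str]:
    """Courtroom IDs whose judge can sit on `day`."""
    return [c.courtroom_id for c in courtrooms if c.judge.is_available(day)]


leave = date(2024, 5, 6)
rooms = [
    Courtroom("CR1", Judge("J1", blocked_dates=[leave])),
    Courtroom("CR2", Judge("J2")),
    Courtroom("CR3", Judge("J3", blocked_dates=[leave], availability_override={leave: True})),
]
print(eligible_courtrooms(rooms, leave))  # ['CR2', 'CR3']
```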
### 4.2 Per-Case Gap Overrides
**Problem**: A single global `MIN_GAP_BETWEEN_HEARINGS` applies with no exceptions
**Impact**: Urgent cases cannot be expedited

**Solution**:
- Add `min_gap_override: Optional[int]` to `Case`
- Apply it in the eligibility check: `gap = case.min_gap_override if case.min_gap_override is not None else MIN_GAP_BETWEEN_HEARINGS` (using `or` would silently ignore an override of 0)
- Track override applications in metrics

**Files**:
- scheduler/core/case.py (add field)
- scheduler/core/algorithm.py (use override in eligibility)

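Here is a sketch of the per-case gap check. The override is tested with `is not None`, so a 0-day override still applies. The `MIN_GAP_BETWEEN_HEARINGS` value, the `last_hearing` field, the helper names, and the `Counter`-based metric are placeholders for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Optional

MIN_GAP_BETWEEN_HEARINGS = 14  # placeholder value; use the project's real constant


@dataclass
class Case:
    # Hypothetical, simplified Case.
    case_id: str
    last_hearing: Optional[date] = None
    min_gap_override: Optional[int] = None  # days; None means "use the global gap"


def required_gap(case: Case) -> int:
    # `is not None` instead of `or`: an override of 0 must not fall back to the global gap.
    if case.min_gap_override is not None:
        return case.min_gap_override
    return MIN_GAP_BETWEEN_HEARINGS


def gap_satisfied(case: Case, day: date, metrics: Counter) -> bool:
    """Eligibility check; counts every check that relied on a per-case override."""
    if case.last_hearing is None:
        return True
    if case.min_gap_override is not None:
        metrics["gap_override_applied"] += 1
    return (day - case.last_hearing).days >= required_gap(case)
```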
### 4.3 Courtroom Capacity Changes
**Problem**: Daily capacity is fixed, with no dynamic adjustments
**Impact**: Cannot model half-days or special sessions

**Solution**:
- Add `capacity_overrides: dict[date, int]` to `Courtroom`
- Apply in allocation: check date-specific capacity before the default
- Support judge preferences (e.g., "property cases on Mondays")

**Files**:
- scheduler/core/courtroom.py (add override dict)
- scheduler/simulation/allocator.py (check overrides)

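The following sketches date-specific capacity in a simple in-order allocator. `daily_capacity`, `capacity_on`, and `allocate` are hypothetical names; only `capacity_overrides: dict[date, int]` comes from the plan. Judge preferences such as day-of-week case types are not modeled here.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Courtroom:
    courtroom_id: str
    daily_capacity: int
    capacity_overrides: dict[date, int] = field(default_factory=dict)

    def capacity_on(self, day: date) -> int:
        # Date-specific capacity first, then the default.
        return self.capacity_overrides.get(day, self.daily_capacity)


def allocate(case_ids: list[str], courtrooms: list[Courtroom],
             day: date) -> dict[str, list[str]]:
    """Fill courtrooms in order, each up to its capacity for `day`; leftovers stay unallocated."""
    allocation: dict[str, list[str]] = {}
    remaining = list(case_ids)
    for room in courtrooms:
        cap = max(0, room.capacity_on(day))
        allocation[room.courtroom_id] = remaining[:cap]
        remaining = remaining[cap:]
    return allocation


half_day = date(2024, 3, 4)
rooms = [Courtroom("CR1", 3, {half_day: 1}), Courtroom("CR2", 3)]
cases = ["C1", "C2", "C3", "C4", "C5"]
print(allocate(cases, rooms, half_day))          # {'CR1': ['C1'], 'CR2': ['C2', 'C3', 'C4']}
print(allocate(cases, rooms, date(2024, 3, 5)))  # {'CR1': ['C1', 'C2', 'C3'], 'CR2': ['C4', 'C5']}
```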
## 5. Testing & Validation (P1 - High)

### 5.1 Unit Tests for Bug Fixes
**Coverage**:
- Override state clearing
- Ripeness UNKNOWN handling
- Inflow rate calculations
- Constraint validation

**Files**:
- tests/test_overrides.py (new)
- tests/test_ripeness.py (expand)
- tests/test_simulation.py (inflow tests)

### 5.2 Integration Tests
**Scenarios**:
- Full pipeline with overrides applied
- Ripeness transitions over time
- Blocked judge dates respected
- Capacity overrides honored

**Files**:
- tests/integration/test_scheduling_pipeline.py (new)

## Implementation Order

1. **Week 1**: Fix state bugs (1.1, 1.2, 1.3) + tests
2. **Week 2**: Strengthen ripeness (2.1, 2.2) + re-enable inflow (3.1, 3.2)
3. **Week 3**: Enhanced constraints (4.1, 4.2, 4.3)
4. **Week 4**: Comprehensive testing + ripeness learning feedback (2.3)

## Success Criteria

**Bug Fixes**:
- Override state doesn't leak between runs
- All override decisions are auditable
- Rejected overrides are tracked with reasons

**Ripeness**:
- UNKNOWN status is used when confidence is low
- False positive rate < 15% (cases marked RIPE but adjourned)
- Multi-signal scoring is operational

**Simulation Realism**:
- Inflow is configurable and its metrics are tracked
- Long runs show realistic caseload patterns
- Ripeness re-evaluation frequency is tunable

**Constraints**:
- Judge blocked dates are respected in 100% of schedules
- Per-case gap overrides are functional
- Capacity changes are applied correctly

**Quality**:
- 90%+ test coverage for bug fixes
- Integration tests pass
- All edge cases documented

## Background

This plan addresses critical bugs and architectural improvements identified through code analysis:

1. **State Management**: Override flags persist across runs, silently biasing later schedules
2. **Ripeness Defaults**: The system defaults to RIPE when uncertain, risking premature scheduling
3. **Closed Simulation**: There is no case inflow, so long-term runs are unrealistic
4. **Limited Auditability**: In-place mutations make debugging and QA difficult

The OutputManager refactoring and Windows compatibility fixes are already complete; see the commit history.