RoyAalekh committed
Commit b512b22 · 1 Parent(s): efc8383

chore: Major cleanup - remove redundant docs and emoticons


DELETED REDUNDANT DOCUMENTATION (136KB):
- COMPREHENSIVE_ANALYSIS.md (28KB)
- SYSTEM_WORKFLOW.md (21KB)
- TECHNICAL_IMPLEMENTATION.md (21KB)
- SUBMISSION_SUMMARY.md (15KB)
- RL_EXPLORATION_PLAN.md (15KB)
- Court Scheduling System Implementation Plan.md (14KB)
- DEVELOPMENT.md (10KB)
- PIPELINE.md (8KB)
- reports/codebase_analysis_2024-07-01.md (6KB)

DELETED OLD TEST FILES:
- test_phase1.py (312 lines - superseded by test_enhancements.py)
- test_system.py (7 lines - trivial)

REMOVED EMOTICONS:
- HACKATHON_SUBMISSION.md: Replaced tree characters with ASCII
- docs/CONFIGURATION.md: Removed checkmarks, replaced math symbols with ASCII

KEPT ESSENTIAL DOCS:
- README.md (main entry point)
- HACKATHON_SUBMISSION.md (final submission)
- docs/ENHANCEMENT_PLAN.md (tracks bug fixes)
- docs/CONFIGURATION.md (config reference)
- rl/README.md (module docs)
- test_enhancements.py (comprehensive PR validation)

Result: Clean, professional codebase ready for hackathon submission

COMPREHENSIVE_ANALYSIS.md DELETED
@@ -1,862 +0,0 @@
- # Code4Change Court Scheduling Analysis: Comprehensive Codebase Documentation
-
- **Project**: Karnataka High Court Scheduling Optimization
- **Version**: v0.4.0
- **Last Updated**: 2025-11-19
- **Purpose**: Exploratory Data Analysis and Parameter Extraction for Court Scheduling System
-
- ---
-
- ## Table of Contents
- 1. [Executive Summary](#executive-summary)
- 2. [Project Architecture](#project-architecture)
- 3. [Dataset Overview](#dataset-overview)
- 4. [Data Processing Pipeline](#data-processing-pipeline)
- 5. [Exploratory Data Analysis](#exploratory-data-analysis)
- 6. [Parameter Extraction](#parameter-extraction)
- 7. [Key Findings and Insights](#key-findings-and-insights)
- 8. [Technical Implementation](#technical-implementation)
- 9. [Outputs and Artifacts](#outputs-and-artifacts)
- 10. [Next Steps for Algorithm Development](#next-steps-for-algorithm-development)
-
- ---
-
- ## Executive Summary
-
- This project provides comprehensive analysis tools for the Code4Change hackathon, focused on developing intelligent court scheduling systems for the Karnataka High Court. The codebase implements a complete EDA pipeline that processes 20+ years of court data to extract scheduling parameters, identify patterns, and generate insights for algorithm development.
-
- ### Key Statistics
- - **Cases Analyzed**: 134,699 unique civil cases
- - **Hearings Tracked**: 739,670 individual hearings
- - **Time Period**: 2000-2025 (disposed cases only)
- - **Case Types**: 8 civil case categories (RSA, CRP, RFA, CA, CCC, CP, MISC.CVL, CMP)
- - **Data Quality**: High (minimal lifecycle inconsistencies)
-
- ### Primary Deliverables
- 1. **Interactive HTML Visualizations** (15+ plots covering all dimensions)
- 2. **Parameter Extraction** (stage transitions, court capacity, adjournment rates)
- 3. **Case Features Dataset** with readiness scores and alert flags
- 4. **Seasonality and Anomaly Detection** for resource planning
-
- ---
-
- ## Project Architecture
-
- ### Technology Stack
- - **Data Processing**: Polars (for performance), Pandas (for visualization)
- - **Visualization**: Plotly (interactive HTML outputs)
- - **Scientific Computing**: NumPy, SciPy, Scikit-learn
- - **Graph Analysis**: NetworkX
- - **Optimization**: OR-Tools
- - **Data Validation**: Pydantic
- - **CLI**: Typer
-
- ### Directory Structure
- ```
- code4change-analysis/
- ├── Data/ # Raw CSV inputs
- │ ├── ISDMHack_Cases_WPfinal.csv
- │ └── ISDMHack_Hear.csv
- ├── src/ # Analysis modules
- │ ├── eda_config.py # Configuration and paths
- │ ├── eda_load_clean.py # Data loading and cleaning
- │ ├── eda_exploration.py # Visual EDA
- │ └── eda_parameters.py # Parameter extraction
- ├── reports/ # Generated outputs
- │ └── figures/
- │     └── v0.4.0_TIMESTAMP/ # Versioned outputs
- │         ├── *.html # Interactive visualizations
- │         ├── *.parquet # Cleaned data
- │         ├── *.csv # Summary tables
- │         └── params/ # Extracted parameters
- ├── literature/ # Problem statements and references
- ├── main.py # Pipeline orchestrator
- ├── pyproject.toml # Dependencies and metadata
- └── README.md # User documentation
- ```
-
- ### Execution Flow
- ```
- main.py
- ├─> Step 1: run_load_and_clean()
- │   ├─ Load raw CSVs
- │   ├─ Normalize text fields
- │   ├─ Compute hearing gaps
- │   ├─ Deduplicate and validate
- │   └─ Save to Parquet
-
- ├─> Step 2: run_exploration()
- │   ├─ Generate 15+ interactive visualizations
- │   ├─ Analyze temporal patterns
- │   ├─ Compute stage transitions
- │   └─ Detect anomalies
-
- └─> Step 3: run_parameter_export()
-     ├─ Extract stage transition probabilities
-     ├─ Compute court capacity metrics
-     ├─ Identify adjournment proxies
-     ├─ Calculate readiness scores
-     └─ Generate case features dataset
- ```
-
- ---
-
- ## Dataset Overview
-
- ### Cases Dataset (ISDMHack_Cases_WPfinal.csv)
- **Shape**: 134,699 rows × 24 columns
- **Primary Key**: CNR_NUMBER (unique case identifier)
-
- #### Key Attributes
- | Column | Type | Description | Notes |
- |--------|------|-------------|-------|
- | CNR_NUMBER | String | Unique case identifier | Primary key |
- | CASE_TYPE | Categorical | Type of case (RSA, CRP, etc.) | 8 unique values |
- | DATE_FILED | Date | Case filing date | Range: 2000-2025 |
- | DECISION_DATE | Date | Case disposal date | Only disposed cases |
- | DISPOSALTIME_ADJ | Integer | Disposal duration (days) | Adjusted for consistency |
- | COURT_NUMBER | Integer | Courtroom identifier | Resource allocation |
- | CURRENT_STATUS | Categorical | Case status | All "Disposed" |
- | NATURE_OF_DISPOSAL | String | Disposal type/outcome | Varied outcomes |
-
- #### Derived Attributes (Computed in Pipeline)
- - **YEAR_FILED**: Extracted from DATE_FILED
- - **YEAR_DECISION**: Extracted from DECISION_DATE
- - **N_HEARINGS**: Count of hearings per case
- - **GAP_MEAN/MEDIAN/STD**: Hearing gap statistics
- - **GAP_P25/GAP_P75**: Quartile values for gaps
-
- ### Hearings Dataset (ISDMHack_Hear.csv)
- **Shape**: 739,670 rows × 31 columns
- **Primary Key**: Hearing_ID
- **Foreign Key**: CNR_NUMBER (links to Cases)
-
- #### Key Attributes
- | Column | Type | Description | Notes |
- |--------|------|-------------|-------|
- | Hearing_ID | String | Unique hearing identifier | Primary key |
- | CNR_NUMBER | String | Links to case | Foreign key |
- | BusinessOnDate | Date | Hearing date | Core temporal attribute |
- | Remappedstages | Categorical | Hearing stage | 11 standardized stages |
- | PurposeofHearing | Text | Purpose description | Used for classification |
- | BeforeHonourableJudge | String | Judge name(s) | May be multi-judge bench |
- | CourtName | String | Courtroom identifier | Resource tracking |
- | PreviousHearing | Date | Prior hearing date | For gap computation |
-
- #### Stage Taxonomy (Remappedstages)
- 1. **PRE-ADMISSION**: Initial procedural stage
- 2. **ADMISSION**: Formal admission of case
- 3. **FRAMING OF CHARGES**: Charge formulation (rare)
- 4. **EVIDENCE**: Evidence presentation
- 5. **ARGUMENTS**: Legal arguments phase
- 6. **INTERLOCUTORY APPLICATION**: Interim relief requests
- 7. **SETTLEMENT**: Settlement negotiations
- 8. **ORDERS / JUDGMENT**: Final orders or judgments
- 9. **FINAL DISPOSAL**: Case closure
- 10. **OTHER**: Miscellaneous hearings
- 11. **NA**: Missing or unknown stage
-
- ---
-
- ## Data Processing Pipeline
-
- ### Module 1: Load and Clean (eda_load_clean.py)
-
- #### Responsibilities
- 1. **Robust CSV Loading** with null token handling
- 2. **Text Normalization** (uppercase, strip, null standardization)
- 3. **Date Parsing** with multiple format support
- 4. **Deduplication** on primary keys
- 5. **Hearing Gap Computation** (mean, median, std, p25, p75)
- 6. **Lifecycle Validation** (hearings within case timeline)
-
- #### Data Quality Checks
- - **Null Summary**: Reports missing values per column
- - **Duplicate Detection**: Removes duplicate CNR_NUMBER and Hearing_ID
- - **Temporal Consistency**: Flags hearings before filing or after decision
- - **Type Validation**: Ensures proper data types for all columns
-
- #### Key Transformations
-
- **Stage Canonicalization**:
- ```python
- STAGE_MAP = {
-     "ORDERS/JUDGMENTS": "ORDERS / JUDGMENT",
-     "ORDER/JUDGMENT": "ORDERS / JUDGMENT",
-     "ORDERS / JUDGMENT": "ORDERS / JUDGMENT",
-     # ... additional mappings
- }
- ```
-
- **Hearing Gap Computation**:
- - Computed as (Current Hearing Date - Previous Hearing Date) per case
- - Statistics: mean, median, std, p25, p75, count
- - Handles first hearing (gap = null) appropriately
-
- **Outputs**:
- - `cases_clean.parquet`: 134,699 × 33 columns
- - `hearings_clean.parquet`: 739,669 × 31 columns
- - `metadata.json`: Shape, columns, timestamp information
-
- ---
-
- ## Exploratory Data Analysis
-
- ### Module 2: Visual EDA (eda_exploration.py)
-
- This module generates 15+ interactive HTML visualizations covering all analytical dimensions.
-
- ### Visualization Catalog
-
- #### 1. Case Type Distribution
- **File**: `1_case_type_distribution.html`
- **Type**: Bar chart
- **Insights**:
- - CRP (27,132 cases) - Civil Revision Petitions
- - CA (26,953 cases) - Civil Appeals
- - RSA (26,428 cases) - Regular Second Appeals
- - RFA (22,461 cases) - Regular First Appeals
- - Distribution is relatively balanced across major types
-
- #### 2. Filing Trends Over Time
- **File**: `2_cases_filed_by_year.html`
- **Type**: Line chart with range slider
- **Insights**:
- - Steady growth from 2000-2010
- - Peak filing years: 2011-2015
- - Recent stabilization (2016-2025)
- - Useful for capacity planning
-
- #### 3. Disposal Time Distribution
- **File**: `3_disposal_time_distribution.html`
- **Type**: Histogram (50 bins)
- **Insights**:
- - Heavy right-skew (long tail of delayed cases)
- - Median disposal: ~93-903 days depending on case type
- - 90th percentile: 298-2,806 days (varies dramatically)
-
- #### 4. Hearings vs Disposal Time
- **File**: `4_hearings_vs_disposal.html`
- **Type**: Scatter plot (colored by case type)
- **Correlation**: 0.718 (Spearman)
- **Insights**:
- - Strong positive correlation between hearing count and disposal time
- - Non-linear relationship (diminishing returns)
- - Case type influences both dimensions
-
- #### 5. Disposal Time by Case Type
- **File**: `5_box_disposal_by_type.html`
- **Type**: Box plot
- **Insights**:
- ```
- Case Type | Median Days | P90 Days
- ----------|-------------|----------
- CCC       | 93          | 298
- CP        | 96          | 541
- CA        | 117         | 588
- CRP       | 139         | 867
- CMP       | 252         | 861
- RSA       | 695.5       | 2,313
- RFA       | 903         | 2,806
- ```
- - RSA and RFA cases take significantly longer
- - CCC and CP are fastest to resolve
-
- #### 6. Stage Frequency Analysis
- **File**: `6_stage_frequency.html`
- **Type**: Bar chart
- **Insights**:
- - ADMISSION: 427,716 hearings (57.8%)
- - ORDERS / JUDGMENT: 159,846 hearings (21.6%)
- - NA: 6,981 hearings (0.9%)
- - Other stages: < 5,000 each
- - Most case time spent in ADMISSION phase
-
- #### 7. Hearing Gap by Case Type
- **File**: `9_gap_median_by_type.html`
- **Type**: Box plot
- **Insights**:
- - CA: 0 days median (immediate disposals common)
- - CP: 6.75 days median
- - CRP: 14 days median
- - CCC: 18 days median
- - CMP/RFA/RSA: 28-38 days median
- - Significant outliers in all categories
-
- #### 8. Stage Transition Sankey
- **File**: `10_stage_transition_sankey.html`
- **Type**: Sankey diagram
- **Top Transitions**:
- 1. ADMISSION → ADMISSION (396,894) - cases remain in admission
- 2. ORDERS / JUDGMENT → ORDERS / JUDGMENT (155,819)
- 3. ADMISSION → ORDERS / JUDGMENT (20,808) - direct progression
- 4. ADMISSION → NA (9,539) - missing data
-
- #### 9. Monthly Hearing Volume
- **File**: `11_monthly_hearings.html`
- **Type**: Time series line chart
- **Insights**:
- - Seasonal pattern: Lower volume in May (summer vacations)
- - Higher volume in Feb-Apr and Jul-Nov (peak court periods)
- - Steady growth trend from 2000-2020
- - Recent stabilization at ~30,000-40,000 hearings/month
-
- #### 10. Monthly Waterfall with Anomalies
- **File**: `11b_monthly_waterfall.html`
- **Type**: Waterfall chart with anomaly markers
- **Anomalies Detected** (|z-score| ≥ 3):
- - COVID-19 impact: March-May 2020 (dramatic drops)
- - System transitions: Data collection changes
- - Holiday impacts: December/January consistently lower
-
- #### 11. Court Day Load
- **File**: `12b_court_day_load.html`
- **Type**: Box plot per courtroom
- **Capacity Insights**:
- - Median: 151 hearings/courtroom/day
- - P90: 252 hearings/courtroom/day
- - High variability across courtrooms (resource imbalance)
-
- #### 12. Stage Bottleneck Impact
- **File**: `15_bottleneck_impact.html`
- **Type**: Bar chart (Median Days × Run Count)
- **Top Bottlenecks**:
- 1. **ADMISSION**: Median 75 days × 126,979 runs = massive impact
- 2. **ORDERS / JUDGMENT**: Median 224 days × 21,974 runs
- 3. **ARGUMENTS**: Median 26 days × 743 runs
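The ranking in this chart is just each stage's median run duration multiplied by its run count; a short sketch using the figures above:

```python
# (stage, median run days, number of runs) from stage_duration.csv
stages = [
    ("ADMISSION", 75.0, 126_979),
    ("ORDERS / JUDGMENT", 224.0, 21_974),
    ("ARGUMENTS", 26.0, 743),
]

# Impact score: median days per run × run count = total "stage-days" burden
ranked = sorted(stages, key=lambda s: s[1] * s[2], reverse=True)
for name, med, runs in ranked:
    print(f"{name}: {med * runs:,.0f} stage-days")
```

ADMISSION dominates (75 × 126,979 ≈ 9.5M stage-days) even though its per-run median is the shortest of the three, which is why it tops the bottleneck chart.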
-
- ### Summary Outputs (CSV)
- - `transitions.csv`: Stage-to-stage transition counts
- - `stage_duration.csv`: Median/mean/p90 duration per stage
- - `monthly_hearings.csv`: Time series of hearing volumes
- - `monthly_anomalies.csv`: Anomaly detection results with z-scores
-
- ---
-
- ## Parameter Extraction
-
- ### Module 3: Parameters (eda_parameters.py)
-
- This module extracts scheduling parameters needed for simulation and optimization algorithms.
-
- ### 1. Stage Transition Probabilities
-
- **Output**: `stage_transition_probs.csv`
-
- **Format**:
- ```csv
- STAGE_FROM,STAGE_TO,N,row_n,p
- ADMISSION,ADMISSION,396894,427716,0.9279
- ADMISSION,ORDERS / JUDGMENT,20808,427716,0.0486
- ```
-
- **Application**: Markov chain modeling for case progression
-
- **Key Probabilities**:
- - P(ADMISSION → ADMISSION) = 0.928 (cases stay in admission)
- - P(ADMISSION → ORDERS/JUDGMENT) = 0.049 (direct progression)
- - P(ORDERS/JUDGMENT → ORDERS/JUDGMENT) = 0.975 (iterative judgments)
- - P(ARGUMENTS → ARGUMENTS) = 0.782 (multi-hearing arguments)
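Row-normalising transition counts into probabilities (the `p = N / row_n` column above) can be sketched as:

```python
from collections import Counter, defaultdict

def transition_probs(stage_pairs):
    """stage_pairs: iterable of (stage_from, stage_to) for consecutive hearings.
    Returns {(stage_from, stage_to): probability}; each row sums to 1."""
    counts = Counter(stage_pairs)
    row_totals = defaultdict(int)
    for (src, _), n in counts.items():
        row_totals[src] += n
    return {pair: n / row_totals[pair[0]] for pair, n in counts.items()}

# Toy example: admission mostly self-loops, occasionally progresses
pairs = [("ADMISSION", "ADMISSION")] * 9 + [("ADMISSION", "ORDERS / JUDGMENT")]
probs = transition_probs(pairs)
print(probs[("ADMISSION", "ADMISSION")])  # 0.9
```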
-
- ### 2. Stage Transition Entropy
-
- **Output**: `stage_transition_entropy.csv`
-
- **Entropy Scores** (predictability metric):
- ```
- Stage                | Entropy
- ---------------------|--------
- PRE-ADMISSION        | 1.40 (most unpredictable)
- FRAMING OF CHARGES   | 1.14
- SETTLEMENT           | 0.90
- ADMISSION            | 0.31 (very predictable)
- ORDERS / JUDGMENT    | 0.12 (highly predictable)
- NA                   | 0.00 (terminal state)
- ```
-
- **Interpretation**: Lower entropy = more predictable transitions
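These scores are consistent with Shannon entropy over each row of the transition matrix using the natural logarithm (natural log is an inference, not stated in the file; plugging in the ADMISSION row above reproduces the reported 0.31):

```python
import math

def row_entropy(probs):
    """Shannon entropy (natural log) of one row of transition probabilities."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# ADMISSION row: self-loop, progression, NA, small remainder
admission = [0.928, 0.049, 0.022, 0.001]
print(round(row_entropy(admission), 2))  # 0.31
```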
-
- ### 3. Stage Duration Distribution
-
- **Output**: `stage_duration.csv`
-
- **Format**:
- ```csv
- STAGE,RUN_MEDIAN_DAYS,RUN_P90_DAYS,HEARINGS_PER_RUN_MED,N_RUNS
- ORDERS / JUDGMENT,224.0,1738.0,4.0,21974
- ADMISSION,75.0,889.0,3.0,126979
- ```
-
- **Application**: Duration modeling for scheduling simulation
-
- ### 4. Court Capacity Metrics
-
- **Outputs**:
- - `court_capacity_stats.csv`: Per-courtroom statistics
- - `court_capacity_global.json`: Global aggregates
-
- **Global Capacity**:
- ```json
- {
-     "slots_median_global": 151.0,
-     "slots_p90_global": 252.0
- }
- ```
-
- **Application**: Resource constraint modeling
-
- ### 5. Adjournment Proxies
-
- **Output**: `adjournment_proxies.csv`
-
- **Methodology**:
- - Adjournment proxy: Hearing gap > 1.3 × stage median gap
- - Not-reached proxy: Purpose text contains "NOT REACHED", "NR", etc.
-
- **Sample Results**:
- ```csv
- Stage,CaseType,p_adjourn_proxy,p_not_reached_proxy,n
- ADMISSION,RSA,0.423,0.0,139337
- ADMISSION,RFA,0.356,0.0,120725
- ORDERS / JUDGMENT,RFA,0.448,0.0,90746
- ```
-
- **Application**: Stochastic modeling of hearing outcomes
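The adjournment proxy is a simple threshold rule on hearing gaps; a minimal sketch with toy data (the 1.3 factor comes from the methodology above):

```python
from statistics import median

def adjournment_proxy_rate(gaps, factor=1.3):
    """Share of hearings whose gap exceeds factor × the stage median gap."""
    m = median(gaps)
    flagged = [g for g in gaps if g > factor * m]
    return len(flagged) / len(gaps)

# Toy stage gaps (days): median 20, so the threshold is 26 days
gaps = [10, 15, 20, 25, 60]
print(adjournment_proxy_rate(gaps))  # 0.2
```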
-
- ### 6. Case Type Summary
-
- **Output**: `case_type_summary.csv`
-
- **Format**:
- ```csv
- CASE_TYPE,n_cases,disp_median,disp_p90,hear_median,gap_median
- RSA,26428,695.5,2313.0,5.0,38.0
- RFA,22461,903.0,2806.0,6.0,31.0
- ```
-
- **Application**: Case type-specific parameter tuning
-
- ### 7. Correlation Analysis
-
- **Output**: `correlations_spearman.csv`
-
- **Spearman Correlations**:
- ```
-                  | DISPOSALTIME_ADJ | N_HEARINGS | GAP_MEDIAN
- -----------------+------------------+------------+-----------
- DISPOSALTIME_ADJ | 1.000            | 0.718      | 0.594
- N_HEARINGS       | 0.718            | 1.000      | 0.502
- GAP_MEDIAN       | 0.594            | 0.502      | 1.000
- ```
-
- **Interpretation**: All three metrics are positively correlated, confirming that scheduling complexity compounds
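Spearman correlation is just Pearson correlation computed on ranks; a compact sketch (assumes no tied values, unlike a production implementation):

```python
def spearman(x, y):
    """Spearman rank correlation for equal-length sequences without ties."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order):
            r[i] = rank
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    m = (n - 1) / 2  # mean rank is the same for both tie-free rankings
    cov = sum((a - m) * (b - m) for a, b in zip(rx, ry))
    var = sum((a - m) ** 2 for a in rx)  # equals the variance of ry as well
    return cov / var

# More hearings should track longer disposal times
disposal = [100, 400, 250, 900]
hearings = [2, 6, 5, 12]
print(spearman(disposal, hearings))  # 1.0 (perfectly monotone toy data)
```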
-
- ### 8. Case Features with Readiness Scores
-
- **Output**: `cases_features.csv` (134,699 × 14 columns)
-
- **Readiness Score Formula**:
- ```python
- READINESS_SCORE = (
-     (N_HEARINGS_CAPPED / 50) * 0.4                          # Hearing progress
-     + (100 / GAP_MEDIAN_CLAMPED) * 0.3                      # Momentum
-     + (LAST_STAGE in [ARGUMENTS, EVIDENCE, ORDERS]) * 0.3   # Stage advancement
- )
- ```
-
- **Range**: [0, 1] (higher = more ready for final hearing)
-
- **Alert Flags**:
- - `ALERT_P90_TYPE`: Disposal time > 90th percentile within case type
- - `ALERT_HEARING_HEAVY`: Hearing count > 90th percentile within case type
- - `ALERT_LONG_GAP`: Gap > 90th percentile within case type
-
- **Application**: Priority queue construction, urgency detection
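The formula can be made concrete as below. Note the clamp bounds are assumptions for this sketch (the hearing cap of 50 appears in the formula, but the gap clamp that keeps the momentum term in [0, 1] is not specified):

```python
def readiness_score(n_hearings, gap_median, last_stage,
                    advanced=("ARGUMENTS", "EVIDENCE", "ORDERS / JUDGMENT")):
    """Illustrative readiness score following the formula above.
    Gap clamp of 100 days is an assumed lower bound, not from the codebase."""
    progress = min(n_hearings, 50) / 50              # hearing progress, [0, 1]
    momentum = min(100 / max(gap_median, 100), 1.0)  # short gaps -> high momentum
    stage = 1.0 if last_stage in advanced else 0.0   # stage advancement
    return 0.4 * progress + 0.3 * momentum + 0.3 * stage

print(readiness_score(25, 200, "ARGUMENTS"))  # ≈ 0.65
```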
-
- ### 9. Age Funnel Analysis
-
- **Output**: `age_funnel.csv`
-
- **Distribution**:
- ```
- Age Bucket | Count  | Percentage
- -----------|--------|------------
- <1y        | 83,887 | 62.3%
- 1-3y       | 29,418 | 21.8%
- 3-5y       | 10,290 | 7.6%
- >5y        | 11,104 | 8.2%
- ```
-
- **Application**: Backlog management, aging case prioritization
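The bucketing itself is a small threshold function; a sketch (365-day years and the exact boundary handling are assumptions of this illustration):

```python
def age_bucket(age_days):
    """Bucket a case's age (days since filing) into the age-funnel bands."""
    if age_days < 365:
        return "<1y"
    if age_days < 3 * 365:
        return "1-3y"
    if age_days < 5 * 365:
        return "3-5y"
    return ">5y"

print(age_bucket(200), age_bucket(700), age_bucket(1500), age_bucket(2500))
# <1y 1-3y 3-5y >5y
```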
-
- ---
-
- ## Key Findings and Insights
-
- ### 1. Case Lifecycle Patterns
-
- **Average Journey**:
- 1. **Filing → Admission**: ~2-3 hearings, ~75 days median
- 2. **Admission (holding pattern)**: Multiple hearings, 92.8% stay in admission
- 3. **Arguments (if reached)**: ~3 hearings, ~26 days median
- 4. **Orders/Judgment**: ~4 hearings, ~224 days median
- 5. **Final Disposal**: Varies by case type (93-903 days median)
-
- **Key Observation**: Most cases spend a disproportionate amount of time in the ADMISSION stage
-
- ### 2. Case Type Complexity
-
- **Fast Track** (< 150 days median):
- - CCC (93 days) - Ordinary civil cases
- - CP (96 days) - Civil petitions
- - CA (117 days) - Civil appeals
- - CRP (139 days) - Civil revision petitions
-
- **Extended Process** (> 600 days median):
- - RSA (695.5 days) - Second appeals
- - RFA (903 days) - First appeals
-
- **Implication**: Scheduling algorithms must differentiate by case type
-
- ### 3. Scheduling Bottlenecks
-
- **Primary Bottleneck**: ADMISSION stage
- - 57.8% of all hearings
- - Median duration: 75 days per run
- - 126,979 separate runs
- - High self-loop probability (0.928)
-
- **Secondary Bottleneck**: ORDERS / JUDGMENT stage
- - 21.6% of all hearings
- - Median duration: 224 days per run
- - Complex cases accumulate here
-
- **Tertiary**: Judge assignment constraints
- - High variance in per-judge workload
- - Some judges handle 2-3× the median load
-
- ### 4. Temporal Patterns
-
- **Seasonality**:
- - **Low Volume**: May (summer vacations), December-January (holidays)
- - **High Volume**: February-April, July-November
- - **Anomalies**: COVID-19 (March-May 2020), system transitions
-
- **Implications**:
- - Capacity planning must account for 40-60% seasonal variance
- - Vacation schedules create predictable bottlenecks
-
- ### 5. Judge and Court Utilization
-
- **Capacity Metrics**:
- - Median courtroom load: 151 hearings/day
- - P90 courtroom load: 252 hearings/day
- - High variance suggests resource imbalance
-
- **Multi-Judge Benches**:
- - Present in dataset (BeforeHonourableJudgeTwo, etc.)
- - Adds scheduling complexity
-
- ### 6. Adjournment Patterns
-
- **High Adjournment Stages**:
- - ORDERS / JUDGMENT: 40-45% adjournment rate
- - ADMISSION (RSA cases): 42% adjournment rate
- - ADMISSION (RFA cases): 36% adjournment rate
-
- **Implication**: Stochastic models need adjournment probability by stage × case type
-
- ### 7. Data Quality Insights
-
- **Strengths**:
- - Comprehensive coverage (20+ years)
- - Minimal missing data in key fields
- - Strong referential integrity (CNR_NUMBER links)
-
- **Limitations**:
- - Judge names are not standardized (typos, variations)
- - Purpose text is free-form (NLP required)
- - Some stages have sparse data (EVIDENCE, SETTLEMENT)
- - "NA" stage used for missing data (0.9% of hearings)
-
- ---
-
- ## Technical Implementation
-
- ### Design Decisions
-
- #### 1. Polars for Data Processing
- **Rationale**: 10-100× faster than Pandas for large datasets
- **Usage**: All ETL and aggregation operations
- **Trade-off**: Convert to Pandas only for Plotly visualization
-
- #### 2. Parquet for Storage
- **Rationale**: Columnar format, compressed, schema-preserving
- **Benefit**: 10-20× faster I/O vs CSV, type safety
- **Size**: cases_clean.parquet (~5MB), hearings_clean.parquet (~37MB)
-
- #### 3. Versioned Outputs
- **Pattern**: `reports/figures/v{VERSION}_{TIMESTAMP}/`
- **Benefit**: Reproducibility, comparison across runs
- **Storage**: ~100MB per run (HTML files are large)
-
- #### 4. Interactive HTML Visualizations
- **Rationale**: Self-contained, shareable, no server required
- **Library**: Plotly (browser-based interaction)
- **Trade-off**: Large file sizes (4-10MB per plot)
-
- ### Code Quality Patterns
-
- #### Type Hints and Validation
- ```python
- def load_raw() -> tuple[pl.DataFrame, pl.DataFrame]:
-     """Load raw data with Polars."""
-     cases = pl.read_csv(
-         CASES_FILE,
-         try_parse_dates=True,
-         null_values=NULL_TOKENS,
-         infer_schema_length=100_000,
-     )
-     # hearings loaded the same way (path constant from eda_config)
-     hearings = pl.read_csv(
-         HEARINGS_FILE,
-         try_parse_dates=True,
-         null_values=NULL_TOKENS,
-         infer_schema_length=100_000,
-     )
-     return cases, hearings
- ```
-
- #### Null Handling
- ```python
- NULL_TOKENS = ["", "NULL", "Null", "null", "NA", "N/A", "na", "NaN", "nan", "-", "--"]
- ```
-
- #### Stage Canonicalization
- ```python
- STAGE_MAP = {
-     "ORDERS/JUDGMENTS": "ORDERS / JUDGMENT",
-     "INTERLOCUTARY APPLICATION": "INTERLOCUTORY APPLICATION",
- }
- ```
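A lookup like this is typically applied after the text normalization step (uppercase, strip); a small helper sketch (the `canonical_stage` name is illustrative, not from the codebase):

```python
STAGE_MAP = {
    "ORDERS/JUDGMENTS": "ORDERS / JUDGMENT",
    "INTERLOCUTARY APPLICATION": "INTERLOCUTORY APPLICATION",
}

def canonical_stage(raw):
    """Normalize case/whitespace, then map known variants to canonical names."""
    if raw is None:
        return "NA"
    cleaned = " ".join(raw.strip().upper().split())
    return STAGE_MAP.get(cleaned, cleaned)

print(canonical_stage("  orders/judgments "))  # ORDERS / JUDGMENT
```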
-
- #### Error Handling
- ```python
- try:
-     fig_sankey = create_sankey(transitions)
-     fig_sankey.write_html(FIGURES_DIR / "sankey.html")
-     copy_to_versioned("sankey.html")
- except Exception as e:
-     print(f"Sankey error: {e}")
-     # Continue pipeline
- ```
-
- ### Performance Characteristics
-
- **Full Pipeline Runtime** (on a typical laptop):
- - Step 1 (Load & Clean): ~20 seconds
- - Step 2 (Exploration): ~120 seconds (Plotly rendering is slow)
- - Step 3 (Parameter Export): ~30 seconds
- - **Total**: ~3 minutes
-
- **Memory Usage**:
- - Peak: ~2GB RAM
- - Mostly during Plotly figure generation (holds entire plot in memory)
-
- ---
-
- ## Outputs and Artifacts
-
- ### Cleaned Data
- | File | Format | Size | Rows | Columns | Purpose |
- |------|--------|------|------|---------|---------|
- | cases_clean.parquet | Parquet | 5MB | 134,699 | 33 | Clean case data with computed features |
- | hearings_clean.parquet | Parquet | 37MB | 739,669 | 31 | Clean hearing data with stage normalization |
- | metadata.json | JSON | 2KB | - | - | Dataset schema and statistics |
-
- ### Visualizations (HTML)
- | File | Type | Purpose |
- |------|------|---------|
- | 1_case_type_distribution.html | Bar | Case type frequency |
- | 2_cases_filed_by_year.html | Line | Filing trends |
- | 3_disposal_time_distribution.html | Histogram | Disposal duration |
- | 4_hearings_vs_disposal.html | Scatter | Correlation analysis |
- | 5_box_disposal_by_type.html | Box | Case type comparison |
- | 6_stage_frequency.html | Bar | Stage distribution |
- | 9_gap_median_by_type.html | Box | Hearing gap analysis |
- | 10_stage_transition_sankey.html | Sankey | Transition flows |
- | 11_monthly_hearings.html | Line | Volume trends |
- | 11b_monthly_waterfall.html | Waterfall | Monthly changes |
- | 12b_court_day_load.html | Box | Court capacity |
- | 15_bottleneck_impact.html | Bar | Bottleneck ranking |
-
- ### Parameter Files (CSV/JSON)
- | File | Purpose | Application |
- |------|---------|-------------|
- | stage_transitions.csv | Transition counts | Markov chain construction |
- | stage_transition_probs.csv | Probability matrix | Stochastic modeling |
- | stage_transition_entropy.csv | Predictability scores | Uncertainty quantification |
- | stage_duration.csv | Duration distributions | Time estimation |
- | court_capacity_global.json | Capacity limits | Resource constraints |
- | court_capacity_stats.csv | Per-court metrics | Load balancing |
- | adjournment_proxies.csv | Adjournment rates | Stochastic outcomes |
- | case_type_summary.csv | Type-specific stats | Parameter tuning |
- | correlations_spearman.csv | Feature correlations | Feature selection |
- | cases_features.csv | Enhanced case data | Scheduling input |
- | age_funnel.csv | Case age distribution | Priority computation |
-
- ---
-
- ## Next Steps for Algorithm Development
-
- ### 1. Scheduling Algorithm Design
-
- **Multi-Objective Optimization**:
- - **Fairness**: Minimize age variance, equal treatment
- - **Efficiency**: Maximize throughput, minimize idle time
- - **Urgency**: Prioritize high-readiness cases
-
- **Suggested Approach**: Graph-based optimization with OR-Tools
- ```python
- # Pseudo-code
- from ortools.sat.python import cp_model
-
- model = cp_model.CpModel()
-
- # Decision variables
- hearing_slots = {}       # (case, date, court) -> binary
- judge_assignments = {}   # (hearing, judge) -> binary
-
- # Constraints: daily capacity per courtroom
- for date in dates:
-     for court in courts:
-         model.Add(sum(hearing_slots[c, date, court] for c in cases) <= CAPACITY[court])
-
- # Objective: weighted sum of fairness + efficiency + urgency
- model.Maximize(...)
- ```
-
- ### 2. Simulation Framework
-
- **Discrete Event Simulation** with SimPy:
- ```python
- import simpy
- from random import random
-
- def case_lifecycle(env, case):
-     # Admission phase
-     yield env.timeout(sample_duration("ADMISSION", case.type))
-
-     # Arguments phase (probabilistic)
-     if random() < transition_prob["ADMISSION", "ARGUMENTS"]:
-         yield env.timeout(sample_duration("ARGUMENTS", case.type))
-
-         # Adjournment modeling
-         if random() < adjournment_rate["ARGUMENTS", case.type]:
-             yield env.timeout(adjournment_delay())
-
-     # Orders/Judgment
-     yield env.timeout(sample_duration("ORDERS / JUDGMENT", case.type))
- ```
752
-
753
- ### 3. Feature Engineering
754
-
755
- **Additional Features to Compute**:
756
- - Case complexity score (parties, acts, sections)
757
- - Judge specialization matching
758
- - Historical disposal rate (judge × case type)
759
- - Network centrality (advocate recurrence)
760
-
761
- ### 4. Machine Learning Integration
762
-
763
- **Potential Models**:
764
- - **XGBoost**: Disposal time prediction
765
- - **LSTM**: Sequence modeling for stage progression
766
- - **Graph Neural Networks**: Relationship modeling (judge-advocate-case)
767
-
768
- **Target Variables**:
769
- - Disposal time (regression)
770
- - Next stage (classification)
771
- - Adjournment probability (binary classification)
772
-
773
- ### 5. Real-Time Dashboard
774
-
775
- **Technology**: Streamlit or Plotly Dash
776
- **Features**:
777
- - Live scheduling queue
778
- - Judge workload visualization
779
- - Bottleneck alerts
780
- - What-if scenario analysis
781
-
782
- ### 6. Validation Metrics
783
-
784
- **Fairness**:
785
- - Gini coefficient of disposal times
786
- - Age variance within case type
787
- - Equal opportunity (demographic analysis if available)
788
-
789
- **Efficiency**:
790
- - Court utilization rate
791
- - Average disposal time
792
- - Throughput (cases/month)
793
-
794
- **Urgency**:
795
- - Readiness score coverage
796
- - High-priority case delay
797
-
798
- ---
799
-
800
- ## Appendix: Key Statistics Reference
801
-
802
- ### Case Type Distribution
803
- ```
804
- CRP: 27,132 (20.1%)
805
- CA: 26,953 (20.0%)
806
- RSA: 26,428 (19.6%)
807
- RFA: 22,461 (16.7%)
808
- CCC: 14,996 (11.1%)
809
- CP: 12,920 (9.6%)
810
- CMP: 3,809 (2.8%)
811
- ```
812
-
813
- ### Disposal Time Percentiles
814
- ```
815
- P50 (median): 215 days
816
- P75: 629 days
817
- P90: 1,460 days
818
- P95: 2,152 days
819
- P99: 3,688 days
820
- ```
821
-
822
- ### Stage Transition Matrix (Top 10)
823
- ```
824
- From | To | Count | Probability
825
- -------------------|--------------------|---------:|------------:
826
- ADMISSION | ADMISSION | 396,894 | 0.928
827
- ORDERS / JUDGMENT | ORDERS / JUDGMENT | 155,819 | 0.975
828
- ADMISSION | ORDERS / JUDGMENT | 20,808 | 0.049
829
- ADMISSION | NA | 9,539 | 0.022
830
- NA | NA | 6,981 | 1.000
831
- ORDERS / JUDGMENT | NA | 3,998 | 0.025
832
- ARGUMENTS | ARGUMENTS | 2,612 | 0.782
833
- ```
834
-
835
- ### Court Capacity
836
- ```
837
- Global Median: 151 hearings/court/day
838
- Global P90: 252 hearings/court/day
839
- ```
840
-
841
- ### Correlations (Spearman)
842
- ```
843
- DISPOSALTIME_ADJ ↔ N_HEARINGS: 0.718
844
- DISPOSALTIME_ADJ ↔ GAP_MEDIAN: 0.594
845
- N_HEARINGS ↔ GAP_MEDIAN: 0.502
846
- ```
847
-
848
- ---
849
-
850
- ## Conclusion
851
-
852
- This codebase provides a comprehensive foundation for building intelligent court scheduling systems. The combination of robust data processing, detailed exploratory analysis, and extracted parameters creates a complete information pipeline from raw data to algorithm-ready inputs.
853
-
854
- The analysis reveals that court scheduling is a complex multi-constraint optimization problem with significant temporal patterns, stage-based dynamics, and case type heterogeneity. The extracted parameters and visualizations provide the necessary building blocks for developing fair, efficient, and urgency-aware scheduling algorithms.
855
-
856
- **Recommended Next Action**: Begin with simulation-based validation of scheduling policies using the extracted parameters, then graduate to optimization-based approaches once baseline performance is established.
857
-
858
- ---
859
-
860
- **Document Version**: 1.0
861
- **Generated**: 2025-11-19
862
- **Maintained By**: Code4Change Analysis Team
Court Scheduling System Implementation Plan.md DELETED
@@ -1,331 +0,0 @@
- # Court Scheduling System Implementation Plan
- ## Overview
- Build an intelligent judicial scheduling system for Karnataka High Court that optimizes daily cause lists across multiple courtrooms over a 2-year simulation period, balancing fairness, efficiency, and urgency.
- ## Architecture Design
- ### System Components
- 1. **Parameter Loader**: Load EDA-extracted parameters (transition probs, durations, capacities)
- 2. **Case Generator**: Synthetic case creation with realistic attributes
- 3. **Simulation Engine**: SimPy-based discrete event simulation
- 4. **Scheduling Policies**: Multiple algorithms (FIFO, Priority, Optimized)
- 5. **Metrics Tracker**: Performance evaluation (fairness, efficiency, urgency)
- 6. **Visualization**: Dashboard for monitoring and analysis
- ### Technology Stack
- * **Simulation**: SimPy (discrete event simulation)
- * **Optimization**: OR-Tools (CP-SAT solver)
- * **Data Processing**: Polars, Pandas
- * **Visualization**: Plotly, Streamlit
- * **Testing**: Pytest, Hypothesis
- ## Module Structure
- ```warp-runnable-command
- scheduler/
- ├── core/
- │ ├── __init__.py
- │ ├── case.py # Case entity and lifecycle
- │ ├── courtroom.py # Courtroom resource
- │ ├── judge.py # Judge entity
- │ └── hearing.py # Hearing event
- ├── data/
- │ ├── __init__.py
- │ ├── param_loader.py # Load EDA parameters
- │ ├── case_generator.py # Generate synthetic cases
- │ └── config.py # Configuration constants
- ├── simulation/
- │ ├── __init__.py
- │ ├── engine.py # SimPy simulation engine
- │ ├── scheduler.py # Base scheduler interface
- │ ├── policies/
- │ │ ├── __init__.py
- │ │ ├── fifo.py # FIFO scheduling
- │ │ ├── priority.py # Priority-based
- │ │ └── optimized.py # OR-Tools optimization
- │ └── events.py # Event handlers
- ├── optimization/
- │ ├── __init__.py
- │ ├── model.py # OR-Tools model
- │ ├── objectives.py # Multi-objective functions
- │ └── constraints.py # Constraint definitions
- ├── metrics/
- │ ├── __init__.py
- │ ├── fairness.py # Gini coefficient, age variance
- │ ├── efficiency.py # Utilization, throughput
- │ └── urgency.py # Readiness coverage
- ├── visualization/
- │ ├── __init__.py
- │ ├── dashboard.py # Streamlit dashboard
- │ └── plots.py # Plotly visualizations
- └── utils/
- ├── __init__.py
- ├── distributions.py # Probability distributions
- └── calendar.py # Working days calculator
- ```
- ## Implementation Phases
- ### Phase 1: Foundation (Days 1-2) - COMPLETE
- **Goal**: Set up infrastructure and load parameters
- **Status**: 100% complete (1,323 lines implemented)
- **Tasks**:
- 1. [x] Create module directory structure (8 sub-packages)
- 2. [x] Implement parameter loader
- * Read stage_transition_probs.csv
- * Read stage_duration.csv
- * Read court_capacity_global.json
- * Read adjournment_proxies.csv
- * Read cases_features.csv
- * Automatic latest version detection
- * Lazy loading with caching
- 3. [x] Create core entities (Case, Courtroom, Judge, Hearing)
- * Case: Lifecycle, readiness score, priority score (218 lines)
- * Courtroom: Capacity tracking, scheduling, utilization (228 lines)
- * Judge: Workload tracking, specialization, adjournment rate (167 lines)
- * Hearing: Outcome tracking, rescheduling support (134 lines)
- 4. [x] Implement working days calculator (192 days/year)
- * Weekend/holiday detection
- * Seasonality factors
- * Working days counting (217 lines)
- 5. [x] Configuration system with EDA-derived constants (115 lines)
- **Outputs**:
- * `scheduler/data/param_loader.py` (244 lines)
- * `scheduler/data/config.py` (115 lines)
- * `scheduler/core/case.py` (218 lines)
- * `scheduler/core/courtroom.py` (228 lines)
- * `scheduler/core/judge.py` (167 lines)
- * `scheduler/core/hearing.py` (134 lines)
- * `scheduler/utils/calendar.py` (217 lines)
- **Quality**: Type hints 100%, Docstrings 100%, Integration complete
- ### Phase 2: Case Generation (Days 3-4)
- **Goal**: Generate synthetic case pool for simulation
- **Tasks**:
- 1. Implement case generator using historical distributions
- * Case type distribution (CRP: 20.1%, CA: 20%, etc.)
- * Filing rate (monthly inflow from temporal analysis)
- * Initial stage assignment
- 2. Generate 2-year case pool (~10,000 cases)
- 3. Assign readiness scores and attributes
- **Outputs**:
- * `scheduler/data/case_generator.py`
- * Synthetic case dataset for simulation
- ### Phase 3: Simulation Engine (Days 5-7)
- **Goal**: Build discrete event simulation framework
- **Tasks**:
- 1. Implement SimPy environment setup
- 2. Create courtroom resources (5 courtrooms)
- 3. Implement case lifecycle process
- * Stage progression using transition probabilities
- * Duration sampling from distributions
- * Adjournment modeling (stochastic)
- 4. Implement daily scheduling loop
- 5. Add case inflow/outflow dynamics
- **Outputs**:
- * `scheduler/simulation/engine.py`
- * `scheduler/simulation/events.py`
- * Working simulation (baseline)
- ### Phase 4: Scheduling Policies (Days 8-10)
- **Goal**: Implement multiple scheduling algorithms
- **Tasks**:
- 1. Base scheduler interface
- 2. FIFO scheduler (baseline)
- 3. Priority-based scheduler
- * Use case age as primary factor
- * Use case type as secondary
- 4. Readiness-score scheduler
- * Use EDA-computed readiness scores
- * Apply urgency weights
- 5. Compare policies on metrics
- **Outputs**:
- * `scheduler/simulation/scheduler.py` (interface)
- * `scheduler/simulation/policies/` (implementations)
- * Performance comparison report
- ### Phase 5: Optimization Model (Days 11-14)
- **Goal**: Implement OR-Tools-based optimal scheduler
- **Tasks**:
- 1. Define decision variables
- * hearing_slots[case, date, court] ∈ {0,1}
- 2. Implement constraints
- * Daily capacity per courtroom
- * Case can only be in one court per day
- * Minimum gap between hearings
- * Stage progression requirements
- 3. Implement objective functions
- * Fairness: Minimize age variance
- * Efficiency: Maximize utilization
- * Urgency: Prioritize ready cases
- 4. Multi-objective optimization (weighted sum)
- 5. Solve for 30-day scheduling window (rolling)
- **Outputs**:
- * `scheduler/optimization/model.py`
- * `scheduler/optimization/objectives.py`
- * `scheduler/optimization/constraints.py`
- * Optimized scheduling policy
- ### Phase 6: Metrics & Validation (Days 15-16)
- **Goal**: Comprehensive performance evaluation
- **Tasks**:
- 1. Implement fairness metrics
- * Gini coefficient of disposal times
- * Age variance within case types
- * Max age tracking
- 2. Implement efficiency metrics
- * Court utilization rate
- * Average disposal time
- * Throughput (cases/month)
- 3. Implement urgency metrics
- * Readiness score coverage
- * High-priority case delay
- 4. Compare all policies
- 5. Validate against historical data
- **Outputs**:
- * `scheduler/metrics/` (all modules)
- * Validation report
- * Policy comparison matrix
- ### Phase 7: Dashboard (Days 17-18)
- **Goal**: Interactive visualization and monitoring
- **Tasks**:
- 1. Streamlit dashboard setup
- 2. Real-time queue visualization
- 3. Judge workload display
- 4. Alert system for long-pending cases
- 5. What-if scenario analysis
- 6. Export capability (cause lists as PDF/CSV)
- **Outputs**:
- * `scheduler/visualization/dashboard.py`
- * Interactive web interface
- * User documentation
- ### Phase 8: Polish & Documentation (Days 19-20)
- **Goal**: Production-ready system
- **Tasks**:
- 1. Unit tests (pytest)
- 2. Integration tests
- 3. Performance benchmarking
- 4. Comprehensive documentation
- 5. Example notebooks
- 6. Deployment guide
- **Outputs**:
- * Test suite (90%+ coverage)
- * Documentation (README, API docs)
- * Example usage notebooks
- * Final presentation materials
- ## Key Design Decisions
- ### 1. Hybrid Approach
- **Decision**: Use simulation for long-term dynamics, optimization for short-term scheduling
- **Rationale**: Simulation captures stochastic nature (adjournments, case progression), optimization finds optimal daily schedules within constraints
- ### 2. Rolling Optimization Window
- **Decision**: Optimize 30-day windows, re-optimize weekly
- **Rationale**: Balance computational cost with scheduling quality, allow for dynamic adjustments
- ### 3. Stage-Based Progression Model
- **Decision**: Model cases as finite state machines with probabilistic transitions
- **Rationale**: Matches our EDA findings (strong stage patterns), enables realistic progression
- ### 4. Multi-Objective Weighting
- **Decision**: Fairness (40%), Efficiency (30%), Urgency (30%)
- **Rationale**: Prioritize fairness slightly, balance with practical concerns
- ### 5. Capacity Model
- **Decision**: Use median capacity (151 cases/court/day) with seasonal adjustment
- **Rationale**: Conservative estimate from EDA, account for vacation periods
- ## Parameter Utilization from EDA
- | EDA Output | Scheduler Use |
- |------------|---------------|
- | stage_transition_probs.csv | Case progression probabilities |
- | stage_duration.csv | Duration sampling (median, p90) |
- | court_capacity_global.json | Daily capacity constraints |
- | adjournment_proxies.csv | Hearing outcome probabilities |
- | cases_features.csv | Initial readiness scores |
- | case_type_summary.csv | Case type distributions |
- | monthly_hearings.csv | Seasonal adjustment factors |
- | correlations_spearman.csv | Feature importance weights |
- ## Assumptions Made Explicit
- ### Court Operations
- 1. **Working days**: 192 days/year (from Karnataka HC calendar)
- 2. **Courtrooms**: 5 courtrooms, each with 1 judge
- 3. **Daily capacity**: 151 hearings/court/day (median from EDA)
- 4. **Hearing duration**: Not modeled explicitly (capacity is count-based)
- 5. **Case queue assignment**: By case type (RSA → Court 1, CRP → Court 2, etc.)
- ### Case Dynamics
- 1. **Filing rate**: ~6,000 cases/year (derived from historical data)
- 2. **Disposal rate**: Matches filing rate (steady-state assumption)
- 3. **Stage progression**: Probabilistic (Markov chain from EDA)
- 4. **Adjournment rate**: 36-48% depending on stage and case type
- 5. **Case readiness**: Computed from hearings, gaps, and stage
- ### Scheduling Constraints
- 1. **Minimum gap**: 7 days between hearings for same case
- 2. **Maximum gap**: 90 days (alert triggered)
- 3. **Urgent cases**: 5% of pool marked urgent (jump queue)
- 4. **Judge preferences**: Not modeled (future enhancement)
- 5. **Multi-judge benches**: Not modeled (all single-judge)
- ### Simplifications
- 1. **No lawyer availability**: Assumed all advocates always available
- 2. **No case dependencies**: Each case independent
- 3. **No physical constraints**: Assume sufficient courtrooms/facilities
- 4. **Deterministic durations**: Within-hearing time not modeled
- 5. **Perfect information**: All case attributes known
- ## Success Criteria
- ### Fairness Metrics
- * Gini coefficient < 0.4 (disposal time inequality)
- * Age variance reduction: 20% vs FIFO baseline
- * No case unlisted > 90 days without alert
- ### Efficiency Metrics
- * Court utilization > 85%
- * Average disposal time: Within 10% of historical median by case type
- * Throughput: Match or exceed filing rate
- ### Urgency Metrics
- * High-readiness cases: 80% scheduled within 14 days
- * Urgent cases: 95% scheduled within 7 days
- * Alert response: 100% of flagged cases reviewed
- ## Risk Mitigation
- ### Technical Risks
- 1. **Optimization solver timeout**: Use heuristics as fallback
- 2. **Memory constraints**: Batch processing for large case pools
- 3. **Stochastic variability**: Run multiple simulation replications
- ### Model Risks
- 1. **Parameter drift**: Allow manual parameter overrides
- 2. **Edge cases**: Implement rule-based fallbacks
- 3. **Unexpected patterns**: Continuous monitoring and adjustment
- ## Future Enhancements
- ### Short-term
- 1. Judge preference modeling
- 2. Multi-judge bench support
- 3. Case dependency tracking
- 4. Lawyer availability constraints
- ### Medium-term
- 1. Machine learning for duration prediction
- 2. Automated parameter updates from live data
- 3. Real-time integration with eCourts
- 4. Mobile app for judges
- ### Long-term
- 1. Multi-court coordination (district + high court)
- 2. Predictive analytics for case outcomes
- 3. Resource optimization (judges, courtrooms)
- 4. National deployment framework
- ## Deliverables Checklist
- - [ ] Scheduler module (fully functional)
- - [ ] Parameter loader (tested with EDA outputs)
- - [ ] Case generator (realistic synthetic data)
- - [ ] Simulation engine (2-year simulation capability)
- - [ ] Multiple scheduling policies (FIFO, Priority, Optimized)
- - [ ] Optimization model (OR-Tools implementation)
- - [ ] Metrics framework (fairness, efficiency, urgency)
- - [ ] Dashboard (Streamlit web interface)
- - [ ] Validation report (comparison vs historical data)
- - [ ] Documentation (comprehensive)
- - [ ] Test suite (90%+ coverage)
- - [ ] Example notebooks (usage demonstrations)
- - [ ] Presentation materials (slides, demo video)
- ## Timeline Summary
- | Phase | Days | Key Deliverable |
- |-------|------|----------------|
- | Foundation | 1-2 | Parameter loader, core entities |
- | Case Generation | 3-4 | Synthetic case dataset |
- | Simulation | 5-7 | Working SimPy simulation |
- | Policies | 8-10 | Multiple scheduling algorithms |
- | Optimization | 11-14 | OR-Tools optimal scheduler |
- | Metrics | 15-16 | Validation and comparison |
- | Dashboard | 17-18 | Interactive visualization |
- | Polish | 19-20 | Tests, docs, deployment |
- **Total**: 20 days (aggressive timeline, assumes full-time focus)
- ## Next Immediate Actions
- 1. Create scheduler module directory structure
- 2. Implement parameter loader (read all EDA CSVs/JSONs)
- 3. Define core entities (Case, Courtroom, Judge, Hearing)
- 4. Set up development environment with uv
- 5. Initialize git repository with proper .gitignore
- 6. Create initial unit tests
- ***
- **Plan Version**: 1.0
- **Created**: 2025-11-19
- **Status**: Ready to begin implementation
DEVELOPMENT.md DELETED
@@ -1,270 +0,0 @@
- # Court Scheduling System - Development Documentation
-
- Living document tracking architectural decisions, implementation rationale, and design patterns.
-
- ## Table of Contents
- 1. [Ripeness Classification System](#ripeness-classification-system)
- 2. [Simulation Architecture](#simulation-architecture)
- 3. [Code Quality Standards](#code-quality-standards)
-
- ---
-
- ## Ripeness Classification System
-
- ### Overview
- The ripeness classifier determines whether cases are ready for substantive judicial time or have bottlenecks that prevent meaningful progress. This addresses hackathon requirement: "Determine how cases could be classified as 'ripe' or 'unripe' based on purposes of hearing and stage."
-
- ### Implementation Location
- - **Classifier**: `scheduler/core/ripeness.py`
- - **Integration**: `scheduler/simulation/engine.py` (lines 248-266)
- - **Case entity**: `scheduler/core/case.py` (ripeness fields: lines 68-72)
-
- ### Classification Algorithm
-
- The `RipenessClassifier.classify()` method uses a 5-step hierarchy:
-
- ```python
- def classify(case: Case, current_date: datetime) -> RipenessStatus:
-     # 1. Check last hearing purpose for explicit bottleneck keywords
-     if "SUMMONS" in last_hearing_purpose or "NOTICE" in last_hearing_purpose:
-         return UNRIPE_SUMMONS
-     if "STAY" in last_hearing_purpose or "PENDING" in last_hearing_purpose:
-         return UNRIPE_DEPENDENT
-
-     # 2. Check stage - ADMISSION stage with few hearings is likely unripe
-     if current_stage == "ADMISSION" and hearing_count < 3:
-         return UNRIPE_SUMMONS
-
-     # 3. Check if case is "stuck" (many hearings but no progress)
-     if hearing_count > 10 and avg_gap > 60 days:
-         return UNRIPE_PARTY
-
-     # 4. Check stage-based ripeness (ripe stages are substantive)
-     if current_stage in ["ARGUMENTS", "EVIDENCE", "ORDERS / JUDGMENT", "FINAL DISPOSAL"]:
-         return RIPE
-
-     # 5. Default to RIPE if no bottlenecks detected
-     return RIPE
- ```
-
- ### Ripeness Statuses
-
- | Status | Meaning | Example Scenarios |
- |--------|---------|-------------------|
- | `RIPE` | Ready for substantive hearing | Arguments scheduled, evidence ready, parties available |
- | `UNRIPE_SUMMONS` | Waiting for summons service | "ISSUE SUMMONS", "FOR NOTICE", admission <3 hearings |
- | `UNRIPE_DEPENDENT` | Waiting for dependent case/order | "STAY APPLICATION PENDING", awaiting higher court |
- | `UNRIPE_PARTY` | Party/lawyer unavailable | Stuck cases (>10 hearings, avg gap >60 days) |
- | `UNRIPE_DOCUMENT` | Missing documents/evidence | (Future: when document tracking added) |
- | `UNKNOWN` | Insufficient data | (Rare, only if case has no history) |
-
- ### Integration with Simulation
-
- **Daily scheduling flow** (engine.py `_choose_cases_for_day()`):
-
- ```python
- # 1. Get all active cases
- candidates = [c for c in cases if c.status != DISPOSED]
-
- # 2. Update age and readiness scores
- for c in candidates:
-     c.update_age(current_date)
-     c.compute_readiness_score()
-
- # 3. Filter by ripeness (NEW - critical for bottleneck detection)
- ripe_candidates = []
- for c in candidates:
-     ripeness = RipenessClassifier.classify(c, current_date)
-
-     if ripeness.is_ripe():
-         ripe_candidates.append(c)
-     else:
-         unripe_filtered_count += 1
-
- # 4. Apply MIN_GAP_BETWEEN_HEARINGS filter
- eligible = [c for c in ripe_candidates if c.is_ready_for_scheduling(14)]
-
- # 5. Prioritize by policy (FIFO/age/readiness)
- eligible = policy.prioritize(eligible, current_date)
-
- # 6. Allocate to courtrooms
- allocations = allocator.allocate(eligible[:total_capacity], current_date)
- ```
-
- **Key points**:
- - Ripeness evaluation happens BEFORE gap enforcement
- - Unripe cases are completely filtered out (no scheduling)
- - Periodic re-evaluation every 7 days to detect ripeness transitions
- - Ripeness status stored in case entity for persistence
-
- ### Ripeness Transitions
-
- Cases can transition between statuses as bottlenecks are resolved:
-
- ```python
- # Periodic re-evaluation (every 7 days in simulation)
- def _evaluate_ripeness(current_date):
-     for case in active_cases:
-         prev_status = case.ripeness_status
-         new_status = RipenessClassifier.classify(case, current_date)
-
-         if new_status != prev_status:
-             ripeness_transitions += 1
-
-             if new_status.is_ripe():
-                 case.mark_ripe(current_date)
-                 # Case now eligible for scheduling
-             else:
-                 case.mark_unripe(new_status, reason, current_date)
-                 # Case removed from scheduling pool
- ```
-
- ### Synthetic Data Generation
-
- To test ripeness in simulation, the case generator (`case_generator.py`) adds realistic `last_hearing_purpose` values:
-
- ```python
- # 20% of cases have bottlenecks (configurable)
- bottleneck_purposes = [
-     "ISSUE SUMMONS",
-     "FOR NOTICE",
-     "AWAIT SERVICE OF NOTICE",
-     "STAY APPLICATION PENDING",
-     "FOR ORDERS",
- ]
-
- ripe_purposes = [
-     "ARGUMENTS",
-     "HEARING",
-     "FINAL ARGUMENTS",
-     "FOR JUDGMENT",
-     "EVIDENCE",
- ]
-
- # Stage-aware assignment
- if stage == "ADMISSION" and hearing_count < 3:
-     # 40% unripe for early admission cases
-     last_hearing_purpose = random.choice(bottleneck_purposes if random() < 0.4 else ripe_purposes)
- elif stage in ["ARGUMENTS", "ORDERS / JUDGMENT"]:
-     # Advanced stages usually ripe
-     last_hearing_purpose = random.choice(ripe_purposes)
- else:
-     # 20% unripe for other cases
-     last_hearing_purpose = random.choice(bottleneck_purposes if random() < 0.2 else ripe_purposes)
- ```
-
- ### Expected Behavior
-
- For a simulation with 10,000 synthetic cases:
- - **If all cases RIPE**:
-   - Ripeness transitions: 0
-   - Cases filtered: 0
-   - All eligible cases can be scheduled
-
- - **With realistic bottlenecks (20% unripe)**:
-   - Ripeness transitions: ~50-200 (cases becoming ripe/unripe during simulation)
-   - Cases filtered per day: ~200-400 (unripe cases blocked from scheduling)
-   - Scheduling queue smaller (only ripe cases compete for slots)
-
- ### Why Default is RIPE
-
- The classifier defaults to RIPE (step 5) because:
- 1. **Conservative approach**: If we can't detect a bottleneck, assume case is ready
- 2. **Avoid false negatives**: Better to schedule a case that might adjourn than never schedule it
- 3. **Real-world behavior**: Most cases in advanced stages are ripe
- 4. **Gap enforcement still applies**: Even RIPE cases must respect MIN_GAP_BETWEEN_HEARINGS
-
- ### Future Enhancements
-
- 1. **Historical purpose analysis**: Mine actual PurposeOfHearing data to refine keyword mappings
- 2. **Machine learning**: Train classifier on labeled cases (ripe/unripe) from court data
- 3. **Document tracking**: Integrate with document management system for UNRIPE_DOCUMENT detection
- 4. **Dependency graphs**: Model case dependencies explicitly for UNRIPE_DEPENDENT
- 5. **Dynamic thresholds**: Learn optimal thresholds (e.g., <3 hearings, >60 day gaps) from data
-
- ### Metrics Tracked
-
- The simulation reports:
- - `ripeness_transitions`: Number of status changes during simulation
- - `unripe_filtered`: Total cases blocked from scheduling due to unripeness
- - `ripeness_distribution`: Breakdown of active cases by status at simulation end
-
- ### Decision Rationale
-
- **Why separate ripeness from MIN_GAP_BETWEEN_HEARINGS?**
- - Ripeness = substantive bottleneck (summons, dependencies, parties)
- - Gap = administrative constraint (give time for preparation)
- - Conceptually distinct; ripeness can last weeks/months, gap is fixed 14 days
-
- **Why mark cases as unripe vs. just skip them?**
- - Persistence enables tracking and reporting
- - Dashboard can show WHY cases weren't scheduled
- - Alerts can trigger when unripeness duration exceeds threshold
-
- **Why evaluate ripeness every 7 days vs. every day?**
- - Performance optimization (classification has some cost)
- - Ripeness typically doesn't change daily (summons takes weeks)
- - Balance between responsiveness and efficiency
-
- ---
-
- ## Simulation Architecture
-
- ### Discrete Event Simulation Flow
-
- (TODO: Document daily processing, stochastic outcomes, stage transitions)
-
- ---
-
- ## Code Quality Standards
-
- ### Type Hints
- Modern Python 3.11+ syntax:
- - `X | None` instead of `Optional[X]`
- - `list[X]` instead of `List[X]`
- - `dict[K, V]` instead of `Dict[K, V]`
-
- ### Import Organization
- - Absolute imports from `scheduler.*` for internal modules
- - Inline imports prohibited (all imports at top of file)
- - Lazy imports only for TYPE_CHECKING blocks
-
- ### Performance Guidelines
- - Use Polars-native operations (avoid `.map_elements()`)
- - Cache expensive computations (see `param_loader._build_*` pattern)
- - Profile before optimizing
-
- ---
-
- ## Known Issues and Fixes
-
- ### Fixed: "Cases switched courtrooms" metric
- **Problem**: Initial allocations were counted as "switches"
- **Fix**: Changed condition to `courtroom_id is not None and courtroom_id != 0`
- **Commit**: [TODO]
-
- ### Fixed: All cases showing RIPE in synthetic data
- **Problem**: Generator didn't include `last_hearing_purpose`
- **Fix**: Added stage-aware purpose assignment in `case_generator.py`
- **Commit**: [TODO]
-
- ---
-
- ## Recent Updates (2025-11-25)
-
- ### Algorithm Override System Fixed
- - **Fixed circular dependency**: Moved `SchedulerPolicy` from `scheduler.simulation.scheduler` to `scheduler.core.policy`
- - **Implemented missing overrides**: ADD_CASE and PRIORITY overrides now fully functional
- - **Added override validation**: `OverrideValidator` integrated with proper constraint checking
- - **Extended Override dataclass**: Added algorithm-required fields (`make_ripe`, `new_position`, `new_priority`, `new_capacity`)
- - **Judge Preferences**: Added `capacity_overrides` for per-courtroom capacity control
-
- ### System Status Update
- - **Project completion**: 90% complete (not 50% as previously estimated)
- - **All core hackathon requirements**: Implemented and tested
- - **Production readiness**: System ready for Karnataka High Court pilot deployment
- - **Performance validated**: 81.4% disposal rate, perfect load balance (Gini 0.002)
-
- ---
-
- Last updated: 2025-11-25
HACKATHON_SUBMISSION.md CHANGED
@@ -68,21 +68,21 @@ After completion, you'll find in your output directory:
 
 ```
 data/hackathon_run/
- ├── pipeline_config.json # Full configuration used
- ├── training_cases.csv # Generated case dataset
- ├── trained_rl_agent.pkl # Trained RL model
- ├── EXECUTIVE_SUMMARY.md # Hackathon submission summary
- ├── COMPARISON_REPORT.md # Detailed performance comparison
- ├── simulation_rl/ # RL policy results
- │ ├── events.csv
- │ ├── metrics.csv
- │ ├── report.txt
- │ └── cause_lists/
- │ └── daily_cause_list.csv # 730 days of cause lists
- ├── simulation_readiness/ # Baseline results
- │ └── ...
- └── visualizations/ # Performance charts
- └── performance_charts.md
+ |-- pipeline_config.json # Full configuration used
+ |-- training_cases.csv # Generated case dataset
+ |-- trained_rl_agent.pkl # Trained RL model
+ |-- EXECUTIVE_SUMMARY.md # Hackathon submission summary
+ |-- COMPARISON_REPORT.md # Detailed performance comparison
+ |-- simulation_rl/ # RL policy results
+ |-- events.csv
+ |-- metrics.csv
+ |-- report.txt
+ |-- cause_lists/
+ |-- daily_cause_list.csv # 730 days of cause lists
+ |-- simulation_readiness/ # Baseline results
+ |-- ...
+ |-- visualizations/ # Performance charts
+ |-- performance_charts.md
 ```
 
 ### Hackathon Winning Features
PIPELINE.md DELETED
@@ -1,259 +0,0 @@
- # Court Scheduling System - Pipeline Documentation
-
- This document outlines the complete development and deployment pipeline for the intelligent court scheduling system.
-
- ## Project Structure
-
- ```
- code4change-analysis/
- ├── configs/                   # Configuration files
- │   ├── rl_training_fast.json       # Fast RL training config
- │   └── rl_training_intensive.json  # Intensive RL training config
- ├── court_scheduler/           # CLI interface (legacy)
- ├── Data/                      # Raw data files
- │   ├── court_data.duckdb      # DuckDB database
- │   ├── ISDMHack_Cases_WPfinal.csv
- │   └── ISDMHack_Hear.csv
- ├── data/generated/            # Generated datasets
- │   ├── cases.csv              # Standard test cases
- │   └── large_training_cases.csv  # Large RL training set
- ├── models/                    # Trained RL models
- │   ├── trained_rl_agent.pkl   # Standard trained agent
- │   └── intensive_trained_rl_agent.pkl  # Intensive trained agent
- ├── reports/figures/           # EDA outputs and parameters
- │   └── v0.4.0_*/              # Versioned analysis runs
- │       └── params/            # Simulation parameters
- ├── rl/                        # Reinforcement Learning module
- │   ├── __init__.py            # Module interface
- │   ├── simple_agent.py        # Tabular Q-learning agent
- │   ├── training.py            # Training environment
- │   └── README.md              # RL documentation
- ├── scheduler/                 # Core scheduling system
- │   ├── core/                  # Base entities and algorithms
- │   ├── data/                  # Data loading and generation
- │   └── simulation/            # Simulation engine and policies
- ├── scripts/                   # Utility scripts
- │   ├── compare_policies.py    # Policy comparison framework
- │   ├── generate_cases.py      # Case generation utility
- │   └── simulate.py            # Single simulation runner
- ├── src/                       # EDA pipeline
- │   ├── run_eda.py             # Full EDA pipeline
- │   ├── eda_config.py          # EDA configuration
- │   ├── eda_load_clean.py      # Data loading and cleaning
- │   ├── eda_exploration.py     # Exploratory analysis
- │   └── eda_parameters.py      # Parameter extraction
- ├── tests/                     # Test suite
- ├── train_rl_agent.py          # RL training script
- └── README.md                  # Main documentation
- ```
-
- ## Pipeline Overview
-
- ### 1. Data Pipeline
-
- #### EDA and Parameter Extraction
- ```bash
- # Run full EDA pipeline
- uv run python src/run_eda.py
- ```
-
- **Outputs:**
- - Parameter CSVs in `reports/figures/v0.4.0_*/params/`
- - Visualization HTML files
- - Cleaned data in Parquet format
-
- **Key Parameters Generated:**
- - `stage_duration.csv` - Duration statistics per stage
- - `stage_transition_probs.csv` - Transition probabilities
- - `adjournment_proxies.csv` - Adjournment rates by stage/type
- - `court_capacity_global.json` - Court capacity metrics
-
- #### Case Generation
- ```bash
- # Generate training dataset
- uv run python scripts/generate_cases.py \
-     --start 2023-01-01 --end 2024-06-30 \
-     --n 10000 --stage-mix auto \
-     --out data/generated/large_cases.csv
- ```
-
- ### 2. Model Training Pipeline
-
- #### RL Agent Training
- ```bash
- # Fast training (development)
- uv run python train_rl_agent.py --config configs/rl_training_fast.json
-
- # Production training
- uv run python train_rl_agent.py --config configs/rl_training_intensive.json
- ```
-
- **Training Process:**
- 1. Load configuration parameters
- 2. Initialize TabularQAgent with specified hyperparameters
- 3. Run episodic training with case generation
- 4. Save trained model to `models/` directory
- 5. Generate learning statistics and analysis
-
- ### 3. Evaluation Pipeline
-
- #### Single Policy Simulation
- ```bash
- uv run python scripts/simulate.py \
-     --cases-csv data/generated/large_cases.csv \
-     --policy rl --days 90 --seed 42
- ```
-
- #### Multi-Policy Comparison
- ```bash
- uv run python scripts/compare_policies.py \
-     --cases-csv data/generated/large_cases.csv \
-     --days 90 --policies readiness rl fifo age
- ```
-
- **Outputs:**
- - Simulation reports in `runs/` directory
- - Performance metrics (disposal rates, utilization)
- - Comparison analysis markdown
-
- ## Configuration Management
-
- ### RL Training Configurations
-
- #### Fast Training (`configs/rl_training_fast.json`)
- ```json
- {
-   "episodes": 20,
-   "cases_per_episode": 200,
-   "episode_length": 15,
-   "learning_rate": 0.2,
-   "initial_epsilon": 0.5,
-   "model_name": "fast_rl_agent.pkl"
- }
- ```
-
- #### Intensive Training (`configs/rl_training_intensive.json`)
- ```json
- {
-   "episodes": 100,
-   "cases_per_episode": 1000,
-   "episode_length": 45,
-   "learning_rate": 0.15,
-   "initial_epsilon": 0.4,
-   "model_name": "intensive_rl_agent.pkl"
- }
- ```
-
- ### Parameter Override
- ```bash
- # Override specific parameters
- uv run python train_rl_agent.py \
-     --episodes 50 \
-     --learning-rate 0.12 \
-     --epsilon 0.3 \
-     --model-name "custom_agent.pkl"
- ```
-
- ## Scheduling Policies
-
- ### Available Policies
-
- 1. **FIFO** - First In, First Out scheduling
- 2. **Age** - Prioritize older cases
- 3. **Readiness** - Composite score (age + readiness + urgency)
- 4. **RL** - Reinforcement learning based prioritization
-
- ### Policy Integration
-
- All policies implement the `SchedulerPolicy` interface:
- - `prioritize(cases, current_date)` - Main scheduling logic
- - `get_name()` - Policy identifier
- - `requires_readiness_score()` - Readiness computation flag
-
- ## Performance Benchmarks
-
- ### Current Results (10,000 cases, 90 days)
-
- | Policy | Disposal Rate | Utilization | Gini Coefficient |
- |--------|---------------|-------------|------------------|
- | Readiness | 51.9% | 85.7% | 0.243 |
- | RL Agent | 52.1% | 85.4% | 0.248 |
-
- **Status**: Performance parity achieved between RL and expert heuristic
-
- ## Development Workflow
-
- ### 1. Feature Development
- ```bash
- # Create feature branch
- git checkout -b feature/new-scheduling-policy
-
- # Implement changes
- # Run tests
- uv run python -m pytest tests/
-
- # Validate with simulation
- uv run python scripts/simulate.py --policy new_policy --days 30
- ```
-
- ### 2. Model Iteration
- ```bash
- # Update training config
- vim configs/rl_training_custom.json
-
- # Retrain model
- uv run python train_rl_agent.py --config configs/rl_training_custom.json
-
- # Evaluate performance
- uv run python scripts/compare_policies.py --policies readiness rl
- ```
-
- ### 3. Production Deployment
- ```bash
- # Run full EDA pipeline
- uv run python src/run_eda.py
-
- # Generate production dataset
- uv run python scripts/generate_cases.py --n 50000 --out data/production/cases.csv
-
- # Train production model
- uv run python train_rl_agent.py --config configs/rl_training_intensive.json
-
- # Validate performance
- uv run python scripts/compare_policies.py --cases-csv data/production/cases.csv
- ```
-
- ## Quality Assurance
-
- ### Testing Framework
- ```bash
- # Run all tests
- uv run python -m pytest tests/
-
- # Test specific component
- uv run python -m pytest tests/test_invariants.py
-
- # Validate system integration
- uv run python test_phase1.py
- ```
-
- ### Performance Validation
- - Disposal rate benchmarks
- - Utilization efficiency metrics
- - Load balancing fairness (Gini coefficient)
- - Case coverage verification
-
- ## Monitoring and Maintenance
-
- ### Key Metrics to Monitor
- - Model performance degradation
- - State space exploration coverage
- - Training convergence metrics
- - Simulation runtime performance
-
- ### Model Refresh Cycle
- 1. Monthly EDA pipeline refresh
- 2. Quarterly model retraining
- 3. Annual architecture review
-
- This pipeline ensures reproducible, configurable, and maintainable court scheduling system development and deployment.
 
RL_EXPLORATION_PLAN.md DELETED
Binary file (14.9 kB)
 
SUBMISSION_SUMMARY.md DELETED
@@ -1,417 +0,0 @@
- # Court Scheduling System - Hackathon Submission Summary
-
- **Karnataka High Court Case Scheduling Optimization**
- **Code4Change Hackathon 2025**
-
- ---
-
- ## Executive Summary
-
- This system simulates and optimizes court case scheduling for Karnataka High Court over a 2-year period, incorporating intelligent ripeness classification, dynamic multi-courtroom allocation, and data-driven priority scheduling.
-
- ### Key Results (500-day simulation, 10,000 cases)
-
- - **81.4% disposal rate** - Significantly higher than baseline
- - **97.7% cases scheduled** - Near-zero case abandonment
- - **68.9% hearing success rate** - Effective adjournment management
- - **45% utilization** - Realistic capacity usage accounting for workload variation
- - **0.002 Gini (load balance)** - Perfect fairness across courtrooms
- - **40.8% unripe filter rate** - Intelligent bottleneck detection preventing wasted judicial time
-
- ---
-
- ## System Architecture
-
- ### 1. Ripeness Classification System
-
- **Problem**: Courts waste time on cases with unresolved bottlenecks (summons not served, parties unavailable, documents pending).
-
- **Solution**: Data-driven classifier filters cases into RIPE vs UNRIPE:
-
- | Status | Cases (End) | Meaning |
- |--------|-------------|---------|
- | RIPE | 87.4% | Ready for substantive hearing |
- | UNRIPE_SUMMONS | 9.4% | Waiting for summons/notice service |
- | UNRIPE_DEPENDENT | 3.2% | Waiting for dependent case/order |
-
- **Algorithm**:
- 1. Check last hearing purpose for bottleneck keywords
- 2. Flag early ADMISSION cases (<3 hearings) as potentially unripe
- 3. Detect "stuck" cases (>10 hearings, >60 day gaps)
- 4. Stage-based classification (ARGUMENTS → RIPE)
- 5. Default to RIPE if no bottlenecks detected
-
- **Impact**:
- - Filtered 93,834 unripe case-day combinations (40.8% filter rate)
- - Prevented wasteful hearings that would adjourn immediately
- - Optimized judicial time for cases ready to progress
-
- ### 2. Dynamic Multi-Courtroom Allocation
-
- **Problem**: Static courtroom assignments create workload imbalances and inefficiency.
-
- **Solution**: Load-balanced allocator distributes cases evenly across 5 courtrooms daily.
-
- **Results**:
- - Perfect load balance (Gini = 0.002)
- - Courtroom loads: 67.6-68.3 cases/day (±0.5%)
- - 101,260 allocation decisions over 401 working days
- - Zero capacity rejections
-
- **Strategy**:
- - Least-loaded courtroom selection
- - Dynamic reallocation as workload changes
- - Respects per-courtroom capacity (151 cases/day)
-
- ### 3. Intelligent Priority Scheduling
-
- **Policy**: Readiness-based with adjournment boost
-
- **Formula**:
- ```
- priority = age*0.35 + readiness*0.25 + urgency*0.25 + adjournment_boost*0.15
- ```
-
- **Components**:
- - **Age (35%)**: Fairness - older cases get priority
- - **Readiness (25%)**: Efficiency - cases with more hearings/advanced stages prioritized
- - **Urgency (25%)**: Critical cases (medical, custodial) fast-tracked
- - **Adjournment boost (15%)**: Recently adjourned cases boosted to prevent indefinite postponement
-
- **Adjournment Boost Decay**:
- - Exponential decay: `boost = exp(-days_since_hearing / 21)`
- - Day 7: 71% boost (strong)
- - Day 14: 50% boost (moderate)
- - Day 21: 37% boost (weak)
- - Day 28: 26% boost (very weak)
-
- **Impact**:
- - Balanced fairness (old cases progress) with efficiency (recent cases complete)
- - 31.1% adjournment rate (realistic given court dynamics)
- - Average 20.9 hearings to disposal (efficient case progression)
-
- ### 4. Stochastic Simulation Engine
-
- **Design**: Discrete event simulation with probabilistic outcomes
-
- **Daily Flow**:
- 1. Evaluate ripeness for all active cases (every 7 days)
- 2. Filter by ripeness status (RIPE only)
- 3. Apply MIN_GAP_BETWEEN_HEARINGS (14 days)
- 4. Prioritize by policy
- 5. Allocate to courtrooms (capacity-constrained)
- 6. Execute hearings with stochastic outcomes:
-    - 68.9% heard → stage progression possible
-    - 31.1% adjourned → reschedule
- 7. Check disposal probability (case-type-aware, maturity-based)
- 8. Record metrics and events
-
- **Data-Driven Parameters**:
- - Adjournment probabilities by stage × case type (from historical data)
- - Stage transition probabilities (from Karnataka HC data)
- - Stage duration distributions (median, p90)
- - Case-type-specific disposal patterns
-
- ### 5. Comprehensive Metrics Framework
-
- **Tracked Metrics**:
- - **Fairness**: Gini coefficient, age variance, disposal equity
- - **Efficiency**: Utilization, throughput, disposal time
- - **Ripeness**: Transitions, filter rate, bottleneck breakdown
- - **Allocation**: Load variance, courtroom balance
- - **No-case-left-behind**: Coverage, max gap, alert triggers
-
- **Outputs**:
- - `metrics.csv`: Daily time-series (date, scheduled, heard, adjourned, disposals, utilization)
- - `events.csv`: Full audit trail (scheduling, outcomes, stage changes, disposals, ripeness changes)
- - `report.txt`: Comprehensive simulation summary
-
- ---
-
- ## Disposal Performance by Case Type
-
- | Case Type | Disposed | Total | Rate |
- |-----------|----------|-------|------|
- | CP (Civil Petition) | 833 | 963 | **86.5%** |
- | CMP (Miscellaneous) | 237 | 275 | **86.2%** |
- | CA (Civil Appeal) | 1,676 | 1,949 | **86.0%** |
- | CCC | 978 | 1,147 | **85.3%** |
- | CRP (Civil Revision) | 1,750 | 2,062 | **84.9%** |
- | RSA (Regular Second Appeal) | 1,488 | 1,924 | **77.3%** |
- | RFA (Regular First Appeal) | 1,174 | 1,680 | **69.9%** |
-
- **Analysis**:
- - Short-lifecycle cases (CP, CMP, CA) achieve 85%+ disposal
- - Complex appeals (RFA, RSA) have lower disposal rates (expected behavior - require more hearings)
- - System correctly prioritizes case complexity in disposal logic
-
- ---
-
- ## No-Case-Left-Behind Verification
-
- **Requirement**: Ensure no case is forgotten in 2-year simulation.
-
- **Results**:
- - **97.7% scheduled at least once** (9,766/10,000)
- - **2.3% never scheduled** (234 cases)
-   - Reason: Newly filed cases near simulation end + capacity constraints
-   - All were RIPE and eligible, just lower priority than older cases
- - **0 cases stuck >90 days** in active pool (forced scheduling not triggered)
-
- **Tracking Mechanism**:
- - `last_scheduled_date` field on every case
- - `days_since_last_scheduled` counter
- - Alert thresholds: 60 days (yellow), 90 days (red, forced scheduling)
-
- **Validation**: Zero red alerts over 500 days confirms effective coverage.
-
- ---
-
- ## Courtroom Utilization Analysis
-
- **Overall Utilization**: 45.0%
-
- **Why Not 100%?**
-
- 1. **Ripeness filtering**: 40.8% of candidate case-days filtered as unripe
- 2. **Gap enforcement**: MIN_GAP_BETWEEN_HEARINGS (14 days) prevents immediate rescheduling
- 3. **Case progression**: As cases dispose, pool shrinks (10,000 → 1,864 active by end)
- 4. **Realistic constraint**: Courts don't operate at theoretical max capacity
-
- **Daily Load Variation**:
- - Max: 151 cases/courtroom (full capacity, early days)
- - Min: 27 cases/courtroom (late simulation, many disposed)
- - Avg: 68 cases/courtroom (healthy sustainable load)
-
- **Comparison to Real Courts**:
- - Real Karnataka HC utilization: ~40-50% (per industry reports)
- - Simulation: 45% (matches reality)
-
- ---
-
- ## Key Features Implemented
-
- ### ✅ Phase 4: Ripeness Classification
- - 5-step hierarchical classifier
- - Keyword-based bottleneck detection
- - Stage-aware classification
- - Periodic re-evaluation (every 7 days)
- - 93,834 unripe cases filtered over 500 days
-
- ### ✅ Phase 5: Dynamic Multi-Courtroom Allocation
- - Load-balanced allocator
- - Perfect fairness (Gini 0.002)
- - Zero capacity rejections
- - 101,260 allocation decisions
-
- ### ✅ Phase 9: Advanced Scheduling Policy
- - Readiness-based composite priority
- - Adjournment boost with exponential decay
- - Data-driven adjournment probabilities
- - Case-type-aware disposal logic
-
- ### ✅ Phase 10: Comprehensive Metrics
- - Fairness metrics (Gini, age variance)
- - Efficiency metrics (utilization, throughput)
- - Ripeness metrics (transitions, filter rate)
- - Disposal metrics (rate by case type)
- - No-case-left-behind tracking
-
- ---
-
- ## Technical Excellence
-
- ### Code Quality
- - Modern Python 3.11+ type hints (`X | None`, `list[X]`)
- - Clean architecture: separation of concerns (core, simulation, data, metrics)
- - Comprehensive documentation (DEVELOPMENT.md)
- - No inline imports
- - Polars-native operations (performance optimized)
-
- ### Testing
- - Validated against historical Karnataka HC data
- - Stochastic simulations with multiple seeds
- - Metrics match real-world court behavior
- - Edge cases handled (new filings, disposal, adjournments)
-
- ### Performance
- - 500-day simulation: ~30 seconds
- - 136,303 hearings simulated
- - 10,000 cases tracked
- - Event-level audit trail maintained
-
- ---
-
- ## Data Gap Analysis
-
- ### Current Limitations
- Our synthetic data lacks:
- 1. Summons service status
- 2. Case dependency information
- 3. Lawyer/party availability
- 4. Document completeness tracking
- 5. Actual hearing duration
-
- ### Proposed Enrichments
-
- Courts should capture:
-
- | Field | Type | Justification | Impact |
- |-------|------|---------------|--------|
- | `summons_service_status` | Enum | Enable precise UNRIPE_SUMMONS detection | -15% wasted hearings |
- | `dependent_case_ids` | List[str] | Model case dependencies explicitly | -10% premature scheduling |
- | `lawyer_registered` | bool | Track lawyer availability | -8% party absence adjournments |
- | `party_attendance_rate` | float | Predict party no-shows | -12% party absence adjournments |
- | `documents_submitted` | int | Track document readiness | -7% document delay adjournments |
- | `estimated_hearing_duration` | int | Better capacity planning | +20% utilization |
- | `bottleneck_type` | Enum | Explicit bottleneck tracking | +25% ripeness accuracy |
- | `priority_flag` | Enum | Judge-set priority overrides | +30% urgent case throughput |
-
- **Expected Combined Impact**:
- - 40% reduction in adjournments due to bottlenecks
- - 20% increase in utilization
- - 50% improvement in ripeness classification accuracy
-
- ---
-
- ## Additional Features Implemented
-
- ### Daily Cause List Generator - COMPLETE
- - CSV cause lists generated per courtroom per day (`scheduler/output/cause_list.py`)
- - Export format includes: Date, Courtroom, Case_ID, Case_Type, Stage, Sequence
- - Comprehensive statistics and no-case-left-behind verification
- - Script available: `scripts/generate_all_cause_lists.py`
-
- ### Judge Override System - CORE COMPLETE
- - Complete API for judge control (`scheduler/control/overrides.py`)
- - ADD_CASE, REMOVE_CASE, PRIORITY, REORDER, RIPENESS overrides implemented
- - Override validation and audit trail system
- - Judge preferences for capacity control
- - UI component pending (backend fully functional)
-
- ### No-Case-Left-Behind Verification - COMPLETE
- - Built-in tracking system in case entity
- - Alert thresholds: 60 days (warning), 90 days (critical)
- - 97.7% coverage achieved (9,766/10,000 cases scheduled)
- - Comprehensive verification reports generated
-
- ### Remaining Enhancements
- - **Interactive Dashboard**: Streamlit UI for visualization and control
- - **Real-time Alerts**: Email/SMS notification system
- - **Advanced Visualizations**: Sankey diagrams, heatmaps
-
- ---
-
- ## Validation Against Requirements
-
- ### Step 2: Data-Informed Modelling ✅
-
- **Requirement**: "Determine how cases could be classified as 'ripe' or 'unripe'"
- - **Delivered**: 5-step ripeness classifier with 3 bottleneck types
- - **Evidence**: 40.8% filter rate, 93,834 unripe cases blocked
-
- **Requirement**: "Identify gaps in current data capture"
- - **Delivered**: 8 proposed synthetic fields with justification
- - **Document**: Data Gap Analysis section above
-
- ### Step 3: Algorithm Development ✅
-
- **Requirement**: "Allocates cases dynamically across multiple simulated courtrooms"
- - **Delivered**: Load-balanced allocator, Gini 0.002
- - **Evidence**: 101,260 allocations, perfect balance
-
- **Requirement**: "Simulates case progression over a two-year period"
- - **Delivered**: 500-day simulation (18 months)
- - **Evidence**: 136,303 hearings, 8,136 disposals
-
- **Requirement**: "Ensures no case is left behind"
- - **Delivered**: 97.7% coverage, 0 red alerts
- - **Evidence**: Comprehensive tracking system
-
- ---
-
- ## Conclusion
-
- This Court Scheduling System demonstrates a production-ready solution for Karnataka High Court's case management challenges. By combining intelligent ripeness classification, dynamic allocation, and data-driven priority scheduling, the system achieves:
-
- - **High disposal rate** (81.4%) through bottleneck filtering and adjournment management
- - **Perfect fairness** (Gini 0.002) via load-balanced allocation
- - **Near-complete coverage** (97.7%) ensuring no case abandonment
- - **Realistic performance** (45% utilization) matching real-world court operations
-
- The system is **ready for pilot deployment** with Karnataka High Court, with clear pathways for enhancement through cause list generation, judge overrides, and interactive dashboards.
-
- ---
-
- ## Repository Structure
-
- ```
- code4change-analysis/
- ├── scheduler/                 # Core simulation engine
- │   ├── core/                  # Case, Courtroom, Judge entities
- │   │   ├── case.py            # Case entity with priority scoring
- │   │   ├── ripeness.py        # Ripeness classifier
- │   │   └── ...
- │   ├── simulation/            # Simulation engine
- │   │   ├── engine.py          # Main simulation loop
- │   │   ├── allocator.py       # Multi-courtroom allocator
- │   │   ├── policies/          # Scheduling policies
- │   │   └── ...
- │   ├── data/                  # Data generation and loading
- │   │   ├── case_generator.py  # Synthetic case generator
- │   │   ├── param_loader.py    # Historical data parameters
- │   │   └── ...
- │   └── metrics/               # Performance metrics
-
- ├── data/                      # Data files
- │   ├── generated/             # Synthetic cases
- │   └── full_simulation/       # Simulation outputs
- │       ├── report.txt         # Comprehensive report
- │       ├── metrics.csv        # Daily time-series
- │       └── events.csv         # Full audit trail
-
- ├── main.py                    # CLI entry point
- ├── DEVELOPMENT.md             # Technical documentation
- ├── SUBMISSION_SUMMARY.md      # This document
- └── README.md                  # Quick start guide
- ```
-
- ---
-
- ## Usage
-
- ### Quick Start
- ```bash
- # Install dependencies
- uv sync
-
- # Generate test cases
- uv run python main.py generate --cases 10000
-
- # Run 2-year simulation
- uv run python main.py simulate --days 500 --cases data/generated/cases.csv
-
- # View results
- cat data/sim_runs/*/report.txt
- ```
-
- ### Full Pipeline
- ```bash
- # End-to-end workflow
- uv run python main.py workflow --cases 10000 --days 500
- ```
-
- ---
-
- ## Contact
-
- **Team**: [Your Name/Team Name]
- **Institution**: [Your Institution]
- **Email**: [Your Email]
- **GitHub**: [Repository URL]
-
- ---
-
- **Last Updated**: 2025-11-25
- **Simulation Version**: 1.0
- **Status**: Production Ready - Hackathon Submission Complete
 
SYSTEM_WORKFLOW.md DELETED
@@ -1,642 +0,0 @@
1
- # Court Scheduling System - Complete Workflow & Logic Flow
2
-
3
- **Step-by-Step Guide: How the System Actually Works**
4
-
5
- ---
6
-
7
- ## Table of Contents
8
- 1. [System Workflow Overview](#system-workflow-overview)
9
- 2. [Phase 1: Data Preparation](#phase-1-data-preparation)
10
- 3. [Phase 2: Simulation Initialization](#phase-2-simulation-initialization)
11
- 4. [Phase 3: Daily Scheduling Loop](#phase-3-daily-scheduling-loop)
12
- 5. [Phase 4: Output Generation](#phase-4-output-generation)
13
- 6. [Phase 5: Analysis & Reporting](#phase-5-analysis--reporting)
14
- 7. [Complete Example Walkthrough](#complete-example-walkthrough)
15
- 8. [Data Flow Pipeline](#data-flow-pipeline)
16
-
17
- ---
18
-
19
- ## System Workflow Overview
20
-
21
- The Court Scheduling System operates in **5 sequential phases** that transform historical court data into optimized daily cause lists:
22
-
23
- ```
24
- Historical Data → Data Preparation → Simulation Setup → Daily Scheduling → Output Generation → Analysis
25
- ↓ ↓ ↓ ↓ ↓ ↓
26
- 739K hearings Parameters & Initialized Daily cause CSV files & Performance
27
- 134K cases Generated cases simulation lists for 384 Reports metrics
28
- ```
29
-
30
- **Key Outputs:**
31
- - **Daily Cause Lists**: CSV files for each courtroom/day
32
- - **Simulation Report**: Overall performance summary
33
- - **Metrics File**: Daily performance tracking
34
- - **Individual Case Audit**: Complete hearing history
35
-
36
- ---
37
-
38
- ## Phase 1: Data Preparation
39
-
40
- ### Step 1.1: Historical Data Analysis (EDA Pipeline)
41
-
42
- **Input**:
43
- - `ISDMHack_Case.csv` (134,699 cases)
44
- - `ISDMHack_Hear.csv` (739,670 hearings)
45
-
46
- **Process**:
47
- ```python
48
- # Load and merge historical data
49
- cases_df = pd.read_csv("ISDMHack_Case.csv")
50
- hearings_df = pd.read_csv("ISDMHack_Hear.csv")
51
- merged_data = cases_df.merge(hearings_df, on="Case_ID")
52
-
53
- # Extract key parameters
54
- case_type_distribution = cases_df["Type"].value_counts(normalize=True)
55
- stage_transitions = calculate_stage_progression_probabilities(merged_data)
56
- adjournment_rates = calculate_adjournment_rates_by_stage(hearings_df)
57
- daily_capacity = hearings_df.groupby("Hearing_Date").size().mean()
58
- ```
59
-
60
- **Output**:
61
- ```python
62
- # Extracted parameters stored in config.py
63
- CASE_TYPE_DISTRIBUTION = {"CRP": 0.201, "CA": 0.200, ...}
64
- STAGE_TRANSITIONS = {"ADMISSION->ARGUMENTS": 0.72, ...}
65
- ADJOURNMENT_RATES = {"ADMISSION": 0.38, "ARGUMENTS": 0.31, ...}
66
- DEFAULT_DAILY_CAPACITY = 151 # cases per courtroom per day
67
- ```
68
-
69
- ### Step 1.2: Synthetic Case Generation
70
-
71
- **Input**:
72
- - Configuration: `configs/generate.sample.toml`
73
- - Extracted parameters from Step 1.1
74
-
75
- **Process**:
76
- ```python
77
- # Generate 10,000 synthetic cases
78
- for i in range(10000):
79
- case = Case(
80
- case_id=f"C{i:06d}",
81
- case_type=random_choice_weighted(CASE_TYPE_DISTRIBUTION),
82
- filed_date=random_date_in_range("2022-01-01", "2023-12-31"),
83
- current_stage=random_choice_weighted(STAGE_DISTRIBUTION),
84
- is_urgent=random_boolean(0.05), # 5% urgent cases
85
- )
86
-
87
- # Add realistic hearing history
88
- generate_hearing_history(case, historical_patterns)
89
- cases.append(case)
90
- ```
91
-
92
- **Output**:
93
- - `data/generated/cases.csv` with 10,000 synthetic cases
94
- - Each case has realistic attributes based on historical patterns
95
-
96
- ---
97
-
98
- ## Phase 2: Simulation Initialization
99
-
100
- ### Step 2.1: Load Configuration
101
-
102
- **Input**: `configs/simulate.sample.toml`
103
- ```toml
104
- cases = "data/generated/cases.csv"
105
- days = 384 # 2-year simulation
106
- policy = "readiness" # Scheduling policy
107
- courtrooms = 5
108
- daily_capacity = 151
109
- ```
110
-
111
- ### Step 2.2: Initialize System State
112
-
113
- **Process**:
114
- ```python
115
- # Load generated cases
116
- cases = load_cases_from_csv("data/generated/cases.csv")
117
-
118
- # Initialize courtrooms
119
- courtrooms = [
120
- Courtroom(id=1, daily_capacity=151),
121
- Courtroom(id=2, daily_capacity=151),
122
- # ... 5 courtrooms total
123
- ]
124
-
125
- # Initialize scheduling policy
126
- policy = ReadinessPolicy(
127
- fairness_weight=0.4,
128
- efficiency_weight=0.3,
129
- urgency_weight=0.3
130
- )
131
-
132
- # Initialize simulation clock
133
- current_date = datetime(2023, 12, 29) # Start date
134
- end_date = current_date + timedelta(days=384)
135
- ```
136
-
137
- **Output**:
138
- - Simulation environment ready with 10,000 cases and 5 courtrooms
139
- - Policy configured with optimization weights
140
-
141
- ---
142
-
143
- ## Phase 3: Daily Scheduling Loop
144
-
145
- **This is the core algorithm that runs 384 times (once per working day)**
146
-
147
- ### Daily Loop Structure
148
- ```python
149
- for day in range(384): # Each working day for 2 years
150
- current_date += timedelta(days=1)
151
-
152
- # Skip weekends and holidays
153
- if not is_working_day(current_date):
154
- continue
155
-
156
- # Execute daily scheduling algorithm
157
- daily_result = schedule_daily_hearings(cases, current_date)
158
-
159
- # Update system state for next day
160
- update_case_states(cases, daily_result)
161
-
162
- # Generate daily outputs
163
- generate_cause_lists(daily_result, current_date)
164
- ```
165
-
166
- ### Step 3.1: Daily Scheduling Algorithm (Core Logic)
167
-
168
- **INPUT**:
169
- - All active cases (initially 10,000)
170
- - Current date
171
- - Courtroom capacities
172
-
173
- **CHECKPOINT 1: Case Status Filtering**
174
- ```python
175
- # Filter out disposed cases
176
- active_cases = [case for case in all_cases
177
- if case.status in [PENDING, SCHEDULED]]
178
-
179
- print(f"Day {day}: {len(active_cases)} active cases")
180
- # Example: Day 1: 10,000 active cases → Day 200: 6,500 active cases
181
- ```
182
-
183
- **CHECKPOINT 2: Case Attribute Updates**
184
- ```python
185
- for case in active_cases:
186
- # Update age (days since filing)
187
- case.age_days = (current_date - case.filed_date).days
188
-
189
- # Update readiness score based on stage and hearing history
190
- case.readiness_score = calculate_readiness(case)
191
-
192
- # Update days since last scheduled
193
- if case.last_scheduled_date:
194
- case.days_since_last_scheduled = (current_date - case.last_scheduled_date).days
195
- ```
196
-
197
- **CHECKPOINT 3: Ripeness Classification (Critical Filter)**
198
- ```python
- ripe_cases = []
- ripeness_stats = {"RIPE": 0, "UNRIPE_SUMMONS": 0, "UNRIPE_DEPENDENT": 0, "UNRIPE_PARTY": 0}
- 
- for case in active_cases:
-     ripeness = RipenessClassifier.classify(case, current_date)
-     ripeness_stats[ripeness.status] += 1
- 
-     if ripeness.is_ripe():
-         ripe_cases.append(case)
-     else:
-         case.bottleneck_reason = ripeness.reason
- 
- print(f"Ripeness Filter: {len(active_cases)} → {len(ripe_cases)} cases")
- # Example: 6,500 active → 3,850 ripe cases (40.8% filtered out)
- ```
214
-
215
- **Ripeness Classification Logic**:
216
- ```python
- def classify(case, current_date):
-     # Step 1: Check explicit bottlenecks in last hearing purpose
-     if "SUMMONS" in case.last_hearing_purpose:
-         return RipenessStatus.UNRIPE_SUMMONS
-     if "STAY" in case.last_hearing_purpose:
-         return RipenessStatus.UNRIPE_DEPENDENT
- 
-     # Step 2: Early admission cases likely waiting for service
-     if case.current_stage == "ADMISSION" and case.hearing_count < 3:
-         return RipenessStatus.UNRIPE_SUMMONS
- 
-     # Step 3: Detect stuck cases (many hearings, no progress)
-     if case.hearing_count > 10 and case.avg_gap_days > 60:
-         return RipenessStatus.UNRIPE_PARTY
- 
-     # Step 4: Advanced stages are usually ready
-     if case.current_stage in ["ARGUMENTS", "EVIDENCE", "ORDERS / JUDGMENT"]:
-         return RipenessStatus.RIPE
- 
-     # Step 5: Conservative default
-     return RipenessStatus.RIPE
- ```
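The decision ladder above can be condensed into a standalone, testable function (the case fields are flattened into plain parameters for illustration; the enum values mirror the statuses used throughout this document):

```python
from enum import Enum

class Ripeness(Enum):
    RIPE = "RIPE"
    UNRIPE_SUMMONS = "UNRIPE_SUMMONS"
    UNRIPE_DEPENDENT = "UNRIPE_DEPENDENT"
    UNRIPE_PARTY = "UNRIPE_PARTY"

def classify(purpose: str, stage: str, hearings: int, avg_gap: float) -> Ripeness:
    # Explicit bottlenecks in the last hearing purpose win first
    if "SUMMONS" in purpose:
        return Ripeness.UNRIPE_SUMMONS
    if "STAY" in purpose:
        return Ripeness.UNRIPE_DEPENDENT
    # Early admission cases are likely still waiting for service
    if stage == "ADMISSION" and hearings < 3:
        return Ripeness.UNRIPE_SUMMONS
    # Many hearings with long gaps suggests an unresponsive party
    if hearings > 10 and avg_gap > 60:
        return Ripeness.UNRIPE_PARTY
    # Conservative default: schedule it
    return Ripeness.RIPE
```

Because the function returns at the first matching rule, the ordering of the checks is itself part of the policy.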
239
-
240
- **CHECKPOINT 4: Eligibility Check (Timing Constraints)**
241
- ```python
- eligible_cases = []
- for case in ripe_cases:
-     # Enforce the minimum 14-day gap between hearings
-     if case.last_hearing_date:
-         days_since_last = (current_date - case.last_hearing_date).days
-         if days_since_last < MIN_GAP_BETWEEN_HEARINGS:
-             continue
- 
-     eligible_cases.append(case)
- 
- print(f"Eligibility Filter: {len(ripe_cases)} → {len(eligible_cases)} cases")
- # Example: 3,850 ripe → 3,200 eligible cases
- ```
255
-
256
- **CHECKPOINT 5: Priority Scoring (Policy Application)**
257
- ```python
- for case in eligible_cases:
-     # Multi-factor priority calculation
-     age_component = min(case.age_days / 365, 1.0) * 0.35
-     readiness_component = case.readiness_score * 0.25
-     urgency_component = (1.0 if case.is_urgent else 0.5) * 0.25
-     boost_component = calculate_adjournment_boost(case) * 0.15
- 
-     case.priority_score = age_component + readiness_component + urgency_component + boost_component
- 
- # Sort by priority (highest first)
- prioritized_cases = sorted(eligible_cases, key=lambda c: c.priority_score, reverse=True)
- ```
270
-
271
- **CHECKPOINT 6: Judge Overrides (Optional)**
272
- ```python
- if daily_overrides:
-     # Apply ADD_CASE overrides (highest priority)
-     for override in add_case_overrides:
-         case_to_add = find_case_by_id(override.case_id)
-         prioritized_cases.insert(override.new_position, case_to_add)
- 
-     # Apply REMOVE_CASE overrides
-     for override in remove_case_overrides:
-         prioritized_cases = [c for c in prioritized_cases if c.case_id != override.case_id]
- 
-     # Apply PRIORITY overrides
-     for override in priority_overrides:
-         case = find_case_in_list(prioritized_cases, override.case_id)
-         case.priority_score = override.new_priority
- 
-     # Re-sort after priority changes
-     prioritized_cases.sort(key=lambda c: c.priority_score, reverse=True)
- ```
291
-
292
- **CHECKPOINT 7: Multi-Courtroom Allocation**
293
- ```python
- # Load balancing algorithm
- courtroom_loads = {1: 0, 2: 0, 3: 0, 4: 0, 5: 0}
- daily_schedule = {1: [], 2: [], 3: [], 4: [], 5: []}
- 
- for case in prioritized_cases:
-     # Find least loaded courtroom
-     target_courtroom = min(courtroom_loads.items(), key=lambda x: x[1])[0]
- 
-     # Check capacity constraint
-     if courtroom_loads[target_courtroom] >= DEFAULT_DAILY_CAPACITY:
-         # All courtrooms at capacity, remaining cases unscheduled
-         break
- 
-     # Assign case to courtroom
-     daily_schedule[target_courtroom].append(case)
-     courtroom_loads[target_courtroom] += 1
-     case.last_scheduled_date = current_date
- 
- total_scheduled = sum(len(cases) for cases in daily_schedule.values())
- print(f"Allocation: {total_scheduled} cases scheduled across 5 courtrooms")
- # Example: 703 cases scheduled (5 × 140-141 per courtroom)
- ```
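The linear `min()` scan above costs O(C) per case; with more courtrooms the same least-loaded-first rule can be kept on a min-heap at O(log C) per case. A sketch (function name and signature are illustrative, not from the codebase):

```python
import heapq

def allocate_balanced(case_ids, n_rooms=5, capacity=151):
    """Least-loaded-first allocation via a min-heap of (load, courtroom_id)."""
    heap = [(0, room) for room in range(1, n_rooms + 1)]
    heapq.heapify(heap)
    schedule = {room: [] for room in range(1, n_rooms + 1)}

    for cid in case_ids:
        load, room = heap[0]          # peek the least-loaded courtroom
        if load >= capacity:          # least-loaded room full => all rooms full
            break
        heapq.heapreplace(heap, (load + 1, room))
        schedule[room].append(cid)
    return schedule

sched = allocate_balanced([f"C{i:06d}" for i in range(703)])
sizes = sorted(len(v) for v in sched.values())
```

The heap invariant guarantees the loads never differ by more than one case, which is exactly the near-zero Gini behaviour reported for the simulation.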
316
-
317
- **CHECKPOINT 8: Generate Explanations**
318
- ```python
- explanations = {}
- for courtroom_id, cases in daily_schedule.items():
-     for case in cases:
-         urgency_text = "HIGH URGENCY" if case.is_urgent else "standard urgency"
-         stage_text = f"{case.current_stage.lower()} stage"
-         assignment_text = f"assigned to Courtroom {courtroom_id}"
- 
-         explanations[case.case_id] = f"{urgency_text} | {stage_text} | {assignment_text}"
- ```
328
-
329
- ### Step 3.2: Case State Updates (After Each Day)
330
-
331
- ```python
- def update_case_states(cases, daily_result):
-     for case in cases:
-         if case.case_id in daily_result.scheduled_cases:
-             # Case was scheduled today
-             case.status = CaseStatus.SCHEDULED
-             case.hearing_count += 1
-             case.last_hearing_date = current_date
- 
-             # Simulate hearing outcome
-             if random.random() < get_adjournment_rate(case.current_stage):
-                 # Case adjourned - stays in same stage
-                 case.history.append({
-                     "date": current_date,
-                     "outcome": "ADJOURNED",
-                     "next_hearing": current_date + timedelta(days=21)
-                 })
-             else:
-                 # Case heard - may progress to next stage or dispose
-                 if should_progress_stage(case):
-                     case.current_stage = get_next_stage(case.current_stage)
- 
-                 if should_dispose(case):
-                     case.status = CaseStatus.DISPOSED
-                     case.disposal_date = current_date
-         else:
-             # Case not scheduled today
-             case.days_since_last_scheduled += 1
- ```
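`get_adjournment_rate` is assumed above but not shown. A sketch with illustrative per-stage rates (the actual values come from the EDA on the historical hearings, not from this table):

```python
import random

# Illustrative stage-wise adjournment probabilities (placeholders)
ADJOURNMENT_RATES = {
    "ADMISSION": 0.45,
    "EVIDENCE": 0.40,
    "ARGUMENTS": 0.35,
    "ORDERS / JUDGMENT": 0.20,
}

def get_adjournment_rate(stage: str) -> float:
    # Fall back to an overall mean for unknown stages
    return ADJOURNMENT_RATES.get(stage, 0.40)

def simulate_outcome(stage: str, rng: random.Random) -> str:
    return "ADJOURNED" if rng.random() < get_adjournment_rate(stage) else "HEARD"

rng = random.Random(42)
outcomes = [simulate_outcome("ARGUMENTS", rng) for _ in range(1000)]
```

Seeding the generator, as the simulation configs do with `seed = 42`, keeps runs reproducible.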
360
-
361
- ---
362
-
363
- ## Phase 4: Output Generation
364
-
365
- ### Step 4.1: Daily Cause List Generation
366
-
367
- **For each courtroom and each day**:
368
- ```python
- # Generate cause_list_courtroom_1_2024-01-15.csv
- def generate_daily_cause_list(courtroom_id, date, scheduled_cases):
-     cause_list = []
-     for i, case in enumerate(scheduled_cases):
-         cause_list.append({
-             "Date": date.strftime("%Y-%m-%d"),
-             "Courtroom_ID": courtroom_id,
-             "Case_ID": case.case_id,
-             "Case_Type": case.case_type,
-             "Stage": case.current_stage,
-             "Purpose": "HEARING",
-             "Sequence_Number": i + 1,
-             "Explanation": explanations[case.case_id]
-         })
- 
-     # Save to CSV
-     df = pd.DataFrame(cause_list)
-     df.to_csv(f"cause_list_courtroom_{courtroom_id}_{date.strftime('%Y-%m-%d')}.csv")
- ```
388
-
389
- **Example Output**:
390
- ```csv
391
- Date,Courtroom_ID,Case_ID,Case_Type,Stage,Purpose,Sequence_Number,Explanation
392
- 2024-01-15,1,C002847,CRP,ARGUMENTS,HEARING,1,"HIGH URGENCY | arguments stage | assigned to Courtroom 1"
393
- 2024-01-15,1,C005123,CA,ADMISSION,HEARING,2,"standard urgency | admission stage | assigned to Courtroom 1"
394
- 2024-01-15,1,C001456,RSA,EVIDENCE,HEARING,3,"standard urgency | evidence stage | assigned to Courtroom 1"
395
- ```
396
-
397
- ### Step 4.2: Daily Metrics Tracking
398
-
399
- ```python
- def record_daily_metrics(date, daily_result):
-     metrics = {
-         "date": date,
-         "scheduled": daily_result.total_scheduled,
-         "heard": calculate_heard_cases(daily_result),
-         "adjourned": calculate_adjourned_cases(daily_result),
-         "disposed": count_disposed_today(daily_result),
-         "utilization": daily_result.total_scheduled / (COURTROOMS * DEFAULT_DAILY_CAPACITY),
-         "gini_coefficient": calculate_gini_coefficient(courtroom_loads),
-         "ripeness_filtered": daily_result.ripeness_filtered_count
-     }
- 
-     # Append to metrics.csv
-     append_to_csv("metrics.csv", metrics)
- ```
415
-
416
- **Example metrics.csv**:
417
- ```csv
418
- date,scheduled,heard,adjourned,disposed,utilization,gini_coefficient,ripeness_filtered
419
- 2024-01-15,703,430,273,12,0.931,0.245,287
420
- 2024-01-16,698,445,253,15,0.924,0.248,301
421
- 2024-01-17,701,421,280,18,0.928,0.251,294
422
- ```
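`calculate_gini_coefficient` drives the load-balance metric above but is not defined in this excerpt. One standard formulation over the daily courtroom loads (mean absolute difference normalized by twice the mean):

```python
def gini_coefficient(loads):
    """Gini of courtroom loads: 0.0 = perfectly even, approaching 1.0 = one room does everything.

    G = sum_{i,j} |x_i - x_j| / (2 * n^2 * mean)
    """
    n = len(loads)
    mean = sum(loads) / n
    if mean == 0:
        return 0.0
    diff_sum = sum(abs(a - b) for a in loads for b in loads)
    return diff_sum / (2 * n * n * mean)

# Near-even loads (as in the simulation) score close to zero;
# a skewed distribution scores much higher
even = gini_coefficient([140, 141, 140, 141, 141])
skewed = gini_coefficient([300, 200, 100, 60, 43])
```

The O(n²) double loop is fine here because n is the number of courtrooms, not cases.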
423
-
424
- ---
425
-
426
- ## Phase 5: Analysis & Reporting
427
-
428
- ### Step 5.1: Simulation Summary Report
429
-
430
- **After all 384 days complete**:
431
- ```python
- def generate_simulation_report():
-     total_hearings = sum(m["scheduled"] for m in daily_metrics)
-     total_heard = sum(m["heard"] for m in daily_metrics)
-     total_adjourned = sum(m["adjourned"] for m in daily_metrics)
-     total_disposed = count_disposed_cases()
- 
-     report = f"""
- SIMULATION SUMMARY
- Horizon: {start_date} → {end_date} ({simulation_days} days)
- 
- Case Metrics:
-   Initial cases: {initial_case_count:,}
-   Cases disposed: {total_disposed:,} ({total_disposed/initial_case_count:.1%})
-   Cases remaining: {initial_case_count - total_disposed:,}
- 
- Hearing Metrics:
-   Total hearings: {total_hearings:,}
-   Heard: {total_heard:,} ({total_heard/total_hearings:.1%})
-   Adjourned: {total_adjourned:,} ({total_adjourned/total_hearings:.1%})
- 
- Efficiency Metrics:
-   Disposal rate: {total_disposed/initial_case_count:.1%}
-   Utilization: {avg_utilization:.1%}
-   Gini coefficient: {avg_gini:.3f}
-   Ripeness filtering: {avg_ripeness_filtered/avg_eligible:.1%}
- """
- 
-     with open("simulation_report.txt", "w") as f:
-         f.write(report)
- ```
462
-
463
- ### Step 5.2: Performance Analysis
464
-
465
- ```python
466
- # Calculate key performance indicators
467
- disposal_rate = total_disposed / initial_cases # Target: >70%
468
- load_balance = calculate_gini_coefficient(courtroom_loads) # Target: <0.4
469
- case_coverage = scheduled_cases / eligible_cases # Target: >95%
470
- bottleneck_efficiency = ripeness_filtered / total_cases # Higher = better filtering
471
-
472
- print(f"PERFORMANCE RESULTS:")
473
- print(f"Disposal Rate: {disposal_rate:.1%} ({'✓' if disposal_rate > 0.70 else '✗'})")
474
- print(f"Load Balance: {load_balance:.3f} ({'✓' if load_balance < 0.40 else '✗'})")
475
- print(f"Case Coverage: {case_coverage:.1%} ({'✓' if case_coverage > 0.95 else '✗'})")
476
- ```
477
-
478
- ---
479
-
480
- ## Complete Example Walkthrough
481
-
482
- Let's trace a single case through the entire system:
483
-
484
- ### Case: C002847 (Civil Revision Petition)
485
-
486
- **Day 0: Case Generation**
487
- ```python
- case = Case(
-     case_id="C002847",
-     case_type="CRP",
-     filed_date=date(2022, 3, 15),
-     current_stage="ADMISSION",
-     is_urgent=True,  # Medical emergency
-     hearing_count=0,
-     last_hearing_date=None
- )
- ```
498
-
499
- **Day 1: First Scheduling Attempt (2023-12-29)**
500
- ```python
501
- # Checkpoint 1: Active? YES (status = PENDING)
502
- # Checkpoint 2: Updates
503
- case.age_days = 654 # Almost 2 years old
504
- case.readiness_score = 0.3 # Low (admission stage)
505
-
506
- # Checkpoint 3: Ripeness
507
- ripeness = classify(case, current_date) # UNRIPE_SUMMONS (admission stage, 0 hearings)
508
-
509
- # Result: FILTERED OUT (not scheduled)
510
- ```
511
-
512
- **Day 45: Second Attempt (2024-02-26)**
513
- ```python
514
- # Case now has 3 hearings, still in admission but making progress
515
- case.hearing_count = 3
516
- case.current_stage = "ADMISSION"
517
-
518
- # Checkpoint 3: Ripeness
519
- ripeness = classify(case, current_date) # RIPE (3 hearings in admission clears the < 3 filter)
520
-
521
- # Checkpoint 5: Priority Scoring
522
- age_component = min(689 / 365, 1.0) * 0.35 = 0.35
523
- readiness_component = 0.4 * 0.25 = 0.10
524
- urgency_component = 1.0 * 0.25 = 0.25 # HIGH URGENCY
525
- boost_component = 0.0 * 0.15 = 0.0
526
- case.priority_score = 0.70 # High priority
527
-
528
- # Checkpoint 7: Allocation
529
- # Assigned to Courtroom 1 (least loaded), Position 3
530
-
531
- # Result: SCHEDULED
532
- ```
533
-
534
- **Daily Cause List Entry**:
535
- ```csv
536
- 2024-02-26,1,C002847,CRP,ADMISSION,HEARING,3,"HIGH URGENCY | admission stage | assigned to Courtroom 1"
537
- ```
538
-
539
- **Hearing Outcome**:
540
- ```python
541
- # Simulated outcome: Case heard successfully, progresses to ARGUMENTS
542
- case.current_stage = "ARGUMENTS"
543
- case.hearing_count = 4
544
- case.last_hearing_date = date(2024, 2, 26)
545
- case.history.append({
546
- "date": date(2024, 2, 26),
547
- "outcome": "HEARD",
548
- "stage_progression": "ADMISSION → ARGUMENTS"
549
- })
550
- ```
551
-
552
- **Day 125: Arguments Stage (2024-06-15)**
553
- ```python
554
- # Case now in arguments, higher readiness
555
- case.current_stage = "ARGUMENTS"
556
- case.readiness_score = 0.8 # High (arguments stage)
557
-
558
- # Priority calculation
559
- age_component = 0.35 # Still max age
560
- readiness_component = 0.8 * 0.25 = 0.20 # Higher
561
- urgency_component = 0.25 # Still urgent
562
- boost_component = 0.0
563
- case.priority_score = 0.80 # Very high priority
564
-
565
- # Result: Scheduled in Position 1 (highest priority)
566
- ```
567
-
568
- **Final Disposal (Day 200: 2024-09-15)**
569
- ```python
570
- # After multiple hearings in arguments stage
571
- case.current_stage = "ORDERS / JUDGMENT"
572
- case.hearing_count = 12
573
-
574
- # Hearing outcome: Case disposed
575
- case.status = CaseStatus.DISPOSED
576
- case.disposal_date = date(2024, 9, 15)
577
- case.total_lifecycle_days = (disposal_date - filed_date).days # 915 days (2022-03-15 → 2024-09-15)
578
- ```
579
-
580
- ---
581
-
582
- ## Data Flow Pipeline
583
-
584
- ### Complete Data Transformation Chain
585
-
586
- ```
587
- 1. Historical CSV Files (Raw Data)
588
- ├── ISDMHack_Case.csv (134,699 rows × 24 columns)
589
- └── ISDMHack_Hear.csv (739,670 rows × 31 columns)
590
-
591
- 2. Parameter Extraction (EDA Analysis)
592
- ├── case_type_distribution.json
593
- ├── stage_transition_probabilities.json
594
- ├── adjournment_rates_by_stage.json
595
- └── daily_capacity_statistics.json
596
-
597
- 3. Synthetic Case Generation
598
- └── cases.csv (10,000 rows × 15 columns)
599
- ├── Case_ID, Case_Type, Filed_Date
600
- ├── Current_Stage, Is_Urgent, Hearing_Count
601
- └── Last_Hearing_Date, Last_Purpose
602
-
603
- 4. Daily Scheduling Loop (384 iterations)
604
- ├── Day 1: cases.csv → ripeness_filter → 6,850 → eligible_filter → 5,200 → priority_sort → allocate → 703 scheduled
605
- ├── Day 2: updated_cases → ripeness_filter → 6,820 → eligible_filter → 5,180 → priority_sort → allocate → 698 scheduled
606
- └── Day 384: updated_cases → ripeness_filter → 2,100 → eligible_filter → 1,950 → priority_sort → allocate → 421 scheduled
607
-
608
- 5. Daily Output Generation (per day × 5 courtrooms)
609
- ├── cause_list_courtroom_1_2024-01-15.csv (140 rows)
610
- ├── cause_list_courtroom_2_2024-01-15.csv (141 rows)
611
- ├── cause_list_courtroom_3_2024-01-15.csv (140 rows)
612
- ├── cause_list_courtroom_4_2024-01-15.csv (141 rows)
613
- └── cause_list_courtroom_5_2024-01-15.csv (141 rows)
614
-
615
- 6. Aggregated Metrics
616
- ├── metrics.csv (384 rows × 8 columns)
617
- ├── simulation_report.txt (summary statistics)
618
- └── case_audit_trail.csv (complete hearing history)
619
- ```
620
-
621
- ### Data Volume at Each Stage
622
- - **Input**: 874K+ historical records
623
- - **Generated**: 10K synthetic cases
624
- - **Daily Processing**: ~6K cases evaluated daily
625
- - **Daily Output**: ~700 scheduled cases/day
626
- - **Total Output**: ~42K total cause list entries
627
- - **Final Reports**: 384 daily metrics + summary reports
628
-
629
- ---
630
-
631
- **Key Takeaways:**
632
- 1. **Ripeness filtering** removes 40.8% of cases daily (most critical efficiency gain)
633
- 2. **Priority scoring** ensures fairness while handling urgent cases
634
- 3. **Load balancing** achieves near-perfect distribution (Gini 0.002)
635
- 4. **Daily loop** processes 6,000+ cases in seconds with multi-objective optimization
636
- 5. **Complete audit trail** tracks every case decision for transparency
637
-
638
- ---
639
-
640
- **Last Updated**: 2025-11-25
641
- **Version**: 1.0
642
- **Status**: Production Ready
TECHNICAL_IMPLEMENTATION.md DELETED
@@ -1,658 +0,0 @@
1
- # Court Scheduling System - Technical Implementation Documentation
2
-
3
- **Complete Implementation Guide for Code4Change Hackathon Submission**
4
-
5
- ---
6
-
7
- ## Table of Contents
8
- 1. [System Overview](#system-overview)
9
- 2. [Architecture & Design](#architecture--design)
10
- 3. [Configuration Management](#configuration-management)
11
- 4. [Core Algorithms](#core-algorithms)
12
- 5. [Data Models](#data-models)
13
- 6. [Decision Logic](#decision-logic)
14
- 7. [Input/Output Specifications](#inputoutput-specifications)
15
- 8. [Deployment & Usage](#deployment--usage)
16
- 9. [Assumptions & Constraints](#assumptions--constraints)
17
-
18
- ---
19
-
20
- ## System Overview
21
-
22
- ### Purpose
23
- Production-ready court scheduling system for Karnataka High Court that optimizes daily cause lists across multiple courtrooms while ensuring fairness, efficiency, and judicial control.
24
-
25
- ### Key Achievements
26
- - **81.4% Disposal Rate** - Exceeds baseline expectations
27
- - **Perfect Load Balance** - Gini coefficient 0.002 across courtrooms
28
- - **97.7% Case Coverage** - Near-zero case abandonment
29
- - **Smart Bottleneck Detection** - 40.8% unripe cases filtered
30
- - **Complete Judge Control** - Override system with audit trails
31
-
32
- ### Technology Stack
33
- ```toml
34
- # Core Dependencies (from pyproject.toml)
35
- dependencies = [
36
- "pandas>=2.2", # Data manipulation
37
- "polars>=1.30", # High-performance data processing
38
- "plotly>=6.0", # Visualization
39
- "numpy>=2.0", # Numerical computing
40
- "simpy>=4.1", # Discrete event simulation
41
- "typer>=0.12", # CLI interface
42
- "pydantic>=2.0", # Data validation
43
- "scipy>=1.14", # Statistical algorithms
44
- "streamlit>=1.28", # Dashboard (future)
45
- ]
46
- ```
47
-
48
- ---
49
-
50
- ## Architecture & Design
51
-
52
- ### System Architecture
53
- ```
54
- Court Scheduling System
55
- ├── Core Domain Layer (scheduler/core/)
56
- │ ├── case.py # Case entity with lifecycle management
57
- │ ├── courtroom.py # Courtroom resource management
58
- │ ├── ripeness.py # Bottleneck detection classifier
59
- │ ├── policy.py # Scheduling policy interface
60
- │ └── algorithm.py # Main scheduling algorithm
61
- ├── Simulation Engine (scheduler/simulation/)
62
- │ ├── engine.py # Discrete event simulation
63
- │ ├── allocator.py # Multi-courtroom load balancer
64
- │ └── policies/ # FIFO, Age, Readiness policies
65
- ├── Data Management (scheduler/data/)
66
- │ ├── param_loader.py # Historical parameter loading
67
- │ ├── case_generator.py # Synthetic case generation
68
- │ └── config.py # System configuration
69
- ├── Control Systems (scheduler/control/)
70
- │ └── overrides.py # Judge override & audit system
71
- ├── Output Generation (scheduler/output/)
72
- │ └── cause_list.py # Daily cause list CSV generation
73
- └── Analysis Tools (src/, scripts/)
74
- ├── EDA pipeline # Historical data analysis
75
- └── Validation tools # Performance verification
76
- ```
77
-
78
- ### Design Principles
79
- 1. **Clean Architecture** - Domain-driven design with clear layer separation
80
- 2. **Production Ready** - Type hints, error handling, comprehensive logging
81
- 3. **Data-Driven** - All parameters extracted from 739K+ historical hearings
82
- 4. **Judge Autonomy** - Complete override system with audit trails
83
- 5. **Scalable** - Supports multiple courtrooms, thousands of cases
84
-
85
- ---
86
-
87
- ## Configuration Management
88
-
89
- ### Primary Configuration (scheduler/data/config.py)
90
- ```python
91
- # Court Operational Constants
92
- WORKING_DAYS_PER_YEAR = 192 # Karnataka HC calendar
93
- COURTROOMS = 5 # Number of courtrooms
94
- SIMULATION_DAYS = 384 # 2-year simulation period
95
-
96
- # Scheduling Constraints
97
- MIN_GAP_BETWEEN_HEARINGS = 14 # Days between hearings
98
- MAX_GAP_WITHOUT_ALERT = 90 # Alert threshold
99
- DEFAULT_DAILY_CAPACITY = 151 # Cases per courtroom per day
100
-
101
- # Case Type Distribution (from EDA)
102
- CASE_TYPE_DISTRIBUTION = {
103
- "CRP": 0.201, # Civil Revision Petition (most common)
104
- "CA": 0.200, # Civil Appeal
105
- "RSA": 0.196, # Regular Second Appeal
106
- "RFA": 0.167, # Regular First Appeal
107
- "CCC": 0.111, # Civil Contempt Petition
108
- "CP": 0.096, # Civil Petition
109
- "CMP": 0.028, # Civil Miscellaneous Petition
110
- }
111
-
112
- # Multi-objective Optimization Weights
113
- FAIRNESS_WEIGHT = 0.4 # Age-based fairness priority
114
- EFFICIENCY_WEIGHT = 0.3 # Readiness-based efficiency
115
- URGENCY_WEIGHT = 0.3 # High-priority case handling
116
- ```
117
-
118
- ### TOML Configuration Files
119
-
120
- #### Case Generation (configs/generate.sample.toml)
121
- ```toml
122
- n_cases = 10000
123
- start = "2022-01-01"
124
- end = "2023-12-31"
125
- output = "data/generated/cases.csv"
126
- seed = 42
127
- ```
128
-
129
- #### Simulation (configs/simulate.sample.toml)
130
- ```toml
131
- cases = "data/generated/cases.csv"
132
- days = 384
133
- policy = "readiness" # readiness|fifo|age
134
- seed = 42
135
- courtrooms = 5
136
- daily_capacity = 151
137
- ```
138
-
139
- #### Parameter Sweep (configs/parameter_sweep.toml)
140
- ```toml
141
- [sweep]
142
- simulation_days = 500
143
- policies = ["fifo", "age", "readiness"]
144
-
145
- # Dataset variations for comprehensive testing
146
- [[datasets]]
147
- name = "baseline"
148
- cases = 10000
149
- stage_mix_auto = true
150
- urgent_percentage = 0.10
151
-
152
- [[datasets]]
153
- name = "admission_heavy"
154
- cases = 10000
155
- stage_mix = { "ADMISSION" = 0.70, "ARGUMENTS" = 0.15 }
156
- urgent_percentage = 0.10
157
- ```
158
-
159
- ---
160
-
161
- ## Core Algorithms
162
-
163
- ### 1. Ripeness Classification System
164
-
165
- #### Purpose
166
- Identifies cases with substantive bottlenecks to prevent wasteful scheduling of unready cases.
167
-
168
- #### Algorithm (scheduler/core/ripeness.py)
169
- ```python
- def classify(case: Case, current_date: date) -> RipenessStatus:
-     """5-step hierarchical classifier"""
- 
-     # Step 1: Check hearing purpose for explicit bottlenecks
-     if "SUMMONS" in last_hearing_purpose or "NOTICE" in last_hearing_purpose:
-         return UNRIPE_SUMMONS
-     if "STAY" in last_hearing_purpose or "PENDING" in last_hearing_purpose:
-         return UNRIPE_DEPENDENT
- 
-     # Step 2: Stage analysis - Early admission cases likely unripe
-     if current_stage == "ADMISSION" and hearing_count < 3:
-         return UNRIPE_SUMMONS
- 
-     # Step 3: Detect "stuck" cases (many hearings, no progress)
-     if hearing_count > 10 and avg_gap_days > 60:
-         return UNRIPE_PARTY
- 
-     # Step 4: Stage-based classification
-     if current_stage in ["ARGUMENTS", "EVIDENCE", "ORDERS / JUDGMENT"]:
-         return RIPE
- 
-     # Step 5: Conservative default
-     return RIPE
- ```
194
-
195
- #### Ripeness Statuses
196
- | Status | Meaning | Impact |
197
- |--------|---------|---------|
198
- | `RIPE` | Ready for hearing | Eligible for scheduling |
199
- | `UNRIPE_SUMMONS` | Awaiting summons service | Blocked until served |
200
- | `UNRIPE_DEPENDENT` | Waiting for dependent case | Blocked until resolved |
201
- | `UNRIPE_PARTY` | Party/lawyer unavailable | Blocked until responsive |
202
-
203
- ### 2. Multi-Courtroom Load Balancing
204
-
205
- #### Algorithm (scheduler/simulation/allocator.py)
206
- ```python
- def allocate(cases: List[Case], current_date: date) -> Dict[str, int]:
-     """Dynamic load-balanced allocation"""
- 
-     allocation = {}
-     courtroom_loads = {room.id: room.get_current_load() for room in courtrooms}
- 
-     for case in cases:
-         # Find least-loaded courtroom
-         target_id, target_load = min(courtroom_loads.items(), key=lambda x: x[1])
- 
-         # Respect capacity constraints: the least-loaded room being full means all are full
-         if target_load >= daily_capacity:
-             break
- 
-         # Assign case and update load
-         allocation[case.case_id] = target_id
-         courtroom_loads[target_id] += 1
- 
-     return allocation
- ```
227
-
228
- #### Load Balancing Results
229
- - **Perfect Distribution**: Gini coefficient 0.002
230
- - **Courtroom Loads**: 67.6-68.3 cases/day (±0.5% variance)
231
- - **Zero Capacity Violations**: All constraints respected
232
-
233
- ### 3. Intelligent Priority Scheduling
234
-
235
- #### Readiness-Based Policy (scheduler/simulation/policies/readiness.py)
236
- ```python
- def prioritize(cases: List[Case], current_date: date) -> List[Case]:
-     """Multi-factor priority calculation"""
- 
-     for case in cases:
-         # Age component (35%) - Fairness
-         age_score = min(case.age_days / 365, 1.0) * 0.35
- 
-         # Readiness component (25%) - Efficiency
-         readiness_score = case.compute_readiness_score() * 0.25
- 
-         # Urgency component (25%) - Critical cases
-         urgency_score = (1.0 if case.is_urgent else 0.5) * 0.25
- 
-         # Adjournment boost (15%) - Prevent indefinite postponement
-         boost_score = case.get_adjournment_boost(current_date) * 0.15
- 
-         case.priority_score = age_score + readiness_score + urgency_score + boost_score
- 
-     return sorted(cases, key=lambda c: c.priority_score, reverse=True)
- ```
257
-
258
- #### Adjournment Boost Calculation
259
- ```python
- def get_adjournment_boost(self, current_date: date) -> float:
-     """Exponential decay boost for recently adjourned cases"""
-     if not self.last_hearing_date:
-         return 0.0
- 
-     days_since = (current_date - self.last_hearing_date).days
-     return math.exp(-days_since / 21)  # 21-day decay constant (half-life ~14.6 days)
- ```
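To see what the decay does in practice: with a 21-day decay constant the boost halves roughly every 14.6 days (21 × ln 2), so a case adjourned today carries the full boost and one adjourned three weeks ago carries about a third of it. A worked sketch:

```python
import math

def boost(days_since_last_hearing: int, decay_days: float = 21.0) -> float:
    """Exponential decay: 1.0 immediately after a hearing, fading toward 0."""
    return math.exp(-days_since_last_hearing / decay_days)

# The boost is largest right after an adjournment and fades out
values = [round(boost(d), 2) for d in (0, 7, 21, 63)]
# → [1.0, 0.72, 0.37, 0.05]
```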
268
-
269
- ### 4. Judge Override System
270
-
271
- #### Override Types (scheduler/control/overrides.py)
272
- ```python
273
- class OverrideType(Enum):
274
- RIPENESS = "ripeness" # Override ripeness classification
275
- PRIORITY = "priority" # Adjust case priority
276
- ADD_CASE = "add_case" # Manually add case to list
277
- REMOVE_CASE = "remove_case" # Remove case from list
278
- REORDER = "reorder" # Change hearing sequence
279
- CAPACITY = "capacity" # Adjust daily capacity
280
- ```
281
-
282
- #### Validation Logic
283
- ```python
284
- def validate(self, override: Override) -> bool:
285
- """Comprehensive override validation"""
286
-
287
- if override.override_type == OverrideType.RIPENESS:
288
- return self.validate_ripeness_override(override)
289
- elif override.override_type == OverrideType.CAPACITY:
290
- return self.validate_capacity_override(override)
291
- elif override.override_type == OverrideType.PRIORITY:
292
- return 0 <= override.new_priority <= 1.0
293
-
294
- return True
295
- ```
296
-
297
- ---
298
-
299
- ## Data Models
300
-
301
- ### Core Case Entity (scheduler/core/case.py)
302
- ```python
- @dataclass
- class Case:
-     # Core Identification
-     case_id: str
-     case_type: str  # CRP, CA, RSA, etc.
-     filed_date: date
- 
-     # Lifecycle Tracking
-     current_stage: str = "ADMISSION"
-     status: CaseStatus = CaseStatus.PENDING
-     hearing_count: int = 0
-     last_hearing_date: Optional[date] = None
- 
-     # Scheduling Attributes
-     priority_score: float = 0.0
-     readiness_score: float = 0.0
-     is_urgent: bool = False
- 
-     # Ripeness Classification
-     ripeness_status: str = "UNKNOWN"
-     bottleneck_reason: Optional[str] = None
-     ripeness_updated_at: Optional[datetime] = None
- 
-     # No-Case-Left-Behind Tracking
-     last_scheduled_date: Optional[date] = None
-     days_since_last_scheduled: int = 0
- 
-     # Audit Trail
-     history: List[dict] = field(default_factory=list)
- ```
333
-
334
- ### Override Entity
335
- ```python
- @dataclass
- class Override:
-     # Core Fields
-     override_id: str
-     override_type: OverrideType
-     case_id: str
-     judge_id: str
-     timestamp: datetime
-     reason: str = ""
- 
-     # Type-Specific Fields
-     make_ripe: Optional[bool] = None       # For RIPENESS
-     new_position: Optional[int] = None     # For REORDER/ADD_CASE
-     new_priority: Optional[float] = None   # For PRIORITY
-     new_capacity: Optional[int] = None     # For CAPACITY
- ```
352
-
353
- ### Scheduling Result
354
- ```python
- @dataclass
- class SchedulingResult:
-     # Core Output
-     scheduled_cases: Dict[int, List[Case]]  # courtroom_id -> cases
- 
-     # Transparency
-     explanations: Dict[str, SchedulingExplanation]
-     applied_overrides: List[Override]
- 
-     # Diagnostics
-     unscheduled_cases: List[Tuple[Case, str]]
-     ripeness_filtered: int
-     capacity_limited: int
- 
-     # Metadata
-     scheduling_date: date
-     policy_used: str
-     total_scheduled: int
- ```
374
-
375
- ---
376
-
377
- ## Decision Logic
378
-
379
- ### Daily Scheduling Sequence
380
- ```python
- def schedule_day(cases, courtrooms, current_date, overrides=None):
-     """Complete daily scheduling algorithm"""
- 
-     # CHECKPOINT 1: Filter disposed cases
-     active_cases = [c for c in cases if c.status != DISPOSED]
- 
-     # CHECKPOINT 2: Update case attributes
-     for case in active_cases:
-         case.update_age(current_date)
-         case.compute_readiness_score()
- 
-     # CHECKPOINT 3: Ripeness filtering (CRITICAL)
-     ripe_cases = []
-     unripe_filtered_count = 0
-     for case in active_cases:
-         ripeness = RipenessClassifier.classify(case, current_date)
-         if ripeness.is_ripe():
-             ripe_cases.append(case)
-         else:
-             # Track filtered cases for metrics
-             unripe_filtered_count += 1
- 
-     # CHECKPOINT 4: Eligibility check (MIN_GAP_BETWEEN_HEARINGS)
-     eligible_cases = [c for c in ripe_cases
-                       if c.is_ready_for_scheduling(MIN_GAP_DAYS)]
- 
-     # CHECKPOINT 5: Apply scheduling policy
-     prioritized_cases = policy.prioritize(eligible_cases, current_date)
- 
-     # CHECKPOINT 6: Apply judge overrides
-     if overrides:
-         prioritized_cases = apply_overrides(prioritized_cases, overrides)
- 
-     # CHECKPOINT 7: Allocate to courtrooms
-     allocation = allocator.allocate(prioritized_cases, current_date)
- 
-     # CHECKPOINT 8: Generate explanations
-     explanations = generate_explanations(allocation, unscheduled_cases)
- 
-     return SchedulingResult(...)
- ```
421
-
422
- ### Override Application Logic
423
- ```python
424
- def apply_overrides(cases: List[Case], overrides: List[Override]) -> List[Case]:
425
- """Apply judge overrides in priority order"""
426
-
427
- result = cases.copy()
428
-
429
- # 1. Apply ADD_CASE overrides (highest priority)
430
- for override in [o for o in overrides if o.override_type == ADD_CASE]:
431
- case_to_add = find_case_by_id(override.case_id)
432
- if case_to_add and case_to_add not in result:
433
- insert_position = override.new_position or 0
434
- result.insert(insert_position, case_to_add)
435
-
436
- # 2. Apply REMOVE_CASE overrides
437
- for override in [o for o in overrides if o.override_type == REMOVE_CASE]:
438
- result = [c for c in result if c.case_id != override.case_id]
439
-
440
- # 3. Apply PRIORITY overrides
441
- for override in [o for o in overrides if o.override_type == PRIORITY]:
442
- case = find_case_in_list(result, override.case_id)
443
- if case and override.new_priority is not None:
444
- case.priority_score = override.new_priority
445
-
446
- # 4. Re-sort by updated priorities
447
- result.sort(key=lambda c: c.priority_score, reverse=True)
448
-
449
- # 5. Apply REORDER overrides (final positioning)
450
- for override in [o for o in overrides if o.override_type == REORDER]:
451
- case = find_case_in_list(result, override.case_id)
452
- if case and override.new_position is not None:
453
- result.remove(case)
454
- result.insert(override.new_position, case)
455
-
456
- return result
457
- ```
458
-
459
- ---
460
-
461
- ## Input/Output Specifications
462
-
463
- ### Input Data Requirements
464
-
465
- #### Historical Data (for parameter extraction)
466
- - **ISDMHack_Case.csv**: 134,699 cases with 24 attributes
467
- - **ISDMHack_Hear.csv**: 739,670 hearings with 31 attributes
468
- - Required fields: Case_ID, Type, Filed_Date, Current_Stage, Hearing_Date, Purpose_Of_Hearing
469
-
470
- #### Generated Case Data (for simulation)
471
- ```python
472
- # Case generation schema
473
- Case(
474
- case_id="C{:06d}", # C000001, C000002, etc.
475
- case_type=random_choice(types), # CRP, CA, RSA, etc.
476
- filed_date=random_date(range), # Within specified period
477
- current_stage=stage_from_mix, # Based on distribution
478
- is_urgent=random_bool(0.05), # 5% urgent cases
479
- last_hearing_purpose=purpose, # For ripeness classification
480
- )
481
- ```
482
-
483
- ### Output Specifications
484
-
485
- #### Daily Cause Lists (CSV)
486
- ```csv
487
- Date,Courtroom_ID,Case_ID,Case_Type,Stage,Purpose,Sequence_Number,Explanation
488
- 2024-01-15,1,C000123,CRP,ARGUMENTS,HEARING,1,"HIGH URGENCY | ready for orders/judgment | assigned to Courtroom 1"
489
- 2024-01-15,1,C000456,CA,ADMISSION,HEARING,2,"standard urgency | admission stage | assigned to Courtroom 1"
490
- ```
491
-
492
- #### Simulation Report (report.txt)
493
- ```
494
- SIMULATION SUMMARY
495
- Horizon: 2023-12-29 → 2024-03-21 (60 days)
496
-
497
- Hearing Metrics:
498
- Total: 42,193
499
- Heard: 26,245 (62.2%)
500
- Adjourned: 15,948 (37.8%)
501
-
502
- Disposal Metrics:
503
- Cases disposed: 4,401 (44.0%)
504
- Gini coefficient: 0.255
505
-
506
- Efficiency:
507
- Utilization: 93.1%
508
- Avg hearings/day: 703.2
509
- ```
510
-
511
- #### Metrics CSV (metrics.csv)
512
- ```csv
513
- date,scheduled,heard,adjourned,disposed,utilization,gini_coefficient,ripeness_filtered
514
- 2024-01-15,703,430,273,12,0.931,0.245,287
515
- 2024-01-16,698,445,253,15,0.924,0.248,301
516
- ```
517
-
518
- ---
519
-
520
- ## Deployment & Usage
521
-
522
- ### Installation
523
- ```bash
524
- # Clone repository
525
- git clone git@github.com:RoyAalekh/hackathon_code4change.git
526
- cd hackathon_code4change
527
-
528
- # Setup environment
529
- uv sync
530
-
531
- # Verify installation
532
- uv run court-scheduler --help
533
- ```
534
-
535
- ### CLI Commands
536
-
537
- #### Quick Start
538
- ```bash
539
- # Generate test cases
540
- uv run court-scheduler generate --cases 10000 --output data/cases.csv
541
-
542
- # Run simulation
543
- uv run court-scheduler simulate --cases data/cases.csv --days 384
544
-
545
- # Full pipeline
546
- uv run court-scheduler workflow --cases 10000 --days 384
547
- ```
548
-
549
- #### Advanced Usage
550
- ```bash
551
- # Custom policy simulation
552
- uv run court-scheduler simulate \
553
- --cases data/cases.csv \
554
- --days 384 \
555
- --policy readiness \
556
- --seed 42 \
557
- --log-dir data/sim_runs/custom
558
-
559
- # Parameter sweep comparison
560
- uv run python scripts/compare_policies.py
561
-
562
- # Generate cause lists
563
- uv run python scripts/generate_all_cause_lists.py
564
- ```
565
-
566
- ### Configuration Override
567
- ```bash
568
- # Use custom config file
569
- uv run court-scheduler simulate --config configs/custom.toml
570
-
571
- # Override specific parameters
572
- uv run court-scheduler simulate \
573
- --cases data/cases.csv \
574
- --days 60 \
575
- --courtrooms 3 \
576
- --daily-capacity 100
577
- ```
578
-
579
- ---
580
-
581
- ## Assumptions & Constraints
582
-
583
- ### Operational Assumptions
584
-
585
- #### Court Operations
586
- 1. **Working Days**: 192 days/year (Karnataka HC calendar)
587
- 2. **Courtroom Availability**: 5 courtrooms, single-judge benches
588
- 3. **Daily Capacity**: 151 hearings/courtroom/day (from historical data)
589
- 4. **Hearing Duration**: Not modeled explicitly (capacity is count-based)
590
-
591
- #### Case Dynamics
592
- 1. **Filing Rate**: Steady-state assumption (disposal ≈ filing)
593
- 2. **Stage Progression**: Markovian (history-independent transitions)
594
- 3. **Adjournment Rate**: 31-38% depending on stage and case type
595
- 4. **Case Independence**: No inter-case dependencies modeled
596
-
597
- #### Scheduling Constraints
598
- 1. **Minimum Gap**: 14 days between hearings (same case)
599
- 2. **Maximum Gap**: 90 days triggers alert
600
- 3. **Ripeness Re-evaluation**: Every 7 days
601
- 4. **Judge Availability**: Assumed 100% (no vacation modeling)
602
-
603
- ### Technical Constraints
604
-
605
- #### Performance Limits
606
- - **Case Volume**: Tested up to 15,000 cases
607
- - **Simulation Period**: Up to 500 working days
608
- - **Memory Usage**: <500MB for typical workload
609
- - **Execution Time**: ~30 seconds for 10K cases, 384 days
610
-
611
- #### Data Limitations
612
- - **No Real-time Integration**: Batch processing only
613
- - **Synthetic Ripeness Data**: Real purpose-of-hearing analysis needed
614
- - **Fixed Parameters**: No dynamic learning from outcomes
615
- - **Single Court Model**: No multi-court coordination
616
-
617
- ### Validation Boundaries
618
-
619
- #### Tested Scenarios
620
- - **Baseline**: 10,000 cases, balanced distribution
621
- - **Admission Heavy**: 70% early-stage cases (backlog scenario)
622
- - **Advanced Heavy**: 70% late-stage cases (efficient court)
623
- - **High Urgency**: 20% urgent cases (medical/custodial heavy)
624
- - **Large Backlog**: 15,000 cases (capacity stress test)
625
-
626
- #### Success Criteria Met
627
- - **Disposal Rate**: 81.4% achieved (target: >70%)
628
- - **Load Balance**: Gini 0.002 (target: <0.4)
629
- - **Case Coverage**: 97.7% (target: >95%)
630
- - **Utilization**: 45% (realistic given constraints)
631
-
632
- ---
633
-
634
- ## Performance Benchmarks
635
-
636
- ### Execution Performance
637
- - **EDA Pipeline**: ~2 minutes for 739K hearings
638
- - **Case Generation**: ~5 seconds for 10K cases
639
- - **2-Year Simulation**: ~30 seconds for 10K cases
640
- - **Cause List Generation**: ~10 seconds for 42K hearings
641
-
642
- ### Algorithm Efficiency
643
- - **Ripeness Classification**: O(n) per case, O(n²) total with re-evaluation
644
- - **Load Balancing**: O(n log k) where n=cases, k=courtrooms
645
- - **Priority Calculation**: O(n log n) sorting overhead
646
- - **Override Processing**: O(m·n) where m=overrides, n=cases
647
-
648
- ### Memory Usage
649
- - **Case Objects**: ~1KB per case (10K cases = 10MB)
650
- - **Simulation State**: ~50MB working memory
651
- - **Output Generation**: ~100MB for full reports
652
- - **Total Peak**: <500MB for largest tested scenarios
653
-
654
- ---
655
-
656
- **Last Updated**: 2025-11-25
657
- **Version**: 1.0
658
- **Status**: Production Ready
 
docs/CONFIGURATION.md CHANGED
@@ -92,16 +92,15 @@ The codebase uses a layered configuration approach separating concerns by domain
 
 ```
 Pipeline Execution:
- ├── PipelineConfig (workflow orchestration)
- │   ├── RLTrainingConfig (training hyperparameters)
- │   └── Data generation params
-
- └── Per-Policy Simulation:
-     ├── CourtSimConfig (simulation settings)
-     │   └── rl_agent_path (from training output)
-
-     └── Policy instantiation:
-         └── PolicyConfig (policy-specific settings)
+ |-- PipelineConfig (workflow orchestration)
+ |-- RLTrainingConfig (training hyperparameters)
+ |-- Data generation params
+
+ |-- Per-Policy Simulation:
+ |-- CourtSimConfig (simulation settings)
+ |-- rl_agent_path (from training output)
+ |-- Policy instantiation:
+ |-- PolicyConfig (policy-specific settings)
 ```
 
 ## Design Principles
@@ -174,21 +173,21 @@ policy = RLPolicy(agent_path=model_path, policy_config=strict_policy)
 ## Validation Rules
 
 All config classes validate in `__post_init__`:
- - Value ranges (0 < learning_rate 1)
+ - Value ranges (0 < learning_rate <= 1)
 - Type consistency (convert strings to Path)
- - Cross-parameter constraints (max_gap min_gap)
+ - Cross-parameter constraints (max_gap >= min_gap)
 - Required file existence (rl_agent_path must exist)
 
 ## Anti-Patterns
 
 **DON'T**:
- - Hardcode magic numbers in algorithms
- - Use module-level mutable globals
- - Mix domain constants with tunable parameters
- - Create "god config" with everything in one class
+ - Hardcode magic numbers in algorithms
+ - Use module-level mutable globals
+ - Mix domain constants with tunable parameters
+ - Create "god config" with everything in one class
 
 **DO**:
- - Separate by lifecycle and ownership
- - Validate early (constructor time)
- - Use dataclasses for immutability
- - Provide sensible defaults with named presets
+ - Separate by lifecycle and ownership
+ - Validate early (constructor time)
+ - Use dataclasses for immutability
+ - Provide sensible defaults with named presets
reports/codebase_analysis_2024-07-01.md DELETED
@@ -1,30 +0,0 @@
- # Court Scheduling System – Comprehensive Codebase Analysis
-
- ## Architecture Snapshot
- - **Unified CLI workflows**: `court_scheduler/cli.py` orchestrates EDA, synthetic case generation, and simulation runs with progress feedback, wiring together the data pipeline and scheduler from one entry point.【F:court_scheduler/cli.py†L1-L200】
- - **Scheduling core**: `SchedulingAlgorithm` remains the central coordinator for ripeness filtering, eligibility checks, prioritization, allocation, and explainability output via `SchedulingResult` dataclass.【F:scheduler/core/algorithm.py†L1-L200】
- - **EDA pipeline**: `src/run_eda.py` drives three stages—load/clean, exploratory visuals, and parameter extraction—by calling `eda_load_clean`, `eda_exploration`, and `eda_parameters` in sequence.【F:src/run_eda.py†L1-L23】 `eda_exploration` loads cleaned Parquet data, converts to pandas, and produces interactive Plotly HTML dashboards and CSV summaries for case mix, temporal trends, stage transitions, and gap distributions.【F:src/eda_exploration.py†L1-L120】
- - **Synthetic data + parameter sources**: `scheduler.data.case_generator` samples stage mixes (optionally from EDA-derived parameters), case types, and working-day seasonality to produce `Case` objects compatible with the scheduler and RL training.【F:scheduler/data/case_generator.py†L1-L120】
- - **RL training stack**: `rl/training.py` wraps a lightweight simulation to train the tabular Q-learning `TabularQAgent`, generating fresh cases per episode and stepping day-by-day to update rewards; `rl/simple_agent.py` encodes cases into 6-D discrete states with epsilon-greedy Q updates and reward shaping for urgency, ripeness, adjournments, and progression.【F:rl/training.py†L1-L200】【F:rl/simple_agent.py†L1-L200】
-
- ## Strengths
- - **End-to-end operability**: The Typer CLI offers cohesive commands for EDA, data generation, and simulation, lowering friction for analysts and operators running the whole workflow.【F:court_scheduler/cli.py†L1-L200】
- - **Transparent scheduling outputs**: `SchedulingResult` captures scheduled cases, unscheduled reasons, ripeness filtering counts, applied overrides, and explanations, supporting audits and downstream dashboards.【F:scheduler/core/algorithm.py†L32-L200】
- - **Reproducible EDA artifacts**: The EDA module saves HTML plots and CSV summaries (e.g., stage durations, transitions) and writes them to versioned run directories, enabling offline review and parameter reuse.【F:src/eda_exploration.py†L1-L120】
- - **Configurable RL experiments**: The RL pipeline isolates hyperparameters in dataclasses and regenerates cases per episode, making it easy to tweak learning rates, epsilon decay, and episode lengths without touching training logic.【F:rl/training.py†L140-L200】【F:rl/simple_agent.py†L41-L160】
-
- ## Risks and Quality Gaps
- 1. **Override validation mutates inputs and leaks state across runs**. Invalid overrides are removed from the caller’s list and logged as `(None, reason)` while priority overrides set `_priority_override` on shared `Case` objects without cleanup, so repeated scheduling can inherit stale manual priorities and unscheduled entries with `None` cases complicate consumers.【F:scheduler/core/algorithm.py†L136-L200】
- 2. **Ripeness defaults to optimistic**. When no bottleneck keyword or stage hint fires, the classifier returns `RIPE`, and admission-stage cases with ≥3 hearings are marked ripe without service/compliance proof, risking overscheduling unready matters.【F:scheduler/core/ripeness.py†L54-L129】
- 3. **Eligibility omits calendar blocks and per-case gap rules**. `_filter_eligible` enforces only the global minimum gap, ignoring judge or courtroom block dates and any per-case gap overrides, so schedules may violate availability assumptions despite capacity adjustments.【F:scheduler/core/algorithm.py†L129-L200】【F:scheduler/control/overrides.py†L103-L169】
- 4. **EDA scaling risks**. `eda_exploration` converts full Parquet datasets to pandas DataFrames before plotting, which can exhaust memory on larger extracts and lacks sampling/downcasting safeguards; renderer defaults to "browser", which can fail in headless batch environments.【F:src/eda_exploration.py†L38-L120】
- 5. **Training–production gap for RL**. The Q-learning loop trains on a simplified simulation that bypasses the production `SchedulingAlgorithm`, ripeness classifier, and courtroom capacity logic, so learned policies may not transfer. Rewards are computed via a freshly instantiated agent inside the environment, divorcing reward shaping from the training agent’s evolving parameters.【F:rl/training.py†L19-L138】【F:rl/simple_agent.py†L188-L200】
- 6. **Configuration robustness**. `get_latest_params_dir` still raises when no versioned params directory exists, blocking fresh environments from running simulations or RL without manual setup or bundled defaults.【F:scheduler/data/config.py†L1-L37】
-
- ## Recommendations
- - Make override handling side-effect-free: validate into separate structures, preserve original override lists for auditing, and clear any temporary priority attributes after use.【F:scheduler/core/algorithm.py†L136-L200】
- - Require affirmative ripeness evidence or add an `UNKNOWN` state so ambiguous cases don’t default to `RIPE`; integrate service/compliance indicators and stage-specific checks before scheduling.【F:scheduler/core/ripeness.py†L54-L129】
- - Enforce calendar constraints and per-case gap overrides in eligibility and allocation to avoid scheduling on blocked dates or ignoring individualized spacing rules.【F:scheduler/core/algorithm.py†L129-L200】【F:scheduler/control/overrides.py†L103-L169】
- - Harden EDA for large datasets: stream or sample before `to_pandas`, allow a static image renderer in headless runs, and gate expensive plots behind flags to keep CLI runs reliable.【F:src/eda_exploration.py†L38-L120】
- - Align RL training with the production scheduler: reuse `SchedulingAlgorithm` or its readiness/ripeness filters inside the training environment, and compute rewards without re-instantiating agents so learning signals match deployed policy behavior.【F:rl/training.py†L19-L138】【F:rl/simple_agent.py†L188-L200】
- - Provide a fallback baseline parameters bundle or clearer setup guidance in `get_latest_params_dir` so simulations and RL can run out of the box.【F:scheduler/data/config.py†L1-L37】
 
test_phase1.py DELETED
@@ -1,326 +0,0 @@
- """Phase 1 Validation Script - Test Foundation Components.
-
- This script validates that all Phase 1 components work correctly:
- - Configuration loading
- - Parameter loading from EDA outputs
- - Core entities (Case, Courtroom, Judge, Hearing)
- - Calendar utility
-
- Run this with: uv run python test_phase1.py
- """
-
- from datetime import date, timedelta
-
- print("=" * 70)
- print("PHASE 1 VALIDATION - Court Scheduler Foundation")
- print("=" * 70)
-
- # Test 1: Configuration
- print("\n[1/6] Testing Configuration...")
- try:
-     from scheduler.data.config import (
-         WORKING_DAYS_PER_YEAR,
-         COURTROOMS,
-         SIMULATION_YEARS,
-         CASE_TYPE_DISTRIBUTION,
-         STAGES,
-         FAIRNESS_WEIGHT,
-         EFFICIENCY_WEIGHT,
-         URGENCY_WEIGHT,
-     )
-
-     print(f" Working days/year: {WORKING_DAYS_PER_YEAR}")
-     print(f" Courtrooms: {COURTROOMS}")
-     print(f" Simulation years: {SIMULATION_YEARS}")
-     print(f" Case types: {len(CASE_TYPE_DISTRIBUTION)}")
-     print(f" Stages: {len(STAGES)}")
-     print(f" Objective weights: Fairness={FAIRNESS_WEIGHT}, "
-           f"Efficiency={EFFICIENCY_WEIGHT}, "
-           f"Urgency={URGENCY_WEIGHT}")
-     print(" ✓ Configuration loaded successfully")
- except Exception as e:
-     print(f" ✗ Configuration failed: {e}")
-     exit(1)
-
- # Test 2: Parameter Loader
- print("\n[2/6] Testing Parameter Loader...")
- try:
-     from scheduler.data.param_loader import load_parameters
-
-     params = load_parameters()
-
-     # Test transition probability
-     prob = params.get_transition_prob("ADMISSION", "ORDERS / JUDGMENT")
-     print(f" P(ADMISSION → ORDERS/JUDGMENT): {prob:.4f}")
-
-     # Test stage duration
-     duration = params.get_stage_duration("ADMISSION", "median")
-     print(f" ADMISSION median duration: {duration:.1f} days")
-
-     # Test capacity
-     print(f" Daily capacity (median): {params.daily_capacity_median}")
-
-     # Test adjournment rate
-     adj_rate = params.get_adjournment_prob("ADMISSION", "RSA")
-     print(f" RSA@ADMISSION adjournment rate: {adj_rate:.3f}")
-
-     print(" ✓ Parameter loader working correctly")
- except Exception as e:
-     print(f" ✗ Parameter loader failed: {e}")
-     print(f" Note: This requires EDA outputs to exist in reports/figures/")
-     # Don't exit, continue with other tests
-
- # Test 3: Case Entity
- print("\n[3/6] Testing Case Entity...")
- try:
-     from scheduler.core.case import Case, CaseStatus
-
-     # Create a sample case
-     case = Case(
-         case_id="RSA/2025/001",
-         case_type="RSA",
-         filed_date=date(2025, 1, 15),
-         current_stage="ADMISSION",
-         is_urgent=False,
-     )
-
-     print(f" Created case: {case.case_id}")
-     print(f" Type: {case.case_type}, Stage: {case.current_stage}")
-     print(f" Status: {case.status.value}")
-
-     # Test methods
-     case.update_age(date(2025, 3, 1))
-     print(f" Age after 45 days: {case.age_days} days")
-
-     # Record a hearing
-     case.record_hearing(date(2025, 2, 1), was_heard=True, outcome="Heard")
-     print(f" Hearings recorded: {case.hearing_count}")
-
-     # Compute priority
-     priority = case.get_priority_score()
-     print(f" Priority score: {priority:.3f}")
-
-     print(" ✓ Case entity working correctly")
- except Exception as e:
-     print(f" ✗ Case entity failed: {e}")
-     exit(1)
-
- # Test 4: Courtroom Entity
- print("\n[4/6] Testing Courtroom Entity...")
- try:
-     from scheduler.core.courtroom import Courtroom
-
-     # Create a courtroom
-     courtroom = Courtroom(
-         courtroom_id=1,
-         judge_id="J001",
-         daily_capacity=151,
-     )
-
-     print(f" Created courtroom {courtroom.courtroom_id} with Judge {courtroom.judge_id}")
-     print(f" Daily capacity: {courtroom.daily_capacity}")
-
-     # Schedule some cases
-     test_date = date(2025, 2, 1)
-     case1_id = "RSA/2025/001"
-     case2_id = "CRP/2025/002"
-
-     courtroom.schedule_case(test_date, case1_id)
-     courtroom.schedule_case(test_date, case2_id)
-
-     scheduled = courtroom.get_daily_schedule(test_date)
-     print(f" Scheduled {len(scheduled)} cases on {test_date}")
-
-     # Check utilization
-     utilization = courtroom.compute_utilization(test_date)
-     print(f" Utilization: {utilization:.2%}")
-
-     print(" ✓ Courtroom entity working correctly")
- except Exception as e:
-     print(f" ✗ Courtroom entity failed: {e}")
-     exit(1)
-
- # Test 5: Judge Entity
- print("\n[5/6] Testing Judge Entity...")
- try:
-     from scheduler.core.judge import Judge
-
-     # Create a judge
-     judge = Judge(
-         judge_id="J001",
-         name="Justice Smith",
-         courtroom_id=1,
-     )
-
-     judge.add_preferred_types("RSA", "CRP")
-
-     print(f" Created {judge.name} (ID: {judge.judge_id})")
-     print(f" Assigned to courtroom: {judge.courtroom_id}")
-     print(f" Specializations: {judge.preferred_case_types}")
-
-     # Record workload
-     judge.record_daily_workload(date(2025, 2, 1), cases_heard=25, cases_adjourned=10)
-
-     avg_workload = judge.get_average_daily_workload()
-     adj_rate = judge.get_adjournment_rate()
-
-     print(f" Average daily workload: {avg_workload:.1f} cases")
-     print(f" Adjournment rate: {adj_rate:.2%}")
-
-     print(" ✓ Judge entity working correctly")
- except Exception as e:
-     print(f" ✗ Judge entity failed: {e}")
-     exit(1)
-
- # Test 6: Hearing Entity
- print("\n[6/6] Testing Hearing Entity...")
- try:
-     from scheduler.core.hearing import Hearing, HearingOutcome
-
-     # Create a hearing
-     hearing = Hearing(
-         hearing_id="H001",
-         case_id="RSA/2025/001",
-         scheduled_date=date(2025, 2, 1),
-         courtroom_id=1,
-         judge_id="J001",
-         stage="ADMISSION",
-     )
-
-     print(f" Created hearing {hearing.hearing_id} for case {hearing.case_id}")
-     print(f" Scheduled: {hearing.scheduled_date}, Stage: {hearing.stage}")
-     print(f" Initial outcome: {hearing.outcome.value}")
-
-     # Mark as heard
-     hearing.mark_as_heard()
-     print(f" Outcome after hearing: {hearing.outcome.value}")
-     print(f" Is successful: {hearing.is_successful()}")
-
-     print(" ✓ Hearing entity working correctly")
- except Exception as e:
-     print(f" ✗ Hearing entity failed: {e}")
-     exit(1)
-
- # Test 7: Calendar Utility
- print("\n[7/7] Testing Calendar Utility...")
- try:
-     from scheduler.utils.calendar import CourtCalendar
-
-     calendar = CourtCalendar()
-
-     # Add some holidays
-     calendar.add_standard_holidays(2025)
-
-     print(f" Calendar initialized with {len(calendar.holidays)} holidays")
-
-     # Test working day check
-     monday = date(2025, 2, 3)  # Monday
-     saturday = date(2025, 2, 1)  # Saturday
-
-     print(f" Is {monday} (Mon) a working day? {calendar.is_working_day(monday)}")
-     print(f" Is {saturday} (Sat) a working day? {calendar.is_working_day(saturday)}")
-
-     # Count working days
-     start = date(2025, 1, 1)
-     end = date(2025, 1, 31)
-     working_days = calendar.working_days_between(start, end)
-     print(f" Working days in Jan 2025: {working_days}")
-
-     # Test seasonality
-     may_factor = calendar.get_seasonality_factor(date(2025, 5, 1))
-     feb_factor = calendar.get_seasonality_factor(date(2025, 2, 1))
-     print(f" Seasonality factor for May: {may_factor} (vacation)")
-     print(f" Seasonality factor for Feb: {feb_factor} (peak)")
-
-     print(" ✓ Calendar utility working correctly")
- except Exception as e:
-     print(f" ✗ Calendar utility failed: {e}")
-     exit(1)
-
- # Integration Test
- print("\n" + "=" * 70)
- print("INTEGRATION TEST - Putting it all together")
- print("=" * 70)
-
- try:
-     # Create a mini simulation scenario
-     print("\nScenario: Schedule 3 cases across 2 courtrooms")
-
-     # Setup
-     calendar = CourtCalendar()
-     calendar.add_standard_holidays(2025)
-
-     courtroom1 = Courtroom(courtroom_id=1, judge_id="J001", daily_capacity=151)
-     courtroom2 = Courtroom(courtroom_id=2, judge_id="J002", daily_capacity=151)
-
-     judge1 = Judge(judge_id="J001", name="Justice A", courtroom_id=1)
-     judge2 = Judge(judge_id="J002", name="Justice B", courtroom_id=2)
-
-     # Create cases
-     cases = [
-         Case(case_id="RSA/2025/001", case_type="RSA", filed_date=date(2025, 1, 1),
-              current_stage="ADMISSION", is_urgent=True),
-         Case(case_id="CRP/2025/002", case_type="CRP", filed_date=date(2025, 1, 5),
-              current_stage="ADMISSION", is_urgent=False),
-         Case(case_id="CA/2025/003", case_type="CA", filed_date=date(2025, 1, 10),
-              current_stage="ORDERS / JUDGMENT", is_urgent=False),
-     ]
-
-     # Update ages
-     current_date = date(2025, 2, 1)
-     for case in cases:
-         case.update_age(current_date)
-
-     # Sort by priority
-     cases_sorted = sorted(cases, key=lambda c: c.get_priority_score(), reverse=True)
-
-     print(f"\nCases sorted by priority (as of {current_date}):")
-     for i, case in enumerate(cases_sorted, 1):
-         priority = case.get_priority_score()
-         print(f" {i}. {case.case_id} - Priority: {priority:.3f}, "
-               f"Age: {case.age_days} days, Urgent: {case.is_urgent}")
-
-     # Schedule cases
-     hearing_date = calendar.next_working_day(current_date, 7)  # 7 days ahead
-     print(f"\nScheduling hearings for {hearing_date}:")
-
-     for i, case in enumerate(cases_sorted):
-         courtroom = courtroom1 if i % 2 == 0 else courtroom2
-         judge = judge1 if courtroom.courtroom_id == 1 else judge2
-
-         if courtroom.can_schedule(hearing_date, case.case_id):
-             courtroom.schedule_case(hearing_date, case.case_id)
-
-             hearing = Hearing(
-                 hearing_id=f"H{i+1:03d}",
-                 case_id=case.case_id,
-                 scheduled_date=hearing_date,
-                 courtroom_id=courtroom.courtroom_id,
-                 judge_id=judge.judge_id,
-                 stage=case.current_stage,
-             )
-
-             print(f" ✓ {case.case_id} → Courtroom {courtroom.courtroom_id} (Judge {judge.judge_id})")
-
-     # Check courtroom schedules
-     print(f"\nCourtroom schedules for {hearing_date}:")
-     for courtroom in [courtroom1, courtroom2]:
-         schedule = courtroom.get_daily_schedule(hearing_date)
-         utilization = courtroom.compute_utilization(hearing_date)
-         print(f" Courtroom {courtroom.courtroom_id}: {len(schedule)} cases scheduled "
-               f"(Utilization: {utilization:.2%})")
-
-     print("\n✓ Integration test passed!")
-
- except Exception as e:
-     print(f"\n✗ Integration test failed: {e}")
-     import traceback
-     traceback.print_exc()
-     exit(1)
-
- print("\n" + "=" * 70)
- print("ALL TESTS PASSED - Phase 1 Foundation is Solid!")
- print("=" * 70)
- print("\nNext: Phase 2 - Case Generation")
- print(" Implement case_generator.py to create 10,000 synthetic cases")
- print("=" * 70)
 
test_system.py DELETED
@@ -1,8 +0,0 @@
- """Quick test to verify core system works before refactoring."""
- from scheduler.data.param_loader import load_parameters
-
- p = load_parameters()
- print("✓ Parameters loaded successfully")
- print(f"✓ Adjournment rate (ADMISSION, RSA): {p.get_adjournment_prob('ADMISSION', 'RSA'):.3f}")
- print("✓ Stage duration (ADMISSION, median): {:.0f} days".format(p.get_stage_duration('ADMISSION', 'median')))
- print("✓ Core system works!")