RoyAalekh committed
Commit b512b22 · 1 Parent(s): efc8383

chore: Major cleanup - remove redundant docs and emoticons


DELETED REDUNDANT DOCUMENTATION (136KB):
- COMPREHENSIVE_ANALYSIS.md (28KB)
- SYSTEM_WORKFLOW.md (21KB)
- TECHNICAL_IMPLEMENTATION.md (21KB)
- SUBMISSION_SUMMARY.md (15KB)
- RL_EXPLORATION_PLAN.md (15KB)
- Court Scheduling System Implementation Plan.md (14KB)
- DEVELOPMENT.md (10KB)
- PIPELINE.md (8KB)
- reports/codebase_analysis_2024-07-01.md (6KB)

DELETED OLD TEST FILES:
- test_phase1.py (312 lines - superseded by test_enhancements.py)
- test_system.py (7 lines - trivial)

REMOVED EMOTICONS:
- HACKATHON_SUBMISSION.md: Replaced tree characters with ASCII
- docs/CONFIGURATION.md: Removed checkmarks, replaced math symbols with ASCII

KEPT ESSENTIAL DOCS:
- README.md (main entry point)
- HACKATHON_SUBMISSION.md (final submission)
- docs/ENHANCEMENT_PLAN.md (tracks bug fixes)
- docs/CONFIGURATION.md (config reference)
- rl/README.md (module docs)
- test_enhancements.py (comprehensive PR validation)

Result: Clean, professional codebase ready for hackathon submission

COMPREHENSIVE_ANALYSIS.md DELETED
@@ -1,862 +0,0 @@
- # Code4Change Court Scheduling Analysis: Comprehensive Codebase Documentation
-
- **Project**: Karnataka High Court Scheduling Optimization
- **Version**: v0.4.0
- **Last Updated**: 2025-11-19
- **Purpose**: Exploratory Data Analysis and Parameter Extraction for Court Scheduling System
-
- ---
-
- ## Table of Contents
- 1. [Executive Summary](#executive-summary)
- 2. [Project Architecture](#project-architecture)
- 3. [Dataset Overview](#dataset-overview)
- 4. [Data Processing Pipeline](#data-processing-pipeline)
- 5. [Exploratory Data Analysis](#exploratory-data-analysis)
- 6. [Parameter Extraction](#parameter-extraction)
- 7. [Key Findings and Insights](#key-findings-and-insights)
- 8. [Technical Implementation](#technical-implementation)
- 9. [Outputs and Artifacts](#outputs-and-artifacts)
- 10. [Next Steps for Algorithm Development](#next-steps-for-algorithm-development)
-
- ---
-
- ## Executive Summary
-
- This project provides comprehensive analysis tools for the Code4Change hackathon, focused on developing intelligent court scheduling systems for the Karnataka High Court. The codebase implements a complete EDA pipeline that processes 20+ years of court data to extract scheduling parameters, identify patterns, and generate insights for algorithm development.
-
- ### Key Statistics
- - **Cases Analyzed**: 134,699 unique civil cases
- - **Hearings Tracked**: 739,670 individual hearings
- - **Time Period**: 2000-2025 (disposed cases only)
- - **Case Types**: 8 civil case categories (RSA, CRP, RFA, CA, CCC, CP, MISC.CVL, CMP)
- - **Data Quality**: High (minimal lifecycle inconsistencies)
-
- ### Primary Deliverables
- 1. **Interactive HTML Visualizations** (15+ plots covering all dimensions)
- 2. **Parameter Extraction** (stage transitions, court capacity, adjournment rates)
- 3. **Case Features Dataset** with readiness scores and alert flags
- 4. **Seasonality and Anomaly Detection** for resource planning
-
- ---
-
- ## Project Architecture
-
- ### Technology Stack
- - **Data Processing**: Polars (for performance), Pandas (for visualization)
- - **Visualization**: Plotly (interactive HTML outputs)
- - **Scientific Computing**: NumPy, SciPy, Scikit-learn
- - **Graph Analysis**: NetworkX
- - **Optimization**: OR-Tools
- - **Data Validation**: Pydantic
- - **CLI**: Typer
-
- ### Directory Structure
- ```
- code4change-analysis/
- ├── Data/ # Raw CSV inputs
- │ ├── ISDMHack_Cases_WPfinal.csv
- │ └── ISDMHack_Hear.csv
- ├── src/ # Analysis modules
- │ ├── eda_config.py # Configuration and paths
- │ ├── eda_load_clean.py # Data loading and cleaning
- │ ├── eda_exploration.py # Visual EDA
- │ └── eda_parameters.py # Parameter extraction
- ├── reports/ # Generated outputs
- │ └── figures/
- │     └── v0.4.0_TIMESTAMP/ # Versioned outputs
- │         ├── *.html # Interactive visualizations
- │         ├── *.parquet # Cleaned data
- │         ├── *.csv # Summary tables
- │         └── params/ # Extracted parameters
- ├── literature/ # Problem statements and references
- ├── main.py # Pipeline orchestrator
- ├── pyproject.toml # Dependencies and metadata
- └── README.md # User documentation
- ```
-
- ### Execution Flow
- ```
- main.py
- ├─> Step 1: run_load_and_clean()
- │   ├─ Load raw CSVs
- │   ├─ Normalize text fields
- │   ├─ Compute hearing gaps
- │   ├─ Deduplicate and validate
- │   └─ Save to Parquet
-
- ├─> Step 2: run_exploration()
- │   ├─ Generate 15+ interactive visualizations
- │   ├─ Analyze temporal patterns
- │   ├─ Compute stage transitions
- │   └─ Detect anomalies
-
- └─> Step 3: run_parameter_export()
-     ├─ Extract stage transition probabilities
-     ├─ Compute court capacity metrics
-     ├─ Identify adjournment proxies
-     ├─ Calculate readiness scores
-     └─ Generate case features dataset
- ```
-
- ---
-
- ## Dataset Overview
-
- ### Cases Dataset (ISDMHack_Cases_WPfinal.csv)
- **Shape**: 134,699 rows × 24 columns
- **Primary Key**: CNR_NUMBER (unique case identifier)
-
- #### Key Attributes
- | Column | Type | Description | Notes |
- |--------|------|-------------|-------|
- | CNR_NUMBER | String | Unique case identifier | Primary key |
- | CASE_TYPE | Categorical | Type of case (RSA, CRP, etc.) | 8 unique values |
- | DATE_FILED | Date | Case filing date | Range: 2000-2025 |
- | DECISION_DATE | Date | Case disposal date | Only disposed cases |
- | DISPOSALTIME_ADJ | Integer | Disposal duration (days) | Adjusted for consistency |
- | COURT_NUMBER | Integer | Courtroom identifier | Resource allocation |
- | CURRENT_STATUS | Categorical | Case status | All "Disposed" |
- | NATURE_OF_DISPOSAL | String | Disposal type/outcome | Varied outcomes |
-
- #### Derived Attributes (Computed in Pipeline)
- - **YEAR_FILED**: Extracted from DATE_FILED
- - **YEAR_DECISION**: Extracted from DECISION_DATE
- - **N_HEARINGS**: Count of hearings per case
- - **GAP_MEAN/MEDIAN/STD**: Hearing gap statistics
- - **GAP_P25/GAP_P75**: Quartile values for gaps
-
- ### Hearings Dataset (ISDMHack_Hear.csv)
- **Shape**: 739,670 rows × 31 columns
- **Primary Key**: Hearing_ID
- **Foreign Key**: CNR_NUMBER (links to Cases)
-
- #### Key Attributes
- | Column | Type | Description | Notes |
- |--------|------|-------------|-------|
- | Hearing_ID | String | Unique hearing identifier | Primary key |
- | CNR_NUMBER | String | Links to case | Foreign key |
- | BusinessOnDate | Date | Hearing date | Core temporal attribute |
- | Remappedstages | Categorical | Hearing stage | 11 standardized stages |
- | PurposeofHearing | Text | Purpose description | Used for classification |
- | BeforeHonourableJudge | String | Judge name(s) | May be multi-judge bench |
- | CourtName | String | Courtroom identifier | Resource tracking |
- | PreviousHearing | Date | Prior hearing date | For gap computation |
-
- #### Stage Taxonomy (Remappedstages)
- 1. **PRE-ADMISSION**: Initial procedural stage
- 2. **ADMISSION**: Formal admission of case
- 3. **FRAMING OF CHARGES**: Charge formulation (rare)
- 4. **EVIDENCE**: Evidence presentation
- 5. **ARGUMENTS**: Legal arguments phase
- 6. **INTERLOCUTORY APPLICATION**: Interim relief requests
- 7. **SETTLEMENT**: Settlement negotiations
- 8. **ORDERS / JUDGMENT**: Final orders or judgments
- 9. **FINAL DISPOSAL**: Case closure
- 10. **OTHER**: Miscellaneous hearings
- 11. **NA**: Missing or unknown stage
-
- ---
-
- ## Data Processing Pipeline
-
- ### Module 1: Load and Clean (eda_load_clean.py)
-
- #### Responsibilities
- 1. **Robust CSV Loading** with null token handling
- 2. **Text Normalization** (uppercase, strip, null standardization)
- 3. **Date Parsing** with multiple format support
- 4. **Deduplication** on primary keys
- 5. **Hearing Gap Computation** (mean, median, std, p25, p75)
- 6. **Lifecycle Validation** (hearings within case timeline)
-
- #### Data Quality Checks
- - **Null Summary**: Reports missing values per column
- - **Duplicate Detection**: Removes duplicate CNR_NUMBER and Hearing_ID
- - **Temporal Consistency**: Flags hearings before filing or after decision
- - **Type Validation**: Ensures proper data types for all columns
-
- #### Key Transformations
-
- **Stage Canonicalization**:
- ```python
- STAGE_MAP = {
-     "ORDERS/JUDGMENTS": "ORDERS / JUDGMENT",
-     "ORDER/JUDGMENT": "ORDERS / JUDGMENT",
-     "ORDERS / JUDGMENT": "ORDERS / JUDGMENT",
-     # ... additional mappings
- }
- ```
-
- **Hearing Gap Computation**:
- - Computed as (Current Hearing Date - Previous Hearing Date) per case
- - Statistics: mean, median, std, p25, p75, count
- - Handles first hearing (gap = null) appropriately
-
- **Outputs**:
- - `cases_clean.parquet`: 134,699 × 33 columns
- - `hearings_clean.parquet`: 739,669 × 31 columns
- - `metadata.json`: Shape, columns, timestamp information
-
- ---
-
- ## Exploratory Data Analysis
-
- ### Module 2: Visual EDA (eda_exploration.py)
-
- This module generates 15+ interactive HTML visualizations covering all analytical dimensions.
-
- ### Visualization Catalog
-
- #### 1. Case Type Distribution
- **File**: `1_case_type_distribution.html`
- **Type**: Bar chart
- **Insights**:
- - CRP (27,132 cases) - Civil Revision Petitions
- - CA (26,953 cases) - Civil Appeals
- - RSA (26,428 cases) - Regular Second Appeals
- - RFA (22,461 cases) - Regular First Appeals
- - Distribution is relatively balanced across major types
-
- #### 2. Filing Trends Over Time
- **File**: `2_cases_filed_by_year.html`
- **Type**: Line chart with range slider
- **Insights**:
- - Steady growth from 2000-2010
- - Peak filing years: 2011-2015
- - Recent stabilization (2016-2025)
- - Useful for capacity planning
-
- #### 3. Disposal Time Distribution
- **File**: `3_disposal_time_distribution.html`
- **Type**: Histogram (50 bins)
- **Insights**:
- - Heavy right-skew (long tail of delayed cases)
- - Median disposal: ~93-903 days depending on case type
- - 90th percentile: 298-2,806 days (varies dramatically)
-
- #### 4. Hearings vs Disposal Time
- **File**: `4_hearings_vs_disposal.html`
- **Type**: Scatter plot (colored by case type)
- **Correlation**: 0.718 (Spearman)
- **Insights**:
- - Strong positive correlation between hearing count and disposal time
- - Non-linear relationship (diminishing returns)
- - Case type influences both dimensions
-
- #### 5. Disposal Time by Case Type
- **File**: `5_box_disposal_by_type.html`
- **Type**: Box plot
- **Insights**:
- ```
- Case Type | Median Days | P90 Days
- ----------|-------------|----------
- CCC       | 93          | 298
- CP        | 96          | 541
- CA        | 117         | 588
- CRP       | 139         | 867
- CMP       | 252         | 861
- RSA       | 695.5       | 2,313
- RFA       | 903         | 2,806
- ```
- - RSA and RFA cases take significantly longer
- - CCC and CP are fastest to resolve
-
- #### 6. Stage Frequency Analysis
- **File**: `6_stage_frequency.html`
- **Type**: Bar chart
- **Insights**:
- - ADMISSION: 427,716 hearings (57.8%)
- - ORDERS / JUDGMENT: 159,846 hearings (21.6%)
- - NA: 6,981 hearings (0.9%)
- - Other stages: < 5,000 each
- - Most case time spent in ADMISSION phase
-
- #### 7. Hearing Gap by Case Type
- **File**: `9_gap_median_by_type.html`
- **Type**: Box plot
- **Insights**:
- - CA: 0 days median (immediate disposals common)
- - CP: 6.75 days median
- - CRP: 14 days median
- - CCC: 18 days median
- - CMP/RFA/RSA: 28-38 days median
- - Significant outliers in all categories
-
- #### 8. Stage Transition Sankey
- **File**: `10_stage_transition_sankey.html`
- **Type**: Sankey diagram
- **Top Transitions**:
- 1. ADMISSION → ADMISSION (396,894) - cases remain in admission
- 2. ORDERS / JUDGMENT → ORDERS / JUDGMENT (155,819)
- 3. ADMISSION → ORDERS / JUDGMENT (20,808) - direct progression
- 4. ADMISSION → NA (9,539) - missing data
-
- #### 9. Monthly Hearing Volume
- **File**: `11_monthly_hearings.html`
- **Type**: Time series line chart
- **Insights**:
- - Seasonal pattern: Lower volume in May (summer vacations)
- - Higher volume in Feb-Apr and Jul-Nov (peak court periods)
- - Steady growth trend from 2000-2020
- - Recent stabilization at ~30,000-40,000 hearings/month
-
- #### 10. Monthly Waterfall with Anomalies
- **File**: `11b_monthly_waterfall.html`
- **Type**: Waterfall chart with anomaly markers
- **Anomalies Detected** (|z-score| ≥ 3):
- - COVID-19 impact: March-May 2020 (dramatic drops)
- - System transitions: Data collection changes
- - Holiday impacts: December/January consistently lower
-
- #### 11. Court Day Load
- **File**: `12b_court_day_load.html`
- **Type**: Box plot per courtroom
- **Capacity Insights**:
- - Median: 151 hearings/courtroom/day
- - P90: 252 hearings/courtroom/day
- - High variability across courtrooms (resource imbalance)
-
- #### 12. Stage Bottleneck Impact
- **File**: `15_bottleneck_impact.html`
- **Type**: Bar chart (Median Days × Run Count)
- **Top Bottlenecks**:
- 1. **ADMISSION**: Median 75 days × 126,979 runs = massive impact
- 2. **ORDERS / JUDGMENT**: Median 224 days × 21,974 runs
- 3. **ARGUMENTS**: Median 26 days × 743 runs
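The ranking in this chart is just each stage's median run duration multiplied by its run count; a short sketch using the figures above:

```python
# (stage, median run days, number of runs) from stage_duration.csv
stages = [
    ("ADMISSION", 75.0, 126_979),
    ("ORDERS / JUDGMENT", 224.0, 21_974),
    ("ARGUMENTS", 26.0, 743),
]

# Impact score: median days per run × run count = total "stage-days" burden
ranked = sorted(stages, key=lambda s: s[1] * s[2], reverse=True)
for name, med, runs in ranked:
    print(f"{name}: {med * runs:,.0f} stage-days")
```

ADMISSION dominates (75 × 126,979 ≈ 9.5M stage-days) even though its per-run median is the shortest of the three, which is why it tops the bottleneck chart.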
-
- ### Summary Outputs (CSV)
- - `transitions.csv`: Stage-to-stage transition counts
- - `stage_duration.csv`: Median/mean/p90 duration per stage
- - `monthly_hearings.csv`: Time series of hearing volumes
- - `monthly_anomalies.csv`: Anomaly detection results with z-scores
-
- ---
-
- ## Parameter Extraction
-
- ### Module 3: Parameters (eda_parameters.py)
-
- This module extracts scheduling parameters needed for simulation and optimization algorithms.
-
- ### 1. Stage Transition Probabilities
-
- **Output**: `stage_transition_probs.csv`
-
- **Format**:
- ```csv
- STAGE_FROM,STAGE_TO,N,row_n,p
- ADMISSION,ADMISSION,396894,427716,0.9279
- ADMISSION,ORDERS / JUDGMENT,20808,427716,0.0486
- ```
-
- **Application**: Markov chain modeling for case progression
-
- **Key Probabilities**:
- - P(ADMISSION → ADMISSION) = 0.928 (cases stay in admission)
- - P(ADMISSION → ORDERS/JUDGMENT) = 0.049 (direct progression)
- - P(ORDERS/JUDGMENT → ORDERS/JUDGMENT) = 0.975 (iterative judgments)
- - P(ARGUMENTS → ARGUMENTS) = 0.782 (multi-hearing arguments)
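Row-normalising transition counts into probabilities (the `p = N / row_n` column above) can be sketched as:

```python
from collections import Counter, defaultdict

def transition_probs(stage_pairs):
    """stage_pairs: iterable of (stage_from, stage_to) for consecutive hearings.
    Returns {(stage_from, stage_to): probability}; each row sums to 1."""
    counts = Counter(stage_pairs)
    row_totals = defaultdict(int)
    for (src, _), n in counts.items():
        row_totals[src] += n
    return {pair: n / row_totals[pair[0]] for pair, n in counts.items()}

# Toy example: admission mostly self-loops, occasionally progresses
pairs = [("ADMISSION", "ADMISSION")] * 9 + [("ADMISSION", "ORDERS / JUDGMENT")]
probs = transition_probs(pairs)
print(probs[("ADMISSION", "ADMISSION")])  # 0.9
```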
-
- ### 2. Stage Transition Entropy
-
- **Output**: `stage_transition_entropy.csv`
-
- **Entropy Scores** (predictability metric):
- ```
- Stage                | Entropy
- ---------------------|--------
- PRE-ADMISSION        | 1.40 (most unpredictable)
- FRAMING OF CHARGES   | 1.14
- SETTLEMENT           | 0.90
- ADMISSION            | 0.31 (very predictable)
- ORDERS / JUDGMENT    | 0.12 (highly predictable)
- NA                   | 0.00 (terminal state)
- ```
-
- **Interpretation**: Lower entropy = more predictable transitions
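These scores are consistent with Shannon entropy over each row of the transition matrix using the natural logarithm (natural log is an inference, not stated in the file; plugging in the ADMISSION row above reproduces the reported 0.31):

```python
import math

def row_entropy(probs):
    """Shannon entropy (natural log) of one row of transition probabilities."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# ADMISSION row: self-loop, progression, NA, small remainder
admission = [0.928, 0.049, 0.022, 0.001]
print(round(row_entropy(admission), 2))  # 0.31
```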
-
- ### 3. Stage Duration Distribution
-
- **Output**: `stage_duration.csv`
-
- **Format**:
- ```csv
- STAGE,RUN_MEDIAN_DAYS,RUN_P90_DAYS,HEARINGS_PER_RUN_MED,N_RUNS
- ORDERS / JUDGMENT,224.0,1738.0,4.0,21974
- ADMISSION,75.0,889.0,3.0,126979
- ```
-
- **Application**: Duration modeling for scheduling simulation
-
- ### 4. Court Capacity Metrics
-
- **Outputs**:
- - `court_capacity_stats.csv`: Per-courtroom statistics
- - `court_capacity_global.json`: Global aggregates
-
- **Global Capacity**:
- ```json
- {
-     "slots_median_global": 151.0,
-     "slots_p90_global": 252.0
- }
- ```
-
- **Application**: Resource constraint modeling
-
- ### 5. Adjournment Proxies
-
- **Output**: `adjournment_proxies.csv`
-
- **Methodology**:
- - Adjournment proxy: Hearing gap > 1.3 × stage median gap
- - Not-reached proxy: Purpose text contains "NOT REACHED", "NR", etc.
-
- **Sample Results**:
- ```csv
- Stage,CaseType,p_adjourn_proxy,p_not_reached_proxy,n
- ADMISSION,RSA,0.423,0.0,139337
- ADMISSION,RFA,0.356,0.0,120725
- ORDERS / JUDGMENT,RFA,0.448,0.0,90746
- ```
-
- **Application**: Stochastic modeling of hearing outcomes
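The adjournment proxy is a simple threshold rule on hearing gaps; a minimal sketch with toy data (the 1.3 factor comes from the methodology above):

```python
from statistics import median

def adjournment_proxy_rate(gaps, factor=1.3):
    """Share of hearings whose gap exceeds factor × the stage median gap."""
    m = median(gaps)
    flagged = [g for g in gaps if g > factor * m]
    return len(flagged) / len(gaps)

# Toy stage gaps (days): median 20, so the threshold is 26 days
gaps = [10, 15, 20, 25, 60]
print(adjournment_proxy_rate(gaps))  # 0.2
```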
-
- ### 6. Case Type Summary
-
- **Output**: `case_type_summary.csv`
-
- **Format**:
- ```csv
- CASE_TYPE,n_cases,disp_median,disp_p90,hear_median,gap_median
- RSA,26428,695.5,2313.0,5.0,38.0
- RFA,22461,903.0,2806.0,6.0,31.0
- ```
-
- **Application**: Case type-specific parameter tuning
-
- ### 7. Correlation Analysis
-
- **Output**: `correlations_spearman.csv`
-
- **Spearman Correlations**:
- ```
-                  | DISPOSALTIME_ADJ | N_HEARINGS | GAP_MEDIAN
- -----------------+------------------+------------+-----------
- DISPOSALTIME_ADJ | 1.000            | 0.718      | 0.594
- N_HEARINGS       | 0.718            | 1.000      | 0.502
- GAP_MEDIAN       | 0.594            | 0.502      | 1.000
- ```
-
- **Interpretation**: All three metrics are positively correlated, confirming that scheduling complexity compounds
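Spearman correlation is just Pearson correlation computed on ranks; a compact sketch (assumes no tied values, unlike a production implementation):

```python
def spearman(x, y):
    """Spearman rank correlation for equal-length sequences without ties."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order):
            r[i] = rank
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    m = (n - 1) / 2  # mean rank is the same for both tie-free rankings
    cov = sum((a - m) * (b - m) for a, b in zip(rx, ry))
    var = sum((a - m) ** 2 for a in rx)  # equals the variance of ry as well
    return cov / var

# More hearings should track longer disposal times
disposal = [100, 400, 250, 900]
hearings = [2, 6, 5, 12]
print(spearman(disposal, hearings))  # 1.0 (perfectly monotone toy data)
```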
-
- ### 8. Case Features with Readiness Scores
-
- **Output**: `cases_features.csv` (134,699 × 14 columns)
-
- **Readiness Score Formula**:
- ```python
- READINESS_SCORE = (
-     (N_HEARINGS_CAPPED / 50) * 0.4                          # Hearing progress
-     + (100 / GAP_MEDIAN_CLAMPED) * 0.3                      # Momentum
-     + (LAST_STAGE in [ARGUMENTS, EVIDENCE, ORDERS]) * 0.3   # Stage advancement
- )
- ```
-
- **Range**: [0, 1] (higher = more ready for final hearing)
-
- **Alert Flags**:
- - `ALERT_P90_TYPE`: Disposal time > 90th percentile within case type
- - `ALERT_HEARING_HEAVY`: Hearing count > 90th percentile within case type
- - `ALERT_LONG_GAP`: Gap > 90th percentile within case type
-
- **Application**: Priority queue construction, urgency detection
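The formula can be made concrete as below. Note the clamp bounds are assumptions for this sketch (the hearing cap of 50 appears in the formula, but the gap clamp that keeps the momentum term in [0, 1] is not specified):

```python
def readiness_score(n_hearings, gap_median, last_stage,
                    advanced=("ARGUMENTS", "EVIDENCE", "ORDERS / JUDGMENT")):
    """Illustrative readiness score following the formula above.
    Gap clamp of 100 days is an assumed lower bound, not from the codebase."""
    progress = min(n_hearings, 50) / 50              # hearing progress, [0, 1]
    momentum = min(100 / max(gap_median, 100), 1.0)  # short gaps -> high momentum
    stage = 1.0 if last_stage in advanced else 0.0   # stage advancement
    return 0.4 * progress + 0.3 * momentum + 0.3 * stage

print(readiness_score(25, 200, "ARGUMENTS"))  # ≈ 0.65
```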
-
- ### 9. Age Funnel Analysis
-
- **Output**: `age_funnel.csv`
-
- **Distribution**:
- ```
- Age Bucket | Count  | Percentage
- -----------|--------|------------
- <1y        | 83,887 | 62.3%
- 1-3y       | 29,418 | 21.8%
- 3-5y       | 10,290 | 7.6%
- >5y        | 11,104 | 8.2%
- ```
-
- **Application**: Backlog management, aging case prioritization
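The bucketing itself is a small threshold function; a sketch (365-day years and the exact boundary handling are assumptions of this illustration):

```python
def age_bucket(age_days):
    """Bucket a case's age (days since filing) into the age-funnel bands."""
    if age_days < 365:
        return "<1y"
    if age_days < 3 * 365:
        return "1-3y"
    if age_days < 5 * 365:
        return "3-5y"
    return ">5y"

print(age_bucket(200), age_bucket(700), age_bucket(1500), age_bucket(2500))
# <1y 1-3y 3-5y >5y
```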
-
- ---
-
- ## Key Findings and Insights
-
- ### 1. Case Lifecycle Patterns
-
- **Average Journey**:
- 1. **Filing → Admission**: ~2-3 hearings, ~75 days median
- 2. **Admission (holding pattern)**: Multiple hearings, 92.8% stay in admission
- 3. **Arguments (if reached)**: ~3 hearings, ~26 days median
- 4. **Orders/Judgment**: ~4 hearings, ~224 days median
- 5. **Final Disposal**: Varies by case type (93-903 days median)
-
- **Key Observation**: Most cases spend a disproportionate amount of time in the ADMISSION stage
-
- ### 2. Case Type Complexity
-
- **Fast Track** (< 150 days median):
- - CCC (93 days) - Ordinary civil cases
- - CP (96 days) - Civil petitions
- - CA (117 days) - Civil appeals
- - CRP (139 days) - Civil revision petitions
-
- **Extended Process** (> 600 days median):
- - RSA (695.5 days) - Second appeals
- - RFA (903 days) - First appeals
-
- **Implication**: Scheduling algorithms must differentiate by case type
-
- ### 3. Scheduling Bottlenecks
-
- **Primary Bottleneck**: ADMISSION stage
- - 57.8% of all hearings
- - Median duration: 75 days per run
- - 126,979 separate runs
- - High self-loop probability (0.928)
-
- **Secondary Bottleneck**: ORDERS / JUDGMENT stage
- - 21.6% of all hearings
- - Median duration: 224 days per run
- - Complex cases accumulate here
-
- **Tertiary**: Judge assignment constraints
- - High variance in per-judge workload
- - Some judges handle 2-3× the median load
-
- ### 4. Temporal Patterns
-
- **Seasonality**:
- - **Low Volume**: May (summer vacations), December-January (holidays)
- - **High Volume**: February-April, July-November
- - **Anomalies**: COVID-19 (March-May 2020), system transitions
-
- **Implications**:
- - Capacity planning must account for 40-60% seasonal variance
- - Vacation schedules create predictable bottlenecks
-
- ### 5. Judge and Court Utilization
-
- **Capacity Metrics**:
- - Median courtroom load: 151 hearings/day
- - P90 courtroom load: 252 hearings/day
- - High variance suggests resource imbalance
-
- **Multi-Judge Benches**:
- - Present in dataset (BeforeHonourableJudgeTwo, etc.)
- - Adds scheduling complexity
-
- ### 6. Adjournment Patterns
-
- **High Adjournment Stages**:
- - ORDERS / JUDGMENT: 40-45% adjournment rate
- - ADMISSION (RSA cases): 42% adjournment rate
- - ADMISSION (RFA cases): 36% adjournment rate
-
- **Implication**: Stochastic models need adjournment probability by stage × case type
-
- ### 7. Data Quality Insights
-
- **Strengths**:
- - Comprehensive coverage (20+ years)
- - Minimal missing data in key fields
- - Strong referential integrity (CNR_NUMBER links)
-
- **Limitations**:
- - Judge names are not standardized (typos, variations)
- - Purpose text is free-form (NLP required)
- - Some stages have sparse data (EVIDENCE, SETTLEMENT)
- - "NA" stage used for missing data (0.9% of hearings)
-
- ---
-
- ## Technical Implementation
-
- ### Design Decisions
-
- #### 1. Polars for Data Processing
- **Rationale**: 10-100× faster than Pandas for large datasets
- **Usage**: All ETL and aggregation operations
- **Trade-off**: Convert to Pandas only for Plotly visualization
-
- #### 2. Parquet for Storage
- **Rationale**: Columnar format, compressed, schema-preserving
- **Benefit**: 10-20× faster I/O vs CSV, type safety
- **Size**: cases_clean.parquet (~5MB), hearings_clean.parquet (~37MB)
-
- #### 3. Versioned Outputs
- **Pattern**: `reports/figures/v{VERSION}_{TIMESTAMP}/`
- **Benefit**: Reproducibility, comparison across runs
- **Storage**: ~100MB per run (HTML files are large)
-
- #### 4. Interactive HTML Visualizations
- **Rationale**: Self-contained, shareable, no server required
- **Library**: Plotly (browser-based interaction)
- **Trade-off**: Large file sizes (4-10MB per plot)
-
- ### Code Quality Patterns
-
- #### Type Hints and Validation
- ```python
- def load_raw() -> tuple[pl.DataFrame, pl.DataFrame]:
-     """Load raw data with Polars."""
-     cases = pl.read_csv(
-         CASES_FILE,
-         try_parse_dates=True,
-         null_values=NULL_TOKENS,
-         infer_schema_length=100_000,
-     )
-     # hearings loaded the same way (path constant from eda_config)
-     hearings = pl.read_csv(
-         HEARINGS_FILE,
-         try_parse_dates=True,
-         null_values=NULL_TOKENS,
-         infer_schema_length=100_000,
-     )
-     return cases, hearings
- ```
-
- #### Null Handling
- ```python
- NULL_TOKENS = ["", "NULL", "Null", "null", "NA", "N/A", "na", "NaN", "nan", "-", "--"]
- ```
-
- #### Stage Canonicalization
- ```python
- STAGE_MAP = {
-     "ORDERS/JUDGMENTS": "ORDERS / JUDGMENT",
-     "INTERLOCUTARY APPLICATION": "INTERLOCUTORY APPLICATION",
- }
- ```
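A lookup like this is typically applied after the text normalization step (uppercase, strip); a small helper sketch (the `canonical_stage` name is illustrative, not from the codebase):

```python
STAGE_MAP = {
    "ORDERS/JUDGMENTS": "ORDERS / JUDGMENT",
    "INTERLOCUTARY APPLICATION": "INTERLOCUTORY APPLICATION",
}

def canonical_stage(raw):
    """Normalize case/whitespace, then map known variants to canonical names."""
    if raw is None:
        return "NA"
    cleaned = " ".join(raw.strip().upper().split())
    return STAGE_MAP.get(cleaned, cleaned)

print(canonical_stage("  orders/judgments "))  # ORDERS / JUDGMENT
```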
-
- #### Error Handling
- ```python
- try:
-     fig_sankey = create_sankey(transitions)
-     fig_sankey.write_html(FIGURES_DIR / "sankey.html")
-     copy_to_versioned("sankey.html")
- except Exception as e:
-     print(f"Sankey error: {e}")
-     # Continue pipeline
- ```
-
- ### Performance Characteristics
-
- **Full Pipeline Runtime** (on a typical laptop):
- - Step 1 (Load & Clean): ~20 seconds
- - Step 2 (Exploration): ~120 seconds (Plotly rendering is slow)
- - Step 3 (Parameter Export): ~30 seconds
- - **Total**: ~3 minutes
-
- **Memory Usage**:
- - Peak: ~2GB RAM
- - Mostly during Plotly figure generation (holds entire plot in memory)
-
- ---
-
- ## Outputs and Artifacts
-
- ### Cleaned Data
- | File | Format | Size | Rows | Columns | Purpose |
- |------|--------|------|------|---------|---------|
- | cases_clean.parquet | Parquet | 5MB | 134,699 | 33 | Clean case data with computed features |
- | hearings_clean.parquet | Parquet | 37MB | 739,669 | 31 | Clean hearing data with stage normalization |
- | metadata.json | JSON | 2KB | - | - | Dataset schema and statistics |
-
- ### Visualizations (HTML)
- | File | Type | Purpose |
- |------|------|---------|
- | 1_case_type_distribution.html | Bar | Case type frequency |
- | 2_cases_filed_by_year.html | Line | Filing trends |
- | 3_disposal_time_distribution.html | Histogram | Disposal duration |
- | 4_hearings_vs_disposal.html | Scatter | Correlation analysis |
- | 5_box_disposal_by_type.html | Box | Case type comparison |
- | 6_stage_frequency.html | Bar | Stage distribution |
- | 9_gap_median_by_type.html | Box | Hearing gap analysis |
- | 10_stage_transition_sankey.html | Sankey | Transition flows |
- | 11_monthly_hearings.html | Line | Volume trends |
- | 11b_monthly_waterfall.html | Waterfall | Monthly changes |
- | 12b_court_day_load.html | Box | Court capacity |
- | 15_bottleneck_impact.html | Bar | Bottleneck ranking |
-
- ### Parameter Files (CSV/JSON)
- | File | Purpose | Application |
- |------|---------|-------------|
- | stage_transitions.csv | Transition counts | Markov chain construction |
- | stage_transition_probs.csv | Probability matrix | Stochastic modeling |
- | stage_transition_entropy.csv | Predictability scores | Uncertainty quantification |
- | stage_duration.csv | Duration distributions | Time estimation |
- | court_capacity_global.json | Capacity limits | Resource constraints |
- | court_capacity_stats.csv | Per-court metrics | Load balancing |
- | adjournment_proxies.csv | Adjournment rates | Stochastic outcomes |
- | case_type_summary.csv | Type-specific stats | Parameter tuning |
- | correlations_spearman.csv | Feature correlations | Feature selection |
- | cases_features.csv | Enhanced case data | Scheduling input |
- | age_funnel.csv | Case age distribution | Priority computation |
-
- ---
-
- ## Next Steps for Algorithm Development
-
- ### 1. Scheduling Algorithm Design
-
- **Multi-Objective Optimization**:
- - **Fairness**: Minimize age variance, equal treatment
- - **Efficiency**: Maximize throughput, minimize idle time
- - **Urgency**: Prioritize high-readiness cases
-
- **Suggested Approach**: Graph-based optimization with OR-Tools
- ```python
- # Pseudo-code
- from ortools.sat.python import cp_model
-
- model = cp_model.CpModel()
-
- # Decision variables
- hearing_slots = {}       # (case, date, court) -> binary
- judge_assignments = {}   # (hearing, judge) -> binary
-
- # Constraints: daily capacity per courtroom
- for date in dates:
-     for court in courts:
-         model.Add(sum(hearing_slots[c, date, court] for c in cases) <= CAPACITY[court])
-
- # Objective: weighted sum of fairness + efficiency + urgency
- model.Maximize(...)
- ```
-
- ### 2. Simulation Framework
-
- **Discrete Event Simulation** with SimPy:
- ```python
- import simpy
- from random import random
-
- def case_lifecycle(env, case):
-     # Admission phase
-     yield env.timeout(sample_duration("ADMISSION", case.type))
-
-     # Arguments phase (probabilistic)
-     if random() < transition_prob["ADMISSION", "ARGUMENTS"]:
-         yield env.timeout(sample_duration("ARGUMENTS", case.type))
-
-         # Adjournment modeling
-         if random() < adjournment_rate["ARGUMENTS", case.type]:
-             yield env.timeout(adjournment_delay())
-
-     # Orders/Judgment
-     yield env.timeout(sample_duration("ORDERS / JUDGMENT", case.type))
- ```
752
-
753
- ### 3. Feature Engineering
754
-
755
- **Additional Features to Compute**:
756
- - Case complexity score (parties, acts, sections)
757
- - Judge specialization matching
758
- - Historical disposal rate (judge × case type)
759
- - Network centrality (advocate recurrence)
760
-
761
- ### 4. Machine Learning Integration
762
-
763
- **Potential Models**:
764
- - **XGBoost**: Disposal time prediction
765
- - **LSTM**: Sequence modeling for stage progression
766
- - **Graph Neural Networks**: Relationship modeling (judge-advocate-case)
767
-
768
- **Target Variables**:
769
- - Disposal time (regression)
770
- - Next stage (classification)
771
- - Adjournment probability (binary classification)
772
-
773
- ### 5. Real-Time Dashboard
774
-
775
- **Technology**: Streamlit or Plotly Dash
776
- **Features**:
777
- - Live scheduling queue
778
- - Judge workload visualization
779
- - Bottleneck alerts
780
- - What-if scenario analysis
781
-
782
- ### 6. Validation Metrics
783
-
784
- **Fairness**:
785
- - Gini coefficient of disposal times
786
- - Age variance within case type
787
- - Equal opportunity (demographic analysis if available)
788
-
789
- **Efficiency**:
790
- - Court utilization rate
791
- - Average disposal time
792
- - Throughput (cases/month)
793
-
794
- **Urgency**:
795
- - Readiness score coverage
796
- - High-priority case delay
797
-
798
- ---
799
-
800
- ## Appendix: Key Statistics Reference
801
-
802
- ### Case Type Distribution
803
- ```
804
- CRP: 27,132 (20.1%)
805
- CA: 26,953 (20.0%)
806
- RSA: 26,428 (19.6%)
807
- RFA: 22,461 (16.7%)
808
- CCC: 14,996 (11.1%)
809
- CP: 12,920 (9.6%)
810
- CMP: 3,809 (2.8%)
811
- ```
812
-
813
- ### Disposal Time Percentiles
814
- ```
815
- P50 (median): 215 days
816
- P75: 629 days
817
- P90: 1,460 days
818
- P95: 2,152 days
819
- P99: 3,688 days
820
- ```
821
-
822
- ### Stage Transition Matrix (Top 10)
823
- ```
824
- From | To | Count | Probability
825
- -------------------|--------------------|---------:|------------:
826
- ADMISSION | ADMISSION | 396,894 | 0.928
827
- ORDERS / JUDGMENT | ORDERS / JUDGMENT | 155,819 | 0.975
828
- ADMISSION | ORDERS / JUDGMENT | 20,808 | 0.049
829
- ADMISSION | NA | 9,539 | 0.022
830
- NA | NA | 6,981 | 1.000
831
- ORDERS / JUDGMENT | NA | 3,998 | 0.025
832
- ARGUMENTS | ARGUMENTS | 2,612 | 0.782
833
- ```
834
-
835
- ### Court Capacity
836
- ```
837
- Global Median: 151 hearings/court/day
838
- Global P90: 252 hearings/court/day
839
- ```
840
-
841
- ### Correlations (Spearman)
842
- ```
843
- DISPOSALTIME_ADJ ↔ N_HEARINGS: 0.718
844
- DISPOSALTIME_ADJ ↔ GAP_MEDIAN: 0.594
845
- N_HEARINGS ↔ GAP_MEDIAN: 0.502
846
- ```
847
-
848
- ---
849
-
850
- ## Conclusion
851
-
852
- This codebase provides a comprehensive foundation for building intelligent court scheduling systems. The combination of robust data processing, detailed exploratory analysis, and extracted parameters creates a complete information pipeline from raw data to algorithm-ready inputs.
853
-
854
- The analysis reveals that court scheduling is a complex multi-constraint optimization problem with significant temporal patterns, stage-based dynamics, and case type heterogeneity. The extracted parameters and visualizations provide the necessary building blocks for developing fair, efficient, and urgency-aware scheduling algorithms.
855
-
856
- **Recommended Next Action**: Begin with simulation-based validation of scheduling policies using the extracted parameters, then graduate to optimization-based approaches once baseline performance is established.
857
-
858
- ---
859
-
860
- **Document Version**: 1.0
861
- **Generated**: 2025-11-19
862
- **Maintained By**: Code4Change Analysis Team
Court Scheduling System Implementation Plan.md DELETED
@@ -1,331 +0,0 @@
- # Court Scheduling System Implementation Plan
- ## Overview
- Build an intelligent judicial scheduling system for Karnataka High Court that optimizes daily cause lists across multiple courtrooms over a 2-year simulation period, balancing fairness, efficiency, and urgency.
- ## Architecture Design
- ### System Components
- 1. **Parameter Loader**: Load EDA-extracted parameters (transition probs, durations, capacities)
- 2. **Case Generator**: Synthetic case creation with realistic attributes
- 3. **Simulation Engine**: SimPy-based discrete event simulation
- 4. **Scheduling Policies**: Multiple algorithms (FIFO, Priority, Optimized)
- 5. **Metrics Tracker**: Performance evaluation (fairness, efficiency, urgency)
- 6. **Visualization**: Dashboard for monitoring and analysis
- ### Technology Stack
- * **Simulation**: SimPy (discrete event simulation)
- * **Optimization**: OR-Tools (CP-SAT solver)
- * **Data Processing**: Polars, Pandas
- * **Visualization**: Plotly, Streamlit
- * **Testing**: Pytest, Hypothesis
- ## Module Structure
- ```warp-runnable-command
- scheduler/
- ├── core/
- │ ├── __init__.py
- │ ├── case.py # Case entity and lifecycle
- │ ├── courtroom.py # Courtroom resource
- │ ├── judge.py # Judge entity
- │ └── hearing.py # Hearing event
- ├── data/
- │ ├── __init__.py
- │ ├── param_loader.py # Load EDA parameters
- │ ├── case_generator.py # Generate synthetic cases
- │ └── config.py # Configuration constants
- ├── simulation/
- │ ├── __init__.py
- │ ├── engine.py # SimPy simulation engine
- │ ├── scheduler.py # Base scheduler interface
- │ ├── policies/
- │ │ ├── __init__.py
- │ │ ├── fifo.py # FIFO scheduling
- │ │ ├── priority.py # Priority-based
- │ │ └── optimized.py # OR-Tools optimization
- │ └── events.py # Event handlers
- ├── optimization/
- │ ├── __init__.py
- │ ├── model.py # OR-Tools model
- │ ├── objectives.py # Multi-objective functions
- │ └── constraints.py # Constraint definitions
- ├── metrics/
- │ ├── __init__.py
- │ ├── fairness.py # Gini coefficient, age variance
- │ ├── efficiency.py # Utilization, throughput
- │ └── urgency.py # Readiness coverage
- ├── visualization/
- │ ├── __init__.py
- │ ├── dashboard.py # Streamlit dashboard
- │ └── plots.py # Plotly visualizations
- └── utils/
- ├── __init__.py
- ├── distributions.py # Probability distributions
- └── calendar.py # Working days calculator
- ```
- ## Implementation Phases
- ### Phase 1: Foundation (Days 1-2) - COMPLETE
- **Goal**: Set up infrastructure and load parameters
- **Status**: 100% complete (1,323 lines implemented)
- **Tasks**:
- 1. [x] Create module directory structure (8 sub-packages)
- 2. [x] Implement parameter loader
- * Read stage_transition_probs.csv
- * Read stage_duration.csv
- * Read court_capacity_global.json
- * Read adjournment_proxies.csv
- * Read cases_features.csv
- * Automatic latest version detection
- * Lazy loading with caching
- 3. [x] Create core entities (Case, Courtroom, Judge, Hearing)
- * Case: Lifecycle, readiness score, priority score (218 lines)
- * Courtroom: Capacity tracking, scheduling, utilization (228 lines)
- * Judge: Workload tracking, specialization, adjournment rate (167 lines)
- * Hearing: Outcome tracking, rescheduling support (134 lines)
- 4. [x] Implement working days calculator (192 days/year)
- * Weekend/holiday detection
- * Seasonality factors
- * Working days counting (217 lines)
- 5. [x] Configuration system with EDA-derived constants (115 lines)
- **Outputs**:
- * `scheduler/data/param_loader.py` (244 lines)
- * `scheduler/data/config.py` (115 lines)
- * `scheduler/core/case.py` (218 lines)
- * `scheduler/core/courtroom.py` (228 lines)
- * `scheduler/core/judge.py` (167 lines)
- * `scheduler/core/hearing.py` (134 lines)
- * `scheduler/utils/calendar.py` (217 lines)
- **Quality**: Type hints 100%, Docstrings 100%, Integration complete
- ### Phase 2: Case Generation (Days 3-4)
- **Goal**: Generate synthetic case pool for simulation
- **Tasks**:
- 1. Implement case generator using historical distributions
- * Case type distribution (CRP: 20.1%, CA: 20%, etc.)
- * Filing rate (monthly inflow from temporal analysis)
- * Initial stage assignment
- 2. Generate 2-year case pool (~10,000 cases)
- 3. Assign readiness scores and attributes
- **Outputs**:
- * `scheduler/data/case_generator.py`
- * Synthetic case dataset for simulation
- ### Phase 3: Simulation Engine (Days 5-7)
- **Goal**: Build discrete event simulation framework
- **Tasks**:
- 1. Implement SimPy environment setup
- 2. Create courtroom resources (5 courtrooms)
- 3. Implement case lifecycle process
- * Stage progression using transition probabilities
- * Duration sampling from distributions
- * Adjournment modeling (stochastic)
- 4. Implement daily scheduling loop
- 5. Add case inflow/outflow dynamics
- **Outputs**:
- * `scheduler/simulation/engine.py`
- * `scheduler/simulation/events.py`
- * Working simulation (baseline)
- ### Phase 4: Scheduling Policies (Days 8-10)
- **Goal**: Implement multiple scheduling algorithms
- **Tasks**:
- 1. Base scheduler interface
- 2. FIFO scheduler (baseline)
- 3. Priority-based scheduler
- * Use case age as primary factor
- * Use case type as secondary
- 4. Readiness-score scheduler
- * Use EDA-computed readiness scores
- * Apply urgency weights
- 5. Compare policies on metrics
- **Outputs**:
- * `scheduler/simulation/scheduler.py` (interface)
- * `scheduler/simulation/policies/` (implementations)
- * Performance comparison report
- ### Phase 5: Optimization Model (Days 11-14)
- **Goal**: Implement OR-Tools-based optimal scheduler
- **Tasks**:
- 1. Define decision variables
- * hearing_slots[case, date, court] ∈ {0,1}
- 2. Implement constraints
- * Daily capacity per courtroom
- * Case can only be in one court per day
- * Minimum gap between hearings
- * Stage progression requirements
- 3. Implement objective functions
- * Fairness: Minimize age variance
- * Efficiency: Maximize utilization
- * Urgency: Prioritize ready cases
- 4. Multi-objective optimization (weighted sum)
- 5. Solve for 30-day scheduling window (rolling)
- **Outputs**:
- * `scheduler/optimization/model.py`
- * `scheduler/optimization/objectives.py`
- * `scheduler/optimization/constraints.py`
- * Optimized scheduling policy
- ### Phase 6: Metrics & Validation (Days 15-16)
- **Goal**: Comprehensive performance evaluation
- **Tasks**:
- 1. Implement fairness metrics
- * Gini coefficient of disposal times
- * Age variance within case types
- * Max age tracking
- 2. Implement efficiency metrics
- * Court utilization rate
- * Average disposal time
- * Throughput (cases/month)
- 3. Implement urgency metrics
- * Readiness score coverage
- * High-priority case delay
- 4. Compare all policies
- 5. Validate against historical data
- **Outputs**:
- * `scheduler/metrics/` (all modules)
- * Validation report
- * Policy comparison matrix
- ### Phase 7: Dashboard (Days 17-18)
- **Goal**: Interactive visualization and monitoring
- **Tasks**:
- 1. Streamlit dashboard setup
- 2. Real-time queue visualization
- 3. Judge workload display
- 4. Alert system for long-pending cases
- 5. What-if scenario analysis
- 6. Export capability (cause lists as PDF/CSV)
- **Outputs**:
- * `scheduler/visualization/dashboard.py`
- * Interactive web interface
- * User documentation
- ### Phase 8: Polish & Documentation (Days 19-20)
- **Goal**: Production-ready system
- **Tasks**:
- 1. Unit tests (pytest)
- 2. Integration tests
- 3. Performance benchmarking
- 4. Comprehensive documentation
- 5. Example notebooks
- 6. Deployment guide
- **Outputs**:
- * Test suite (90%+ coverage)
- * Documentation (README, API docs)
- * Example usage notebooks
- * Final presentation materials
- ## Key Design Decisions
- ### 1. Hybrid Approach
- **Decision**: Use simulation for long-term dynamics, optimization for short-term scheduling
- **Rationale**: Simulation captures stochastic nature (adjournments, case progression), optimization finds optimal daily schedules within constraints
- ### 2. Rolling Optimization Window
- **Decision**: Optimize 30-day windows, re-optimize weekly
- **Rationale**: Balance computational cost with scheduling quality, allow for dynamic adjustments
- ### 3. Stage-Based Progression Model
- **Decision**: Model cases as finite state machines with probabilistic transitions
- **Rationale**: Matches our EDA findings (strong stage patterns), enables realistic progression
- ### 4. Multi-Objective Weighting
- **Decision**: Fairness (40%), Efficiency (30%), Urgency (30%)
- **Rationale**: Prioritize fairness slightly, balance with practical concerns
- ### 5. Capacity Model
- **Decision**: Use median capacity (151 cases/court/day) with seasonal adjustment
- **Rationale**: Conservative estimate from EDA, account for vacation periods
- ## Parameter Utilization from EDA
- | EDA Output | Scheduler Use |
- |------------|---------------|
- | stage_transition_probs.csv | Case progression probabilities |
- | stage_duration.csv | Duration sampling (median, p90) |
- | court_capacity_global.json | Daily capacity constraints |
- | adjournment_proxies.csv | Hearing outcome probabilities |
- | cases_features.csv | Initial readiness scores |
- | case_type_summary.csv | Case type distributions |
- | monthly_hearings.csv | Seasonal adjustment factors |
- | correlations_spearman.csv | Feature importance weights |
- ## Assumptions Made Explicit
- ### Court Operations
- 1. **Working days**: 192 days/year (from Karnataka HC calendar)
- 2. **Courtrooms**: 5 courtrooms, each with 1 judge
- 3. **Daily capacity**: 151 hearings/court/day (median from EDA)
- 4. **Hearing duration**: Not modeled explicitly (capacity is count-based)
- 5. **Case queue assignment**: By case type (RSA → Court 1, CRP → Court 2, etc.)
- ### Case Dynamics
- 1. **Filing rate**: ~6,000 cases/year (derived from historical data)
- 2. **Disposal rate**: Matches filing rate (steady-state assumption)
- 3. **Stage progression**: Probabilistic (Markov chain from EDA)
- 4. **Adjournment rate**: 36-48% depending on stage and case type
- 5. **Case readiness**: Computed from hearings, gaps, and stage
- ### Scheduling Constraints
- 1. **Minimum gap**: 7 days between hearings for same case
- 2. **Maximum gap**: 90 days (alert triggered)
- 3. **Urgent cases**: 5% of pool marked urgent (jump queue)
- 4. **Judge preferences**: Not modeled (future enhancement)
- 5. **Multi-judge benches**: Not modeled (all single-judge)
- ### Simplifications
- 1. **No lawyer availability**: Assumed all advocates always available
- 2. **No case dependencies**: Each case independent
- 3. **No physical constraints**: Assume sufficient courtrooms/facilities
- 4. **Deterministic durations**: Within-hearing time not modeled
- 5. **Perfect information**: All case attributes known
- ## Success Criteria
- ### Fairness Metrics
- * Gini coefficient < 0.4 (disposal time inequality)
- * Age variance reduction: 20% vs FIFO baseline
- * No case unlisted > 90 days without alert
- ### Efficiency Metrics
- * Court utilization > 85%
- * Average disposal time: Within 10% of historical median by case type
- * Throughput: Match or exceed filing rate
- ### Urgency Metrics
- * High-readiness cases: 80% scheduled within 14 days
- * Urgent cases: 95% scheduled within 7 days
- * Alert response: 100% of flagged cases reviewed
- ## Risk Mitigation
- ### Technical Risks
- 1. **Optimization solver timeout**: Use heuristics as fallback
- 2. **Memory constraints**: Batch processing for large case pools
- 3. **Stochastic variability**: Run multiple simulation replications
- ### Model Risks
- 1. **Parameter drift**: Allow manual parameter overrides
- 2. **Edge cases**: Implement rule-based fallbacks
- 3. **Unexpected patterns**: Continuous monitoring and adjustment
- ## Future Enhancements
- ### Short-term
- 1. Judge preference modeling
- 2. Multi-judge bench support
- 3. Case dependency tracking
- 4. Lawyer availability constraints
- ### Medium-term
- 1. Machine learning for duration prediction
- 2. Automated parameter updates from live data
- 3. Real-time integration with eCourts
- 4. Mobile app for judges
- ### Long-term
- 1. Multi-court coordination (district + high court)
- 2. Predictive analytics for case outcomes
- 3. Resource optimization (judges, courtrooms)
- 4. National deployment framework
- ## Deliverables Checklist
- - [ ] Scheduler module (fully functional)
- - [ ] Parameter loader (tested with EDA outputs)
- - [ ] Case generator (realistic synthetic data)
- - [ ] Simulation engine (2-year simulation capability)
- - [ ] Multiple scheduling policies (FIFO, Priority, Optimized)
- - [ ] Optimization model (OR-Tools implementation)
- - [ ] Metrics framework (fairness, efficiency, urgency)
- - [ ] Dashboard (Streamlit web interface)
- - [ ] Validation report (comparison vs historical data)
- - [ ] Documentation (comprehensive)
- - [ ] Test suite (90%+ coverage)
- - [ ] Example notebooks (usage demonstrations)
- - [ ] Presentation materials (slides, demo video)
- ## Timeline Summary
- | Phase | Days | Key Deliverable |
- |-------|------|----------------|
- | Foundation | 1-2 | Parameter loader, core entities |
- | Case Generation | 3-4 | Synthetic case dataset |
- | Simulation | 5-7 | Working SimPy simulation |
- | Policies | 8-10 | Multiple scheduling algorithms |
- | Optimization | 11-14 | OR-Tools optimal scheduler |
- | Metrics | 15-16 | Validation and comparison |
- | Dashboard | 17-18 | Interactive visualization |
- | Polish | 19-20 | Tests, docs, deployment |
- **Total**: 20 days (aggressive timeline, assumes full-time focus)
- ## Next Immediate Actions
- 1. Create scheduler module directory structure
- 2. Implement parameter loader (read all EDA CSVs/JSONs)
- 3. Define core entities (Case, Courtroom, Judge, Hearing)
- 4. Set up development environment with uv
- 5. Initialize git repository with proper .gitignore
- 6. Create initial unit tests
- ***
- **Plan Version**: 1.0
- **Created**: 2025-11-19
- **Status**: Ready to begin implementation
DEVELOPMENT.md DELETED
@@ -1,270 +0,0 @@
- # Court Scheduling System - Development Documentation
-
- Living document tracking architectural decisions, implementation rationale, and design patterns.
-
- ## Table of Contents
- 1. [Ripeness Classification System](#ripeness-classification-system)
- 2. [Simulation Architecture](#simulation-architecture)
- 3. [Code Quality Standards](#code-quality-standards)
-
- ---
-
- ## Ripeness Classification System
-
- ### Overview
- The ripeness classifier determines whether cases are ready for substantive judicial time or have bottlenecks that prevent meaningful progress. This addresses hackathon requirement: "Determine how cases could be classified as 'ripe' or 'unripe' based on purposes of hearing and stage."
-
- ### Implementation Location
- - **Classifier**: `scheduler/core/ripeness.py`
- - **Integration**: `scheduler/simulation/engine.py` (lines 248-266)
- - **Case entity**: `scheduler/core/case.py` (ripeness fields: lines 68-72)
-
- ### Classification Algorithm
-
- The `RipenessClassifier.classify()` method uses a 5-step hierarchy:
-
- ```python
- def classify(case: Case, current_date: datetime) -> RipenessStatus:
-     # 1. Check last hearing purpose for explicit bottleneck keywords
-     if "SUMMONS" in last_hearing_purpose or "NOTICE" in last_hearing_purpose:
-         return UNRIPE_SUMMONS
-     if "STAY" in last_hearing_purpose or "PENDING" in last_hearing_purpose:
-         return UNRIPE_DEPENDENT
-
-     # 2. Check stage - ADMISSION stage with few hearings is likely unripe
-     if current_stage == "ADMISSION" and hearing_count < 3:
-         return UNRIPE_SUMMONS
-
-     # 3. Check if case is "stuck" (many hearings but no progress)
-     if hearing_count > 10 and avg_gap > 60 days:
-         return UNRIPE_PARTY
-
-     # 4. Check stage-based ripeness (ripe stages are substantive)
-     if current_stage in ["ARGUMENTS", "EVIDENCE", "ORDERS / JUDGMENT", "FINAL DISPOSAL"]:
-         return RIPE
-
-     # 5. Default to RIPE if no bottlenecks detected
-     return RIPE
- ```
-
- ### Ripeness Statuses
-
- | Status | Meaning | Example Scenarios |
- |--------|---------|-------------------|
- | `RIPE` | Ready for substantive hearing | Arguments scheduled, evidence ready, parties available |
- | `UNRIPE_SUMMONS` | Waiting for summons service | "ISSUE SUMMONS", "FOR NOTICE", admission <3 hearings |
- | `UNRIPE_DEPENDENT` | Waiting for dependent case/order | "STAY APPLICATION PENDING", awaiting higher court |
- | `UNRIPE_PARTY` | Party/lawyer unavailable | Stuck cases (>10 hearings, avg gap >60 days) |
- | `UNRIPE_DOCUMENT` | Missing documents/evidence | (Future: when document tracking added) |
- | `UNKNOWN` | Insufficient data | (Rare, only if case has no history) |
-
- ### Integration with Simulation
-
- **Daily scheduling flow** (engine.py `_choose_cases_for_day()`):
-
- ```python
- # 1. Get all active cases
- candidates = [c for c in cases if c.status != DISPOSED]
-
- # 2. Update age and readiness scores
- for c in candidates:
-     c.update_age(current_date)
-     c.compute_readiness_score()
-
- # 3. Filter by ripeness (NEW - critical for bottleneck detection)
- ripe_candidates = []
- for c in candidates:
-     ripeness = RipenessClassifier.classify(c, current_date)
-
-     if ripeness.is_ripe():
-         ripe_candidates.append(c)
-     else:
-         unripe_filtered_count += 1
-
- # 4. Apply MIN_GAP_BETWEEN_HEARINGS filter
- eligible = [c for c in ripe_candidates if c.is_ready_for_scheduling(14)]
-
- # 5. Prioritize by policy (FIFO/age/readiness)
- eligible = policy.prioritize(eligible, current_date)
-
- # 6. Allocate to courtrooms
- allocations = allocator.allocate(eligible[:total_capacity], current_date)
- ```
-
- **Key points**:
- - Ripeness evaluation happens BEFORE gap enforcement
- - Unripe cases are completely filtered out (no scheduling)
- - Periodic re-evaluation every 7 days to detect ripeness transitions
- - Ripeness status stored in case entity for persistence
-
- ### Ripeness Transitions
-
- Cases can transition between statuses as bottlenecks are resolved:
-
- ```python
- # Periodic re-evaluation (every 7 days in simulation)
- def _evaluate_ripeness(current_date):
-     for case in active_cases:
-         prev_status = case.ripeness_status
-         new_status = RipenessClassifier.classify(case, current_date)
-
-         if new_status != prev_status:
-             ripeness_transitions += 1
-
-             if new_status.is_ripe():
-                 case.mark_ripe(current_date)
-                 # Case now eligible for scheduling
-             else:
-                 case.mark_unripe(new_status, reason, current_date)
-                 # Case removed from scheduling pool
- ```
-
- ### Synthetic Data Generation
-
- To test ripeness in simulation, the case generator (`case_generator.py`) adds realistic `last_hearing_purpose` values:
-
- ```python
- # 20% of cases have bottlenecks (configurable)
- bottleneck_purposes = [
-     "ISSUE SUMMONS",
-     "FOR NOTICE",
-     "AWAIT SERVICE OF NOTICE",
-     "STAY APPLICATION PENDING",
-     "FOR ORDERS",
- ]
-
- ripe_purposes = [
-     "ARGUMENTS",
-     "HEARING",
-     "FINAL ARGUMENTS",
-     "FOR JUDGMENT",
-     "EVIDENCE",
- ]
-
- # Stage-aware assignment
- if stage == "ADMISSION" and hearing_count < 3:
-     # 40% unripe for early admission cases
-     last_hearing_purpose = random.choice(bottleneck_purposes if random() < 0.4 else ripe_purposes)
- elif stage in ["ARGUMENTS", "ORDERS / JUDGMENT"]:
-     # Advanced stages usually ripe
-     last_hearing_purpose = random.choice(ripe_purposes)
- else:
-     # 20% unripe for other cases
-     last_hearing_purpose = random.choice(bottleneck_purposes if random() < 0.2 else ripe_purposes)
- ```
-
- ### Expected Behavior
-
- For a simulation with 10,000 synthetic cases:
- - **If all cases RIPE**:
-   - Ripeness transitions: 0
-   - Cases filtered: 0
-   - All eligible cases can be scheduled
-
- - **With realistic bottlenecks (20% unripe)**:
-   - Ripeness transitions: ~50-200 (cases becoming ripe/unripe during simulation)
-   - Cases filtered per day: ~200-400 (unripe cases blocked from scheduling)
-   - Scheduling queue smaller (only ripe cases compete for slots)
-
- ### Why Default is RIPE
-
- The classifier defaults to RIPE (step 5) because:
- 1. **Conservative approach**: If we can't detect a bottleneck, assume case is ready
- 2. **Avoid false negatives**: Better to schedule a case that might adjourn than never schedule it
- 3. **Real-world behavior**: Most cases in advanced stages are ripe
- 4. **Gap enforcement still applies**: Even RIPE cases must respect MIN_GAP_BETWEEN_HEARINGS
-
- ### Future Enhancements
-
- 1. **Historical purpose analysis**: Mine actual PurposeOfHearing data to refine keyword mappings
- 2. **Machine learning**: Train classifier on labeled cases (ripe/unripe) from court data
- 3. **Document tracking**: Integrate with document management system for UNRIPE_DOCUMENT detection
- 4. **Dependency graphs**: Model case dependencies explicitly for UNRIPE_DEPENDENT
- 5. **Dynamic thresholds**: Learn optimal thresholds (e.g., <3 hearings, >60 day gaps) from data
-
- ### Metrics Tracked
-
- The simulation reports:
- - `ripeness_transitions`: Number of status changes during simulation
- - `unripe_filtered`: Total cases blocked from scheduling due to unripeness
- - `ripeness_distribution`: Breakdown of active cases by status at simulation end
-
- ### Decision Rationale
-
- **Why separate ripeness from MIN_GAP_BETWEEN_HEARINGS?**
- - Ripeness = substantive bottleneck (summons, dependencies, parties)
- - Gap = administrative constraint (give time for preparation)
- - Conceptually distinct; ripeness can last weeks/months, gap is fixed 14 days
-
- **Why mark cases as unripe vs. just skip them?**
- - Persistence enables tracking and reporting
- - Dashboard can show WHY cases weren't scheduled
- - Alerts can trigger when unripeness duration exceeds threshold
-
- **Why evaluate ripeness every 7 days vs. every day?**
- - Performance optimization (classification has some cost)
- - Ripeness typically doesn't change daily (summons takes weeks)
- - Balance between responsiveness and efficiency
-
- ---
-
- ## Simulation Architecture
-
- ### Discrete Event Simulation Flow
-
- (TODO: Document daily processing, stochastic outcomes, stage transitions)
-
- ---
-
- ## Code Quality Standards
-
- ### Type Hints
- Modern Python 3.11+ syntax:
- - `X | None` instead of `Optional[X]`
- - `list[X]` instead of `List[X]`
- - `dict[K, V]` instead of `Dict[K, V]`
-
- ### Import Organization
- - Absolute imports from `scheduler.*` for internal modules
- - Inline imports prohibited (all imports at top of file)
- - Lazy imports only for TYPE_CHECKING blocks
-
- ### Performance Guidelines
- - Use Polars-native operations (avoid `.map_elements()`)
- - Cache expensive computations (see `param_loader._build_*` pattern)
- - Profile before optimizing
-
- ---
-
- ## Known Issues and Fixes
-
- ### Fixed: "Cases switched courtrooms" metric
- **Problem**: Initial allocations were counted as "switches"
- **Fix**: Changed condition to `courtroom_id is not None and courtroom_id != 0`
- **Commit**: [TODO]
-
- ### Fixed: All cases showing RIPE in synthetic data
- **Problem**: Generator didn't include `last_hearing_purpose`
- **Fix**: Added stage-aware purpose assignment in `case_generator.py`
- **Commit**: [TODO]
-
- ---
-
- ## Recent Updates (2025-11-25)
-
- ### Algorithm Override System Fixed
- - **Fixed circular dependency**: Moved `SchedulerPolicy` from `scheduler.simulation.scheduler` to `scheduler.core.policy`
- - **Implemented missing overrides**: ADD_CASE and PRIORITY overrides now fully functional
- - **Added override validation**: `OverrideValidator` integrated with proper constraint checking
- - **Extended Override dataclass**: Added algorithm-required fields (`make_ripe`, `new_position`, `new_priority`, `new_capacity`)
- - **Judge Preferences**: Added `capacity_overrides` for per-courtroom capacity control
-
- ### System Status Update
- - **Project completion**: 90% complete (not 50% as previously estimated)
- - **All core hackathon requirements**: Implemented and tested
- - **Production readiness**: System ready for Karnataka High Court pilot deployment
- - **Performance validated**: 81.4% disposal rate, perfect load balance (Gini 0.002)
-
- ---
-
- Last updated: 2025-11-25
HACKATHON_SUBMISSION.md CHANGED
@@ -68,21 +68,21 @@ After completion, you'll find in your output directory:
 
 ```
 data/hackathon_run/
- ├── pipeline_config.json # Full configuration used
- ├── training_cases.csv # Generated case dataset
- ├── trained_rl_agent.pkl # Trained RL model
- ├── EXECUTIVE_SUMMARY.md # Hackathon submission summary
- ├── COMPARISON_REPORT.md # Detailed performance comparison
- ├── simulation_rl/ # RL policy results
- │ ├── events.csv
- │ ├── metrics.csv
- │ ├── report.txt
- │ └── cause_lists/
- │ └── daily_cause_list.csv # 730 days of cause lists
- ├── simulation_readiness/ # Baseline results
- │ └── ...
- └── visualizations/ # Performance charts
- └── performance_charts.md
+ |-- pipeline_config.json # Full configuration used
+ |-- training_cases.csv # Generated case dataset
+ |-- trained_rl_agent.pkl # Trained RL model
+ |-- EXECUTIVE_SUMMARY.md # Hackathon submission summary
+ |-- COMPARISON_REPORT.md # Detailed performance comparison
+ |-- simulation_rl/ # RL policy results
+ |-- events.csv
+ |-- metrics.csv
+ |-- report.txt
+ |-- cause_lists/
+ |-- daily_cause_list.csv # 730 days of cause lists
+ |-- simulation_readiness/ # Baseline results
+ |-- ...
+ |-- visualizations/ # Performance charts
+ |-- performance_charts.md
 ```
 
 ### Hackathon Winning Features
PIPELINE.md DELETED
@@ -1,259 +0,0 @@
- # Court Scheduling System - Pipeline Documentation
-
- This document outlines the complete development and deployment pipeline for the intelligent court scheduling system.
-
- ## Project Structure
-
- ```
- code4change-analysis/
- ├── configs/                   # Configuration files
- │   ├── rl_training_fast.json       # Fast RL training config
- │   └── rl_training_intensive.json  # Intensive RL training config
- ├── court_scheduler/           # CLI interface (legacy)
- ├── Data/                      # Raw data files
- │   ├── court_data.duckdb      # DuckDB database
- │   ├── ISDMHack_Cases_WPfinal.csv
- │   └── ISDMHack_Hear.csv
- ├── data/generated/            # Generated datasets
- │   ├── cases.csv              # Standard test cases
- │   └── large_training_cases.csv  # Large RL training set
- ├── models/                    # Trained RL models
- │   ├── trained_rl_agent.pkl   # Standard trained agent
- │   └── intensive_trained_rl_agent.pkl  # Intensive trained agent
- ├── reports/figures/           # EDA outputs and parameters
- │   └── v0.4.0_*/              # Versioned analysis runs
- │       └── params/            # Simulation parameters
- ├── rl/                        # Reinforcement Learning module
- │   ├── __init__.py            # Module interface
- │   ├── simple_agent.py        # Tabular Q-learning agent
- │   ├── training.py            # Training environment
- │   └── README.md              # RL documentation
- ├── scheduler/                 # Core scheduling system
- │   ├── core/                  # Base entities and algorithms
- │   ├── data/                  # Data loading and generation
- │   └── simulation/            # Simulation engine and policies
- ├── scripts/                   # Utility scripts
- │   ├── compare_policies.py    # Policy comparison framework
- │   ├── generate_cases.py      # Case generation utility
- │   └── simulate.py            # Single simulation runner
- ├── src/                       # EDA pipeline
- │   ├── run_eda.py             # Full EDA pipeline
- │   ├── eda_config.py          # EDA configuration
- │   ├── eda_load_clean.py      # Data loading and cleaning
- │   ├── eda_exploration.py     # Exploratory analysis
- │   └── eda_parameters.py      # Parameter extraction
- ├── tests/                     # Test suite
- ├── train_rl_agent.py          # RL training script
- └── README.md                  # Main documentation
- ```
-
- ## Pipeline Overview
-
- ### 1. Data Pipeline
-
- #### EDA and Parameter Extraction
- ```bash
- # Run full EDA pipeline
- uv run python src/run_eda.py
- ```
-
- **Outputs:**
- - Parameter CSVs in `reports/figures/v0.4.0_*/params/`
- - Visualization HTML files
- - Cleaned data in Parquet format
-
- **Key Parameters Generated:**
- - `stage_duration.csv` - Duration statistics per stage
- - `stage_transition_probs.csv` - Transition probabilities
- - `adjournment_proxies.csv` - Adjournment rates by stage/type
- - `court_capacity_global.json` - Court capacity metrics
-
- #### Case Generation
- ```bash
- # Generate training dataset
- uv run python scripts/generate_cases.py \
-     --start 2023-01-01 --end 2024-06-30 \
-     --n 10000 --stage-mix auto \
-     --out data/generated/large_cases.csv
- ```
-
- ### 2. Model Training Pipeline
-
- #### RL Agent Training
- ```bash
- # Fast training (development)
- uv run python train_rl_agent.py --config configs/rl_training_fast.json
-
- # Production training
- uv run python train_rl_agent.py --config configs/rl_training_intensive.json
- ```
-
- **Training Process:**
- 1. Load configuration parameters
- 2. Initialize TabularQAgent with specified hyperparameters
- 3. Run episodic training with case generation
- 4. Save trained model to `models/` directory
- 5. Generate learning statistics and analysis
-
- ### 3. Evaluation Pipeline
-
- #### Single Policy Simulation
- ```bash
- uv run python scripts/simulate.py \
-     --cases-csv data/generated/large_cases.csv \
-     --policy rl --days 90 --seed 42
- ```
-
- #### Multi-Policy Comparison
- ```bash
- uv run python scripts/compare_policies.py \
-     --cases-csv data/generated/large_cases.csv \
-     --days 90 --policies readiness rl fifo age
- ```
-
- **Outputs:**
- - Simulation reports in `runs/` directory
- - Performance metrics (disposal rates, utilization)
- - Comparison analysis markdown
-
- ## Configuration Management
-
- ### RL Training Configurations
-
- #### Fast Training (`configs/rl_training_fast.json`)
- ```json
- {
-   "episodes": 20,
-   "cases_per_episode": 200,
-   "episode_length": 15,
-   "learning_rate": 0.2,
-   "initial_epsilon": 0.5,
-   "model_name": "fast_rl_agent.pkl"
- }
- ```
-
- #### Intensive Training (`configs/rl_training_intensive.json`)
- ```json
- {
-   "episodes": 100,
-   "cases_per_episode": 1000,
-   "episode_length": 45,
-   "learning_rate": 0.15,
-   "initial_epsilon": 0.4,
-   "model_name": "intensive_rl_agent.pkl"
- }
- ```
-
- ### Parameter Override
- ```bash
- # Override specific parameters
- uv run python train_rl_agent.py \
-     --episodes 50 \
-     --learning-rate 0.12 \
-     --epsilon 0.3 \
-     --model-name "custom_agent.pkl"
- ```
-
- ## Scheduling Policies
-
- ### Available Policies
-
- 1. **FIFO** - First In, First Out scheduling
- 2. **Age** - Prioritize older cases
- 3. **Readiness** - Composite score (age + readiness + urgency)
- 4. **RL** - Reinforcement learning based prioritization
-
- ### Policy Integration
-
- All policies implement the `SchedulerPolicy` interface:
- - `prioritize(cases, current_date)` - Main scheduling logic
- - `get_name()` - Policy identifier
- - `requires_readiness_score()` - Readiness computation flag
-
- ## Performance Benchmarks
-
- ### Current Results (10,000 cases, 90 days)
-
- | Policy | Disposal Rate | Utilization | Gini Coefficient |
- |--------|---------------|-------------|------------------|
- | Readiness | 51.9% | 85.7% | 0.243 |
- | RL Agent | 52.1% | 85.4% | 0.248 |
-
- **Status**: Performance parity achieved between RL and expert heuristic
-
- ## Development Workflow
-
- ### 1. Feature Development
- ```bash
- # Create feature branch
- git checkout -b feature/new-scheduling-policy
-
- # Implement changes
- # Run tests
- uv run python -m pytest tests/
-
- # Validate with simulation
- uv run python scripts/simulate.py --policy new_policy --days 30
- ```
-
- ### 2. Model Iteration
- ```bash
- # Update training config
- vim configs/rl_training_custom.json
-
- # Retrain model
- uv run python train_rl_agent.py --config configs/rl_training_custom.json
-
- # Evaluate performance
- uv run python scripts/compare_policies.py --policies readiness rl
- ```
-
- ### 3. Production Deployment
- ```bash
- # Run full EDA pipeline
- uv run python src/run_eda.py
-
- # Generate production dataset
- uv run python scripts/generate_cases.py --n 50000 --out data/production/cases.csv
-
- # Train production model
- uv run python train_rl_agent.py --config configs/rl_training_intensive.json
-
- # Validate performance
- uv run python scripts/compare_policies.py --cases-csv data/production/cases.csv
- ```
-
- ## Quality Assurance
-
- ### Testing Framework
- ```bash
- # Run all tests
- uv run python -m pytest tests/
-
- # Test specific component
- uv run python -m pytest tests/test_invariants.py
-
- # Validate system integration
- uv run python test_phase1.py
- ```
-
- ### Performance Validation
- - Disposal rate benchmarks
- - Utilization efficiency metrics
- - Load balancing fairness (Gini coefficient)
- - Case coverage verification
-
- ## Monitoring and Maintenance
-
- ### Key Metrics to Monitor
- - Model performance degradation
- - State space exploration coverage
- - Training convergence metrics
- - Simulation runtime performance
-
- ### Model Refresh Cycle
- 1. Monthly EDA pipeline refresh
- 2. Quarterly model retraining
- 3. Annual architecture review
-
- This pipeline ensures reproducible, configurable, and maintainable court scheduling system development and deployment.
 
RL_EXPLORATION_PLAN.md DELETED
Binary file (14.9 kB)
 
SUBMISSION_SUMMARY.md DELETED
@@ -1,417 +0,0 @@
- # Court Scheduling System - Hackathon Submission Summary
-
- **Karnataka High Court Case Scheduling Optimization**
- **Code4Change Hackathon 2025**
-
- ---
-
- ## Executive Summary
-
- This system simulates and optimizes court case scheduling for Karnataka High Court over a 2-year period, incorporating intelligent ripeness classification, dynamic multi-courtroom allocation, and data-driven priority scheduling.
-
- ### Key Results (500-day simulation, 10,000 cases)
-
- - **81.4% disposal rate** - Significantly higher than baseline
- - **97.7% cases scheduled** - Near-zero case abandonment
- - **68.9% hearing success rate** - Effective adjournment management
- - **45% utilization** - Realistic capacity usage accounting for workload variation
- - **0.002 Gini (load balance)** - Perfect fairness across courtrooms
- - **40.8% unripe filter rate** - Intelligent bottleneck detection preventing wasted judicial time
-
- ---
-
- ## System Architecture
-
- ### 1. Ripeness Classification System
-
- **Problem**: Courts waste time on cases with unresolved bottlenecks (summons not served, parties unavailable, documents pending).
-
- **Solution**: Data-driven classifier filters cases into RIPE vs UNRIPE:
-
- | Status | Cases (End) | Meaning |
- |--------|-------------|---------|
- | RIPE | 87.4% | Ready for substantive hearing |
- | UNRIPE_SUMMONS | 9.4% | Waiting for summons/notice service |
- | UNRIPE_DEPENDENT | 3.2% | Waiting for dependent case/order |
-
- **Algorithm**:
- 1. Check last hearing purpose for bottleneck keywords
- 2. Flag early ADMISSION cases (<3 hearings) as potentially unripe
- 3. Detect "stuck" cases (>10 hearings, >60 day gaps)
- 4. Stage-based classification (ARGUMENTS → RIPE)
- 5. Default to RIPE if no bottlenecks detected
-
- **Impact**:
- - Filtered 93,834 unripe case-day combinations (40.8% filter rate)
- - Prevented wasteful hearings that would adjourn immediately
- - Optimized judicial time for cases ready to progress
-
- ### 2. Dynamic Multi-Courtroom Allocation
-
- **Problem**: Static courtroom assignments create workload imbalances and inefficiency.
-
- **Solution**: Load-balanced allocator distributes cases evenly across 5 courtrooms daily.
-
- **Results**:
- - Perfect load balance (Gini = 0.002)
- - Courtroom loads: 67.6-68.3 cases/day (±0.5%)
- - 101,260 allocation decisions over 401 working days
- - Zero capacity rejections
-
- **Strategy**:
- - Least-loaded courtroom selection
- - Dynamic reallocation as workload changes
- - Respects per-courtroom capacity (151 cases/day)
-
- ### 3. Intelligent Priority Scheduling
-
- **Policy**: Readiness-based with adjournment boost
-
- **Formula**:
- ```
- priority = age*0.35 + readiness*0.25 + urgency*0.25 + adjournment_boost*0.15
- ```
-
- **Components**:
- - **Age (35%)**: Fairness - older cases get priority
- - **Readiness (25%)**: Efficiency - cases with more hearings/advanced stages prioritized
- - **Urgency (25%)**: Critical cases (medical, custodial) fast-tracked
- - **Adjournment boost (15%)**: Recently adjourned cases boosted to prevent indefinite postponement
-
- **Adjournment Boost Decay**:
- - Exponential decay: `boost = exp(-days_since_hearing / 21)`
- - Day 7: 71% boost (strong)
- - Day 14: 50% boost (moderate)
- - Day 21: 37% boost (weak)
- - Day 28: 26% boost (very weak)
-
- **Impact**:
- - Balanced fairness (old cases progress) with efficiency (recent cases complete)
- - 31.1% adjournment rate (realistic given court dynamics)
- - Average 20.9 hearings to disposal (efficient case progression)
-
- ### 4. Stochastic Simulation Engine
-
- **Design**: Discrete event simulation with probabilistic outcomes
-
- **Daily Flow**:
- 1. Evaluate ripeness for all active cases (every 7 days)
- 2. Filter by ripeness status (RIPE only)
- 3. Apply MIN_GAP_BETWEEN_HEARINGS (14 days)
- 4. Prioritize by policy
- 5. Allocate to courtrooms (capacity-constrained)
- 6. Execute hearings with stochastic outcomes:
-    - 68.9% heard → stage progression possible
-    - 31.1% adjourned → reschedule
- 7. Check disposal probability (case-type-aware, maturity-based)
- 8. Record metrics and events
-
- **Data-Driven Parameters**:
- - Adjournment probabilities by stage × case type (from historical data)
- - Stage transition probabilities (from Karnataka HC data)
- - Stage duration distributions (median, p90)
- - Case-type-specific disposal patterns
-
- ### 5. Comprehensive Metrics Framework
-
- **Tracked Metrics**:
- - **Fairness**: Gini coefficient, age variance, disposal equity
- - **Efficiency**: Utilization, throughput, disposal time
- - **Ripeness**: Transitions, filter rate, bottleneck breakdown
- - **Allocation**: Load variance, courtroom balance
- - **No-case-left-behind**: Coverage, max gap, alert triggers
-
- **Outputs**:
- - `metrics.csv`: Daily time-series (date, scheduled, heard, adjourned, disposals, utilization)
- - `events.csv`: Full audit trail (scheduling, outcomes, stage changes, disposals, ripeness changes)
- - `report.txt`: Comprehensive simulation summary
-
- ---
-
- ## Disposal Performance by Case Type
-
- | Case Type | Disposed | Total | Rate |
- |-----------|----------|-------|------|
- | CP (Civil Petition) | 833 | 963 | **86.5%** |
- | CMP (Miscellaneous) | 237 | 275 | **86.2%** |
- | CA (Civil Appeal) | 1,676 | 1,949 | **86.0%** |
- | CCC | 978 | 1,147 | **85.3%** |
- | CRP (Civil Revision) | 1,750 | 2,062 | **84.9%** |
- | RSA (Regular Second Appeal) | 1,488 | 1,924 | **77.3%** |
- | RFA (Regular First Appeal) | 1,174 | 1,680 | **69.9%** |
-
- **Analysis**:
- - Short-lifecycle cases (CP, CMP, CA) achieve 85%+ disposal
- - Complex appeals (RFA, RSA) have lower disposal rates (expected behavior - require more hearings)
- - System correctly prioritizes case complexity in disposal logic
-
- ---
-
- ## No-Case-Left-Behind Verification
-
- **Requirement**: Ensure no case is forgotten in 2-year simulation.
-
- **Results**:
- - **97.7% scheduled at least once** (9,766/10,000)
- - **2.3% never scheduled** (234 cases)
-   - Reason: Newly filed cases near simulation end + capacity constraints
-   - All were RIPE and eligible, just lower priority than older cases
- - **0 cases stuck >90 days** in active pool (forced scheduling not triggered)
-
- **Tracking Mechanism**:
- - `last_scheduled_date` field on every case
- - `days_since_last_scheduled` counter
- - Alert thresholds: 60 days (yellow), 90 days (red, forced scheduling)
-
- **Validation**: Zero red alerts over 500 days confirms effective coverage.
-
- ---
-
- ## Courtroom Utilization Analysis
-
- **Overall Utilization**: 45.0%
-
- **Why Not 100%?**
-
- 1. **Ripeness filtering**: 40.8% of candidate case-days filtered as unripe
- 2. **Gap enforcement**: MIN_GAP_BETWEEN_HEARINGS (14 days) prevents immediate rescheduling
- 3. **Case progression**: As cases dispose, pool shrinks (10,000 → 1,864 active by end)
- 4. **Realistic constraint**: Courts don't operate at theoretical max capacity
-
- **Daily Load Variation**:
- - Max: 151 cases/courtroom (full capacity, early days)
- - Min: 27 cases/courtroom (late simulation, many disposed)
- - Avg: 68 cases/courtroom (healthy sustainable load)
-
- **Comparison to Real Courts**:
- - Real Karnataka HC utilization: ~40-50% (per industry reports)
- - Simulation: 45% (matches reality)
-
- ---
-
- ## Key Features Implemented
-
- ### ✅ Phase 4: Ripeness Classification
- - 5-step hierarchical classifier
- - Keyword-based bottleneck detection
- - Stage-aware classification
- - Periodic re-evaluation (every 7 days)
- - 93,834 unripe cases filtered over 500 days
-
- ### ✅ Phase 5: Dynamic Multi-Courtroom Allocation
- - Load-balanced allocator
- - Perfect fairness (Gini 0.002)
- - Zero capacity rejections
- - 101,260 allocation decisions
-
- ### ✅ Phase 9: Advanced Scheduling Policy
- - Readiness-based composite priority
- - Adjournment boost with exponential decay
- - Data-driven adjournment probabilities
- - Case-type-aware disposal logic
-
- ### ✅ Phase 10: Comprehensive Metrics
- - Fairness metrics (Gini, age variance)
- - Efficiency metrics (utilization, throughput)
- - Ripeness metrics (transitions, filter rate)
- - Disposal metrics (rate by case type)
- - No-case-left-behind tracking
-
- ---
-
- ## Technical Excellence
-
- ### Code Quality
- - Modern Python 3.11+ type hints (`X | None`, `list[X]`)
- - Clean architecture: separation of concerns (core, simulation, data, metrics)
- - Comprehensive documentation (DEVELOPMENT.md)
- - No inline imports
- - Polars-native operations (performance optimized)
-
- ### Testing
- - Validated against historical Karnataka HC data
- - Stochastic simulations with multiple seeds
- - Metrics match real-world court behavior
- - Edge cases handled (new filings, disposal, adjournments)
-
- ### Performance
- - 500-day simulation: ~30 seconds
- - 136,303 hearings simulated
- - 10,000 cases tracked
- - Event-level audit trail maintained
-
- ---
-
- ## Data Gap Analysis
-
- ### Current Limitations
- Our synthetic data lacks:
- 1. Summons service status
- 2. Case dependency information
- 3. Lawyer/party availability
- 4. Document completeness tracking
- 5. Actual hearing duration
-
- ### Proposed Enrichments
-
- Courts should capture:
-
- | Field | Type | Justification | Impact |
- |-------|------|---------------|--------|
- | `summons_service_status` | Enum | Enable precise UNRIPE_SUMMONS detection | -15% wasted hearings |
- | `dependent_case_ids` | List[str] | Model case dependencies explicitly | -10% premature scheduling |
- | `lawyer_registered` | bool | Track lawyer availability | -8% party absence adjournments |
- | `party_attendance_rate` | float | Predict party no-shows | -12% party absence adjournments |
- | `documents_submitted` | int | Track document readiness | -7% document delay adjournments |
- | `estimated_hearing_duration` | int | Better capacity planning | +20% utilization |
- | `bottleneck_type` | Enum | Explicit bottleneck tracking | +25% ripeness accuracy |
- | `priority_flag` | Enum | Judge-set priority overrides | +30% urgent case throughput |
-
- **Expected Combined Impact**:
- - 40% reduction in adjournments due to bottlenecks
- - 20% increase in utilization
- - 50% improvement in ripeness classification accuracy
-
- ---
-
- ## Additional Features Implemented
-
- ### Daily Cause List Generator - COMPLETE
- - CSV cause lists generated per courtroom per day (`scheduler/output/cause_list.py`)
- - Export format includes: Date, Courtroom, Case_ID, Case_Type, Stage, Sequence
- - Comprehensive statistics and no-case-left-behind verification
- - Script available: `scripts/generate_all_cause_lists.py`
-
- ### Judge Override System - CORE COMPLETE
- - Complete API for judge control (`scheduler/control/overrides.py`)
- - ADD_CASE, REMOVE_CASE, PRIORITY, REORDER, RIPENESS overrides implemented
- - Override validation and audit trail system
- - Judge preferences for capacity control
- - UI component pending (backend fully functional)
-
- ### No-Case-Left-Behind Verification - COMPLETE
- - Built-in tracking system in case entity
- - Alert thresholds: 60 days (warning), 90 days (critical)
- - 97.7% coverage achieved (9,766/10,000 cases scheduled)
- - Comprehensive verification reports generated
-
- ### Remaining Enhancements
- - **Interactive Dashboard**: Streamlit UI for visualization and control
- - **Real-time Alerts**: Email/SMS notification system
- - **Advanced Visualizations**: Sankey diagrams, heatmaps
-
- ---
-
- ## Validation Against Requirements
-
- ### Step 2: Data-Informed Modelling ✅
-
- **Requirement**: "Determine how cases could be classified as 'ripe' or 'unripe'"
- - **Delivered**: 5-step ripeness classifier with 3 bottleneck types
- - **Evidence**: 40.8% filter rate, 93,834 unripe cases blocked
-
- **Requirement**: "Identify gaps in current data capture"
- - **Delivered**: 8 proposed synthetic fields with justification
- - **Document**: Data Gap Analysis section above
-
- ### Step 3: Algorithm Development ✅
-
- **Requirement**: "Allocates cases dynamically across multiple simulated courtrooms"
- - **Delivered**: Load-balanced allocator, Gini 0.002
- - **Evidence**: 101,260 allocations, perfect balance
-
- **Requirement**: "Simulates case progression over a two-year period"
- - **Delivered**: 500-day simulation (18 months)
- - **Evidence**: 136,303 hearings, 8,136 disposals
-
- **Requirement**: "Ensures no case is left behind"
- - **Delivered**: 97.7% coverage, 0 red alerts
- - **Evidence**: Comprehensive tracking system
-
- ---
-
- ## Conclusion
-
- This Court Scheduling System demonstrates a production-ready solution for Karnataka High Court's case management challenges. By combining intelligent ripeness classification, dynamic allocation, and data-driven priority scheduling, the system achieves:
-
- - **High disposal rate** (81.4%) through bottleneck filtering and adjournment management
- - **Perfect fairness** (Gini 0.002) via load-balanced allocation
- - **Near-complete coverage** (97.7%) ensuring no case abandonment
- - **Realistic performance** (45% utilization) matching real-world court operations
-
- The system is **ready for pilot deployment** with Karnataka High Court, with clear pathways for enhancement through cause list generation, judge overrides, and interactive dashboards.
-
- ---
-
- ## Repository Structure
-
- ```
- code4change-analysis/
- ├── scheduler/                 # Core simulation engine
- │   ├── core/                  # Case, Courtroom, Judge entities
- │   │   ├── case.py            # Case entity with priority scoring
- │   │   ├── ripeness.py        # Ripeness classifier
- │   │   └── ...
- │   ├── simulation/            # Simulation engine
- │   │   ├── engine.py          # Main simulation loop
- │   │   ├── allocator.py       # Multi-courtroom allocator
- │   │   ├── policies/          # Scheduling policies
- │   │   └── ...
- │   ├── data/                  # Data generation and loading
- │   │   ├── case_generator.py  # Synthetic case generator
- │   │   ├── param_loader.py    # Historical data parameters
- │   │   └── ...
- │   └── metrics/               # Performance metrics
-
- ├── data/                      # Data files
- │   ├── generated/             # Synthetic cases
- │   └── full_simulation/       # Simulation outputs
- │       ├── report.txt         # Comprehensive report
- │       ├── metrics.csv        # Daily time-series
- │       └── events.csv         # Full audit trail
-
- ├── main.py                    # CLI entry point
- ├── DEVELOPMENT.md             # Technical documentation
- ├── SUBMISSION_SUMMARY.md      # This document
- └── README.md                  # Quick start guide
- ```
-
- ---
-
- ## Usage
-
- ### Quick Start
- ```bash
- # Install dependencies
- uv sync
-
- # Generate test cases
- uv run python main.py generate --cases 10000
-
- # Run 2-year simulation
- uv run python main.py simulate --days 500 --cases data/generated/cases.csv
-
- # View results
- cat data/sim_runs/*/report.txt
- ```
-
- ### Full Pipeline
- ```bash
- # End-to-end workflow
- uv run python main.py workflow --cases 10000 --days 500
- ```
-
- ---
-
- ## Contact
-
- **Team**: [Your Name/Team Name]
- **Institution**: [Your Institution]
- **Email**: [Your Email]
- **GitHub**: [Repository URL]
-
- ---
-
- **Last Updated**: 2025-11-25
- **Simulation Version**: 1.0
- **Status**: Production Ready - Hackathon Submission Complete
 
SYSTEM_WORKFLOW.md DELETED
@@ -1,642 +0,0 @@
1
- # Court Scheduling System - Complete Workflow & Logic Flow
2
-
3
- **Step-by-Step Guide: How the System Actually Works**
4
-
5
- ---
6
-
7
- ## Table of Contents
8
- 1. [System Workflow Overview](#system-workflow-overview)
9
- 2. [Phase 1: Data Preparation](#phase-1-data-preparation)
10
- 3. [Phase 2: Simulation Initialization](#phase-2-simulation-initialization)
11
- 4. [Phase 3: Daily Scheduling Loop](#phase-3-daily-scheduling-loop)
12
- 5. [Phase 4: Output Generation](#phase-4-output-generation)
13
- 6. [Phase 5: Analysis & Reporting](#phase-5-analysis--reporting)
14
- 7. [Complete Example Walkthrough](#complete-example-walkthrough)
15
- 8. [Data Flow Pipeline](#data-flow-pipeline)
16
-
17
- ---
18
-
19
- ## System Workflow Overview
20
-
21
- The Court Scheduling System operates in **5 sequential phases** that transform historical court data into optimized daily cause lists:
22
-
23
- ```
24
- Historical Data → Data Preparation → Simulation Setup → Daily Scheduling → Output Generation → Analysis
25
- ↓ ↓ ↓ ↓ ↓ ↓
26
- 739K hearings Parameters & Initialized Daily cause CSV files & Performance
27
- 134K cases Generated cases simulation lists for 384 Reports metrics
28
- ```
29
-
30
- **Key Outputs:**
31
- - **Daily Cause Lists**: CSV files for each courtroom/day
32
- - **Simulation Report**: Overall performance summary
33
- - **Metrics File**: Daily performance tracking
34
- - **Individual Case Audit**: Complete hearing history
35
-
36
- ---
37
-
38
- ## Phase 1: Data Preparation
39
-
40
- ### Step 1.1: Historical Data Analysis (EDA Pipeline)
41
-
42
- **Input**:
43
- - `ISDMHack_Case.csv` (134,699 cases)
44
- - `ISDMHack_Hear.csv` (739,670 hearings)
45
-
46
- **Process**:
47
- ```python
48
- # Load and merge historical data
49
- cases_df = pd.read_csv("ISDMHack_Case.csv")
50
- hearings_df = pd.read_csv("ISDMHack_Hear.csv")
51
- merged_data = cases_df.merge(hearings_df, on="Case_ID")
52
-
53
- # Extract key parameters
54
- case_type_distribution = cases_df["Type"].value_counts(normalize=True)
55
- stage_transitions = calculate_stage_progression_probabilities(merged_data)
56
- adjournment_rates = calculate_adjournment_rates_by_stage(hearings_df)
57
- daily_capacity = hearings_df.groupby("Hearing_Date").size().mean()
58
- ```
59
-
60
- **Output**:
61
- ```python
62
- # Extracted parameters stored in config.py
63
- CASE_TYPE_DISTRIBUTION = {"CRP": 0.201, "CA": 0.200, ...}
64
- STAGE_TRANSITIONS = {"ADMISSION->ARGUMENTS": 0.72, ...}
65
- ADJOURNMENT_RATES = {"ADMISSION": 0.38, "ARGUMENTS": 0.31, ...}
66
- DEFAULT_DAILY_CAPACITY = 151 # cases per courtroom per day
67
- ```
68
-
69
- ### Step 1.2: Synthetic Case Generation
70
-
71
- **Input**:
72
- - Configuration: `configs/generate.sample.toml`
73
- - Extracted parameters from Step 1.1
74
-
75
- **Process**:
76
- ```python
77
- # Generate 10,000 synthetic cases
78
- for i in range(10000):
79
- case = Case(
80
- case_id=f"C{i:06d}",
81
- case_type=random_choice_weighted(CASE_TYPE_DISTRIBUTION),
82
- filed_date=random_date_in_range("2022-01-01", "2023-12-31"),
83
- current_stage=random_choice_weighted(STAGE_DISTRIBUTION),
84
- is_urgent=random_boolean(0.05), # 5% urgent cases
85
- )
86
-
87
- # Add realistic hearing history
88
- generate_hearing_history(case, historical_patterns)
89
- cases.append(case)
90
- ```
91
-
92
- **Output**:
93
- - `data/generated/cases.csv` with 10,000 synthetic cases
94
- - Each case has realistic attributes based on historical patterns
95
-
96
- ---
97
-
98
- ## Phase 2: Simulation Initialization
99
-
100
- ### Step 2.1: Load Configuration
101
-
102
- **Input**: `configs/simulate.sample.toml`
103
- ```toml
104
- cases = "data/generated/cases.csv"
105
- days = 384 # 2-year simulation
106
- policy = "readiness" # Scheduling policy
107
- courtrooms = 5
108
- daily_capacity = 151
109
- ```
110
-
111
- ### Step 2.2: Initialize System State
112
-
113
- **Process**:
114
- ```python
115
- # Load generated cases
116
- cases = load_cases_from_csv("data/generated/cases.csv")
117
-
118
- # Initialize courtrooms
119
- courtrooms = [
120
- Courtroom(id=1, daily_capacity=151),
121
- Courtroom(id=2, daily_capacity=151),
122
- # ... 5 courtrooms total
123
- ]
124
-
125
- # Initialize scheduling policy
126
- policy = ReadinessPolicy(
127
- fairness_weight=0.4,
128
- efficiency_weight=0.3,
129
- urgency_weight=0.3
130
- )
131
-
132
- # Initialize simulation clock
133
- current_date = datetime(2023, 12, 29) # Start date
134
- end_date = current_date + timedelta(days=384)
135
- ```
136
-
137
- **Output**:
138
- - Simulation environment ready with 10,000 cases and 5 courtrooms
139
- - Policy configured with optimization weights
140
-
141
- ---
142
-
143
- ## Phase 3: Daily Scheduling Loop
144
-
145
- **This is the core algorithm that runs 384 times (once per working day)**
146
-
147
- ### Daily Loop Structure
148
- ```python
149
- for day in range(384): # Each working day for 2 years
150
- current_date += timedelta(days=1)
151
-
152
- # Skip weekends and holidays
153
- if not is_working_day(current_date):
154
- continue
155
-
156
- # Execute daily scheduling algorithm
157
- daily_result = schedule_daily_hearings(cases, current_date)
158
-
159
- # Update system state for next day
160
- update_case_states(cases, daily_result)
161
-
162
- # Generate daily outputs
163
- generate_cause_lists(daily_result, current_date)
164
- ```
165
-
166
- ### Step 3.1: Daily Scheduling Algorithm (Core Logic)
167
-
168
- **INPUT**:
169
- - All active cases (initially 10,000)
170
- - Current date
171
- - Courtroom capacities
172
-
173
- **CHECKPOINT 1: Case Status Filtering**
174
- ```python
175
- # Filter out disposed cases
176
- active_cases = [case for case in all_cases
177
- if case.status in [PENDING, SCHEDULED]]
178
-
179
- print(f"Day {day}: {len(active_cases)} active cases")
180
- # Example: Day 1: 10,000 active cases → Day 200: 6,500 active cases
181
- ```
182
-
183
- **CHECKPOINT 2: Case Attribute Updates**
184
- ```python
185
- for case in active_cases:
186
- # Update age (days since filing)
187
- case.age_days = (current_date - case.filed_date).days
188
-
189
- # Update readiness score based on stage and hearing history
190
- case.readiness_score = calculate_readiness(case)
191
-
192
- # Update days since last scheduled
193
- if case.last_scheduled_date:
194
- case.days_since_last_scheduled = (current_date - case.last_scheduled_date).days
195
- ```
196
-
197
- **CHECKPOINT 3: Ripeness Classification (Critical Filter)**
198
- ```python
- ripe_cases = []
- ripeness_stats = {"RIPE": 0, "UNRIPE_SUMMONS": 0, "UNRIPE_DEPENDENT": 0, "UNRIPE_PARTY": 0}
- 
- for case in active_cases:
-     ripeness = RipenessClassifier.classify(case, current_date)
-     ripeness_stats[ripeness.status] += 1
- 
-     if ripeness.is_ripe():
-         ripe_cases.append(case)
-     else:
-         case.bottleneck_reason = ripeness.reason
- 
- print(f"Ripeness Filter: {len(active_cases)} → {len(ripe_cases)} cases")
- # Example: 6,500 active → 3,850 ripe cases (40.8% filtered out)
- ```
214
-
215
- **Ripeness Classification Logic**:
216
- ```python
- def classify(case, current_date):
-     # Step 1: Check explicit bottlenecks in last hearing purpose
-     if "SUMMONS" in case.last_hearing_purpose:
-         return RipenessStatus.UNRIPE_SUMMONS
-     if "STAY" in case.last_hearing_purpose:
-         return RipenessStatus.UNRIPE_DEPENDENT
- 
-     # Step 2: Early admission cases likely waiting for service
-     if case.current_stage == "ADMISSION" and case.hearing_count < 3:
-         return RipenessStatus.UNRIPE_SUMMONS
- 
-     # Step 3: Detect stuck cases (many hearings, no progress)
-     if case.hearing_count > 10 and case.avg_gap_days > 60:
-         return RipenessStatus.UNRIPE_PARTY
- 
-     # Step 4: Advanced stages are usually ready
-     if case.current_stage in ["ARGUMENTS", "EVIDENCE", "ORDERS / JUDGMENT"]:
-         return RipenessStatus.RIPE
- 
-     # Step 5: Conservative default
-     return RipenessStatus.RIPE
- ```
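The decision ladder above can be condensed into a standalone, testable function (the case fields are flattened into plain parameters for illustration; the enum values mirror the statuses used throughout this document):

```python
from enum import Enum

class Ripeness(Enum):
    RIPE = "RIPE"
    UNRIPE_SUMMONS = "UNRIPE_SUMMONS"
    UNRIPE_DEPENDENT = "UNRIPE_DEPENDENT"
    UNRIPE_PARTY = "UNRIPE_PARTY"

def classify(purpose: str, stage: str, hearings: int, avg_gap: float) -> Ripeness:
    # Explicit bottlenecks in the last hearing purpose win first
    if "SUMMONS" in purpose:
        return Ripeness.UNRIPE_SUMMONS
    if "STAY" in purpose:
        return Ripeness.UNRIPE_DEPENDENT
    # Early admission cases are likely still waiting for service
    if stage == "ADMISSION" and hearings < 3:
        return Ripeness.UNRIPE_SUMMONS
    # Many hearings with long gaps suggests an unresponsive party
    if hearings > 10 and avg_gap > 60:
        return Ripeness.UNRIPE_PARTY
    # Conservative default: schedule it
    return Ripeness.RIPE
```

Because the function returns at the first matching rule, the ordering of the checks is itself part of the policy.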
239
-
240
- **CHECKPOINT 4: Eligibility Check (Timing Constraints)**
241
- ```python
- eligible_cases = []
- for case in ripe_cases:
-     # Enforce the minimum 14-day gap between hearings
-     if case.last_hearing_date:
-         days_since_last = (current_date - case.last_hearing_date).days
-         if days_since_last < MIN_GAP_BETWEEN_HEARINGS:
-             continue
- 
-     eligible_cases.append(case)
- 
- print(f"Eligibility Filter: {len(ripe_cases)} → {len(eligible_cases)} cases")
- # Example: 3,850 ripe → 3,200 eligible cases
- ```
255
-
256
- **CHECKPOINT 5: Priority Scoring (Policy Application)**
257
- ```python
- for case in eligible_cases:
-     # Multi-factor priority calculation
-     age_component = min(case.age_days / 365, 1.0) * 0.35
-     readiness_component = case.readiness_score * 0.25
-     urgency_component = (1.0 if case.is_urgent else 0.5) * 0.25
-     boost_component = calculate_adjournment_boost(case) * 0.15
- 
-     case.priority_score = age_component + readiness_component + urgency_component + boost_component
- 
- # Sort by priority (highest first)
- prioritized_cases = sorted(eligible_cases, key=lambda c: c.priority_score, reverse=True)
- ```
270
-
271
- **CHECKPOINT 6: Judge Overrides (Optional)**
272
- ```python
- if daily_overrides:
-     # Apply ADD_CASE overrides (highest priority)
-     for override in add_case_overrides:
-         case_to_add = find_case_by_id(override.case_id)
-         prioritized_cases.insert(override.new_position, case_to_add)
- 
-     # Apply REMOVE_CASE overrides
-     for override in remove_case_overrides:
-         prioritized_cases = [c for c in prioritized_cases if c.case_id != override.case_id]
- 
-     # Apply PRIORITY overrides
-     for override in priority_overrides:
-         case = find_case_in_list(prioritized_cases, override.case_id)
-         case.priority_score = override.new_priority
- 
-     # Re-sort after priority changes
-     prioritized_cases.sort(key=lambda c: c.priority_score, reverse=True)
- ```
291
-
292
- **CHECKPOINT 7: Multi-Courtroom Allocation**
293
- ```python
- # Load balancing algorithm
- courtroom_loads = {1: 0, 2: 0, 3: 0, 4: 0, 5: 0}
- daily_schedule = {1: [], 2: [], 3: [], 4: [], 5: []}
- 
- for case in prioritized_cases:
-     # Find least loaded courtroom
-     target_courtroom = min(courtroom_loads.items(), key=lambda x: x[1])[0]
- 
-     # Check capacity constraint
-     if courtroom_loads[target_courtroom] >= DEFAULT_DAILY_CAPACITY:
-         # All courtrooms at capacity, remaining cases unscheduled
-         break
- 
-     # Assign case to courtroom
-     daily_schedule[target_courtroom].append(case)
-     courtroom_loads[target_courtroom] += 1
-     case.last_scheduled_date = current_date
- 
- total_scheduled = sum(len(cases) for cases in daily_schedule.values())
- print(f"Allocation: {total_scheduled} cases scheduled across 5 courtrooms")
- # Example: 703 cases scheduled (5 × 140-141 per courtroom)
- ```
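The linear `min()` scan above costs O(C) per case; with more courtrooms the same least-loaded-first rule can be kept on a min-heap at O(log C) per case. A sketch (function name and signature are illustrative, not from the codebase):

```python
import heapq

def allocate_balanced(case_ids, n_rooms=5, capacity=151):
    """Least-loaded-first allocation via a min-heap of (load, courtroom_id)."""
    heap = [(0, room) for room in range(1, n_rooms + 1)]
    heapq.heapify(heap)
    schedule = {room: [] for room in range(1, n_rooms + 1)}

    for cid in case_ids:
        load, room = heap[0]          # peek the least-loaded courtroom
        if load >= capacity:          # least-loaded room full => all rooms full
            break
        heapq.heapreplace(heap, (load + 1, room))
        schedule[room].append(cid)
    return schedule

sched = allocate_balanced([f"C{i:06d}" for i in range(703)])
sizes = sorted(len(v) for v in sched.values())
```

The heap invariant guarantees the loads never differ by more than one case, which is exactly the near-zero Gini behaviour reported for the simulation.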
316
-
317
- **CHECKPOINT 8: Generate Explanations**
318
- ```python
- explanations = {}
- for courtroom_id, cases in daily_schedule.items():
-     for case in cases:
-         urgency_text = "HIGH URGENCY" if case.is_urgent else "standard urgency"
-         stage_text = f"{case.current_stage.lower()} stage"
-         assignment_text = f"assigned to Courtroom {courtroom_id}"
- 
-         explanations[case.case_id] = f"{urgency_text} | {stage_text} | {assignment_text}"
- ```
328
-
329
- ### Step 3.2: Case State Updates (After Each Day)
330
-
331
- ```python
- def update_case_states(cases, daily_result):
-     for case in cases:
-         if case.case_id in daily_result.scheduled_cases:
-             # Case was scheduled today
-             case.status = CaseStatus.SCHEDULED
-             case.hearing_count += 1
-             case.last_hearing_date = current_date
- 
-             # Simulate hearing outcome
-             if random.random() < get_adjournment_rate(case.current_stage):
-                 # Case adjourned - stays in same stage
-                 case.history.append({
-                     "date": current_date,
-                     "outcome": "ADJOURNED",
-                     "next_hearing": current_date + timedelta(days=21)
-                 })
-             else:
-                 # Case heard - may progress to next stage or dispose
-                 if should_progress_stage(case):
-                     case.current_stage = get_next_stage(case.current_stage)
- 
-                 if should_dispose(case):
-                     case.status = CaseStatus.DISPOSED
-                     case.disposal_date = current_date
-         else:
-             # Case not scheduled today
-             case.days_since_last_scheduled += 1
- ```
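`get_adjournment_rate` is assumed above but not shown. A sketch with illustrative per-stage rates (the actual values come from the EDA on the historical hearings, not from this table):

```python
import random

# Illustrative stage-wise adjournment probabilities (placeholders)
ADJOURNMENT_RATES = {
    "ADMISSION": 0.45,
    "EVIDENCE": 0.40,
    "ARGUMENTS": 0.35,
    "ORDERS / JUDGMENT": 0.20,
}

def get_adjournment_rate(stage: str) -> float:
    # Fall back to an overall mean for unknown stages
    return ADJOURNMENT_RATES.get(stage, 0.40)

def simulate_outcome(stage: str, rng: random.Random) -> str:
    return "ADJOURNED" if rng.random() < get_adjournment_rate(stage) else "HEARD"

rng = random.Random(42)
outcomes = [simulate_outcome("ARGUMENTS", rng) for _ in range(1000)]
```

Seeding the generator, as the simulation configs do with `seed = 42`, keeps runs reproducible.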
360
-
361
- ---
362
-
363
- ## Phase 4: Output Generation
364
-
365
- ### Step 4.1: Daily Cause List Generation
366
-
367
- **For each courtroom and each day**:
368
- ```python
- # Generate cause_list_courtroom_1_2024-01-15.csv
- def generate_daily_cause_list(courtroom_id, date, scheduled_cases):
-     cause_list = []
-     for i, case in enumerate(scheduled_cases):
-         cause_list.append({
-             "Date": date.strftime("%Y-%m-%d"),
-             "Courtroom_ID": courtroom_id,
-             "Case_ID": case.case_id,
-             "Case_Type": case.case_type,
-             "Stage": case.current_stage,
-             "Purpose": "HEARING",
-             "Sequence_Number": i + 1,
-             "Explanation": explanations[case.case_id]
-         })
- 
-     # Save to CSV
-     df = pd.DataFrame(cause_list)
-     df.to_csv(f"cause_list_courtroom_{courtroom_id}_{date.strftime('%Y-%m-%d')}.csv")
- ```
388
-
389
- **Example Output**:
390
- ```csv
391
- Date,Courtroom_ID,Case_ID,Case_Type,Stage,Purpose,Sequence_Number,Explanation
392
- 2024-01-15,1,C002847,CRP,ARGUMENTS,HEARING,1,"HIGH URGENCY | arguments stage | assigned to Courtroom 1"
393
- 2024-01-15,1,C005123,CA,ADMISSION,HEARING,2,"standard urgency | admission stage | assigned to Courtroom 1"
394
- 2024-01-15,1,C001456,RSA,EVIDENCE,HEARING,3,"standard urgency | evidence stage | assigned to Courtroom 1"
395
- ```
396
-
397
- ### Step 4.2: Daily Metrics Tracking
398
-
399
- ```python
- def record_daily_metrics(date, daily_result):
-     metrics = {
-         "date": date,
-         "scheduled": daily_result.total_scheduled,
-         "heard": calculate_heard_cases(daily_result),
-         "adjourned": calculate_adjourned_cases(daily_result),
-         "disposed": count_disposed_today(daily_result),
-         "utilization": daily_result.total_scheduled / (COURTROOMS * DEFAULT_DAILY_CAPACITY),
-         "gini_coefficient": calculate_gini_coefficient(courtroom_loads),
-         "ripeness_filtered": daily_result.ripeness_filtered_count
-     }
- 
-     # Append to metrics.csv
-     append_to_csv("metrics.csv", metrics)
- ```
415
-
416
- **Example metrics.csv**:
417
- ```csv
418
- date,scheduled,heard,adjourned,disposed,utilization,gini_coefficient,ripeness_filtered
419
- 2024-01-15,703,430,273,12,0.931,0.245,287
420
- 2024-01-16,698,445,253,15,0.924,0.248,301
421
- 2024-01-17,701,421,280,18,0.928,0.251,294
422
- ```
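`calculate_gini_coefficient` drives the load-balance metric above but is not defined in this excerpt. One standard formulation over the daily courtroom loads (mean absolute difference normalized by twice the mean):

```python
def gini_coefficient(loads):
    """Gini of courtroom loads: 0.0 = perfectly even, approaching 1.0 = one room does everything.

    G = sum_{i,j} |x_i - x_j| / (2 * n^2 * mean)
    """
    n = len(loads)
    mean = sum(loads) / n
    if mean == 0:
        return 0.0
    diff_sum = sum(abs(a - b) for a in loads for b in loads)
    return diff_sum / (2 * n * n * mean)

# Near-even loads (as in the simulation) score close to zero;
# a skewed distribution scores much higher
even = gini_coefficient([140, 141, 140, 141, 141])
skewed = gini_coefficient([300, 200, 100, 60, 43])
```

The O(n²) double loop is fine here because n is the number of courtrooms, not cases.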
423
-
424
- ---
425
-
426
- ## Phase 5: Analysis & Reporting
427
-
428
- ### Step 5.1: Simulation Summary Report
429
-
430
- **After all 384 days complete**:
431
- ```python
- def generate_simulation_report():
-     total_hearings = sum(m["scheduled"] for m in daily_metrics)
-     total_heard = sum(m["heard"] for m in daily_metrics)
-     total_adjourned = sum(m["adjourned"] for m in daily_metrics)
-     total_disposed = count_disposed_cases()
- 
-     report = f"""
- SIMULATION SUMMARY
- Horizon: {start_date} → {end_date} ({simulation_days} days)
- 
- Case Metrics:
-   Initial cases: {initial_case_count:,}
-   Cases disposed: {total_disposed:,} ({total_disposed/initial_case_count:.1%})
-   Cases remaining: {initial_case_count - total_disposed:,}
- 
- Hearing Metrics:
-   Total hearings: {total_hearings:,}
-   Heard: {total_heard:,} ({total_heard/total_hearings:.1%})
-   Adjourned: {total_adjourned:,} ({total_adjourned/total_hearings:.1%})
- 
- Efficiency Metrics:
-   Disposal rate: {total_disposed/initial_case_count:.1%}
-   Utilization: {avg_utilization:.1%}
-   Gini coefficient: {avg_gini:.3f}
-   Ripeness filtering: {avg_ripeness_filtered/avg_eligible:.1%}
- """
- 
-     with open("simulation_report.txt", "w") as f:
-         f.write(report)
- ```
462
-
463
- ### Step 5.2: Performance Analysis
464
-
465
- ```python
466
- # Calculate key performance indicators
467
- disposal_rate = total_disposed / initial_cases # Target: >70%
468
- load_balance = calculate_gini_coefficient(courtroom_loads) # Target: <0.4
469
- case_coverage = scheduled_cases / eligible_cases # Target: >95%
470
- bottleneck_efficiency = ripeness_filtered / total_cases # Higher = better filtering
471
-
472
- print(f"PERFORMANCE RESULTS:")
473
- print(f"Disposal Rate: {disposal_rate:.1%} ({'✓' if disposal_rate > 0.70 else '✗'})")
474
- print(f"Load Balance: {load_balance:.3f} ({'✓' if load_balance < 0.40 else '✗'})")
475
- print(f"Case Coverage: {case_coverage:.1%} ({'✓' if case_coverage > 0.95 else '✗'})")
476
- ```
477
-
478
- ---
479
-
480
- ## Complete Example Walkthrough
481
-
482
- Let's trace a single case through the entire system:
483
-
484
- ### Case: C002847 (Civil Revision Petition)
485
-
486
- **Day 0: Case Generation**
487
- ```python
- case = Case(
-     case_id="C002847",
-     case_type="CRP",
-     filed_date=date(2022, 3, 15),
-     current_stage="ADMISSION",
-     is_urgent=True,  # Medical emergency
-     hearing_count=0,
-     last_hearing_date=None
- )
- ```
498
-
499
- **Day 1: First Scheduling Attempt (2023-12-29)**
500
- ```python
501
- # Checkpoint 1: Active? YES (status = PENDING)
502
- # Checkpoint 2: Updates
503
- case.age_days = 654 # Almost 2 years old
504
- case.readiness_score = 0.3 # Low (admission stage)
505
-
506
- # Checkpoint 3: Ripeness
507
- ripeness = classify(case, current_date) # UNRIPE_SUMMONS (admission stage, 0 hearings)
508
-
509
- # Result: FILTERED OUT (not scheduled)
510
- ```
511
-
512
- **Day 45: Second Attempt (2024-02-26)**
513
- ```python
514
- # Case now has 3 hearings, still in admission but making progress
515
- case.hearing_count = 3
516
- case.current_stage = "ADMISSION"
517
-
518
- # Checkpoint 3: Ripeness
519
- ripeness = classify(case, current_date) # RIPE (3 hearings in admission clears the < 3 filter)
520
-
521
- # Checkpoint 5: Priority Scoring
522
- age_component = min(689 / 365, 1.0) * 0.35 = 0.35
523
- readiness_component = 0.4 * 0.25 = 0.10
524
- urgency_component = 1.0 * 0.25 = 0.25 # HIGH URGENCY
525
- boost_component = 0.0 * 0.15 = 0.0
526
- case.priority_score = 0.70 # High priority
527
-
528
- # Checkpoint 7: Allocation
529
- # Assigned to Courtroom 1 (least loaded), Position 3
530
-
531
- # Result: SCHEDULED
532
- ```
533
-
534
- **Daily Cause List Entry**:
535
- ```csv
536
- 2024-02-26,1,C002847,CRP,ADMISSION,HEARING,3,"HIGH URGENCY | admission stage | assigned to Courtroom 1"
537
- ```
538
-
539
- **Hearing Outcome**:
540
- ```python
541
- # Simulated outcome: Case heard successfully, progresses to ARGUMENTS
542
- case.current_stage = "ARGUMENTS"
543
- case.hearing_count = 4
544
- case.last_hearing_date = date(2024, 2, 26)
545
- case.history.append({
546
- "date": date(2024, 2, 26),
547
- "outcome": "HEARD",
548
- "stage_progression": "ADMISSION → ARGUMENTS"
549
- })
550
- ```
551
-
552
- **Day 125: Arguments Stage (2024-06-15)**
553
- ```python
554
- # Case now in arguments, higher readiness
555
- case.current_stage = "ARGUMENTS"
556
- case.readiness_score = 0.8 # High (arguments stage)
557
-
558
- # Priority calculation
559
- age_component = 0.35 # Still max age
560
- readiness_component = 0.8 * 0.25 = 0.20 # Higher
561
- urgency_component = 0.25 # Still urgent
562
- boost_component = 0.0
563
- case.priority_score = 0.80 # Very high priority
564
-
565
- # Result: Scheduled in Position 1 (highest priority)
566
- ```
567
-
568
- **Final Disposal (Day 200: 2024-09-15)**
569
- ```python
570
- # After multiple hearings in arguments stage
571
- case.current_stage = "ORDERS / JUDGMENT"
572
- case.hearing_count = 12
573
-
574
- # Hearing outcome: Case disposed
575
- case.status = CaseStatus.DISPOSED
576
- case.disposal_date = date(2024, 9, 15)
577
- case.total_lifecycle_days = (disposal_date - filed_date).days # 915 days (2022-03-15 → 2024-09-15)
578
- ```
579
-
580
- ---
581
-
582
- ## Data Flow Pipeline
583
-
584
- ### Complete Data Transformation Chain
585
-
586
- ```
587
- 1. Historical CSV Files (Raw Data)
588
- ├── ISDMHack_Case.csv (134,699 rows × 24 columns)
589
- └── ISDMHack_Hear.csv (739,670 rows × 31 columns)
590
-
591
- 2. Parameter Extraction (EDA Analysis)
592
- ├── case_type_distribution.json
593
- ├── stage_transition_probabilities.json
594
- ├── adjournment_rates_by_stage.json
595
- └── daily_capacity_statistics.json
596
-
597
- 3. Synthetic Case Generation
598
- └── cases.csv (10,000 rows × 15 columns)
599
- ├── Case_ID, Case_Type, Filed_Date
600
- ├── Current_Stage, Is_Urgent, Hearing_Count
601
- └── Last_Hearing_Date, Last_Purpose
602
-
603
- 4. Daily Scheduling Loop (384 iterations)
604
- ├── Day 1: cases.csv → ripeness_filter → 6,850 → eligible_filter → 5,200 → priority_sort → allocate → 703 scheduled
605
- ├── Day 2: updated_cases → ripeness_filter → 6,820 → eligible_filter → 5,180 → priority_sort → allocate → 698 scheduled
606
- └── Day 384: updated_cases → ripeness_filter → 2,100 → eligible_filter → 1,950 → priority_sort → allocate → 421 scheduled
607
-
608
- 5. Daily Output Generation (per day × 5 courtrooms)
609
- ├── cause_list_courtroom_1_2024-01-15.csv (140 rows)
610
- ├── cause_list_courtroom_2_2024-01-15.csv (141 rows)
611
- ├── cause_list_courtroom_3_2024-01-15.csv (140 rows)
612
- ├── cause_list_courtroom_4_2024-01-15.csv (141 rows)
613
- └── cause_list_courtroom_5_2024-01-15.csv (141 rows)
614
-
615
- 6. Aggregated Metrics
616
- ├── metrics.csv (384 rows × 8 columns)
617
- ├── simulation_report.txt (summary statistics)
618
- └── case_audit_trail.csv (complete hearing history)
619
- ```
620
-
621
- ### Data Volume at Each Stage
622
- - **Input**: 874K+ historical records
623
- - **Generated**: 10K synthetic cases
624
- - **Daily Processing**: ~6K cases evaluated daily
625
- - **Daily Output**: ~700 scheduled cases/day
626
- - **Total Output**: ~42K total cause list entries
627
- - **Final Reports**: 384 daily metrics + summary reports
628
-
629
- ---
630
-
631
- **Key Takeaways:**
632
- 1. **Ripeness filtering** removes 40.8% of cases daily (most critical efficiency gain)
633
- 2. **Priority scoring** ensures fairness while handling urgent cases
634
- 3. **Load balancing** achieves near-perfect distribution (Gini 0.002)
635
- 4. **Daily loop** processes 6,000+ cases in seconds with multi-objective optimization
636
- 5. **Complete audit trail** tracks every case decision for transparency
637
-
638
- ---
639
-
640
- **Last Updated**: 2025-11-25
641
- **Version**: 1.0
642
- **Status**: Production Ready
TECHNICAL_IMPLEMENTATION.md DELETED
@@ -1,658 +0,0 @@
1
- # Court Scheduling System - Technical Implementation Documentation
2
-
3
- **Complete Implementation Guide for Code4Change Hackathon Submission**
4
-
5
- ---
6
-
7
- ## Table of Contents
8
- 1. [System Overview](#system-overview)
9
- 2. [Architecture & Design](#architecture--design)
10
- 3. [Configuration Management](#configuration-management)
11
- 4. [Core Algorithms](#core-algorithms)
12
- 5. [Data Models](#data-models)
13
- 6. [Decision Logic](#decision-logic)
14
- 7. [Input/Output Specifications](#inputoutput-specifications)
15
- 8. [Deployment & Usage](#deployment--usage)
16
- 9. [Assumptions & Constraints](#assumptions--constraints)
17
-
18
- ---
19
-
20
- ## System Overview
21
-
22
- ### Purpose
23
- Production-ready court scheduling system for Karnataka High Court that optimizes daily cause lists across multiple courtrooms while ensuring fairness, efficiency, and judicial control.
24
-
25
- ### Key Achievements
26
- - **81.4% Disposal Rate** - Exceeds baseline expectations
27
- - **Perfect Load Balance** - Gini coefficient 0.002 across courtrooms
28
- - **97.7% Case Coverage** - Near-zero case abandonment
29
- - **Smart Bottleneck Detection** - 40.8% unripe cases filtered
30
- - **Complete Judge Control** - Override system with audit trails
31
-
32
- ### Technology Stack
33
- ```toml
34
- # Core Dependencies (from pyproject.toml)
35
- dependencies = [
36
- "pandas>=2.2", # Data manipulation
37
- "polars>=1.30", # High-performance data processing
38
- "plotly>=6.0", # Visualization
39
- "numpy>=2.0", # Numerical computing
40
- "simpy>=4.1", # Discrete event simulation
41
- "typer>=0.12", # CLI interface
42
- "pydantic>=2.0", # Data validation
43
- "scipy>=1.14", # Statistical algorithms
44
- "streamlit>=1.28", # Dashboard (future)
45
- ]
46
- ```
47
-
48
- ---
49
-
50
- ## Architecture & Design
51
-
52
- ### System Architecture
53
- ```
54
- Court Scheduling System
55
- ├── Core Domain Layer (scheduler/core/)
56
- │ ├── case.py # Case entity with lifecycle management
57
- │ ├── courtroom.py # Courtroom resource management
58
- │ ├── ripeness.py # Bottleneck detection classifier
59
- │ ├── policy.py # Scheduling policy interface
60
- │ └── algorithm.py # Main scheduling algorithm
61
- ├── Simulation Engine (scheduler/simulation/)
62
- │ ├── engine.py # Discrete event simulation
63
- │ ├── allocator.py # Multi-courtroom load balancer
64
- │ └── policies/ # FIFO, Age, Readiness policies
65
- ├── Data Management (scheduler/data/)
66
- │ ├── param_loader.py # Historical parameter loading
67
- │ ├── case_generator.py # Synthetic case generation
68
- │ └── config.py # System configuration
69
- ├── Control Systems (scheduler/control/)
70
- │ └── overrides.py # Judge override & audit system
71
- ├── Output Generation (scheduler/output/)
72
- │ └── cause_list.py # Daily cause list CSV generation
73
- └── Analysis Tools (src/, scripts/)
74
- ├── EDA pipeline # Historical data analysis
75
- └── Validation tools # Performance verification
76
- ```
77
-
78
- ### Design Principles
79
- 1. **Clean Architecture** - Domain-driven design with clear layer separation
80
- 2. **Production Ready** - Type hints, error handling, comprehensive logging
81
- 3. **Data-Driven** - All parameters extracted from 739K+ historical hearings
82
- 4. **Judge Autonomy** - Complete override system with audit trails
83
- 5. **Scalable** - Supports multiple courtrooms, thousands of cases
84
-
85
- ---
86
-
87
- ## Configuration Management
88
-
89
- ### Primary Configuration (scheduler/data/config.py)
90
- ```python
91
- # Court Operational Constants
92
- WORKING_DAYS_PER_YEAR = 192 # Karnataka HC calendar
93
- COURTROOMS = 5 # Number of courtrooms
94
- SIMULATION_DAYS = 384 # 2-year simulation period
95
-
96
- # Scheduling Constraints
97
- MIN_GAP_BETWEEN_HEARINGS = 14 # Days between hearings
98
- MAX_GAP_WITHOUT_ALERT = 90 # Alert threshold
99
- DEFAULT_DAILY_CAPACITY = 151 # Cases per courtroom per day
100
-
101
- # Case Type Distribution (from EDA)
102
- CASE_TYPE_DISTRIBUTION = {
103
- "CRP": 0.201, # Civil Revision Petition (most common)
104
- "CA": 0.200, # Civil Appeal
105
- "RSA": 0.196, # Regular Second Appeal
106
- "RFA": 0.167, # Regular First Appeal
107
- "CCC": 0.111, # Civil Contempt Petition
108
- "CP": 0.096, # Civil Petition
109
- "CMP": 0.028, # Civil Miscellaneous Petition
110
- }
111
-
112
- # Multi-objective Optimization Weights
113
- FAIRNESS_WEIGHT = 0.4 # Age-based fairness priority
114
- EFFICIENCY_WEIGHT = 0.3 # Readiness-based efficiency
115
- URGENCY_WEIGHT = 0.3 # High-priority case handling
116
- ```
117
-
118
- ### TOML Configuration Files
119
-
120
- #### Case Generation (configs/generate.sample.toml)
121
- ```toml
122
- n_cases = 10000
123
- start = "2022-01-01"
124
- end = "2023-12-31"
125
- output = "data/generated/cases.csv"
126
- seed = 42
127
- ```
128
-
129
- #### Simulation (configs/simulate.sample.toml)
130
- ```toml
131
- cases = "data/generated/cases.csv"
132
- days = 384
133
- policy = "readiness" # readiness|fifo|age
134
- seed = 42
135
- courtrooms = 5
136
- daily_capacity = 151
137
- ```
138
-
139
- #### Parameter Sweep (configs/parameter_sweep.toml)
140
- ```toml
141
- [sweep]
142
- simulation_days = 500
143
- policies = ["fifo", "age", "readiness"]
144
-
145
- # Dataset variations for comprehensive testing
146
- [[datasets]]
147
- name = "baseline"
148
- cases = 10000
149
- stage_mix_auto = true
150
- urgent_percentage = 0.10
151
-
152
- [[datasets]]
153
- name = "admission_heavy"
154
- cases = 10000
155
- stage_mix = { "ADMISSION" = 0.70, "ARGUMENTS" = 0.15 }
156
- urgent_percentage = 0.10
157
- ```
158
-
159
- ---
160
-
161
- ## Core Algorithms
162
-
163
- ### 1. Ripeness Classification System
164
-
165
- #### Purpose
166
- Identifies cases with substantive bottlenecks to prevent wasteful scheduling of unready cases.
167
-
168
- #### Algorithm (scheduler/core/ripeness.py)
169
- ```python
- def classify(case: Case, current_date: date) -> RipenessStatus:
-     """5-step hierarchical classifier"""
- 
-     # Step 1: Check hearing purpose for explicit bottlenecks
-     if "SUMMONS" in last_hearing_purpose or "NOTICE" in last_hearing_purpose:
-         return UNRIPE_SUMMONS
-     if "STAY" in last_hearing_purpose or "PENDING" in last_hearing_purpose:
-         return UNRIPE_DEPENDENT
- 
-     # Step 2: Stage analysis - Early admission cases likely unripe
-     if current_stage == "ADMISSION" and hearing_count < 3:
-         return UNRIPE_SUMMONS
- 
-     # Step 3: Detect "stuck" cases (many hearings, no progress)
-     if hearing_count > 10 and avg_gap_days > 60:
-         return UNRIPE_PARTY
- 
-     # Step 4: Stage-based classification
-     if current_stage in ["ARGUMENTS", "EVIDENCE", "ORDERS / JUDGMENT"]:
-         return RIPE
- 
-     # Step 5: Conservative default
-     return RIPE
- ```
194
-
195
- #### Ripeness Statuses
196
- | Status | Meaning | Impact |
197
- |--------|---------|---------|
198
- | `RIPE` | Ready for hearing | Eligible for scheduling |
199
- | `UNRIPE_SUMMONS` | Awaiting summons service | Blocked until served |
200
- | `UNRIPE_DEPENDENT` | Waiting for dependent case | Blocked until resolved |
201
- | `UNRIPE_PARTY` | Party/lawyer unavailable | Blocked until responsive |
202
-
203
- ### 2. Multi-Courtroom Load Balancing
204
-
205
- #### Algorithm (scheduler/simulation/allocator.py)
206
- ```python
- def allocate(cases: List[Case], current_date: date) -> Dict[str, int]:
-     """Dynamic load-balanced allocation"""
- 
-     allocation = {}
-     courtroom_loads = {room.id: room.get_current_load() for room in courtrooms}
- 
-     for case in cases:
-         # Find least-loaded courtroom
-         target_id, target_load = min(courtroom_loads.items(), key=lambda x: x[1])
- 
-         # Respect capacity constraints: the least-loaded room being full means all are full
-         if target_load >= daily_capacity:
-             break
- 
-         # Assign case and update load
-         allocation[case.case_id] = target_id
-         courtroom_loads[target_id] += 1
- 
-     return allocation
- ```
227
-
228
- #### Load Balancing Results
229
- - **Perfect Distribution**: Gini coefficient 0.002
230
- - **Courtroom Loads**: 67.6-68.3 cases/day (±0.5% variance)
231
- - **Zero Capacity Violations**: All constraints respected
232
-
233
- ### 3. Intelligent Priority Scheduling
234
-
235
- #### Readiness-Based Policy (scheduler/simulation/policies/readiness.py)
236
- ```python
- def prioritize(cases: List[Case], current_date: date) -> List[Case]:
-     """Multi-factor priority calculation"""
- 
-     for case in cases:
-         # Age component (35%) - Fairness
-         age_score = min(case.age_days / 365, 1.0) * 0.35
- 
-         # Readiness component (25%) - Efficiency
-         readiness_score = case.compute_readiness_score() * 0.25
- 
-         # Urgency component (25%) - Critical cases
-         urgency_score = (1.0 if case.is_urgent else 0.5) * 0.25
- 
-         # Adjournment boost (15%) - Prevent indefinite postponement
-         boost_score = case.get_adjournment_boost(current_date) * 0.15
- 
-         case.priority_score = age_score + readiness_score + urgency_score + boost_score
- 
-     return sorted(cases, key=lambda c: c.priority_score, reverse=True)
- ```
257
-
258
- #### Adjournment Boost Calculation
259
- ```python
- def get_adjournment_boost(self, current_date: date) -> float:
-     """Exponential decay boost for recently adjourned cases"""
-     if not self.last_hearing_date:
-         return 0.0
- 
-     days_since = (current_date - self.last_hearing_date).days
-     return math.exp(-days_since / 21)  # 21-day decay constant (half-life ~14.6 days)
- ```
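To see what the decay does in practice: with a 21-day decay constant the boost halves roughly every 14.6 days (21 × ln 2), so a case adjourned today carries the full boost and one adjourned three weeks ago carries about a third of it. A worked sketch:

```python
import math

def boost(days_since_last_hearing: int, decay_days: float = 21.0) -> float:
    """Exponential decay: 1.0 immediately after a hearing, fading toward 0."""
    return math.exp(-days_since_last_hearing / decay_days)

# The boost is largest right after an adjournment and fades out
values = [round(boost(d), 2) for d in (0, 7, 21, 63)]
# → [1.0, 0.72, 0.37, 0.05]
```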
268
-
269
- ### 4. Judge Override System
270
-
271
- #### Override Types (scheduler/control/overrides.py)
272
- ```python
273
- class OverrideType(Enum):
274
- RIPENESS = "ripeness" # Override ripeness classification
275
- PRIORITY = "priority" # Adjust case priority
276
- ADD_CASE = "add_case" # Manually add case to list
277
- REMOVE_CASE = "remove_case" # Remove case from list
278
- REORDER = "reorder" # Change hearing sequence
279
- CAPACITY = "capacity" # Adjust daily capacity
280
- ```
281
-
282
- #### Validation Logic
283
- ```python
284
- def validate(self, override: Override) -> bool:
285
- """Comprehensive override validation"""
286
-
287
- if override.override_type == OverrideType.RIPENESS:
288
- return self.validate_ripeness_override(override)
289
- elif override.override_type == OverrideType.CAPACITY:
290
- return self.validate_capacity_override(override)
291
- elif override.override_type == OverrideType.PRIORITY:
292
- return 0 <= override.new_priority <= 1.0
293
-
294
- return True
295
- ```
296
-
297
- ---
298
-
299
- ## Data Models
300
-
301
- ### Core Case Entity (scheduler/core/case.py)
302
- ```python
- @dataclass
- class Case:
-     # Core Identification
-     case_id: str
-     case_type: str  # CRP, CA, RSA, etc.
-     filed_date: date
- 
-     # Lifecycle Tracking
-     current_stage: str = "ADMISSION"
-     status: CaseStatus = CaseStatus.PENDING
-     hearing_count: int = 0
-     last_hearing_date: Optional[date] = None
- 
-     # Scheduling Attributes
-     priority_score: float = 0.0
-     readiness_score: float = 0.0
-     is_urgent: bool = False
- 
-     # Ripeness Classification
-     ripeness_status: str = "UNKNOWN"
-     bottleneck_reason: Optional[str] = None
-     ripeness_updated_at: Optional[datetime] = None
- 
-     # No-Case-Left-Behind Tracking
-     last_scheduled_date: Optional[date] = None
-     days_since_last_scheduled: int = 0
- 
-     # Audit Trail
-     history: List[dict] = field(default_factory=list)
- ```
333
-
334
- ### Override Entity
335
- ```python
- @dataclass
- class Override:
-     # Core Fields
-     override_id: str
-     override_type: OverrideType
-     case_id: str
-     judge_id: str
-     timestamp: datetime
-     reason: str = ""
- 
-     # Type-Specific Fields
-     make_ripe: Optional[bool] = None       # For RIPENESS
-     new_position: Optional[int] = None     # For REORDER/ADD_CASE
-     new_priority: Optional[float] = None   # For PRIORITY
-     new_capacity: Optional[int] = None     # For CAPACITY
- ```
352
-
353
- ### Scheduling Result
354
- ```python
- @dataclass
- class SchedulingResult:
-     # Core Output
-     scheduled_cases: Dict[int, List[Case]]  # courtroom_id -> cases
- 
-     # Transparency
-     explanations: Dict[str, SchedulingExplanation]
-     applied_overrides: List[Override]
- 
-     # Diagnostics
-     unscheduled_cases: List[Tuple[Case, str]]
-     ripeness_filtered: int
-     capacity_limited: int
- 
-     # Metadata
-     scheduling_date: date
-     policy_used: str
-     total_scheduled: int
- ```
374
-
375
- ---
376
-
377
- ## Decision Logic
378
-
379
- ### Daily Scheduling Sequence
380
- ```python
- def schedule_day(cases, courtrooms, current_date, overrides=None):
-     """Complete daily scheduling algorithm"""
- 
-     # CHECKPOINT 1: Filter disposed cases
-     active_cases = [c for c in cases if c.status != DISPOSED]
- 
-     # CHECKPOINT 2: Update case attributes
-     for case in active_cases:
-         case.update_age(current_date)
-         case.compute_readiness_score()
- 
-     # CHECKPOINT 3: Ripeness filtering (CRITICAL)
-     ripe_cases = []
-     unripe_filtered_count = 0
-     for case in active_cases:
-         ripeness = RipenessClassifier.classify(case, current_date)
-         if ripeness.is_ripe():
-             ripe_cases.append(case)
-         else:
-             # Track filtered cases for metrics
-             unripe_filtered_count += 1
- 
-     # CHECKPOINT 4: Eligibility check (MIN_GAP_BETWEEN_HEARINGS)
-     eligible_cases = [c for c in ripe_cases
-                       if c.is_ready_for_scheduling(MIN_GAP_DAYS)]
- 
-     # CHECKPOINT 5: Apply scheduling policy
-     prioritized_cases = policy.prioritize(eligible_cases, current_date)
- 
-     # CHECKPOINT 6: Apply judge overrides
-     if overrides:
-         prioritized_cases = apply_overrides(prioritized_cases, overrides)
- 
-     # CHECKPOINT 7: Allocate to courtrooms
-     allocation = allocator.allocate(prioritized_cases, current_date)
- 
-     # CHECKPOINT 8: Generate explanations
-     explanations = generate_explanations(allocation, unscheduled_cases)
- 
-     return SchedulingResult(...)
- ```
421
-
422
- ### Override Application Logic
423
- ```python
424
- def apply_overrides(cases: List[Case], overrides: List[Override]) -> List[Case]:
425
- """Apply judge overrides in priority order"""
426
-
427
- result = cases.copy()
428
-
429
- # 1. Apply ADD_CASE overrides (highest priority)
430
- for override in [o for o in overrides if o.override_type == ADD_CASE]:
431
- case_to_add = find_case_by_id(override.case_id)
432
- if case_to_add and case_to_add not in result:
433
- insert_position = override.new_position or 0
434
- result.insert(insert_position, case_to_add)
435
-
436
- # 2. Apply REMOVE_CASE overrides
437
- for override in [o for o in overrides if o.override_type == REMOVE_CASE]:
438
- result = [c for c in result if c.case_id != override.case_id]
439
-
440
- # 3. Apply PRIORITY overrides
441
- for override in [o for o in overrides if o.override_type == PRIORITY]:
442
- case = find_case_in_list(result, override.case_id)
443
- if case and override.new_priority is not None:
444
- case.priority_score = override.new_priority
445
-
446
- # 4. Re-sort by updated priorities
447
- result.sort(key=lambda c: c.priority_score, reverse=True)
448
-
449
- # 5. Apply REORDER overrides (final positioning)
450
- for override in [o for o in overrides if o.override_type == REORDER]:
451
- case = find_case_in_list(result, override.case_id)
452
- if case and override.new_position is not None:
453
- result.remove(case)
454
- result.insert(override.new_position, case)
455
-
456
- return result
457
- ```
458
-
459
- ---
460
-
461
- ## Input/Output Specifications
462
-
463
- ### Input Data Requirements
464
-
465
- #### Historical Data (for parameter extraction)
466
- - **ISDMHack_Case.csv**: 134,699 cases with 24 attributes
467
- - **ISDMHack_Hear.csv**: 739,670 hearings with 31 attributes
468
- - Required fields: Case_ID, Type, Filed_Date, Current_Stage, Hearing_Date, Purpose_Of_Hearing
469
-
470
- #### Generated Case Data (for simulation)
471
- ```python
472
- # Case generation schema
473
- Case(
474
- case_id="C{:06d}", # C000001, C000002, etc.
475
- case_type=random_choice(types), # CRP, CA, RSA, etc.
476
- filed_date=random_date(range), # Within specified period
477
- current_stage=stage_from_mix, # Based on distribution
478
- is_urgent=random_bool(0.05), # 5% urgent cases
479
- last_hearing_purpose=purpose, # For ripeness classification
480
- )
481
- ```
482
-
483
- ### Output Specifications
484
-
485
- #### Daily Cause Lists (CSV)
486
- ```csv
487
- Date,Courtroom_ID,Case_ID,Case_Type,Stage,Purpose,Sequence_Number,Explanation
488
- 2024-01-15,1,C000123,CRP,ARGUMENTS,HEARING,1,"HIGH URGENCY | ready for orders/judgment | assigned to Courtroom 1"
489
- 2024-01-15,1,C000456,CA,ADMISSION,HEARING,2,"standard urgency | admission stage | assigned to Courtroom 1"
490
- ```
491
-
492
- #### Simulation Report (report.txt)
493
- ```
494
- SIMULATION SUMMARY
495
- Horizon: 2023-12-29 → 2024-03-21 (60 days)
496
-
497
- Hearing Metrics:
498
- Total: 42,193
499
- Heard: 26,245 (62.2%)
500
- Adjourned: 15,948 (37.8%)
501
-
502
- Disposal Metrics:
503
- Cases disposed: 4,401 (44.0%)
504
- Gini coefficient: 0.255
505
-
506
- Efficiency:
507
- Utilization: 93.1%
508
- Avg hearings/day: 703.2
509
- ```
510
-
511
- #### Metrics CSV (metrics.csv)
512
- ```csv
513
- date,scheduled,heard,adjourned,disposed,utilization,gini_coefficient,ripeness_filtered
514
- 2024-01-15,703,430,273,12,0.931,0.245,287
515
- 2024-01-16,698,445,253,15,0.924,0.248,301
516
- ```
517
-
518
- ---
519
-
520
- ## Deployment & Usage
521
-
522
- ### Installation
523
- ```bash
524
- # Clone repository
525
- git clone git@github.com:RoyAalekh/hackathon_code4change.git
526
- cd hackathon_code4change
527
-
528
- # Setup environment
529
- uv sync
530
-
531
- # Verify installation
532
- uv run court-scheduler --help
533
- ```
534
-
535
- ### CLI Commands
536
-
537
- #### Quick Start
538
- ```bash
539
- # Generate test cases
540
- uv run court-scheduler generate --cases 10000 --output data/cases.csv
541
-
542
- # Run simulation
543
- uv run court-scheduler simulate --cases data/cases.csv --days 384
544
-
545
- # Full pipeline
546
- uv run court-scheduler workflow --cases 10000 --days 384
547
- ```
548
-
549
- #### Advanced Usage
550
- ```bash
551
- # Custom policy simulation
552
- uv run court-scheduler simulate \
553
- --cases data/cases.csv \
554
- --days 384 \
555
- --policy readiness \
556
- --seed 42 \
557
- --log-dir data/sim_runs/custom
558
-
559
- # Parameter sweep comparison
560
- uv run python scripts/compare_policies.py
561
-
562
- # Generate cause lists
563
- uv run python scripts/generate_all_cause_lists.py
564
- ```
565
-
566
- ### Configuration Override
567
- ```bash
568
- # Use custom config file
569
- uv run court-scheduler simulate --config configs/custom.toml
570
-
571
- # Override specific parameters
572
- uv run court-scheduler simulate \
573
- --cases data/cases.csv \
574
- --days 60 \
575
- --courtrooms 3 \
576
- --daily-capacity 100
577
- ```
578
-
579
- ---
580
-
581
- ## Assumptions & Constraints
582
-
583
- ### Operational Assumptions
584
-
585
- #### Court Operations
586
- 1. **Working Days**: 192 days/year (Karnataka HC calendar)
587
- 2. **Courtroom Availability**: 5 courtrooms, single-judge benches
588
- 3. **Daily Capacity**: 151 hearings/courtroom/day (from historical data)
589
- 4. **Hearing Duration**: Not modeled explicitly (capacity is count-based)
590
-
591
- #### Case Dynamics
592
- 1. **Filing Rate**: Steady-state assumption (disposal ≈ filing)
593
- 2. **Stage Progression**: Markovian (history-independent transitions)
594
- 3. **Adjournment Rate**: 31-38% depending on stage and case type
595
- 4. **Case Independence**: No inter-case dependencies modeled
596
-
597
- #### Scheduling Constraints
598
- 1. **Minimum Gap**: 14 days between hearings (same case)
599
- 2. **Maximum Gap**: 90 days triggers alert
600
- 3. **Ripeness Re-evaluation**: Every 7 days
601
- 4. **Judge Availability**: Assumed 100% (no vacation modeling)
602
-
603
- ### Technical Constraints
604
-
605
- #### Performance Limits
606
- - **Case Volume**: Tested up to 15,000 cases
607
- - **Simulation Period**: Up to 500 working days
608
- - **Memory Usage**: <500MB for typical workload
609
- - **Execution Time**: ~30 seconds for 10K cases, 384 days
610
-
611
- #### Data Limitations
612
- - **No Real-time Integration**: Batch processing only
613
- - **Synthetic Ripeness Data**: Real purpose-of-hearing analysis needed
614
- - **Fixed Parameters**: No dynamic learning from outcomes
615
- - **Single Court Model**: No multi-court coordination
616
-
617
- ### Validation Boundaries
618
-
619
- #### Tested Scenarios
620
- - **Baseline**: 10,000 cases, balanced distribution
621
- - **Admission Heavy**: 70% early-stage cases (backlog scenario)
622
- - **Advanced Heavy**: 70% late-stage cases (efficient court)
623
- - **High Urgency**: 20% urgent cases (medical/custodial heavy)
624
- - **Large Backlog**: 15,000 cases (capacity stress test)
625
-
626
- #### Success Criteria Met
627
- - **Disposal Rate**: 81.4% achieved (target: >70%)
628
- - **Load Balance**: Gini 0.002 (target: <0.4)
629
- - **Case Coverage**: 97.7% (target: >95%)
630
- - **Utilization**: 45% (realistic given constraints)
631
-
632
- ---
633
-
634
- ## Performance Benchmarks
635
-
636
- ### Execution Performance
637
- - **EDA Pipeline**: ~2 minutes for 739K hearings
638
- - **Case Generation**: ~5 seconds for 10K cases
639
- - **2-Year Simulation**: ~30 seconds for 10K cases
640
- - **Cause List Generation**: ~10 seconds for 42K hearings
641
-
642
- ### Algorithm Efficiency
643
- - **Ripeness Classification**: O(n) per case, O(n²) total with re-evaluation
644
- - **Load Balancing**: O(n log k) where n=cases, k=courtrooms
645
- - **Priority Calculation**: O(n log n) sorting overhead
646
- - **Override Processing**: O(m·n) where m=overrides, n=cases
647
-
648
- ### Memory Usage
649
- - **Case Objects**: ~1KB per case (10K cases = 10MB)
650
- - **Simulation State**: ~50MB working memory
651
- - **Output Generation**: ~100MB for full reports
652
- - **Total Peak**: <500MB for largest tested scenarios
653
-
654
- ---
655
-
656
- **Last Updated**: 2025-11-25
657
- **Version**: 1.0
658
- **Status**: Production Ready
 
docs/CONFIGURATION.md CHANGED
@@ -92,16 +92,15 @@ The codebase uses a layered configuration approach separating concerns by domain
 
 ```
 Pipeline Execution:
- ├── PipelineConfig (workflow orchestration)
- │   ├── RLTrainingConfig (training hyperparameters)
- │   └── Data generation params
-
- └── Per-Policy Simulation:
-     ├── CourtSimConfig (simulation settings)
-     │   └── rl_agent_path (from training output)
-
-     └── Policy instantiation:
-         └── PolicyConfig (policy-specific settings)
+ |-- PipelineConfig (workflow orchestration)
+ |-- RLTrainingConfig (training hyperparameters)
+ |-- Data generation params
+
+ |-- Per-Policy Simulation:
+ |-- CourtSimConfig (simulation settings)
+ |-- rl_agent_path (from training output)
+ |-- Policy instantiation:
+ |-- PolicyConfig (policy-specific settings)
 ```
 
 ## Design Principles
@@ -174,21 +173,21 @@ policy = RLPolicy(agent_path=model_path, policy_config=strict_policy)
 ## Validation Rules
 
 All config classes validate in `__post_init__`:
- - Value ranges (0 < learning_rate 1)
+ - Value ranges (0 < learning_rate <= 1)
 - Type consistency (convert strings to Path)
- - Cross-parameter constraints (max_gap min_gap)
+ - Cross-parameter constraints (max_gap >= min_gap)
 - Required file existence (rl_agent_path must exist)
 
 ## Anti-Patterns
 
 **DON'T**:
- - Hardcode magic numbers in algorithms
- - Use module-level mutable globals
- - Mix domain constants with tunable parameters
- - Create "god config" with everything in one class
+ - Hardcode magic numbers in algorithms
+ - Use module-level mutable globals
+ - Mix domain constants with tunable parameters
+ - Create "god config" with everything in one class
 
 **DO**:
- - Separate by lifecycle and ownership
- - Validate early (constructor time)
- - Use dataclasses for immutability
- - Provide sensible defaults with named presets
+ - Separate by lifecycle and ownership
+ - Validate early (constructor time)
+ - Use dataclasses for immutability
+ - Provide sensible defaults with named presets
reports/codebase_analysis_2024-07-01.md DELETED
@@ -1,30 +0,0 @@
- # Court Scheduling System – Comprehensive Codebase Analysis
-
- ## Architecture Snapshot
- - **Unified CLI workflows**: `court_scheduler/cli.py` orchestrates EDA, synthetic case generation, and simulation runs with progress feedback, wiring together the data pipeline and scheduler from one entry point.【F:court_scheduler/cli.py†L1-L200】
- - **Scheduling core**: `SchedulingAlgorithm` remains the central coordinator for ripeness filtering, eligibility checks, prioritization, allocation, and explainability output via `SchedulingResult` dataclass.【F:scheduler/core/algorithm.py†L1-L200】
- - **EDA pipeline**: `src/run_eda.py` drives three stages—load/clean, exploratory visuals, and parameter extraction—by calling `eda_load_clean`, `eda_exploration`, and `eda_parameters` in sequence.【F:src/run_eda.py†L1-L23】 `eda_exploration` loads cleaned Parquet data, converts to pandas, and produces interactive Plotly HTML dashboards and CSV summaries for case mix, temporal trends, stage transitions, and gap distributions.【F:src/eda_exploration.py†L1-L120】
- - **Synthetic data + parameter sources**: `scheduler.data.case_generator` samples stage mixes (optionally from EDA-derived parameters), case types, and working-day seasonality to produce `Case` objects compatible with the scheduler and RL training.【F:scheduler/data/case_generator.py†L1-L120】
- - **RL training stack**: `rl/training.py` wraps a lightweight simulation to train the tabular Q-learning `TabularQAgent`, generating fresh cases per episode and stepping day-by-day to update rewards; `rl/simple_agent.py` encodes cases into 6-D discrete states with epsilon-greedy Q updates and reward shaping for urgency, ripeness, adjournments, and progression.【F:rl/training.py†L1-L200】【F:rl/simple_agent.py†L1-L200】
-
- ## Strengths
- - **End-to-end operability**: The Typer CLI offers cohesive commands for EDA, data generation, and simulation, lowering friction for analysts and operators running the whole workflow.【F:court_scheduler/cli.py†L1-L200】
- - **Transparent scheduling outputs**: `SchedulingResult` captures scheduled cases, unscheduled reasons, ripeness filtering counts, applied overrides, and explanations, supporting audits and downstream dashboards.【F:scheduler/core/algorithm.py†L32-L200】
- - **Reproducible EDA artifacts**: The EDA module saves HTML plots and CSV summaries (e.g., stage durations, transitions) and writes them to versioned run directories, enabling offline review and parameter reuse.【F:src/eda_exploration.py†L1-L120】
- - **Configurable RL experiments**: The RL pipeline isolates hyperparameters in dataclasses and regenerates cases per episode, making it easy to tweak learning rates, epsilon decay, and episode lengths without touching training logic.【F:rl/training.py†L140-L200】【F:rl/simple_agent.py†L41-L160】
-
- ## Risks and Quality Gaps
- 1. **Override validation mutates inputs and leaks state across runs**. Invalid overrides are removed from the caller’s list and logged as `(None, reason)` while priority overrides set `_priority_override` on shared `Case` objects without cleanup, so repeated scheduling can inherit stale manual priorities and unscheduled entries with `None` cases complicate consumers.【F:scheduler/core/algorithm.py†L136-L200】
- 2. **Ripeness defaults to optimistic**. When no bottleneck keyword or stage hint fires, the classifier returns `RIPE`, and admission-stage cases with ≥3 hearings are marked ripe without service/compliance proof, risking overscheduling unready matters.【F:scheduler/core/ripeness.py†L54-L129】
- 3. **Eligibility omits calendar blocks and per-case gap rules**. `_filter_eligible` enforces only the global minimum gap, ignoring judge or courtroom block dates and any per-case gap overrides, so schedules may violate availability assumptions despite capacity adjustments.【F:scheduler/core/algorithm.py†L129-L200】【F:scheduler/control/overrides.py†L103-L169】
- 4. **EDA scaling risks**. `eda_exploration` converts full Parquet datasets to pandas DataFrames before plotting, which can exhaust memory on larger extracts and lacks sampling/downcasting safeguards; renderer defaults to "browser", which can fail in headless batch environments.【F:src/eda_exploration.py†L38-L120】
- 5. **Training–production gap for RL**. The Q-learning loop trains on a simplified simulation that bypasses the production `SchedulingAlgorithm`, ripeness classifier, and courtroom capacity logic, so learned policies may not transfer. Rewards are computed via a freshly instantiated agent inside the environment, divorcing reward shaping from the training agent’s evolving parameters.【F:rl/training.py†L19-L138】【F:rl/simple_agent.py†L188-L200】
- 6. **Configuration robustness**. `get_latest_params_dir` still raises when no versioned params directory exists, blocking fresh environments from running simulations or RL without manual setup or bundled defaults.【F:scheduler/data/config.py†L1-L37】
-
- ## Recommendations
- - Make override handling side-effect-free: validate into separate structures, preserve original override lists for auditing, and clear any temporary priority attributes after use.【F:scheduler/core/algorithm.py†L136-L200】
- - Require affirmative ripeness evidence or add an `UNKNOWN` state so ambiguous cases don’t default to `RIPE`; integrate service/compliance indicators and stage-specific checks before scheduling.【F:scheduler/core/ripeness.py†L54-L129】
- - Enforce calendar constraints and per-case gap overrides in eligibility and allocation to avoid scheduling on blocked dates or ignoring individualized spacing rules.【F:scheduler/core/algorithm.py†L129-L200】【F:scheduler/control/overrides.py†L103-L169】
- - Harden EDA for large datasets: stream or sample before `to_pandas`, allow a static image renderer in headless runs, and gate expensive plots behind flags to keep CLI runs reliable.【F:src/eda_exploration.py†L38-L120】
- - Align RL training with the production scheduler: reuse `SchedulingAlgorithm` or its readiness/ripeness filters inside the training environment, and compute rewards without re-instantiating agents so learning signals match deployed policy behavior.【F:rl/training.py†L19-L138】【F:rl/simple_agent.py†L188-L200】
- - Provide a fallback baseline parameters bundle or clearer setup guidance in `get_latest_params_dir` so simulations and RL can run out of the box.【F:scheduler/data/config.py†L1-L37】
 
test_phase1.py DELETED
@@ -1,326 +0,0 @@
- """Phase 1 Validation Script - Test Foundation Components.
-
- This script validates that all Phase 1 components work correctly:
- - Configuration loading
- - Parameter loading from EDA outputs
- - Core entities (Case, Courtroom, Judge, Hearing)
- - Calendar utility
-
- Run this with: uv run python test_phase1.py
- """
-
- from datetime import date, timedelta
-
- print("=" * 70)
- print("PHASE 1 VALIDATION - Court Scheduler Foundation")
- print("=" * 70)
-
- # Test 1: Configuration
- print("\n[1/6] Testing Configuration...")
- try:
-     from scheduler.data.config import (
-         WORKING_DAYS_PER_YEAR,
-         COURTROOMS,
-         SIMULATION_YEARS,
-         CASE_TYPE_DISTRIBUTION,
-         STAGES,
-         FAIRNESS_WEIGHT,
-         EFFICIENCY_WEIGHT,
-         URGENCY_WEIGHT,
-     )
-
-     print(f" Working days/year: {WORKING_DAYS_PER_YEAR}")
-     print(f" Courtrooms: {COURTROOMS}")
-     print(f" Simulation years: {SIMULATION_YEARS}")
-     print(f" Case types: {len(CASE_TYPE_DISTRIBUTION)}")
-     print(f" Stages: {len(STAGES)}")
-     print(f" Objective weights: Fairness={FAIRNESS_WEIGHT}, "
-           f"Efficiency={EFFICIENCY_WEIGHT}, "
-           f"Urgency={URGENCY_WEIGHT}")
-     print(" ✓ Configuration loaded successfully")
- except Exception as e:
-     print(f" ✗ Configuration failed: {e}")
-     exit(1)
-
- # Test 2: Parameter Loader
- print("\n[2/6] Testing Parameter Loader...")
- try:
-     from scheduler.data.param_loader import load_parameters
-
-     params = load_parameters()
-
-     # Test transition probability
-     prob = params.get_transition_prob("ADMISSION", "ORDERS / JUDGMENT")
-     print(f" P(ADMISSION → ORDERS/JUDGMENT): {prob:.4f}")
-
-     # Test stage duration
-     duration = params.get_stage_duration("ADMISSION", "median")
-     print(f" ADMISSION median duration: {duration:.1f} days")
-
-     # Test capacity
-     print(f" Daily capacity (median): {params.daily_capacity_median}")
-
-     # Test adjournment rate
-     adj_rate = params.get_adjournment_prob("ADMISSION", "RSA")
-     print(f" RSA@ADMISSION adjournment rate: {adj_rate:.3f}")
-
-     print(" ✓ Parameter loader working correctly")
- except Exception as e:
-     print(f" ✗ Parameter loader failed: {e}")
-     print(f" Note: This requires EDA outputs to exist in reports/figures/")
-     # Don't exit, continue with other tests
-
- # Test 3: Case Entity
- print("\n[3/6] Testing Case Entity...")
- try:
-     from scheduler.core.case import Case, CaseStatus
-
-     # Create a sample case
-     case = Case(
-         case_id="RSA/2025/001",
-         case_type="RSA",
-         filed_date=date(2025, 1, 15),
-         current_stage="ADMISSION",
-         is_urgent=False,
-     )
-
-     print(f" Created case: {case.case_id}")
-     print(f" Type: {case.case_type}, Stage: {case.current_stage}")
-     print(f" Status: {case.status.value}")
-
-     # Test methods
-     case.update_age(date(2025, 3, 1))
-     print(f" Age after 45 days: {case.age_days} days")
-
-     # Record a hearing
-     case.record_hearing(date(2025, 2, 1), was_heard=True, outcome="Heard")
-     print(f" Hearings recorded: {case.hearing_count}")
-
-     # Compute priority
-     priority = case.get_priority_score()
-     print(f" Priority score: {priority:.3f}")
-
-     print(" ✓ Case entity working correctly")
- except Exception as e:
-     print(f" ✗ Case entity failed: {e}")
-     exit(1)
-
- # Test 4: Courtroom Entity
- print("\n[4/6] Testing Courtroom Entity...")
- try:
-     from scheduler.core.courtroom import Courtroom
-
-     # Create a courtroom
-     courtroom = Courtroom(
-         courtroom_id=1,
-         judge_id="J001",
-         daily_capacity=151,
-     )
-
-     print(f" Created courtroom {courtroom.courtroom_id} with Judge {courtroom.judge_id}")
-     print(f" Daily capacity: {courtroom.daily_capacity}")
-
-     # Schedule some cases
-     test_date = date(2025, 2, 1)
-     case1_id = "RSA/2025/001"
-     case2_id = "CRP/2025/002"
-
-     courtroom.schedule_case(test_date, case1_id)
-     courtroom.schedule_case(test_date, case2_id)
-
-     scheduled = courtroom.get_daily_schedule(test_date)
-     print(f" Scheduled {len(scheduled)} cases on {test_date}")
-
-     # Check utilization
-     utilization = courtroom.compute_utilization(test_date)
-     print(f" Utilization: {utilization:.2%}")
-
-     print(" ✓ Courtroom entity working correctly")
- except Exception as e:
-     print(f" ✗ Courtroom entity failed: {e}")
-     exit(1)
-
- # Test 5: Judge Entity
- print("\n[5/6] Testing Judge Entity...")
- try:
-     from scheduler.core.judge import Judge
-
-     # Create a judge
-     judge = Judge(
-         judge_id="J001",
-         name="Justice Smith",
-         courtroom_id=1,
-     )
-
-     judge.add_preferred_types("RSA", "CRP")
-
-     print(f" Created {judge.name} (ID: {judge.judge_id})")
-     print(f" Assigned to courtroom: {judge.courtroom_id}")
-     print(f" Specializations: {judge.preferred_case_types}")
-
-     # Record workload
-     judge.record_daily_workload(date(2025, 2, 1), cases_heard=25, cases_adjourned=10)
-
-     avg_workload = judge.get_average_daily_workload()
-     adj_rate = judge.get_adjournment_rate()
-
-     print(f" Average daily workload: {avg_workload:.1f} cases")
-     print(f" Adjournment rate: {adj_rate:.2%}")
-
-     print(" ✓ Judge entity working correctly")
- except Exception as e:
-     print(f" ✗ Judge entity failed: {e}")
-     exit(1)
-
- # Test 6: Hearing Entity
- print("\n[6/6] Testing Hearing Entity...")
- try:
-     from scheduler.core.hearing import Hearing, HearingOutcome
-
-     # Create a hearing
-     hearing = Hearing(
-         hearing_id="H001",
-         case_id="RSA/2025/001",
-         scheduled_date=date(2025, 2, 1),
-         courtroom_id=1,
-         judge_id="J001",
-         stage="ADMISSION",
-     )
-
-     print(f" Created hearing {hearing.hearing_id} for case {hearing.case_id}")
-     print(f" Scheduled: {hearing.scheduled_date}, Stage: {hearing.stage}")
-     print(f" Initial outcome: {hearing.outcome.value}")
-
-     # Mark as heard
-     hearing.mark_as_heard()
-     print(f" Outcome after hearing: {hearing.outcome.value}")
-     print(f" Is successful: {hearing.is_successful()}")
-
-     print(" ✓ Hearing entity working correctly")
- except Exception as e:
-     print(f" ✗ Hearing entity failed: {e}")
-     exit(1)
-
- # Test 7: Calendar Utility
- print("\n[7/7] Testing Calendar Utility...")
- try:
-     from scheduler.utils.calendar import CourtCalendar
-
-     calendar = CourtCalendar()
-
-     # Add some holidays
-     calendar.add_standard_holidays(2025)
-
-     print(f" Calendar initialized with {len(calendar.holidays)} holidays")
-
-     # Test working day check
-     monday = date(2025, 2, 3)  # Monday
-     saturday = date(2025, 2, 1)  # Saturday
-
-     print(f" Is {monday} (Mon) a working day? {calendar.is_working_day(monday)}")
-     print(f" Is {saturday} (Sat) a working day? {calendar.is_working_day(saturday)}")
-
-     # Count working days
-     start = date(2025, 1, 1)
-     end = date(2025, 1, 31)
-     working_days = calendar.working_days_between(start, end)
-     print(f" Working days in Jan 2025: {working_days}")
-
-     # Test seasonality
-     may_factor = calendar.get_seasonality_factor(date(2025, 5, 1))
-     feb_factor = calendar.get_seasonality_factor(date(2025, 2, 1))
-     print(f" Seasonality factor for May: {may_factor} (vacation)")
-     print(f" Seasonality factor for Feb: {feb_factor} (peak)")
-
-     print(" ✓ Calendar utility working correctly")
- except Exception as e:
-     print(f" ✗ Calendar utility failed: {e}")
-     exit(1)
-
- # Integration Test
- print("\n" + "=" * 70)
- print("INTEGRATION TEST - Putting it all together")
- print("=" * 70)
-
- try:
-     # Create a mini simulation scenario
-     print("\nScenario: Schedule 3 cases across 2 courtrooms")
-
-     # Setup
-     calendar = CourtCalendar()
-     calendar.add_standard_holidays(2025)
-
-     courtroom1 = Courtroom(courtroom_id=1, judge_id="J001", daily_capacity=151)
-     courtroom2 = Courtroom(courtroom_id=2, judge_id="J002", daily_capacity=151)
-
-     judge1 = Judge(judge_id="J001", name="Justice A", courtroom_id=1)
-     judge2 = Judge(judge_id="J002", name="Justice B", courtroom_id=2)
-
-     # Create cases
-     cases = [
-         Case(case_id="RSA/2025/001", case_type="RSA", filed_date=date(2025, 1, 1),
-              current_stage="ADMISSION", is_urgent=True),
-         Case(case_id="CRP/2025/002", case_type="CRP", filed_date=date(2025, 1, 5),
-              current_stage="ADMISSION", is_urgent=False),
-         Case(case_id="CA/2025/003", case_type="CA", filed_date=date(2025, 1, 10),
-              current_stage="ORDERS / JUDGMENT", is_urgent=False),
-     ]
-
-     # Update ages
-     current_date = date(2025, 2, 1)
-     for case in cases:
-         case.update_age(current_date)
-
-     # Sort by priority
-     cases_sorted = sorted(cases, key=lambda c: c.get_priority_score(), reverse=True)
-
-     print(f"\nCases sorted by priority (as of {current_date}):")
-     for i, case in enumerate(cases_sorted, 1):
-         priority = case.get_priority_score()
-         print(f" {i}. {case.case_id} - Priority: {priority:.3f}, "
-               f"Age: {case.age_days} days, Urgent: {case.is_urgent}")
-
-     # Schedule cases
-     hearing_date = calendar.next_working_day(current_date, 7)  # 7 days ahead
-     print(f"\nScheduling hearings for {hearing_date}:")
-
-     for i, case in enumerate(cases_sorted):
-         courtroom = courtroom1 if i % 2 == 0 else courtroom2
-         judge = judge1 if courtroom.courtroom_id == 1 else judge2
-
-         if courtroom.can_schedule(hearing_date, case.case_id):
-             courtroom.schedule_case(hearing_date, case.case_id)
-
-             hearing = Hearing(
-                 hearing_id=f"H{i+1:03d}",
-                 case_id=case.case_id,
-                 scheduled_date=hearing_date,
-                 courtroom_id=courtroom.courtroom_id,
-                 judge_id=judge.judge_id,
-                 stage=case.current_stage,
-             )
-
-             print(f" ✓ {case.case_id} → Courtroom {courtroom.courtroom_id} (Judge {judge.judge_id})")
-
-     # Check courtroom schedules
-     print(f"\nCourtroom schedules for {hearing_date}:")
-     for courtroom in [courtroom1, courtroom2]:
-         schedule = courtroom.get_daily_schedule(hearing_date)
-         utilization = courtroom.compute_utilization(hearing_date)
-         print(f" Courtroom {courtroom.courtroom_id}: {len(schedule)} cases scheduled "
-               f"(Utilization: {utilization:.2%})")
-
-     print("\n✓ Integration test passed!")
-
- except Exception as e:
-     print(f"\n✗ Integration test failed: {e}")
-     import traceback
-     traceback.print_exc()
-     exit(1)
-
- print("\n" + "=" * 70)
- print("ALL TESTS PASSED - Phase 1 Foundation is Solid!")
- print("=" * 70)
- print("\nNext: Phase 2 - Case Generation")
- print(" Implement case_generator.py to create 10,000 synthetic cases")
- print("=" * 70)
 
test_system.py DELETED
@@ -1,8 +0,0 @@
- """Quick test to verify core system works before refactoring."""
- from scheduler.data.param_loader import load_parameters
-
- p = load_parameters()
- print("✓ Parameters loaded successfully")
- print(f"✓ Adjournment rate (ADMISSION, RSA): {p.get_adjournment_prob('ADMISSION', 'RSA'):.3f}")
- print("✓ Stage duration (ADMISSION, median): {:.0f} days".format(p.get_stage_duration('ADMISSION', 'median')))
- print("✓ Core system works!")