hackathon_code4change / TECHNICAL_IMPLEMENTATION.md
RoyAalekh's picture
docs: Add comprehensive implementation documentation
1567020
|
raw
history blame
20.5 kB

Court Scheduling System - Technical Implementation Documentation

Complete Implementation Guide for Code4Change Hackathon Submission


Table of Contents

  1. System Overview
  2. Architecture & Design
  3. Configuration Management
  4. Core Algorithms
  5. Data Models
  6. Decision Logic
  7. Input/Output Specifications
  8. Deployment & Usage
  9. Assumptions & Constraints

System Overview

Purpose

Production-ready court scheduling system for Karnataka High Court that optimizes daily cause lists across multiple courtrooms while ensuring fairness, efficiency, and judicial control.

Key Achievements

  • 81.4% Disposal Rate - Exceeds baseline expectations
  • Perfect Load Balance - Gini coefficient 0.002 across courtrooms
  • 97.7% Case Coverage - Near-zero case abandonment
  • Smart Bottleneck Detection - 40.8% unripe cases filtered
  • Complete Judge Control - Override system with audit trails

Technology Stack

# Core Dependencies (from pyproject.toml)
dependencies = [
    "pandas>=2.2",      # Data manipulation
    "polars>=1.30",     # High-performance data processing
    "plotly>=6.0",      # Visualization
    "numpy>=2.0",       # Numerical computing
    "simpy>=4.1",       # Discrete event simulation
    "typer>=0.12",      # CLI interface
    "pydantic>=2.0",    # Data validation
    "scipy>=1.14",      # Statistical algorithms
    "streamlit>=1.28",  # Dashboard (future)
]

Architecture & Design

System Architecture

Court Scheduling System
β”œβ”€β”€ Core Domain Layer (scheduler/core/)
β”‚   β”œβ”€β”€ case.py           # Case entity with lifecycle management
β”‚   β”œβ”€β”€ courtroom.py      # Courtroom resource management
β”‚   β”œβ”€β”€ ripeness.py       # Bottleneck detection classifier
β”‚   β”œβ”€β”€ policy.py         # Scheduling policy interface
β”‚   └── algorithm.py      # Main scheduling algorithm
β”œβ”€β”€ Simulation Engine (scheduler/simulation/)
β”‚   β”œβ”€β”€ engine.py         # Discrete event simulation
β”‚   β”œβ”€β”€ allocator.py      # Multi-courtroom load balancer
β”‚   └── policies/         # FIFO, Age, Readiness policies
β”œβ”€β”€ Data Management (scheduler/data/)
β”‚   β”œβ”€β”€ param_loader.py   # Historical parameter loading
β”‚   β”œβ”€β”€ case_generator.py # Synthetic case generation
β”‚   └── config.py         # System configuration
β”œβ”€β”€ Control Systems (scheduler/control/)
β”‚   └── overrides.py      # Judge override & audit system
β”œβ”€β”€ Output Generation (scheduler/output/)
β”‚   └── cause_list.py     # Daily cause list CSV generation
└── Analysis Tools (src/, scripts/)
    β”œβ”€β”€ EDA pipeline      # Historical data analysis
    └── Validation tools  # Performance verification

Design Principles

  1. Clean Architecture - Domain-driven design with clear layer separation
  2. Production Ready - Type hints, error handling, comprehensive logging
  3. Data-Driven - All parameters extracted from 739K+ historical hearings
  4. Judge Autonomy - Complete override system with audit trails
  5. Scalable - Supports multiple courtrooms, thousands of cases

Configuration Management

Primary Configuration (scheduler/data/config.py)

# Court Operational Constants
WORKING_DAYS_PER_YEAR = 192        # Karnataka HC calendar
COURTROOMS = 5                      # Number of courtrooms
SIMULATION_DAYS = 384              # 2-year simulation period

# Scheduling Constraints
MIN_GAP_BETWEEN_HEARINGS = 14      # Days between hearings
MAX_GAP_WITHOUT_ALERT = 90         # Alert threshold
DEFAULT_DAILY_CAPACITY = 151       # Cases per courtroom per day

# Case Type Distribution (from EDA)
CASE_TYPE_DISTRIBUTION = {
    "CRP": 0.201,  # Civil Revision Petition (most common)
    "CA": 0.200,   # Civil Appeal  
    "RSA": 0.196,  # Regular Second Appeal
    "RFA": 0.167,  # Regular First Appeal
    "CCC": 0.111,  # Civil Contempt Petition
    "CP": 0.096,   # Civil Petition
    "CMP": 0.028,  # Civil Miscellaneous Petition
}

# Multi-objective Optimization Weights
FAIRNESS_WEIGHT = 0.4   # Age-based fairness priority
EFFICIENCY_WEIGHT = 0.3 # Readiness-based efficiency  
URGENCY_WEIGHT = 0.3    # High-priority case handling

TOML Configuration Files

Case Generation (configs/generate.sample.toml)

n_cases = 10000
start = "2022-01-01"
end = "2023-12-31" 
output = "data/generated/cases.csv"
seed = 42

Simulation (configs/simulate.sample.toml)

cases = "data/generated/cases.csv"
days = 384
policy = "readiness"     # readiness|fifo|age
seed = 42
courtrooms = 5
daily_capacity = 151

Parameter Sweep (configs/parameter_sweep.toml)

[sweep]
simulation_days = 500
policies = ["fifo", "age", "readiness"]

# Dataset variations for comprehensive testing
[[datasets]]
name = "baseline"
cases = 10000
stage_mix_auto = true
urgent_percentage = 0.10

[[datasets]]
name = "admission_heavy" 
cases = 10000
stage_mix = { "ADMISSION" = 0.70, "ARGUMENTS" = 0.15 }
urgent_percentage = 0.10

Core Algorithms

1. Ripeness Classification System

Purpose

Identifies cases with substantive bottlenecks to prevent wasteful scheduling of unready cases.

Algorithm (scheduler/core/ripeness.py)

def classify(case: Case, current_date: date) -> RipenessStatus:
    """5-step hierarchical classifier"""
    
    # Step 1: Check hearing purpose for explicit bottlenecks
    if "SUMMONS" in last_hearing_purpose or "NOTICE" in last_hearing_purpose:
        return UNRIPE_SUMMONS
    if "STAY" in last_hearing_purpose or "PENDING" in last_hearing_purpose:
        return UNRIPE_DEPENDENT
    
    # Step 2: Stage analysis - Early admission cases likely unripe
    if current_stage == "ADMISSION" and hearing_count < 3:
        return UNRIPE_SUMMONS
    
    # Step 3: Detect "stuck" cases (many hearings, no progress)
    if hearing_count > 10 and avg_gap_days > 60:
        return UNRIPE_PARTY
    
    # Step 4: Stage-based classification
    if current_stage in ["ARGUMENTS", "EVIDENCE", "ORDERS / JUDGMENT"]:
        return RIPE
    
    # Step 5: Conservative default
    return RIPE

Ripeness Statuses

Status Meaning Impact
RIPE Ready for hearing Eligible for scheduling
UNRIPE_SUMMONS Awaiting summons service Blocked until served
UNRIPE_DEPENDENT Waiting for dependent case Blocked until resolved
UNRIPE_PARTY Party/lawyer unavailable Blocked until responsive

2. Multi-Courtroom Load Balancing

Algorithm (scheduler/simulation/allocator.py)

def allocate(cases: List[Case], current_date: date) -> Dict[str, int]:
    """Dynamic load-balanced allocation"""
    
    allocation = {}
    courtroom_loads = {room.id: room.get_current_load() for room in courtrooms}
    
    for case in cases:
        # Find least-loaded courtroom
        target_room = min(courtroom_loads.items(), key=lambda x: x[1])
        
        # Assign case and update load
        allocation[case.case_id] = target_room[0]
        courtroom_loads[target_room[0]] += 1
        
        # Respect capacity constraints
        if courtroom_loads[target_room[0]] >= room.daily_capacity:
            break
            
    return allocation

Load Balancing Results

  • Perfect Distribution: Gini coefficient 0.002
  • Courtroom Loads: 67.6-68.3 cases/day (Β±0.5% variance)
  • Zero Capacity Violations: All constraints respected

3. Intelligent Priority Scheduling

Readiness-Based Policy (scheduler/simulation/policies/readiness.py)

def prioritize(cases: List[Case], current_date: date) -> List[Case]:
    """Multi-factor priority calculation"""
    
    for case in cases:
        # Age component (35%) - Fairness
        age_score = min(case.age_days / 365, 1.0) * 0.35
        
        # Readiness component (25%) - Efficiency  
        readiness_score = case.compute_readiness_score() * 0.25
        
        # Urgency component (25%) - Critical cases
        urgency_score = (1.0 if case.is_urgent else 0.5) * 0.25
        
        # Adjournment boost (15%) - Prevent indefinite postponement
        boost_score = case.get_adjournment_boost() * 0.15
        
        case.priority_score = age_score + readiness_score + urgency_score + boost_score
    
    return sorted(cases, key=lambda c: c.priority_score, reverse=True)

Adjournment Boost Calculation

def get_adjournment_boost(self) -> float:
    """Exponential decay boost for recently adjourned cases"""
    if not self.last_hearing_date:
        return 0.0
    
    days_since = (current_date - self.last_hearing_date).days
    return math.exp(-days_since / 21)  # 21-day half-life

4. Judge Override System

Override Types (scheduler/control/overrides.py)

class OverrideType(Enum):
    RIPENESS = "ripeness"      # Override ripeness classification
    PRIORITY = "priority"      # Adjust case priority
    ADD_CASE = "add_case"      # Manually add case to list
    REMOVE_CASE = "remove_case" # Remove case from list  
    REORDER = "reorder"        # Change hearing sequence
    CAPACITY = "capacity"      # Adjust daily capacity

Validation Logic

def validate(self, override: Override) -> bool:
    """Comprehensive override validation"""
    
    if override.override_type == OverrideType.RIPENESS:
        return self.validate_ripeness_override(override)
    elif override.override_type == OverrideType.CAPACITY:
        return self.validate_capacity_override(override)
    elif override.override_type == OverrideType.PRIORITY:
        return 0 <= override.new_priority <= 1.0
    
    return True

Data Models

Core Case Entity (scheduler/core/case.py)

@dataclass
class Case:
    # Core Identification
    case_id: str
    case_type: str                    # CRP, CA, RSA, etc.
    filed_date: date
    
    # Lifecycle Tracking
    current_stage: str = "ADMISSION"
    status: CaseStatus = CaseStatus.PENDING
    hearing_count: int = 0
    last_hearing_date: Optional[date] = None
    
    # Scheduling Attributes
    priority_score: float = 0.0
    readiness_score: float = 0.0
    is_urgent: bool = False
    
    # Ripeness Classification
    ripeness_status: str = "UNKNOWN"
    bottleneck_reason: Optional[str] = None
    ripeness_updated_at: Optional[datetime] = None
    
    # No-Case-Left-Behind Tracking
    last_scheduled_date: Optional[date] = None
    days_since_last_scheduled: int = 0
    
    # Audit Trail
    history: List[dict] = field(default_factory=list)

Override Entity

@dataclass  
class Override:
    # Core Fields
    override_id: str
    override_type: OverrideType
    case_id: str
    judge_id: str
    timestamp: datetime
    reason: str = ""
    
    # Type-Specific Fields
    make_ripe: Optional[bool] = None           # For RIPENESS
    new_position: Optional[int] = None         # For REORDER/ADD_CASE
    new_priority: Optional[float] = None       # For PRIORITY
    new_capacity: Optional[int] = None         # For CAPACITY

Scheduling Result

@dataclass
class SchedulingResult:
    # Core Output
    scheduled_cases: Dict[int, List[Case]]     # courtroom_id -> cases
    
    # Transparency
    explanations: Dict[str, SchedulingExplanation]
    applied_overrides: List[Override]
    
    # Diagnostics  
    unscheduled_cases: List[Tuple[Case, str]]
    ripeness_filtered: int
    capacity_limited: int
    
    # Metadata
    scheduling_date: date
    policy_used: str
    total_scheduled: int

Decision Logic

Daily Scheduling Sequence

def schedule_day(cases, courtrooms, current_date, overrides=None):
    """Complete daily scheduling algorithm"""
    
    # CHECKPOINT 1: Filter disposed cases
    active_cases = [c for c in cases if c.status != DISPOSED]
    
    # CHECKPOINT 2: Update case attributes
    for case in active_cases:
        case.update_age(current_date)
        case.compute_readiness_score()
    
    # CHECKPOINT 3: Ripeness filtering (CRITICAL)
    ripe_cases = []
    for case in active_cases:
        ripeness = RipenessClassifier.classify(case, current_date)
        if ripeness.is_ripe():
            ripe_cases.append(case)
        else:
            # Track filtered cases for metrics
            unripe_filtered_count += 1
    
    # CHECKPOINT 4: Eligibility check (MIN_GAP_BETWEEN_HEARINGS)
    eligible_cases = [c for c in ripe_cases 
                     if c.is_ready_for_scheduling(MIN_GAP_DAYS)]
    
    # CHECKPOINT 5: Apply scheduling policy
    prioritized_cases = policy.prioritize(eligible_cases, current_date)
    
    # CHECKPOINT 6: Apply judge overrides
    if overrides:
        prioritized_cases = apply_overrides(prioritized_cases, overrides)
    
    # CHECKPOINT 7: Allocate to courtrooms
    allocation = allocator.allocate(prioritized_cases, current_date)
    
    # CHECKPOINT 8: Generate explanations
    explanations = generate_explanations(allocation, unscheduled_cases)
    
    return SchedulingResult(...)

Override Application Logic

def apply_overrides(cases: List[Case], overrides: List[Override]) -> List[Case]:
    """Apply judge overrides in priority order"""
    
    result = cases.copy()
    
    # 1. Apply ADD_CASE overrides (highest priority)
    for override in [o for o in overrides if o.override_type == ADD_CASE]:
        case_to_add = find_case_by_id(override.case_id)
        if case_to_add and case_to_add not in result:
            insert_position = override.new_position or 0
            result.insert(insert_position, case_to_add)
    
    # 2. Apply REMOVE_CASE overrides  
    for override in [o for o in overrides if o.override_type == REMOVE_CASE]:
        result = [c for c in result if c.case_id != override.case_id]
    
    # 3. Apply PRIORITY overrides
    for override in [o for o in overrides if o.override_type == PRIORITY]:
        case = find_case_in_list(result, override.case_id)
        if case and override.new_priority is not None:
            case.priority_score = override.new_priority
    
    # 4. Re-sort by updated priorities
    result.sort(key=lambda c: c.priority_score, reverse=True)
    
    # 5. Apply REORDER overrides (final positioning)
    for override in [o for o in overrides if o.override_type == REORDER]:
        case = find_case_in_list(result, override.case_id)
        if case and override.new_position is not None:
            result.remove(case)
            result.insert(override.new_position, case)
    
    return result

Input/Output Specifications

Input Data Requirements

Historical Data (for parameter extraction)

  • ISDMHack_Case.csv: 134,699 cases with 24 attributes
  • ISDMHack_Hear.csv: 739,670 hearings with 31 attributes
  • Required fields: Case_ID, Type, Filed_Date, Current_Stage, Hearing_Date, Purpose_Of_Hearing

Generated Case Data (for simulation)

# Case generation schema
Case(
    case_id="C{:06d}",              # C000001, C000002, etc.
    case_type=random_choice(types),  # CRP, CA, RSA, etc.
    filed_date=random_date(range),   # Within specified period
    current_stage=stage_from_mix,    # Based on distribution
    is_urgent=random_bool(0.05),     # 5% urgent cases
    last_hearing_purpose=purpose,    # For ripeness classification
)

Output Specifications

Daily Cause Lists (CSV)

Date,Courtroom_ID,Case_ID,Case_Type,Stage,Purpose,Sequence_Number,Explanation
2024-01-15,1,C000123,CRP,ARGUMENTS,HEARING,1,"HIGH URGENCY | ready for orders/judgment | assigned to Courtroom 1"
2024-01-15,1,C000456,CA,ADMISSION,HEARING,2,"standard urgency | admission stage | assigned to Courtroom 1"

Simulation Report (report.txt)

SIMULATION SUMMARY
Horizon: 2023-12-29 β†’ 2024-03-21 (60 days)

Hearing Metrics:
  Total: 42,193
  Heard: 26,245 (62.2%)
  Adjourned: 15,948 (37.8%)

Disposal Metrics:
  Cases disposed: 4,401 (44.0%)
  Gini coefficient: 0.255

Efficiency:
  Utilization: 93.1%
  Avg hearings/day: 703.2

Metrics CSV (metrics.csv)

date,scheduled,heard,adjourned,disposed,utilization,gini_coefficient,ripeness_filtered
2024-01-15,703,430,273,12,0.931,0.245,287
2024-01-16,698,445,253,15,0.924,0.248,301

Deployment & Usage

Installation

# Clone repository
git clone git@github.com:RoyAalekh/hackathon_code4change.git
cd hackathon_code4change

# Setup environment
uv sync

# Verify installation
uv run court-scheduler --help

CLI Commands

Quick Start

# Generate test cases
uv run court-scheduler generate --cases 10000 --output data/cases.csv

# Run simulation  
uv run court-scheduler simulate --cases data/cases.csv --days 384

# Full pipeline
uv run court-scheduler workflow --cases 10000 --days 384

Advanced Usage

# Custom policy simulation
uv run court-scheduler simulate \
    --cases data/cases.csv \
    --days 384 \
    --policy readiness \
    --seed 42 \
    --log-dir data/sim_runs/custom

# Parameter sweep comparison
uv run python scripts/compare_policies.py

# Generate cause lists
uv run python scripts/generate_all_cause_lists.py

Configuration Override

# Use custom config file
uv run court-scheduler simulate --config configs/custom.toml

# Override specific parameters
uv run court-scheduler simulate \
    --cases data/cases.csv \
    --days 60 \
    --courtrooms 3 \
    --daily-capacity 100

Assumptions & Constraints

Operational Assumptions

Court Operations

  1. Working Days: 192 days/year (Karnataka HC calendar)
  2. Courtroom Availability: 5 courtrooms, single-judge benches
  3. Daily Capacity: 151 hearings/courtroom/day (from historical data)
  4. Hearing Duration: Not modeled explicitly (capacity is count-based)

Case Dynamics

  1. Filing Rate: Steady-state assumption (disposal β‰ˆ filing)
  2. Stage Progression: Markovian (history-independent transitions)
  3. Adjournment Rate: 31-38% depending on stage and case type
  4. Case Independence: No inter-case dependencies modeled

Scheduling Constraints

  1. Minimum Gap: 14 days between hearings (same case)
  2. Maximum Gap: 90 days triggers alert
  3. Ripeness Re-evaluation: Every 7 days
  4. Judge Availability: Assumed 100% (no vacation modeling)

Technical Constraints

Performance Limits

  • Case Volume: Tested up to 15,000 cases
  • Simulation Period: Up to 500 working days
  • Memory Usage: <500MB for typical workload
  • Execution Time: ~30 seconds for 10K cases, 384 days

Data Limitations

  • No Real-time Integration: Batch processing only
  • Synthetic Ripeness Data: Real purpose-of-hearing analysis needed
  • Fixed Parameters: No dynamic learning from outcomes
  • Single Court Model: No multi-court coordination

Validation Boundaries

Tested Scenarios

  • Baseline: 10,000 cases, balanced distribution
  • Admission Heavy: 70% early-stage cases (backlog scenario)
  • Advanced Heavy: 70% late-stage cases (efficient court)
  • High Urgency: 20% urgent cases (medical/custodial heavy)
  • Large Backlog: 15,000 cases (capacity stress test)

Success Criteria Met

  • Disposal Rate: 81.4% achieved (target: >70%)
  • Load Balance: Gini 0.002 (target: <0.4)
  • Case Coverage: 97.7% (target: >95%)
  • Utilization: 45% (realistic given constraints)

Performance Benchmarks

Execution Performance

  • EDA Pipeline: ~2 minutes for 739K hearings
  • Case Generation: ~5 seconds for 10K cases
  • 2-Year Simulation: ~30 seconds for 10K cases
  • Cause List Generation: ~10 seconds for 42K hearings

Algorithm Efficiency

  • Ripeness Classification: O(n) per case, O(nΒ²) total with re-evaluation
  • Load Balancing: O(n log k) where n=cases, k=courtrooms
  • Priority Calculation: O(n log n) sorting overhead
  • Override Processing: O(mΒ·n) where m=overrides, n=cases

Memory Usage

  • Case Objects: ~1KB per case (10K cases = 10MB)
  • Simulation State: ~50MB working memory
  • Output Generation: ~100MB for full reports
  • Total Peak: <500MB for largest tested scenarios

Last Updated: 2025-11-25
Version: 1.0
Status: Production Ready