RoyAalekh committed 746f66f · 1 Parent(s): c92a716

Expand comprehensive codebase analysis

reports/codebase_analysis_2024-07-01.md ADDED
# Court Scheduling System – Comprehensive Codebase Analysis

## Architecture Snapshot
- **Unified CLI workflows**: `court_scheduler/cli.py` orchestrates EDA, synthetic case generation, and simulation runs with progress feedback, wiring together the data pipeline and scheduler from one entry point.【F:court_scheduler/cli.py†L1-L200】
- **Scheduling core**: `SchedulingAlgorithm` remains the central coordinator for ripeness filtering, eligibility checks, prioritization, allocation, and explainability output via the `SchedulingResult` dataclass.【F:scheduler/core/algorithm.py†L1-L200】
- **EDA pipeline**: `src/run_eda.py` drives three stages (load/clean, exploratory visuals, and parameter extraction) by calling `eda_load_clean`, `eda_exploration`, and `eda_parameters` in sequence.【F:src/run_eda.py†L1-L23】 `eda_exploration` loads cleaned Parquet data, converts it to pandas, and produces interactive Plotly HTML dashboards and CSV summaries for case mix, temporal trends, stage transitions, and gap distributions.【F:src/eda_exploration.py†L1-L120】
- **Synthetic data + parameter sources**: `scheduler.data.case_generator` samples stage mixes (optionally from EDA-derived parameters), case types, and working-day seasonality to produce `Case` objects compatible with the scheduler and RL training.【F:scheduler/data/case_generator.py†L1-L120】
- **RL training stack**: `rl/training.py` wraps a lightweight simulation to train the tabular Q-learning `TabularQAgent`, generating fresh cases per episode and stepping day-by-day to update rewards; `rl/simple_agent.py` encodes cases into 6-D discrete states with epsilon-greedy Q updates and reward shaping for urgency, ripeness, adjournments, and progression.【F:rl/training.py†L1-L200】【F:rl/simple_agent.py†L1-L200】
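The tabular Q-learning loop described above can be sketched in miniature. This is an illustrative epsilon-greedy update over discrete states, not the project's actual `TabularQAgent`; the state encoding, reward, and hyperparameter names here are assumptions:

```python
import random
from collections import defaultdict

# Hypothetical hyperparameters, mirroring the dataclass-style config the report describes.
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.2

# Q-table over (discrete_state, action) pairs; actions: 0 = skip, 1 = schedule.
Q = defaultdict(float)
ACTIONS = (0, 1)

def choose_action(state, rng=random):
    """Epsilon-greedy selection over the two scheduling actions."""
    if rng.random() < EPSILON:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state):
    """One tabular Q-learning step: Q <- Q + alpha * (target - Q)."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    target = reward + GAMMA * best_next
    Q[(state, action)] += ALPHA * (target - Q[(state, action)])

# A toy episode step: one transition with a positive reward for scheduling.
s0, s1 = ("ripe", "urgent"), ("scheduled",)
update(s0, 1, reward=1.0, next_state=s1)
```

In the real stack, per the report, states are 6-D encodings of case features and fresh cases are generated each episode; the update rule itself is the standard one shown here.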

## Strengths
- **End-to-end operability**: The Typer CLI offers cohesive commands for EDA, data generation, and simulation, lowering friction for analysts and operators running the whole workflow.【F:court_scheduler/cli.py†L1-L200】
- **Transparent scheduling outputs**: `SchedulingResult` captures scheduled cases, unscheduled reasons, ripeness filtering counts, applied overrides, and explanations, supporting audits and downstream dashboards.【F:scheduler/core/algorithm.py†L32-L200】
- **Reproducible EDA artifacts**: The EDA module saves HTML plots and CSV summaries (e.g., stage durations, transitions) and writes them to versioned run directories, enabling offline review and parameter reuse.【F:src/eda_exploration.py†L1-L120】
- **Configurable RL experiments**: The RL pipeline isolates hyperparameters in dataclasses and regenerates cases per episode, making it easy to tweak learning rates, epsilon decay, and episode lengths without touching training logic.【F:rl/training.py†L140-L200】【F:rl/simple_agent.py†L41-L160】
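The explainable-output pattern credited to `SchedulingResult` can be illustrated with a minimal stand-in. Field names below are inferred from the report's description (scheduled cases, unscheduled reasons, filter counts, overrides, explanations) and are not the real dataclass:

```python
from dataclasses import dataclass, field

@dataclass
class SchedulingResultSketch:
    """Illustrative container mirroring the audit fields the report attributes
    to SchedulingResult; all names are hypothetical."""
    scheduled: list = field(default_factory=list)          # case IDs given a slot
    unscheduled: dict = field(default_factory=dict)        # case ID -> reason string
    ripeness_filtered: int = 0                             # count removed as unripe
    overrides_applied: list = field(default_factory=list)  # manual override records
    explanations: dict = field(default_factory=dict)       # case ID -> audit note

    def explain(self, case_id: str) -> str:
        """Return the audit trail for one case, whichever bucket it landed in."""
        if case_id in self.explanations:
            return self.explanations[case_id]
        return self.unscheduled.get(case_id, "no record")

result = SchedulingResultSketch(
    scheduled=["C-101"],
    unscheduled={"C-102": "minimum hearing gap not met"},
    ripeness_filtered=3,
    explanations={"C-101": "urgent; ripe; slot 2 in courtroom A"},
)
```

The value of the pattern is that every case ends up with a queryable reason, whether it was scheduled, filtered, or deferred, which is what makes the dashboards and audits described above possible.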

## Risks and Quality Gaps
1. **Override validation mutates inputs and leaks state across runs**. Invalid overrides are removed from the caller’s list and logged as `(None, reason)`, while priority overrides set `_priority_override` on shared `Case` objects without cleanup; repeated scheduling runs can therefore inherit stale manual priorities, and unscheduled entries carrying `None` cases complicate downstream consumers.【F:scheduler/core/algorithm.py†L136-L200】
2. **Ripeness defaults to optimistic**. When no bottleneck keyword or stage hint fires, the classifier returns `RIPE`, and admission-stage cases with ≥3 hearings are marked ripe without service/compliance proof, risking overscheduling unready matters.【F:scheduler/core/ripeness.py†L54-L129】
3. **Eligibility omits calendar blocks and per-case gap rules**. `_filter_eligible` enforces only the global minimum gap, ignoring judge or courtroom block dates and any per-case gap overrides, so schedules may violate availability assumptions despite capacity adjustments.【F:scheduler/core/algorithm.py†L129-L200】【F:scheduler/control/overrides.py†L103-L169】
4. **EDA scaling risks**. `eda_exploration` converts full Parquet datasets to pandas DataFrames before plotting, which can exhaust memory on larger extracts and lacks sampling/downcasting safeguards; the Plotly renderer defaults to `"browser"`, which can fail in headless batch environments.【F:src/eda_exploration.py†L38-L120】
5. **Training–production gap for RL**. The Q-learning loop trains on a simplified simulation that bypasses the production `SchedulingAlgorithm`, ripeness classifier, and courtroom capacity logic, so learned policies may not transfer. Rewards are computed via a freshly instantiated agent inside the environment, divorcing reward shaping from the training agent’s evolving parameters.【F:rl/training.py†L19-L138】【F:rl/simple_agent.py†L188-L200】
6. **Configuration robustness**. `get_latest_params_dir` still raises when no versioned params directory exists, blocking fresh environments from running simulations or RL without manual setup or bundled defaults.【F:scheduler/data/config.py†L1-L37】
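The state leak in item 1 can be reproduced in miniature: if override handling stamps a private attribute onto shared case objects and never clears it, a second scheduling run inherits the first run's manual priority. The classes below are hypothetical stand-ins for the project's `Case` and override code:

```python
class Case:
    """Hypothetical stand-in for the scheduler's Case object."""
    def __init__(self, case_id: str, priority: int):
        self.case_id = case_id
        self.priority = priority

def apply_overrides(cases, overrides):
    """Mimics the reported behaviour: mutates shared Case objects in place."""
    for case_id, new_priority in overrides:
        for case in cases:
            if case.case_id == case_id:
                case._priority_override = new_priority  # never cleaned up

def effective_priority(case):
    return getattr(case, "_priority_override", case.priority)

cases = [Case("C-1", priority=5)]

# Run 1: an operator bumps C-1 manually.
apply_overrides(cases, [("C-1", 1)])
run1 = effective_priority(cases[0])  # 1, as intended

# Run 2: no overrides supplied, but the stale attribute survives.
apply_overrides(cases, [])
run2 = effective_priority(cases[0])  # still 1, not the base priority 5
```

Because the `Case` objects are shared across runs, the second run silently schedules against last run's manual priority, which is exactly the audit hazard the item describes.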

## Recommendations
- Make override handling side-effect-free: validate into separate structures, preserve original override lists for auditing, and clear any temporary priority attributes after use.【F:scheduler/core/algorithm.py†L136-L200】
- Require affirmative ripeness evidence or add an `UNKNOWN` state so ambiguous cases don’t default to `RIPE`; integrate service/compliance indicators and stage-specific checks before scheduling.【F:scheduler/core/ripeness.py†L54-L129】
- Enforce calendar constraints and per-case gap overrides in eligibility and allocation to avoid scheduling on blocked dates or ignoring individualized spacing rules.【F:scheduler/core/algorithm.py†L129-L200】【F:scheduler/control/overrides.py†L103-L169】
- Harden EDA for large datasets: stream or sample before `to_pandas`, allow a static image renderer in headless runs, and gate expensive plots behind flags to keep CLI runs reliable.【F:src/eda_exploration.py†L38-L120】
- Align RL training with the production scheduler: reuse `SchedulingAlgorithm` or its readiness/ripeness filters inside the training environment, and compute rewards without re-instantiating agents so learning signals match deployed policy behavior.【F:rl/training.py†L19-L138】【F:rl/simple_agent.py†L188-L200】
- Provide a fallback baseline parameters bundle or clearer setup guidance in `get_latest_params_dir` so simulations and RL can run out of the box.【F:scheduler/data/config.py†L1-L37】
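The first recommendation could take the following shape: validation returns separate accepted/rejected structures, leaves the caller's list untouched, and feeds a run-scoped priority map instead of stamping attributes on shared objects. All names here are illustrative, not the project's API:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Override:
    """Illustrative immutable override record."""
    case_id: str
    priority: int

def validate_overrides(overrides, known_case_ids):
    """Split overrides into accepted and rejected without mutating the input.

    Returns (accepted, rejected), where each rejected entry pairs the
    original override object with a reason string for auditing.
    """
    accepted, rejected = [], []
    for ov in overrides:
        if ov.case_id not in known_case_ids:
            rejected.append((ov, "unknown case id"))
        elif not 1 <= ov.priority <= 10:
            rejected.append((ov, "priority out of range"))
        else:
            accepted.append(ov)
    return accepted, rejected

requested = [Override("C-1", 2), Override("C-9", 1), Override("C-2", 99)]
accepted, rejected = validate_overrides(requested, known_case_ids={"C-1", "C-2"})

# Per-run priority map: scoped to this scheduling pass, so nothing leaks
# into the next run and Case objects are never mutated.
priority_map = {ov.case_id: ov.priority for ov in accepted}
```

Because the rejected list keeps the original override objects alongside reasons, consumers never see `(None, reason)` entries, and because priorities live in a per-run mapping, there is no temporary attribute to forget to clean up.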