Configuration Architecture
Overview
The codebase uses a layered configuration approach separating concerns by domain and lifecycle.
Configuration Layers
1. Domain Constants (scheduler/data/config.py)
Purpose: Immutable domain knowledge.
Contains:
- STAGES - Legal case lifecycle stages from domain knowledge
- TERMINAL_STAGES - Stages indicating case disposal
- CASE_TYPES - Valid case type taxonomy
- CASE_TYPE_DISTRIBUTION - Historical distribution from EDA
- WORKING_DAYS_PER_YEAR - Court calendar constant (192 days)
When to use: For values derived from the legal/institutional domain that are facts, not tunable parameters.
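A minimal sketch of this layer; the constant names are the ones listed above, but the specific stage names, case types, and distribution values shown here are illustrative placeholders rather than the real domain data:
# scheduler/data/config.py (illustrative values only)
STAGES = ["filing", "pleadings", "evidence", "arguments", "judgment"]        # placeholder lifecycle stages
TERMINAL_STAGES = {"judgment"}                                               # stages that mark case disposal
CASE_TYPES = ["civil", "criminal", "commercial"]                             # placeholder taxonomy
CASE_TYPE_DISTRIBUTION = {"civil": 0.5, "criminal": 0.3, "commercial": 0.2}  # historical shares from EDA
WORKING_DAYS_PER_YEAR = 192                                                  # court calendar constant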
2. RL Training Configuration (rl/config.py)
Purpose: Hyperparameters affecting RL agent learning behavior.
Class: RLTrainingConfig
Parameters:
- episodes: Number of training episodes
- cases_per_episode: Cases generated per episode
- episode_length_days: Simulation horizon per episode
- learning_rate: Q-learning alpha parameter
- discount_factor: Q-learning gamma parameter
- initial_epsilon: Starting exploration rate
- epsilon_decay: Exploration decay factor
- min_epsilon: Minimum exploration threshold
Presets:
- DEFAULT_RL_TRAINING_CONFIG - Standard training (100 episodes)
- QUICK_DEMO_RL_CONFIG - Fast testing (20 episodes)
When to use: Experimenting with RL training convergence and exploration strategies.
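As a rough sketch of the class and its presets (the field names and the 100/20 episode counts come from this document; the remaining defaults and the range check are assumptions):
from dataclasses import dataclass

@dataclass(frozen=True)
class RLTrainingConfig:
    episodes: int = 100              # number of training episodes
    cases_per_episode: int = 1000    # assumed default
    episode_length_days: int = 365   # assumed default
    learning_rate: float = 0.1       # Q-learning alpha
    discount_factor: float = 0.95    # Q-learning gamma
    initial_epsilon: float = 1.0     # starting exploration rate
    epsilon_decay: float = 0.995     # exploration decay factor
    min_epsilon: float = 0.05        # exploration floor

    def __post_init__(self):
        # value-range check, per the Validation Rules section below
        if not 0 < self.learning_rate <= 1:
            raise ValueError("learning_rate must be in (0, 1]")

DEFAULT_RL_TRAINING_CONFIG = RLTrainingConfig()          # standard training: 100 episodes
QUICK_DEMO_RL_CONFIG = RLTrainingConfig(episodes=20)     # fast testing: 20 episodes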
3. Policy Configuration (rl/config.py)
Purpose: Policy-specific filtering and prioritization behavior.
Class: PolicyConfig
Parameters:
- min_gap_days: Minimum days between hearings (fairness constraint)
- max_gap_alert_days: Maximum gap before triggering alerts
- old_case_threshold_days: Age threshold for priority boost
- skip_unripe_cases: Whether to filter unripe cases
- allow_old_unripe_cases: Allow scheduling very old unripe cases
When to use: Tuning policy filtering logic without changing core algorithm.
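A sketch of how this class might look, including the cross-parameter check described under Validation Rules; the default values here are assumptions:
from dataclasses import dataclass

@dataclass
class PolicyConfig:
    min_gap_days: int = 7                 # fairness constraint between hearings
    max_gap_alert_days: int = 90          # gap that triggers an alert
    old_case_threshold_days: int = 365    # age threshold for priority boost
    skip_unripe_cases: bool = True        # filter cases not yet ready for a hearing
    allow_old_unripe_cases: bool = False  # permit very old unripe cases anyway

    def __post_init__(self):
        # cross-parameter constraint: the alert threshold cannot undercut the minimum gap
        if self.max_gap_alert_days < self.min_gap_days:
            raise ValueError("max_gap_alert_days must be >= min_gap_days")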
4. Simulation Configuration (scheduler/simulation/engine.py)
Purpose: Per-simulation operational parameters.
Class: CourtSimConfig
Parameters:
- start: Simulation start date
- days: Duration in days
- seed: Random seed for reproducibility
- courtrooms: Number of courtrooms to simulate
- daily_capacity: Cases per courtroom per day
- policy: Scheduling policy name (fifo, age, readiness, rl)
- duration_percentile: EDA percentile for stage durations
- rl_agent_path: Path to trained RL model (required if policy="rl")
- log_dir: Output directory for metrics
Validation: __post_init__ validates RL requirements and path types.
When to use: Each simulation run (different policies, time periods, or capacities).
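The validation described above could look roughly like this; the defaults are assumptions, and only the behavior (requiring rl_agent_path when policy="rl" and normalizing strings to Path) follows the description:
from dataclasses import dataclass
from datetime import date
from pathlib import Path
from typing import Optional

@dataclass
class CourtSimConfig:
    start: date
    days: int = 365
    seed: int = 42
    courtrooms: int = 5
    daily_capacity: int = 30
    policy: str = "fifo"                  # fifo, age, readiness, or rl
    duration_percentile: float = 0.5      # EDA percentile for stage durations
    rl_agent_path: Optional[Path] = None  # required when policy == "rl"
    log_dir: Path = Path("data/sim_logs")

    def __post_init__(self):
        # the RL policy needs a trained agent on disk
        if self.policy == "rl" and self.rl_agent_path is None:
            raise ValueError('rl_agent_path is required when policy="rl"')
        # normalize string paths to Path objects
        if isinstance(self.rl_agent_path, str):
            self.rl_agent_path = Path(self.rl_agent_path)
        if isinstance(self.log_dir, str):
            self.log_dir = Path(self.log_dir)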
5. Pipeline Configuration (court_scheduler_rl.py)
Purpose: Orchestrating multi-step workflow execution.
Class: PipelineConfig
Parameters:
- n_cases: Cases to generate for training
- start_date / end_date: Training data time window
- rl_training: RLTrainingConfig instance
- sim_days: Simulation duration
- policies: List of policies to compare
- output_dir: Results output location
- generate_cause_lists / generate_visualizations: Output options
When to use: Running complete training → simulation → analysis workflows.
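A sketch showing the composition principle in this class: PipelineConfig nests an RLTrainingConfig rather than duplicating its fields (the defaults shown are assumptions):
from dataclasses import dataclass, field
from datetime import date
from pathlib import Path
from rl.config import RLTrainingConfig, DEFAULT_RL_TRAINING_CONFIG

@dataclass
class PipelineConfig:
    n_cases: int = 10000
    start_date: date = date(2018, 1, 1)   # assumed training-data window
    end_date: date = date(2023, 12, 31)
    rl_training: RLTrainingConfig = field(default_factory=lambda: DEFAULT_RL_TRAINING_CONFIG)
    sim_days: int = 365
    policies: list = field(default_factory=lambda: ["fifo", "age", "readiness", "rl"])
    output_dir: Path = Path("data/results")
    generate_cause_lists: bool = True
    generate_visualizations: bool = True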
Configuration Flow
Pipeline Execution:
├── PipelineConfig (workflow orchestration)
│   ├── RLTrainingConfig (training hyperparameters)
│   └── Data generation params
│
├── Per-Policy Simulation:
│   └── CourtSimConfig (simulation settings)
│       └── rl_agent_path (from training output)
│
└── Policy instantiation:
    └── PolicyConfig (policy-specific settings)
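In code, this flow might be wired up roughly as follows; train_rl_agent and run_simulation are hypothetical helper names used only to illustrate how the configs hand off to each other:
from datetime import date

pipeline_cfg = PipelineConfig(n_cases=50000, rl_training=DEFAULT_RL_TRAINING_CONFIG)

# 1. Training consumes the nested RLTrainingConfig and writes a saved agent to disk
agent_path = train_rl_agent(pipeline_cfg.rl_training)         # hypothetical helper

# 2. Each policy gets its own CourtSimConfig; only the RL run needs the agent path
for policy in pipeline_cfg.policies:
    sim_cfg = CourtSimConfig(
        start=date(2024, 1, 1),
        days=pipeline_cfg.sim_days,
        policy=policy,
        rl_agent_path=agent_path if policy == "rl" else None,
        log_dir=pipeline_cfg.output_dir / policy,
    )
    run_simulation(sim_cfg)                                    # hypothetical helper
    # 3. Policies that need per-policy tuning receive their own PolicyConfig at instantiation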
Design Principles
- Separation of Concerns: Each config class owns one domain
- Type Safety: Dataclasses with validation in __post_init__
- No Magic: Explicit parameters, no hidden defaults
- Immutability: Domain constants never change
- Composition: Configs nest (PipelineConfig contains RLTrainingConfig)
Examples
Quick Demo
from rl.config import QUICK_DEMO_RL_CONFIG
from court_scheduler_rl import PipelineConfig  # PipelineConfig lives in court_scheduler_rl.py

config = PipelineConfig(
    n_cases=10000,
    rl_training=QUICK_DEMO_RL_CONFIG,  # 20 episodes
    sim_days=90,
    output_dir="data/quick_demo",
)
Custom Training
from rl.config import RLTrainingConfig
from court_scheduler_rl import PipelineConfig

custom_rl = RLTrainingConfig(
    episodes=500,
    learning_rate=0.1,
    initial_epsilon=0.3,
    epsilon_decay=0.995,
)

config = PipelineConfig(
    n_cases=50000,
    rl_training=custom_rl,
    sim_days=730,
)
Policy Tuning
from rl.config import PolicyConfig

strict_policy = PolicyConfig(
    min_gap_days=14,               # more conservative
    skip_unripe_cases=True,
    allow_old_unripe_cases=False,  # strict ripeness enforcement
)

# Pass to RLPolicy (assumes RLPolicy is imported and model_path points to a trained agent file)
policy = RLPolicy(agent_path=model_path, policy_config=strict_policy)
Migration Guide
Adding New Configuration
- Determine layer (domain constant vs. tunable parameter)
- Add to appropriate config class
- Update __post_init__ validation if needed
- Document in this file
Deprecating Parameters
- Move to config class first (keep old path working)
- Add deprecation warning
- Remove old path after one release cycle
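For step 2, a warning from the standard library is usually enough; a minimal sketch, assuming a hypothetical deprecated parameter named legacy_gap_days:
import warnings

def resolve_min_gap_days(min_gap_days: int, legacy_gap_days=None) -> int:
    # shim that keeps the old parameter working for one release cycle
    if legacy_gap_days is not None:
        warnings.warn(
            "legacy_gap_days is deprecated; use min_gap_days instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return legacy_gap_days
    return min_gap_days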
Validation Rules
All config classes validate in __post_init__:
- Value ranges (0 < learning_rate ≤ 1)
- Type consistency (convert strings to Path)
- Cross-parameter constraints (max_gap ≥ min_gap)
- Required file existence (rl_agent_path must exist)
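The practical effect is that an invalid config fails at construction time rather than partway through a run; for example, assuming the range check is enforced in RLTrainingConfig.__post_init__ as described above:
try:
    RLTrainingConfig(learning_rate=1.5)   # violates 0 < learning_rate <= 1
except ValueError as exc:
    print(exc)                            # raised before any training starts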
Anti-Patterns
DON'T:
- ❌ Hardcode magic numbers in algorithms
- ❌ Use module-level mutable globals
- ❌ Mix domain constants with tunable parameters
- ❌ Create "god config" with everything in one class
DO:
- ✅ Separate by lifecycle and ownership
- ✅ Validate early (constructor time)
- ✅ Use dataclasses for immutability
- ✅ Provide sensible defaults with named presets