
Output Directory Refactoring - Implementation Status

Completed

1. Created OutputManager class

  • File: scheduler/utils/output_manager.py
  • Features:
    • Single run directory with timestamp-based ID
    • Clean hierarchy: eda/ training/ simulation/ reports/
    • Property-based access to all output paths
    • Config saved to run root for reproducibility
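The features listed above can be pictured with a minimal sketch. This is not the code in scheduler/utils/output_manager.py: the property names (eda_dir, training_dir, model_path, and so on), the create_structure() and save_config() methods, and the sample config are assumptions made for illustration. The run ID format is inferred from the example run_20251126_055943.

```python
import json
import tempfile
from datetime import datetime
from pathlib import Path


class OutputManager:
    """Illustrative sketch only; the real class may differ."""

    SUBDIRS = ("eda", "training", "simulation", "reports")

    def __init__(self, base_dir="outputs", run_id=None):
        # Timestamp-based ID, e.g. run_20251126_055943 (format inferred).
        self.run_id = run_id or datetime.now().strftime("run_%Y%m%d_%H%M%S")
        self.run_dir = Path(base_dir) / "runs" / self.run_id

    # Property-based access: callers never build paths by hand.
    @property
    def eda_dir(self):
        return self.run_dir / "eda"

    @property
    def training_dir(self):
        return self.run_dir / "training"

    @property
    def simulation_dir(self):
        return self.run_dir / "simulation"

    @property
    def reports_dir(self):
        return self.run_dir / "reports"

    @property
    def model_path(self):
        return self.training_dir / "agent.pkl"

    def create_structure(self):
        # Create the run directory and its four phase subdirectories.
        for name in self.SUBDIRS:
            (self.run_dir / name).mkdir(parents=True, exist_ok=True)

    def save_config(self, config):
        # Store the config at the run root so the run can be reproduced.
        path = self.run_dir / "config.json"
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(config, indent=2))
        return path


# Usage in a throwaway directory; the config contents are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    output = OutputManager(base_dir=tmp, run_id="run_20251126_055943")
    output.create_structure()
    output.save_config({"mode": "quick"})
    created = sorted(p.name for p in output.run_dir.iterdir())
    model_rel = output.model_path.relative_to(tmp).as_posix()
```

Keeping every path behind a property means a later change to the layout touches one class instead of every call site.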

2. Integrated into Pipeline

  • File: court_scheduler_rl.py
  • Changes:
    • PipelineConfig no longer has output_dir field
    • InteractivePipeline uses OutputManager instance
    • All self.output_dir references replaced with self.output.{property}
    • Pipeline compiles successfully

Completed Tasks

1. Remove Duplicate Model Saving (DONE)

  • Removed duplicate model save in court_scheduler_rl.py
  • Implemented OutputManager.create_model_symlink() method
  • Model saved once to outputs/runs/{run_id}/training/agent.pkl
  • Symlink created at models/latest.pkl
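The save-once-then-link approach might look roughly like the sketch below. The function is written here as a standalone helper with hypothetical parameters; the actual OutputManager.create_model_symlink() signature is not shown in this document. It assumes a POSIX-like system (creating symlinks on Windows may require extra privileges).

```python
import os
import tempfile
from pathlib import Path


def create_model_symlink(model_path, link_path):
    """Point link_path at model_path with a relative symlink (sketch)."""
    model_path, link_path = Path(model_path), Path(link_path)
    link_path.parent.mkdir(parents=True, exist_ok=True)
    # Replace any previous link; is_symlink() also catches dangling links.
    if link_path.is_symlink() or link_path.exists():
        link_path.unlink()
    # A relative target keeps the link valid if the project is moved.
    target = os.path.relpath(model_path, start=link_path.parent)
    link_path.symlink_to(target)
    return link_path


# Usage in a throwaway directory that mimics the project layout.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    model = root / "outputs" / "runs" / "run_20251126_123456" / "training" / "agent.pkl"
    model.parent.mkdir(parents=True)
    model.write_bytes(b"placeholder model bytes")

    link = create_model_symlink(model, root / "models" / "latest.pkl")
    link_target = os.readlink(link)
    same_file = link.resolve() == model.resolve()
```

The model bytes exist only under the run directory; models/latest.pkl is a pointer, so rerunning the pipeline simply repoints it.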

2. Update EDA Output Paths (DONE)

  • Modified src/eda_config.py with:
    • set_output_paths() function to configure from OutputManager
    • Private getter functions (_get_run_dir(), _get_params_dir(), etc.)
    • Fallback to legacy paths when running standalone
  • Updated all EDA modules (eda_load_clean.py, eda_exploration.py, eda_parameters.py)
  • Pipeline calls set_output_paths() before running EDA steps
  • EDA outputs now write to outputs/runs/{run_id}/eda/

3. Fix Import Errors (DONE)

  • Fixed syntax errors in EDA imports (removed parentheses from function names)
  • All modules compile without errors

4. Test End-to-End (DONE)

uv run python court_scheduler_rl.py quick

Status: SUCCESS (Exit code: 0)

  • All outputs in outputs/runs/run_20251126_055943/
  • No scattered files
  • Models symlinked correctly at models/latest.pkl
  • Pipeline runs without errors
  • Clean directory structure verified with tree command

New Directory Structure

outputs/
└── runs/
    └── run_20251126_123456/
        ├── config.json
        ├── eda/
        │   ├── figures/
        │   ├── params/
        │   └── data/
        ├── training/
        │   ├── cases.csv
        │   ├── agent.pkl
        │   └── stats.json
        ├── simulation/
        │   ├── readiness/
        │   └── rl/
        └── reports/
            ├── EXECUTIVE_SUMMARY.md
            ├── COMPARISON_REPORT.md
            └── visualizations/

models/
└── latest.pkl -> ../outputs/runs/run_20251126_123456/training/agent.pkl

Benefits Achieved

  1. Single source of truth: All run artifacts in one directory
  2. Reproducibility: Config saved with outputs
  3. No duplication: Files written once, not copied
  4. Clear hierarchy: Logical organization by pipeline phase
  5. Easy cleanup: Delete entire run directory
  6. Chronological ordering: Run IDs sort by timestamp
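Because the run IDs embed a zero-padded date and time, plain string ordering matches chronological ordering. A small sketch of how the newest run could be found, assuming the run_YYYYMMDD_HHMMSS format seen in the examples (latest_run is a hypothetical helper, not part of the project):

```python
def latest_run(run_ids):
    """Return the most recent run ID, or None when there are no runs."""
    # Zero-padded timestamps make lexicographic max equal to the newest run.
    return max(run_ids) if run_ids else None


run_ids = ["run_20251126_055943", "run_20251125_235959", "run_20251126_123456"]
ordered = sorted(run_ids)
newest = latest_run(run_ids)
```

In practice the IDs would come from listing outputs/runs/; old runs can then be removed one directory at a time.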