Commit ·
01795ee
1
Parent(s): 51de58f
nsga-II tuning
Browse files- api/greedyoptim_api.py +1 -1
- docs/OPTIMIZER_TUNING.md +183 -0
- docs/block_optimization_fix.md +71 -0
- greedyOptim/evaluator.py +29 -11
- greedyOptim/hybrid_optimizers.py +117 -19
api/greedyoptim_api.py
CHANGED
|
@@ -791,4 +791,4 @@ async def validate_data(request: ScheduleOptimizationRequest):
|
|
| 791 |
|
| 792 |
if __name__ == "__main__":
|
| 793 |
import uvicorn
|
| 794 |
-
uvicorn.run(app, host="0.0.0.0", port=7860)
|
|
|
|
| 791 |
|
| 792 |
if __name__ == "__main__":
|
| 793 |
import uvicorn
|
| 794 |
+
uvicorn.run("api.greedyoptim_api:app", host="0.0.0.0", port=7860, reload=True)
|
docs/OPTIMIZER_TUNING.md
ADDED
|
@@ -0,0 +1,183 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# Optimizer Tuning Guide
|
| 2 |
+
|
| 3 |
+
This document describes the tuning changes made to optimize service train selection across all optimization methods.
|
| 4 |
+
|
| 5 |
+
## Problem Statement
|
| 6 |
+
|
| 7 |
+
The optimizers were initially selecting too few trains for service (as low as 1-13 trains), when 21-22 healthy trainsets were available. This was due to:
|
| 8 |
+
|
| 9 |
+
1. **Synthetic data issues**: Only 12% of trainsets were healthy
|
| 10 |
+
2. **Fitness function priorities**: Branding compliance weighted too heavily
|
| 11 |
+
3. **NSGA-II specific issues**: No elitism, random initialization, equal objective weights
|
| 12 |
+
|
| 13 |
+
## Changes Made
|
| 14 |
+
|
| 15 |
+
### 1. Synthetic Data Generator (`DataService/enhanced_generator.py`)
|
| 16 |
+
|
| 17 |
+
**Problem**: Only 3/25 trainsets (12%) were healthy enough for service.
|
| 18 |
+
|
| 19 |
+
**Solution**: Increased healthy trainset ratio to 85%.
|
| 20 |
+
|
| 21 |
+
```python
|
| 22 |
+
# Before: Equal probability of healthy/unhealthy components
|
| 23 |
+
# After: 85% healthy trainsets with wear capped at 60% of threshold
|
| 24 |
+
|
| 25 |
+
healthy_trainset_count = int(self.num_trainsets * 0.85) # 85% healthy
|
| 26 |
+
max_healthy_wear = comp_info["wear_threshold"] * 0.60 # 60% cap
|
| 27 |
+
```
|
| 28 |
+
|
| 29 |
+
### 2. Fitness Function Weights (`greedyOptim/evaluator.py`)
|
| 30 |
+
|
| 31 |
+
**Problem**: Branding compliance was weighted too heavily, causing optimizers to prefer fewer trains with better branding over more trains in service.
|
| 32 |
+
|
| 33 |
+
**Solution**: Rebalanced weights to prioritize operational needs.
|
| 34 |
+
|
| 35 |
+
| Objective | Old Weight | New Weight | Priority |
|
| 36 |
+
|-----------|------------|------------|----------|
|
| 37 |
+
| service_availability | 2.0 | **5.0** | HIGHEST |
|
| 38 |
+
| constraint_penalty | 5.0 | **10.0** | CRITICAL |
|
| 39 |
+
| mileage_balance | 1.5 | 1.5 | Medium |
|
| 40 |
+
| maintenance_cost | 1.0 | 1.0 | Medium |
|
| 41 |
+
| branding_compliance | 1.5 | **0.2** | LOW |
|
| 42 |
+
|
| 43 |
+
**Buffer Bonus**: Added bonus for trains beyond minimum requirement:
|
| 44 |
+
```python
|
| 45 |
+
# Reward having more than minimum trains for smooth operations
|
| 46 |
+
bonus_trains = max(0, len(service_trains) - self.config.required_service_trains)
|
| 47 |
+
objectives['service_availability'] = 100.0 + min(bonus_trains / (self.config.required_service_trains * 0.5), 1.0) * 20.0  # Capped bonus (up to +20) for extra trains
|
| 48 |
+
```
|
| 49 |
+
|
| 50 |
+
### 3. NSGA-II Optimizer (`greedyOptim/hybrid_optimizers.py`)
|
| 51 |
+
|
| 52 |
+
#### 3.1 Weighted Dominance Comparison
|
| 53 |
+
|
| 54 |
+
**Problem**: All objectives treated equally in Pareto dominance.
|
| 55 |
+
|
| 56 |
+
**Solution**: Apply weights to objectives before dominance comparison.
|
| 57 |
+
|
| 58 |
+
```python
|
| 59 |
+
self.objective_weights = {
|
| 60 |
+
'service_availability': 5.0, # HIGHEST
|
| 61 |
+
'mileage_balance': 1.5,
|
| 62 |
+
'maintenance_cost': 1.0,
|
| 63 |
+
'branding_compliance': 0.2, # LOW
|
| 64 |
+
'constraint_penalty': 10.0 # CRITICAL
|
| 65 |
+
}
|
| 66 |
+
|
| 67 |
+
# In dominates():
|
| 68 |
+
obj1 = [
|
| 69 |
+
-solution1['service_availability'] * w['service_availability'],
|
| 70 |
+
# ... other objectives with weights
|
| 71 |
+
]
|
| 72 |
+
```
|
| 73 |
+
|
| 74 |
+
#### 3.2 Smart Initialization
|
| 75 |
+
|
| 76 |
+
**Problem**: Random initialization created many invalid solutions.
|
| 77 |
+
|
| 78 |
+
**Solution**: Seed population with constraint-aware solutions.
|
| 79 |
+
|
| 80 |
+
```python
|
| 81 |
+
def _create_smart_initial_solution(self):
|
| 82 |
+
solution = np.zeros(self.n_genes, dtype=int)  # All service
standby_count = 0
|
| 83 |
+
for i, ts_id in enumerate(self.evaluator.trainsets):
|
| 84 |
+
valid, _ = self.evaluator.check_hard_constraints(ts_id)
|
| 85 |
+
if not valid:
|
| 86 |
+
solution[i] = 2  # Maintenance for invalid
|
| 87 |
+
elif standby_count < self.config.min_standby:
|
| 88 |
+
solution[i] = 1  # Reserve for standby
standby_count += 1
|
| 89 |
+
return solution
|
| 90 |
+
|
| 91 |
+
# Mix: 20% smart solutions + 80% biased random
|
| 92 |
+
```
|
| 93 |
+
|
| 94 |
+
#### 3.3 Biased Random Solutions
|
| 95 |
+
|
| 96 |
+
**Problem**: Equal probability for service/depot/maintenance (33% each).
|
| 97 |
+
|
| 98 |
+
**Solution**: Bias toward service assignment.
|
| 99 |
+
|
| 100 |
+
```python
|
| 101 |
+
# Initial population: 65% service, 20% depot, 15% maintenance
|
| 102 |
+
solution = np.random.choice([0, 1, 2], size=n, p=[0.65, 0.20, 0.15])
|
| 103 |
+
|
| 104 |
+
# Mutation: 55% service, 30% depot, 15% maintenance
|
| 105 |
+
child[i] = np.random.choice([0, 1, 2], p=[0.55, 0.30, 0.15])
|
| 106 |
+
```
|
| 107 |
+
|
| 108 |
+
#### 3.4 Elitism with Combined Population
|
| 109 |
+
|
| 110 |
+
**Problem**: Offspring replaced parents completely, losing good solutions.
|
| 111 |
+
|
| 112 |
+
**Solution**: Combine parents and offspring, then select best via non-dominated sorting.
|
| 113 |
+
|
| 114 |
+
```python
|
| 115 |
+
# Combine parents and offspring
|
| 116 |
+
combined_population = new_population + offspring
|
| 117 |
+
|
| 118 |
+
# Re-evaluate and sort
|
| 119 |
+
combined_fronts = self.fast_non_dominated_sort(combined_objectives)
|
| 120 |
+
|
| 121 |
+
# Select best from combined (preserves good solutions)
|
| 122 |
+
for front in combined_fronts:
|
| 123 |
+
# Add to next generation up to population_size
|
| 124 |
+
```
|
| 125 |
+
|
| 126 |
+
#### 3.5 Service-Prioritized Final Selection
|
| 127 |
+
|
| 128 |
+
**Problem**: Random selection from Pareto front.
|
| 129 |
+
|
| 130 |
+
**Solution**: Explicitly select solution with highest service availability.
|
| 131 |
+
|
| 132 |
+
```python
|
| 133 |
+
# Among zero-penalty solutions, choose highest service_availability
|
| 134 |
+
valid_solutions = [(i, sol, obj) for i, (sol, obj) in enumerate(best_solutions)
|
| 135 |
+
if obj.get('constraint_penalty', 0) == 0]
|
| 136 |
+
|
| 137 |
+
if valid_solutions:
|
| 138 |
+
best_idx = max(valid_solutions,
|
| 139 |
+
key=lambda x: x[2].get('service_availability', 0))[0]
|
| 140 |
+
```
|
| 141 |
+
|
| 142 |
+
## Results
|
| 143 |
+
|
| 144 |
+
### Before Tuning
|
| 145 |
+
| Method | Service Trains | Notes |
|
| 146 |
+
|--------|---------------|-------|
|
| 147 |
+
| GA | 1-13 | Poor due to unhealthy data |
|
| 148 |
+
| PSO | 1-13 | Same issue |
|
| 149 |
+
| SA | 1-13 | Same issue |
|
| 150 |
+
| CMA-ES | 1-13 | Same issue |
|
| 151 |
+
| NSGA2 | 12-13 | Worst performer |
|
| 152 |
+
|
| 153 |
+
### After Tuning
|
| 154 |
+
| Method | Service Trains | Notes |
|
| 155 |
+
|--------|---------------|-------|
|
| 156 |
+
| GA | 21-22 | Excellent |
|
| 157 |
+
| SA | 21-22 | Excellent |
|
| 158 |
+
| CMA-ES | 19-20 | Good |
|
| 159 |
+
| NSGA2 | 21-22 | **Fixed!** |
|
| 160 |
+
| PSO | 15-18 | Acceptable |
|
| 161 |
+
|
| 162 |
+
## Recommendations
|
| 163 |
+
|
| 164 |
+
1. **Use GA or SA** for best results in single-objective optimization
|
| 165 |
+
2. **Use NSGA2** when you need to explore trade-offs between objectives
|
| 166 |
+
3. **PSO** may need further tuning for this problem domain
|
| 167 |
+
4. **CMA-ES** provides good balance between quality and exploration
|
| 168 |
+
|
| 169 |
+
## Configuration Parameters
|
| 170 |
+
|
| 171 |
+
Recommended settings for Kochi Metro (25 trainsets, 106 blocks):
|
| 172 |
+
|
| 173 |
+
```python
|
| 174 |
+
config = OptimizationConfig(
|
| 175 |
+
required_service_trains=15, # Minimum for service
|
| 176 |
+
min_standby=2, # Safety buffer
|
| 177 |
+
population_size=50, # Larger = better but slower
|
| 178 |
+
generations=100, # More = better convergence
|
| 179 |
+
mutation_rate=0.1, # Standard
|
| 180 |
+
crossover_rate=0.8, # Standard
|
| 181 |
+
optimize_block_assignment=True # Enable block optimization
|
| 182 |
+
)
|
| 183 |
+
```
|
docs/block_optimization_fix.md
ADDED
|
@@ -0,0 +1,71 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# Block Optimization Fix Summary
|
| 2 |
+
|
| 3 |
+
## The Problem
|
| 4 |
+
|
| 5 |
+
NSGA-II optimizer was only producing **33-42 blocks** instead of the expected **106 blocks**.
|
| 6 |
+
|
| 7 |
+
## Root Causes
|
| 8 |
+
|
| 9 |
+
### 1. Reference vs Copy Issue
|
| 10 |
+
When storing best solutions from the Pareto front, we stored references instead of copies:
|
| 11 |
+
|
| 12 |
+
```python
|
| 13 |
+
# WRONG - stores references that get overwritten
|
| 14 |
+
best_solutions = [(population[i], objectives[i]) for i in fronts[0]]
|
| 15 |
+
best_block_solutions = [block_population[i] for i in fronts[0]]
|
| 16 |
+
```
|
| 17 |
+
|
| 18 |
+
Since `population` and `block_population` are replaced each generation with `offspring`, the stored references pointed to stale/corrupted data.
|
| 19 |
+
|
| 20 |
+
### 2. Block-Trainset Mismatch
|
| 21 |
+
Even with copies, the stored block assignments were created for a *different* trainset selection. When the best solution evolved to have different service trainsets, the old block assignment still mapped to old trainset indices.
|
| 22 |
+
|
| 23 |
+
Example:
|
| 24 |
+
- Generation 50: Best solution has trainsets [0, 2, 5] → blocks assigned to indices 0, 2, 5
|
| 25 |
+
- Generation 150: Best solution evolves to trainsets [1, 3, 7] → but block assignment still references 0, 2, 5
|
| 26 |
+
- Result: Many blocks map to non-service trainsets → lost blocks
|
| 27 |
+
|
| 28 |
+
## The Fix
|
| 29 |
+
|
| 30 |
+
**Always create fresh block assignments for the final best solution:**
|
| 31 |
+
|
| 32 |
+
```python
|
| 33 |
+
# Select best solution from Pareto front
|
| 34 |
+
if best_solutions:
|
| 35 |
+
best_idx = min(range(len(best_solutions)),
|
| 36 |
+
key=lambda i: self.evaluator.fitness_function(best_solutions[i][0]))
|
| 37 |
+
best_solution, best_objectives = best_solutions[best_idx]
|
| 38 |
+
if self.optimize_blocks:
|
| 39 |
+
# Always create fresh block assignment for the best solution
|
| 40 |
+
# to ensure all 106 blocks are properly assigned
|
| 41 |
+
best_block_sol = self._create_block_assignment(best_solution)
|
| 42 |
+
```
|
| 43 |
+
|
| 44 |
+
The `_create_block_assignment` distributes all blocks evenly across current service trainsets:
|
| 45 |
+
|
| 46 |
+
```python
|
| 47 |
+
def _create_block_assignment(self, trainset_sol: np.ndarray) -> np.ndarray:
|
| 48 |
+
service_indices = np.where(trainset_sol == 0)[0]
|
| 49 |
+
|
| 50 |
+
if len(service_indices) == 0:
|
| 51 |
+
return np.full(self.n_blocks, -1, dtype=int)
|
| 52 |
+
|
| 53 |
+
# Distribute blocks evenly across service trains
|
| 54 |
+
block_sol = np.zeros(self.n_blocks, dtype=int)
|
| 55 |
+
for i in range(self.n_blocks):
|
| 56 |
+
block_sol[i] = service_indices[i % len(service_indices)]
|
| 57 |
+
|
| 58 |
+
return block_sol
|
| 59 |
+
```
|
| 60 |
+
|
| 61 |
+
## Result
|
| 62 |
+
|
| 63 |
+
| Optimizer | Before Fix | After Fix |
|
| 64 |
+
|-----------|-----------|-----------|
|
| 65 |
+
| GA | 106 ✓ | 106 ✓ |
|
| 66 |
+
| CMA-ES | 106 ✓ | 106 ✓ |
|
| 67 |
+
| PSO | 106 ✓ | 106 ✓ |
|
| 68 |
+
| SA | 106 ✓ | 106 ✓ |
|
| 69 |
+
| NSGA-II | 33-42 ✗ | 106 ✓ |
|
| 70 |
+
|
| 71 |
+
All optimizers now correctly assign all 106 service blocks.
|
greedyOptim/evaluator.py
CHANGED
|
@@ -219,10 +219,19 @@ class TrainsetSchedulingEvaluator:
|
|
| 219 |
maint_trains.append(ts_id)
|
| 220 |
|
| 221 |
# Objective 1: Service Availability (maximize)
|
| 222 |
-
|
| 223 |
-
|
| 224 |
-
|
| 225 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 226 |
|
| 227 |
# Objective 2: Mileage Balance (maximize via minimizing std dev)
|
| 228 |
mileages = [self.status_map[ts].get('total_mileage_km', 0) for ts in service_trains]
|
|
@@ -232,7 +241,7 @@ class TrainsetSchedulingEvaluator:
|
|
| 232 |
else:
|
| 233 |
objectives['mileage_balance'] = 100.0
|
| 234 |
|
| 235 |
-
# Objective 3: Branding Compliance (
|
| 236 |
brand_scores = []
|
| 237 |
for ts_id in service_trains:
|
| 238 |
if ts_id in self.brand_map:
|
|
@@ -270,16 +279,25 @@ class TrainsetSchedulingEvaluator:
|
|
| 270 |
return objectives
|
| 271 |
|
| 272 |
def fitness_function(self, solution: np.ndarray) -> float:
|
| 273 |
-
"""Aggregate fitness function for minimization.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 274 |
obj = self.calculate_objectives(solution)
|
| 275 |
|
| 276 |
# Weighted sum (convert maximization objectives to minimization)
|
|
|
|
| 277 |
fitness = (
|
| 278 |
-
-obj['service_availability'] *
|
| 279 |
-
-obj['
|
| 280 |
-
-obj['
|
| 281 |
-
-obj['
|
| 282 |
-
obj['constraint_penalty'] *
|
| 283 |
)
|
| 284 |
|
| 285 |
return fitness
|
|
|
|
| 219 |
maint_trains.append(ts_id)
|
| 220 |
|
| 221 |
# Objective 1: Service Availability (maximize)
|
| 222 |
+
# Reward having MORE than minimum required (smooth operations)
|
| 223 |
+
num_service = len(service_trains)
|
| 224 |
+
if num_service < self.config.required_service_trains:
|
| 225 |
+
# Heavy penalty for not meeting minimum
|
| 226 |
+
objectives['constraint_penalty'] += (self.config.required_service_trains - num_service) * 200.0
|
| 227 |
+
objectives['service_availability'] = (num_service / self.config.required_service_trains) * 100.0
|
| 228 |
+
else:
|
| 229 |
+
# Reward additional trains beyond minimum (up to 50% more for full fleet coverage)
|
| 230 |
+
# This encourages smooth operations with more trains available
|
| 231 |
+
bonus_trains = num_service - self.config.required_service_trains
|
| 232 |
+
max_bonus = int(self.config.required_service_trains * 0.5) # Up to 50% more
|
| 233 |
+
bonus_score = min(bonus_trains / max_bonus, 1.0) * 20.0 if max_bonus > 0 else 0
|
| 234 |
+
objectives['service_availability'] = 100.0 + bonus_score
|
| 235 |
|
| 236 |
# Objective 2: Mileage Balance (maximize via minimizing std dev)
|
| 237 |
mileages = [self.status_map[ts].get('total_mileage_km', 0) for ts in service_trains]
|
|
|
|
| 241 |
else:
|
| 242 |
objectives['mileage_balance'] = 100.0
|
| 243 |
|
| 244 |
+
# Objective 3: Branding Compliance (low priority - nice to have)
|
| 245 |
brand_scores = []
|
| 246 |
for ts_id in service_trains:
|
| 247 |
if ts_id in self.brand_map:
|
|
|
|
| 279 |
return objectives
|
| 280 |
|
| 281 |
def fitness_function(self, solution: np.ndarray) -> float:
|
| 282 |
+
"""Aggregate fitness function for minimization.
|
| 283 |
+
|
| 284 |
+
Priority order (highest to lowest):
|
| 285 |
+
1. Meeting minimum service trains (hard constraint)
|
| 286 |
+
2. Having MORE trains for smooth operations
|
| 287 |
+
3. Mileage balance across fleet
|
| 288 |
+
4. Maintenance cost optimization
|
| 289 |
+
5. Branding compliance (low priority, nice-to-have)
|
| 290 |
+
"""
|
| 291 |
obj = self.calculate_objectives(solution)
|
| 292 |
|
| 293 |
# Weighted sum (convert maximization objectives to minimization)
|
| 294 |
+
# Higher weight = more important
|
| 295 |
fitness = (
|
| 296 |
+
-obj['service_availability'] * 5.0 + # HIGHEST: Maximize trains in service
|
| 297 |
+
-obj['mileage_balance'] * 1.5 + # Medium: Fleet wear balance
|
| 298 |
+
-obj['maintenance_cost'] * 1.0 + # Medium: Avoid overdue maintenance
|
| 299 |
+
-obj['branding_compliance'] * 0.2 + # LOW: Branding is nice-to-have
|
| 300 |
+
obj['constraint_penalty'] * 10.0 # CRITICAL: Hard constraints must be met
|
| 301 |
)
|
| 302 |
|
| 303 |
return fitness
|
greedyOptim/hybrid_optimizers.py
CHANGED
|
@@ -24,15 +24,38 @@ class MultiObjectiveOptimizer:
|
|
| 24 |
self.n_blocks = evaluator.num_blocks
|
| 25 |
self.optimize_blocks = self.config.optimize_block_assignment
|
| 26 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 27 |
def dominates(self, solution1: Dict[str, float], solution2: Dict[str, float]) -> bool:
|
| 28 |
-
"""Check if solution1 dominates solution2 in multi-objective sense.
|
|
|
|
|
|
|
|
|
|
| 29 |
# Convert maximization objectives to minimization (lower is better)
|
| 30 |
-
|
| 31 |
-
|
| 32 |
-
|
| 33 |
-
|
| 34 |
-
|
| 35 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 36 |
|
| 37 |
# Check if all objectives are better or equal, with at least one strictly better
|
| 38 |
all_better_equal = all(o1 <= o2 for o1, o2 in zip(obj1, obj2))
|
|
@@ -133,13 +156,44 @@ class MultiObjectiveOptimizer:
|
|
| 133 |
|
| 134 |
return mutated
|
| 135 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 136 |
def optimize(self) -> OptimizationResult:
|
| 137 |
"""Run NSGA-II multi-objective optimization."""
|
| 138 |
# Initialize population with trainset solutions and block assignments
|
|
|
|
| 139 |
population = []
|
| 140 |
block_population = []
|
| 141 |
-
|
| 142 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 143 |
population.append(solution)
|
| 144 |
if self.optimize_blocks:
|
| 145 |
block_sol = self._create_block_assignment(solution)
|
|
@@ -210,10 +264,11 @@ class MultiObjectiveOptimizer:
|
|
| 210 |
else:
|
| 211 |
child = parent1.copy()
|
| 212 |
|
| 213 |
-
# Mutation
|
| 214 |
for i in range(self.n_genes):
|
| 215 |
if random.random() < self.config.mutation_rate:
|
| 216 |
-
|
|
|
|
| 217 |
|
| 218 |
offspring.append(child)
|
| 219 |
|
|
@@ -238,23 +293,66 @@ class MultiObjectiveOptimizer:
|
|
| 238 |
|
| 239 |
offspring_blocks.append(block_child)
|
| 240 |
|
| 241 |
-
|
| 242 |
-
|
| 243 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 244 |
|
| 245 |
if gen % 50 == 0:
|
| 246 |
-
|
|
|
|
|
|
|
| 247 |
|
| 248 |
except Exception as e:
|
| 249 |
print(f"Error in NSGA-II generation {gen}: {e}")
|
| 250 |
break
|
| 251 |
|
| 252 |
-
# Select best solution from Pareto front
|
| 253 |
best_block_sol = None
|
| 254 |
if best_solutions:
|
| 255 |
-
#
|
| 256 |
-
|
| 257 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 258 |
best_solution, best_objectives = best_solutions[best_idx]
|
| 259 |
if self.optimize_blocks:
|
| 260 |
# Always create fresh block assignment for the best solution
|
|
|
|
| 24 |
self.n_blocks = evaluator.num_blocks
|
| 25 |
self.optimize_blocks = self.config.optimize_block_assignment
|
| 26 |
|
| 27 |
+
# Objective weights for dominance comparison
|
| 28 |
+
# Higher weight = more important in determining dominance
|
| 29 |
+
self.objective_weights = {
|
| 30 |
+
'service_availability': 5.0, # HIGHEST: More trains = better operations
|
| 31 |
+
'mileage_balance': 1.5, # Medium: Fleet wear balance
|
| 32 |
+
'maintenance_cost': 1.0, # Medium: Avoid overdue maintenance
|
| 33 |
+
'branding_compliance': 0.2, # LOW: Nice-to-have
|
| 34 |
+
'constraint_penalty': 10.0 # CRITICAL: Hard constraints
|
| 35 |
+
}
|
| 36 |
+
|
| 37 |
def dominates(self, solution1: Dict[str, float], solution2: Dict[str, float]) -> bool:
|
| 38 |
+
"""Check if solution1 dominates solution2 in multi-objective sense.
|
| 39 |
+
|
| 40 |
+
Uses weighted objectives to prioritize service availability over branding.
|
| 41 |
+
"""
|
| 42 |
# Convert maximization objectives to minimization (lower is better)
|
| 43 |
+
# Apply weights to emphasize important objectives
|
| 44 |
+
w = self.objective_weights
|
| 45 |
+
obj1 = [
|
| 46 |
+
-solution1['service_availability'] * w['service_availability'],
|
| 47 |
+
-solution1['mileage_balance'] * w['mileage_balance'],
|
| 48 |
+
-solution1['maintenance_cost'] * w['maintenance_cost'],
|
| 49 |
+
-solution1['branding_compliance'] * w['branding_compliance'],
|
| 50 |
+
solution1['constraint_penalty'] * w['constraint_penalty']
|
| 51 |
+
]
|
| 52 |
+
obj2 = [
|
| 53 |
+
-solution2['service_availability'] * w['service_availability'],
|
| 54 |
+
-solution2['mileage_balance'] * w['mileage_balance'],
|
| 55 |
+
-solution2['maintenance_cost'] * w['maintenance_cost'],
|
| 56 |
+
-solution2['branding_compliance'] * w['branding_compliance'],
|
| 57 |
+
solution2['constraint_penalty'] * w['constraint_penalty']
|
| 58 |
+
]
|
| 59 |
|
| 60 |
# Check if all objectives are better or equal, with at least one strictly better
|
| 61 |
all_better_equal = all(o1 <= o2 for o1, o2 in zip(obj1, obj2))
|
|
|
|
| 156 |
|
| 157 |
return mutated
|
| 158 |
|
| 159 |
+
def _create_smart_initial_solution(self) -> np.ndarray:
|
| 160 |
+
"""Create a smart initial solution that respects constraints."""
|
| 161 |
+
solution = np.zeros(self.n_genes, dtype=int) # Start with all service
|
| 162 |
+
|
| 163 |
+
standby_count = 0
|
| 164 |
+
for i, ts_id in enumerate(self.evaluator.trainsets):
|
| 165 |
+
valid, _ = self.evaluator.check_hard_constraints(ts_id)
|
| 166 |
+
if not valid:
|
| 167 |
+
solution[i] = 2 # Put constraint-violating trainsets in maintenance
|
| 168 |
+
elif standby_count < self.config.min_standby:
|
| 169 |
+
solution[i] = 1 # Reserve some healthy ones for standby
|
| 170 |
+
standby_count += 1
|
| 171 |
+
|
| 172 |
+
return solution
|
| 173 |
+
|
| 174 |
def optimize(self) -> OptimizationResult:
|
| 175 |
"""Run NSGA-II multi-objective optimization."""
|
| 176 |
# Initialize population with trainset solutions and block assignments
|
| 177 |
+
# Mix of smart and random solutions for diversity
|
| 178 |
population = []
|
| 179 |
block_population = []
|
| 180 |
+
|
| 181 |
+
# First, add some smart solutions (constraint-aware)
|
| 182 |
+
num_smart = min(10, self.config.population_size // 5)
|
| 183 |
+
for _ in range(num_smart):
|
| 184 |
+
solution = self._create_smart_initial_solution()
|
| 185 |
+
# Add some random mutation to create diversity
|
| 186 |
+
for i in range(self.n_genes):
|
| 187 |
+
if np.random.random() < 0.1: # 10% mutation
|
| 188 |
+
solution[i] = np.random.choice([0, 1, 2], p=[0.70, 0.20, 0.10])
|
| 189 |
+
population.append(solution)
|
| 190 |
+
if self.optimize_blocks:
|
| 191 |
+
block_sol = self._create_block_assignment(solution)
|
| 192 |
+
block_population.append(block_sol)
|
| 193 |
+
|
| 194 |
+
# Fill rest with biased random (favor service)
|
| 195 |
+
for _ in range(self.config.population_size - num_smart):
|
| 196 |
+
solution = np.random.choice([0, 1, 2], size=self.n_genes, p=[0.65, 0.20, 0.15])
|
| 197 |
population.append(solution)
|
| 198 |
if self.optimize_blocks:
|
| 199 |
block_sol = self._create_block_assignment(solution)
|
|
|
|
| 264 |
else:
|
| 265 |
child = parent1.copy()
|
| 266 |
|
| 267 |
+
# Mutation with bias towards service (0)
|
| 268 |
for i in range(self.n_genes):
|
| 269 |
if random.random() < self.config.mutation_rate:
|
| 270 |
+
# 55% chance to mutate to service, 30% depot, 15% maintenance
|
| 271 |
+
child[i] = np.random.choice([0, 1, 2], p=[0.55, 0.30, 0.15])
|
| 272 |
|
| 273 |
offspring.append(child)
|
| 274 |
|
|
|
|
| 293 |
|
| 294 |
offspring_blocks.append(block_child)
|
| 295 |
|
| 296 |
+
# ELITISM: Combine parents and offspring, then select best
|
| 297 |
+
combined_population = new_population + offspring
|
| 298 |
+
combined_blocks = (new_block_population + offspring_blocks) if self.optimize_blocks else None
|
| 299 |
+
|
| 300 |
+
# Evaluate combined population
|
| 301 |
+
combined_objectives = []
|
| 302 |
+
for sol in combined_population:
|
| 303 |
+
combined_objectives.append(self.evaluator.calculate_objectives(sol))
|
| 304 |
+
|
| 305 |
+
# Non-dominated sorting on combined population
|
| 306 |
+
combined_fronts = self.fast_non_dominated_sort(combined_objectives)
|
| 307 |
+
|
| 308 |
+
# Select best individuals for next generation
|
| 309 |
+
population = []
|
| 310 |
+
block_population = [] if self.optimize_blocks else None
|
| 311 |
+
|
| 312 |
+
for front in combined_fronts:
|
| 313 |
+
if len(population) + len(front) <= self.config.population_size:
|
| 314 |
+
population.extend([combined_population[i].copy() for i in front])
|
| 315 |
+
if self.optimize_blocks:
|
| 316 |
+
block_population.extend([combined_blocks[i].copy() for i in front])
|
| 317 |
+
else:
|
| 318 |
+
# Use crowding distance for this front
|
| 319 |
+
distances = self.crowding_distance(front, combined_objectives)
|
| 320 |
+
sorted_front = sorted(zip(front, distances), key=lambda x: x[1], reverse=True)
|
| 321 |
+
remaining = self.config.population_size - len(population)
|
| 322 |
+
population.extend([combined_population[i].copy() for i, _ in sorted_front[:remaining]])
|
| 323 |
+
if self.optimize_blocks:
|
| 324 |
+
block_population.extend([combined_blocks[i].copy() for i, _ in sorted_front[:remaining]])
|
| 325 |
+
break
|
| 326 |
|
| 327 |
if gen % 50 == 0:
|
| 328 |
+
best_service = max(obj.get('service_availability', 0) for obj in combined_objectives)
|
| 329 |
+
min_penalty = min(obj.get('constraint_penalty', 9999) for obj in combined_objectives)
|
| 330 |
+
print(f"Generation {gen}: {len(combined_fronts)} fronts, best service: {best_service:.1f}, min penalty: {min_penalty:.0f}")
|
| 331 |
|
| 332 |
except Exception as e:
|
| 333 |
print(f"Error in NSGA-II generation {gen}: {e}")
|
| 334 |
break
|
| 335 |
|
| 336 |
+
# Select best solution from Pareto front - prioritize service availability
|
| 337 |
best_block_sol = None
|
| 338 |
if best_solutions:
|
| 339 |
+
# First, find solutions with zero constraint penalty
|
| 340 |
+
valid_solutions = [(i, sol, obj) for i, (sol, obj) in enumerate(best_solutions)
|
| 341 |
+
if obj.get('constraint_penalty', 0) == 0]
|
| 342 |
+
|
| 343 |
+
if valid_solutions:
|
| 344 |
+
# Among valid solutions, choose the one with highest service_availability
|
| 345 |
+
# (which means more trains in service)
|
| 346 |
+
best_idx = max(valid_solutions,
|
| 347 |
+
key=lambda x: x[2].get('service_availability', 0))[0]
|
| 348 |
+
else:
|
| 349 |
+
# Fall back to lowest constraint penalty + highest service
|
| 350 |
+
best_idx = max(range(len(best_solutions)),
|
| 351 |
+
key=lambda i: (
|
| 352 |
+
-best_solutions[i][1].get('constraint_penalty', float('inf')),
|
| 353 |
+
best_solutions[i][1].get('service_availability', 0)
|
| 354 |
+
))
|
| 355 |
+
|
| 356 |
best_solution, best_objectives = best_solutions[best_idx]
|
| 357 |
if self.optimize_blocks:
|
| 358 |
# Always create fresh block assignment for the best solution
|