# Evaluation Comparison

This report compares RouterCore eval result artifacts from `eval/results/`.
## Metrics

| Model | `json_validity_rate` | `workflow_accuracy` | `status_accuracy` | `required_field_presence_accuracy` | `unsafe_rejection_accuracy` | `false_route_rate` |
| --- | ---: | ---: | ---: | ---: | ---: | ---: |
| FakeRouter | 100.00% | 97.01% | 57.33% | 28.57% | 100.00% | 0.00% |
| LoRA: routercore-qwen-lora-safety-rocm | 100.00% | 100.00% | 86.67% | 100.00% | 100.00% | 0.00% |
| LoRA: routercore-qwen-lora | 100.00% | 100.00% | 80.00% | 91.84% | 75.00% | 6.67% |
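To reuse these numbers in a script (for example, to compare two reports), the table can be parsed back into per-model values. This is a minimal sketch that assumes only the Markdown layout shown here: one header row, one alignment row, and value cells ending in `%`. `parse_metrics_table` is a hypothetical helper, not part of RouterCore, and it does not read the artifacts in `eval/results/`, whose format this report does not describe.

```python
def parse_metrics_table(markdown):
    """Parse a Markdown metrics table into {model: {metric: percent}}.

    Hypothetical helper: assumes one header row, one '---' alignment row,
    and value cells formatted like '97.01%'.
    """
    # Keep only table lines and split each into trimmed cells.
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in markdown.splitlines()
        if line.strip().startswith("|")
    ]
    header, body = rows[0], rows[2:]  # rows[1] is the alignment row
    metrics = [name.strip("`") for name in header[1:]]  # drop code backticks
    return {
        cells[0]: {
            metric: float(value.rstrip("%"))
            for metric, value in zip(metrics, cells[1:])
        }
        for cells in body
    }


TABLE = """
| Model | `json_validity_rate` | `workflow_accuracy` | `status_accuracy` | `required_field_presence_accuracy` | `unsafe_rejection_accuracy` | `false_route_rate` |
| --- | ---: | ---: | ---: | ---: | ---: | ---: |
| FakeRouter | 100.00% | 97.01% | 57.33% | 28.57% | 100.00% | 0.00% |
| LoRA: routercore-qwen-lora-safety-rocm | 100.00% | 100.00% | 86.67% | 100.00% | 100.00% | 0.00% |
| LoRA: routercore-qwen-lora | 100.00% | 100.00% | 80.00% | 91.84% | 75.00% | 6.67% |
"""

results = parse_metrics_table(TABLE)
print(results["LoRA: routercore-qwen-lora"]["false_route_rate"])  # 6.67
```

Values are kept as percentages rather than fractions so they compare exactly against the strings shown in the table.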
## Interpretation

- Best structured extraction (`required_field_presence_accuracy`): LoRA: routercore-qwen-lora-safety-rocm (100.00%).
- Safest models (tied): FakeRouter and LoRA: routercore-qwen-lora-safety-rocm (`unsafe_rejection_accuracy` 100.00%, `false_route_rate` 0.00%).
- Lowest `false_route_rate`: FakeRouter and LoRA: routercore-qwen-lora-safety-rocm (tied at 0.00%); highest: LoRA: routercore-qwen-lora (6.67%).
- Improve next: status classification (`status_accuracy`). It is the only metric where LoRA: routercore-qwen-lora-safety-rocm falls short of a perfect score (86.67%), and it stays below 90% for every model.
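The interpretation follows from simple selection rules over the metrics table. This is a minimal sketch under stated assumptions: ties are reported together, `false_route_rate` is the only lower-is-better metric, and "safest" means highest `unsafe_rejection_accuracy` with `false_route_rate` breaking ties. The helper names (`best_models`, `safest_models`, `shortfalls`) are hypothetical and may not match how this report was actually generated.

```python
# Values copied from the Metrics table (percent).
RESULTS = {
    "FakeRouter": {
        "json_validity_rate": 100.00, "workflow_accuracy": 97.01,
        "status_accuracy": 57.33, "required_field_presence_accuracy": 28.57,
        "unsafe_rejection_accuracy": 100.00, "false_route_rate": 0.00,
    },
    "LoRA: routercore-qwen-lora-safety-rocm": {
        "json_validity_rate": 100.00, "workflow_accuracy": 100.00,
        "status_accuracy": 86.67, "required_field_presence_accuracy": 100.00,
        "unsafe_rejection_accuracy": 100.00, "false_route_rate": 0.00,
    },
    "LoRA: routercore-qwen-lora": {
        "json_validity_rate": 100.00, "workflow_accuracy": 100.00,
        "status_accuracy": 80.00, "required_field_presence_accuracy": 91.84,
        "unsafe_rejection_accuracy": 75.00, "false_route_rate": 6.67,
    },
}

# Assumption: false_route_rate is an error rate (lower is better);
# every other metric is treated as higher-is-better.
LOWER_IS_BETTER = {"false_route_rate"}


def best_models(results, metric):
    """Return every model tied for the best value of `metric`, in table order."""
    values = {model: scores[metric] for model, scores in results.items()}
    pick = min if metric in LOWER_IS_BETTER else max
    target = pick(values.values())
    return [model for model, value in values.items() if value == target]


def safest_models(results):
    """Assumed rule: highest unsafe rejection, ties broken by lowest false route rate."""
    top = best_models(results, "unsafe_rejection_accuracy")
    return best_models({model: results[model] for model in top}, "false_route_rate")


def shortfalls(results, model):
    """Metrics where `model` misses the ideal score (100%, or 0% for error rates)."""
    return [
        metric
        for metric, value in results[model].items()
        if value != (0.0 if metric in LOWER_IS_BETTER else 100.0)
    ]


print(best_models(RESULTS, "required_field_presence_accuracy"))
# ['LoRA: routercore-qwen-lora-safety-rocm']
print(safest_models(RESULTS))
# ['FakeRouter', 'LoRA: routercore-qwen-lora-safety-rocm']
print(shortfalls(RESULTS, "LoRA: routercore-qwen-lora-safety-rocm"))
# ['status_accuracy']
```

Returning every tied model, rather than the first maximum, keeps ties such as the shared 0.00% false route rate visible instead of silently favoring whichever model appears first.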