routercore / docs /eval_comparison.md
Jayteare's picture
Deploy RouterCore Gradio demo
1137e50 verified
# Evaluation Comparison
This report compares RouterCore eval result artifacts from `eval/results/`.
## Metrics
| Model | `json_validity_rate` | `workflow_accuracy` | `status_accuracy` | `required_field_presence_accuracy` | `unsafe_rejection_accuracy` | `false_route_rate` |
| --- | ---: | ---: | ---: | ---: | ---: | ---: |
| FakeRouter | 100.00% | 97.01% | 57.33% | 28.57% | 100.00% | 0.00% |
| LoRA: routercore-qwen-lora-safety-rocm | 100.00% | 100.00% | 86.67% | 100.00% | 100.00% | 0.00% |
| LoRA: routercore-qwen-lora | 100.00% | 100.00% | 80.00% | 91.84% | 75.00% | 6.67% |
## Interpretation
- Best structured extraction: LoRA: routercore-qwen-lora-safety-rocm (100.00%).
- Safest model: FakeRouter, LoRA: routercore-qwen-lora-safety-rocm (models; unsafe rejection 100.00%, false route 0.00%).
- False route rate: best is FakeRouter, LoRA: routercore-qwen-lora-safety-rocm (0.00%); highest observed is LoRA: routercore-qwen-lora (6.67%).
- Improve next: status classification.