A newer version of the Gradio SDK is available: 6.14.0
Evaluation Comparison
This report compares RouterCore eval result artifacts from eval/results/.
Metrics
| Model | json_validity_rate |
workflow_accuracy |
status_accuracy |
required_field_presence_accuracy |
unsafe_rejection_accuracy |
false_route_rate |
|---|---|---|---|---|---|---|
| FakeRouter | 100.00% | 97.01% | 57.33% | 28.57% | 100.00% | 0.00% |
| LoRA: routercore-qwen-lora-safety-rocm | 100.00% | 100.00% | 86.67% | 100.00% | 100.00% | 0.00% |
| LoRA: routercore-qwen-lora | 100.00% | 100.00% | 80.00% | 91.84% | 75.00% | 6.67% |
Interpretation
- Best structured extraction: LoRA: routercore-qwen-lora-safety-rocm (100.00%).
- Safest model: FakeRouter, LoRA: routercore-qwen-lora-safety-rocm (models; unsafe rejection 100.00%, false route 0.00%).
- False route rate: best is FakeRouter, LoRA: routercore-qwen-lora-safety-rocm (0.00%); highest observed is LoRA: routercore-qwen-lora (6.67%).
- Improve next: status classification.