AMD Round 2 Safety Plan
The first AMD Developer Cloud / ROCm LoRA run showed that fine-tuning improves structured routing quality, but it also introduced a safety regression:
| Metric | FakeRouter | AMD LoRA Round 1 |
|---|---|---|
| `workflow_accuracy` | 97.01% | 100.00% |
| `status_accuracy` | 57.33% | 80.00% |
| `required_field_presence_accuracy` | 28.57% | 91.84% |
| `unsafe_rejection_accuracy` | 100.00% | 75.00% |
| `false_route_rate` | 0.00% | 6.67% |
Round 2 focuses on recovering safety while preserving the LoRA extraction gains.
Objective
Improve unsafe request rejection and reduce false routes without losing the required-field extraction improvement from round 1.
Target direction:
- Keep `required_field_presence_accuracy` above 85%.
- Keep `status_accuracy` at or above 80%.
- Push `unsafe_rejection_accuracy` back toward 100%.
- Push `false_route_rate` back toward 0%.
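The target direction above can be expressed as a small check. This is a minimal sketch, not part of the repository: the function name `check_targets` and the dictionary layout are assumptions made for illustration. The numbers are copied from the round 1 table above and from the Safety LoRA ROCm column in the Confirmed ROCm Result section. The two "push toward" targets are treated as "strictly better than round 1", since the plan gives a direction rather than a threshold.

```python
# Metric values in percent, copied from the tables in this document.
ROUND_1 = {
    "required_field_presence_accuracy": 91.84,
    "status_accuracy": 80.00,
    "unsafe_rejection_accuracy": 75.00,
    "false_route_rate": 6.67,
}
ROUND_2 = {
    "required_field_presence_accuracy": 100.00,
    "status_accuracy": 86.67,
    "unsafe_rejection_accuracy": 100.00,
    "false_route_rate": 0.00,
}


def check_targets(candidate, previous):
    """Map each Round 2 target to True/False (hypothetical helper)."""
    return {
        # Hard thresholds stated in the plan.
        "fields_above_85": candidate["required_field_presence_accuracy"] > 85.0,
        "status_at_least_80": candidate["status_accuracy"] >= 80.0,
        # Directional targets: must improve on the previous round.
        "unsafe_rejection_improved": (
            candidate["unsafe_rejection_accuracy"]
            > previous["unsafe_rejection_accuracy"]
        ),
        "false_routes_reduced": (
            candidate["false_route_rate"] < previous["false_route_rate"]
        ),
    }


if __name__ == "__main__":
    for name, passed in check_targets(ROUND_2, ROUND_1).items():
        print(f"{name}: {'ok' if passed else 'MISSED'}")
```

Comparing against the previous round rather than a fixed number keeps the check honest about what the plan actually promises: movement in the right direction without giving up the extraction gains.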
Safety-Augmented Dataset
Generate the regular eval set plus a safety-heavy training split:
python3 -m training.generate_dataset --safety-augmented
Format the safety split for instruction tuning:
python3 -m training.format_dataset \
  --train-input data/train_safety.jsonl \
  --eval-input data/eval.jsonl \
  --train-output data/routercore_train_safety_instruct.jsonl \
  --eval-output data/routercore_eval_instruct.jsonl
The safety split increases adversarial examples for:
- Owner/admin IAM requests
- Broad-scope production permissions
- Production monitoring disablement
- Security bypass requests
- Destructive production operations
AMD ROCm Training Command
Run this on the AMD Developer Cloud GPU VM:
python3 -m training.train_lora \
  --model Qwen/Qwen2.5-0.5B-Instruct \
  --train-file data/routercore_train_safety_instruct.jsonl \
  --eval-file data/routercore_eval_instruct.jsonl \
  --output-dir outputs/routercore-qwen-lora-safety \
  --max-steps 150 \
  --batch-size 1 \
  --gradient-accumulation-steps 8 \
  --learning-rate 2e-4 \
  --max-seq-length 1024
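To make the batch settings concrete, the arithmetic below works out the effective batch size and the number of training sequences this command processes. It is a sketch under two assumptions that the plan does not state: training runs on a single device, and `--max-steps` counts optimizer steps (one step per full accumulation cycle). The helper name `sequences_processed` is hypothetical.

```python
def sequences_processed(batch_size, gradient_accumulation_steps, max_steps,
                        num_devices=1):
    """Return (effective batch size, total sequences seen during training).

    Assumes max_steps counts optimizer steps, so each step consumes
    batch_size * gradient_accumulation_steps sequences per device.
    """
    effective_batch = batch_size * gradient_accumulation_steps * num_devices
    return effective_batch, effective_batch * max_steps


# Values taken from the training command above.
effective_batch, total_sequences = sequences_processed(
    batch_size=1, gradient_accumulation_steps=8, max_steps=150
)
print(effective_batch, total_sequences)  # 8 1200
```

Under those assumptions the run sees 1,200 sequences in total, so whether it covers the safety split once or several times depends on how many examples `data/routercore_train_safety_instruct.jsonl` contains.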
Evaluate the round 2 adapter:
python3 -m eval.run_lora_eval \
  --base-model Qwen/Qwen2.5-0.5B-Instruct \
  --adapter outputs/routercore-qwen-lora-safety \
  --limit 75
python3 -m eval.compare_results
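With `--limit 75`, metrics computed over the whole evaluated set move in coarse steps, which matters when reading small differences between rounds. The sketch below assumes that `status_accuracy` and `false_route_rate` use all 75 examples as their denominator. The counts (43, 60, 65, and 5) are inferred from the percentages reported in this document, not read from eval logs. Other metrics, such as `workflow_accuracy`, appear to use a different denominator and are not covered here.

```python
LIMIT = 75  # value passed to --limit in the eval command above


def percent(count, total=LIMIT):
    """Convert a raw count into a percentage rounded to two decimals."""
    return round(100.0 * count / total, 2)


# One example is worth this many percentage points when total == 75.
print(round(100.0 / LIMIT, 2))  # 1.33

# Inferred counts that reproduce the reported status_accuracy values.
print(percent(43))  # 57.33
print(percent(60))  # 80.0
print(percent(65))  # 86.67

# Inferred count that reproduces the round 1 false_route_rate.
print(percent(5))   # 6.67
```

Read this way, the status accuracy gain from 80.00% to 86.67% corresponds to about five additional correct examples, so differences of one or two percentage points at this sample size should be treated with caution.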
What To Look For
Round 2 is successful if the comparison report shows that the safety-tuned LoRA adapter keeps most of the structured extraction gain while lowering false routes and improving unsafe rejection accuracy.
The submission story is stronger if the results show a clear iteration:
- Deterministic baseline is safe but weak at extraction.
- AMD LoRA round 1 improves extraction but reveals safety regression.
- Safety-augmented AMD LoRA round 2 reduces that regression.
Confirmed ROCm Result
The safety-tuned round 2 adapter was trained and evaluated on AMD Developer Cloud with ROCm PyTorch.
Environment proof:
torch: 2.9.1+rocm6.4
torch.cuda.is_available(): True
torch.version.hip: 6.4.43484-123eb5128
device: AMD Instinct MI300X VF
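The four values above can be collected with standard PyTorch calls. The snippet is a sketch of one way to do it, not necessarily the script used for this run; the function name `describe_environment` is hypothetical. On ROCm builds of PyTorch the `torch.cuda` namespace is backed by HIP, which is why `torch.cuda.is_available()` returns `True` on an AMD GPU.

```python
def describe_environment():
    """Collect the PyTorch / ROCm details shown in the environment proof."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed; report that instead of raising.
        return {"torch": None}

    available = torch.cuda.is_available()
    return {
        "torch": torch.__version__,
        "torch.cuda.is_available()": available,
        # torch.version.hip is None on builds without ROCm support.
        "torch.version.hip": getattr(torch.version, "hip", None),
        # Only query the device name when a GPU is actually visible.
        "device": torch.cuda.get_device_name(0) if available else None,
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Saving this output alongside the eval report makes it easier to show that a given result came from a ROCm-backed run rather than a CPU fallback.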
Training runtime dropped from about 1121 s on the earlier CPU-backed run to about 113 s on ROCm, roughly a 10x speedup.
| Metric | FakeRouter | LoRA Round 1 | Safety LoRA ROCm |
|---|---|---|---|
| `workflow_accuracy` | 97.01% | 100.00% | 100.00% |
| `status_accuracy` | 57.33% | 80.00% | 86.67% |
| `required_field_presence_accuracy` | 28.57% | 91.84% | 100.00% |
| `unsafe_rejection_accuracy` | 100.00% | 75.00% | 100.00% |
| `false_route_rate` | 0.00% | 6.67% | 0.00% |
Round 2 achieved the desired outcome: it preserved the extraction gains from fine-tuning while recovering the safety metrics.