---
title: RouterCore
sdk: gradio
app_file: app.py
python_version: '3.11'
---

# RouterCore
RouterCore is a focused proof-of-concept for the AMD Developer Hackathon. It shows how a lightweight routing model can make agentic systems safer and more reliable by converting messy natural-language requests into validated workflow routes, structured parameters, and policy-aware handoff previews.
The project fits Track 2, Fine-Tuning on AMD GPUs, while still presenting a Track 1-style agent workflow demo. The MVP uses a deterministic FakeRouter so the app works immediately, and includes a LoRA fine-tuning/evaluation path that was run on AMD Developer Cloud with ROCm.
## Core Thesis
RouterCore demonstrates safe routing, not just routing. It focuses on the step before agent execution: deciding whether a request should be routed, clarified, confirmed, rejected, or escalated before any orchestrator or tool can act on it.
The router is only a recommender. The validator and policy layer provide redundant checks so malformed, low-confidence, ambiguous, or unsafe requests do not become confident agent actions.
## AMD Hackathon Fit
RouterCore is designed for Track 2: Fine-Tuning on AMD GPUs. A compact Qwen router was fine-tuned with LoRA on AMD Developer Cloud using ROCm, then evaluated against the deterministic router baseline.
It also demonstrates a Track 1-style agentic workflow pattern through the router, validator, policy layer, clarification loop, and orchestrator preview. The demo stays intentionally scoped: it previews execution plans but does not run cloud or infrastructure actions.
Current confirmed ROCm result: a safety-tuned LoRA run on AMD Developer Cloud improved required-field presence from 28.57% to 100.00%, workflow accuracy from 97.01% to 100.00%, and status accuracy from 57.33% to 86.67%, while preserving 100.00% unsafe rejection accuracy and 0.00% false route rate.
## What It Demonstrates
- Workflow routing from natural language
- JSON schema-style workflow validation
- Policy redundancy after model/router output
- Iterative clarification for missing or uncertain fields
- Execution preview handoff without real cloud actions
- Evaluation and training hooks for future fine-tuning
RouterCore is intentionally not a cloud execution platform. It never creates infrastructure, changes IAM, or executes destructive actions.
## Mentor / Submission Docs
- Mentor Pitch
- Demo Script
- Submission Notes
- Evaluation Comparison
- Architecture Diagram
- AMD Round 2 Safety Plan
## Evaluation Plan
RouterCore can compare deterministic, prompted, and fine-tuned routers using:
- JSON validity
- Workflow accuracy
- Status accuracy
- Required-field accuracy
- Unsafe request rejection accuracy
- False route rate
False route rate measures how often the system confidently routes a request that should have been clarified, confirmed, or rejected.
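As a sketch, the metric can be computed from paired expected/predicted statuses. The field names `expected_status` and `predicted_status` below are illustrative, not the project's actual eval record schema:

```python
def false_route_rate(records):
    """Fraction of requests that should NOT have been routed directly
    (expected clarification, confirmation, or rejection) but were
    confidently routed anyway."""
    should_not_route = [
        r for r in records
        if r["expected_status"] in {"clarify", "confirm", "reject"}
    ]
    if not should_not_route:
        return 0.0
    falsely_routed = [
        r for r in should_not_route if r["predicted_status"] == "routed"
    ]
    return len(falsely_routed) / len(should_not_route)
```

A router can score well on plain workflow accuracy while still failing this metric, which is why it is tracked separately.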
## Dataset and Evaluation
`training/generate_dataset.py` deterministically generates synthetic `data/train.jsonl` and `data/eval.jsonl` files covering success, missing-field, ambiguous, risky-rejected, and confirmation-required cases. The dataset is designed to train and evaluate the router output contract without calling external LLM APIs.
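A single line of the generated `.jsonl` might pair a request with the target router output roughly as follows. The exact field names are an assumption for illustration, not the project's actual schema:

```python
import json

# Illustrative synthetic training record (field names assumed):
# one natural-language request plus the target router output contract.
record = {
    "request": "Create a web app called billing-portal in us-central1",
    "case": "success",  # or: missing-field, ambiguous, risky-rejected, confirmation-required
    "target": {
        "status": "routed",
        "workflow": "create_web_app",
        "parameters": {"app_name": "billing-portal", "region": "us-central1"},
        "missing_fields": [],
    },
}
line = json.dumps(record)  # one line of a .jsonl file
```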
The current baseline is FakeRouter, evaluated through the same router, validator, policy, and orchestrator decision path used by the app. The AMD LoRA run uses the same eval set and metrics, making the before/after comparison direct.
False route rate matters because safe agent systems should avoid confidently handing off requests that needed clarification, confirmation, or rejection. A router that looks accurate but has a high false route rate is unsafe for agent execution.
See Baseline Evaluation for the current FakeRouter metrics and mentor-facing interpretation.
Generate a comparison report for all available eval artifacts with:
```shell
python -m eval.compare_results
```
## Prompted Model Baseline
RouterCore can optionally evaluate a local Hugging Face causal language model as a prompted baseline before LoRA fine-tuning:
```shell
python -m eval.run_model_eval --model Qwen/Qwen2.5-0.5B-Instruct --limit 10
```
This path is optional and local-friendly. It does not call paid APIs, and it is skipped gracefully if `transformers` or `torch` is not installed. The goal is to establish a second baseline between FakeRouter and the fine-tuned router.
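A common way to implement this kind of graceful skip is to probe for the optional dependencies before importing them; this is a sketch, not necessarily how `eval.run_model_eval` does it:

```python
import importlib.util

def model_eval_available() -> bool:
    """True only if the optional heavy dependencies are installed."""
    return all(
        importlib.util.find_spec(mod) is not None
        for mod in ("transformers", "torch")
    )

if not model_eval_available():
    # Skip cleanly instead of crashing on ImportError.
    print("transformers/torch not installed; skipping prompted-model eval.")
```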
## LoRA Fine-Tuning
RouterCore includes an optional LoRA training path for AMD Developer Cloud / ROCm, and it can also run anywhere PyTorch supports the selected model. The included routercore-qwen-lora-safety-rocm evaluation artifact was produced from an AMD Developer Cloud ROCm run on an AMD Instinct MI300X VM.
```shell
python -m training.format_dataset

python -m training.train_lora \
  --model Qwen/Qwen2.5-0.5B-Instruct \
  --train-file data/routercore_train_instruct.jsonl \
  --eval-file data/routercore_eval_instruct.jsonl \
  --output-dir outputs/routercore-qwen-lora \
  --max-steps 100

python -m eval.run_lora_eval \
  --base-model Qwen/Qwen2.5-0.5B-Instruct \
  --adapter outputs/routercore-qwen-lora \
  --limit 25
```
This fine-tunes a compact open-source model to emit the RouterCore JSON contract from natural-language DevOps requests, then compares the LoRA adapter against FakeRouter and the prompted base model path.
For the next safety-focused AMD iteration, generate a safety-augmented training split and train a second adapter:
```shell
python -m training.generate_dataset --safety-augmented

python -m training.format_dataset \
  --train-input data/train_safety.jsonl \
  --eval-input data/eval.jsonl \
  --train-output data/routercore_train_safety_instruct.jsonl \
  --eval-output data/routercore_eval_instruct.jsonl
```
See AMD Round 2 Safety Plan for the full ROCm command sequence.
## Example Flow

Input:

```text
Grant John owner access to production.
```
The router extracts `grant_iam_role` with parameters such as `principal=John`, `role=owner`, and `scope=production`. The policy layer rejects the request because owner/admin grants are blocked and high-risk production IAM changes are not allowed to proceed as normal routes.
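A minimal sketch of this kind of policy blocklist check, assuming a hypothetical `policy_check_iam_grant` helper rather than the project's actual policy code:

```python
BLOCKED_ROLES = {"owner", "admin"}  # assumed blocklist; illustrative only

def policy_check_iam_grant(parameters: dict) -> dict:
    """Reject high-privilege IAM grants regardless of router confidence."""
    role = parameters.get("role", "").lower()
    if role in BLOCKED_ROLES:
        return {
            "status": "rejected",
            "failure_reasons": [f"role '{role}' is on the policy blocklist"],
        }
    return {"status": "accepted", "failure_reasons": []}
```

The point of the check running after the router is that even a perfect extraction of `role=owner` still ends in rejection.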
## Architecture

- `FakeRouter` proposes a workflow, confidence score, parameters, candidates, and clarification hints.
- `validator` checks the route against workflow schema files in `data/schemas`.
- `policy` makes the authoritative decision, including blocked values, confidence thresholds, unsafe phrase rejection, and high-risk confirmation.
- `state` preserves the original request, accumulated clarification context, attempts, and latest decisions.
- `orchestrator` creates a human-readable execution preview for accepted or confirmed routes only.
The router proposes; validation and policy decide. Clarification loops gather missing context and route again. Rejected requests stop without execution, fallback requests move to manual review or a larger orchestrator, and accepted or confirmed routes generate previews only.
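The handoff described above can be sketched as a single decision function. The class interfaces here are illustrative assumptions, not the project's actual names or signatures:

```python
def decide(request, router, validator, policy, orchestrator):
    """Sketch of the decision path: the router only proposes;
    validation and policy make the final call."""
    proposal = router.route(request)                 # workflow, confidence, params
    validation = validator.check(proposal)           # schema check (data/schemas)
    decision = policy.decide(proposal, validation)   # authoritative status
    if decision["status"] in {"accepted", "confirmed"}:
        # Preview only: the orchestrator never executes cloud actions.
        decision["preview"] = orchestrator.preview(proposal)
    return decision
```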
## Workflows

- `create_web_app`
- `create_storage_bucket`
- `create_service_account`
- `grant_iam_role`
- `create_scheduler_job`
## Run Locally

```shell
pip install -r requirements.txt
python -m app.gradio_app
```
Then open the local Gradio URL printed by the command.
## Hugging Face Space
Live demo: https://lablab-ai-amd-developer-hackathon-routercore.hf.space
## Run Tests

```shell
pytest
```
## Fine-Tuning Result
The current router is deterministic on purpose. The LoRA experiment fine-tunes a compact model to emit the same router output contract:
```json
{
  "status": "routed",
  "workflow": "create_web_app",
  "confidence": 0.92,
  "parameters": {},
  "missing_fields": [],
  "candidate_workflows": [],
  "failure_reasons": [],
  "clarifying_question": null
}
```
The `training/` folder includes dataset formatting, LoRA training, inference, and LoRA evaluation scripts. The confirmed ROCm run used torch `2.9.1+rocm6.4`, `torch.version.hip` `6.4.43484-123eb5128`, and an AMD Instinct MI300X VF. The safety-tuned adapter improved structured routing quality while preserving the safety metrics that matter for agent handoff.
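Enforcing the contract on raw model output can be as simple as parsing and checking the key set; this is a sketch under assumed behavior (return `None` on any violation), not the project's actual validator:

```python
import json

REQUIRED_KEYS = {
    "status", "workflow", "confidence", "parameters",
    "missing_fields", "candidate_workflows", "failure_reasons",
    "clarifying_question",
}

def parse_router_output(text: str):
    """Parse model output and enforce the contract; return None on any
    violation so downstream layers treat it as a failed route."""
    try:
        obj = json.loads(text)
    except json.JSONDecodeError:
        return None
    if not isinstance(obj, dict) or set(obj) != REQUIRED_KEYS:
        return None
    return obj
```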
## Why Policy Redundancy Matters
Fine-tuned routers can be useful but should not be trusted as the final authority. RouterCore separates recommendation from enforcement:
- Validation catches missing and invalid parameters.
- Policy rejects unsafe requests such as destructive production changes.
- IAM owner/admin grants are blocked even when the router extracts them correctly.
- Medium-confidence and high-risk workflows require confirmation.
- The orchestrator previews actions but does not execute them.
This makes RouterCore a compact demo of safer agent handoff design.