---
title: RouterCore
sdk: gradio
app_file: app.py
python_version: "3.11"
---

# RouterCore

RouterCore is a focused proof-of-concept for the AMD Developer Hackathon. It shows how a lightweight routing model can make agentic systems safer and more reliable by converting messy natural-language requests into validated workflow routes, structured parameters, and policy-aware handoff previews.

The project fits Track 2 (Fine-Tuning on AMD GPUs) while still presenting a Track 1-style agent workflow demo. The MVP uses a deterministic `FakeRouter` so the app works immediately, and includes a LoRA fine-tuning/evaluation path that was run on AMD Developer Cloud with ROCm.

## Core Thesis

RouterCore demonstrates safe routing, not just routing. It focuses on the step before agent execution: deciding whether a request should be routed, clarified, confirmed, rejected, or escalated before any orchestrator or tool can act on it.

The router is only a recommender. The validator and policy layer provide redundant checks so malformed, low-confidence, ambiguous, or unsafe requests do not become confident agent actions.
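This split can be sketched in a few lines. The names below (`RouteProposal`, `decide`, the `0.8` threshold) are illustrative assumptions, not RouterCore's actual API:

```python
# Minimal sketch of "router recommends, policy decides".
# RouteProposal, decide, and the threshold are illustrative, not RouterCore's API.
from dataclasses import dataclass, field

@dataclass
class RouteProposal:
    workflow: str
    confidence: float
    missing_fields: list = field(default_factory=list)

def decide(proposal: RouteProposal, min_confidence: float = 0.8) -> str:
    """Policy layer: downgrade weak or incomplete proposals instead of executing."""
    if proposal.missing_fields:
        return "clarify"   # gather missing parameters before anything else
    if proposal.confidence < min_confidence:
        return "confirm"   # require explicit human confirmation
    return "route"         # safe to hand off an execution preview

print(decide(RouteProposal("create_web_app", 0.92)))            # route
print(decide(RouteProposal("grant_iam_role", 0.95, ["role"])))  # clarify
```

The key property: even a high-confidence proposal with missing fields never reaches the orchestrator as a confident action.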

## AMD Hackathon Fit

RouterCore is designed for Track 2: Fine-Tuning on AMD GPUs. A compact Qwen router was fine-tuned with LoRA on AMD Developer Cloud using ROCm, then evaluated against the deterministic router baseline.

It also demonstrates a Track 1-style agentic workflow pattern through the router, validator, policy layer, clarification loop, and orchestrator preview. The demo stays intentionally scoped: it previews execution plans but does not run cloud or infrastructure actions.

Current confirmed ROCm result: a safety-tuned LoRA run on AMD Developer Cloud improved required-field presence from `28.57%` to `100.00%`, workflow accuracy from `97.01%` to `100.00%`, and status accuracy from `57.33%` to `86.67%`, while preserving `100.00%` unsafe rejection accuracy and a `0.00%` false route rate.

## What It Demonstrates

- Workflow routing from natural language
- JSON schema-style workflow validation
- Policy redundancy after model/router output
- Iterative clarification for missing or uncertain fields
- Execution preview handoff without real cloud actions
- Evaluation and training hooks for future fine-tuning

RouterCore is intentionally not a cloud execution platform. It never creates infrastructure, changes IAM, or executes destructive actions.

## Mentor / Submission Docs

- [Mentor Pitch](docs/mentor_pitch.md)
- [Demo Script](docs/demo_script.md)
- [Submission Notes](docs/submission_notes.md)
- [Evaluation Comparison](docs/eval_comparison.md)
- [Architecture Diagram](docs/architecture.md)
- [AMD Round 2 Safety Plan](docs/amd_round2_safety_plan.md)

## Evaluation Plan

RouterCore can compare deterministic, prompted, and fine-tuned routers using:

- JSON validity
- Workflow accuracy
- Status accuracy
- Required-field accuracy
- Unsafe request rejection accuracy
- False route rate

False route rate measures how often the system confidently routes a request that should have been clarified, confirmed, or rejected.

## Dataset and Evaluation

`training/generate_dataset.py` creates deterministic synthetic `data/train.jsonl` and `data/eval.jsonl` files across success, missing-field, ambiguous, risky-rejected, and confirmation-required cases. The dataset is designed to train and evaluate the router output contract without calling external LLM APIs.
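For intuition, a record in such a JSONL file pairs a request with its labeled route. The field names below are assumptions based on the router contract, not the generator's exact output:

```python
# Illustrative shape of one synthetic JSONL record (field names are assumed,
# not the exact output of training/generate_dataset.py).
import io
import json

record = {
    "request": "Create a storage bucket named logs in us-east1",
    "label": {
        "status": "routed",
        "workflow": "create_storage_bucket",
        "parameters": {"name": "logs", "region": "us-east1"},
    },
}

# JSONL: one JSON object per line, written and read back here via an
# in-memory buffer instead of a real file.
buf = io.StringIO()
buf.write(json.dumps(record) + "\n")
buf.seek(0)
loaded = json.loads(buf.readline())
print(loaded["label"]["workflow"])  # create_storage_bucket
```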

The current baseline is `FakeRouter`, evaluated through the same router, validator, policy, and orchestrator decision path used by the app. The AMD LoRA run uses the same eval set and metrics, making the before/after comparison direct.

False route rate matters because safe agent systems should avoid confidently handing off requests that needed clarification, confirmation, or rejection. A router that looks accurate but has a high false route rate is unsafe for agent execution.

See [Baseline Evaluation](docs/baseline_eval.md) for the current FakeRouter metrics and mentor-facing interpretation.

Generate a comparison report for all available eval artifacts with:

```bash
python -m eval.compare_results
```

## Prompted Model Baseline

RouterCore can optionally evaluate a local Hugging Face causal language model as a prompted baseline before LoRA fine-tuning:

```bash
python -m eval.run_model_eval --model Qwen/Qwen2.5-0.5B-Instruct --limit 10
```

This path is optional and local-friendly. It does not call paid APIs, and it is skipped gracefully if `transformers` or `torch` are not installed. The goal is to establish a second baseline between `FakeRouter` and a future fine-tuned router.
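One practical detail when scoring a prompted base model: the raw generation may wrap its answer in extra prose, so the first JSON object has to be extracted before validity can be judged. The helper below is illustrative, not the actual `eval.run_model_eval` implementation:

```python
# Sketch: pull the first complete JSON object out of free-form model output.
# Illustrative helper, not the project's actual eval code.
import json

def extract_first_json(text: str):
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            obj, _ = decoder.raw_decode(text[start:])  # parses and ignores trailing text
            return obj
        except json.JSONDecodeError:
            start = text.find("{", start + 1)          # try the next candidate brace
    return None

raw = 'Sure! Here is the route:\n{"status": "routed", "workflow": "create_web_app"} Hope that helps.'
print(extract_first_json(raw))
```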

## LoRA Fine-Tuning

RouterCore includes an optional LoRA training path for AMD Developer Cloud / ROCm, and it can also run anywhere PyTorch supports the selected model. The included `routercore-qwen-lora-safety-rocm` evaluation artifact was produced from an AMD Developer Cloud ROCm run on an AMD Instinct MI300X VM.

```bash
# 1. Format the synthetic dataset into instruction-tuning pairs
python -m training.format_dataset
```

```bash
# 2. Train the LoRA adapter
python -m training.train_lora \
    --model Qwen/Qwen2.5-0.5B-Instruct \
    --train-file data/routercore_train_instruct.jsonl \
    --eval-file data/routercore_eval_instruct.jsonl \
    --output-dir outputs/routercore-qwen-lora \
    --max-steps 100
```

```bash
# 3. Evaluate the adapter against the shared eval set
python -m eval.run_lora_eval \
    --base-model Qwen/Qwen2.5-0.5B-Instruct \
    --adapter outputs/routercore-qwen-lora \
    --limit 25
```

This fine-tunes a compact open-source model to emit the RouterCore JSON contract from natural-language DevOps requests, then compares the LoRA adapter against `FakeRouter` and the prompted base model path.

For the next safety-focused AMD iteration, generate a safety-augmented training split and train a second adapter:

```bash
python -m training.generate_dataset --safety-augmented
python -m training.format_dataset \
    --train-input data/train_safety.jsonl \
    --eval-input data/eval.jsonl \
    --train-output data/routercore_train_safety_instruct.jsonl \
    --eval-output data/routercore_eval_instruct.jsonl
```

See [AMD Round 2 Safety Plan](docs/amd_round2_safety_plan.md) for the full ROCm command sequence.

## Example Flow

Input:

```text
Grant John owner access to production.
```

The router extracts `grant_iam_role` with parameters such as `principal=John`, `role=owner`, and `scope=production`. The policy layer rejects the request because owner/admin grants are blocked and high-risk production IAM changes are not allowed to proceed as normal routes.
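The blocked-value check behind this rejection can be sketched as follows; the constant names and exact tiers are illustrative, not RouterCore's policy module:

```python
# Hedged sketch of the blocked-value policy check described above.
# BLOCKED_ROLES and check_iam_grant are illustrative names.
BLOCKED_ROLES = {"owner", "admin"}

def check_iam_grant(params: dict) -> str:
    role = str(params.get("role", "")).lower()
    if role in BLOCKED_ROLES:
        return "reject"   # blocked even when the router extracted it correctly
    if params.get("scope") == "production":
        return "confirm"  # high-risk scope requires explicit confirmation
    return "route"

print(check_iam_grant({"principal": "John", "role": "owner", "scope": "production"}))  # reject
```

Note that the rejection does not depend on router confidence: the policy layer is authoritative regardless of how sure the model is.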

## Architecture

1. `FakeRouter` proposes a workflow, confidence score, parameters, candidates, and clarification hints.
2. `validator` checks the route against workflow schema files in `data/schemas`.
3. `policy` makes the authoritative decision, including blocked values, confidence thresholds, unsafe phrase rejection, and high-risk confirmation.
4. `state` preserves the original request, accumulated clarification context, attempts, and latest decisions.
5. `orchestrator` creates a human-readable execution preview for accepted or confirmed routes only.

The router proposes; validation and policy decide. Clarification loops gather missing context and route again. Rejected requests stop without execution, fallback requests move to manual review or a larger orchestrator, and accepted or confirmed routes generate previews only.
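The clarification loop can be sketched as: merge each user answer into the accumulated context, route again, and fall back after an attempt cap. All names here are illustrative, not the project's `state` module:

```python
# Sketch of the clarification loop: accumulate answers, re-route, cap attempts.
# route_fn, answers, and the statuses are illustrative names.
def clarification_loop(route_fn, answers, max_attempts=3):
    context = {}
    for _ in range(max_attempts):
        result = route_fn(context)
        if result["status"] != "needs_clarification":
            return result
        missing = result["missing_fields"][0]
        if missing not in answers:
            break                         # user cannot supply the field
        context[missing] = answers[missing]  # accumulate, then route again
    return {"status": "fallback"}         # manual review / larger orchestrator

def fake_route(context):
    """Stand-in router: needs a region before it can route."""
    if "region" not in context:
        return {"status": "needs_clarification", "missing_fields": ["region"]}
    return {"status": "routed", "workflow": "create_storage_bucket"}

print(clarification_loop(fake_route, {"region": "us-east1"}))
print(clarification_loop(fake_route, {}))  # no answer available -> fallback
```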

## Workflows

- `create_web_app`
- `create_storage_bucket`
- `create_service_account`
- `grant_iam_role`
- `create_scheduler_job`

## Run Locally

```bash
pip install -r requirements.txt
python -m app.gradio_app
```

Then open the local Gradio URL printed by the command.

## Hugging Face Space

Live demo:
[https://lablab-ai-amd-developer-hackathon-routercore.hf.space](https://lablab-ai-amd-developer-hackathon-routercore.hf.space)

## Run Tests

```bash
pytest
```

## Fine-Tuning Result

The current router is deterministic on purpose. The LoRA experiment fine-tunes a compact model to emit the same router output contract:

```json
{
  "status": "routed",
  "workflow": "create_web_app",
  "confidence": 0.92,
  "parameters": {},
  "missing_fields": [],
  "candidate_workflows": [],
  "failure_reasons": [],
  "clarifying_question": null
}
```
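A minimal validity check for this contract can be sketched in a few lines. This is an illustration of what "JSON validity" means for the contract, not the project's actual validator:

```python
# Sketch: minimal structural check for the router output contract shown above.
# Not the project's validator, which also checks values against workflow schemas.
REQUIRED_KEYS = {
    "status", "workflow", "confidence", "parameters",
    "missing_fields", "candidate_workflows", "failure_reasons", "clarifying_question",
}

def is_valid_contract(obj) -> bool:
    return isinstance(obj, dict) and REQUIRED_KEYS <= obj.keys()

sample = {
    "status": "routed", "workflow": "create_web_app", "confidence": 0.92,
    "parameters": {}, "missing_fields": [], "candidate_workflows": [],
    "failure_reasons": [], "clarifying_question": None,
}
print(is_valid_contract(sample))            # True
print(is_valid_contract({"status": "routed"}))  # False: missing contract keys
```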

The `training/` folder includes dataset formatting, LoRA training, inference, and LoRA evaluation scripts. The confirmed ROCm run used `torch 2.9.1+rocm6.4`, `torch.version.hip 6.4.43484-123eb5128`, and an `AMD Instinct MI300X VF`. The safety-tuned adapter improved structured routing quality while preserving the safety metrics that matter for agent handoff.

## Why Policy Redundancy Matters

Fine-tuned routers can be useful but should not be trusted as the final authority. RouterCore separates recommendation from enforcement:

- Validation catches missing and invalid parameters.
- Policy rejects unsafe requests such as destructive production changes.
- IAM owner/admin grants are blocked even when the router extracts them correctly.
- Medium-confidence and high-risk workflows require confirmation.
- The orchestrator previews actions but does not execute them.

This makes RouterCore a compact demo of safer agent handoff design.
|