---
title: RouterCore
sdk: gradio
app_file: app.py
python_version: "3.11"
---

# RouterCore

RouterCore is a focused proof-of-concept for the AMD Developer Hackathon. It shows how a lightweight routing model can make agentic systems safer and more reliable by converting messy natural-language requests into validated workflow routes, structured parameters, and policy-aware handoff previews.

The project fits Track 2, Fine-Tuning on AMD GPUs, while still presenting a Track 1-style agent workflow demo. The MVP uses a deterministic `FakeRouter` so the app works immediately, and includes a LoRA fine-tuning/evaluation path that was run on AMD Developer Cloud with ROCm.

## Core Thesis

RouterCore demonstrates safe routing, not just routing. It focuses on the step before agent execution: deciding whether a request should be routed, clarified, confirmed, rejected, or escalated before any orchestrator or tool can act on it.

The router is only a recommender. The validator and policy layer provide redundant checks so malformed, low-confidence, ambiguous, or unsafe requests do not become confident agent actions.

## AMD Hackathon Fit

RouterCore is designed for Track 2: Fine-Tuning on AMD GPUs. A compact Qwen router was fine-tuned with LoRA on AMD Developer Cloud using ROCm, then evaluated against the deterministic router baseline. It also demonstrates a Track 1-style agentic workflow pattern through the router, validator, policy layer, clarification loop, and orchestrator preview. The demo stays intentionally scoped: it previews execution plans but does not run cloud or infrastructure actions.

Current confirmed ROCm result: a safety-tuned LoRA run on AMD Developer Cloud improved required-field presence from `28.57%` to `100.00%`, workflow accuracy from `97.01%` to `100.00%`, and status accuracy from `57.33%` to `86.67%`, while preserving `100.00%` unsafe rejection accuracy and a `0.00%` false route rate.
## What It Demonstrates

- Workflow routing from natural language
- JSON schema-style workflow validation
- Policy redundancy after model/router output
- Iterative clarification for missing or uncertain fields
- Execution preview handoff without real cloud actions
- Evaluation and training hooks for future fine-tuning

RouterCore is intentionally not a cloud execution platform. It never creates infrastructure, changes IAM, or executes destructive actions.

## Mentor / Submission Docs

- [Mentor Pitch](docs/mentor_pitch.md)
- [Demo Script](docs/demo_script.md)
- [Submission Notes](docs/submission_notes.md)
- [Evaluation Comparison](docs/eval_comparison.md)
- [Architecture Diagram](docs/architecture.md)
- [AMD Round 2 Safety Plan](docs/amd_round2_safety_plan.md)

## Evaluation Plan

RouterCore can compare deterministic, prompted, and fine-tuned routers using:

- JSON validity
- Workflow accuracy
- Status accuracy
- Required-field accuracy
- Unsafe request rejection accuracy
- False route rate

False route rate measures how often the system confidently routes a request that should have been clarified, confirmed, or rejected.

## Dataset and Evaluation

`training/generate_dataset.py` creates deterministic synthetic `data/train.jsonl` and `data/eval.jsonl` files across success, missing-field, ambiguous, risky-rejected, and confirmation-required cases. The dataset is designed to train and evaluate the router output contract without calling external LLM APIs.

The current baseline is `FakeRouter`, evaluated through the same router, validator, policy, and orchestrator decision path used by the app. The AMD LoRA run uses the same eval set and metrics, making the before/after comparison direct.

False route rate matters because safe agent systems should avoid confidently handing off requests that needed clarification, confirmation, or rejection. A router that looks accurate but has a high false route rate is unsafe for agent execution.
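The false-route-rate metric described above can be sketched in a few lines. This is a minimal illustration, not the code in `eval/`; the `status` field name and label values are assumptions for the sketch.

```python
# Sketch of false route rate: among requests whose reference label says
# "do not route directly" (clarify / confirm / reject), how often did the
# router confidently emit "routed" anyway?
NON_ROUTE_STATUSES = {"clarify", "confirm", "reject"}

def false_route_rate(predictions, references):
    """Fraction of should-not-route requests that were confidently routed."""
    should_not_route = [
        (pred, ref)
        for pred, ref in zip(predictions, references)
        if ref["status"] in NON_ROUTE_STATUSES
    ]
    if not should_not_route:
        return 0.0
    false_routes = sum(
        1 for pred, _ in should_not_route if pred["status"] == "routed"
    )
    return false_routes / len(should_not_route)
```

A router can score well on workflow accuracy and still fail this metric, which is why it is tracked separately.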
See [Baseline Evaluation](docs/baseline_eval.md) for the current FakeRouter metrics and mentor-facing interpretation.

Generate a comparison report for all available eval artifacts with:

```bash
python -m eval.compare_results
```

## Prompted Model Baseline

RouterCore can optionally evaluate a local Hugging Face causal language model as a prompted baseline before LoRA fine-tuning:

```bash
python -m eval.run_model_eval --model Qwen/Qwen2.5-0.5B-Instruct --limit 10
```

This path is optional and local-friendly. It does not call paid APIs, and it is skipped gracefully if `transformers` or `torch` are not installed. The goal is to establish a second baseline between `FakeRouter` and a future fine-tuned router.

## LoRA Fine-Tuning

RouterCore includes an optional LoRA training path for AMD Developer Cloud / ROCm, and it can also run anywhere PyTorch supports the selected model. The included `routercore-qwen-lora-safety-rocm` evaluation artifact was produced from an AMD Developer Cloud ROCm run on an AMD Instinct MI300X VM.

```bash
python -m training.format_dataset
```

```bash
python -m training.train_lora \
  --model Qwen/Qwen2.5-0.5B-Instruct \
  --train-file data/routercore_train_instruct.jsonl \
  --eval-file data/routercore_eval_instruct.jsonl \
  --output-dir outputs/routercore-qwen-lora \
  --max-steps 100
```

```bash
python -m eval.run_lora_eval \
  --base-model Qwen/Qwen2.5-0.5B-Instruct \
  --adapter outputs/routercore-qwen-lora \
  --limit 25
```

This fine-tunes a compact open-source model to emit the RouterCore JSON contract from natural-language DevOps requests, then compares the LoRA adapter against `FakeRouter` and the prompted base model path.
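Because a fine-tuned model emits free text, the JSON-validity metric has to decide what counts as a well-formed contract. A minimal sketch of one way to do this (the key set mirrors the example contract elsewhere in this README; the function name and exact rules are assumptions, not the code in `eval/`):

```python
import json

# Keys the router output contract is expected to carry.
REQUIRED_KEYS = {
    "status", "workflow", "confidence", "parameters",
    "missing_fields", "candidate_workflows",
    "failure_reasons", "clarifying_question",
}

def parse_router_output(text):
    """Parse raw model output; return the contract dict, or None if the
    text is not valid JSON or lacks required contract keys."""
    try:
        obj = json.loads(text)
    except json.JSONDecodeError:
        return None
    if not isinstance(obj, dict) or not REQUIRED_KEYS.issubset(obj):
        return None
    return obj
```

Treating "valid JSON but wrong shape" as a failure keeps the metric honest about whether downstream validation can even run.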
For the next safety-focused AMD iteration, generate a safety-augmented training split and train a second adapter:

```bash
python -m training.generate_dataset --safety-augmented
python -m training.format_dataset \
  --train-input data/train_safety.jsonl \
  --eval-input data/eval.jsonl \
  --train-output data/routercore_train_safety_instruct.jsonl \
  --eval-output data/routercore_eval_instruct.jsonl
```

See [AMD Round 2 Safety Plan](docs/amd_round2_safety_plan.md) for the full ROCm command sequence.

## Example Flow

Input:

```text
Grant John owner access to production.
```

The router extracts `grant_iam_role` with parameters such as `principal=John`, `role=owner`, and `scope=production`. The policy layer rejects the request because owner/admin grants are blocked and high-risk production IAM changes are not allowed to proceed as normal routes.

## Architecture

1. `FakeRouter` proposes a workflow, confidence score, parameters, candidates, and clarification hints.
2. `validator` checks the route against workflow schema files in `data/schemas`.
3. `policy` makes the authoritative decision, including blocked values, confidence thresholds, unsafe phrase rejection, and high-risk confirmation.
4. `state` preserves the original request, accumulated clarification context, attempts, and latest decisions.
5. `orchestrator` creates a human-readable execution preview for accepted or confirmed routes only.

The router proposes; validation and policy decide. Clarification loops gather missing context and route again. Rejected requests stop without execution, fallback requests move to manual review or a larger orchestrator, and accepted or confirmed routes generate previews only.

## Workflows

- `create_web_app`
- `create_storage_bucket`
- `create_service_account`
- `grant_iam_role`
- `create_scheduler_job`

## Run Locally

```bash
pip install -r requirements.txt
python -m app.gradio_app
```

Then open the local Gradio URL printed by the command.
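The policy behavior in the example flow above can be illustrated with a small, self-contained sketch: even a correctly extracted `grant_iam_role` route is rejected when the role is blocked. The names, decision labels, and threshold are assumptions for this sketch, not RouterCore's actual `policy` code.

```python
# Illustrative policy layer: the router only proposes; policy decides.
BLOCKED_ROLES = {"owner", "admin"}   # never grantable, regardless of router output
CONFIDENCE_THRESHOLD = 0.75          # below this, ask for clarification

def decide(route):
    """Return a final decision label for a proposed route dict."""
    if route["workflow"] == "grant_iam_role":
        if route["parameters"].get("role") in BLOCKED_ROLES:
            return "rejected"        # owner/admin grants are always blocked
        if route["parameters"].get("scope") == "production":
            return "confirm"         # high-risk production IAM change
    if route["confidence"] < CONFIDENCE_THRESHOLD:
        return "clarify"             # low confidence never routes directly
    return "accepted"
```

Note that the blocked-role check runs before any confidence check: a high-confidence extraction of an unsafe request is still rejected.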
## Hugging Face Space

Live demo: [https://lablab-ai-amd-developer-hackathon-routercore.hf.space](https://lablab-ai-amd-developer-hackathon-routercore.hf.space)

## Run Tests

```bash
pytest
```

## Fine-Tuning Result

The current router is deterministic on purpose. The LoRA experiment fine-tunes a compact model to emit the same router output contract:

```json
{
  "status": "routed",
  "workflow": "create_web_app",
  "confidence": 0.92,
  "parameters": {},
  "missing_fields": [],
  "candidate_workflows": [],
  "failure_reasons": [],
  "clarifying_question": null
}
```

The `training/` folder includes dataset formatting, LoRA training, inference, and LoRA evaluation scripts. The confirmed ROCm run used `torch 2.9.1+rocm6.4`, `torch.version.hip 6.4.43484-123eb5128`, and an `AMD Instinct MI300X VF`. The safety-tuned adapter improved structured routing quality while preserving the safety metrics that matter for agent handoff.

## Why Policy Redundancy Matters

Fine-tuned routers can be useful but should not be trusted as the final authority. RouterCore separates recommendation from enforcement:

- Validation catches missing and invalid parameters.
- Policy rejects unsafe requests such as destructive production changes.
- IAM owner/admin grants are blocked even when the router extracts them correctly.
- Medium-confidence and high-risk workflows require confirmation.
- The orchestrator previews actions but does not execute them.

This makes RouterCore a compact demo of safer agent handoff design.
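The first layer of that redundancy, schema validation, can be sketched as a required-field check against a per-workflow schema. The schema shape and function below are assumptions for illustration, not the real files in `data/schemas` or the `validator` module.

```python
# Illustrative validator: report which required fields a proposed route
# is missing, so the clarification loop knows what to ask for.
SCHEMAS = {
    "create_storage_bucket": {"required": ["name", "region"]},  # assumed shape
}

def missing_required_fields(route):
    """Return required fields absent from the route's parameters."""
    schema = SCHEMAS.get(route["workflow"])
    if schema is None:
        return ["<unknown workflow>"]
    return [f for f in schema["required"] if f not in route["parameters"]]
```

A non-empty result feeds the clarification loop rather than the orchestrator, which is what the required-field accuracy metric measures end to end.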