File size: 9,070 Bytes
25c1e96 1137e50 25c1e96 1137e50 25c1e96 1137e50 9222f5c 1137e50 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 | ---
title: RouterCore
sdk: gradio
app_file: app.py
python_version: "3.11"
---
# RouterCore
RouterCore is a focused proof-of-concept for the AMD Developer Hackathon. It shows how a lightweight routing model can make agentic systems safer and more reliable by converting messy natural-language requests into validated workflow routes, structured parameters, and policy-aware handoff previews.
The project fits Track 2, Fine-Tuning on AMD GPUs, while still presenting a Track 1-style agent workflow demo. The MVP uses a deterministic `FakeRouter` so the app works immediately, and includes a LoRA fine-tuning/evaluation path that was run on AMD Developer Cloud with ROCm.
## Core Thesis
RouterCore demonstrates safe routing, not just routing. It focuses on the step before agent execution: deciding whether a request should be routed, clarified, confirmed, rejected, or escalated before any orchestrator or tool can act on it.
The router is only a recommender. The validator and policy layer provide redundant checks so malformed, low-confidence, ambiguous, or unsafe requests do not become confident agent actions.
## AMD Hackathon Fit
RouterCore is designed for Track 2: Fine-Tuning on AMD GPUs. A compact Qwen router was fine-tuned with LoRA on AMD Developer Cloud using ROCm, then evaluated against the deterministic router baseline.
It also demonstrates a Track 1-style agentic workflow pattern through the router, validator, policy layer, clarification loop, and orchestrator preview. The demo stays intentionally scoped: it previews execution plans but does not run cloud or infrastructure actions.
Current confirmed ROCm result: a safety-tuned LoRA run on AMD Developer Cloud improved required-field presence from `28.57%` to `100.00%`, workflow accuracy from `97.01%` to `100.00%`, and status accuracy from `57.33%` to `86.67%`, while preserving `100.00%` unsafe rejection accuracy and `0.00%` false route rate.
## What It Demonstrates
- Workflow routing from natural language
- JSON schema-style workflow validation
- Policy redundancy after model/router output
- Iterative clarification for missing or uncertain fields
- Execution preview handoff without real cloud actions
- Evaluation and training hooks for future fine-tuning
RouterCore is intentionally not a cloud execution platform. It never creates infrastructure, changes IAM, or executes destructive actions.
## Mentor / Submission Docs
- [Mentor Pitch](docs/mentor_pitch.md)
- [Demo Script](docs/demo_script.md)
- [Submission Notes](docs/submission_notes.md)
- [Evaluation Comparison](docs/eval_comparison.md)
- [Architecture Diagram](docs/architecture.md)
- [AMD Round 2 Safety Plan](docs/amd_round2_safety_plan.md)
## Evaluation Plan
RouterCore can compare deterministic, prompted, and fine-tuned routers using:
- JSON validity
- Workflow accuracy
- Status accuracy
- Required-field accuracy
- Unsafe request rejection accuracy
- False route rate
False route rate measures how often the system confidently routes a request that should have been clarified, confirmed, or rejected.
## Dataset and Evaluation
`training/generate_dataset.py` creates deterministic synthetic `data/train.jsonl` and `data/eval.jsonl` files across success, missing-field, ambiguous, risky-rejected, and confirmation-required cases. The dataset is designed to train and evaluate the router output contract without calling external LLM APIs.
The current baseline is `FakeRouter`, evaluated through the same router, validator, policy, and orchestrator decision path used by the app. The AMD LoRA run uses the same eval set and metrics, making the before/after comparison direct.
False route rate matters because safe agent systems should avoid confidently handing off requests that needed clarification, confirmation, or rejection. A router that looks accurate but has a high false route rate is unsafe for agent execution.
See [Baseline Evaluation](docs/baseline_eval.md) for the current FakeRouter metrics and mentor-facing interpretation.
Generate a comparison report for all available eval artifacts with:
```bash
python -m eval.compare_results
```
## Prompted Model Baseline
RouterCore can optionally evaluate a local Hugging Face causal language model as a prompted baseline before LoRA fine-tuning:
```bash
python -m eval.run_model_eval --model Qwen/Qwen2.5-0.5B-Instruct --limit 10
```
This path is optional and local-friendly. It does not call paid APIs, and it is skipped gracefully if `transformers` or `torch` are not installed. The goal is to establish a second baseline between `FakeRouter` and a future fine-tuned router.
## LoRA Fine-Tuning
RouterCore includes an optional LoRA training path for AMD Developer Cloud / ROCm, and it can also run anywhere PyTorch supports the selected model. The included `routercore-qwen-lora-safety-rocm` evaluation artifact was produced from an AMD Developer Cloud ROCm run on an AMD Instinct MI300X VM.
```bash
python -m training.format_dataset
```
```bash
python -m training.train_lora \
--model Qwen/Qwen2.5-0.5B-Instruct \
--train-file data/routercore_train_instruct.jsonl \
--eval-file data/routercore_eval_instruct.jsonl \
--output-dir outputs/routercore-qwen-lora \
--max-steps 100
```
```bash
python -m eval.run_lora_eval \
--base-model Qwen/Qwen2.5-0.5B-Instruct \
--adapter outputs/routercore-qwen-lora \
--limit 25
```
This fine-tunes a compact open-source model to emit the RouterCore JSON contract from natural-language DevOps requests, then compares the LoRA adapter against `FakeRouter` and the prompted base model path.
For the next safety-focused AMD iteration, generate a safety-augmented training split and train a second adapter:
```bash
python -m training.generate_dataset --safety-augmented
python -m training.format_dataset \
--train-input data/train_safety.jsonl \
--eval-input data/eval.jsonl \
--train-output data/routercore_train_safety_instruct.jsonl \
--eval-output data/routercore_eval_instruct.jsonl
```
See [AMD Round 2 Safety Plan](docs/amd_round2_safety_plan.md) for the full ROCm command sequence.
## Example Flow
Input:
```text
Grant John owner access to production.
```
The router extracts `grant_iam_role` with parameters such as `principal=John`, `role=owner`, and `scope=production`. The policy layer rejects the request because owner/admin grants are blocked and high-risk production IAM changes are not allowed to proceed as normal routes.
## Architecture
1. `FakeRouter` proposes a workflow, confidence score, parameters, candidates, and clarification hints.
2. `validator` checks the route against workflow schema files in `data/schemas`.
3. `policy` makes the authoritative decision, including blocked values, confidence thresholds, unsafe phrase rejection, and high-risk confirmation.
4. `state` preserves the original request, accumulated clarification context, attempts, and latest decisions.
5. `orchestrator` creates a human-readable execution preview for accepted or confirmed routes only.
The router proposes; validation and policy decide. Clarification loops gather missing context and route again. Rejected requests stop without execution, fallback requests move to manual review or a larger orchestrator, and accepted or confirmed routes generate previews only.
## Workflows
- `create_web_app`
- `create_storage_bucket`
- `create_service_account`
- `grant_iam_role`
- `create_scheduler_job`
## Run Locally
```bash
pip install -r requirements.txt
python -m app.gradio_app
```
Then open the local Gradio URL printed by the command.
## Hugging Face Space
Live demo:
[https://lablab-ai-amd-developer-hackathon-routercore.hf.space](https://lablab-ai-amd-developer-hackathon-routercore.hf.space)
## Run Tests
```bash
pytest
```
## Fine-Tuning Result
The current router is deterministic on purpose. The LoRA experiment fine-tunes a compact model to emit the same router output contract:
```json
{
"status": "routed",
"workflow": "create_web_app",
"confidence": 0.92,
"parameters": {},
"missing_fields": [],
"candidate_workflows": [],
"failure_reasons": [],
"clarifying_question": null
}
```
The `training/` folder includes dataset formatting, LoRA training, inference, and LoRA evaluation scripts. The confirmed ROCm run used `torch 2.9.1+rocm6.4`, `torch.version.hip 6.4.43484-123eb5128`, and an `AMD Instinct MI300X VF`. The safety-tuned adapter improved structured routing quality while preserving the safety metrics that matter for agent handoff.
## Why Policy Redundancy Matters
Fine-tuned routers can be useful but should not be trusted as the final authority. RouterCore separates recommendation from enforcement:
- Validation catches missing and invalid parameters.
- Policy rejects unsafe requests such as destructive production changes.
- IAM owner/admin grants are blocked even when the router extracts them correctly.
- Medium-confidence and high-risk workflows require confirmation.
- The orchestrator previews actions but does not execute them.
This makes RouterCore a compact demo of safer agent handoff design.
|