---
title: RouterCore
sdk: gradio
app_file: app.py
python_version: "3.11"
---
# RouterCore
RouterCore is a focused proof-of-concept for the AMD Developer Hackathon. It shows how a lightweight routing model can make agentic systems safer and more reliable by converting messy natural-language requests into validated workflow routes, structured parameters, and policy-aware handoff previews.
The project fits Track 2 (Fine-Tuning on AMD GPUs) while still presenting a Track 1-style agent workflow demo. The MVP uses a deterministic `FakeRouter` so the app works immediately, and it includes a LoRA fine-tuning and evaluation path that was run on AMD Developer Cloud with ROCm.
## Core Thesis
RouterCore demonstrates safe routing, not just routing. It focuses on the step before agent execution: deciding whether a request should be routed, clarified, confirmed, rejected, or escalated before any orchestrator or tool can act on it.
The router is only a recommender. The validator and policy layer provide redundant checks so malformed, low-confidence, ambiguous, or unsafe requests do not become confident agent actions.
## AMD Hackathon Fit
RouterCore is designed for Track 2: Fine-Tuning on AMD GPUs. A compact Qwen router was fine-tuned with LoRA on AMD Developer Cloud using ROCm, then evaluated against the deterministic router baseline.
It also demonstrates a Track 1-style agentic workflow pattern through the router, validator, policy layer, clarification loop, and orchestrator preview. The demo stays intentionally scoped: it previews execution plans but does not run cloud or infrastructure actions.
Current confirmed ROCm result: a safety-tuned LoRA run on AMD Developer Cloud improved required-field presence from `28.57%` to `100.00%`, workflow accuracy from `97.01%` to `100.00%`, and status accuracy from `57.33%` to `86.67%`, while preserving `100.00%` unsafe rejection accuracy and `0.00%` false route rate.
## What It Demonstrates
- Workflow routing from natural language
- JSON schema-style workflow validation
- Policy redundancy after model/router output
- Iterative clarification for missing or uncertain fields
- Execution preview handoff without real cloud actions
- Evaluation and training hooks for future fine-tuning
RouterCore is intentionally not a cloud execution platform. It never creates infrastructure, changes IAM, or executes destructive actions.
## Mentor / Submission Docs
- [Mentor Pitch](docs/mentor_pitch.md)
- [Demo Script](docs/demo_script.md)
- [Submission Notes](docs/submission_notes.md)
- [Evaluation Comparison](docs/eval_comparison.md)
- [Architecture Diagram](docs/architecture.md)
- [AMD Round 2 Safety Plan](docs/amd_round2_safety_plan.md)
## Evaluation Plan
RouterCore can compare deterministic, prompted, and fine-tuned routers using:
- JSON validity
- Workflow accuracy
- Status accuracy
- Required-field accuracy
- Unsafe request rejection accuracy
- False route rate
False route rate measures how often the system confidently routes a request that should have been clarified, confirmed, or rejected.
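The false route rate can be computed directly from that definition. The sketch below is illustrative (the function and status names are assumptions, not RouterCore's actual eval code): it takes pairs of expected and decided statuses and counts how many cases that should not have been routed were routed anyway.

```python
# Illustrative sketch of the false-route-rate metric described above.
# Names and status labels are assumptions, not RouterCore's real eval code.
def false_route_rate(cases):
    """cases: list of (expected_status, decided_status) pairs."""
    # Only cases that should NOT have been routed can produce false routes.
    should_not_route = [c for c in cases
                        if c[0] in ("clarify", "confirm", "reject")]
    if not should_not_route:
        return 0.0
    falsely_routed = sum(1 for expected, decided in should_not_route
                         if decided == "routed")
    return falsely_routed / len(should_not_route)

cases = [
    ("routed", "routed"),    # correct route
    ("clarify", "routed"),   # false route: needed clarification
    ("reject", "rejected"),  # correctly rejected
    ("confirm", "confirm"),  # correctly held for confirmation
]
print(round(false_route_rate(cases), 3))  # 0.333
```

One of the three non-routable cases was confidently routed, so the rate is 1/3.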
## Dataset and Evaluation
`training/generate_dataset.py` creates deterministic synthetic `data/train.jsonl` and `data/eval.jsonl` files across success, missing-field, ambiguous, risky-rejected, and confirmation-required cases. The dataset is designed to train and evaluate the router output contract without calling external LLM APIs.
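A training record might pair a request with a target in the router's output contract. The record below is a hypothetical example for illustration only; the `request`/`target` field names are assumptions, and the actual JSONL schema is defined by `training/generate_dataset.py`.

```json
{"request": "Create a web app called blog-frontend in us-central1",
 "target": {"status": "routed", "workflow": "create_web_app", "confidence": 0.9,
            "parameters": {"app_name": "blog-frontend", "region": "us-central1"},
            "missing_fields": [], "candidate_workflows": [], "failure_reasons": [],
            "clarifying_question": null}}
```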
The current baseline is `FakeRouter`, evaluated through the same router, validator, policy, and orchestrator decision path used by the app. The AMD LoRA run uses the same eval set and metrics, making the before/after comparison direct.
False route rate matters because safe agent systems should avoid confidently handing off requests that needed clarification, confirmation, or rejection. A router that looks accurate on workflow labels but has a high false route rate is still unsafe for agent execution, which is why the metric is tracked separately.
See [Baseline Evaluation](docs/baseline_eval.md) for the current FakeRouter metrics and mentor-facing interpretation.
Generate a comparison report for all available eval artifacts with:
```bash
python -m eval.compare_results
```
## Prompted Model Baseline
RouterCore can optionally evaluate a local Hugging Face causal language model as a prompted baseline before LoRA fine-tuning:
```bash
python -m eval.run_model_eval --model Qwen/Qwen2.5-0.5B-Instruct --limit 10
```
This path is optional and local-friendly. It does not call paid APIs, and it is skipped gracefully if `transformers` or `torch` are not installed. The goal is to establish a second baseline between `FakeRouter` and a future fine-tuned router.
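The "skipped gracefully" behavior can be implemented with a simple import guard. This is a minimal sketch of the pattern, assuming a hypothetical `heavy_deps_available` helper; RouterCore's actual entry point in `eval/run_model_eval.py` may differ.

```python
# Sketch of the optional-dependency guard described above.
# heavy_deps_available is a hypothetical name, not RouterCore's real API.
def heavy_deps_available():
    try:
        import torch         # noqa: F401
        import transformers  # noqa: F401
    except ImportError:
        return False
    return True

if not heavy_deps_available():
    # Skip the prompted-model baseline instead of crashing.
    print("transformers/torch not installed; skipping prompted-model eval")
```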
## LoRA Fine-Tuning
RouterCore includes an optional LoRA training path for AMD Developer Cloud / ROCm, and it can also run anywhere PyTorch supports the selected model. The included `routercore-qwen-lora-safety-rocm` evaluation artifact was produced from an AMD Developer Cloud ROCm run on an AMD Instinct MI300X VM. The three commands below format the dataset, train the adapter, and evaluate it:
```bash
python -m training.format_dataset
```
```bash
python -m training.train_lora \
--model Qwen/Qwen2.5-0.5B-Instruct \
--train-file data/routercore_train_instruct.jsonl \
--eval-file data/routercore_eval_instruct.jsonl \
--output-dir outputs/routercore-qwen-lora \
--max-steps 100
```
```bash
python -m eval.run_lora_eval \
--base-model Qwen/Qwen2.5-0.5B-Instruct \
--adapter outputs/routercore-qwen-lora \
--limit 25
```
This fine-tunes a compact open-source model to emit the RouterCore JSON contract from natural-language DevOps requests, then compares the LoRA adapter against `FakeRouter` and the prompted base model path.
For the next safety-focused AMD iteration, generate a safety-augmented training split and train a second adapter:
```bash
python -m training.generate_dataset --safety-augmented
python -m training.format_dataset \
--train-input data/train_safety.jsonl \
--eval-input data/eval.jsonl \
--train-output data/routercore_train_safety_instruct.jsonl \
--eval-output data/routercore_eval_instruct.jsonl
```
See [AMD Round 2 Safety Plan](docs/amd_round2_safety_plan.md) for the full ROCm command sequence.
## Example Flow
Input:
```text
Grant John owner access to production.
```
The router extracts `grant_iam_role` with parameters such as `principal=John`, `role=owner`, and `scope=production`. The policy layer rejects the request because owner/admin grants are blocked and high-risk production IAM changes are not allowed to proceed as normal routes.
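The blocking behavior in this example can be sketched as a small policy check. All names below are illustrative, not RouterCore's actual `policy` API: the point is that the rejection does not depend on router confidence at all.

```python
# Hedged sketch of the policy idea in the example flow above.
# Function and field names are illustrative, not RouterCore's real API.
BLOCKED_ROLES = {"owner", "admin"}

def policy_decide(workflow, parameters):
    # Owner/admin IAM grants are rejected outright, regardless of how
    # confidently the router extracted them.
    if workflow == "grant_iam_role" and parameters.get("role") in BLOCKED_ROLES:
        return {"decision": "rejected",
                "reason": f"{parameters['role']} grants are blocked by policy"}
    return {"decision": "accepted"}

print(policy_decide("grant_iam_role",
                    {"principal": "John", "role": "owner", "scope": "production"}))
```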
## Architecture
1. `FakeRouter` proposes a workflow, confidence score, parameters, candidates, and clarification hints.
2. `validator` checks the route against workflow schema files in `data/schemas`.
3. `policy` makes the authoritative decision, including blocked values, confidence thresholds, unsafe phrase rejection, and high-risk confirmation.
4. `state` preserves the original request, accumulated clarification context, attempts, and latest decisions.
5. `orchestrator` creates a human-readable execution preview for accepted or confirmed routes only.
The router proposes; validation and policy decide. Clarification loops gather missing context and route again. Rejected requests stop without execution, fallback requests move to manual review or a larger orchestrator, and accepted or confirmed routes generate previews only.
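The propose-validate-decide-preview flow above can be condensed into a few lines. This is a minimal sketch under assumed names; the real modules (`FakeRouter`, `validator`, `policy`, `orchestrator`) live in the repo and are richer than this.

```python
# Minimal sketch of the pipeline described above; all names are illustrative.
def route(request):                      # stands in for FakeRouter
    return {"workflow": "create_web_app", "confidence": 0.95,
            "parameters": {"app_name": "blog"}}

def validate(route_result):              # stands in for validator (schema check)
    required = {"create_web_app": ["app_name"]}
    return [f for f in required.get(route_result["workflow"], [])
            if f not in route_result["parameters"]]

def decide(route_result, missing):       # stands in for policy (authoritative)
    if missing:
        return "needs_clarification"
    if route_result["confidence"] < 0.7:
        return "needs_confirmation"
    return "accepted"

def preview(route_result):               # stands in for orchestrator
    return f"Would run {route_result['workflow']} with {route_result['parameters']}"

r = route("Create a web app called blog")
decision = decide(r, validate(r))
if decision == "accepted":
    print(preview(r))  # preview only; nothing is executed
```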
## Workflows
- `create_web_app`
- `create_storage_bucket`
- `create_service_account`
- `grant_iam_role`
- `create_scheduler_job`
## Run Locally
```bash
pip install -r requirements.txt
python -m app.gradio_app
```
Then open the local Gradio URL printed by the command.
## Hugging Face Space
Live demo:
[https://lablab-ai-amd-developer-hackathon-routercore.hf.space](https://lablab-ai-amd-developer-hackathon-routercore.hf.space)
## Run Tests
```bash
pytest
```
## Fine-Tuning Result
The current router is deterministic on purpose. The LoRA experiment fine-tunes a compact model to emit the same router output contract:
```json
{
"status": "routed",
"workflow": "create_web_app",
"confidence": 0.92,
"parameters": {},
"missing_fields": [],
"candidate_workflows": [],
"failure_reasons": [],
"clarifying_question": null
}
```
The `training/` folder includes dataset formatting, LoRA training, inference, and LoRA evaluation scripts. The confirmed ROCm run used `torch 2.9.1+rocm6.4`, `torch.version.hip 6.4.43484-123eb5128`, and an `AMD Instinct MI300X VF`. The safety-tuned adapter improved structured routing quality while preserving the safety metrics that matter for agent handoff.
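The JSON-validity metric from the evaluation plan reduces to checking that model output parses and carries every key of the contract shown above. This sketch is an assumption about how such a check might look, not the actual code in `eval/`.

```python
import json

# Hedged sketch of a JSON-validity check against the contract shown above;
# the real eval scripts in eval/ may differ.
REQUIRED_KEYS = {"status", "workflow", "confidence", "parameters",
                 "missing_fields", "candidate_workflows", "failure_reasons",
                 "clarifying_question"}

def is_valid_contract(text):
    try:
        obj = json.loads(text)
    except json.JSONDecodeError:
        return False
    # Valid output is a JSON object containing every contract key.
    return isinstance(obj, dict) and REQUIRED_KEYS <= obj.keys()
```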
## Why Policy Redundancy Matters
Fine-tuned routers can be useful but should not be trusted as the final authority. RouterCore separates recommendation from enforcement:
- Validation catches missing and invalid parameters.
- Policy rejects unsafe requests such as destructive production changes.
- IAM owner/admin grants are blocked even when the router extracts them correctly.
- Medium-confidence and high-risk workflows require confirmation.
- The orchestrator previews actions but does not execute them.
This makes RouterCore a compact demo of safer agent handoff design.