---
title: RouterCore
sdk: gradio
app_file: app.py
python_version: '3.11'
---

# RouterCore
RouterCore is a focused proof-of-concept for the AMD Developer Hackathon. It shows how a lightweight routing model can make agentic systems safer and more reliable by converting messy natural-language requests into validated workflow routes, structured parameters, and policy-aware handoff previews.
The project fits Track 2, Fine-Tuning on AMD GPUs, while still presenting a Track 1-style agent workflow demo. The MVP uses a deterministic FakeRouter so the app works immediately, and includes a LoRA fine-tuning/evaluation path that was run on AMD Developer Cloud with ROCm.
## Core Thesis
RouterCore demonstrates safe routing, not just routing. It focuses on the step before agent execution: deciding whether a request should be routed, clarified, confirmed, rejected, or escalated before any orchestrator or tool can act on it.
The router is only a recommender. The validator and policy layer provide redundant checks so malformed, low-confidence, ambiguous, or unsafe requests do not become confident agent actions.
## AMD Hackathon Fit
RouterCore is designed for Track 2: Fine-Tuning on AMD GPUs. A compact Qwen router was fine-tuned with LoRA on AMD Developer Cloud using ROCm, then evaluated against the deterministic router baseline.
It also demonstrates a Track 1-style agentic workflow pattern through the router, validator, policy layer, clarification loop, and orchestrator preview. The demo stays intentionally scoped: it previews execution plans but does not run cloud or infrastructure actions.
Current confirmed ROCm result: a safety-tuned LoRA run on AMD Developer Cloud improved required-field presence from 28.57% to 100.00%, workflow accuracy from 97.01% to 100.00%, and status accuracy from 57.33% to 86.67%, while preserving 100.00% unsafe rejection accuracy and 0.00% false route rate.
## What It Demonstrates
- Workflow routing from natural language
- JSON schema-style workflow validation
- Policy redundancy after model/router output
- Iterative clarification for missing or uncertain fields
- Execution preview handoff without real cloud actions
- Evaluation and training hooks for future fine-tuning
RouterCore is intentionally not a cloud execution platform. It never creates infrastructure, changes IAM, or executes destructive actions.
## Mentor / Submission Docs
- Mentor Pitch
- Demo Script
- Submission Notes
- Evaluation Comparison
- Architecture Diagram
- AMD Round 2 Safety Plan
## Evaluation Plan
RouterCore can compare deterministic, prompted, and fine-tuned routers using:
- JSON validity
- Workflow accuracy
- Status accuracy
- Required-field accuracy
- Unsafe request rejection accuracy
- False route rate
False route rate measures how often the system confidently routes a request that should have been clarified, confirmed, or rejected.
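As a sketch, the metric can be computed from paired expected/predicted statuses. The field names `expected_status` and `predicted_status` below are illustrative, not the project's actual eval record schema:

```python
def false_route_rate(records):
    """Fraction of requests that should NOT have been routed directly
    (expected clarification, confirmation, or rejection) but were
    confidently routed anyway."""
    should_not_route = [
        r for r in records
        if r["expected_status"] in {"clarify", "confirm", "reject"}
    ]
    if not should_not_route:
        return 0.0
    falsely_routed = [
        r for r in should_not_route if r["predicted_status"] == "routed"
    ]
    return len(falsely_routed) / len(should_not_route)
```

A router can score well on plain workflow accuracy while still failing this metric, which is why it is tracked separately.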
## Dataset and Evaluation
`training/generate_dataset.py` deterministically generates synthetic `data/train.jsonl` and `data/eval.jsonl` files covering success, missing-field, ambiguous, risky-rejected, and confirmation-required cases. The dataset is designed to train and evaluate the router output contract without calling external LLM APIs.
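A single line of the generated `.jsonl` might pair a request with the target router output roughly as follows. The exact field names are an assumption for illustration, not the project's actual schema:

```python
import json

# Illustrative synthetic training record (field names assumed):
# one natural-language request plus the target router output contract.
record = {
    "request": "Create a web app called billing-portal in us-central1",
    "case": "success",  # or: missing-field, ambiguous, risky-rejected, confirmation-required
    "target": {
        "status": "routed",
        "workflow": "create_web_app",
        "parameters": {"app_name": "billing-portal", "region": "us-central1"},
        "missing_fields": [],
    },
}
line = json.dumps(record)  # one line of a .jsonl file
```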
The current baseline is FakeRouter, evaluated through the same router, validator, policy, and orchestrator decision path used by the app. The AMD LoRA run uses the same eval set and metrics, making the before/after comparison direct.
False route rate matters because safe agent systems should avoid confidently handing off requests that needed clarification, confirmation, or rejection. A router that looks accurate but has a high false route rate is unsafe for agent execution.
See Baseline Evaluation for the current FakeRouter metrics and mentor-facing interpretation.
Generate a comparison report for all available eval artifacts with:
```shell
python -m eval.compare_results
```
## Prompted Model Baseline
RouterCore can optionally evaluate a local Hugging Face causal language model as a prompted baseline before LoRA fine-tuning:
```shell
python -m eval.run_model_eval --model Qwen/Qwen2.5-0.5B-Instruct --limit 10
```
This path is optional and local-friendly. It does not call paid APIs, and it is skipped gracefully if `transformers` or `torch` is not installed. The goal is to establish a second baseline between FakeRouter and the fine-tuned router.
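A common way to implement this kind of graceful skip is to probe for the optional dependencies before importing them; this is a sketch, not necessarily how `eval.run_model_eval` does it:

```python
import importlib.util

def model_eval_available() -> bool:
    """True only if the optional heavy dependencies are installed."""
    return all(
        importlib.util.find_spec(mod) is not None
        for mod in ("transformers", "torch")
    )

if not model_eval_available():
    # Skip cleanly instead of crashing on ImportError.
    print("transformers/torch not installed; skipping prompted-model eval.")
```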
## LoRA Fine-Tuning
RouterCore includes an optional LoRA training path for AMD Developer Cloud / ROCm, and it can also run anywhere PyTorch supports the selected model. The included routercore-qwen-lora-safety-rocm evaluation artifact was produced from an AMD Developer Cloud ROCm run on an AMD Instinct MI300X VM.
```shell
python -m training.format_dataset

python -m training.train_lora \
  --model Qwen/Qwen2.5-0.5B-Instruct \
  --train-file data/routercore_train_instruct.jsonl \
  --eval-file data/routercore_eval_instruct.jsonl \
  --output-dir outputs/routercore-qwen-lora \
  --max-steps 100

python -m eval.run_lora_eval \
  --base-model Qwen/Qwen2.5-0.5B-Instruct \
  --adapter outputs/routercore-qwen-lora \
  --limit 25
```
This fine-tunes a compact open-source model to emit the RouterCore JSON contract from natural-language DevOps requests, then compares the LoRA adapter against FakeRouter and the prompted base model path.
For the next safety-focused AMD iteration, generate a safety-augmented training split and train a second adapter:
```shell
python -m training.generate_dataset --safety-augmented

python -m training.format_dataset \
  --train-input data/train_safety.jsonl \
  --eval-input data/eval.jsonl \
  --train-output data/routercore_train_safety_instruct.jsonl \
  --eval-output data/routercore_eval_instruct.jsonl
```
See AMD Round 2 Safety Plan for the full ROCm command sequence.
## Example Flow

Input:

```text
Grant John owner access to production.
```
The router extracts `grant_iam_role` with parameters such as `principal=John`, `role=owner`, and `scope=production`. The policy layer rejects the request because owner/admin grants are blocked and high-risk production IAM changes are not allowed to proceed as normal routes.
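A minimal sketch of this kind of policy blocklist check, assuming a hypothetical `policy_check_iam_grant` helper rather than the project's actual policy code:

```python
BLOCKED_ROLES = {"owner", "admin"}  # assumed blocklist; illustrative only

def policy_check_iam_grant(parameters: dict) -> dict:
    """Reject high-privilege IAM grants regardless of router confidence."""
    role = parameters.get("role", "").lower()
    if role in BLOCKED_ROLES:
        return {
            "status": "rejected",
            "failure_reasons": [f"role '{role}' is on the policy blocklist"],
        }
    return {"status": "accepted", "failure_reasons": []}
```

The point of the check running after the router is that even a perfect extraction of `role=owner` still ends in rejection.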
## Architecture

- `FakeRouter` proposes a workflow, confidence score, parameters, candidates, and clarification hints.
- `validator` checks the route against workflow schema files in `data/schemas`.
- `policy` makes the authoritative decision, including blocked values, confidence thresholds, unsafe phrase rejection, and high-risk confirmation.
- `state` preserves the original request, accumulated clarification context, attempts, and latest decisions.
- `orchestrator` creates a human-readable execution preview for accepted or confirmed routes only.
The router proposes; validation and policy decide. Clarification loops gather missing context and route again. Rejected requests stop without execution, fallback requests move to manual review or a larger orchestrator, and accepted or confirmed routes generate previews only.
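The handoff described above can be sketched as a single decision function. The class interfaces here are illustrative assumptions, not the project's actual names or signatures:

```python
def decide(request, router, validator, policy, orchestrator):
    """Sketch of the decision path: the router only proposes;
    validation and policy make the final call."""
    proposal = router.route(request)                 # workflow, confidence, params
    validation = validator.check(proposal)           # schema check (data/schemas)
    decision = policy.decide(proposal, validation)   # authoritative status
    if decision["status"] in {"accepted", "confirmed"}:
        # Preview only: the orchestrator never executes cloud actions.
        decision["preview"] = orchestrator.preview(proposal)
    return decision
```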
## Workflows

- `create_web_app`
- `create_storage_bucket`
- `create_service_account`
- `grant_iam_role`
- `create_scheduler_job`
## Run Locally

```shell
pip install -r requirements.txt
python -m app.gradio_app
```
Then open the local Gradio URL printed by the command.
## Hugging Face Space
Live demo: https://lablab-ai-amd-developer-hackathon-routercore.hf.space
## Run Tests

```shell
pytest
```
## Fine-Tuning Result
The current router is deterministic on purpose. The LoRA experiment fine-tunes a compact model to emit the same router output contract:
```json
{
  "status": "routed",
  "workflow": "create_web_app",
  "confidence": 0.92,
  "parameters": {},
  "missing_fields": [],
  "candidate_workflows": [],
  "failure_reasons": [],
  "clarifying_question": null
}
```
The `training/` folder includes dataset formatting, LoRA training, inference, and LoRA evaluation scripts. The confirmed ROCm run used torch `2.9.1+rocm6.4`, `torch.version.hip` `6.4.43484-123eb5128`, and an AMD Instinct MI300X VF. The safety-tuned adapter improved structured routing quality while preserving the safety metrics that matter for agent handoff.
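Enforcing the contract on raw model output can be as simple as parsing and checking the key set; this is a sketch under assumed behavior (return `None` on any violation), not the project's actual validator:

```python
import json

REQUIRED_KEYS = {
    "status", "workflow", "confidence", "parameters",
    "missing_fields", "candidate_workflows", "failure_reasons",
    "clarifying_question",
}

def parse_router_output(text: str):
    """Parse model output and enforce the contract; return None on any
    violation so downstream layers treat it as a failed route."""
    try:
        obj = json.loads(text)
    except json.JSONDecodeError:
        return None
    if not isinstance(obj, dict) or set(obj) != REQUIRED_KEYS:
        return None
    return obj
```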
## Why Policy Redundancy Matters
Fine-tuned routers can be useful but should not be trusted as the final authority. RouterCore separates recommendation from enforcement:
- Validation catches missing and invalid parameters.
- Policy rejects unsafe requests such as destructive production changes.
- IAM owner/admin grants are blocked even when the router extracts them correctly.
- Medium-confidence and high-risk workflows require confirmation.
- The orchestrator previews actions but does not execute them.
This makes RouterCore a compact demo of safer agent handoff design.