
# Roadmap Issue Drafts

These are the next three roadmap issues to open in GitHub once authenticated issue creation is available.

## 1. Build External Demo UI For Decision Envelope

Suggested title:

Build external demo UI for query -> model_output -> system_decision

Suggested body:

## Goal

Add a simple external-facing demo interface on top of `/classify` so a user can paste a query and see the full decision envelope in a clean, understandable format.

## Scope

- add a lightweight UI for entering a raw query
- render `model_output.classification.intent`
- render fallback state when present
- render `system_decision.policy`
- render `system_decision.opportunity`
- include a few preloaded demo prompts
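A minimal sketch of the rendering step, assuming a response schema that matches the field paths listed above (the exact envelope layout and the `fallback` key are assumptions, not the confirmed API):

```python
# Hypothetical sketch: format one decision envelope from /classify as
# readable lines for a demo UI. Field paths follow the scope list above;
# the exact response schema is an assumption.

def render_envelope(query: str, envelope: dict) -> str:
    """Return a plain-text view of query -> classification -> system decision."""
    model_output = envelope.get("model_output", {})
    classification = model_output.get("classification", {})
    decision = envelope.get("system_decision", {})
    lines = [
        f"query:       {query}",
        f"intent:      {classification.get('intent', 'n/a')}",
        f"fallback:    {model_output.get('fallback', 'none')}",
        f"policy:      {decision.get('policy', 'n/a')}",
        f"opportunity: {decision.get('opportunity', 'n/a')}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Illustrative envelope; real values come from the /classify endpoint.
    demo = {
        "model_output": {"classification": {"intent": "personal_reflection"}},
        "system_decision": {"policy": "allow", "opportunity": "journal_prompt"},
    }
    print(render_envelope("What should I focus on this year?", demo))
```

A UI layer (web form or notebook widget) would only need to POST the query to `/classify` and display this string, plus the preloaded demo prompts.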

## Why

The current JSON API is enough for engineering validation, but not enough for partner demos or taxonomy walkthroughs.

## Done When

- someone can run the demo locally and inspect the full output without using `curl`
- the UI clearly shows query -> classification -> system decision

## 2. Add Better Support Handling To Intent-Type Layer

Suggested title:

Add dedicated support handling to reduce personal_reflection fallback on account-help prompts

Suggested body:

## Goal

Reduce the current failure mode where support-like prompts, such as login and billing issues, collapse into `personal_reflection` or low-confidence fallback behavior.

## Scope

- review support-like prompts in the current benchmark
- decide whether to add a dedicated `support` intent-type head or a rule-based override layer
- add a fixed support-oriented evaluation set
- document the chosen approach in `known_limitations.md`
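If the rule-based override option is chosen, it could start as small as this sketch (the keyword list, function name, and label strings are illustrative assumptions):

```python
# Hypothetical sketch of the rule-based override option: route obvious
# account-help prompts to a `support` intent_type before trusting the
# model's label. Keywords and names are assumptions for illustration.

SUPPORT_KEYWORDS = {
    "login", "log in", "password", "billing",
    "invoice", "refund", "account locked",
}


def apply_support_override(query: str, predicted_intent: str) -> str:
    """Return `support` when the query looks like an account-help request."""
    text = query.lower()
    if any(keyword in text for keyword in SUPPORT_KEYWORDS):
        return "support"
    return predicted_intent
```

A dedicated `support` head would replace this lookup with a trained label, but the override keeps the fixed support evaluation set green in the meantime.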

## Why

The `decision_phase` head can already separate `support` reasonably well, but the `intent_type` layer still underperforms on these cases.

## Done When

- support prompts are no longer commonly labeled as `personal_reflection`
- the combined envelope fails safe on support queries, with clearer semantics

## 3. Add Evaluation Harness And Canonical Benchmark Runner

Suggested title:

Add canonical benchmark runner for demo prompts and regression checks

Suggested body:

## Goal

Turn the current prompt suite and canonical examples into a repeatable regression harness.

## Scope

- add a script that runs the fixed demo prompts through `combined_inference.py`
- save outputs to a machine-readable artifact
- compare current outputs against expected behavior notes
- flag meaningful regressions in fallback behavior and phase classification
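The runner could be as simple as the following sketch, where `classify` stands in for the model call in `combined_inference.py` and the prompt/expectation pairs are illustrative assumptions:

```python
# Hypothetical sketch of the benchmark runner: run the fixed demo prompts
# through a classify function, save a JSON artifact, and flag label drift
# against expected behavior notes. Data shapes here are assumptions.
import json
from pathlib import Path


def run_benchmark(prompts, classify, expected, artifact_path):
    """Return a list of regressions; write all raw outputs to artifact_path."""
    results, regressions = [], []
    for prompt in prompts:
        output = classify(prompt)  # stand-in for combined_inference.py
        results.append({"prompt": prompt, "output": output})
        want = expected.get(prompt)
        if want is not None and output.get("intent") != want:
            regressions.append(
                {"prompt": prompt, "expected": want, "got": output.get("intent")}
            )
    # Machine-readable artifact for inspection and comparison over time.
    Path(artifact_path).write_text(json.dumps(results, indent=2))
    return regressions
```

Wrapping this in a single CLI entry point satisfies the "one command runs the prompt suite end to end" criterion below.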

## Why

The repo now has frozen `v0.1` baselines. A benchmark runner is the clean way to protect demo quality without returning to ad hoc tuning.

## Done When

- one command runs the prompt suite end to end
- current outputs are easy to inspect and compare over time
- demo regressions become visible before external sharing