	# Roadmap Issue Drafts

	These are the next three roadmap issues to open on GitHub once authenticated issue creation is available.

	## 1. Build External Demo UI For Decision Envelope

	Suggested title:

	`Build external demo UI for query -> model_output -> system_decision`

	Suggested body:

	```md
	## Goal

	Add a simple external-facing demo interface on top of `/classify` so a user can paste a query and see the full decision envelope in a clean, understandable format.

	## Scope

	- add a lightweight UI for entering a raw query
	- render `model_output.classification.intent`
	- render fallback state when present
	- render `system_decision.policy`
	- render `system_decision.opportunity`
	- include a few preloaded demo prompts

	## Why

	The current JSON API is enough for engineering validation, but not enough for partner demos or taxonomy walkthroughs.

	## Done When

	- someone can run the demo locally and inspect the full output without using curl
	- the UI clearly shows query -> classification -> system decision
	```
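The rendering step in this scope can be sketched without committing to a UI framework. The snippet below is a minimal sketch, not the repo's implementation: it assumes the `/classify` response is a nested JSON object whose dotted field names match the scope list, and it assumes fallback state lives at `model_output.fallback`, which the draft does not specify. `get_path` and `render_envelope` are hypothetical helper names.

```python
import json
from typing import Any


def get_path(data: dict, dotted: str, default: Any = None) -> Any:
    """Walk a nested dict along a dotted path; return default if a key is missing."""
    node: Any = data
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


def render_envelope(query: str, envelope: dict) -> str:
    """Format query -> classification -> system decision as plain text."""
    lines = [f"Query: {query}", "", "Classification"]
    intent = get_path(envelope, "model_output.classification.intent")
    lines.append(f"  intent: {json.dumps(intent)}")

    # Assumption: fallback state is stored under model_output.fallback.
    fallback = get_path(envelope, "model_output.fallback")
    if fallback:
        lines.append(f"  fallback: {json.dumps(fallback)}")

    lines += ["", "System decision"]
    for field in ("policy", "opportunity"):
        value = get_path(envelope, f"system_decision.{field}")
        lines.append(f"  {field}: {json.dumps(value)}")
    return "\n".join(lines)
```

A web front end could reuse the same field walk and swap the text formatting for HTML. Missing fields render as `null` rather than raising, so a partial envelope stays inspectable during a demo.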

	## 2. Add Better Support Handling To Intent-Type Layer

	Suggested title:

	`Add dedicated support handling to reduce personal_reflection fallback on account-help prompts`

	Suggested body:

	```md
	## Goal

	Reduce the current failure mode where support-like prompts such as login and billing issues collapse into `personal_reflection` or low-confidence fallback behavior.

	## Scope

	- review support-like prompts in the current benchmark
	- decide whether to add a dedicated `support` intent-type head or a rule-based override layer
	- add a fixed support-oriented evaluation set
	- document the chosen approach in `known_limitations.md`

	## Why

	The `decision_phase` head can already separate `support` reasonably well, but the `intent_type` layer still underperforms on these cases.

	## Done When

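	- on the fixed support evaluation set, support prompts are no longer commonly labeled as `personal_reflection`
	- when the combined envelope falls back on a support query, it fails safe and the response makes clear that the query was treated as support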
	```
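Of the two options in this scope, the rule-based override layer is the easier one to prototype. The following is a minimal sketch under stated assumptions: the keyword patterns, the `threshold` default, and the function name `apply_support_override` are all illustrative and not taken from the repo. Only the labels `support` and `personal_reflection` come from the draft.

```python
import re

# Assumption: illustrative patterns for login and billing problems only.
SUPPORT_PATTERNS = [
    re.compile(pattern, re.IGNORECASE)
    for pattern in (
        r"\b(log ?in|sign ?in|password|locked out)\b",
        r"\b(billing|invoice|refund|charged)\b",
    )
]


def apply_support_override(
    query: str, intent_type: str, confidence: float, threshold: float = 0.5
) -> dict:
    """Relabel a prediction as `support` when the query looks like account help
    and the model output is either `personal_reflection` or low confidence."""
    looks_like_support = any(p.search(query) for p in SUPPORT_PATTERNS)
    weak_prediction = intent_type == "personal_reflection" or confidence < threshold
    if looks_like_support and weak_prediction:
        return {"intent_type": "support", "overridden": True, "original": intent_type}
    return {"intent_type": intent_type, "overridden": False, "original": intent_type}
```

Keeping the original label in the result makes overrides auditable, which helps when comparing this layer against a dedicated `support` head on the fixed evaluation set.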

	## 3. Add Evaluation Harness And Canonical Benchmark Runner

	Suggested title:

	`Add canonical benchmark runner for demo prompts and regression checks`

	Suggested body:

	```md
	## Goal

	Turn the current prompt suite and canonical examples into a repeatable regression harness.

	## Scope

	- add a script that runs the fixed demo prompts through `combined_inference.py`
	- save outputs to a machine-readable artifact
	- compare current outputs against expected behavior notes
	- flag meaningful regressions in fallback behavior and phase classification

	## Why

	The repo now has frozen `v0.1` baselines. A benchmark runner is the clean way to protect demo quality without returning to ad hoc tuning.

	## Done When

	- one command runs the prompt suite end to end
	- current outputs are easy to inspect and compare over time
	- demo regressions become visible before external sharing
	```
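The runner described in this scope could look roughly like the sketch below. It makes several assumptions: the interface of `combined_inference.py` is not shown in this document, so the model is passed in as a plain `classify` callable. The flat result keys `decision_phase` and `fallback_applied`, the `expected` mapping, and the name `run_suite` are hypothetical stand-ins for whatever the real envelope and behavior notes provide.

```python
import json
from pathlib import Path
from typing import Callable

# Assumption: the two behaviors worth guarding, exposed as flat keys.
WATCHED_FIELDS = ("decision_phase", "fallback_applied")


def run_suite(
    prompts: list[str],
    classify: Callable[[str], dict],
    expected: dict[str, dict],
    artifact_path: str,
) -> list[dict]:
    """Run every prompt, save all outputs as JSON, and return the regressions."""
    results, regressions = [], []
    for prompt in prompts:
        observed = classify(prompt)
        results.append({"prompt": prompt, "observed": observed})
        baseline = expected.get(prompt, {})
        for field in WATCHED_FIELDS:
            # Only compare fields that the baseline actually pins down.
            if field in baseline and observed.get(field) != baseline[field]:
                regressions.append(
                    {
                        "prompt": prompt,
                        "field": field,
                        "expected": baseline[field],
                        "observed": observed.get(field),
                    }
                )
    report = {"results": results, "regressions": regressions}
    Path(artifact_path).write_text(json.dumps(report, indent=2))
    return regressions
```

A thin command-line wrapper that exits non-zero when the returned list is non-empty would satisfy the "one command" criterion and make regressions visible in CI.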