Known Limitations
Model Limits
- intent_type v0.1 now uses a manually tuned fallback threshold, but some ambiguous prompts still clear as non-safe intent labels.
- intent_subtype v0.1 is useful for commercial patterns, but still weak on sparse support and price-seeking cases.
- decision_phase v0.1 is behaviorally useful but still weak in aggregate metrics.
- iab_content now uses local embedding retrieval over taxonomy nodes, but exact-path quality still depends on how well the embedding model separates close sibling categories.
- the composed output is still gated by the weaker calibrated components, so action-style queries can over-fallback.
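As a rough illustration of why gating on the weakest component causes over-fallback, here is a minimal sketch. The threshold values, the `ComponentPrediction` type, the `compose` function, and the example labels are all assumptions made for illustration, not the project's actual code.

```python
from dataclasses import dataclass

# Hypothetical per-component thresholds; the real values are not stated here.
THRESHOLDS = {"intent_type": 0.60, "intent_subtype": 0.50, "decision_phase": 0.50}


@dataclass
class ComponentPrediction:
    label: str
    confidence: float  # assumed to be a calibrated probability in [0, 1]


def compose(predictions: dict[str, ComponentPrediction]) -> dict:
    """Return all component labels, or a fallback if any component is under threshold."""
    weak = [name for name, pred in predictions.items()
            if pred.confidence < THRESHOLDS[name]]
    if weak:
        # A single weak component blocks the whole composed output.
        return {"fallback": True, "blocked_by": sorted(weak)}
    return {"fallback": False,
            "labels": {name: pred.label for name, pred in predictions.items()}}


# An action-style query: two confident components, one weak one (made-up numbers).
result = compose({
    "intent_type": ComponentPrediction("commercial", 0.91),
    "intent_subtype": ComponentPrediction("signup", 0.84),
    "decision_phase": ComponentPrediction("action", 0.42),
})
print(result)  # {'fallback': True, 'blocked_by': ['decision_phase']}
```

In this sketch a strong commercial signal is still suppressed because one component falls under its threshold, which is the over-fallback pattern described above.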
Boundary Confusion
- support and post_purchase are cleaner than before, but still fragile on phrasing that mixes setup with a broken flow.
- consideration and decision can still collapse when a buying query is phrased like a general question.
- awareness and research remain close for “help me understand” style prompts.
- deal_seeking and education still blur on price questions phrased as general explanations.
- some broad software topics still need nearest-equivalent IAB mappings because the taxonomy does not have exact SaaS product nodes.
- CRM-style prompts are still a weak spot because retrieval can drift toward adjacent business software categories without a reranker.
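The retrieval drift can be pictured with a toy nearest-neighbor lookup. The sketch below uses made-up 3-dimensional vectors and made-up taxonomy paths (not real IAB 3.0 nodes or real embeddings), and the `LOW_MARGIN` cutoff is a hypothetical value. It shows how a small score margin between sibling nodes could be used to flag queries that would benefit from a reranker.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


# Toy vectors standing in for node embeddings; the paths are illustrative only.
NODES = {
    "Example > Business Software > Sales Tools": [0.90, 0.40, 0.10],
    "Example > Business Software > Marketing Tools": [0.85, 0.50, 0.10],
    "Example > Sports > Soccer": [0.00, 0.10, 0.99],
}

LOW_MARGIN = 0.01  # hypothetical cutoff for "too close to call"


def top_two_with_margin(query_vec):
    """Return the best path, the runner-up path, and their score gap."""
    ranked = sorted(((cosine(query_vec, vec), path) for path, vec in NODES.items()),
                    reverse=True)
    (s1, p1), (s2, p2) = ranked[0], ranked[1]
    return p1, p2, s1 - s2


best, runner_up, margin = top_two_with_margin([0.88, 0.45, 0.10])
needs_rerank = margin < LOW_MARGIN
print(best, "|", runner_up, "|", needs_rerank)
```

With these toy vectors the two sibling software nodes score almost identically, so the top-1 choice is fragile even though the unrelated node is ranked far below.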
Intent-Type Gaps
- the expanded intent_type taxonomy now includes support, exploratory, creative_generation, chit_chat, and prohibited, but these new classes are still data-light until the next full retraining pass.
- short contextual follow-ups are still sensitive to wording and may be over-fallbacked.
- price-seeking prompts can still flip between commercial, informational, and fallback depending on phrasing.
System-Layer Caveats
- the current policy and opportunity logic is still rule-based even though it now uses subtype.
- commercial_score is heuristic, not learned.
- the current intent_type threshold is manually sweep-tuned rather than learned jointly with the system layer.
- fallback is still intentionally conservative, which is useful for demos but suppresses some strong action signals.
- subtype is now part of the system decision, but the system still trusts decision_phase more than subtype on support safety.
- IAB routing uses the local 3.0 taxonomy TSV plus a local embedding index; exact path quality still depends on the retrieval model and the candidate text stored for each taxonomy node.
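To make the trade-off behind a sweep-tuned threshold concrete, here is a minimal sketch of a manual sweep. The `sweep` helper and the validation pairs are invented for illustration; the project's actual sweep procedure and data are not described here.

```python
def sweep(examples, thresholds):
    """For each threshold, report coverage and accuracy of accepted predictions.

    `examples` is a list of (confidence, is_correct) pairs.
    """
    rows = []
    for t in thresholds:
        accepted = [ok for conf, ok in examples if conf >= t]
        coverage = len(accepted) / len(examples)
        # Accuracy is undefined when nothing clears the threshold.
        accuracy = sum(accepted) / len(accepted) if accepted else None
        rows.append((t, coverage, accuracy))
    return rows


# Made-up validation pairs, not real model output.
VALIDATION = [(0.95, True), (0.80, True), (0.70, False), (0.65, True), (0.40, False)]

rows = sweep(VALIDATION, [0.5, 0.75])
for t, coverage, accuracy in rows:
    print(f"threshold={t} coverage={coverage} accuracy={accuracy}")
# threshold=0.5 coverage=0.8 accuracy=0.75
# threshold=0.75 coverage=0.4 accuracy=1.0
```

Raising the threshold improves accuracy on what is accepted but halves coverage in this toy data, which mirrors how a conservative fallback can suppress correct, strong signals.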
Regression Tracking
- structured known-failure tracking now lives in known_failure_cases.json.
- acceptable weaknesses are now separated from must_fix behaviors and can be rerun via run_regression_suite.py.
- current acceptable weaknesses include signup over-fallback, price-seeking underclassification, and support subtype bleed into emotional_reflection.
- IAB behavior-lock checks now live in iab_behavior_lock_cases.json and can be rerun via run_iab_mapping_suite.py.
- curated IAB quality targets live in iab_mapping_cases.json, with broader target coverage in iab_cross_vertical_mapping_cases.json and run_iab_quality_suite.py.
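One way to picture the split between acceptable weaknesses and must_fix behaviors is a runner that reports both but only fails on the latter. The sketch below is an assumption-heavy illustration: the JSON fields (`id`, `severity`, `prompt`, `expected`), the `classify` stand-in, and `run_suite` are hypothetical and do not reflect the real schema of known_failure_cases.json or the logic of run_regression_suite.py.

```python
import json

# Hypothetical case format; the real file's schema may differ.
CASES_JSON = """
[
  {"id": "signup-over-fallback", "severity": "acceptable",
   "prompt": "sign me up", "expected": "commercial"},
  {"id": "broken-order", "severity": "must_fix",
   "prompt": "my order arrived broken", "expected": "support"}
]
"""


def classify(prompt: str) -> str:
    """Stand-in for the real classifier, used only to make the sketch runnable."""
    return "support" if "broken" in prompt else "fallback"


def run_suite(cases, predict):
    """Bucket each case as passed, an acceptable failure, or a must_fix failure."""
    report = {"passed": [], "acceptable_failures": [], "must_fix_failures": []}
    for case in cases:
        if predict(case["prompt"]) == case["expected"]:
            report["passed"].append(case["id"])
        elif case["severity"] == "must_fix":
            report["must_fix_failures"].append(case["id"])
        else:
            report["acceptable_failures"].append(case["id"])
    return report


report = run_suite(json.loads(CASES_JSON), classify)
# Only must_fix failures should fail the run; acceptable ones are just reported.
exit_code = 1 if report["must_fix_failures"] else 0
print(report, exit_code)
```

Here the signup case still over-falls back and is listed as an acceptable failure, while the run as a whole succeeds because no must_fix case regressed.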
Product Limits
- no multi-turn memory is used in the combined classifier.
- no real-traffic labeled IAB dataset is produced yet.
- no audit store, trace viewer, or UI is included beyond the local JSON demo endpoint.