File size: 4,169 Bytes
0584798 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 | # Known Limitations
## Model Limits
- `intent_type v0.1` now uses a manually tuned fallback threshold, but some ambiguous prompts still clear as non-safe intent labels.
- `intent_subtype v0.1` is useful for commercial patterns, but still weak on sparse support and price-seeking cases.
- `decision_phase v0.1` is behaviorally useful but still weak in aggregate metrics.
- `iab_content` now uses local embedding retrieval over taxonomy nodes, but exact-path quality still depends on how well the embedding model separates close sibling categories.
- the composed output is still gated by the weaker calibrated components, so action-style queries can over-fallback.
## Boundary Confusion
- `support` and `post_purchase` are cleaner than before, but still fragile on phrasing that mixes setup with a broken flow.
- `consideration` and `decision` can still collapse when a buying query is phrased like a general question.
- `awareness` and `research` remain close for “help me understand” style prompts.
- `deal_seeking` and `education` still blur on price questions phrased as general explanations.
- some broad software topics still need nearest-equivalent IAB mappings because the taxonomy does not have exact SaaS product nodes.
- CRM-style prompts are still a weak spot because retrieval can drift toward adjacent business software categories without a reranker.
## Intent-Type Gaps
- the expanded `intent_type` taxonomy now includes `support`, `exploratory`, `creative_generation`, `chit_chat`, and `prohibited`, but these new classes are still data-light until the next full retraining pass.
- short contextual follow-ups are still sensitive to wording and may be over-fallbacked.
- price-seeking prompts can still flip between `commercial`, `informational`, and fallback depending on phrasing.
## System-Layer Caveats
- the current policy and opportunity logic is still rule-based even though it now uses subtype.
- `commercial_score` is heuristic, not learned.
- the current `intent_type` threshold is manually sweep-tuned rather than learned jointly with the system layer.
- fallback is still intentionally conservative, which is useful for demos but suppresses some strong action signals.
- subtype is now part of the system decision, but the system still trusts `decision_phase` more than subtype on support safety.
- IAB routing uses the local 3.0 taxonomy TSV plus a local embedding index; exact path quality still depends on the retrieval model and the candidate text stored for each taxonomy node.
## Regression Tracking
- structured known-failure tracking now lives in [known_failure_cases.json](/Users/manikumargouni/Desktop/AdMesh/protocol/agentic-intent-classifier/examples/known_failure_cases.json).
- acceptable weaknesses are now separated from `must_fix` behaviors and can be rerun via [run_regression_suite.py](/Users/manikumargouni/Desktop/AdMesh/protocol/agentic-intent-classifier/evaluation/run_regression_suite.py).
- current acceptable weaknesses include signup over-fallback, price-seeking underclassification, and support subtype bleed into `emotional_reflection`.
- IAB behavior-lock checks now live in [iab_behavior_lock_cases.json](/Users/manikumargouni/Desktop/AdMesh/protocol/agentic-intent-classifier/examples/iab_behavior_lock_cases.json) and can be rerun via [run_iab_mapping_suite.py](/Users/manikumargouni/Desktop/AdMesh/protocol/agentic-intent-classifier/evaluation/run_iab_mapping_suite.py).
- curated IAB quality targets live in [iab_mapping_cases.json](/Users/manikumargouni/Desktop/AdMesh/protocol/agentic-intent-classifier/examples/iab_mapping_cases.json), with broader target coverage in [iab_cross_vertical_mapping_cases.json](/Users/manikumargouni/Desktop/AdMesh/protocol/agentic-intent-classifier/examples/iab_cross_vertical_mapping_cases.json) and [run_iab_quality_suite.py](/Users/manikumargouni/Desktop/AdMesh/protocol/agentic-intent-classifier/evaluation/run_iab_quality_suite.py).
## Product Limits
- no multi-turn memory is used in the combined classifier.
- no real-traffic labeled IAB dataset is produced yet.
- no audit store, trace viewer, or UI is included beyond the local JSON demo endpoint.
|