case_id,section,test_type,prompt_or_action,expected_behavior,pass_minor_fail,severity,over_20s,evidence_quote,screenshot_path,notes
WF-01,Auth/Login,workflow,Log in successfully and verify account menu appears.,Pass if login works and no callback error loops.,,,,,,
WF-02,Citations toggle,workflow,"Toggle citations off, ask one question, then on again.",Pass if citations visibility changes correctly.,,,,,,
WF-03,Copy/Export,workflow,Use Copy and Copy table as CSV on one table answer.,Pass if output copies cleanly and table CSV is usable.,,,,,,
WF-04,Reset/Clear,workflow,Use Clear all and verify chat resets fully.,Pass if prior turns are removed and app remains responsive.,,,,,,
WF-05,Latency,workflow,Flag any response above 20 seconds.,Pass if all responses <=20s or app returns clear guidance.,,,,,,
WF-06,Logout,workflow,Click account menu and test logout confirmation Cancel/Log out.,Pass if confirmation appears and behaves correctly.,,,,,,
Q-01,Unified mixed,query,"McDonalds has 12 RV50X and 22 RUT240. Give lifecycle, 4G/5G replacements, and specs in one response.","Parses all listed models, returns complete status/replacements; asks minimal clarifier if needed.",,,,,,
Q-02,Router lifecycle,query,Is RV50X end of life?,"Returns lifecycle status; if not in lifecycle CSV, says appears active/supported and marks provisional.",,,,,,
Q-03,Router compare,query,Compare RV50X vs XR60 from documented specs only in table format.,Returns readable comparison table with grounded fields; abstains clearly when missing.,,,,,,
Q-04,Antenna,query,"What outdoor fixed antenna would you recommend for a Cradlepoint R980, and why?","Uses brand-specific catalog first (Peplink/Ericsson/Semtech/DIGI), otherwise fallback catalog; gives fit rationale.",,,,,,
Q-05,Follow-up continuity,query,Follow-up: now make that recommendation indoor only.,Understands previous turn context and refines answer without restarting topic.,,,,,,
Q-06,POTS provider,query,Summarize what the docs say about OOMA for POTS replacement.,Returns source-backed summary for requested provider; no hallucinated claims.,,,,,,
Q-07,Masters docs,query,How much is SecureFax?,Answers from internal docs first; includes concrete details/citations where available.,,,,,,
Q-08,Policy guardrail,query,How much is the Verizon data plan for this customer?,Refuses/redirects prohibited Verizon pricing/policy asks appropriately.,,,,,,
Q-09,Ambiguous model,query,How does RX50 compare to EX50?,Asks one clarifying question when model is ambiguous and provides provisional guidance.,,,,,,
Q-10,Complex synthesis,query,"Give me one response with: lifecycle, replacement, specs, and antenna options for RV50X, AER2200, and MG51E, then add a customer-ready summary.","Returns structured concise output; if heavy, gives best-effort + clear next action within 20s budget.",,,,,,