# Experiment Log

Record each experiment with: hypothesis, method, result, decision.

---
## E0: Data Availability
- **Hypothesis:** Public datasets (SAMSum, Enron, HF tickets) can be assembled into 20-40 reproducible case bundles with metadata and weak labels.
- **Status:** Not started
## E1: Structuring Feasibility
- **Hypothesis:** Root-cause L1/L2 classification with evidence citations yields a stable schema pass rate (>= 98%).
- **Status:** Not started
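
One way the E1 pass rate could be measured is sketched below. The field names (`root_cause_l1`, `root_cause_l2`, `evidence`, `source_id`, `quote`) are hypothetical placeholders, not the project's actual schema, and "stable" is read here as "every run meets the threshold", which is an assumption.

```python
from typing import Any

# Hypothetical field names; the real schema may differ.
REQUIRED_LABEL_FIELDS = ("root_cause_l1", "root_cause_l2")


def passes_schema(record: dict[str, Any]) -> bool:
    """Return True if both labels are non-empty strings and there is
    at least one well-formed evidence citation."""
    for field in REQUIRED_LABEL_FIELDS:
        value = record.get(field)
        if not isinstance(value, str) or not value.strip():
            return False
    evidence = record.get("evidence")
    if not isinstance(evidence, list) or not evidence:
        return False
    return all(
        isinstance(item, dict)
        and isinstance(item.get("source_id"), str)
        and isinstance(item.get("quote"), str)
        and item["quote"].strip() != ""
        for item in evidence
    )


def schema_pass_rate(records: list[dict[str, Any]]) -> float:
    """Fraction of records that pass the schema check."""
    if not records:
        raise ValueError("no records to evaluate")
    return sum(passes_schema(r) for r in records) / len(records)


def is_stable(run_rates: list[float], threshold: float = 0.98) -> bool:
    """Assumed reading of 'stable': every run meets the threshold."""
    return bool(run_rates) and min(run_rates) >= threshold
```

Running the same bundles through several independent runs and passing the per-run rates to `is_stable` separates a one-off good result from a repeatable one.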
## E2: Risk Gate
- **Hypothesis:** `review_required` rules capture low-confidence / high-risk samples with precision >= 0.8 and recall >= 0.9.
- **Status:** Not started
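
The E2 thresholds can be checked with a small helper like the sketch below. It assumes each sample has a gate decision (`flagged`) and a human reference label (`needs_review`); both names and the function names are illustrative, not part of the project.

```python
def gate_metrics(flagged: list[bool], needs_review: list[bool]) -> tuple[float, float]:
    """Return (precision, recall) of the review gate against reference labels."""
    if len(flagged) != len(needs_review):
        raise ValueError("flagged and needs_review must have the same length")
    pairs = list(zip(flagged, needs_review))
    tp = sum(1 for f, n in pairs if f and n)      # flagged and truly risky
    fp = sum(1 for f, n in pairs if f and not n)  # flagged but fine
    fn = sum(1 for f, n in pairs if not f and n)  # missed risky sample
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def gate_passes(precision: float, recall: float,
                min_precision: float = 0.8, min_recall: float = 0.9) -> bool:
    """Check the E2 hypothesis thresholds."""
    return precision >= min_precision and recall >= min_recall
```

Recall has the stricter target because a missed high-risk sample skips human review entirely, whereas a false positive only costs reviewer time.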
## E3: Business Insight
- **Hypothesis:** Correlating churn with VIP status x root cause produces actionable, explainable conclusions.
- **Status:** Not started
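
A first pass at the E3 analysis could be a simple cross-tabulation of churn rate per (VIP status, root cause) segment, as sketched below. The input shape, a tuple of `(is_vip, root_cause, churned)` per case, is an assumption for illustration.

```python
from collections import defaultdict

Case = tuple[bool, str, bool]  # (is_vip, root_cause, churned); hypothetical shape


def churn_rate_by_segment(cases: list[Case]) -> dict[tuple[bool, str], tuple[int, float]]:
    """Map each (is_vip, root_cause) segment to (case_count, churn_rate)."""
    counts: dict[tuple[bool, str], list[int]] = defaultdict(lambda: [0, 0])
    for is_vip, root_cause, churned in cases:
        cell = counts[(is_vip, root_cause)]
        cell[0] += 1             # total cases in the segment
        cell[1] += int(churned)  # churned cases in the segment
    return {
        segment: (total, churned / total)
        for segment, (total, churned) in counts.items()
    }
```

The case count is returned next to each rate because, with only 20-40 case bundles, many segments will hold one or two cases and their rates should not be over-interpreted.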
## E4: Iteration Loop
- **Hypothesis:** Human review feedback improves specific failure modes (e.g., root cause confusion).
- **Status:** Not started