
Experiment Log

Record each experiment with: hypothesis, method, result, decision.
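
A minimal sketch of one way to keep entries uniform, assuming each record is held as a small Python dataclass. The class name `ExperimentRecord`, the `experiment_id` and `status` fields, and the `is_complete` helper are hypothetical; only the four items (hypothesis, method, result, decision) come from this log.

```python
from dataclasses import dataclass


@dataclass
class ExperimentRecord:
    """One log entry. Field names are illustrative, not a fixed format."""
    experiment_id: str           # e.g. "E2" (hypothetical field)
    hypothesis: str
    method: str = ""             # left empty until the experiment is run
    result: str = ""
    decision: str = ""
    status: str = "Not started"  # mirrors the Status line used in this log

    def is_complete(self) -> bool:
        # An entry is complete once all four required items are filled in.
        return all([self.hypothesis, self.method, self.result, self.decision])


record = ExperimentRecord(
    experiment_id="E2",
    hypothesis=(
        "review_required rules capture low-confidence / high-risk samples "
        "with precision >= 0.8 and recall >= 0.9."
    ),
)
print(record.is_complete())
```

An entry like this stays incomplete until method, result, and decision are all recorded, which makes half-finished experiments easy to spot.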


E0 — Data Availability

  • Hypothesis: Public datasets (SAMSum, Enron, HF tickets) can be assembled into 20-40 reproducible case bundles with metadata and weak labels.
  • Status: Not started

E1 — Structuring Feasibility

  • Hypothesis: Outputs that combine root-cause L1/L2 classification with evidence citations pass schema validation at a stable rate (>= 98%).
  • Status: Not started
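
The pass rate in the E1 hypothesis could be measured along these lines. This is a sketch under assumptions: the actual schema is not defined in this log, so the field names `root_cause_l1`, `root_cause_l2`, `evidence`, and `quote` are hypothetical stand-ins, and the sample outputs are made up for illustration.

```python
E1_TARGET = 0.98  # threshold taken from the E1 hypothesis


def passes_schema(output: dict) -> bool:
    """Check one structured output against an assumed, hypothetical schema."""
    # Both root-cause levels must be non-empty strings.
    for key in ("root_cause_l1", "root_cause_l2"):
        value = output.get(key)
        if not isinstance(value, str) or not value:
            return False
    # At least one evidence citation, each carrying a non-empty quote.
    evidence = output.get("evidence")
    if not isinstance(evidence, list) or not evidence:
        return False
    return all(
        isinstance(item, dict)
        and isinstance(item.get("quote"), str)
        and bool(item["quote"])
        for item in evidence
    )


def schema_pass_rate(outputs: list) -> float:
    """Fraction of outputs that pass the schema check."""
    if not outputs:
        raise ValueError("need at least one output to compute a pass rate")
    return sum(passes_schema(o) for o in outputs) / len(outputs)


# Made-up outputs for illustration only; not experiment results.
sample_outputs = [
    {"root_cause_l1": "billing", "root_cause_l2": "double charge",
     "evidence": [{"quote": "I was charged twice"}]},
    {"root_cause_l1": "billing", "root_cause_l2": "refund delay",
     "evidence": []},                      # fails: no citation
    {"root_cause_l1": "outage", "root_cause_l2": "login",
     "evidence": [{"quote": "cannot sign in"}]},
    {"root_cause_l1": "outage",            # fails: missing L2
     "evidence": [{"quote": "site is down"}]},
]

rate = schema_pass_rate(sample_outputs)
print(rate, rate >= E1_TARGET)
```

Whether "stable" means the rate holds across runs, prompts, or datasets is left open here; repeating the same computation per run would be one way to check.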

E2 — Risk Gate

  • Hypothesis: review_required rules capture low-confidence / high-risk samples with precision >= 0.8 and recall >= 0.9.
  • Status: Not started
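
The E2 targets can be checked with a standard precision/recall computation. This is a sketch assuming each sample has a boolean gate decision (`flagged`) and a boolean reference label (`should_review`); both names and the toy labels are hypothetical, and only the 0.8 / 0.9 thresholds come from the hypothesis.

```python
def precision_recall(flagged: list, should_review: list) -> tuple:
    """Precision and recall of the review gate against reference labels."""
    if len(flagged) != len(should_review):
        raise ValueError("flagged and should_review must have the same length")
    pairs = list(zip(flagged, should_review))
    tp = sum(1 for f, s in pairs if f and s)        # flagged, needed review
    fp = sum(1 for f, s in pairs if f and not s)    # flagged, did not need it
    fn = sum(1 for f, s in pairs if not f and s)    # missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def meets_e2_targets(precision: float, recall: float,
                     min_precision: float = 0.8,
                     min_recall: float = 0.9) -> bool:
    """Thresholds default to the values stated in the E2 hypothesis."""
    return precision >= min_precision and recall >= min_recall


# Toy labels for illustration only; not experiment results.
flagged = [True, True, True, False, False]
should_review = [True, True, False, True, False]

p, r = precision_recall(flagged, should_review)
print(p, r, meets_e2_targets(p, r))
```

Recall carries the stricter threshold in the hypothesis, which fits a risk gate: a missed high-risk sample is costlier than an unnecessary review.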

E3 — Business Insight

  • Hypothesis: The correlation between churn and VIP status x root cause yields actionable, explainable conclusions.
  • Status: Not started
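
As a first look at E3, churn rate can be tabulated per (VIP, root cause) segment. This is a simple cross-tabulation of rates, not a correlation coefficient, and it is only a sketch: the field names `is_vip`, `root_cause`, and `churned` are hypothetical, and the cases are made up for illustration.

```python
from collections import defaultdict


def churn_rate_by_segment(cases: list) -> dict:
    """Churn rate for each (is_vip, root_cause) segment."""
    counts = defaultdict(lambda: [0, 0])  # segment -> [churned, total]
    for case in cases:
        key = (case["is_vip"], case["root_cause"])
        counts[key][0] += int(case["churned"])
        counts[key][1] += 1
    return {key: churned / total for key, (churned, total) in counts.items()}


# Made-up cases for illustration only; not experiment data.
cases = [
    {"is_vip": True, "root_cause": "billing", "churned": True},
    {"is_vip": True, "root_cause": "billing", "churned": False},
    {"is_vip": False, "root_cause": "billing", "churned": False},
    {"is_vip": True, "root_cause": "outage", "churned": True},
]

rates = churn_rate_by_segment(cases)
for (is_vip, root_cause), rate in sorted(rates.items()):
    print(f"vip={is_vip} root_cause={root_cause} churn_rate={rate:.2f}")
```

With only 20-40 case bundles, per-segment counts will be small, so any rate read off such a table would need the segment size reported alongside it.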

E4 — Iteration Loop

  • Hypothesis: Human review feedback reduces specific failure modes (e.g., root-cause confusion).
  • Status: Not started