Source: Hugging Face Space `ax2183/forward-deployed-ai-sim`, commit `c4fe0a4` by bobaoxu2001 ("Deploy forward-deployed AI simulation dashboard").

# Experiment Log

Record each experiment with: hypothesis, method, result, decision.

---

## E0 — Data Availability
- Hypothesis: Public datasets (SAMSum, Enron, HF tickets) can be assembled into 20–40 reproducible case bundles with metadata and weak labels.
- Status: Not started
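A minimal sketch of what a reproducible case bundle and the E0 acceptance check could look like. It assumes each bundle is a record holding source text, a metadata dict, and a weak-label dict, and that reproducibility means seeded sampling plus a content hash. `CaseBundle`, `sample_bundles`, and `e0_passes` are hypothetical names, not part of the project.

```python
import hashlib
import random
from dataclasses import dataclass, field


@dataclass
class CaseBundle:
    # Hypothetical bundle layout; field names are illustrative, not the project's schema.
    case_id: str
    source: str  # e.g. "SAMSum", "Enron", "HF tickets"
    text: str
    metadata: dict = field(default_factory=dict)
    weak_labels: dict = field(default_factory=dict)

    def fingerprint(self) -> str:
        # Content hash, so a rebuilt bundle can be checked against the original.
        payload = f"{self.source}|{self.case_id}|{self.text}".encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


def sample_bundles(pool: list, n: int, seed: int = 0) -> list:
    # A fixed seed makes the selection repeatable across runs.
    return random.Random(seed).sample(pool, n)


def e0_passes(bundles: list, lo: int = 20, hi: int = 40) -> bool:
    # Assumed E0 criterion: 20-40 bundles, each carrying metadata and weak labels.
    return lo <= len(bundles) <= hi and all(
        b.metadata and b.weak_labels for b in bundles
    )
```

Rebuilding the bundle set with the same seed should yield identical fingerprints, which is one cheap way to confirm that "reproducible" holds.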

## E1 — Structuring Feasibility
- Hypothesis: Outputs combining root-cause L1/L2 classification with evidence citations pass schema validation at a stable rate (>= 98%).
- Status: Not started
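One way the E1 pass rate could be computed, sketched under stated assumptions: the L1/L2 taxonomy below is invented for illustration, an output "passes" only if its L1/L2 pair is valid and every cited evidence quote appears verbatim in the case text, and "stable" is read as every repeated run clearing 98%. `TAXONOMY`, `validate`, `pass_rate`, and `is_stable` are hypothetical.

```python
# Hypothetical taxonomy; the real L1/L2 labels are not specified in this log.
TAXONOMY = {
    "billing": {"overcharge", "refund_delay"},
    "product": {"bug", "missing_feature"},
}


def validate(output: dict, source_text: str) -> bool:
    l1 = output.get("root_cause_l1")
    l2 = output.get("root_cause_l2")
    # L2 must be a child of the chosen L1.
    if l1 not in TAXONOMY or l2 not in TAXONOMY[l1]:
        return False
    evidence = output.get("evidence")
    if not isinstance(evidence, list) or not evidence:
        return False
    # Every cited quote must appear verbatim in the case text.
    return all(isinstance(q, str) and q and q in source_text for q in evidence)


def pass_rate(outputs: list, texts: list) -> float:
    results = [validate(o, t) for o, t in zip(outputs, texts)]
    return sum(results) / len(results) if results else 0.0


def is_stable(run_rates: list, threshold: float = 0.98) -> bool:
    # Assumed reading of "stable": every repeated run clears the threshold.
    return bool(run_rates) and min(run_rates) >= threshold
```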

## E2 — Risk Gate
- Hypothesis: `review_required` rules capture low-confidence or high-risk samples with precision >= 0.8 and recall >= 0.9.
- Status: Not started
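The E2 thresholds could be checked by comparing rule flags with human "needs review" labels. This is a sketch only: the confidence cutoff of 0.7 and the high-risk tag names are assumptions, and `review_required` here is an illustrative stand-in, not the project's actual rule set.

```python
def review_required(sample: dict, min_conf: float = 0.7,
                    high_risk: frozenset = frozenset({"legal", "churn_threat"})) -> bool:
    # Hypothetical rule: flag low-confidence or high-risk samples.
    return sample["confidence"] < min_conf or bool(set(sample["tags"]) & high_risk)


def precision_recall(flags: list, truth: list) -> tuple:
    # Standard precision/recall of the flags against human review labels.
    tp = sum(f and t for f, t in zip(flags, truth))
    fp = sum(f and not t for f, t in zip(flags, truth))
    fn = sum(t and not f for f, t in zip(flags, truth))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def e2_passes(precision: float, recall: float) -> bool:
    # Targets from the E2 hypothesis.
    return precision >= 0.8 and recall >= 0.9
```

Recall is set higher than precision, which fits a gate whose job is to avoid letting risky cases through; extra false positives only cost reviewer time.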

## E3 — Business Insight
- Hypothesis: Churn correlation across VIP × root-cause segments produces actionable, explainable conclusions.
- Status: Not started
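A minimal sketch of an E3 segment table, assuming each case carries hypothetical `vip`, `root_cause`, and `churned` fields. Each VIP × root-cause cell reports its case count, churn rate, and lift over the overall churn rate, and cells below a minimum count are dropped so that every reported conclusion can be traced back to enough cases.

```python
from collections import defaultdict


def churn_by_segment(cases: list) -> dict:
    # (vip, root_cause) -> (churn_rate, case_count)
    counts = defaultdict(lambda: [0, 0])  # [churned, total]
    for c in cases:
        cell = counts[(c["vip"], c["root_cause"])]
        cell[0] += int(c["churned"])
        cell[1] += 1
    return {k: (churned / total, total) for k, (churned, total) in counts.items()}


def segment_report(cases: list, min_count: int = 5) -> list:
    overall = sum(int(c["churned"]) for c in cases) / len(cases) if cases else 0.0
    rows = []
    for (vip, cause), (rate, n) in churn_by_segment(cases).items():
        if n < min_count:
            continue  # too few cases to support an explainable conclusion
        lift = rate / overall if overall else 0.0
        rows.append({"vip": vip, "root_cause": cause, "n": n,
                     "churn_rate": rate, "lift": lift})
    # Highest-lift segments first: these are the candidates for action.
    return sorted(rows, key=lambda r: r["lift"], reverse=True)
```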

## E4 — Iteration Loop
- Hypothesis: Human review feedback reduces specific failure modes (e.g., root-cause confusion).
- Status: Not started
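E4 could be evaluated by comparing per-failure-mode error rates before and after a feedback round. The sketch below assumes failures are tagged with mode labels such as the hypothetical `"root_cause_confusion"`, and reads "improves a specific failure mode" as: the targeted mode's rate drops while no other mode rises by more than a small tolerance (1 percentage point here, an arbitrary choice).

```python
from collections import Counter


def failure_rates(errors: list, n_cases: int) -> dict:
    # errors: one hypothetical failure-mode label per failed case.
    if n_cases <= 0:
        raise ValueError("n_cases must be positive")
    return {mode: k / n_cases for mode, k in Counter(errors).items()}


def e4_supported(before: dict, after: dict, target: str,
                 tolerance: float = 0.01) -> bool:
    # The targeted mode must strictly improve.
    if after.get(target, 0.0) >= before.get(target, 0.0):
        return False
    # Other modes must not regress beyond the tolerance.
    modes = set(before) | set(after)
    return all(after.get(m, 0.0) - before.get(m, 0.0) <= tolerance
               for m in modes if m != target)
```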
