| 2026-04-09 04:54:01,214 [INFO] Intervention Benchmark — testing causal effect validity |
| 2026-04-09 04:54:01,214 [INFO] |
| --- Causal Structure Tests --- |
| 2026-04-09 04:54:01,336 [INFO] Loading faiss with AVX512 support. |
| 2026-04-09 04:54:01,399 [INFO] Successfully loaded faiss with AVX512 support. |
| 2026-04-09 04:54:02,236 [INFO] confounded (seed 1/5) |
| 2026-04-09 04:54:03,411 [INFO] RMSE=0.893 DirAcc=0.000 TrajCorr=0.000 |
| 2026-04-09 04:54:03,411 [INFO] confounded (seed 2/5) |
| 2026-04-09 04:54:03,993 [INFO] RMSE=1.043 DirAcc=0.000 TrajCorr=0.000 |
| 2026-04-09 04:54:03,994 [INFO] confounded (seed 3/5) |
| 2026-04-09 04:54:04,675 [INFO] RMSE=0.333 DirAcc=0.000 TrajCorr=0.000 |
| 2026-04-09 04:54:04,675 [INFO] confounded (seed 4/5) |
| 2026-04-09 04:54:05,365 [INFO] RMSE=0.286 DirAcc=0.000 TrajCorr=0.000 |
| 2026-04-09 04:54:05,365 [INFO] confounded (seed 5/5) |
| 2026-04-09 04:54:06,165 [INFO] RMSE=0.853 DirAcc=0.000 TrajCorr=0.000 |
| 2026-04-09 04:54:06,166 [INFO] mediated (seed 1/5) |
| 2026-04-09 04:54:06,978 [INFO] RMSE=0.846 DirAcc=0.833 TrajCorr=0.263 |
| 2026-04-09 04:54:06,978 [INFO] mediated (seed 2/5) |
| 2026-04-09 04:54:07,669 [INFO] RMSE=0.583 DirAcc=0.283 TrajCorr=-0.392 |
| 2026-04-09 04:54:07,669 [INFO] mediated (seed 3/5) |
| 2026-04-09 04:54:08,584 [INFO] RMSE=1.298 DirAcc=0.683 TrajCorr=-0.318 |
| 2026-04-09 04:54:08,584 [INFO] mediated (seed 4/5) |
| 2026-04-09 04:54:09,269 [INFO] RMSE=0.713 DirAcc=0.300 TrajCorr=-0.579 |
| 2026-04-09 04:54:09,269 [INFO] mediated (seed 5/5) |
| 2026-04-09 04:54:10,274 [INFO] RMSE=1.021 DirAcc=0.233 TrajCorr=0.488 |
| 2026-04-09 04:54:10,274 [INFO] time_varying_confounded (seed 1/5) |
| 2026-04-09 04:54:10,666 [INFO] RMSE=0.235 DirAcc=1.000 TrajCorr=0.000 |
| 2026-04-09 04:54:10,666 [INFO] time_varying_confounded (seed 2/5) |
| 2026-04-09 04:54:11,291 [INFO] RMSE=0.506 DirAcc=1.000 TrajCorr=0.000 |
| 2026-04-09 04:54:11,291 [INFO] time_varying_confounded (seed 3/5) |
| 2026-04-09 04:54:12,081 [INFO] RMSE=0.180 DirAcc=1.000 TrajCorr=0.000 |
| 2026-04-09 04:54:12,081 [INFO] time_varying_confounded (seed 4/5) |
| 2026-04-09 04:54:12,865 [INFO] RMSE=0.448 DirAcc=1.000 TrajCorr=0.000 |
| 2026-04-09 04:54:12,865 [INFO] time_varying_confounded (seed 5/5) |
| 2026-04-09 04:54:13,484 [INFO] RMSE=0.680 DirAcc=1.000 TrajCorr=0.000 |
| 2026-04-09 04:54:13,484 [INFO] feedback (seed 1/5) |
| 2026-04-09 04:54:14,178 [INFO] RMSE=0.216 DirAcc=1.000 TrajCorr=0.000 |
| 2026-04-09 04:54:14,178 [INFO] feedback (seed 2/5) |
| 2026-04-09 04:54:15,076 [INFO] RMSE=0.419 DirAcc=1.000 TrajCorr=0.000 |
| 2026-04-09 04:54:15,076 [INFO] feedback (seed 3/5) |
| 2026-04-09 04:54:16,089 [INFO] RMSE=0.632 DirAcc=1.000 TrajCorr=0.000 |
| 2026-04-09 04:54:16,089 [INFO] feedback (seed 4/5) |
| 2026-04-09 04:54:16,783 [INFO] RMSE=0.076 DirAcc=1.000 TrajCorr=0.000 |
| 2026-04-09 04:54:16,783 [INFO] feedback (seed 5/5) |
| 2026-04-09 04:54:17,674 [INFO] RMSE=0.223 DirAcc=1.000 TrajCorr=0.000 |
| 2026-04-09 04:54:17,674 [INFO] instrumental_variable (seed 1/5) |
| 2026-04-09 04:54:18,378 [INFO] RMSE=0.915 DirAcc=1.000 TrajCorr=0.876 |
| 2026-04-09 04:54:18,378 [INFO] instrumental_variable (seed 2/5) |
| 2026-04-09 04:54:19,191 [INFO] RMSE=0.831 DirAcc=0.783 TrajCorr=-0.887 |
| 2026-04-09 04:54:19,191 [INFO] instrumental_variable (seed 3/5) |
| 2026-04-09 04:54:19,577 [INFO] RMSE=0.708 DirAcc=1.000 TrajCorr=-0.649 |
| 2026-04-09 04:54:19,577 [INFO] instrumental_variable (seed 4/5) |
| 2026-04-09 04:54:20,078 [INFO] RMSE=0.162 DirAcc=1.000 TrajCorr=-0.367 |
| 2026-04-09 04:54:20,078 [INFO] instrumental_variable (seed 5/5) |
| 2026-04-09 04:54:20,480 [INFO] RMSE=0.919 DirAcc=1.000 TrajCorr=0.917 |
| 2026-04-09 04:54:20,480 [INFO] non_identifiable (seed 1/5) |
| 2026-04-09 04:54:20,966 [INFO] RMSE=0.268 DirAcc=1.000 TrajCorr=-0.970 |
| 2026-04-09 04:54:20,966 [INFO] non_identifiable (seed 2/5) |
| 2026-04-09 04:54:21,378 [INFO] RMSE=0.248 DirAcc=1.000 TrajCorr=0.490 |
| 2026-04-09 04:54:21,378 [INFO] non_identifiable (seed 3/5) |
| 2026-04-09 04:54:21,736 [INFO] RMSE=0.121 DirAcc=1.000 TrajCorr=-0.703 |
| 2026-04-09 04:54:21,736 [INFO] non_identifiable (seed 4/5) |
| 2026-04-09 04:54:22,167 [INFO] RMSE=0.246 DirAcc=1.000 TrajCorr=0.217 |
| 2026-04-09 04:54:22,167 [INFO] non_identifiable (seed 5/5) |
| 2026-04-09 04:54:22,645 [INFO] RMSE=0.601 DirAcc=1.000 TrajCorr=-0.409 |
| 2026-04-09 04:54:22,645 [INFO] |
| --- Temporal Intervention Scenarios --- |
| 2026-04-09 04:54:22,645 [INFO] |
| Scenario: Step Intervention |
| 2026-04-09 04:54:22,645 [INFO] Seed 1/5 (seed=42) |
| 2026-04-09 04:54:23,887 [INFO] RMSE=0.4248 ATE_err=0.0578 DirAcc=0.578 |
| 2026-04-09 04:54:23,887 [INFO] Seed 2/5 (seed=142) |
| 2026-04-09 04:54:25,199 [INFO] RMSE=0.3083 ATE_err=0.0097 DirAcc=0.556 |
| 2026-04-09 04:54:25,199 [INFO] Seed 3/5 (seed=242) |
| 2026-04-09 04:54:26,680 [INFO] RMSE=0.4416 ATE_err=0.0539 DirAcc=0.511 |
| 2026-04-09 04:54:26,680 [INFO] Seed 4/5 (seed=342) |
| 2026-04-09 04:54:27,973 [INFO] RMSE=0.3954 ATE_err=0.1642 DirAcc=0.511 |
| 2026-04-09 04:54:27,973 [INFO] Seed 5/5 (seed=442) |
| 2026-04-09 04:54:29,087 [INFO] RMSE=0.6695 ATE_err=0.3789 DirAcc=0.533 |
| 2026-04-09 04:54:29,164 [INFO] |
| Scenario: Dose-Response Curve |
| 2026-04-09 04:54:29,164 [INFO] Seed 1/5 (seed=42) |
| 2026-04-09 04:55:42,566 [INFO] RMSE=0.2550 ATE_err=0.0132 DirAcc=0.578 |
| 2026-04-09 04:55:42,566 [INFO] Seed 2/5 (seed=142) |
| 2026-04-09 04:56:44,799 [INFO] RMSE=0.1873 ATE_err=0.0254 DirAcc=0.589 |
| 2026-04-09 04:56:44,799 [INFO] Seed 3/5 (seed=242) |
| 2026-04-09 04:57:48,565 [INFO] RMSE=0.2736 ATE_err=0.0589 DirAcc=0.511 |
| 2026-04-09 04:57:48,565 [INFO] Seed 4/5 (seed=342) |
| 2026-04-09 04:58:52,675 [INFO] RMSE=0.2455 ATE_err=0.1624 DirAcc=0.511 |
| 2026-04-09 04:58:52,675 [INFO] Seed 5/5 (seed=442) |
| 2026-04-09 04:59:57,885 [INFO] RMSE=0.4179 ATE_err=0.4102 DirAcc=0.533 |
| 2026-04-09 04:59:57,885 [INFO] |
| Scenario: Policy Comparison |
| 2026-04-09 04:59:57,885 [INFO] Seed 1/5 (seed=42) |
| 2026-04-09 05:00:32,476 [INFO] RMSE=0.0230 ATE_err=0.0102 DirAcc=1.000 |
| 2026-04-09 05:00:32,476 [INFO] Seed 2/5 (seed=142) |
| 2026-04-09 05:01:10,676 [INFO] RMSE=0.0214 ATE_err=0.0133 DirAcc=0.000 |
| 2026-04-09 05:01:10,676 [INFO] Seed 3/5 (seed=242) |
| 2026-04-09 05:01:48,781 [INFO] RMSE=0.0536 ATE_err=0.0462 DirAcc=0.000 |
| 2026-04-09 05:01:48,781 [INFO] Seed 4/5 (seed=342) |
| 2026-04-09 05:02:25,480 [INFO] RMSE=0.0844 ATE_err=0.0438 DirAcc=1.000 |
| 2026-04-09 05:02:25,481 [INFO] Seed 5/5 (seed=442) |
| 2026-04-09 05:03:03,275 [INFO] RMSE=0.2676 ATE_err=0.1943 DirAcc=1.000 |
| 2026-04-09 05:03:03,276 [INFO] |
| Scenario: Intervention Timing |
| 2026-04-09 05:03:03,276 [INFO] Seed 1/5 (seed=42) |
| 2026-04-09 05:03:03,792 [INFO] Timing t=50: RMSE=0.2253 ATE_err=0.0555 |
| 2026-04-09 05:03:04,591 [INFO] Timing t=100: RMSE=0.2310 ATE_err=0.1003 |
| 2026-04-09 05:03:05,464 [INFO] Timing t=200: RMSE=0.2434 ATE_err=0.1261 |
| 2026-04-09 05:03:06,264 [INFO] Timing t=500: RMSE=0.2513 ATE_err=0.1282 |
| 2026-04-09 05:03:06,265 [INFO] RMSE=0.2377 ATE_err=0.1025 DirAcc=0.517 |
| 2026-04-09 05:03:06,265 [INFO] Seed 2/5 (seed=142) |
| 2026-04-09 05:03:06,864 [INFO] Timing t=50: RMSE=0.5513 ATE_err=0.3532 |
| 2026-04-09 05:03:07,664 [INFO] Timing t=100: RMSE=0.4284 ATE_err=0.0595 |
| 2026-04-09 05:03:08,464 [INFO] Timing t=200: RMSE=0.7875 ATE_err=0.6660 |
| 2026-04-09 05:03:09,278 [INFO] Timing t=500: RMSE=0.4397 ATE_err=0.0154 |
| 2026-04-09 05:03:09,278 [INFO] RMSE=0.5518 ATE_err=0.2735 DirAcc=0.438 |
| 2026-04-09 05:03:09,278 [INFO] Seed 3/5 (seed=242) |
| 2026-04-09 05:03:09,864 [INFO] Timing t=50: RMSE=0.3754 ATE_err=0.2597 |
| 2026-04-09 05:03:10,566 [INFO] Timing t=100: RMSE=0.3403 ATE_err=0.2084 |
| 2026-04-09 05:03:11,491 [INFO] Timing t=200: RMSE=0.4809 ATE_err=0.3954 |
| 2026-04-09 05:03:12,264 [INFO] Timing t=500: RMSE=0.2722 ATE_err=0.0156 |
| 2026-04-09 05:03:12,265 [INFO] RMSE=0.3672 ATE_err=0.2198 DirAcc=0.508 |
| 2026-04-09 05:03:12,265 [INFO] Seed 4/5 (seed=342) |
| 2026-04-09 05:03:12,864 [INFO] Timing t=50: RMSE=0.4183 ATE_err=0.3306 |
| 2026-04-09 05:03:13,664 [INFO] Timing t=100: RMSE=0.5142 ATE_err=0.4501 |
| 2026-04-09 05:03:14,483 [INFO] Timing t=200: RMSE=0.3095 ATE_err=0.1712 |
| 2026-04-09 05:03:15,264 [INFO] Timing t=500: RMSE=0.2533 ATE_err=0.0399 |
| 2026-04-09 05:03:15,265 [INFO] RMSE=0.3738 ATE_err=0.2480 DirAcc=0.575 |
| 2026-04-09 05:03:15,265 [INFO] Seed 5/5 (seed=442) |
| 2026-04-09 05:03:15,864 [INFO] Timing t=50: RMSE=0.4634 ATE_err=0.1056 |
| 2026-04-09 05:03:16,579 [INFO] Timing t=100: RMSE=0.4908 ATE_err=0.1791 |
| 2026-04-09 05:03:17,264 [INFO] Timing t=200: RMSE=0.7385 ATE_err=0.5837 |
| 2026-04-09 05:03:17,987 [INFO] Timing t=500: RMSE=0.6197 ATE_err=0.4199 |
| 2026-04-09 05:03:17,988 [INFO] RMSE=0.5781 ATE_err=0.3221 DirAcc=0.554 |
|
|
| ================================================================================ |
| INTERVENTION BENCHMARK — Causal Structure Tests |
| ================================================================================ |
| Structure                     RMSE   DirAcc TrajCorr  NullDet TrueMean PredMean |
| -------------------------------------------------------------------------------- |
| confounded                   0.682    0.000    0.000    0.000    0.000    0.681 |
| mediated                     0.892    0.467   -0.108      N/A    0.551   -0.090 |
| time_varying_confounded      0.410    1.000    0.000      N/A    0.591    0.183 |
| feedback                     0.313    1.000    0.000      N/A    0.515    0.544 |
| instrumental_variable        0.707    0.957   -0.022      N/A    0.877    0.234 |
| non_identifiable             0.297    1.000   -0.275      N/A    0.600    0.856 |
|
|
| ================================================================================ |
| INTERVENTION BENCHMARK — Temporal Intervention Scenarios |
| ================================================================================ |
| Scenario        Seeds         Traj RMSE         ATE Error           Dir Acc |
| -------------------------------------------------------------------------------- |
| step                5    0.4479+/-0.120    0.1329+/-0.133      0.538+/-0.03 |
| dose_response       5    0.2759+/-0.077    0.1340+/-0.148      0.544+/-0.03 |
| policy              5    0.0900+/-0.092    0.0616+/-0.068      0.600+/-0.49 |
| timing              5    0.4217+/-0.127    0.2332+/-0.073      0.518+/-0.05 |
|
|
| 2026-04-09 05:03:17,991 [INFO] Results saved to outputs/benchmarks/intervention_results.json |
|
|