Upload benchmarks/intervention.log with huggingface_hub
benchmarks/intervention.log
ADDED
@@ -0,0 +1,159 @@
+2026-04-09 04:54:01,214 [INFO] Intervention Benchmark — testing causal effect validity
+2026-04-09 04:54:01,214 [INFO]
+--- Causal Structure Tests ---
+2026-04-09 04:54:01,336 [INFO] Loading faiss with AVX512 support.
+2026-04-09 04:54:01,399 [INFO] Successfully loaded faiss with AVX512 support.
+2026-04-09 04:54:02,236 [INFO] confounded (seed 1/5)
+2026-04-09 04:54:03,411 [INFO] RMSE=0.893 DirAcc=0.000 TrajCorr=0.000
+2026-04-09 04:54:03,411 [INFO] confounded (seed 2/5)
+2026-04-09 04:54:03,993 [INFO] RMSE=1.043 DirAcc=0.000 TrajCorr=0.000
+2026-04-09 04:54:03,994 [INFO] confounded (seed 3/5)
+2026-04-09 04:54:04,675 [INFO] RMSE=0.333 DirAcc=0.000 TrajCorr=0.000
+2026-04-09 04:54:04,675 [INFO] confounded (seed 4/5)
+2026-04-09 04:54:05,365 [INFO] RMSE=0.286 DirAcc=0.000 TrajCorr=0.000
+2026-04-09 04:54:05,365 [INFO] confounded (seed 5/5)
+2026-04-09 04:54:06,165 [INFO] RMSE=0.853 DirAcc=0.000 TrajCorr=0.000
+2026-04-09 04:54:06,166 [INFO] mediated (seed 1/5)
+2026-04-09 04:54:06,978 [INFO] RMSE=0.846 DirAcc=0.833 TrajCorr=0.263
+2026-04-09 04:54:06,978 [INFO] mediated (seed 2/5)
+2026-04-09 04:54:07,669 [INFO] RMSE=0.583 DirAcc=0.283 TrajCorr=-0.392
+2026-04-09 04:54:07,669 [INFO] mediated (seed 3/5)
+2026-04-09 04:54:08,584 [INFO] RMSE=1.298 DirAcc=0.683 TrajCorr=-0.318
+2026-04-09 04:54:08,584 [INFO] mediated (seed 4/5)
+2026-04-09 04:54:09,269 [INFO] RMSE=0.713 DirAcc=0.300 TrajCorr=-0.579
+2026-04-09 04:54:09,269 [INFO] mediated (seed 5/5)
+2026-04-09 04:54:10,274 [INFO] RMSE=1.021 DirAcc=0.233 TrajCorr=0.488
+2026-04-09 04:54:10,274 [INFO] time_varying_confounded (seed 1/5)
+2026-04-09 04:54:10,666 [INFO] RMSE=0.235 DirAcc=1.000 TrajCorr=0.000
+2026-04-09 04:54:10,666 [INFO] time_varying_confounded (seed 2/5)
+2026-04-09 04:54:11,291 [INFO] RMSE=0.506 DirAcc=1.000 TrajCorr=0.000
+2026-04-09 04:54:11,291 [INFO] time_varying_confounded (seed 3/5)
+2026-04-09 04:54:12,081 [INFO] RMSE=0.180 DirAcc=1.000 TrajCorr=0.000
+2026-04-09 04:54:12,081 [INFO] time_varying_confounded (seed 4/5)
+2026-04-09 04:54:12,865 [INFO] RMSE=0.448 DirAcc=1.000 TrajCorr=0.000
+2026-04-09 04:54:12,865 [INFO] time_varying_confounded (seed 5/5)
+2026-04-09 04:54:13,484 [INFO] RMSE=0.680 DirAcc=1.000 TrajCorr=0.000
+2026-04-09 04:54:13,484 [INFO] feedback (seed 1/5)
+2026-04-09 04:54:14,178 [INFO] RMSE=0.216 DirAcc=1.000 TrajCorr=0.000
+2026-04-09 04:54:14,178 [INFO] feedback (seed 2/5)
+2026-04-09 04:54:15,076 [INFO] RMSE=0.419 DirAcc=1.000 TrajCorr=0.000
+2026-04-09 04:54:15,076 [INFO] feedback (seed 3/5)
+2026-04-09 04:54:16,089 [INFO] RMSE=0.632 DirAcc=1.000 TrajCorr=0.000
+2026-04-09 04:54:16,089 [INFO] feedback (seed 4/5)
+2026-04-09 04:54:16,783 [INFO] RMSE=0.076 DirAcc=1.000 TrajCorr=0.000
+2026-04-09 04:54:16,783 [INFO] feedback (seed 5/5)
+2026-04-09 04:54:17,674 [INFO] RMSE=0.223 DirAcc=1.000 TrajCorr=0.000
+2026-04-09 04:54:17,674 [INFO] instrumental_variable (seed 1/5)
+2026-04-09 04:54:18,378 [INFO] RMSE=0.915 DirAcc=1.000 TrajCorr=0.876
+2026-04-09 04:54:18,378 [INFO] instrumental_variable (seed 2/5)
+2026-04-09 04:54:19,191 [INFO] RMSE=0.831 DirAcc=0.783 TrajCorr=-0.887
+2026-04-09 04:54:19,191 [INFO] instrumental_variable (seed 3/5)
+2026-04-09 04:54:19,577 [INFO] RMSE=0.708 DirAcc=1.000 TrajCorr=-0.649
+2026-04-09 04:54:19,577 [INFO] instrumental_variable (seed 4/5)
+2026-04-09 04:54:20,078 [INFO] RMSE=0.162 DirAcc=1.000 TrajCorr=-0.367
+2026-04-09 04:54:20,078 [INFO] instrumental_variable (seed 5/5)
+2026-04-09 04:54:20,480 [INFO] RMSE=0.919 DirAcc=1.000 TrajCorr=0.917
+2026-04-09 04:54:20,480 [INFO] non_identifiable (seed 1/5)
+2026-04-09 04:54:20,966 [INFO] RMSE=0.268 DirAcc=1.000 TrajCorr=-0.970
+2026-04-09 04:54:20,966 [INFO] non_identifiable (seed 2/5)
+2026-04-09 04:54:21,378 [INFO] RMSE=0.248 DirAcc=1.000 TrajCorr=0.490
+2026-04-09 04:54:21,378 [INFO] non_identifiable (seed 3/5)
+2026-04-09 04:54:21,736 [INFO] RMSE=0.121 DirAcc=1.000 TrajCorr=-0.703
+2026-04-09 04:54:21,736 [INFO] non_identifiable (seed 4/5)
+2026-04-09 04:54:22,167 [INFO] RMSE=0.246 DirAcc=1.000 TrajCorr=0.217
+2026-04-09 04:54:22,167 [INFO] non_identifiable (seed 5/5)
+2026-04-09 04:54:22,645 [INFO] RMSE=0.601 DirAcc=1.000 TrajCorr=-0.409
+2026-04-09 04:54:22,645 [INFO]
+--- Temporal Intervention Scenarios ---
+2026-04-09 04:54:22,645 [INFO]
+Scenario: Step Intervention
+2026-04-09 04:54:22,645 [INFO] Seed 1/5 (seed=42)
+2026-04-09 04:54:23,887 [INFO] RMSE=0.4248 ATE_err=0.0578 DirAcc=0.578
+2026-04-09 04:54:23,887 [INFO] Seed 2/5 (seed=142)
+2026-04-09 04:54:25,199 [INFO] RMSE=0.3083 ATE_err=0.0097 DirAcc=0.556
+2026-04-09 04:54:25,199 [INFO] Seed 3/5 (seed=242)
+2026-04-09 04:54:26,680 [INFO] RMSE=0.4416 ATE_err=0.0539 DirAcc=0.511
+2026-04-09 04:54:26,680 [INFO] Seed 4/5 (seed=342)
+2026-04-09 04:54:27,973 [INFO] RMSE=0.3954 ATE_err=0.1642 DirAcc=0.511
+2026-04-09 04:54:27,973 [INFO] Seed 5/5 (seed=442)
+2026-04-09 04:54:29,087 [INFO] RMSE=0.6695 ATE_err=0.3789 DirAcc=0.533
+2026-04-09 04:54:29,164 [INFO]
+Scenario: Dose-Response Curve
+2026-04-09 04:54:29,164 [INFO] Seed 1/5 (seed=42)
+2026-04-09 04:55:42,566 [INFO] RMSE=0.2550 ATE_err=0.0132 DirAcc=0.578
+2026-04-09 04:55:42,566 [INFO] Seed 2/5 (seed=142)
+2026-04-09 04:56:44,799 [INFO] RMSE=0.1873 ATE_err=0.0254 DirAcc=0.589
+2026-04-09 04:56:44,799 [INFO] Seed 3/5 (seed=242)
+2026-04-09 04:57:48,565 [INFO] RMSE=0.2736 ATE_err=0.0589 DirAcc=0.511
+2026-04-09 04:57:48,565 [INFO] Seed 4/5 (seed=342)
+2026-04-09 04:58:52,675 [INFO] RMSE=0.2455 ATE_err=0.1624 DirAcc=0.511
+2026-04-09 04:58:52,675 [INFO] Seed 5/5 (seed=442)
+2026-04-09 04:59:57,885 [INFO] RMSE=0.4179 ATE_err=0.4102 DirAcc=0.533
+2026-04-09 04:59:57,885 [INFO]
+Scenario: Policy Comparison
+2026-04-09 04:59:57,885 [INFO] Seed 1/5 (seed=42)
+2026-04-09 05:00:32,476 [INFO] RMSE=0.0230 ATE_err=0.0102 DirAcc=1.000
+2026-04-09 05:00:32,476 [INFO] Seed 2/5 (seed=142)
+2026-04-09 05:01:10,676 [INFO] RMSE=0.0214 ATE_err=0.0133 DirAcc=0.000
+2026-04-09 05:01:10,676 [INFO] Seed 3/5 (seed=242)
+2026-04-09 05:01:48,781 [INFO] RMSE=0.0536 ATE_err=0.0462 DirAcc=0.000
+2026-04-09 05:01:48,781 [INFO] Seed 4/5 (seed=342)
+2026-04-09 05:02:25,480 [INFO] RMSE=0.0844 ATE_err=0.0438 DirAcc=1.000
+2026-04-09 05:02:25,481 [INFO] Seed 5/5 (seed=442)
+2026-04-09 05:03:03,275 [INFO] RMSE=0.2676 ATE_err=0.1943 DirAcc=1.000
+2026-04-09 05:03:03,276 [INFO]
+Scenario: Intervention Timing
+2026-04-09 05:03:03,276 [INFO] Seed 1/5 (seed=42)
+2026-04-09 05:03:03,792 [INFO] Timing t=50: RMSE=0.2253 ATE_err=0.0555
+2026-04-09 05:03:04,591 [INFO] Timing t=100: RMSE=0.2310 ATE_err=0.1003
+2026-04-09 05:03:05,464 [INFO] Timing t=200: RMSE=0.2434 ATE_err=0.1261
+2026-04-09 05:03:06,264 [INFO] Timing t=500: RMSE=0.2513 ATE_err=0.1282
+2026-04-09 05:03:06,265 [INFO] RMSE=0.2377 ATE_err=0.1025 DirAcc=0.517
+2026-04-09 05:03:06,265 [INFO] Seed 2/5 (seed=142)
+2026-04-09 05:03:06,864 [INFO] Timing t=50: RMSE=0.5513 ATE_err=0.3532
+2026-04-09 05:03:07,664 [INFO] Timing t=100: RMSE=0.4284 ATE_err=0.0595
+2026-04-09 05:03:08,464 [INFO] Timing t=200: RMSE=0.7875 ATE_err=0.6660
+2026-04-09 05:03:09,278 [INFO] Timing t=500: RMSE=0.4397 ATE_err=0.0154
+2026-04-09 05:03:09,278 [INFO] RMSE=0.5518 ATE_err=0.2735 DirAcc=0.438
+2026-04-09 05:03:09,278 [INFO] Seed 3/5 (seed=242)
+2026-04-09 05:03:09,864 [INFO] Timing t=50: RMSE=0.3754 ATE_err=0.2597
+2026-04-09 05:03:10,566 [INFO] Timing t=100: RMSE=0.3403 ATE_err=0.2084
+2026-04-09 05:03:11,491 [INFO] Timing t=200: RMSE=0.4809 ATE_err=0.3954
+2026-04-09 05:03:12,264 [INFO] Timing t=500: RMSE=0.2722 ATE_err=0.0156
+2026-04-09 05:03:12,265 [INFO] RMSE=0.3672 ATE_err=0.2198 DirAcc=0.508
+2026-04-09 05:03:12,265 [INFO] Seed 4/5 (seed=342)
+2026-04-09 05:03:12,864 [INFO] Timing t=50: RMSE=0.4183 ATE_err=0.3306
+2026-04-09 05:03:13,664 [INFO] Timing t=100: RMSE=0.5142 ATE_err=0.4501
+2026-04-09 05:03:14,483 [INFO] Timing t=200: RMSE=0.3095 ATE_err=0.1712
+2026-04-09 05:03:15,264 [INFO] Timing t=500: RMSE=0.2533 ATE_err=0.0399
+2026-04-09 05:03:15,265 [INFO] RMSE=0.3738 ATE_err=0.2480 DirAcc=0.575
+2026-04-09 05:03:15,265 [INFO] Seed 5/5 (seed=442)
+2026-04-09 05:03:15,864 [INFO] Timing t=50: RMSE=0.4634 ATE_err=0.1056
+2026-04-09 05:03:16,579 [INFO] Timing t=100: RMSE=0.4908 ATE_err=0.1791
+2026-04-09 05:03:17,264 [INFO] Timing t=200: RMSE=0.7385 ATE_err=0.5837
+2026-04-09 05:03:17,987 [INFO] Timing t=500: RMSE=0.6197 ATE_err=0.4199
+2026-04-09 05:03:17,988 [INFO] RMSE=0.5781 ATE_err=0.3221 DirAcc=0.554
+
+================================================================================
+INTERVENTION BENCHMARK — Causal Structure Tests
+================================================================================
+Structure                  RMSE    DirAcc  TrajCorr  NullDet  TrueMean  PredMean
+--------------------------------------------------------------------------------
+confounded                 0.682   0.000   0.000     0.000    0.000     0.681
+mediated                   0.892   0.467   -0.108    N/A      0.551     -0.090
+time_varying_confounded    0.410   1.000   0.000     N/A      0.591     0.183
+feedback                   0.313   1.000   0.000     N/A      0.515     0.544
+instrumental_variable      0.707   0.957   -0.022    N/A      0.877     0.234
+non_identifiable           0.297   1.000   -0.275    N/A      0.600     0.856
+
+================================================================================
+INTERVENTION BENCHMARK — Temporal Intervention Scenarios
+================================================================================
+Scenario         Seeds   Traj RMSE         ATE Error         Dir Acc
+--------------------------------------------------------------------------------
+step             5       0.4479+/-0.120    0.1329+/-0.133    0.538+/-0.03
+dose_response    5       0.2759+/-0.077    0.1340+/-0.148    0.544+/-0.03
+policy           5       0.0900+/-0.092    0.0616+/-0.068    0.600+/-0.49
+timing           5       0.4217+/-0.127    0.2332+/-0.073    0.518+/-0.05
+
+2026-04-09 05:03:17,991 [INFO] Results saved to outputs/benchmarks/intervention_results.json
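The `mean+/-std` entries in the temporal summary table appear to be aggregates over the five per-seed result lines. The sketch below recomputes the `step` row from the logged values as a sanity check. It is not the benchmark's own code: the names `STEP_LINES`, `collect`, and `summarize` are hypothetical, and the use of the population standard deviation (divide by N) is an assumption, chosen because it reproduces the logged `0.4479+/-0.120`.

```python
import re
from statistics import fmean, pstdev

# Per-seed result lines copied from the "Step Intervention" scenario in the log.
STEP_LINES = """\
2026-04-09 04:54:23,887 [INFO] RMSE=0.4248 ATE_err=0.0578 DirAcc=0.578
2026-04-09 04:54:25,199 [INFO] RMSE=0.3083 ATE_err=0.0097 DirAcc=0.556
2026-04-09 04:54:26,680 [INFO] RMSE=0.4416 ATE_err=0.0539 DirAcc=0.511
2026-04-09 04:54:27,973 [INFO] RMSE=0.3954 ATE_err=0.1642 DirAcc=0.511
2026-04-09 04:54:29,087 [INFO] RMSE=0.6695 ATE_err=0.3789 DirAcc=0.533
"""

# Matches KEY=VALUE pairs such as "RMSE=0.4248" or "TrajCorr=-0.392".
METRIC = re.compile(r"(\w+)=(-?\d+(?:\.\d+)?)")


def collect(log_text):
    """Group every KEY=VALUE pair found in the log text by key."""
    grouped = {}
    for line in log_text.splitlines():
        for key, value in METRIC.findall(line):
            grouped.setdefault(key, []).append(float(value))
    return grouped


def summarize(values):
    """Return (mean, population standard deviation) of the values."""
    # Assumption: population std (divide by N), not the sample std.
    return fmean(values), pstdev(values)


metrics = collect(STEP_LINES)
for name, values in metrics.items():
    mean, std = summarize(values)
    print(f"{name}: {mean:.4f}+/-{std:.3f}")
```

With the sample standard deviation (`statistics.stdev`) the RMSE spread would come out near 0.134 instead of 0.120, which is why the population form is assumed here. Note that `policy` reports `0.600+/-0.49` for direction accuracy because its per-seed values are all 0 or 1.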