2026-04-09 04:54:01,214 [INFO] Intervention Benchmark — testing causal effect validity
2026-04-09 04:54:01,214 [INFO] 
--- Causal Structure Tests ---
2026-04-09 04:54:01,336 [INFO] Loading faiss with AVX512 support.
2026-04-09 04:54:01,399 [INFO] Successfully loaded faiss with AVX512 support.
2026-04-09 04:54:02,236 [INFO]   confounded (seed 1/5)
2026-04-09 04:54:03,411 [INFO]     RMSE=0.893 DirAcc=0.000 TrajCorr=0.000
2026-04-09 04:54:03,411 [INFO]   confounded (seed 2/5)
2026-04-09 04:54:03,993 [INFO]     RMSE=1.043 DirAcc=0.000 TrajCorr=0.000
2026-04-09 04:54:03,994 [INFO]   confounded (seed 3/5)
2026-04-09 04:54:04,675 [INFO]     RMSE=0.333 DirAcc=0.000 TrajCorr=0.000
2026-04-09 04:54:04,675 [INFO]   confounded (seed 4/5)
2026-04-09 04:54:05,365 [INFO]     RMSE=0.286 DirAcc=0.000 TrajCorr=0.000
2026-04-09 04:54:05,365 [INFO]   confounded (seed 5/5)
2026-04-09 04:54:06,165 [INFO]     RMSE=0.853 DirAcc=0.000 TrajCorr=0.000
2026-04-09 04:54:06,166 [INFO]   mediated (seed 1/5)
2026-04-09 04:54:06,978 [INFO]     RMSE=0.846 DirAcc=0.833 TrajCorr=0.263
2026-04-09 04:54:06,978 [INFO]   mediated (seed 2/5)
2026-04-09 04:54:07,669 [INFO]     RMSE=0.583 DirAcc=0.283 TrajCorr=-0.392
2026-04-09 04:54:07,669 [INFO]   mediated (seed 3/5)
2026-04-09 04:54:08,584 [INFO]     RMSE=1.298 DirAcc=0.683 TrajCorr=-0.318
2026-04-09 04:54:08,584 [INFO]   mediated (seed 4/5)
2026-04-09 04:54:09,269 [INFO]     RMSE=0.713 DirAcc=0.300 TrajCorr=-0.579
2026-04-09 04:54:09,269 [INFO]   mediated (seed 5/5)
2026-04-09 04:54:10,274 [INFO]     RMSE=1.021 DirAcc=0.233 TrajCorr=0.488
2026-04-09 04:54:10,274 [INFO]   time_varying_confounded (seed 1/5)
2026-04-09 04:54:10,666 [INFO]     RMSE=0.235 DirAcc=1.000 TrajCorr=0.000
2026-04-09 04:54:10,666 [INFO]   time_varying_confounded (seed 2/5)
2026-04-09 04:54:11,291 [INFO]     RMSE=0.506 DirAcc=1.000 TrajCorr=0.000
2026-04-09 04:54:11,291 [INFO]   time_varying_confounded (seed 3/5)
2026-04-09 04:54:12,081 [INFO]     RMSE=0.180 DirAcc=1.000 TrajCorr=0.000
2026-04-09 04:54:12,081 [INFO]   time_varying_confounded (seed 4/5)
2026-04-09 04:54:12,865 [INFO]     RMSE=0.448 DirAcc=1.000 TrajCorr=0.000
2026-04-09 04:54:12,865 [INFO]   time_varying_confounded (seed 5/5)
2026-04-09 04:54:13,484 [INFO]     RMSE=0.680 DirAcc=1.000 TrajCorr=0.000
2026-04-09 04:54:13,484 [INFO]   feedback (seed 1/5)
2026-04-09 04:54:14,178 [INFO]     RMSE=0.216 DirAcc=1.000 TrajCorr=0.000
2026-04-09 04:54:14,178 [INFO]   feedback (seed 2/5)
2026-04-09 04:54:15,076 [INFO]     RMSE=0.419 DirAcc=1.000 TrajCorr=0.000
2026-04-09 04:54:15,076 [INFO]   feedback (seed 3/5)
2026-04-09 04:54:16,089 [INFO]     RMSE=0.632 DirAcc=1.000 TrajCorr=0.000
2026-04-09 04:54:16,089 [INFO]   feedback (seed 4/5)
2026-04-09 04:54:16,783 [INFO]     RMSE=0.076 DirAcc=1.000 TrajCorr=0.000
2026-04-09 04:54:16,783 [INFO]   feedback (seed 5/5)
2026-04-09 04:54:17,674 [INFO]     RMSE=0.223 DirAcc=1.000 TrajCorr=0.000
2026-04-09 04:54:17,674 [INFO]   instrumental_variable (seed 1/5)
2026-04-09 04:54:18,378 [INFO]     RMSE=0.915 DirAcc=1.000 TrajCorr=0.876
2026-04-09 04:54:18,378 [INFO]   instrumental_variable (seed 2/5)
2026-04-09 04:54:19,191 [INFO]     RMSE=0.831 DirAcc=0.783 TrajCorr=-0.887
2026-04-09 04:54:19,191 [INFO]   instrumental_variable (seed 3/5)
2026-04-09 04:54:19,577 [INFO]     RMSE=0.708 DirAcc=1.000 TrajCorr=-0.649
2026-04-09 04:54:19,577 [INFO]   instrumental_variable (seed 4/5)
2026-04-09 04:54:20,078 [INFO]     RMSE=0.162 DirAcc=1.000 TrajCorr=-0.367
2026-04-09 04:54:20,078 [INFO]   instrumental_variable (seed 5/5)
2026-04-09 04:54:20,480 [INFO]     RMSE=0.919 DirAcc=1.000 TrajCorr=0.917
2026-04-09 04:54:20,480 [INFO]   non_identifiable (seed 1/5)
2026-04-09 04:54:20,966 [INFO]     RMSE=0.268 DirAcc=1.000 TrajCorr=-0.970
2026-04-09 04:54:20,966 [INFO]   non_identifiable (seed 2/5)
2026-04-09 04:54:21,378 [INFO]     RMSE=0.248 DirAcc=1.000 TrajCorr=0.490
2026-04-09 04:54:21,378 [INFO]   non_identifiable (seed 3/5)
2026-04-09 04:54:21,736 [INFO]     RMSE=0.121 DirAcc=1.000 TrajCorr=-0.703
2026-04-09 04:54:21,736 [INFO]   non_identifiable (seed 4/5)
2026-04-09 04:54:22,167 [INFO]     RMSE=0.246 DirAcc=1.000 TrajCorr=0.217
2026-04-09 04:54:22,167 [INFO]   non_identifiable (seed 5/5)
2026-04-09 04:54:22,645 [INFO]     RMSE=0.601 DirAcc=1.000 TrajCorr=-0.409
2026-04-09 04:54:22,645 [INFO] 
--- Temporal Intervention Scenarios ---
2026-04-09 04:54:22,645 [INFO] 
Scenario: Step Intervention
2026-04-09 04:54:22,645 [INFO]   Seed 1/5 (seed=42)
2026-04-09 04:54:23,887 [INFO]     RMSE=0.4248 ATE_err=0.0578 DirAcc=0.578
2026-04-09 04:54:23,887 [INFO]   Seed 2/5 (seed=142)
2026-04-09 04:54:25,199 [INFO]     RMSE=0.3083 ATE_err=0.0097 DirAcc=0.556
2026-04-09 04:54:25,199 [INFO]   Seed 3/5 (seed=242)
2026-04-09 04:54:26,680 [INFO]     RMSE=0.4416 ATE_err=0.0539 DirAcc=0.511
2026-04-09 04:54:26,680 [INFO]   Seed 4/5 (seed=342)
2026-04-09 04:54:27,973 [INFO]     RMSE=0.3954 ATE_err=0.1642 DirAcc=0.511
2026-04-09 04:54:27,973 [INFO]   Seed 5/5 (seed=442)
2026-04-09 04:54:29,087 [INFO]     RMSE=0.6695 ATE_err=0.3789 DirAcc=0.533
2026-04-09 04:54:29,164 [INFO] 
Scenario: Dose-Response Curve
2026-04-09 04:54:29,164 [INFO]   Seed 1/5 (seed=42)
2026-04-09 04:55:42,566 [INFO]     RMSE=0.2550 ATE_err=0.0132 DirAcc=0.578
2026-04-09 04:55:42,566 [INFO]   Seed 2/5 (seed=142)
2026-04-09 04:56:44,799 [INFO]     RMSE=0.1873 ATE_err=0.0254 DirAcc=0.589
2026-04-09 04:56:44,799 [INFO]   Seed 3/5 (seed=242)
2026-04-09 04:57:48,565 [INFO]     RMSE=0.2736 ATE_err=0.0589 DirAcc=0.511
2026-04-09 04:57:48,565 [INFO]   Seed 4/5 (seed=342)
2026-04-09 04:58:52,675 [INFO]     RMSE=0.2455 ATE_err=0.1624 DirAcc=0.511
2026-04-09 04:58:52,675 [INFO]   Seed 5/5 (seed=442)
2026-04-09 04:59:57,885 [INFO]     RMSE=0.4179 ATE_err=0.4102 DirAcc=0.533
2026-04-09 04:59:57,885 [INFO] 
Scenario: Policy Comparison
2026-04-09 04:59:57,885 [INFO]   Seed 1/5 (seed=42)
2026-04-09 05:00:32,476 [INFO]     RMSE=0.0230 ATE_err=0.0102 DirAcc=1.000
2026-04-09 05:00:32,476 [INFO]   Seed 2/5 (seed=142)
2026-04-09 05:01:10,676 [INFO]     RMSE=0.0214 ATE_err=0.0133 DirAcc=0.000
2026-04-09 05:01:10,676 [INFO]   Seed 3/5 (seed=242)
2026-04-09 05:01:48,781 [INFO]     RMSE=0.0536 ATE_err=0.0462 DirAcc=0.000
2026-04-09 05:01:48,781 [INFO]   Seed 4/5 (seed=342)
2026-04-09 05:02:25,480 [INFO]     RMSE=0.0844 ATE_err=0.0438 DirAcc=1.000
2026-04-09 05:02:25,481 [INFO]   Seed 5/5 (seed=442)
2026-04-09 05:03:03,275 [INFO]     RMSE=0.2676 ATE_err=0.1943 DirAcc=1.000
2026-04-09 05:03:03,276 [INFO] 
Scenario: Intervention Timing
2026-04-09 05:03:03,276 [INFO]   Seed 1/5 (seed=42)
2026-04-09 05:03:03,792 [INFO]   Timing t=50: RMSE=0.2253 ATE_err=0.0555
2026-04-09 05:03:04,591 [INFO]   Timing t=100: RMSE=0.2310 ATE_err=0.1003
2026-04-09 05:03:05,464 [INFO]   Timing t=200: RMSE=0.2434 ATE_err=0.1261
2026-04-09 05:03:06,264 [INFO]   Timing t=500: RMSE=0.2513 ATE_err=0.1282
2026-04-09 05:03:06,265 [INFO]     RMSE=0.2377 ATE_err=0.1025 DirAcc=0.517
2026-04-09 05:03:06,265 [INFO]   Seed 2/5 (seed=142)
2026-04-09 05:03:06,864 [INFO]   Timing t=50: RMSE=0.5513 ATE_err=0.3532
2026-04-09 05:03:07,664 [INFO]   Timing t=100: RMSE=0.4284 ATE_err=0.0595
2026-04-09 05:03:08,464 [INFO]   Timing t=200: RMSE=0.7875 ATE_err=0.6660
2026-04-09 05:03:09,278 [INFO]   Timing t=500: RMSE=0.4397 ATE_err=0.0154
2026-04-09 05:03:09,278 [INFO]     RMSE=0.5518 ATE_err=0.2735 DirAcc=0.438
2026-04-09 05:03:09,278 [INFO]   Seed 3/5 (seed=242)
2026-04-09 05:03:09,864 [INFO]   Timing t=50: RMSE=0.3754 ATE_err=0.2597
2026-04-09 05:03:10,566 [INFO]   Timing t=100: RMSE=0.3403 ATE_err=0.2084
2026-04-09 05:03:11,491 [INFO]   Timing t=200: RMSE=0.4809 ATE_err=0.3954
2026-04-09 05:03:12,264 [INFO]   Timing t=500: RMSE=0.2722 ATE_err=0.0156
2026-04-09 05:03:12,265 [INFO]     RMSE=0.3672 ATE_err=0.2198 DirAcc=0.508
2026-04-09 05:03:12,265 [INFO]   Seed 4/5 (seed=342)
2026-04-09 05:03:12,864 [INFO]   Timing t=50: RMSE=0.4183 ATE_err=0.3306
2026-04-09 05:03:13,664 [INFO]   Timing t=100: RMSE=0.5142 ATE_err=0.4501
2026-04-09 05:03:14,483 [INFO]   Timing t=200: RMSE=0.3095 ATE_err=0.1712
2026-04-09 05:03:15,264 [INFO]   Timing t=500: RMSE=0.2533 ATE_err=0.0399
2026-04-09 05:03:15,265 [INFO]     RMSE=0.3738 ATE_err=0.2480 DirAcc=0.575
2026-04-09 05:03:15,265 [INFO]   Seed 5/5 (seed=442)
2026-04-09 05:03:15,864 [INFO]   Timing t=50: RMSE=0.4634 ATE_err=0.1056
2026-04-09 05:03:16,579 [INFO]   Timing t=100: RMSE=0.4908 ATE_err=0.1791
2026-04-09 05:03:17,264 [INFO]   Timing t=200: RMSE=0.7385 ATE_err=0.5837
2026-04-09 05:03:17,987 [INFO]   Timing t=500: RMSE=0.6197 ATE_err=0.4199
2026-04-09 05:03:17,988 [INFO]     RMSE=0.5781 ATE_err=0.3221 DirAcc=0.554

================================================================================
INTERVENTION BENCHMARK — Causal Structure Tests
================================================================================
Structure                     RMSE   DirAcc   TrajCorr  NullDet   TrueMean   PredMean
--------------------------------------------------------------------------------
confounded                   0.682    0.000      0.000    0.000      0.000      0.681
mediated                     0.892    0.467     -0.108      N/A      0.551     -0.090
time_varying_confounded      0.410    1.000      0.000      N/A      0.591      0.183
feedback                     0.313    1.000      0.000      N/A      0.515      0.544
instrumental_variable        0.707    0.957     -0.022      N/A      0.877      0.234
non_identifiable             0.297    1.000     -0.275      N/A      0.600      0.856

================================================================================
INTERVENTION BENCHMARK — Temporal Intervention Scenarios
================================================================================
Scenario         Seeds  Traj RMSE        ATE Error        Dir Acc
--------------------------------------------------------------------------------
step                 5  0.4479+/-0.120   0.1329+/-0.133   0.538+/-0.03
dose_response        5  0.2759+/-0.077   0.1340+/-0.148   0.544+/-0.03
policy               5  0.0900+/-0.092   0.0616+/-0.068   0.600+/-0.49
timing               5  0.4217+/-0.127   0.2332+/-0.073   0.518+/-0.05

2026-04-09 05:03:17,991 [INFO] Results saved to outputs/benchmarks/intervention_results.json