Commit b5b1bb1 (verified) by mrshravan · 1 Parent(s): eb02732

Upload benchmarks/intervention.log with huggingface_hub

Files changed (1)
  1. benchmarks/intervention.log +159 -0
benchmarks/intervention.log ADDED
@@ -0,0 +1,159 @@
+ 2026-04-09 04:54:01,214 [INFO] Intervention Benchmark — testing causal effect validity
+ 2026-04-09 04:54:01,214 [INFO]
+ --- Causal Structure Tests ---
+ 2026-04-09 04:54:01,336 [INFO] Loading faiss with AVX512 support.
+ 2026-04-09 04:54:01,399 [INFO] Successfully loaded faiss with AVX512 support.
+ 2026-04-09 04:54:02,236 [INFO] confounded (seed 1/5)
+ 2026-04-09 04:54:03,411 [INFO] RMSE=0.893 DirAcc=0.000 TrajCorr=0.000
+ 2026-04-09 04:54:03,411 [INFO] confounded (seed 2/5)
+ 2026-04-09 04:54:03,993 [INFO] RMSE=1.043 DirAcc=0.000 TrajCorr=0.000
+ 2026-04-09 04:54:03,994 [INFO] confounded (seed 3/5)
+ 2026-04-09 04:54:04,675 [INFO] RMSE=0.333 DirAcc=0.000 TrajCorr=0.000
+ 2026-04-09 04:54:04,675 [INFO] confounded (seed 4/5)
+ 2026-04-09 04:54:05,365 [INFO] RMSE=0.286 DirAcc=0.000 TrajCorr=0.000
+ 2026-04-09 04:54:05,365 [INFO] confounded (seed 5/5)
+ 2026-04-09 04:54:06,165 [INFO] RMSE=0.853 DirAcc=0.000 TrajCorr=0.000
+ 2026-04-09 04:54:06,166 [INFO] mediated (seed 1/5)
+ 2026-04-09 04:54:06,978 [INFO] RMSE=0.846 DirAcc=0.833 TrajCorr=0.263
+ 2026-04-09 04:54:06,978 [INFO] mediated (seed 2/5)
+ 2026-04-09 04:54:07,669 [INFO] RMSE=0.583 DirAcc=0.283 TrajCorr=-0.392
+ 2026-04-09 04:54:07,669 [INFO] mediated (seed 3/5)
+ 2026-04-09 04:54:08,584 [INFO] RMSE=1.298 DirAcc=0.683 TrajCorr=-0.318
+ 2026-04-09 04:54:08,584 [INFO] mediated (seed 4/5)
+ 2026-04-09 04:54:09,269 [INFO] RMSE=0.713 DirAcc=0.300 TrajCorr=-0.579
+ 2026-04-09 04:54:09,269 [INFO] mediated (seed 5/5)
+ 2026-04-09 04:54:10,274 [INFO] RMSE=1.021 DirAcc=0.233 TrajCorr=0.488
+ 2026-04-09 04:54:10,274 [INFO] time_varying_confounded (seed 1/5)
+ 2026-04-09 04:54:10,666 [INFO] RMSE=0.235 DirAcc=1.000 TrajCorr=0.000
+ 2026-04-09 04:54:10,666 [INFO] time_varying_confounded (seed 2/5)
+ 2026-04-09 04:54:11,291 [INFO] RMSE=0.506 DirAcc=1.000 TrajCorr=0.000
+ 2026-04-09 04:54:11,291 [INFO] time_varying_confounded (seed 3/5)
+ 2026-04-09 04:54:12,081 [INFO] RMSE=0.180 DirAcc=1.000 TrajCorr=0.000
+ 2026-04-09 04:54:12,081 [INFO] time_varying_confounded (seed 4/5)
+ 2026-04-09 04:54:12,865 [INFO] RMSE=0.448 DirAcc=1.000 TrajCorr=0.000
+ 2026-04-09 04:54:12,865 [INFO] time_varying_confounded (seed 5/5)
+ 2026-04-09 04:54:13,484 [INFO] RMSE=0.680 DirAcc=1.000 TrajCorr=0.000
+ 2026-04-09 04:54:13,484 [INFO] feedback (seed 1/5)
+ 2026-04-09 04:54:14,178 [INFO] RMSE=0.216 DirAcc=1.000 TrajCorr=0.000
+ 2026-04-09 04:54:14,178 [INFO] feedback (seed 2/5)
+ 2026-04-09 04:54:15,076 [INFO] RMSE=0.419 DirAcc=1.000 TrajCorr=0.000
+ 2026-04-09 04:54:15,076 [INFO] feedback (seed 3/5)
+ 2026-04-09 04:54:16,089 [INFO] RMSE=0.632 DirAcc=1.000 TrajCorr=0.000
+ 2026-04-09 04:54:16,089 [INFO] feedback (seed 4/5)
+ 2026-04-09 04:54:16,783 [INFO] RMSE=0.076 DirAcc=1.000 TrajCorr=0.000
+ 2026-04-09 04:54:16,783 [INFO] feedback (seed 5/5)
+ 2026-04-09 04:54:17,674 [INFO] RMSE=0.223 DirAcc=1.000 TrajCorr=0.000
+ 2026-04-09 04:54:17,674 [INFO] instrumental_variable (seed 1/5)
+ 2026-04-09 04:54:18,378 [INFO] RMSE=0.915 DirAcc=1.000 TrajCorr=0.876
+ 2026-04-09 04:54:18,378 [INFO] instrumental_variable (seed 2/5)
+ 2026-04-09 04:54:19,191 [INFO] RMSE=0.831 DirAcc=0.783 TrajCorr=-0.887
+ 2026-04-09 04:54:19,191 [INFO] instrumental_variable (seed 3/5)
+ 2026-04-09 04:54:19,577 [INFO] RMSE=0.708 DirAcc=1.000 TrajCorr=-0.649
+ 2026-04-09 04:54:19,577 [INFO] instrumental_variable (seed 4/5)
+ 2026-04-09 04:54:20,078 [INFO] RMSE=0.162 DirAcc=1.000 TrajCorr=-0.367
+ 2026-04-09 04:54:20,078 [INFO] instrumental_variable (seed 5/5)
+ 2026-04-09 04:54:20,480 [INFO] RMSE=0.919 DirAcc=1.000 TrajCorr=0.917
+ 2026-04-09 04:54:20,480 [INFO] non_identifiable (seed 1/5)
+ 2026-04-09 04:54:20,966 [INFO] RMSE=0.268 DirAcc=1.000 TrajCorr=-0.970
+ 2026-04-09 04:54:20,966 [INFO] non_identifiable (seed 2/5)
+ 2026-04-09 04:54:21,378 [INFO] RMSE=0.248 DirAcc=1.000 TrajCorr=0.490
+ 2026-04-09 04:54:21,378 [INFO] non_identifiable (seed 3/5)
+ 2026-04-09 04:54:21,736 [INFO] RMSE=0.121 DirAcc=1.000 TrajCorr=-0.703
+ 2026-04-09 04:54:21,736 [INFO] non_identifiable (seed 4/5)
+ 2026-04-09 04:54:22,167 [INFO] RMSE=0.246 DirAcc=1.000 TrajCorr=0.217
+ 2026-04-09 04:54:22,167 [INFO] non_identifiable (seed 5/5)
+ 2026-04-09 04:54:22,645 [INFO] RMSE=0.601 DirAcc=1.000 TrajCorr=-0.409
+ 2026-04-09 04:54:22,645 [INFO]
+ --- Temporal Intervention Scenarios ---
+ 2026-04-09 04:54:22,645 [INFO]
+ Scenario: Step Intervention
+ 2026-04-09 04:54:22,645 [INFO] Seed 1/5 (seed=42)
+ 2026-04-09 04:54:23,887 [INFO] RMSE=0.4248 ATE_err=0.0578 DirAcc=0.578
+ 2026-04-09 04:54:23,887 [INFO] Seed 2/5 (seed=142)
+ 2026-04-09 04:54:25,199 [INFO] RMSE=0.3083 ATE_err=0.0097 DirAcc=0.556
+ 2026-04-09 04:54:25,199 [INFO] Seed 3/5 (seed=242)
+ 2026-04-09 04:54:26,680 [INFO] RMSE=0.4416 ATE_err=0.0539 DirAcc=0.511
+ 2026-04-09 04:54:26,680 [INFO] Seed 4/5 (seed=342)
+ 2026-04-09 04:54:27,973 [INFO] RMSE=0.3954 ATE_err=0.1642 DirAcc=0.511
+ 2026-04-09 04:54:27,973 [INFO] Seed 5/5 (seed=442)
+ 2026-04-09 04:54:29,087 [INFO] RMSE=0.6695 ATE_err=0.3789 DirAcc=0.533
+ 2026-04-09 04:54:29,164 [INFO]
+ Scenario: Dose-Response Curve
+ 2026-04-09 04:54:29,164 [INFO] Seed 1/5 (seed=42)
+ 2026-04-09 04:55:42,566 [INFO] RMSE=0.2550 ATE_err=0.0132 DirAcc=0.578
+ 2026-04-09 04:55:42,566 [INFO] Seed 2/5 (seed=142)
+ 2026-04-09 04:56:44,799 [INFO] RMSE=0.1873 ATE_err=0.0254 DirAcc=0.589
+ 2026-04-09 04:56:44,799 [INFO] Seed 3/5 (seed=242)
+ 2026-04-09 04:57:48,565 [INFO] RMSE=0.2736 ATE_err=0.0589 DirAcc=0.511
+ 2026-04-09 04:57:48,565 [INFO] Seed 4/5 (seed=342)
+ 2026-04-09 04:58:52,675 [INFO] RMSE=0.2455 ATE_err=0.1624 DirAcc=0.511
+ 2026-04-09 04:58:52,675 [INFO] Seed 5/5 (seed=442)
+ 2026-04-09 04:59:57,885 [INFO] RMSE=0.4179 ATE_err=0.4102 DirAcc=0.533
+ 2026-04-09 04:59:57,885 [INFO]
+ Scenario: Policy Comparison
+ 2026-04-09 04:59:57,885 [INFO] Seed 1/5 (seed=42)
+ 2026-04-09 05:00:32,476 [INFO] RMSE=0.0230 ATE_err=0.0102 DirAcc=1.000
+ 2026-04-09 05:00:32,476 [INFO] Seed 2/5 (seed=142)
+ 2026-04-09 05:01:10,676 [INFO] RMSE=0.0214 ATE_err=0.0133 DirAcc=0.000
+ 2026-04-09 05:01:10,676 [INFO] Seed 3/5 (seed=242)
+ 2026-04-09 05:01:48,781 [INFO] RMSE=0.0536 ATE_err=0.0462 DirAcc=0.000
+ 2026-04-09 05:01:48,781 [INFO] Seed 4/5 (seed=342)
+ 2026-04-09 05:02:25,480 [INFO] RMSE=0.0844 ATE_err=0.0438 DirAcc=1.000
+ 2026-04-09 05:02:25,481 [INFO] Seed 5/5 (seed=442)
+ 2026-04-09 05:03:03,275 [INFO] RMSE=0.2676 ATE_err=0.1943 DirAcc=1.000
+ 2026-04-09 05:03:03,276 [INFO]
+ Scenario: Intervention Timing
+ 2026-04-09 05:03:03,276 [INFO] Seed 1/5 (seed=42)
+ 2026-04-09 05:03:03,792 [INFO] Timing t=50: RMSE=0.2253 ATE_err=0.0555
+ 2026-04-09 05:03:04,591 [INFO] Timing t=100: RMSE=0.2310 ATE_err=0.1003
+ 2026-04-09 05:03:05,464 [INFO] Timing t=200: RMSE=0.2434 ATE_err=0.1261
+ 2026-04-09 05:03:06,264 [INFO] Timing t=500: RMSE=0.2513 ATE_err=0.1282
+ 2026-04-09 05:03:06,265 [INFO] RMSE=0.2377 ATE_err=0.1025 DirAcc=0.517
+ 2026-04-09 05:03:06,265 [INFO] Seed 2/5 (seed=142)
+ 2026-04-09 05:03:06,864 [INFO] Timing t=50: RMSE=0.5513 ATE_err=0.3532
+ 2026-04-09 05:03:07,664 [INFO] Timing t=100: RMSE=0.4284 ATE_err=0.0595
+ 2026-04-09 05:03:08,464 [INFO] Timing t=200: RMSE=0.7875 ATE_err=0.6660
+ 2026-04-09 05:03:09,278 [INFO] Timing t=500: RMSE=0.4397 ATE_err=0.0154
+ 2026-04-09 05:03:09,278 [INFO] RMSE=0.5518 ATE_err=0.2735 DirAcc=0.438
+ 2026-04-09 05:03:09,278 [INFO] Seed 3/5 (seed=242)
+ 2026-04-09 05:03:09,864 [INFO] Timing t=50: RMSE=0.3754 ATE_err=0.2597
+ 2026-04-09 05:03:10,566 [INFO] Timing t=100: RMSE=0.3403 ATE_err=0.2084
+ 2026-04-09 05:03:11,491 [INFO] Timing t=200: RMSE=0.4809 ATE_err=0.3954
+ 2026-04-09 05:03:12,264 [INFO] Timing t=500: RMSE=0.2722 ATE_err=0.0156
+ 2026-04-09 05:03:12,265 [INFO] RMSE=0.3672 ATE_err=0.2198 DirAcc=0.508
+ 2026-04-09 05:03:12,265 [INFO] Seed 4/5 (seed=342)
+ 2026-04-09 05:03:12,864 [INFO] Timing t=50: RMSE=0.4183 ATE_err=0.3306
+ 2026-04-09 05:03:13,664 [INFO] Timing t=100: RMSE=0.5142 ATE_err=0.4501
+ 2026-04-09 05:03:14,483 [INFO] Timing t=200: RMSE=0.3095 ATE_err=0.1712
+ 2026-04-09 05:03:15,264 [INFO] Timing t=500: RMSE=0.2533 ATE_err=0.0399
+ 2026-04-09 05:03:15,265 [INFO] RMSE=0.3738 ATE_err=0.2480 DirAcc=0.575
+ 2026-04-09 05:03:15,265 [INFO] Seed 5/5 (seed=442)
+ 2026-04-09 05:03:15,864 [INFO] Timing t=50: RMSE=0.4634 ATE_err=0.1056
+ 2026-04-09 05:03:16,579 [INFO] Timing t=100: RMSE=0.4908 ATE_err=0.1791
+ 2026-04-09 05:03:17,264 [INFO] Timing t=200: RMSE=0.7385 ATE_err=0.5837
+ 2026-04-09 05:03:17,987 [INFO] Timing t=500: RMSE=0.6197 ATE_err=0.4199
+ 2026-04-09 05:03:17,988 [INFO] RMSE=0.5781 ATE_err=0.3221 DirAcc=0.554
+
+ ================================================================================
+ INTERVENTION BENCHMARK — Causal Structure Tests
+ ================================================================================
+ Structure                    RMSE   DirAcc TrajCorr  NullDet TrueMean PredMean
+ --------------------------------------------------------------------------------
+ confounded                  0.682    0.000    0.000    0.000    0.000    0.681
+ mediated                    0.892    0.467   -0.108      N/A    0.551   -0.090
+ time_varying_confounded     0.410    1.000    0.000      N/A    0.591    0.183
+ feedback                    0.313    1.000    0.000      N/A    0.515    0.544
+ instrumental_variable       0.707    0.957   -0.022      N/A    0.877    0.234
+ non_identifiable            0.297    1.000   -0.275      N/A    0.600    0.856
+
+ ================================================================================
+ INTERVENTION BENCHMARK — Temporal Intervention Scenarios
+ ================================================================================
+ Scenario       Seeds  Traj RMSE         ATE Error         Dir Acc
+ --------------------------------------------------------------------------------
+ step               5  0.4479+/-0.120    0.1329+/-0.133    0.538+/-0.03
+ dose_response      5  0.2759+/-0.077    0.1340+/-0.148    0.544+/-0.03
+ policy             5  0.0900+/-0.092    0.0616+/-0.068    0.600+/-0.49
+ timing             5  0.4217+/-0.127    0.2332+/-0.073    0.518+/-0.05
+
+ 2026-04-09 05:03:17,991 [INFO] Results saved to outputs/benchmarks/intervention_results.json
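
The `+/-` figures in the Temporal Intervention Scenarios table look like per-seed means with a population standard deviation (ddof=0). The sketch below checks that reading against the five Step Intervention result lines from this log. The regex, the `summarize` helper, and the choice of population rather than sample standard deviation are assumptions made for illustration; none of them is taken from the benchmark's own code.

```python
import re
from statistics import fmean, pstdev

# The five per-seed result lines of the "Step Intervention" scenario,
# copied from the log in this commit.
STEP_LINES = """\
2026-04-09 04:54:23,887 [INFO] RMSE=0.4248 ATE_err=0.0578 DirAcc=0.578
2026-04-09 04:54:25,199 [INFO] RMSE=0.3083 ATE_err=0.0097 DirAcc=0.556
2026-04-09 04:54:26,680 [INFO] RMSE=0.4416 ATE_err=0.0539 DirAcc=0.511
2026-04-09 04:54:27,973 [INFO] RMSE=0.3954 ATE_err=0.1642 DirAcc=0.511
2026-04-09 04:54:29,087 [INFO] RMSE=0.6695 ATE_err=0.3789 DirAcc=0.533
"""

# Hypothetical pattern: matches the name=value pairs seen in the log lines.
METRIC_RE = re.compile(r"(RMSE|ATE_err|DirAcc)=(-?\d+(?:\.\d+)?)")


def summarize(log_text):
    """Return {metric: (mean, population_std)} over all matching log lines."""
    values = {}
    for line in log_text.splitlines():
        for name, raw in METRIC_RE.findall(line):
            values.setdefault(name, []).append(float(raw))
    # pstdev is the population standard deviation (divides by n, not n - 1).
    return {name: (fmean(v), pstdev(v)) for name, v in values.items()}


summary = summarize(STEP_LINES)
for name, (mean, std) in summary.items():
    print(f"{name}: {mean:.4f}+/-{std:.3f}")
```

Under these assumptions the RMSE and ATE_err results agree with the `step` row of the table (0.4479+/-0.120 and 0.1329+/-0.133). A sample standard deviation (`statistics.stdev`) would give larger spreads, so it does not appear to be what the benchmark reports.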