Commit 7b1958c by mrshravan · verified · 1 Parent(s): bfbe51c

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +112 -50
README.md CHANGED
@@ -6,78 +6,140 @@ tags:
  - temporal-causal
  - causal-judgment
  license: other
- license_name: tcpfn-nc-1.0
  license_link: LICENSE
  ---

- # TCPFN: Temporal Causal Prior-Fitted Networks

- A causal reasoning foundation model: predicts effects, judges trustworthiness, operates zero-shot.

- ## Contributions
- 1. **Temporal Token Design** -- to our knowledge, first PFN for temporal panel data
- 2. **Causal Judgment Head** -- learned reliability signals (null detection, regime classification, identifiability)
- 3. **Causal Regime Prior** -- direct, confounded, mediated, feedback training structures
- 4. **Self-Calibration** -- auto-detects natural experiments in sensor data
- 5. **End-to-End System** -- discovery + estimation + judgment + RCA, all zero-shot

- ## Capabilities
- - **Causal Discovery**: pairwise interventional CATE with judgment-aware edge scoring, natural experiment detection, continuous treatment, multi-lag estimation, asymmetry penalty
- - **Effect Estimation**: temporal CATE trajectories with distributional output
- - **Causal Judgment**: null-effect detection (NullF1 0.94, NullAUROC 0.99, NullBrier 0.04, NullSep 0.86), regime classification (RegimeAcc 0.68, random 0.25). These are learned heuristics, not formal guarantees.
- - **Root Cause Analysis**: 8-method ensemble (AERCA, ESD, ProRCA, GCM noise, ICC, Shapley, counterfactual, chain tracing)

- ## Model
- - `models/temporal/final.pt`: One model for everything (v2.1, 200K steps, curriculum-trained)
- - Causal Judgment Head: 5 trained outputs (null prob, confounding, identifiability, mediation, regime)
- - Anti-hallucination: independent regime in training prior (FPR dropped from 1.0 to ~0.03)
- - Training: mixed_prior (40% CausalTimePrior + 30% base + 30% CausalFM)
- - Curriculum: Phase 1 CATE-only → Phase 2 +Null → Phase 3 Full judgment
- - Hardware: RTX 5090, ~4.1 hours, 13.9 steps/s

  ## Usage

  ```python
  from tcpfn import TemporalCausalAnalyzer

- analyzer = TemporalCausalAnalyzer(
-     temporal_model="models/temporal/final.pt",
- )

- # Causal discovery + effect estimation
  report = analyzer.run("sensor_data.csv")
- print(report.edges)  # causal graph with edge strengths and lags
- print(report.summary())  # human-readable summary

- # Root cause analysis for a specific event
  result = analyzer.explain_event(
      data_path="sensor_data.csv",
      target_var="temperature_sensor",
      event_time="2025-11-15 14:15",
  )
- print(result.summary())  # ranked root causes + causal chains
  ```

- ## Training Metrics (mean over steps 150K-200K)
- - EffectLoss: ~2.9 | JudgmentLoss: ~2.8
- - NullF1: 0.94 [0.83-1.00] | NullAUROC: 0.99 [0.83-1.00]
- - NullBrier: 0.04 [0.00-0.19] | NullSep: 0.86 [0.51-0.99]
- - RegimeAcc: 0.68 [0.40-0.90] | RegimeMacroF1: 0.48 [0.32-0.90]
-
- ## Discovery Benchmarks (14 datasets, 6 domains, all zero-shot)
- - Sachs (11 proteins, biological): F1 0.412, AUROC 0.725 (beats Granger 0.291/0.621)
- - Causal Rivers (environmental): F1 0.319, AUROC 0.955
- - Tennessee Eastman (52 vars, industrial): F1 0.314, AUROC 0.904
- - SWaT (51 vars, water treatment): F1 0.265, AUROC 0.859
- - CauseMe NVAR-5 (nonlinear): F1 0.571 (beats Granger 0.353)
- - CauseMe NVAR-10 (nonlinear): F1 0.439 (beats Granger 0.415)
- - Highest F1@default on 6 of 14 datasets
- - Hallucination FPR: 0.02-0.08 (was 1.0 in v2.0)
-
- ## Limitations
- - CATE estimation quality is weak (PEHE 0.92) due to per-group Z-standardization
- - Global standardization fix implemented, pending v3 retrain
- - Regime classification noisy (0.68 accuracy, eval variance)

  ## Paper
- Stalupula et al., "Temporal Causal Prior-Fitted Networks for Panel Data with Learned Reliability Signals"

  - temporal-causal
  - causal-judgment
  license: other
+ license_name: proprietary
  license_link: LICENSE
  ---

+ # TCPFN: Temporal Causal Prior-Data Fitted Networks

+ A family of causal reasoning foundation models that predict effects, judge trustworthiness, and operate zero-shot. Three checkpoints share one architecture (12-layer transformer, `embed_dim=512`, 8 heads, HL-Gaussian output head) and differ only in training-data distribution and curriculum.

+ ## Pick by task

+ | Task | Best checkpoint | Path |
+ |------|-----------------|------|
+ | General causal discovery (biology, cross-sectional, short-lag) | **v2.1** | `models/temporal/final.pt` |
+ | Industrial / long-range discovery (12+ h lags, e.g. digester → paper machine) | **v2.2** | `models/v2.2/final.pt` |
+ | Effect estimation (CATE / PEHE) | **v3** | `models/v3/final.pt` |

+ All three are zero-shot. Pick the one matching your task: a specialised checkpoint beats the generalist on every task we have measured.
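
The task-to-checkpoint table can be encoded as a small lookup so that scripts select a checkpoint by task name. This is a minimal sketch: the task keys and the helper `checkpoint_for` are hypothetical names chosen for illustration and are not part of the `tcpfn` package; only the three paths come from the table.

```python
# Hypothetical helper: maps a task label to the checkpoint path listed in the table.
# The task keys and the function name are illustrative assumptions, not tcpfn API.
CHECKPOINTS = {
    "general_discovery": "models/temporal/final.pt",    # v2.1
    "industrial_discovery": "models/v2.2/final.pt",     # v2.2
    "effect_estimation": "models/v3/final.pt",          # v3
}


def checkpoint_for(task: str) -> str:
    """Return the checkpoint path for a task, failing loudly on unknown tasks."""
    try:
        return CHECKPOINTS[task]
    except KeyError:
        valid = ", ".join(sorted(CHECKPOINTS))
        raise ValueError(f"unknown task {task!r}; expected one of: {valid}") from None


if __name__ == "__main__":
    print(checkpoint_for("effect_estimation"))
```

The returned path would then be passed as `temporal_model=` when constructing the analyzer, as in the Usage section.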
+
+ ## Shared contributions
+ 1. **Temporal Token Design**: to our knowledge, the first PFN for temporal panel data.
+ 2. **Causal Judgment Head**: learned reliability signals (null detection, regime classification, identifiability, mediation, confounding).
+ 3. **Causal Regime Prior**: direct / confounded / mediated / feedback training structures.
+ 4. **Self-Calibration**: auto-detects natural experiments in sensor data.
+ 5. **End-to-End System**: discovery + estimation + judgment + RCA from one forward pass.
+
+ ## Shared capabilities
+ - **Causal Discovery**: pairwise interventional CATE with judgment-aware edge scoring, natural-experiment detection, continuous treatment, multi-lag estimation, asymmetry penalty.
+ - **Effect Estimation**: temporal CATE trajectories with distributional output.
+ - **Causal Judgment**: null-effect detection, regime classification (learned heuristics, not formal guarantees).
+ - **Root Cause Analysis**: 8-method ensemble (AERCA, ESD, ProRCA, GCM noise, ICC, Shapley, counterfactual, chain tracing).
+
+ ---
+
+ ## v2.1: default discovery model
+ - 200K steps, curriculum-trained (Phase 1 CATE-only → Phase 2 +Null → Phase 3 Full).
+ - Mixed prior: 40% CausalTimePrior + 30% base + 30% CausalFM.
+ - Training window: `max_T_pre=50, max_T_post=30`.
+ - Hardware: RTX 5090, ~4.1 h, 13.9 steps/s.
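
One way to read the 40/30/30 mixed prior is that each synthetic training batch is drawn from one of the three priors with those probabilities. The sketch below assumes exactly that; the README does not state how the mixture is actually implemented, and `sample_prior` and `empirical_mix` are hypothetical helper names.

```python
import random
from collections import Counter

# Mixture weights as stated in the README; the per-batch sampling scheme is an assumption.
PRIOR_WEIGHTS = {"CausalTimePrior": 0.4, "base": 0.3, "CausalFM": 0.3}


def sample_prior(rng: random.Random) -> str:
    """Draw the name of the prior that would generate the next synthetic batch."""
    names = list(PRIOR_WEIGHTS)
    weights = list(PRIOR_WEIGHTS.values())
    return rng.choices(names, weights=weights, k=1)[0]


def empirical_mix(n_batches: int, seed: int = 0) -> dict:
    """Fraction of batches drawn from each prior over n_batches draws."""
    rng = random.Random(seed)
    counts = Counter(sample_prior(rng) for _ in range(n_batches))
    return {name: counts[name] / n_batches for name in PRIOR_WEIGHTS}


if __name__ == "__main__":
    # Over many draws the empirical fractions approach 0.4 / 0.3 / 0.3.
    print(empirical_mix(20_000))
```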
+
+ ### Discovery benchmarks (14 datasets, 6 domains, zero-shot)
+ - Sachs (11 proteins, biological): F1 0.412, **AUROC 0.725** (vs Granger F1 0.291 / AUROC 0.621); **champion**.
+ - Causal Rivers (environmental): F1 0.319, AUROC 0.955.
+ - Tennessee Eastman (52 vars, industrial): F1 0.314, AUROC 0.904.
+ - SWaT (51 vars, water treatment): F1 0.265, AUROC 0.859.
+ - CauseMe NVAR-5 / NVAR-10: F1 0.571 / 0.439.
+ - Highest default-threshold F1 on 6 of 14 datasets.
+ - Hallucination FPR: 0.02-0.08 (down from 1.0 in v2.0).
+
+ ### Training metrics (mean over steps 150K-200K)
+ - EffectLoss ~2.9 | JudgmentLoss ~2.8
+ - NullF1 0.94 | NullAUROC 0.99 | NullBrier 0.04 | NullSep 0.86
+ - RegimeAcc 0.68 | RegimeMacroF1 0.48
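
For readers unfamiliar with the Brier score, the sketch below shows the standard definition: the mean squared difference between a predicted probability and the 0/1 label, where lower is better. It assumes NullBrier is this standard score applied to the null-effect probability, which the README does not spell out; the numbers in the example are made up for illustration.

```python
def brier_score(probs, labels):
    """Mean squared error between predicted probabilities and binary labels."""
    if len(probs) != len(labels) or not probs:
        raise ValueError("probs and labels must be non-empty and the same length")
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


if __name__ == "__main__":
    # Illustrative values only: predicted null-effect probabilities and true labels
    # (1 = the effect is truly null, 0 = a real effect exists).
    predicted_null_prob = [0.9, 0.8, 0.1, 0.2]
    is_null = [1, 1, 0, 0]
    print(brier_score(predicted_null_prob, is_null))
```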
+
+ ### Limitations
+ - CATE estimation is weak (PEHE 0.92) due to per-group Z-standardisation; **use v3 for estimation**.
+
+ ---
+
+ ## v2.2: industrial / long-range specialist
+ Built for 12+ hour causal lags in industrial control loops (digester → paper machine, reactor → downstream controller). Training window extended up to 4× relative to v2.1 (`max_T_pre` 50 → 200, `max_T_post` 30 → 100) and curriculum rebalanced to include null-effect batches in Phase 2.
+
+ - 200K steps, BF16 mixed precision, `head_lr_scale=0.1` (decouples output-head learning from the backbone to prevent late-stage drift collapse).
+ - Training window: `max_T_pre=200, max_T_post=100, max_horizon=500`; supports lags up to ~16 h at 2-min sampling.
+ - Manual NaN-skip with observability (saves the first NaN-producing batch, aborts if the skip rate ≥ threshold).
+ - Hardware: RTX 5090, ~14.9 h.
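
The "~16 h at 2-min sampling" figure is consistent with multiplying `max_horizon` by the sampling interval. The sketch below makes that arithmetic explicit so it can be redone for other sampling rates; it assumes `max_horizon` is the parameter that bounds the reachable lag, which the README implies but does not state, and `max_lag_hours` is a hypothetical helper.

```python
def max_lag_hours(horizon_steps: int, sampling_minutes: float) -> float:
    """Longest lag, in hours, covered by a window of horizon_steps samples."""
    if horizon_steps < 0 or sampling_minutes <= 0:
        raise ValueError("horizon_steps must be >= 0 and sampling_minutes > 0")
    return horizon_steps * sampling_minutes / 60.0


if __name__ == "__main__":
    # v2.2: max_horizon=500 at 2-minute sampling -> 1000 minutes, about 16.7 hours.
    print(round(max_lag_hours(500, 2), 1))
    # The same window at 10-second sampling covers well under two hours.
    print(round(max_lag_hours(500, 10 / 60), 2))
```

At faster sampling rates the same number of steps spans much less wall-clock time, so data may need to be downsampled before hour-scale lags fall inside the window.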
+
+ ### Discovery benchmarks (default threshold 0.5)
+ Strong on industrial / multivariate temporal data. **Use this** when lags exceed ~1 h or when the data is genuinely time-series (not stitched cross-sectional).
+
+ | Dataset | Default F1 | Best F1 | AUROC |
+ |---------|-----------|---------|-------|
+ | Tennessee Eastman | **0.512** | 0.545 | **0.972** |
+ | SWaT | **0.463** | 0.552 | **0.945** |
+ | CauseMe VAR-5 | 0.769 | 0.800 | 0.960 |
+ | CauseMe NVAR-5 | 0.800 | 0.800 | 0.863 |
+ | CauseMe VAR-10 | 0.488 | 0.643 | 0.812 |
+ | CauseMe NVAR-10 | 0.634 | 0.634 | 0.759 |
+ | CauseMe Lorenz96-10 | 0.484 | 0.638 | 0.699 |
+ | Sachs | 0.174 | 0.308 | 0.565 |
+
+ Granger and PCMCI collapse on industrial data: they over-predict (1897 edges on TE vs 38 true), giving F1 ~0.04. Among the methods compared, TCPFN v2.2 is the only one with usable precision and recall together.
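
The F1 ~0.04 figure follows directly from the edge counts. As a sketch, take the most favourable case for the baseline, in which all 38 true edges are among the 1897 predicted ones (an assumption made here to get an upper bound, not a number reported in the README):

```python
def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 score from true-positive, false-positive and false-negative counts."""
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom


if __name__ == "__main__":
    predicted_edges = 1897   # edges predicted on Tennessee Eastman (from the README)
    true_edges = 38          # edges in the ground-truth graph (from the README)

    # Best case for the baseline (assumption): every true edge is recovered.
    tp = true_edges
    fp = predicted_edges - tp
    fn = 0

    precision = tp / (tp + fp)
    print(round(precision, 3))                   # precision is capped at 38 / 1897
    print(round(f1_from_counts(tp, fp, fn), 3))  # upper bound on F1
```

Even with perfect recall, precision cannot exceed 38 / 1897 ≈ 0.02, which caps F1 at about 0.039.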
+
+ ### Estimation benchmarks
+ - Overall PEHE 0.917 | ATE MAE 0.504 | trajectory correlation ≈ 0; **use v3 for CATE**.
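
For reference, PEHE (precision in estimation of heterogeneous effects) is commonly defined as the root mean squared error between true and estimated individual effects, and the ATE error as the absolute gap between their averages. The sketch below uses those common definitions; the README does not say whether its PEHE takes the square root or how outcomes are scaled, so treat the exact convention as an assumption. The effect values are made up.

```python
import math


def pehe(tau_true, tau_pred):
    """Root mean squared error between true and predicted individual effects."""
    if len(tau_true) != len(tau_pred) or not tau_true:
        raise ValueError("inputs must be non-empty and the same length")
    n = len(tau_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(tau_true, tau_pred)) / n)


def ate_abs_error(tau_true, tau_pred):
    """Absolute difference between the true and predicted average effect."""
    if len(tau_true) != len(tau_pred) or not tau_true:
        raise ValueError("inputs must be non-empty and the same length")
    n = len(tau_true)
    return abs(sum(tau_pred) / n - sum(tau_true) / n)


if __name__ == "__main__":
    # Illustrative values only.
    true_effects = [1.0, 2.0, 3.0, 4.0]
    predicted_effects = [1.0, 2.0, 3.0, 6.0]
    print(pehe(true_effects, predicted_effects))           # 1.0
    print(ate_abs_error(true_effects, predicted_effects))  # 0.5
```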
+
+ ### Limitations
+ - **Sachs regressed** vs v2.1 (AUROC 0.565 vs 0.725). Use v2.1 for cross-sectional biological graphs.
+ - Estimation degraded: v2.2 trades short-range precision for long-range reach (see scar-tissue entry L-33 in project docs).
+
+ ---
+
+ ## v3: estimation champion (experimental)
+ Tag: `3.0.0-exp-global-std`. Global standardisation fix for the per-group Z-score bias that caps v2.1/v2.2 estimation quality.
+
+ - 200K steps.
+ - **PEHE 0.72** (vs v2.1 0.92 and v2.2 0.92), the best of the three on CATE estimation.
+ - Discovery regressed slightly as a trade-off; not yet benchmarked across all 14 discovery datasets. **Use v2.1 or v2.2 for discovery**.
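
The toy example below illustrates, in general terms, why per-group Z-scoring can distort effect sizes while a single global scale does not. It is not the TCPFN preprocessing code: what counts as a "group" in the actual pipeline is not specified here, and the data and helper names are invented for illustration. Two groups have raw effects of 10 and 1; dividing each by its own standard deviation shrinks that 10:1 ratio to under 2:1, whereas a shared pooled scale preserves it.

```python
from statistics import fmean, pstdev

# Invented toy data: group A has a raw effect of 10, group B a raw effect of 1.
GROUPS = {
    "A": {"control": [0.0, 1.0, 2.0], "treated": [10.0, 11.0, 12.0]},
    "B": {"control": [0.0, 1.0, 2.0], "treated": [1.0, 2.0, 3.0]},
}


def scaled_effect(control, treated, scale):
    """Difference in means after dividing outcomes by a common scale."""
    return (fmean(treated) - fmean(control)) / scale


def per_group_effects(groups):
    """Each group is standardised with its own standard deviation."""
    return {
        name: scaled_effect(g["control"], g["treated"], pstdev(g["control"] + g["treated"]))
        for name, g in groups.items()
    }


def global_effects(groups):
    """All groups are standardised with one pooled standard deviation."""
    pooled = [y for g in groups.values() for y in g["control"] + g["treated"]]
    scale = pstdev(pooled)
    return {
        name: scaled_effect(g["control"], g["treated"], scale)
        for name, g in groups.items()
    }


if __name__ == "__main__":
    pg = per_group_effects(GROUPS)
    gl = global_effects(GROUPS)
    print("per-group ratio A/B:", round(pg["A"] / pg["B"], 2))
    print("global ratio A/B:   ", round(gl["A"] / gl["B"], 2))
```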
+
+ ### Limitations
+ - Experimental tag: the standardisation change is not yet battle-tested beyond estimation.
+ - Full benchmark matrix still pending.
+
+ ---

  ## Usage

  ```python
  from tcpfn import TemporalCausalAnalyzer

+ # Choose one checkpoint; each assignment below replaces the previous one.
+ # Discovery on general data (biology, cross-sectional)
+ analyzer = TemporalCausalAnalyzer(temporal_model="models/temporal/final.pt")
+
+ # Industrial / long-range discovery (lags in hours)
+ analyzer = TemporalCausalAnalyzer(temporal_model="models/v2.2/final.pt")
+
+ # Effect estimation (CATE trajectories, PEHE-sensitive work)
+ analyzer = TemporalCausalAnalyzer(temporal_model="models/v3/final.pt")

  report = analyzer.run("sensor_data.csv")
+ print(report.edges)  # causal graph with edge strengths and lags
+ print(report.summary())  # human-readable summary

  result = analyzer.explain_event(
      data_path="sensor_data.csv",
      target_var="temperature_sensor",
      event_time="2025-11-15 14:15",
  )
+ print(result.summary())  # ranked root causes + causal chains
  ```

+ ## Cross-cutting limitations
+ - Regime classification is noisy (~0.68 accuracy, high eval variance). Judgment heads are **learned heuristics**, not formal guarantees.
+ - Low-dim cross-sectional data stitched into pseudo-timeseries is out-of-distribution for v2.2 and v3; use v2.1.
+ - v3 has not yet been run on the full discovery benchmark suite.

  ## Paper
+ Stalupula et al., "Temporal Causal Prior-Data Fitted Networks for Panel Data with Learned Reliability Signals"