ProfitOpsAI/tcpfn

@@ -6,78 +6,140 @@ tags:
   - temporal-causal
   - causal-judgment
 license: other
-license_name: tcpfn-nc-1.0
 license_link: LICENSE
 ---
-# TCPFN — Temporal Causal Prior-Fitted Networks
-A causal reasoning foundation model — predicts effects, judges trustworthiness, operates zero-shot.
-## Contributions
-1. **Temporal Token Design** -- to our knowledge, first PFN for temporal panel data
-2. **Causal Judgment Head** -- learned reliability signals (null detection, regime classification, identifiability)
-3. **Causal Regime Prior** -- direct, confounded, mediated, feedback training structures
-4. **Self-Calibration** -- auto-detects natural experiments in sensor data
-5. **End-to-End System** -- discovery + estimation + judgment + RCA, all zero-shot
-## Capabilities
-- **Causal Discovery**: pairwise interventional CATE with judgment-aware edge scoring, natural experiment detection, continuous treatment, multi-lag estimation, asymmetry penalty
-- **Effect Estimation**: temporal CATE trajectories with distributional output
-- **Causal Judgment**: null-effect detection (NullF1 0.94, NullAUROC 0.99, NullBrier 0.04, NullSep 0.86), regime classification (RegimeAcc 0.68, random 0.25). These are learned heuristics, not formal guarantees.
-- **Root Cause Analysis**: 8-method ensemble (AERCA, ESD, ProRCA, GCM noise, ICC, Shapley, counterfactual, chain tracing)
-## Model
-- `models/temporal/final.pt` — One model for everything (v2.1, 200K steps, curriculum-trained)
-- Causal Judgment Head: 5 trained outputs (null prob, confounding, identifiability, mediation, regime)
-- Anti-hallucination: independent regime in training prior (FPR dropped from 1.0 to ~0.03)
-- Training: mixed_prior (40% CausalTimePrior + 30% base + 30% CausalFM)
-- Curriculum: Phase 1 CATE-only → Phase 2 +Null → Phase 3 Full judgment
-- Hardware: RTX 5090, ~4.1 hours, 13.9 steps/s
 ## Usage
 ```python
 from tcpfn import TemporalCausalAnalyzer
-analyzer = TemporalCausalAnalyzer(
-    temporal_model="models/temporal/final.pt",
-)
-# Causal discovery + effect estimation
 report = analyzer.run("sensor_data.csv")
-print(report.edges)          # causal graph with edge strengths and lags
-print(report.summary())      # human-readable summary
-# Root cause analysis for a specific event
 result = analyzer.explain_event(
     data_path="sensor_data.csv",
     target_var="temperature_sensor",
     event_time="2025-11-15 14:15",
 )
-print(result.summary())      # ranked root causes + causal chains
 ```
-## Training Metrics (mean over steps 150K-200K)
-- EffectLoss: ~2.9 | JudgmentLoss: ~2.8
-- NullF1: 0.94 [0.83-1.00] | NullAUROC: 0.99 [0.83-1.00]
-- NullBrier: 0.04 [0.00-0.19] | NullSep: 0.86 [0.51-0.99]
-- RegimeAcc: 0.68 [0.40-0.90] | RegimeMacroF1: 0.48 [0.32-0.90]
-## Discovery Benchmarks (14 datasets, 6 domains, all zero-shot)
-- Sachs (11 proteins, biological): F1 0.412, AUROC 0.725 (beats Granger 0.291/0.621)
-- Causal Rivers (environmental): F1 0.319, AUROC 0.955
-- Tennessee Eastman (52 vars, industrial): F1 0.314, AUROC 0.904
-- SWaT (51 vars, water treatment): F1 0.265, AUROC 0.859
-- CauseMe NVAR-5 (nonlinear): F1 0.571 (beats Granger 0.353)
-- CauseMe NVAR-10 (nonlinear): F1 0.439 (beats Granger 0.415)
-- Highest F1@default on 6 of 14 datasets
-- Hallucination FPR: 0.02-0.08 (was 1.0 in v2.0)
-## Limitations
-- CATE estimation quality is weak (PEHE 0.92) due to per-group Z-standardization
-- Global standardization fix implemented, pending v3 retrain
-- Regime classification noisy (0.68 accuracy, eval variance)
 ## Paper
-Stalupula et al., "Temporal Causal Prior-Fitted Networks for Panel Data with Learned Reliability Signals"

   - temporal-causal
   - causal-judgment
 license: other
+license_name: proprietary
 license_link: LICENSE
 ---
+# TCPFN — Temporal Causal Prior-Data Fitted Networks
+A family of causal reasoning foundation models that predict effects, judge trustworthiness, and operate zero-shot. Three checkpoints share one architecture (12-layer transformer, `embed_dim=512`, 8 heads, HL-Gaussian output head) and differ only in training-data distribution and curriculum.
+## Pick by task
+| Task | Best checkpoint | Path |
+|------|-----------------|------|
+| General causal discovery (biology, cross-sectional, short-lag) | **v2.1** | `models/temporal/final.pt` |
+| Industrial / long-range discovery (12+ h lags, digester→paper machine etc.) | **v2.2** | `models/v2.2/final.pt` |
+| Effect estimation (CATE / PEHE) | **v3** | `models/v3/final.pt` |
+All three are zero-shot. Pick the one matching your task: on every task we have measured, the specialised checkpoint beats the generalist one.
+## Shared contributions
+1. **Temporal Token Design** — first PFN for temporal panel data.
+2. **Causal Judgment Head** — learned reliability signals (null detection, regime classification, identifiability, mediation, confounding).
+3. **Causal Regime Prior** — direct / confounded / mediated / feedback training structures.
+4. **Self-Calibration** — auto-detects natural experiments in sensor data.
+5. **End-to-End System** — discovery + estimation + judgment + RCA from one forward pass.
+## Shared capabilities
+- **Causal Discovery** — pairwise interventional CATE with judgment-aware edge scoring, natural-experiment detection, continuous treatment, multi-lag estimation, asymmetry penalty.
+- **Effect Estimation** — temporal CATE trajectories with distributional output.
+- **Causal Judgment** — null-effect detection, regime classification (learned heuristics, not formal guarantees).
+- **Root Cause Analysis** — 8-method ensemble (AERCA, ESD, ProRCA, GCM noise, ICC, Shapley, counterfactual, chain tracing).
+---
+## v2.1 — default discovery model
+- 200K steps, curriculum-trained (Phase 1 CATE-only → Phase 2 +Null → Phase 3 Full).
+- Mixed prior: 40% CausalTimePrior + 30% base + 30% CausalFM.
+- Training window: `max_T_pre=50, max_T_post=30`.
+- Hardware: RTX 5090, ~4.1 h, 13.9 steps/s.
+### Discovery benchmarks (14 datasets, 6 domains, zero-shot)
+- Sachs (11 proteins, biological): F1 0.412, **AUROC 0.725** (vs Granger 0.291 / 0.621) — **champion**.
+- Causal Rivers (environmental): F1 0.319, AUROC 0.955.
+- Tennessee Eastman (52 vars, industrial): F1 0.314, AUROC 0.904.
+- SWaT (51 vars, water treatment): F1 0.265, AUROC 0.859.
+- CauseMe NVAR-5 / NVAR-10: F1 0.571 / 0.439.
+- Highest default-threshold F1 on 6 of 14 datasets.
+- Hallucination FPR: 0.02–0.08 (down from 1.0 in v2.0).
+### Training metrics (mean over steps 150K–200K)
+- EffectLoss ~2.9 | JudgmentLoss ~2.8
+- NullF1 0.94 | NullAUROC 0.99 | NullBrier 0.04 | NullSep 0.86
+- RegimeAcc 0.68 | RegimeMacroF1 0.48
+### Limitations
+- CATE estimation weak (PEHE 0.92) due to per-group Z-standardisation — **use v3 for estimation**.
+---
+## v2.2 — industrial / long-range specialist
+Built for 12+ hour causal lags in industrial control loops (digester → paper machine, reactor → downstream controller). Relative to v2.1, the training window is extended (`max_T_pre` 50 → 200, a 4× increase; `max_T_post` 30 → 100) and the curriculum is rebalanced to include null-effect batches in Phase 2.
+- 200K steps, BF16 mixed precision, `head_lr_scale=0.1` (decouples output-head learning from backbone to prevent late-stage drift collapse).
+- Training window: `max_T_pre=200, max_T_post=100, max_horizon=500` — supports lags up to ~16 h at 2-min sampling.
+- Manual NaN-skip with observability (saves first NaN-producing batch, aborts if skip rate ≥ threshold).
+- Hardware: RTX 5090, ~14.9 h.
+### Discovery benchmarks (default threshold 0.5)
+Strong on industrial / multivariate temporal data — **use this** when lags exceed ~1 h or when data is genuinely time-series (not stitched cross-sectional).
+| Dataset | Default F1 | Best F1 | AUROC |
+|---------|-----------|---------|-------|
+| Tennessee Eastman | **0.512** | 0.545 | **0.972** |
+| SWaT | **0.463** | 0.552 | **0.945** |
+| CauseMe VAR-5 | 0.769 | 0.800 | 0.960 |
+| CauseMe NVAR-5 | 0.800 | 0.800 | 0.863 |
+| CauseMe VAR-10 | 0.488 | 0.643 | 0.812 |
+| CauseMe NVAR-10 | 0.634 | 0.634 | 0.759 |
+| CauseMe Lorenz96-10 | 0.484 | 0.638 | 0.699 |
+| Sachs | 0.174 | 0.308 | 0.565 |
+Granger and PCMCI collapse on industrial data: they over-predict (1897 edges on Tennessee Eastman vs 38 true), giving F1 ~0.04. Of the methods compared, TCPFN v2.2 is the only one with usable precision and recall together.
+### Estimation benchmarks
+- Overall PEHE 0.917 | ATE MAE 0.504 | trajectory correlation ≈ 0 — **use v3 for CATE**.
+### Limitations
+- **Sachs regressed** vs v2.1 (AUROC 0.565 vs 0.725). Use v2.1 for cross-sectional biological graphs.
+- Estimation degraded — trades short-range precision for long-range reach (see scar-tissue entry L-33 in project docs).
+---
+## v3 — estimation champion (experimental)
+Tag: `3.0.0-exp-global-std`. Global standardisation fix for the per-group Z-score bias that caps v2.1/v2.2 estimation quality.
+- 200K steps.
+- **PEHE 0.72** (vs v2.1 0.92 and v2.2 0.92) — best of the three on CATE estimation.
+- Discovery regressed slightly as trade-off; not yet benchmarked across all 14 discovery datasets — **use v2.1 or v2.2 for discovery**.
+### Limitations
+- Experimental tag — standardisation change not yet battle-tested beyond estimation.
+- Full benchmark matrix still pending.
+---
 ## Usage
 ```python
 from tcpfn import TemporalCausalAnalyzer
+# Discovery on general data (biology, cross-sectional)
+analyzer = TemporalCausalAnalyzer(temporal_model="models/temporal/final.pt")
+# Industrial / long-range discovery (lags in hours)
+analyzer = TemporalCausalAnalyzer(temporal_model="models/v2.2/final.pt")
+# Effect estimation (CATE trajectories, PEHE-sensitive work)
+analyzer = TemporalCausalAnalyzer(temporal_model="models/v3/final.pt")
 report = analyzer.run("sensor_data.csv")
+print(report.edges)      # causal graph with edge strengths and lags
+print(report.summary())  # human-readable summary
 result = analyzer.explain_event(
     data_path="sensor_data.csv",
     target_var="temperature_sensor",
     event_time="2025-11-15 14:15",
 )
+print(result.summary())  # ranked root causes + causal chains
 ```
+## Cross-cutting limitations
+- Regime classification is noisy (~0.68 accuracy, high eval variance). Judgment heads are **learned heuristics**, not formal guarantees.
+- Low-dim cross-sectional data stitched into pseudo-timeseries is out-of-distribution for v2.2 and v3; use v2.1.
+- v3 has not yet been run on the full discovery benchmark suite.
 ## Paper
+Stalupula et al., "Temporal Causal Prior-Data Fitted Networks for Panel Data with Learned Reliability Signals"
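
Two figures quoted in the v2.2 section of the new README can be sanity-checked with simple arithmetic: the lag reach implied by `max_horizon=500` at 2-minute sampling, and the F1 ceiling for a baseline that predicts 1897 edges when only 38 are true. The sketch below is illustrative and not part of the `tcpfn` package. The helper names `horizon_hours` and `f1_upper_bound` are hypothetical, and it assumes that `max_horizon` counts sampling steps and that the 1897-edge graph comes from one of the baselines named in the card.

```python
def horizon_hours(max_horizon_steps: int, sampling_minutes: float) -> float:
    """Convert a horizon measured in sampling steps into hours.

    Assumes max_horizon is a step count (not stated explicitly in the card).
    """
    return max_horizon_steps * sampling_minutes / 60.0


def f1_upper_bound(n_predicted: int, n_true: int) -> float:
    """Best F1 reachable from edge counts alone.

    F1 = 2*TP / (predicted + true), and TP can be at most
    min(predicted, true), so this is a ceiling, not the measured score.
    """
    total = n_predicted + n_true
    if total == 0:
        return 0.0  # no predicted and no true edges: define F1 as 0
    max_tp = min(n_predicted, n_true)
    return 2 * max_tp / total


# 500 steps at 2-minute sampling
print(round(horizon_hours(500, 2), 1))      # 16.7

# 1897 predicted edges vs 38 true edges on Tennessee Eastman
print(round(f1_upper_bound(1897, 38), 3))   # 0.039
```

Under these assumptions the horizon works out to about 16.7 hours, in line with the "~16 h" quoted, and even a baseline that recovered every true edge could not exceed F1 of about 0.039 with that many predictions, which matches the reported F1 ~0.04.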