Upload results/ralph23_method_comparison_summary.md with huggingface_hub
results/ralph23_method_comparison_summary.md
ADDED
@@ -0,0 +1,141 @@
# Ralph23: Comparative Summary — Multi-Method PFAM Validation

**Generated:** 2026-02-12 01:30:07
**Script:** `ralph23_t09_method_comparison_20260212_012813.py`
**CV:** 9-fold spatial cross-validation (n=1,151 bio-valid samples)

## Overall Conclusion: **SUGGESTIVE**

Evidence is suggestive but not confirmed: one of three success criteria was met.

**Success criteria met:** 1/3

- Criterion 1 (any p<0.05 positive for POC): **NOT MET**
- Criterion 2 (≥3 methods with positive Δ for POC): **NOT MET**
- Criterion 3 (permutation test p<0.05): **MET** (pfam20 only, p = 0.015; the primary pfam32 test gives p = 0.623)

## POC Prediction (Primary Target)

| Method | PFAM dim | Env-only R² | Joint R² | ΔR² | p (t-test) | Cohen's d | Folds +/- |
|--------|----------|-------------|----------|-----|------------|-----------|-----------|
| ElasticNet_inter | 32 | -6.348 | -11.550 | -5.2017 | 0.336 | -0.341 | 4+/5- |
| ElasticNet_inter | 64 | -6.348 | -11.391 | -5.0429 | 0.388 | -0.304 | 4+/5- |
| OLS_decomp | 20 | -6.135 | -6.109 | +0.0259 | 0.895 | +0.046 | 1+/8- |
| OLS_decomp | 32 | -6.135 | -9.190 | -3.0553 | 0.132 | -0.559 | 0+/9- |
| OLS_decomp | 64 | -6.135 | -9.780 | -3.6451 | 0.033* | -0.855 | 0+/9- |
| Stacking | 20 | 0.630 | 0.618 | -0.0127 | 0.744 | -0.113 | 3+/6- |
| Stacking | 32 | 0.630 | 0.530 | -0.1007 | 0.320 | -0.353 | 4+/5- |
| Stacking | 64 | 0.630 | 0.464 | -0.1667 | 0.163 | -0.513 | 2+/7- |
| VICReg | 20 | 0.417 | -2.045 | -2.4616 | 0.065 | -0.712 | 2+/7- |
| VICReg | 32 | 0.417 | -4.217 | -4.6345 | 0.114 | -0.591 | 2+/7- |
| VICReg | 64 | 0.417 | -1.262 | -1.6790 | 0.078 | -0.674 | 1+/8- |
| XGBoost_fusion | 20 | 0.630 | 0.500 | -0.1309 | 0.202 | -0.463 | 3+/6- |
| XGBoost_fusion | 32 | 0.630 | 0.625 | -0.0050 | 0.915 | -0.037 | 5+/4- |
| XGBoost_fusion | 64 | 0.630 | 0.475 | -0.1558 | 0.525 | -0.222 | 6+/3- |

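The ΔR², Cohen's d, and Folds +/- columns can be related to per-fold scores roughly as follows. This is a minimal sketch, not the report's script: it assumes ΔR² is the mean of per-fold (joint minus env-only) differences and that Cohen's d is the paired effect size (mean difference divided by the standard deviation of the differences). The function name and the fold values are hypothetical.

```python
import math
from statistics import mean, stdev


def paired_fold_stats(env_r2, joint_r2):
    """Summarise joint-minus-env R² differences across CV folds."""
    if len(env_r2) != len(joint_r2) or len(env_r2) < 2:
        raise ValueError("need two equal-length sequences with at least 2 folds")
    deltas = [j - e for e, j in zip(env_r2, joint_r2)]
    n = len(deltas)
    sd = stdev(deltas)  # sample SD (n - 1 denominator)
    if sd == 0:
        raise ValueError("all fold differences are identical")
    delta_mean = mean(deltas)
    cohens_d = delta_mean / sd                 # paired effect size (assumed definition)
    t_stat = delta_mean / (sd / math.sqrt(n))  # paired t statistic, df = n - 1
    n_pos = sum(d > 0 for d in deltas)         # zero counts as non-positive
    return {
        "delta_r2": delta_mean,
        "cohens_d": cohens_d,
        "t": t_stat,
        "df": n - 1,
        "folds": f"{n_pos}+/{n - n_pos}-",
    }


# Hypothetical per-fold R² values for 9 folds (not taken from the report).
env = [0.62, 0.58, 0.70, 0.65, 0.61, 0.66, 0.59, 0.68, 0.63]
joint = [0.60, 0.59, 0.66, 0.61, 0.62, 0.60, 0.55, 0.69, 0.58]
stats = paired_fold_stats(env, joint)
print(stats)
```

Under these definitions t = d·√n, so with 9 folds t = 3d. The table is consistent with that reading: OLS_decomp at 64 dims has d = -0.855, giving t ≈ -2.57, which exceeds the two-sided 5% critical value of 2.306 for 8 degrees of freedom and matches the starred p = 0.033.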
## Chl-a Prediction

| Method | PFAM dim | Env-only R² | Joint R² | ΔR² | p (t-test) | Cohen's d | Folds +/- |
|--------|----------|-------------|----------|-----|------------|-----------|-----------|
| ElasticNet_inter | 32 | -7.036 | -48.388 | -41.3518 | 0.305 | -0.365 | 1+/8- |
| ElasticNet_inter | 64 | -7.036 | -15.282 | -8.2452 | 0.170 | -0.503 | 1+/8- |
| OLS_decomp | 20 | -48.545 | -48.530 | +0.0150 | 0.952 | +0.021 | 4+/5- |
| OLS_decomp | 32 | -48.545 | -38.793 | +9.7514 | 0.397 | +0.298 | 2+/7- |
| OLS_decomp | 64 | -48.545 | -41.885 | +6.6595 | 0.486 | +0.243 | 1+/8- |
| Stacking | 20 | 0.079 | 0.067 | -0.0125 | 0.699 | -0.134 | 3+/6- |
| Stacking | 32 | 0.079 | 0.056 | -0.0235 | 0.363 | -0.322 | 3+/6- |
| Stacking | 64 | 0.079 | 0.067 | -0.0117 | 0.747 | -0.111 | 4+/5- |
| VICReg | 20 | 0.337 | -3.454 | -3.7911 | 0.143 | -0.542 | 1+/8- |
| VICReg | 32 | 0.337 | -7.898 | -8.2350 | 0.147 | -0.536 | 1+/8- |
| VICReg | 64 | 0.337 | -3.879 | -4.2160 | 0.095 | -0.630 | 1+/8- |
| XGBoost_fusion | 20 | 0.079 | 0.078 | -0.0015 | 0.990 | -0.004 | 4+/5- |
| XGBoost_fusion | 32 | 0.079 | -0.319 | -0.3985 | 0.137 | -0.551 | 2+/7- |
| XGBoost_fusion | 64 | 0.079 | -0.595 | -0.6744 | 0.167 | -0.507 | 2+/7- |

## NFLH Prediction

| Method | PFAM dim | Env-only R² | Joint R² | ΔR² | p (t-test) | Cohen's d | Folds +/- |
|--------|----------|-------------|----------|-----|------------|-----------|-----------|
| ElasticNet_inter | 32 | 0.946 | 0.799 | -0.1473 | 0.040* | -0.817 | 1+/8- |
| ElasticNet_inter | 64 | 0.946 | 0.625 | -0.3210 | 0.032* | -0.865 | 0+/9- |
| OLS_decomp | 20 | 0.956 | 0.955 | -0.0006 | 0.775 | -0.099 | 3+/6- |
| OLS_decomp | 32 | 0.956 | 0.957 | +0.0017 | 0.545 | +0.211 | 4+/5- |
| OLS_decomp | 64 | 0.956 | 0.955 | -0.0010 | 0.797 | -0.089 | 3+/6- |
| Stacking | 20 | 0.388 | 0.427 | +0.0387 | 0.588 | +0.188 | 4+/5- |
| Stacking | 32 | 0.388 | 0.430 | +0.0419 | 0.589 | +0.188 | 4+/5- |
| Stacking | 64 | 0.388 | 0.429 | +0.0406 | 0.614 | +0.175 | 4+/5- |
| VICReg | 20 | 0.518 | -0.008 | -0.5257 | 0.151 | -0.530 | 1+/8- |
| VICReg | 32 | 0.518 | 0.043 | -0.4746 | 0.164 | -0.511 | 1+/8- |
| VICReg | 64 | 0.518 | -0.000 | -0.5182 | 0.104 | -0.610 | 1+/8- |
| XGBoost_fusion | 20 | 0.384 | 0.209 | -0.1755 | 0.321 | -0.353 | 3+/6- |
| XGBoost_fusion | 32 | 0.384 | 0.321 | -0.0629 | 0.500 | -0.235 | 5+/4- |
| XGBoost_fusion | 64 | 0.384 | 0.208 | -0.1757 | 0.293 | -0.375 | 2+/7- |

## Permutation Test (POC, Task 7)

| PFAM dim | n perms | Real ΔR² | Null mean ± SD | p-value | z-score |
|----------|---------|----------|----------------|---------|---------|
| pfam20 | 200 | +0.1050 | -0.0247 ± 0.0509 | 0.015* | +2.55 |
| pfam32 | 1000 | +0.0088 | +0.0251 ± 0.0661 | 0.623 | -0.25 |
| pfam64 | 200 | +0.1023 | +0.0812 ± 0.0373 | 0.270 | +0.57 |

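The z-scores in the permutation table can be recomputed from the reported real ΔR² and null mean ± SD. The sketch below does that and also shows one common way to turn a null distribution into a one-sided p-value. The function names are hypothetical. Whether the original script counts ties or applies the "+1" correction is not stated in this report, so both p-value variants are offered as assumptions.

```python
def z_from_summary(real, null_mean, null_sd):
    """Standardised distance of the real statistic from the null mean."""
    return (real - null_mean) / null_sd


def permutation_p(real, null_values, add_one=False):
    """One-sided p-value: share of permuted statistics >= the real one."""
    hits = sum(v >= real for v in null_values)
    if add_one:  # conservative variant that never returns exactly 0
        return (hits + 1) / (len(null_values) + 1)
    return hits / len(null_values)


# Reported rows from the table: (real ΔR², null mean, null SD).
rows = {
    "pfam20": (0.1050, -0.0247, 0.0509),
    "pfam32": (0.0088, 0.0251, 0.0661),
    "pfam64": (0.1023, 0.0812, 0.0373),
}
z_scores = {dim: round(z_from_summary(*vals), 2) for dim, vals in rows.items()}
print(z_scores)  # {'pfam20': 2.55, 'pfam32': -0.25, 'pfam64': 0.57}

# Toy null distribution (hypothetical values) to illustrate the p-value.
toy_null = [0.0, 0.1, 0.2, 0.3]
print(permutation_p(0.2, toy_null))                # 0.5
print(permutation_p(0.2, toy_null, add_one=True))  # 0.6
```

The recomputed z-scores agree with the table to two decimals.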
## Sign Consistency Across Methods (POC)

**Total method×dim comparisons for POC:** 14
**Positive ΔR²:** 1 (7.1%)
**Negative/zero ΔR²:** 13 (92.9%)
**Binomial sign test (1-sided, H1: positive):** p = 0.9999

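The sign-test p-value follows directly from the binomial distribution. As a minimal sketch (the function name is hypothetical), assuming a null probability of 0.5 per comparison and counting zero differences as non-positive, as the "Negative/zero" label suggests:

```python
from math import comb


def sign_test_one_sided(n_positive, n_total, p_null=0.5):
    """P(X >= n_positive) for X ~ Binomial(n_total, p_null)."""
    return sum(
        comb(n_total, k) * p_null**k * (1 - p_null) ** (n_total - k)
        for k in range(n_positive, n_total + 1)
    )


# 1 positive ΔR² out of 14 POC comparisons, H1: more positives than chance.
p_positive = sign_test_one_sided(1, 14)
# Mirror test, H1: fewer positives than chance, i.e. P(X <= 1).
p_negative = 1.0 - sign_test_one_sided(2, 14)

print(round(p_positive, 4))  # 0.9999
print(round(p_negative, 4))  # 0.0009
```

Note that the sign test treats the 14 comparisons as independent, although they share the same samples and folds, so both tail probabilities are best read as descriptive.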
### Per-method summary (POC):

| Method | Dims tested | Dims positive | Best ΔR² | Verdict |
|--------|-------------|---------------|----------|---------|
| ElasticNet_inter | 2 | 0 | -5.0429 | All - |
| OLS_decomp | 3 | 1 | +0.0259 | Some + |
| Stacking | 3 | 0 | -0.0127 | All - |
| VICReg | 3 | 0 | -1.6790 | All - |
| XGBoost_fusion | 3 | 0 | -0.0050 | All - |

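The per-method summary can be derived from the POC ΔR² column alone. The sketch below uses the values from the POC table. The function name is hypothetical, and the "All +" label is an assumption, since no method reaches that case in this report.

```python
# POC ΔR² by method and PFAM dimensionality, copied from the POC table.
poc_delta = {
    "ElasticNet_inter": {32: -5.2017, 64: -5.0429},
    "OLS_decomp": {20: 0.0259, 32: -3.0553, 64: -3.6451},
    "Stacking": {20: -0.0127, 32: -0.1007, 64: -0.1667},
    "VICReg": {20: -2.4616, 32: -4.6345, 64: -1.6790},
    "XGBoost_fusion": {20: -0.1309, 32: -0.0050, 64: -0.1558},
}


def summarise(deltas_by_dim):
    """Return (dims tested, dims positive, best ΔR², verdict) for one method."""
    values = list(deltas_by_dim.values())
    n_pos = sum(v > 0 for v in values)
    if n_pos == 0:
        verdict = "All -"
    elif n_pos == len(values):
        verdict = "All +"  # assumed label; this case does not occur in the report
    else:
        verdict = "Some +"
    return len(values), n_pos, max(values), verdict


summary = {method: summarise(d) for method, d in poc_delta.items()}
total = sum(s[0] for s in summary.values())
positive = sum(s[1] for s in summary.values())

for method, row in summary.items():
    print(method, row)
print(f"{positive}/{total} comparisons positive")  # 1/14 comparisons positive
```

The totals reproduce the 14 comparisons and the single positive ΔR² quoted in the sign-consistency section.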
## Key Findings

### 1. No method achieves significant positive PFAM contribution for POC

Across 5 methods and up to 3 PFAM dimensionalities (14 total POC comparisons; ElasticNet_inter was run at 32 and 64 dims only), no method achieves a statistically significant (p < 0.05) positive improvement from adding PFAM features to environmental predictors for POC prediction. The only positive ΔR² is OLS_decomp at pfam20 (+0.0259, p = 0.895).

### 2. XGBoost late fusion shows PFAM features hurt or are neutral
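
The strongest env-only baseline (XGBoost, R² = 0.630 in the POC table) is not improved by PFAM concatenation at any dimensionality. The smallest degradation occurs with pfam32 (ΔR² = -0.005, p = 0.915), while pfam20 (ΔR² = -0.131, p = 0.202) and pfam64 (ΔR² = -0.156, p = 0.525) show larger decreases. None of these differences is statistically significant, so the evidence points to PFAM features being neutral to mildly harmful rather than clearly harmful.
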
### 3. Stacking meta-learner assigns positive weight to PFAM but overall performance degrades

The Ridge meta-learner consistently assigns positive weight to PFAM predictions (5-16% of total weight), particularly at higher dimensions. However, the PFAM-only base models are too noisy (deeply negative R²) for this signal to translate into improved prediction on held-out spatial folds: stacking ΔR² for POC is negative at all three dimensionalities (-0.013, -0.101, -0.167; all p > 0.16).

### 4. Linear methods confirm no linear PFAM contribution

Neither OLS variance decomposition nor ElasticNet with interaction terms shows a significant positive PFAM contribution for any target. For POC, OLS_decomp at pfam64 is significantly *negative* (ΔR² = -3.645, p = 0.033). ElasticNet also produces significant *negative* effects for NFLH (p = 0.032-0.040), which is consistent with high-dimensional interaction features introducing harmful overfitting.

### 5. Permutation test: pfam20 nominally significant, pfam32/64 not

The permutation test (POC, pooled R²) shows pfam20 at p = 0.015 (nominally significant), but this was a secondary analysis with only 200 permutations, and no multiple-comparison correction across the three dimensionalities is reported. The primary pfam32 test (1000 permutations) yields p = 0.623. The pfam64 null distribution is centered at +0.081, so permuted features already raise pooled R² by nearly as much as the real ones (+0.102, p = 0.270). This suggests high-dimensional features act as noise regularization rather than providing genuine signal.

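One way to quantify the "only 200 permutations" caveat is the Monte Carlo standard error of an estimated p-value. The sketch below uses the usual binomial approximation, sqrt(p(1 - p)/n). It is an illustration added here, not part of the original analysis, and the function name is hypothetical.

```python
import math


def mc_standard_error(p_hat, n_perms):
    """Binomial standard error of a permutation p-value estimate."""
    return math.sqrt(p_hat * (1.0 - p_hat) / n_perms)


se_pfam20 = mc_standard_error(0.015, 200)   # secondary test, about 0.0086
se_pfam32 = mc_standard_error(0.623, 1000)  # primary test, about 0.0153

print(f"pfam20: p = 0.015 +/- {se_pfam20:.4f}")
print(f"pfam32: p = 0.623 +/- {se_pfam32:.4f}")
```

With a standard error near 0.0086, the pfam20 estimate sits well within one standard error of a Bonferroni-style threshold of 0.05/3 ≈ 0.0167, so its significance would be sensitive to both resampling noise and any correction for testing three dimensionalities.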
### 6. VICReg dramatically underperforms XGBoost baseline

VICReg produces deeply negative mean joint R² for POC (-1.3 to -4.2, versus an env-only baseline of 0.417 in the same rows) and for Chl-a (-3.5 to -7.9, versus 0.337), and near-zero joint R² for NFLH (-0.008 to 0.043, versus 0.518). This is consistent with an architecture confound: the MLP-based VICReg model generalizes poorly across spatial folds compared to XGBoost for tabular environmental data.

### 7. NFLH shows the most consistent (but non-significant) positive signal via stacking
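
Stacking improves NFLH by ΔR² ≈ +0.04 across all three PFAM dimensionalities, with the PFAM coefficient positive in 9/9 folds. However, the effect is non-significant (p = 0.59 to 0.61), and only 4 of 9 folds improve at each dimensionality. The stacking env-only baseline (R² = 0.388) is also far below the OLS_decomp env-only baseline for NFLH (R² = 0.956), so even the improved stacking model (R² ≈ 0.43) remains well short of the best env-only result.
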
## Summary Statistics

- **Methods tested:** 5 (XGBoost fusion, Stacking, OLS decomp, ElasticNet interactions, VICReg)
- **PFAM dimensionalities:** 3 (20 modules, 32 PCs, 64 PCs)
- **Total POC comparisons:** 14
- **POC comparisons with ΔR² > 0:** 1/14 (7.1%)
- **POC comparisons with p < 0.05 (any direction):** 1/14
- **POC comparisons with p < 0.05 AND ΔR² > 0:** 0/14
- **POC comparisons with p < 0.05 AND ΔR² < 0:** 1/14
- **Permutation test (primary, pfam32):** p = 0.623
- **Permutation test (secondary, pfam20):** p = 0.015
- **Binomial sign test for POC (1-sided):** p = 0.9999
- **Overall conclusion:** **SUGGESTIVE**