# Alpha Robustness Matrix for Kaggle Offline
This workflow is intentionally kept separate from the stable backtest flow.
Use these files:
- `/Users/giaphu/Documents/Pylab/AAE-new/deploy/v2/jsonl_alpha_robustness.py`
- `/Users/giaphu/Documents/Pylab/AAE-new/deploy/v2/run_alpha_robustness_matrix.py`
- `/Users/giaphu/Documents/Pylab/AAE-new/backtest/robust_factor_executor.py`
- `/Users/giaphu/Documents/Pylab/AAE-new/backtest/robust_qlib_backtester.py`
The top-10 alpha pack prepared from `test7` is bundled here:
- `/Users/giaphu/Documents/Pylab/AAE-new/data/robustness_inputs/test7_top10_best_return_alpha_pack.jsonl`
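Before a long run, you can sanity-check the bundled pack by counting its records and looking at the fields of the first one. This is a minimal sketch that assumes the file is standard JSON Lines (one JSON object per line). The `load_jsonl` helper is illustrative and does not assume any particular field names. The path is the Kaggle dataset location used in the notebook cells below.

```python
import json
from pathlib import Path

# Kaggle location of the bundled pack (same path as in the notebook cells below).
INPUT_JSONL = Path(
    "/kaggle/input/datasets/gplebih/aae-new/data/robustness_inputs/"
    "test7_top10_best_return_alpha_pack.jsonl"
)


def load_jsonl(path: Path) -> list:
    """Parse a JSON Lines file, skipping blank lines and reporting the bad line on error."""
    records = []
    with path.open(encoding="utf-8") as fh:
        for line_no, line in enumerate(fh, start=1):
            line = line.strip()
            if not line:
                continue
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}:{line_no}: invalid JSON ({exc})") from exc
    return records


if INPUT_JSONL.exists():
    pack = load_jsonl(INPUT_JSONL)
    print("records:", len(pack))
    first_keys = sorted(pack[0]) if pack and isinstance(pack[0], dict) else []
    print("keys of first record:", first_keys)
```

If the pack holds one alpha per line, the record count should match the "top-10" in its name.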
## What the standard matrix runs
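The standard matrix omits variants that would only duplicate the baseline. For example, there is no separate `top_k = 5`, 5-day rebalance, or equal-weight case, because `baseline_replay` already runs with those settings.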
Cases included:
- `baseline_replay`
- `topk_10`
- `topk_15`
- `weight_alpha_score_cap20`
- `weight_alpha_score_no_cap`
- `fee_0bps`
- `fee_10bps`
- `fee_20bps`
- `fee_30bps`
- `fee_50bps`
- `rebalance_10d`
- `rebalance_20d`
- `score_rank`
- `score_zscore`
- `score_rank_zscore`
- `frozen_recent_2026_ytd`
The baseline is:
- `backtest_engine = custom`
- `top_k = 5`
- `rebalance_freq = 5`
- `custom_weight_mode = equal`
- `max_pos_each_stock = 0.2`
- `buy_fee = sell_fee = 0.0013` (13 bps per side)
- `enforce_cash_limit = True`
- `volume/amount limits = off`
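The case list and the baseline can be read together as "baseline plus one override per case". The deduplication rule follows from that reading: any override equal to the baseline is dropped. The sketch below is a hypothetical illustration and is not the code in `run_alpha_robustness_matrix.py`. The baseline values and case names come from the lists above. The candidate values per axis, the `build_cases` and `bps_to_fee` helpers, and the dict layout are assumptions. Only the `topk_*`, `fee_*`, and `rebalance_*` axes are modeled, because the parameters behind the weight, score, and frozen-period cases are not documented here.

```python
from typing import Any

# Baseline values copied from the list above (volume/amount limits omitted:
# the parameter name for them is not documented here).
BASELINE: dict = {
    "backtest_engine": "custom",
    "top_k": 5,
    "rebalance_freq": 5,
    "custom_weight_mode": "equal",
    "max_pos_each_stock": 0.2,
    "buy_fee": 0.0013,
    "sell_fee": 0.0013,
    "enforce_cash_limit": True,
}


def bps_to_fee(bps: int) -> float:
    """Convert basis points to a fractional fee (13 bps -> 0.0013)."""
    return bps / 10_000


def build_cases(baseline: dict) -> dict:
    """Expand hypothetical axes into case configs, skipping variants equal to the baseline."""
    candidates: list = []
    for k in (5, 10, 15):  # assumed candidate values for top_k
        candidates.append((f"topk_{k}", {"top_k": k}))
    for bps in (0, 10, 20, 30, 50):
        fee = bps_to_fee(bps)
        candidates.append((f"fee_{bps}bps", {"buy_fee": fee, "sell_fee": fee}))
    for days in (5, 10, 20):  # assumed candidate values for rebalance_freq
        candidates.append((f"rebalance_{days}d", {"rebalance_freq": days}))

    cases: dict = {"baseline_replay": dict(baseline)}
    for case_id, overrides in candidates:
        if all(baseline.get(key) == value for key, value in overrides.items()):
            continue  # identical to baseline_replay, so it would be a duplicate run
        cases[case_id] = {**baseline, **overrides}
    return cases


cases = build_cases(BASELINE)
print(sorted(cases))
```

With these assumed candidates, `topk_5` and `rebalance_5d` are filtered out, which matches their absence from the case list above.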
## Kaggle offline notebook cells
### 1. Install offline runtime
```python
import sys
import subprocess
from pathlib import Path
REPO = Path("/kaggle/input/datasets/gplebih/aae-new")
subprocess.run(
    [sys.executable, str(REPO / "deploy" / "v2" / "install_kaggle_offline.py")],
    check=True,
)
```
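Before the long-running cell, an optional preflight cell can confirm that the dataset is mounted where the cells expect it. This is a sketch. `missing_inputs` is a hypothetical helper, and its file list contains only the three paths that the run cells reference.

```python
from pathlib import Path

REPO = Path("/kaggle/input/datasets/gplebih/aae-new")


def missing_inputs(repo: Path) -> list:
    """Return the files the matrix cells rely on that are absent under `repo`."""
    required = [
        repo / "deploy" / "v2" / "run_alpha_robustness_matrix.py",
        repo / "data" / "robustness_inputs" / "test7_top10_best_return_alpha_pack.jsonl",
        repo / "backtest" / "data" / "daily_pv.h5",
    ]
    return [path for path in required if not path.exists()]


missing = missing_inputs(REPO)
print("missing inputs:", [str(p) for p in missing] or "none")
```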
### 2. Run the full standard matrix on the bundled top-10 alpha pack
```python
import sys
import subprocess
from pathlib import Path
REPO = Path("/kaggle/input/datasets/gplebih/aae-new")
INPUT_JSONL = REPO / "data" / "robustness_inputs" / "test7_top10_best_return_alpha_pack.jsonl"
DATA_PATH = REPO / "backtest" / "data" / "daily_pv.h5"
OUTPUT_ROOT = Path("/kaggle/working/alpha_robustness/test7_top10_standard_matrix")
cmd = [
    sys.executable,
    str(REPO / "deploy" / "v2" / "run_alpha_robustness_matrix.py"),
    "--jsonl", str(INPUT_JSONL),
    "--data-path", str(DATA_PATH),
    "--period", "test",
    "--backtest-workers", "4",
    "--capture-detail-artifacts",
    "--output-root", str(OUTPUT_ROOT),
]
subprocess.run(cmd, check=True)
print("OUTPUT_ROOT =", OUTPUT_ROOT)
```
### 3. Debug on a few cases first
```python
import sys
import subprocess
from pathlib import Path
REPO = Path("/kaggle/input/datasets/gplebih/aae-new")
INPUT_JSONL = REPO / "data" / "robustness_inputs" / "test7_top10_best_return_alpha_pack.jsonl"
DATA_PATH = REPO / "backtest" / "data" / "daily_pv.h5"
OUTPUT_ROOT = Path("/kaggle/working/alpha_robustness/test7_top10_smoke")
cmd = [
    sys.executable,
    str(REPO / "deploy" / "v2" / "run_alpha_robustness_matrix.py"),
    "--jsonl", str(INPUT_JSONL),
    "--data-path", str(DATA_PATH),
    "--period", "test",
    "--backtest-workers", "2",
    "--case-filter", "baseline_replay,topk_10,weight_alpha_score_cap20",
    "--capture-detail-artifacts",
    "--output-root", str(OUTPUT_ROOT),
]
subprocess.run(cmd, check=True)
```
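Cells 2 and 3 build almost the same command. They differ only in the output folder, the worker count, and `--case-filter`. If you run several variants, a small helper keeps them in sync. This is a sketch: `build_matrix_cmd` is a hypothetical helper. It passes only the flags already used in the cells above and keeps `--period test` fixed, because no other period values are documented here.

```python
import sys
from pathlib import Path
from typing import Optional, Sequence

REPO = Path("/kaggle/input/datasets/gplebih/aae-new")


def build_matrix_cmd(
    repo: Path,
    output_root: Path,
    workers: int,
    cases: Optional[Sequence[str]] = None,
) -> list:
    """Assemble the run_alpha_robustness_matrix.py command used in cells 2 and 3."""
    cmd = [
        sys.executable,
        str(repo / "deploy" / "v2" / "run_alpha_robustness_matrix.py"),
        "--jsonl", str(repo / "data" / "robustness_inputs" / "test7_top10_best_return_alpha_pack.jsonl"),
        "--data-path", str(repo / "backtest" / "data" / "daily_pv.h5"),
        "--period", "test",
        "--backtest-workers", str(workers),
    ]
    if cases:
        # Comma-separated case ids, as in the --case-filter value of cell 3.
        cmd += ["--case-filter", ",".join(cases)]
    cmd += ["--capture-detail-artifacts", "--output-root", str(output_root)]
    return cmd


smoke_cmd = build_matrix_cmd(
    REPO,
    Path("/kaggle/working/alpha_robustness/test7_top10_smoke"),
    workers=2,
    cases=["baseline_replay", "topk_10", "weight_alpha_score_cap20"],
)
print(" ".join(smoke_cmd))
# subprocess.run(smoke_cmd, check=True)  # uncomment (and import subprocess) to launch
```

Omitting `cases` reproduces the full-matrix command from cell 2.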
## What comes out
Each case gets its own folder under:
- `OUTPUT_ROOT / cases / <case_id>`
The matrix runner also writes merged files at the root of `OUTPUT_ROOT`:
- `matrix_cases.csv`
- `merged_summary.csv`
- `merged_trials.csv`
- `merged_summary_yearly.csv`
- `merged_trials_yearly.csv`
- `merged_aggregate_yearly.csv`
- `coverage_sparsity_summary.csv`
- `matrix_manifest.json`
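After a run, a quick inventory cell can confirm that the case folders and merged files listed above were written. This sketch assumes the layout is exactly as described here, and `inspect_output` is a hypothetical helper. For a filtered run such as cell 3, point it at that run's `OUTPUT_ROOT` and expect fewer case folders.

```python
from pathlib import Path

# Root-level files the runner writes, per the list above.
EXPECTED_ROOT_FILES = [
    "matrix_cases.csv",
    "merged_summary.csv",
    "merged_trials.csv",
    "merged_summary_yearly.csv",
    "merged_trials_yearly.csv",
    "merged_aggregate_yearly.csv",
    "coverage_sparsity_summary.csv",
    "matrix_manifest.json",
]


def inspect_output(output_root: Path) -> tuple:
    """Return (sorted case folder names under cases/, expected root files that are missing)."""
    cases_dir = output_root / "cases"
    case_ids = sorted(p.name for p in cases_dir.iterdir() if p.is_dir()) if cases_dir.is_dir() else []
    missing = [name for name in EXPECTED_ROOT_FILES if not (output_root / name).is_file()]
    return case_ids, missing


OUTPUT_ROOT = Path("/kaggle/working/alpha_robustness/test7_top10_standard_matrix")
case_ids, missing = inspect_output(OUTPUT_ROOT)
print(f"{len(case_ids)} case folders:", case_ids)
print("missing root files:", missing or "none")
```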
## Coverage / sparsity summary
`coverage_sparsity_summary.csv` is derived from the detail artifacts (both run cells above pass `--capture-detail-artifacts`) and includes metrics such as:
- number of rebalance days
- number of trade days
- mean / median / min / max target count
- percent of rebalance days with fewer names than `top_k`
- cash-weight stats
- top-k signal count by day
- all-zero-score day rate
In other words, coverage/sparsity robustness is measured from the artifacts of the existing cases, not by adding a separate backtest axis to the matrix.
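To make a few of these metrics concrete, here is a sketch of how they could be computed from per-rebalance-day records. The `RebalanceDay` layout is an assumption made for illustration. The real detail artifacts and the runner's exact metric definitions may differ. The sketch covers only a subset of the list: target counts, shortfall versus `top_k`, cash weight, and all-zero-score days.

```python
from dataclasses import dataclass
from statistics import mean, median


@dataclass
class RebalanceDay:
    """Hypothetical per-rebalance-day record; the real artifact schema may differ."""
    date: str
    targets: list        # names selected for the target portfolio
    scores: list         # alpha scores across the universe on that day
    cash_weight: float   # portfolio weight left in cash after rebalancing


def coverage_summary(days: list, top_k: int) -> dict:
    """Summarize target coverage, cash, and score sparsity over rebalance days."""
    if not days:
        raise ValueError("no rebalance days")
    counts = [len(d.targets) for d in days]
    cash = [d.cash_weight for d in days]
    return {
        "n_rebalance_days": len(days),
        "target_count_mean": mean(counts),
        "target_count_median": median(counts),
        "target_count_min": min(counts),
        "target_count_max": max(counts),
        # Share of rebalance days that selected fewer names than top_k, in percent.
        "pct_days_below_top_k": 100.0 * sum(c < top_k for c in counts) / len(days),
        "cash_weight_mean": mean(cash),
        "cash_weight_max": max(cash),
        # A day with no scores at all also counts as all-zero (all() of empty is True).
        "all_zero_score_day_rate": sum(all(s == 0 for s in d.scores) for d in days) / len(days),
    }


days = [
    RebalanceDay("2024-01-02", ["A", "B", "C", "D", "E"], [0.5, 0.1, 0.0], 0.0),
    RebalanceDay("2024-01-09", ["A", "B"], [0.2, 0.0], 0.6),
    RebalanceDay("2024-01-16", [], [0.0, 0.0, 0.0], 1.0),
]
summary = coverage_summary(days, top_k=5)
print(summary)
```

The toy data is invented: one full day, one partial day, and one day with no signal, which ends fully in cash.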