# Alpha Robustness Matrix for Kaggle Offline

This path is intentionally separate from the stable backtest flow.

Use these files:

- `/Users/giaphu/Documents/Pylab/AAE-new/deploy/v2/jsonl_alpha_robustness.py`
- `/Users/giaphu/Documents/Pylab/AAE-new/deploy/v2/run_alpha_robustness_matrix.py`
- `/Users/giaphu/Documents/Pylab/AAE-new/backtest/robust_factor_executor.py`
- `/Users/giaphu/Documents/Pylab/AAE-new/backtest/robust_qlib_backtester.py`

The top-10 alpha pack prepared from `test7` is bundled here:

- `/Users/giaphu/Documents/Pylab/AAE-new/data/robustness_inputs/test7_top10_best_return_alpha_pack.jsonl`

## What the standard matrix runs

The standard matrix avoids duplicate cases where the baseline already covers the anchor setting.

Cases included:

- `baseline_replay`
- `topk_10`
- `topk_15`
- `weight_alpha_score_cap20`
- `weight_alpha_score_no_cap`
- `fee_0bps`
- `fee_10bps`
- `fee_20bps`
- `fee_30bps`
- `fee_50bps`
- `rebalance_10d`
- `rebalance_20d`
- `score_rank`
- `score_zscore`
- `score_rank_zscore`
- `frozen_recent_2026_ytd`

The baseline is:

- `backtest_engine = custom`
- `top_k = 5`
- `rebalance_freq = 5`
- `custom_weight_mode = equal`
- `max_pos_each_stock = 0.2`
- `buy_fee = sell_fee = 0.0013`
- `enforce_cash_limit = True`
- `volume/amount limits = off`

## Kaggle offline notebook cells

### 1. Install offline runtime

```python
import sys
import subprocess
from pathlib import Path

REPO = Path("/kaggle/input/datasets/gplebih/aae-new")

subprocess.run(
    [sys.executable, str(REPO / "deploy" / "v2" / "install_kaggle_offline.py")],
    check=True,
)
```

### 2. Run the full standard matrix on the bundled top-10 alpha pack

```python
import sys
import subprocess
from pathlib import Path

REPO = Path("/kaggle/input/datasets/gplebih/aae-new")
INPUT_JSONL = REPO / "data" / "robustness_inputs" / "test7_top10_best_return_alpha_pack.jsonl"
DATA_PATH = REPO / "backtest" / "data" / "daily_pv.h5"
OUTPUT_ROOT = Path("/kaggle/working/alpha_robustness/test7_top10_standard_matrix")

cmd = [
    sys.executable,
    str(REPO / "deploy" / "v2" / "run_alpha_robustness_matrix.py"),
    "--jsonl",
    str(INPUT_JSONL),
    "--data-path",
    str(DATA_PATH),
    "--period",
    "test",
    "--backtest-workers",
    "4",
    "--capture-detail-artifacts",
    "--output-root",
    str(OUTPUT_ROOT),
]

subprocess.run(cmd, check=True)
print("OUTPUT_ROOT =", OUTPUT_ROOT)
```

### 3. Debug on only a few cases first

```python
import sys
import subprocess
from pathlib import Path

REPO = Path("/kaggle/input/datasets/gplebih/aae-new")
INPUT_JSONL = REPO / "data" / "robustness_inputs" / "test7_top10_best_return_alpha_pack.jsonl"
DATA_PATH = REPO / "backtest" / "data" / "daily_pv.h5"
OUTPUT_ROOT = Path("/kaggle/working/alpha_robustness/test7_top10_smoke")

cmd = [
    sys.executable,
    str(REPO / "deploy" / "v2" / "run_alpha_robustness_matrix.py"),
    "--jsonl",
    str(INPUT_JSONL),
    "--data-path",
    str(DATA_PATH),
    "--period",
    "test",
    "--backtest-workers",
    "2",
    "--case-filter",
    "baseline_replay,topk_10,weight_alpha_score_cap20",
    "--capture-detail-artifacts",
    "--output-root",
    str(OUTPUT_ROOT),
]

subprocess.run(cmd, check=True)
```

## What comes out

Each case gets its own folder under:

- `OUTPUT_ROOT / cases / `

The matrix runner also writes merged files at the root:

- `matrix_cases.csv`
- `merged_summary.csv`
- `merged_trials.csv`
- `merged_summary_yearly.csv`
- `merged_trials_yearly.csv`
- `merged_aggregate_yearly.csv`
- `coverage_sparsity_summary.csv`
- `matrix_manifest.json`

## Coverage / sparsity summary

`coverage_sparsity_summary.csv` is derived from detail artifacts and includes metrics like:

- number of rebalance days
- number of trade days
- mean / median / min / max target count
- percent of rebalance days with fewer names than `top_k`
- cash-weight stats
- top-k signal count by day
- all-zero-score day rate

This means `coverage / sparsity robustness` is analyzed from artifact outputs, not by running a separate backtest axis.
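
To make the coverage metrics concrete, here is a minimal sketch of how a few of them could be computed from per-rebalance-day records. It is illustrative only: the function `summarize_coverage`, its argument names, and the output keys are hypothetical and are not taken from the matrix runner, whose actual column names and formulas may differ. The sample data is made up.

```python
from statistics import mean, median


def summarize_coverage(target_counts, cash_weights, top_k):
    """Summarize coverage for one case (hypothetical helper, not the runner's code).

    target_counts: number of names actually targeted on each rebalance day.
    cash_weights:  fraction of the portfolio left in cash on each rebalance day.
    top_k:         number of names the case asks for.
    """
    if not target_counts or len(target_counts) != len(cash_weights):
        raise ValueError("need one target count and one cash weight per rebalance day")

    n_days = len(target_counts)
    # Days on which the signal produced fewer names than the case requested.
    short_days = sum(1 for count in target_counts if count < top_k)

    return {
        "n_rebalance_days": n_days,
        "target_count_mean": mean(target_counts),
        "target_count_median": median(target_counts),
        "target_count_min": min(target_counts),
        "target_count_max": max(target_counts),
        "pct_days_below_top_k": 100.0 * short_days / n_days,
        "cash_weight_mean": mean(cash_weights),
        "cash_weight_max": max(cash_weights),
    }


# Made-up data: four rebalance days for a top_k = 5 case.
summary = summarize_coverage(
    target_counts=[5, 5, 3, 4],
    cash_weights=[0.0, 0.0, 0.4, 0.2],
    top_k=5,
)
print(summary["pct_days_below_top_k"])  # 50.0
```

In this made-up sample, two of the four rebalance days hold fewer than five names. With equal weights capped at 0.2 per stock, those short days leave part of the portfolio in cash, which is why target-count and cash-weight statistics are worth reading together.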