Alpha Robustness Matrix for Kaggle Offline
This path is intentionally separate from the stable backtest flow.
Use these files:
- /Users/giaphu/Documents/Pylab/AAE-new/deploy/v2/jsonl_alpha_robustness.py
- /Users/giaphu/Documents/Pylab/AAE-new/deploy/v2/run_alpha_robustness_matrix.py
- /Users/giaphu/Documents/Pylab/AAE-new/backtest/robust_factor_executor.py
- /Users/giaphu/Documents/Pylab/AAE-new/backtest/robust_qlib_backtester.py
The top-10 alpha pack prepared from test7 is bundled here:
/Users/giaphu/Documents/Pylab/AAE-new/data/robustness_inputs/test7_top10_best_return_alpha_pack.jsonl
What the standard matrix runs
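The standard matrix avoids duplicate cases where the baseline already covers the anchor setting. For example, there is no topk_5 or rebalance_5d case, because the baseline already uses top_k = 5 and rebalance_freq = 5.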
Cases included:
- baseline_replay
- topk_10
- topk_15
- weight_alpha_score_cap20
- weight_alpha_score_no_cap
- fee_0bps
- fee_10bps
- fee_20bps
- fee_30bps
- fee_50bps
- rebalance_10d
- rebalance_20d
- score_rank
- score_zscore
- score_rank_zscore
- frozen_recent_2026_ytd
The baseline is:
- backtest_engine = custom
- top_k = 5
- rebalance_freq = 5
- custom_weight_mode = equal
- max_pos_each_stock = 0.2
- buy_fee = sell_fee = 0.0013
- enforce_cash_limit = True
- volume/amount limits = off
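The baseline settings and the case names suggest that most cases vary one setting at a time. The sketch below is a hypothetical illustration, not the runner's actual code: `BASELINE` mirrors the values listed above, while `bps_to_rate`, `overrides_for`, and `settings_for` are names introduced here, and the mapping from case IDs to settings is an assumption inferred from the case names alone.

```python
# Hypothetical sketch: how a case ID might translate into overrides on the
# baseline. The mapping rules are assumptions inferred from the case names,
# not taken from run_alpha_robustness_matrix.py.

BASELINE = {
    "backtest_engine": "custom",
    "top_k": 5,
    "rebalance_freq": 5,
    "custom_weight_mode": "equal",
    "max_pos_each_stock": 0.2,
    "buy_fee": 0.0013,
    "sell_fee": 0.0013,
    "enforce_cash_limit": True,
}


def bps_to_rate(bps):
    """Convert basis points to a fractional fee rate (1 bps = 0.0001)."""
    return bps / 10_000


def overrides_for(case_id):
    """Return assumed overrides for a few single-axis case IDs."""
    if case_id.startswith("topk_"):
        return {"top_k": int(case_id[len("topk_"):])}
    if case_id.startswith("fee_") and case_id.endswith("bps"):
        rate = bps_to_rate(int(case_id[len("fee_"):-len("bps")]))
        return {"buy_fee": rate, "sell_fee": rate}
    if case_id.startswith("rebalance_") and case_id.endswith("d"):
        return {"rebalance_freq": int(case_id[len("rebalance_"):-1])}
    # baseline_replay and any case this sketch does not model
    return {}


def settings_for(case_id):
    """Baseline settings with the case's assumed overrides applied."""
    return {**BASELINE, **overrides_for(case_id)}


smoke = settings_for("fee_10bps")
```

Under this reading the baseline fee of 0.0013 corresponds to 13 bps, so the fee cases bracket it from 0 to 50 bps without repeating it.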
Kaggle offline notebook cells
1. Install offline runtime
import sys
import subprocess
from pathlib import Path

REPO = Path("/kaggle/input/datasets/gplebih/aae-new")

subprocess.run(
    [sys.executable, str(REPO / "deploy" / "v2" / "install_kaggle_offline.py")],
    check=True,
)
2. Run the full standard matrix on the bundled top-10 alpha pack
import sys
import subprocess
from pathlib import Path

REPO = Path("/kaggle/input/datasets/gplebih/aae-new")
INPUT_JSONL = REPO / "data" / "robustness_inputs" / "test7_top10_best_return_alpha_pack.jsonl"
DATA_PATH = REPO / "backtest" / "data" / "daily_pv.h5"
OUTPUT_ROOT = Path("/kaggle/working/alpha_robustness/test7_top10_standard_matrix")

cmd = [
    sys.executable,
    str(REPO / "deploy" / "v2" / "run_alpha_robustness_matrix.py"),
    "--jsonl", str(INPUT_JSONL),
    "--data-path", str(DATA_PATH),
    "--period", "test",
    "--backtest-workers", "4",
    "--capture-detail-artifacts",
    "--output-root", str(OUTPUT_ROOT),
]

subprocess.run(cmd, check=True)
print("OUTPUT_ROOT =", OUTPUT_ROOT)
3. Debug on only a few cases first
import sys
import subprocess
from pathlib import Path

REPO = Path("/kaggle/input/datasets/gplebih/aae-new")
INPUT_JSONL = REPO / "data" / "robustness_inputs" / "test7_top10_best_return_alpha_pack.jsonl"
DATA_PATH = REPO / "backtest" / "data" / "daily_pv.h5"
OUTPUT_ROOT = Path("/kaggle/working/alpha_robustness/test7_top10_smoke")

cmd = [
    sys.executable,
    str(REPO / "deploy" / "v2" / "run_alpha_robustness_matrix.py"),
    "--jsonl", str(INPUT_JSONL),
    "--data-path", str(DATA_PATH),
    "--period", "test",
    "--backtest-workers", "2",
    "--case-filter", "baseline_replay,topk_10,weight_alpha_score_cap20",
    "--capture-detail-artifacts",
    "--output-root", str(OUTPUT_ROOT),
]

subprocess.run(cmd, check=True)
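The full run and the debug run differ only in the worker count, the output folder, and the optional `--case-filter`. A small helper can keep the two cells in sync. This is a minimal sketch: `build_matrix_cmd` is a hypothetical name introduced here, and it uses only the flags that appear in the cells above.

```python
import sys
from pathlib import Path


def build_matrix_cmd(repo, input_jsonl, data_path, output_root,
                     workers=4, case_filter=None):
    """Assemble the runner command from the flags used in the cells above."""
    cmd = [
        sys.executable,
        str(Path(repo) / "deploy" / "v2" / "run_alpha_robustness_matrix.py"),
        "--jsonl", str(input_jsonl),
        "--data-path", str(data_path),
        "--period", "test",
        "--backtest-workers", str(workers),
        "--capture-detail-artifacts",
        "--output-root", str(output_root),
    ]
    if case_filter:
        # The debug cell passes case IDs as one comma-separated string.
        cmd += ["--case-filter", ",".join(case_filter)]
    return cmd


REPO = Path("/kaggle/input/datasets/gplebih/aae-new")
smoke_cmd = build_matrix_cmd(
    REPO,
    REPO / "data" / "robustness_inputs" / "test7_top10_best_return_alpha_pack.jsonl",
    REPO / "backtest" / "data" / "daily_pv.h5",
    "/kaggle/working/alpha_robustness/test7_top10_smoke",
    workers=2,
    case_filter=["baseline_replay", "topk_10", "weight_alpha_score_cap20"],
)
```

The resulting list can be passed to `subprocess.run(smoke_cmd, check=True)` exactly as in the cells above.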
What comes out
Each case gets its own folder under:
OUTPUT_ROOT / cases / <case_id>
The matrix runner also writes merged files at the root:
- matrix_cases.csv
- merged_summary.csv
- merged_trials.csv
- merged_summary_yearly.csv
- merged_trials_yearly.csv
- merged_aggregate_yearly.csv
- coverage_sparsity_summary.csv
- matrix_manifest.json
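After a run it can help to confirm that the expected layout is actually on disk. The following sketch lists the case folders and reports any merged file that is missing. `inspect_output_root` is a hypothetical helper introduced here; it assumes only the folder layout and file names listed above.

```python
from pathlib import Path

# Merged files the matrix runner is documented to write at the output root.
MERGED_FILES = [
    "matrix_cases.csv",
    "merged_summary.csv",
    "merged_trials.csv",
    "merged_summary_yearly.csv",
    "merged_trials_yearly.csv",
    "merged_aggregate_yearly.csv",
    "coverage_sparsity_summary.csv",
    "matrix_manifest.json",
]


def inspect_output_root(output_root):
    """Return (sorted case IDs, merged files missing from the root)."""
    root = Path(output_root)
    cases_dir = root / "cases"
    case_ids = []
    if cases_dir.is_dir():
        # Each case gets its own folder named after its case ID.
        case_ids = sorted(p.name for p in cases_dir.iterdir() if p.is_dir())
    missing = [name for name in MERGED_FILES if not (root / name).is_file()]
    return case_ids, missing
```

In a notebook cell, `case_ids, missing = inspect_output_root(OUTPUT_ROOT)` followed by printing both values gives a quick check before opening any CSV.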
Coverage / sparsity summary
coverage_sparsity_summary.csv is derived from detail artifacts and includes metrics like:
- number of rebalance days
- number of trade days
- mean / median / min / max target count
- percent of rebalance days with fewer names than top_k
- cash-weight stats
- top-k signal count by day
- all-zero-score day rate
This means coverage / sparsity robustness is analyzed from artifact outputs, not by running a separate backtest axis.
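To make the target-count metrics concrete, here is a minimal sketch of how they could be computed from one case's per-rebalance-day target counts. It is not the runner's implementation: `coverage_stats`, its input shape, and the output key names are all assumptions made for illustration, and the sample counts are invented.

```python
from statistics import mean, median


def coverage_stats(target_counts, top_k):
    """Summarize per-rebalance-day target counts against top_k.

    target_counts: number of names targeted on each rebalance day
    (hypothetical input; the real detail artifacts may be shaped differently).
    """
    if not target_counts:
        raise ValueError("target_counts must not be empty")
    # Days on which the portfolio targeted fewer names than top_k.
    short_days = sum(1 for n in target_counts if n < top_k)
    return {
        "rebalance_days": len(target_counts),
        "mean_target_count": mean(target_counts),
        "median_target_count": median(target_counts),
        "min_target_count": min(target_counts),
        "max_target_count": max(target_counts),
        "pct_days_below_top_k": 100.0 * short_days / len(target_counts),
    }


# Invented example: five rebalance days with top_k = 5.
stats = coverage_stats([5, 5, 3, 0, 5], top_k=5)
```

In this invented example two of the five rebalance days fall short of top_k, which is the kind of sparsity the summary is meant to surface.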