
Alpha Robustness Matrix for Kaggle Offline

This path is intentionally separate from the stable backtest flow.

Use these files:

  • /Users/giaphu/Documents/Pylab/AAE-new/deploy/v2/jsonl_alpha_robustness.py
  • /Users/giaphu/Documents/Pylab/AAE-new/deploy/v2/run_alpha_robustness_matrix.py
  • /Users/giaphu/Documents/Pylab/AAE-new/backtest/robust_factor_executor.py
  • /Users/giaphu/Documents/Pylab/AAE-new/backtest/robust_qlib_backtester.py

The top-10 alpha pack prepared from test7 is bundled here:

  • /Users/giaphu/Documents/Pylab/AAE-new/data/robustness_inputs/test7_top10_best_return_alpha_pack.jsonl

What the standard matrix runs

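The standard matrix skips any case that would duplicate the baseline's own setting on an axis. For example, there is no separate top_k = 5 or rebalance_freq = 5 case, because baseline_replay already runs with those values.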

Cases included:

  • baseline_replay
  • topk_10
  • topk_15
  • weight_alpha_score_cap20
  • weight_alpha_score_no_cap
  • fee_0bps
  • fee_10bps
  • fee_20bps
  • fee_30bps
  • fee_50bps
  • rebalance_10d
  • rebalance_20d
  • score_rank
  • score_zscore
  • score_rank_zscore
  • frozen_recent_2026_ytd

The baseline is:

  • backtest_engine = custom
  • top_k = 5
  • rebalance_freq = 5
  • custom_weight_mode = equal
  • max_pos_each_stock = 0.2
  • buy_fee = sell_fee = 0.0013
  • enforce_cash_limit = True
  • volume/amount limits = off
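
As a rough illustration of how a fee case relates to the baseline, the sketch below copies the baseline values from the list above into a plain dict and derives a fee variant from it. The dict layout and the helper names (`bps_to_rate`, `with_fee`) are hypothetical and are not taken from the runner. It also assumes that a case such as fee_10bps sets both buy_fee and sell_fee to the same rate, which this README does not state explicitly.

```python
# Baseline values copied from the README list above. The dict layout and the
# helpers below are illustrative assumptions, not the runner's actual API.
BASELINE = {
    "backtest_engine": "custom",
    "top_k": 5,
    "rebalance_freq": 5,
    "custom_weight_mode": "equal",
    "max_pos_each_stock": 0.2,
    "buy_fee": 0.0013,
    "sell_fee": 0.0013,
    "enforce_cash_limit": True,
}


def bps_to_rate(bps: float) -> float:
    """Convert basis points to a decimal fee rate (1 bp = 0.0001)."""
    return bps / 10_000


def with_fee(config: dict, bps: float) -> dict:
    """Return a copy of config with both fee sides set to the same rate.

    Assumption: fee cases change buy_fee and sell_fee together.
    """
    rate = bps_to_rate(bps)
    return {**config, "buy_fee": rate, "sell_fee": rate}


fee_10bps = with_fee(BASELINE, 10)
print(fee_10bps["buy_fee"])  # 0.001
```

Read this way, the baseline fee of 0.0013 corresponds to 13 bps per side, which sits between the fee_10bps and fee_20bps cases.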

Kaggle offline notebook cells

1. Install offline runtime

import sys
import subprocess
from pathlib import Path

REPO = Path("/kaggle/input/datasets/gplebih/aae-new")

subprocess.run(
    [sys.executable, str(REPO / "deploy" / "v2" / "install_kaggle_offline.py")],
    check=True,
)

2. Run the full standard matrix on the bundled top-10 alpha pack

import sys
import subprocess
from pathlib import Path

REPO = Path("/kaggle/input/datasets/gplebih/aae-new")
INPUT_JSONL = REPO / "data" / "robustness_inputs" / "test7_top10_best_return_alpha_pack.jsonl"
DATA_PATH = REPO / "backtest" / "data" / "daily_pv.h5"
OUTPUT_ROOT = Path("/kaggle/working/alpha_robustness/test7_top10_standard_matrix")

cmd = [
    sys.executable,
    str(REPO / "deploy" / "v2" / "run_alpha_robustness_matrix.py"),
    "--jsonl", str(INPUT_JSONL),
    "--data-path", str(DATA_PATH),
    "--period", "test",
    "--backtest-workers", "4",
    "--capture-detail-artifacts",
    "--output-root", str(OUTPUT_ROOT),
]

subprocess.run(cmd, check=True)

print("OUTPUT_ROOT =", OUTPUT_ROOT)

3. Debug on only a few cases first

import sys
import subprocess
from pathlib import Path

REPO = Path("/kaggle/input/datasets/gplebih/aae-new")
INPUT_JSONL = REPO / "data" / "robustness_inputs" / "test7_top10_best_return_alpha_pack.jsonl"
DATA_PATH = REPO / "backtest" / "data" / "daily_pv.h5"
OUTPUT_ROOT = Path("/kaggle/working/alpha_robustness/test7_top10_smoke")

cmd = [
    sys.executable,
    str(REPO / "deploy" / "v2" / "run_alpha_robustness_matrix.py"),
    "--jsonl", str(INPUT_JSONL),
    "--data-path", str(DATA_PATH),
    "--period", "test",
    "--backtest-workers", "2",
    "--case-filter", "baseline_replay,topk_10,weight_alpha_score_cap20",
    "--capture-detail-artifacts",
    "--output-root", str(OUTPUT_ROOT),
]

subprocess.run(cmd, check=True)
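
When debugging one axis at a time, it can help to build the --case-filter value from the case list instead of typing it by hand. The helper below is a hypothetical convenience and not part of the repo. It relies only on what the cell above shows, namely that --case-filter takes a comma-separated list of case IDs, and it uses the case IDs listed under "Cases included".

```python
# Case IDs copied from the "Cases included" list in this README.
STANDARD_CASES = [
    "baseline_replay",
    "topk_10", "topk_15",
    "weight_alpha_score_cap20", "weight_alpha_score_no_cap",
    "fee_0bps", "fee_10bps", "fee_20bps", "fee_30bps", "fee_50bps",
    "rebalance_10d", "rebalance_20d",
    "score_rank", "score_zscore", "score_rank_zscore",
    "frozen_recent_2026_ytd",
]


def build_case_filter(*prefixes: str, include_baseline: bool = True) -> str:
    """Return a comma-separated string of case IDs matching any prefix.

    Hypothetical helper: it only assembles the string passed to --case-filter.
    """
    # str.startswith accepts a tuple of prefixes; an empty tuple matches nothing.
    selected = [case for case in STANDARD_CASES if case.startswith(prefixes)]
    if not selected:
        raise ValueError(f"no standard case matches prefixes {prefixes!r}")
    # Keep the baseline in the run so each variant has a reference point.
    if include_baseline and "baseline_replay" not in selected:
        selected.insert(0, "baseline_replay")
    return ",".join(selected)


print(build_case_filter("fee_"))
# baseline_replay,fee_0bps,fee_10bps,fee_20bps,fee_30bps,fee_50bps
```

The returned string can be passed directly after "--case-filter" in the cmd list.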

What comes out

Each case gets its own folder under:

  • OUTPUT_ROOT / cases / <case_id>

The matrix runner also writes merged files at the root:

  • matrix_cases.csv
  • merged_summary.csv
  • merged_trials.csv
  • merged_summary_yearly.csv
  • merged_trials_yearly.csv
  • merged_aggregate_yearly.csv
  • coverage_sparsity_summary.csv
  • matrix_manifest.json
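
A quick sanity check after a run is to confirm that the merged files and the per-case folders exist. The sketch below uses only the file and folder names listed above. The helper names are hypothetical, and it makes no assumption about what the files contain.

```python
from pathlib import Path

# Merged files the matrix runner writes at OUTPUT_ROOT, per this README.
MERGED_FILES = [
    "matrix_cases.csv",
    "merged_summary.csv",
    "merged_trials.csv",
    "merged_summary_yearly.csv",
    "merged_trials_yearly.csv",
    "merged_aggregate_yearly.csv",
    "coverage_sparsity_summary.csv",
    "matrix_manifest.json",
]


def missing_outputs(output_root: Path) -> list[str]:
    """Return the merged file names that are not present under output_root."""
    return [name for name in MERGED_FILES if not (output_root / name).is_file()]


def case_ids(output_root: Path) -> list[str]:
    """Return the sorted case folder names found under output_root / cases."""
    cases_dir = output_root / "cases"
    if not cases_dir.is_dir():
        return []
    return sorted(p.name for p in cases_dir.iterdir() if p.is_dir())


root = Path("/kaggle/working/alpha_robustness/test7_top10_standard_matrix")
print("missing merged files:", missing_outputs(root))
print("case folders:", case_ids(root))
```

An empty "missing merged files" list together with one folder per requested case suggests the run wrote everything this README describes.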

Coverage / sparsity summary

coverage_sparsity_summary.csv is derived from the detail artifacts (the notebook cells above pass --capture-detail-artifacts) and includes metrics such as:

  • number of rebalance days
  • number of trade days
  • mean / median / min / max target count
  • percent of rebalance days with fewer names than top_k
  • cash-weight stats
  • top-k signal count by day
  • all-zero-score day rate

Coverage / sparsity robustness is therefore analyzed from the artifacts each case already produces, not by running a separate backtest axis.
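
To make a few of these metrics concrete, here is a minimal sketch of how target-count statistics and the "fewer names than top_k" rate could be computed. The input, a list with the number of target names on each rebalance day, is a hypothetical stand-in for whatever the detail artifacts actually store, and the output key names are illustrative, not the real CSV columns.

```python
from statistics import mean, median


def coverage_stats(target_counts: list[int], top_k: int) -> dict:
    """Summarize per-rebalance-day target counts against top_k.

    target_counts is a hypothetical input: one integer per rebalance day.
    """
    if not target_counts:
        raise ValueError("target_counts must contain at least one day")
    # Days on which the portfolio could not be filled up to top_k names.
    short_days = sum(1 for n in target_counts if n < top_k)
    return {
        "rebalance_days": len(target_counts),
        "target_count_mean": mean(target_counts),
        "target_count_median": median(target_counts),
        "target_count_min": min(target_counts),
        "target_count_max": max(target_counts),
        "pct_days_below_top_k": 100.0 * short_days / len(target_counts),
    }


# Made-up counts for eight rebalance days with top_k = 5.
stats = coverage_stats([5, 5, 3, 5, 0, 4, 5, 5], top_k=5)
print(stats["pct_days_below_top_k"])  # 37.5
```

In this made-up example, three of the eight rebalance days hold fewer than five names, so the shortfall rate is 37.5 percent. A high rate on a real run would indicate a sparse signal.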