# Alpha Robustness Matrix for Kaggle Offline
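This workflow is intentionally kept separate from the stable backtest flow.
Use these files (paths are from the local `AAE-new` checkout; on Kaggle the same files sit under the dataset root `REPO` used in the notebook cells):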
- `/Users/giaphu/Documents/Pylab/AAE-new/deploy/v2/jsonl_alpha_robustness.py`
- `/Users/giaphu/Documents/Pylab/AAE-new/deploy/v2/run_alpha_robustness_matrix.py`
- `/Users/giaphu/Documents/Pylab/AAE-new/backtest/robust_factor_executor.py`
- `/Users/giaphu/Documents/Pylab/AAE-new/backtest/robust_qlib_backtester.py`
The top-10 alpha pack prepared from `test7` is bundled here:
- `/Users/giaphu/Documents/Pylab/AAE-new/data/robustness_inputs/test7_top10_best_return_alpha_pack.jsonl`
## What the standard matrix runs
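The standard matrix skips cases that would only duplicate the baseline. For example, there is no `topk_5`, `rebalance_5d`, or `fee_13bps` case, because the baseline already runs with `top_k = 5`, `rebalance_freq = 5`, and 13 bps fees.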
Cases included:
- `baseline_replay`
- `topk_10`
- `topk_15`
- `weight_alpha_score_cap20`
- `weight_alpha_score_no_cap`
- `fee_0bps`
- `fee_10bps`
- `fee_20bps`
- `fee_30bps`
- `fee_50bps`
- `rebalance_10d`
- `rebalance_20d`
- `score_rank`
- `score_zscore`
- `score_rank_zscore`
- `frozen_recent_2026_ytd`
The baseline is:
- `backtest_engine = custom`
- `top_k = 5`
- `rebalance_freq = 5`
- `custom_weight_mode = equal`
- `max_pos_each_stock = 0.2`
- `buy_fee = sell_fee = 0.0013` (13 bps)
- `enforce_cash_limit = True`
- `volume/amount limits = off`
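One way to picture the matrix is as a set of overrides applied on top of the baseline, with any case that resolves to the baseline itself dropped. The sketch below assumes that representation. The keys mirror the baseline list above, but the override table, the `"alpha_score"` weight-mode value, and the helper names are hypothetical and may not match the real runner. `topk_5`, `fee_13bps`, and `rebalance_5d` appear only as candidates to show them being filtered out.

```python
from typing import Any

# Baseline settings as listed above (fees as fractions: 0.0013 = 13 bps).
BASELINE: dict[str, Any] = {
    "backtest_engine": "custom",
    "top_k": 5,
    "rebalance_freq": 5,
    "custom_weight_mode": "equal",
    "max_pos_each_stock": 0.2,
    "buy_fee": 0.0013,
    "sell_fee": 0.0013,
    "enforce_cash_limit": True,
}


def fee_case(bps: int) -> dict[str, Any]:
    """Override both fee legs with a flat fee given in basis points."""
    fee = bps / 10_000
    return {"buy_fee": fee, "sell_fee": fee}


# Hypothetical override table; the real runner's case definitions may differ.
CANDIDATE_CASES: dict[str, dict[str, Any]] = {
    "baseline_replay": {},
    "topk_5": {"top_k": 5},  # same as baseline -> dropped
    "topk_10": {"top_k": 10},
    "topk_15": {"top_k": 15},
    "weight_alpha_score_cap20": {"custom_weight_mode": "alpha_score", "max_pos_each_stock": 0.2},
    **{f"fee_{bps}bps": fee_case(bps) for bps in (0, 10, 13, 20, 30, 50)},  # 13 bps -> dropped
    "rebalance_5d": {"rebalance_freq": 5},  # same as baseline -> dropped
    "rebalance_10d": {"rebalance_freq": 10},
    "rebalance_20d": {"rebalance_freq": 20},
}


def build_matrix(
    baseline: dict[str, Any], candidates: dict[str, dict[str, Any]]
) -> dict[str, dict[str, Any]]:
    """Resolve each case against the baseline and skip exact duplicates of it."""
    matrix: dict[str, dict[str, Any]] = {}
    for case_id, overrides in candidates.items():
        config = {**baseline, **overrides}
        if case_id != "baseline_replay" and config == baseline:
            continue  # the baseline already covers this anchor setting
        matrix[case_id] = config
    return matrix


for case_id, config in build_matrix(BASELINE, CANDIDATE_CASES).items():
    print(case_id, config["top_k"], config["rebalance_freq"], config["buy_fee"])
```

Keeping `baseline_replay` explicitly means the baseline is still run once as its own case, while every other case that adds nothing new is skipped.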
## Kaggle offline notebook cells
### 1. Install offline runtime
```python
import sys
import subprocess
from pathlib import Path
REPO = Path("/kaggle/input/datasets/gplebih/aae-new")
subprocess.run(
[sys.executable, str(REPO / "deploy" / "v2" / "install_kaggle_offline.py")],
check=True,
)
```
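Before launching the matrix it can help to confirm that the bundled inputs are visible in the attached dataset. A minimal sketch, assuming the dataset layout used in the cells of this section (`REPO`, the alpha-pack JSONL, and `daily_pv.h5`); `missing_inputs` is a hypothetical helper, not part of the repo.

```python
from pathlib import Path


def missing_inputs(repo: Path) -> list[Path]:
    """Return the expected input files that are not present under `repo`."""
    expected = [
        repo / "deploy" / "v2" / "run_alpha_robustness_matrix.py",
        repo / "data" / "robustness_inputs" / "test7_top10_best_return_alpha_pack.jsonl",
        repo / "backtest" / "data" / "daily_pv.h5",
    ]
    return [path for path in expected if not path.is_file()]


REPO = Path("/kaggle/input/datasets/gplebih/aae-new")
missing = missing_inputs(REPO)
# An empty list means the runner script and both inputs were found.
print("missing inputs:", [str(p) for p in missing] or "none")
```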
### 2. Run the full standard matrix on the bundled top-10 alpha pack
```python
import sys
import subprocess
from pathlib import Path
REPO = Path("/kaggle/input/datasets/gplebih/aae-new")
INPUT_JSONL = REPO / "data" / "robustness_inputs" / "test7_top10_best_return_alpha_pack.jsonl"
DATA_PATH = REPO / "backtest" / "data" / "daily_pv.h5"
OUTPUT_ROOT = Path("/kaggle/working/alpha_robustness/test7_top10_standard_matrix")
cmd = [
sys.executable,
str(REPO / "deploy" / "v2" / "run_alpha_robustness_matrix.py"),
"--jsonl", str(INPUT_JSONL),
"--data-path", str(DATA_PATH),
"--period", "test",
"--backtest-workers", "4",
"--capture-detail-artifacts",
"--output-root", str(OUTPUT_ROOT),
]
subprocess.run(cmd, check=True)
print("OUTPUT_ROOT =", OUTPUT_ROOT)
```
### 3. Smoke-test a few cases before the full run
```python
import sys
import subprocess
from pathlib import Path
REPO = Path("/kaggle/input/datasets/gplebih/aae-new")
INPUT_JSONL = REPO / "data" / "robustness_inputs" / "test7_top10_best_return_alpha_pack.jsonl"
DATA_PATH = REPO / "backtest" / "data" / "daily_pv.h5"
OUTPUT_ROOT = Path("/kaggle/working/alpha_robustness/test7_top10_smoke")
cmd = [
sys.executable,
str(REPO / "deploy" / "v2" / "run_alpha_robustness_matrix.py"),
"--jsonl", str(INPUT_JSONL),
"--data-path", str(DATA_PATH),
"--period", "test",
"--backtest-workers", "2",
"--case-filter", "baseline_replay,topk_10,weight_alpha_score_cap20",
"--capture-detail-artifacts",
"--output-root", str(OUTPUT_ROOT),
]
subprocess.run(cmd, check=True)
```
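Cells 2 and 3 differ only in the output folder, the worker count, and the optional `--case-filter`. A small helper can build either command. This is a sketch that uses only the flags shown in those cells; the function name `build_matrix_cmd` is hypothetical.

```python
import sys
from pathlib import Path
from typing import Optional, Sequence


def build_matrix_cmd(
    repo: Path,
    output_root: Path,
    workers: int = 4,
    cases: Optional[Sequence[str]] = None,
    period: str = "test",
) -> list[str]:
    """Assemble the run_alpha_robustness_matrix.py command used in the cells above."""
    cmd = [
        sys.executable,
        str(repo / "deploy" / "v2" / "run_alpha_robustness_matrix.py"),
        "--jsonl", str(repo / "data" / "robustness_inputs" / "test7_top10_best_return_alpha_pack.jsonl"),
        "--data-path", str(repo / "backtest" / "data" / "daily_pv.h5"),
        "--period", period,
        "--backtest-workers", str(workers),
    ]
    if cases:
        # Same comma-separated format as the smoke-test cell.
        cmd += ["--case-filter", ",".join(cases)]
    cmd += ["--capture-detail-artifacts", "--output-root", str(output_root)]
    return cmd


REPO = Path("/kaggle/input/datasets/gplebih/aae-new")
smoke_cmd = build_matrix_cmd(
    REPO,
    Path("/kaggle/working/alpha_robustness/test7_top10_smoke"),
    workers=2,
    cases=["baseline_replay", "topk_10", "weight_alpha_score_cap20"],
)
print(" ".join(smoke_cmd))
# To execute: subprocess.run(smoke_cmd, check=True)
```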
## What comes out
Each case gets its own folder under:
- `OUTPUT_ROOT / cases / <case_id>`
The matrix runner also writes merged files at the root of `OUTPUT_ROOT`:
- `matrix_cases.csv`
- `merged_summary.csv`
- `merged_trials.csv`
- `merged_summary_yearly.csv`
- `merged_trials_yearly.csv`
- `merged_aggregate_yearly.csv`
- `coverage_sparsity_summary.csv`
- `matrix_manifest.json`
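After a run, one quick check is whether all the merged root files listed above exist and which case folders were written. A minimal sketch, assuming only the layout described here (`OUTPUT_ROOT/cases/<case_id>` plus the root files); `summarize_outputs` is a hypothetical helper.

```python
from pathlib import Path

ROOT_FILES = (
    "matrix_cases.csv",
    "merged_summary.csv",
    "merged_trials.csv",
    "merged_summary_yearly.csv",
    "merged_trials_yearly.csv",
    "merged_aggregate_yearly.csv",
    "coverage_sparsity_summary.csv",
    "matrix_manifest.json",
)


def summarize_outputs(output_root: Path) -> dict[str, list[str]]:
    """List the written case folders and any merged root files that are missing."""
    cases_dir = output_root / "cases"
    case_ids = (
        sorted(p.name for p in cases_dir.iterdir() if p.is_dir())
        if cases_dir.is_dir()
        else []
    )
    missing = [name for name in ROOT_FILES if not (output_root / name).is_file()]
    return {"case_ids": case_ids, "missing_root_files": missing}


print(summarize_outputs(Path("/kaggle/working/alpha_robustness/test7_top10_standard_matrix")))
```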
## Coverage / sparsity summary
`coverage_sparsity_summary.csv` is derived from the per-case detail artifacts (captured with `--capture-detail-artifacts`) and includes metrics such as:
- number of rebalance days
- number of trade days
- mean / median / min / max target count
- percent of rebalance days with fewer names than `top_k`
- cash-weight stats
- top-k signal count by day
- all-zero-score day rate
In other words, coverage/sparsity robustness is analyzed from the artifact outputs of the existing cases rather than run as a separate backtest axis of the matrix.
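To make the metric definitions concrete, the sketch below computes several of them from per-rebalance-day records. The record shape (`targets`, `scores`, `cash_weight`), the function name, and the output keys are hypothetical; the actual detail artifacts and CSV columns written by the runner may differ.

```python
from dataclasses import dataclass
from statistics import mean, median


@dataclass
class RebalanceDay:
    """Hypothetical per-rebalance-day record from a detail artifact."""
    targets: list[str]   # names actually selected on this day
    scores: list[float]  # alpha scores across the universe on this day
    cash_weight: float   # fraction of the portfolio left in cash


def coverage_sparsity(days: list[RebalanceDay], top_k: int) -> dict[str, float]:
    counts = [len(d.targets) for d in days]
    n = len(days)
    return {
        "n_rebalance_days": n,
        "target_count_mean": mean(counts),
        "target_count_median": median(counts),
        "target_count_min": min(counts),
        "target_count_max": max(counts),
        # Percent of rebalance days that could not fill all top_k slots.
        "pct_days_below_top_k": 100.0 * sum(c < top_k for c in counts) / n,
        "cash_weight_mean": mean(d.cash_weight for d in days),
        # A day counts as all-zero when every score is exactly zero (or there are no scores).
        "all_zero_score_day_rate": sum(all(s == 0 for s in d.scores) for d in days) / n,
    }


days = [
    RebalanceDay(["A", "B", "C", "D", "E"], [0.9, 0.5, 0.4, 0.3, 0.2, 0.1], 0.0),
    RebalanceDay(["A", "B", "C"], [0.7, 0.2, 0.1, 0.0], 0.4),
    RebalanceDay([], [0.0, 0.0, 0.0], 1.0),
]
print(coverage_sparsity(days, top_k=5))
```

With `top_k = 5`, the second and third days fall short of a full book, and the third day is both all-zero and fully in cash, which is the kind of sparsity this summary is meant to surface.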