# Alpha Robustness Matrix for Kaggle Offline

The robustness matrix path is intentionally separate from the stable backtest flow.

Use these files:

- `/Users/giaphu/Documents/Pylab/AAE-new/deploy/v2/jsonl_alpha_robustness.py`
- `/Users/giaphu/Documents/Pylab/AAE-new/deploy/v2/run_alpha_robustness_matrix.py`
- `/Users/giaphu/Documents/Pylab/AAE-new/backtest/robust_factor_executor.py`
- `/Users/giaphu/Documents/Pylab/AAE-new/backtest/robust_qlib_backtester.py`

The top-10 alpha pack prepared from `test7` is bundled here:

- `/Users/giaphu/Documents/Pylab/AAE-new/data/robustness_inputs/test7_top10_best_return_alpha_pack.jsonl`

## What the standard matrix runs

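The standard matrix skips any case that would duplicate the baseline, that is, any case whose anchor setting the baseline already covers. For example, there is no `topk_5` or `rebalance_5d` case, because the baseline already uses `top_k = 5` and `rebalance_freq = 5`.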

Cases included:

- `baseline_replay`
- `topk_10`
- `topk_15`
- `weight_alpha_score_cap20`
- `weight_alpha_score_no_cap`
- `fee_0bps`
- `fee_10bps`
- `fee_20bps`
- `fee_30bps`
- `fee_50bps`
- `rebalance_10d`
- `rebalance_20d`
- `score_rank`
- `score_zscore`
- `score_rank_zscore`
- `frozen_recent_2026_ytd`

The baseline is:

- `backtest_engine = custom`
- `top_k = 5`
- `rebalance_freq = 5`
- `custom_weight_mode = equal`
- `max_pos_each_stock = 0.2`
- `buy_fee = sell_fee = 0.0013`
- `enforce_cash_limit = True`
- `volume/amount limits = off`
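
One way to picture a matrix case is as a small set of overrides applied to the baseline. The sketch below is illustrative only. The `BASELINE` dict mirrors the values listed above, but `bps_to_rate`, `apply_case`, and the override shown for `fee_10bps` are hypothetical names, and the override assumes the case name means a 10 basis point fee on each side. The actual runner may represent cases differently.

```python
# Baseline settings copied from the list above.
# (The volume/amount limits are off in the baseline and are not modeled here.)
BASELINE = {
    "backtest_engine": "custom",
    "top_k": 5,
    "rebalance_freq": 5,
    "custom_weight_mode": "equal",
    "max_pos_each_stock": 0.2,
    "buy_fee": 0.0013,
    "sell_fee": 0.0013,
    "enforce_cash_limit": True,
}


def bps_to_rate(bps: float) -> float:
    """Convert basis points to a decimal rate (1 bps = 0.0001)."""
    return bps / 10_000


def apply_case(baseline: dict, overrides: dict) -> dict:
    """Return a copy of the baseline with one case's overrides applied."""
    return {**baseline, **overrides}


# Hypothetical override for `fee_10bps` (assumed meaning: 10 bps per side).
fee_10bps = apply_case(
    BASELINE,
    {"buy_fee": bps_to_rate(10), "sell_fee": bps_to_rate(10)},
)

print(fee_10bps["buy_fee"], fee_10bps["top_k"])
```

Under this reading, the baseline fee of `0.0013` corresponds to 13 bps per side, which places it between the `fee_10bps` and `fee_20bps` cases.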

## Kaggle offline notebook cells

### 1. Install offline runtime

```python
import sys
import subprocess
from pathlib import Path

REPO = Path("/kaggle/input/datasets/gplebih/aae-new")

subprocess.run(
    [sys.executable, str(REPO / "deploy" / "v2" / "install_kaggle_offline.py")],
    check=True,
)
```

### 2. Run the full standard matrix on the bundled top-10 alpha pack

```python
import sys
import subprocess
from pathlib import Path

REPO = Path("/kaggle/input/datasets/gplebih/aae-new")
INPUT_JSONL = REPO / "data" / "robustness_inputs" / "test7_top10_best_return_alpha_pack.jsonl"
DATA_PATH = REPO / "backtest" / "data" / "daily_pv.h5"
OUTPUT_ROOT = Path("/kaggle/working/alpha_robustness/test7_top10_standard_matrix")

cmd = [
    sys.executable,
    str(REPO / "deploy" / "v2" / "run_alpha_robustness_matrix.py"),
    "--jsonl", str(INPUT_JSONL),
    "--data-path", str(DATA_PATH),
    "--period", "test",
    "--backtest-workers", "4",
    "--capture-detail-artifacts",
    "--output-root", str(OUTPUT_ROOT),
]

subprocess.run(cmd, check=True)

print("OUTPUT_ROOT =", OUTPUT_ROOT)
```

### 3. Debug on only a few cases first

```python
import sys
import subprocess
from pathlib import Path

REPO = Path("/kaggle/input/datasets/gplebih/aae-new")
INPUT_JSONL = REPO / "data" / "robustness_inputs" / "test7_top10_best_return_alpha_pack.jsonl"
DATA_PATH = REPO / "backtest" / "data" / "daily_pv.h5"
OUTPUT_ROOT = Path("/kaggle/working/alpha_robustness/test7_top10_smoke")

cmd = [
    sys.executable,
    str(REPO / "deploy" / "v2" / "run_alpha_robustness_matrix.py"),
    "--jsonl", str(INPUT_JSONL),
    "--data-path", str(DATA_PATH),
    "--period", "test",
    "--backtest-workers", "2",
    "--case-filter", "baseline_replay,topk_10,weight_alpha_score_cap20",
    "--capture-detail-artifacts",
    "--output-root", str(OUTPUT_ROOT),
]

subprocess.run(cmd, check=True)
```
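
Cells 2 and 3 differ only in the worker count, the case filter, and the output folder, so the command can be built by a small helper. This is a minimal sketch: `build_matrix_cmd` is a hypothetical helper name, and it uses only the flags that already appear in the cells above.

```python
import sys
from pathlib import Path


def build_matrix_cmd(repo, input_jsonl, data_path, output_root,
                     workers=4, case_filter=None):
    """Build the argument list for run_alpha_robustness_matrix.py."""
    cmd = [
        sys.executable,
        str(Path(repo) / "deploy" / "v2" / "run_alpha_robustness_matrix.py"),
        "--jsonl", str(input_jsonl),
        "--data-path", str(data_path),
        "--period", "test",
        "--backtest-workers", str(workers),
    ]
    if case_filter:
        # The runner takes a comma-separated list of case IDs.
        cmd += ["--case-filter", ",".join(case_filter)]
    cmd += ["--capture-detail-artifacts", "--output-root", str(output_root)]
    return cmd


REPO = Path("/kaggle/input/datasets/gplebih/aae-new")
smoke_cmd = build_matrix_cmd(
    repo=REPO,
    input_jsonl=REPO / "data" / "robustness_inputs" / "test7_top10_best_return_alpha_pack.jsonl",
    data_path=REPO / "backtest" / "data" / "daily_pv.h5",
    output_root=Path("/kaggle/working/alpha_robustness/test7_top10_smoke"),
    workers=2,
    case_filter=["baseline_replay", "topk_10", "weight_alpha_score_cap20"],
)
print(smoke_cmd)
```

Pass the result to `subprocess.run(smoke_cmd, check=True)` as in the cells above. Omitting `case_filter` gives the full standard matrix command.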

## What comes out

Each case gets its own folder under:

- `OUTPUT_ROOT / cases / <case_id>`

The matrix runner also writes merged files at the root:

- `matrix_cases.csv`
- `merged_summary.csv`
- `merged_trials.csv`
- `merged_summary_yearly.csv`
- `merged_trials_yearly.csv`
- `merged_aggregate_yearly.csv`
- `coverage_sparsity_summary.csv`
- `matrix_manifest.json`
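
After a run, a quick check that the merged files exist can catch a partial or failed matrix early. The sketch below assumes the files sit directly under `OUTPUT_ROOT` as listed above. `missing_outputs` is a hypothetical helper, not part of the repo.

```python
from pathlib import Path

# Merged files the matrix runner is expected to write at the output root.
EXPECTED_FILES = [
    "matrix_cases.csv",
    "merged_summary.csv",
    "merged_trials.csv",
    "merged_summary_yearly.csv",
    "merged_trials_yearly.csv",
    "merged_aggregate_yearly.csv",
    "coverage_sparsity_summary.csv",
    "matrix_manifest.json",
]


def missing_outputs(output_root):
    """Return the expected merged files that are not present under output_root."""
    root = Path(output_root)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


OUTPUT_ROOT = Path("/kaggle/working/alpha_robustness/test7_top10_standard_matrix")
missing = missing_outputs(OUTPUT_ROOT)
print("missing:", missing if missing else "none")
```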

## Coverage / sparsity summary

`coverage_sparsity_summary.csv` is derived from detail artifacts and includes metrics like:

- number of rebalance days
- number of trade days
- mean / median / min / max target count
- percent of rebalance days with fewer names than `top_k`
- cash-weight stats
- top-k signal count by day
- all-zero-score day rate

In other words, coverage / sparsity robustness is analyzed from the detail artifacts that each case already writes, not by running a separate backtest axis.
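
To make the target-count metrics concrete, the sketch below computes them from a list of per-rebalance-day target counts. The input values are made up for illustration, and `target_count_stats` is a hypothetical function. The repo's own derivation in the detail artifacts may differ.

```python
from statistics import mean, median


def target_count_stats(target_counts, top_k):
    """Summarize how many names were targeted on each rebalance day."""
    if not target_counts:
        raise ValueError("target_counts must not be empty")
    short_days = sum(1 for n in target_counts if n < top_k)
    return {
        "rebalance_days": len(target_counts),
        "mean": mean(target_counts),
        "median": median(target_counts),
        "min": min(target_counts),
        "max": max(target_counts),
        # Share of rebalance days that held fewer names than top_k.
        "pct_below_top_k": 100.0 * short_days / len(target_counts),
    }


# Made-up target counts for eight rebalance days with top_k = 5.
counts = [5, 5, 3, 5, 0, 4, 5, 5]
stats = target_count_stats(counts, top_k=5)
print(stats)
```

In this made-up sample, three of the eight rebalance days hold fewer than `top_k` names, so `pct_below_top_k` is 37.5.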