| Your task is to write a Python script that processes ATLAS diphoton event data. |
|
|
| Load the following two numpy array files: |
| - {BASE_DIR}/solution/arrays/data_raw.npy (real collision data) |
| - {BASE_DIR}/solution/arrays/signal_raw.npy (Monte Carlo simulated signal) |
|
|
| Each file contains a 2D array with shape (N_events, 46), where each row is one event and columns store physics quantities. |
|
|
| Your script must: |
| 1. Apply MC reweighting to simulated events |
| 2. Compute diphoton kinematics for all events |
| 3. Apply physics selection cuts |
| 4. Save final signal and background samples |
|
|
| Save outputs to: |
| - {BASE_DIR}/arrays/signal.npy |
| - {BASE_DIR}/arrays/bkgd.npy |
|
|
| ==================== |
| COLUMN DEFINITIONS |
| ==================== |
|
|
| 0: leading photon pT (MeV) |
| 1: leading photon eta |
| 2: leading photon phi |
| 3: subleading photon pT (MeV) |
| 4: subleading photon eta |
| 5: subleading photon phi |
| 6: leading lepton pT |
| 7: leading lepton eta |
| 8: leading lepton phi |
| 9: subleading lepton pT |
| 10: subleading lepton eta |
| 11: subleading lepton phi |
| 12-29: jet kinematics (6 jets x 3 variables) |
| 30: missing ET |
| 31: missing ET phi |
| 32: event weight |
| 33: sum of MC weights |
| 34: cross section (pb) |
| 35: leading photon tight ID flag |
| 36: subleading photon tight ID flag |
| 37: scaleFactor_PILEUP |
| 38: scaleFactor_PHOTON |
| 39: scaleFactor_PhotonTRIGGER |
| 40: scaleFactor_ELE |
| 41: scaleFactor_MUON |
| 42: scaleFactor_LepTRIGGER |
| 43: scaleFactor_BTAG |
| 44: (initially NaN) diphoton invariant mass m_yy (MeV) |
| 45: (initially NaN) diphoton transverse momentum pT_yy (MeV) |
|
|
| ==================== |
| STEP 1: LOAD AND VALIDATE |
| ==================== |
|
|
| Load both .npy files with numpy.load(). Verify each has exactly 46 columns; raise ValueError if not. |
| Do NOT drop any columns. Preserve the full (N, 46) shape throughout. |
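| A minimal sketch of this loading step, assuming only the (N, 46) layout above; the helper name `load_events` is illustrative, not part of the task:

```python
import numpy as np

N_COLS = 46  # fixed column count from the column definitions

def load_events(path):
    """Load an (N, 46) event array with numpy.load and validate its shape.

    `path` is whichever {BASE_DIR}-based file the task supplies.
    All 46 columns are preserved; nothing is dropped here.
    """
    arr = np.load(path)
    if arr.ndim != 2 or arr.shape[1] != N_COLS:
        raise ValueError(f"{path}: expected shape (N, {N_COLS}), got {arr.shape}")
    return arr
```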
|
|
| ==================== |
| STEP 2: MC WEIGHT UPDATE (signal_raw.npy only) |
| ==================== |
|
|
| A. Cross-section correction: |
| For any row where abs(column_34 - 2.64338632e-06) < 1e-10: |
|     Replace column 34 with 0.000116 (the corrected Higgs-to-gamma-gamma cross-section in pb)
|
|
| B. Normalization (per-event, not global): |
| For each row independently compute: |
| norm = (column_34 * 10000.0) / column_33 |
|     where 10000.0 is the integrated luminosity in inverse picobarns (pb^-1)
|
|
| C. Scale factor product: |
| For each row multiply columns 37 through 43 (7 factors total) |
|
|
| D. Final weight: |
| column_32 = column_32 * norm * scale_factor_product |
| Store result back into column 32 |
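| Steps 2A-2D can be sketched with vectorized numpy (the explicit-loop requirement in the implementation notes applies only to TLorentzVector); `update_mc_weights` is an illustrative name, and the array is copied here only to keep the sketch side-effect free:

```python
import numpy as np

LUMI = 10000.0            # integrated luminosity in pb^-1 (Step 2B)
OLD_XSEC = 2.64338632e-06
NEW_XSEC = 0.000116       # corrected cross-section in pb (Step 2A)

def update_mc_weights(mc):
    """Apply Steps 2A-2D to a (N, 46) MC array and return the result."""
    mc = mc.copy()
    # A: cross-section correction where column 34 matches the old value
    fix = np.abs(mc[:, 34] - OLD_XSEC) < 1e-10
    mc[fix, 34] = NEW_XSEC
    # B: per-event normalization, guarding against a zero denominator
    denom = mc[:, 33]
    safe = np.where(denom == 0, 1.0, denom)
    norm = np.where(denom != 0, mc[:, 34] * LUMI / safe, 0.0)
    # C: product of the 7 scale factors in columns 37..43
    sf = np.prod(mc[:, 37:44], axis=1)
    # D: final weight stored back into column 32
    mc[:, 32] = mc[:, 32] * norm * sf
    return mc
```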
|
|
| ==================== |
| STEP 3: KINEMATICS (both MC and data) |
| ==================== |
|
|
| For every event, use ROOT.TLorentzVector to compute the diphoton system:
|
|
| photon1 = ROOT.TLorentzVector() |
| photon1.SetPtEtaPhiM(column_0, column_1, column_2, 0.0) |
|
|
| photon2 = ROOT.TLorentzVector() |
| photon2.SetPtEtaPhiM(column_3, column_4, column_5, 0.0) |
|
|
| diphoton = photon1 + photon2 |
| column_44 = diphoton.M() |
| column_45 = diphoton.Pt() |
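| The task itself mandates ROOT.TLorentzVector with an explicit per-event loop, but the same math has a closed form for two massless photons — M^2 = 2 pT1 pT2 (cosh(d_eta) - cos(d_phi)) — which is useful as a cross-check of the ROOT loop. A sketch (function name is illustrative):

```python
import numpy as np

def diphoton_kinematics(pt1, eta1, phi1, pt2, eta2, phi2):
    """Closed-form m_yy and pT_yy for two massless photons.

    Equivalent to summing two TLorentzVectors built with
    SetPtEtaPhiM(pt, eta, phi, 0.0); intended only as a cross-check.
    """
    # invariant mass squared of two massless particles
    m2 = 2.0 * pt1 * pt2 * (np.cosh(eta1 - eta2) - np.cos(phi1 - phi2))
    m_yy = np.sqrt(np.maximum(m2, 0.0))
    # transverse momentum of the vector sum
    px = pt1 * np.cos(phi1) + pt2 * np.cos(phi2)
    py = pt1 * np.sin(phi1) + pt2 * np.sin(phi2)
    pt_yy = np.hypot(px, py)
    return m_yy, pt_yy
```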
|
|
| ==================== |
| STEP 4: PRESELECTION (both MC and data) |
| ==================== |
|
|
| Create a safe denominator for ratio cuts: |
| m_yy_safe = np.where(column_44 <= 0, 1e-6, column_44) |
|
|
| Apply ALL of the following cuts (combine with logical AND): |
|
|
| 1. Photon eta acceptance (both photons): |
| abs(column_1) < 1.37 OR (1.52 < abs(column_1) < 2.37) |
| abs(column_4) < 1.37 OR (1.52 < abs(column_4) < 2.37) |
|
|
| 2. Photon pT thresholds: |
| column_0 > 25000 (leading photon pT in MeV) |
| column_3 > 25000 (subleading photon pT in MeV) |
|
|
| 3. pT/mass ratios (use m_yy_safe to avoid division by zero): |
| column_0 / m_yy_safe > 0.35 (leading photon) |
| column_3 / m_yy_safe > 0.25 (subleading photon) |
| |
| CRITICAL: Column 0 is ALWAYS the leading photon, column 3 is ALWAYS subleading. |
| Do NOT use np.maximum or np.minimum to pick which is which. |
| The input arrays are already sorted by pT. |
|
|
| 4. Diphoton mass window: |
| 105000 < column_44 < 160000 (MeV) |
|
|
| Keep only rows passing all cuts above. |
|
|
| After preselection, for DATA ONLY: |
| Set column_32 = 1.0 for all remaining data events |
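| The full preselection, including the data-only weight reset, can be sketched as one boolean mask (`preselect` is an illustrative name; the cuts are exactly those listed above):

```python
import numpy as np

def preselect(arr, is_data):
    """Step 4 cuts on a (N, 46) array; columns as defined above."""
    m_yy = arr[:, 44]
    m_yy_safe = np.where(m_yy <= 0, 1e-6, m_yy)  # safe denominator

    def eta_ok(eta):
        # barrel or endcap, excluding the 1.37-1.52 crack region
        ae = np.abs(eta)
        return (ae < 1.37) | ((ae > 1.52) & (ae < 2.37))

    mask = (
        eta_ok(arr[:, 1]) & eta_ok(arr[:, 4])           # 1. eta acceptance
        & (arr[:, 0] > 25000) & (arr[:, 3] > 25000)     # 2. pT thresholds
        & (arr[:, 0] / m_yy_safe > 0.35)                # 3. leading ratio
        & (arr[:, 3] / m_yy_safe > 0.25)                #    subleading ratio
        & (m_yy > 105000) & (m_yy < 160000)             # 4. mass window
    )
    out = arr[mask].copy()
    if is_data:
        out[:, 32] = 1.0  # data events get unit weights after preselection
    return out
```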
|
|
| ==================== |
| STEP 5: SIGNAL SELECTION (MC only) |
| ==================== |
|
|
| From preselected MC events, apply: |
|
|
| 1. Tight photon ID: |
| (column_35 == 1.0) AND (column_36 == 1.0) |
| Use exact equality. Do NOT use np.isclose(). |
|
|
| 2. Signal mass window: |
| 123000 < column_44 < 127000 (MeV) |
|
|
| Save selected events to {BASE_DIR}/arrays/signal.npy |
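| A sketch of the signal selection, assuming it receives the already-preselected MC array (function name illustrative):

```python
import numpy as np

def select_signal(mc_presel):
    """Step 5: tight photon ID (exact equality) plus the signal mass window."""
    tight = (mc_presel[:, 35] == 1.0) & (mc_presel[:, 36] == 1.0)
    window = (mc_presel[:, 44] > 123000) & (mc_presel[:, 44] < 127000)
    return mc_presel[tight & window]
```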
|
|
| ==================== |
| STEP 6: BACKGROUND MODELING (data only) |
| ==================== |
|
|
| From preselected data events (with column_32 = 1.0): |
|
|
| Define categories: |
| - TI (tight): (column_35 == 1.0) AND (column_36 == 1.0) |
| - NTI (non-tight): NOT TI |
|
|
| Define regions: |
| - Signal: 123000 < column_44 < 127000 |
| - Sideband: (105000 < column_44 < 120000) OR (130000 < column_44 < 160000) |
|
|
| Compute yields (sum of column_32): |
| Y_NTI_sideband = sum of weights for (NTI AND sideband) |
| Y_NTI_signal = sum of weights for (NTI AND signal) |
| Y_TI_sideband = sum of weights for (TI AND sideband) |
|
|
| Scale factors (if Y_NTI_sideband > 0): |
| SF1 = Y_TI_sideband / Y_NTI_sideband |
| SF2 = Y_NTI_signal / Y_NTI_sideband |
|
|
| Expected yield: |
| Y_expected = SF1 * SF2 * Y_NTI_sideband |
|
|
| Keep ONLY NTI sideband events. |
| Rescale their weights: column_32 = column_32 * (Y_expected / Y_NTI_sideband) |
|
|
| Save to {BASE_DIR}/arrays/bkgd.npy |
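| The background-modeling arithmetic above can be sketched as follows, assuming preselected data with unit weights in column 32 (`model_background` is an illustrative name):

```python
import numpy as np

def model_background(data_presel):
    """Step 6: rescale the NTI sideband template to the expected yield."""
    w = data_presel[:, 32]
    m = data_presel[:, 44]
    ti = (data_presel[:, 35] == 1.0) & (data_presel[:, 36] == 1.0)
    nti = ~ti
    sig = (m > 123000) & (m < 127000)
    sb = ((m > 105000) & (m < 120000)) | ((m > 130000) & (m < 160000))

    y_nti_sb = w[nti & sb].sum()
    if y_nti_sb <= 0:
        raise ValueError("empty NTI sideband; cannot derive scale factors")
    sf1 = w[ti & sb].sum() / y_nti_sb    # SF1: TI/NTI ratio in the sideband
    sf2 = w[nti & sig].sum() / y_nti_sb  # SF2: sideband-to-signal extrapolation
    y_expected = sf1 * sf2 * y_nti_sb

    bkgd = data_presel[nti & sb].copy()  # keep ONLY NTI sideband events
    bkgd[:, 32] *= y_expected / y_nti_sb
    return bkgd
```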
|
|
| ==================== |
| IMPLEMENTATION NOTES |
| ==================== |
|
|
| - Import ROOT at the start; raise a clear error if it is unavailable
| - Use explicit Python loops for TLorentzVector (no vectorization) |
| - Guard all divisions (check denominator != 0) |
| - Preserve all 46 columns in output files |
| - Use exact equality (==) for tight ID, not approximate checks |
|
|
|
|