| Your task is to write a Python script that: |
|
|
| 1. Loads the following two .npy files: |
| - {BASE_DIR}/solution/arrays/….npy (real data events) |
| - {BASE_DIR}/solution/arrays/signal_raw.npy (MC signal events) |
|
|
| Each file contains a NumPy array of shape (N, 46), where each row corresponds to a physics event and each column represents a feature. Your goal is to preprocess these arrays following the steps below, and save the processed results as: |
|
|
| - signal.npy: selected MC signal events |
| - bkgd.npy: selected and rescaled background events from real data |
|
|
| Save both output files to: {BASE_DIR}/arrays/ |
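`np.save` does not create missing directories, so the script should make sure `{BASE_DIR}/arrays/` exists before writing. Below is a minimal sketch. The helper name `save_outputs` is hypothetical.

```python
import os

import numpy as np


def save_outputs(out_dir: str, signal: np.ndarray, bkgd: np.ndarray) -> None:
    """Write signal.npy and bkgd.npy into out_dir (hypothetical helper)."""
    # np.save opens the target file directly and fails if the parent
    # directory is missing, so create it first (no error if it exists).
    os.makedirs(out_dir, exist_ok=True)
    np.save(os.path.join(out_dir, "signal.npy"), signal)
    np.save(os.path.join(out_dir, "bkgd.npy"), bkgd)
```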
|
|
| Information on the column indices: |
|
|
| 0: leading photon pT |
| 1: leading photon eta |
| 2: leading photon phi |
| 3: subleading photon pT |
| 4: subleading photon eta |
| 5: subleading photon phi |
| 6: leading lepton pT |
| 7: leading lepton eta |
| 8: leading lepton phi |
| 9: subleading lepton pT |
| 10: subleading lepton eta |
| 11: subleading lepton phi |
| 12: jet 1 pT |
| 13: jet 1 eta |
| 14: jet 1 phi |
| 15: jet 2 pT |
| 16: jet 2 eta |
| 17: jet 2 phi |
| 18: jet 3 pT |
| 19: jet 3 eta |
| 20: jet 3 phi |
| 21: jet 4 pT |
| 22: jet 4 eta |
| 23: jet 4 phi |
| 24: jet 5 pT |
| 25: jet 5 eta |
| 26: jet 5 phi |
| 27: jet 6 pT |
| 28: jet 6 eta |
| 29: jet 6 phi |
| 30: MET ET |
| 31: MET phi |
| 32: MC weight |
| 33: sum of weights |
| 34: cross section (XSection) |
| 35: leading photon tight-ID flag (1.0 = tight) |
| 36: subleading photon tight-ID flag (1.0 = tight) |
| 37: scaleFactor_PILEUP |
| 38: scaleFactor_PHOTON |
| 39: scaleFactor_PhotonTRIGGER |
| 40: scaleFactor_ELE |
| 41: scaleFactor_MUON |
| 42: scaleFactor_LepTRIGGER |
| 43: scaleFactor_BTAG |
| 44: unused on input (NaN); filled with the diphoton invariant mass m_yy |
| 45: unused on input (NaN); filled with the diphoton transverse momentum pt_yy |
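Named constants for the column layout help keep magic numbers out of the script. The sketch below assumes these constant names, which are hypothetical. Only the indices come from the list above.

```python
# Hypothetical names for the 46-column layout; indices follow the column list.
N_COLUMNS = 46

PT_LEAD, ETA_LEAD, PHI_LEAD = 0, 1, 2
PT_SUB, ETA_SUB, PHI_SUB = 3, 4, 5
MC_WEIGHT = 32
SUM_OF_WEIGHTS = 33
XSECTION = 34
TIGHT_ID_LEAD, TIGHT_ID_SUB = 35, 36
# Columns 37..43: PILEUP, PHOTON, PhotonTRIGGER, ELE, MUON, LepTRIGGER, BTAG.
SCALE_FACTOR_COLUMNS = list(range(37, 44))
M_YY = 44   # filled with the diphoton invariant mass
PT_YY = 45  # filled with the diphoton transverse momentum
```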
|
|
| --- |
|
|
| Step 1: Load and Validate |
|
|
| - Load both .npy files using NumPy. |
| - Verify that each array has exactly 46 columns. Raise an error if not. |
| - Do not drop any columns — preserve the full (N, 46) shape. |
| - The following columns are overwritten in place by the later steps: |
| - Column 32: final event weight |
| - Column 34: cross section (XSection), corrected only for the ttH process |
| - Column 44: diphoton invariant mass (m_yy) |
| - Column 45: diphoton transverse momentum (pt_yy) |
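Below is a minimal sketch of the load-and-validate step, using a hypothetical helper `load_and_validate`. It checks that the array is two-dimensional with 46 columns and keeps every column. It returns a float64 copy so the later in-place updates (weights, cross section, m_yy, pt_yy) can store floating-point values.

```python
import numpy as np

N_COLUMNS = 46


def load_and_validate(path: str) -> np.ndarray:
    """Load an (N, 46) event array, raising ValueError on any other shape."""
    arr = np.load(path)
    # Check ndim first so a 1-D array gives a clear error instead of IndexError.
    if arr.ndim != 2 or arr.shape[1] != N_COLUMNS:
        raise ValueError(f"{path}: expected shape (N, {N_COLUMNS}), got {arr.shape}")
    # Independent float64 copy; all 46 columns are preserved.
    return arr.astype(np.float64, copy=True)
```

It would be called once per input file, for example on `{BASE_DIR}/solution/arrays/signal_raw.npy`.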
|
|
| --- |
|
|
| Step 2: MC Signal Weight Update (signal_raw.npy only) |
|
|
| Normalization: |
|
|
| - Use luminosity = 10,000 pb^{-1}. |
| - For each event (row-by-row), compute the normalization factor as: |
| (cross_section * luminosity) / sum_of_weights |
| - The normalization factor is event-specific. Do not compute a single global value; apply the formula independently for every row. |
| - The values of cross_section and sum_of_weights are found in columns 34 and 33, respectively. |
| - Important: if the cross-section value satisfies np.abs(XSection - 2.64338632e-06) < 1e-10 (ttH SM Higgs production), replace it in column 34 with 0.000116 pb (the correct SM Higgs -> γγ cross-section). |
| - Use the corrected cross-section value when computing normalization. |
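The normalization can be sketched with element-wise NumPy arithmetic. This evaluates the formula independently for each row rather than producing one global value, assuming vectorized per-row arithmetic satisfies the row-by-row requirement. The helper and constant names are hypothetical. The ttH tolerance, the replacement value, and the luminosity are the ones stated above.

```python
import numpy as np

LUMINOSITY_PB = 10_000.0       # pb^-1
TTH_XSEC_RAW = 2.64338632e-06  # value to be replaced
TTH_XSEC_CORRECTED = 0.000116  # pb, replacement value
SUMW_COL, XSEC_COL = 33, 34


def normalization_factors(events: np.ndarray) -> np.ndarray:
    """Per-row (XSection * lumi) / sum_of_weights; fixes column 34 in place."""
    is_tth = np.abs(events[:, XSEC_COL] - TTH_XSEC_RAW) < 1e-10
    events[is_tth, XSEC_COL] = TTH_XSEC_CORRECTED
    # Read the corrected column so the fix enters the normalization.
    return events[:, XSEC_COL] * LUMINOSITY_PB / events[:, SUMW_COL]
```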
|
|
| Scale factors: |
|
|
| - For each event, compute the product of the following scale factors: |
| - scaleFactor_PILEUP (column 37) |
| - scaleFactor_PHOTON (column 38) |
| - scaleFactor_PhotonTRIGGER (column 39) |
| - scaleFactor_ELE (column 40) |
| - scaleFactor_MUON (column 41) |
| - scaleFactor_LepTRIGGER (column 42) |
| - scaleFactor_BTAG (column 43) |
|
|
| Final weight: |
|
|
| - Compute the final event weight as: |
| final_weight = mcWeight * normalization * (product of all scale factors) |
| - Here, mcWeight is taken from column 32. |
| - Store the computed final weight back into column 32, replacing the original mcWeight. |
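The final-weight step can be sketched with a hypothetical helper `apply_final_weight`. It takes the mcWeight from column 32 and multiplies it by the per-event normalization array (cross_section * luminosity) / sum_of_weights. It then multiplies by the row-wise product of columns 37 to 43 and overwrites column 32 with the result.

```python
import numpy as np

MC_WEIGHT_COL = 32
# PILEUP, PHOTON, PhotonTRIGGER, ELE, MUON, LepTRIGGER, BTAG
SF_COLS = list(range(37, 44))


def apply_final_weight(events: np.ndarray, normalization: np.ndarray) -> None:
    """Overwrite column 32 with mcWeight * normalization * prod(scale factors)."""
    sf_product = np.prod(events[:, SF_COLS], axis=1)  # one product per row
    events[:, MC_WEIGHT_COL] = events[:, MC_WEIGHT_COL] * normalization * sf_product
```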
|
|
| --- |
|
|
| Step 3: Kinematic Calculations and Preselection (for both MC and data) |
|
|
| - For each event, compute the diphoton invariant mass and transverse momentum using ROOT.TLorentzVector (do not use the vector module). |
| - Store the diphoton invariant mass in column 44 (m_yy). |
| - Store the diphoton transverse momentum in column 45 (pt_yy). |
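| - When computing ratios that involve m_yy, create a safe denominator first, for example `m_yy_safe = np.where(m_yy <= 0, 1e-6, m_yy)`, and use `m_yy_safe` in every division. The safe denominator only prevents division by zero: dividing by 1e-6 makes the ratio huge, so on its own it would let such events pass. Also require `m_yy > 1e-6` in every ratio cut, so that events with m_yy <= 1e-6 fail the subsequent ratio cuts. |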
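The specification requires ROOT.TLorentzVector for the kinematics. A NumPy version is still handy as a cross-check of that output. This sketch assumes massless photons (the arrays carry no photon mass column). It uses the closed form m^2 = 2 pT1 pT2 (cosh(d_eta) - cos(d_phi)), which is what summing two `SetPtEtaPhiM(pt, eta, phi, 0)` four-vectors gives. It also includes a ratio helper that pairs the safe denominator with an explicit `m_yy > 1e-6` requirement. The function names are hypothetical.

```python
import numpy as np


def diphoton_kinematics(pt1, eta1, phi1, pt2, eta2, phi2):
    """Massless-photon m_yy and pt_yy, vectorized; a cross-check only.

    The required ROOT computation looks like:
        v1 = ROOT.TLorentzVector(); v1.SetPtEtaPhiM(pt1, eta1, phi1, 0.0)
        v2 = ROOT.TLorentzVector(); v2.SetPtEtaPhiM(pt2, eta2, phi2, 0.0)
        m_yy, pt_yy = (v1 + v2).M(), (v1 + v2).Pt()
    """
    # Invariant mass squared of two massless particles.
    m2 = 2.0 * pt1 * pt2 * (np.cosh(eta1 - eta2) - np.cos(phi1 - phi2))
    m_yy = np.sqrt(np.maximum(m2, 0.0))  # clip tiny negative rounding errors
    # Transverse momentum of the summed transverse vectors.
    px = pt1 * np.cos(phi1) + pt2 * np.cos(phi2)
    py = pt1 * np.sin(phi1) + pt2 * np.sin(phi2)
    return m_yy, np.hypot(px, py)


def ratio_passes(pt, m_yy, threshold):
    """pt / m_yy > threshold, with every m_yy <= 1e-6 forced to fail."""
    m_yy_safe = np.where(m_yy <= 0, 1e-6, m_yy)  # no division by zero
    return (m_yy > 1e-6) & (pt / m_yy_safe > threshold)
```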
|
|
| Apply the following preselection cuts to both MC and data: |
|
|
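| - Photon pseudorapidity (|eta|): |eta| < 1.37 or 1.52 < |eta| < 2.37 for each photon (columns 1 and 4) |
| - Photon transverse momentum: pT > 25,000 MeV for both photons (columns 0 and 3) |
| - Leading photon: (pt_lead / m_yy) > 0.35, where pt_lead is the leading photon pT (column 0) |
| - Subleading photon: (pt_sub / m_yy) > 0.25, where pt_sub is the subleading photon pT (column 3) |
| - Diphoton invariant mass: 105,000 MeV < m_yy < 160,000 MeV |
| - Use the safe denominator defined above for all pT/m_yy ratios so that no division by zero occurs, and combine each ratio cut with `m_yy > 1e-6` so that any event with m_yy <= 1e-6 (effectively zero or negative) fails the ratio requirements. |
| - IMPORTANT: Do NOT determine the leading/subleading photon dynamically with np.maximum or np.minimum. The input arrays are pre-ordered, so column 0 is always the leading photon and column 3 is always the subleading photon. |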
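The cuts can be combined into a single boolean mask. This is a minimal sketch under the following assumptions:

- Column 44 already holds m_yy.
- The 25,000 MeV threshold applies to each photon's pT (columns 0 and 3), as "(both photons)" indicates.
- The leading and subleading photons are taken directly from columns 0 and 3, without np.maximum/np.minimum.

The function names are hypothetical.

```python
import numpy as np


def eta_ok(eta):
    """|eta| < 1.37 or 1.52 < |eta| < 2.37."""
    a = np.abs(eta)
    return (a < 1.37) | ((a > 1.52) & (a < 2.37))


def preselection_mask(events: np.ndarray) -> np.ndarray:
    """Boolean mask of events passing all preselection cuts."""
    pt_lead, pt_sub = events[:, 0], events[:, 3]  # pre-ordered columns
    m_yy = events[:, 44]
    m_yy_safe = np.where(m_yy <= 0, 1e-6, m_yy)
    valid_mass = m_yy > 1e-6  # m_yy <= 1e-6 must fail the ratio cuts
    return (
        eta_ok(events[:, 1]) & eta_ok(events[:, 4])
        & (pt_lead > 25_000) & (pt_sub > 25_000)
        & valid_mass
        & (pt_lead / m_yy_safe > 0.35)
        & (pt_sub / m_yy_safe > 0.25)
        & (m_yy > 105_000) & (m_yy < 160_000)
    )
```

NaN values in any of the inspected columns compare as False, so such events are rejected rather than raising.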
|
|
| - After computing the diphoton variables, set all data event weights (column 32) to 1.0 before background modeling. |
|
|
| --- |
|
|
| Step 4a: Final Signal Selection (MC only) |
|
|
| From the preselected MC events: |
|
|
| - Before applying photon-ID cuts, build boolean masks for columns 35 and 36 using exact equality: `tight = (column == 1.0)`. Only values exactly equal to 1.0 pass tight ID; treat everything else (including values like 0.0, 0.5, NaNs) as `False`. |
| - Keep only events where both photons pass tight photon ID (both boolean flags must be True). |
| - Keep only events within the signal region: 123,000 MeV < m_yy < 127,000 MeV |
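Below is a sketch of the final signal selection, applied to MC events that already passed preselection, with m_yy in column 44. The helper name is hypothetical.

```python
import numpy as np


def final_signal_selection(events: np.ndarray) -> np.ndarray:
    """Keep events with both photons tight and 123,000 < m_yy < 127,000 MeV."""
    # Exact equality: only 1.0 is tight; 0.0, 0.5 and NaN all give False.
    tight_lead = events[:, 35] == 1.0
    tight_sub = events[:, 36] == 1.0
    m_yy = events[:, 44]
    in_signal_region = (m_yy > 123_000) & (m_yy < 127_000)
    return events[tight_lead & tight_sub & in_signal_region]
```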
|
|
| Save the selected events to: |
|
|
| - {BASE_DIR}/arrays/signal.npy |
|
|
| --- |
|
|
| Step 4b: Background Modeling and Normalization (real data only) |
|
|
| Using preselected data events: |
|
|
| Region definitions: |
|
|
| - Signal region: 123,000 MeV < m_yy < 127,000 MeV |
| - Sideband region: 105,000 MeV < m_yy < 120,000 MeV or 130,000 MeV < m_yy < 160,000 MeV |
|
|
| Photon ID categories: |
|
|
| - TI (tight ID): both photons pass tight photon ID (use the boolean masks built with `(column == 1.0)`) |
| - NTI (non-tight ID): photons fail tight ID but pass loose ID |
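The region and category definitions translate into boolean masks. The column list provides no loose-ID flag, so this sketch approximates NTI as "not TI", meaning at least one photon fails tight ID. That approximation is an assumption, not part of the specification. The function names are hypothetical.

```python
import numpy as np


def region_masks(m_yy: np.ndarray):
    """Signal-region and sideband masks from the diphoton mass in MeV."""
    signal_region = (m_yy > 123_000) & (m_yy < 127_000)
    sideband = ((m_yy > 105_000) & (m_yy < 120_000)) | (
        (m_yy > 130_000) & (m_yy < 160_000)
    )
    return signal_region, sideband


def id_categories(events: np.ndarray):
    """TI and NTI masks.

    ASSUMPTION: with no loose-ID column available, NTI = not TI.
    """
    ti = (events[:, 35] == 1.0) & (events[:, 36] == 1.0)
    return ti, ~ti
```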
|
|
| Steps: |
|
|
| 1. Compute yields (sum of weights) for: |
| - NTI sideband |
| - NTI signal region |
| - TI sideband |
| 2. Calculate scale factors: |
| - SF1 = (TI sideband) / (NTI sideband) |
| - SF2 = (NTI signal region) / (NTI sideband) |
| 3. Estimate expected yield in TI signal region: |
| - expected_yield = SF1 * SF2 * (NTI sideband) |
| 4. Retain only NTI sideband events. |
| 5. Rescale their weights so that the total weight matches expected_yield. |
| 6. Save the result to: |
| - {BASE_DIR}/arrays/bkgd.npy |
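Below is a sketch of steps 1 to 5. It assumes the region and ID masks are already built and the data weights in column 32 are set to 1.0. The product expected_yield = SF1 * SF2 * (NTI sideband) simplifies to (TI sideband) * (NTI signal region) / (NTI sideband), an ABCD-style extrapolation. The zero-yield guard is an addition of this sketch, and the helper name is hypothetical.

```python
import numpy as np


def model_background(data, in_sr, in_sb, ti, nti):
    """Return NTI-sideband events rescaled to the expected TI signal-region yield."""
    w = data[:, 32]
    nti_sb = w[nti & in_sb].sum()
    nti_sr = w[nti & in_sr].sum()
    ti_sb = w[ti & in_sb].sum()
    if nti_sb <= 0:
        raise ValueError("NTI sideband yield is zero; scale factors are undefined")

    sf1 = ti_sb / nti_sb
    sf2 = nti_sr / nti_sb
    expected_yield = sf1 * sf2 * nti_sb  # == ti_sb * nti_sr / nti_sb

    bkgd = data[nti & in_sb].copy()
    # One common factor: relative weights unchanged, total becomes expected_yield.
    bkgd[:, 32] *= expected_yield / nti_sb
    return bkgd, expected_yield
```

The rescaling uses a single common factor, so the relative weights of the retained events are unchanged. Only their sum is fixed to `expected_yield`.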
|
|
| --- |
|
|
| Final Output Summary: |
|
|
| - signal.npy – MC signal events passing preselection, signal region, and tight ID cuts |
| - bkgd.npy – Real data events (NTI sideband) rescaled to match expected background |