| Your task is to write a Python script that: |
|
|
| 1. Loads the following two .npy files: |
| - {BASE_DIR}/solution/arrays/data_raw.npy (real data events) |
| - {BASE_DIR}/solution/arrays/signal_raw.npy (MC signal events) |
|
|
| Each file contains a NumPy array of shape (N, 46), where each row corresponds to a physics event and each column represents a feature. Your goal is to preprocess these arrays following the steps below, and save the processed results as: |
|
|
| - signal.npy: selected MC signal events |
| - bkgd.npy: selected and rescaled background events from real data |
|
|
| Save both output files to: {BASE_DIR}/arrays/ |
|
|
| Information on the column indices: |
|
|
| 0: leading photon pT |
| 1: leading photon eta |
| 2: leading photon phi |
| 3: subleading photon pT |
| 4: subleading photon eta |
| 5: subleading photon phi |
| 6: leading lepton pT |
| 7: leading lepton eta |
| 8: leading lepton phi |
| 9: subleading lepton pT |
| 10: subleading lepton eta |
| 11: subleading lepton phi |
| 12: jet 1 pT |
| 13: jet 1 eta |
| 14: jet 1 phi |
| 15: jet 2 pT |
| 16: jet 2 eta |
| 17: jet 2 phi |
| 18: jet 3 pT |
| 19: jet 3 eta |
| 20: jet 3 phi |
| 21: jet 4 pT |
| 22: jet 4 eta |
| 23: jet 4 phi |
| 24: jet 5 pT |
| 25: jet 5 eta |
| 26: jet 5 phi |
| 27: jet 6 pT |
| 28: jet 6 eta |
| 29: jet 6 phi |
| 30: MET ET |
| 31: MET phi |
| 32: MC weight |
| 33: sum of weights |
| 34: cross section (XSection) |
| 35: leading photon tight ID? |
| 36: subleading photon tight ID? |
| 37: scaleFactor_PILEUP |
| 38: scaleFactor_PHOTON |
| 39: scaleFactor_PhotonTRIGGER |
| 40: scaleFactor_ELE |
| 41: scaleFactor_MUON |
| 42: scaleFactor_LepTRIGGER |
| 43: scaleFactor_BTAG |
| 44: unused (NaN); reserved for the diphoton invariant mass |
| 45: unused (NaN); reserved for the diphoton transverse momentum |
|
|
| --- |
|
|
| Step 1: Load and Validate |
|
|
| - Load both .npy files using NumPy. |
| - Verify that each array has exactly 46 columns. Raise an error if not. |
| - Do not drop any columns — preserve the full (N, 46) shape. |
| - Update the following columns in place: |
| - Column 32: final event weight |
| - Column 34: cross section (XSection) - only for ttH process |
| - Column 44: diphoton invariant mass (m_yy) |
| - Column 45: diphoton transverse momentum (pt_yy) |
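A minimal sketch of the load-and-validate step, assuming the check should also reject arrays that are not two-dimensional. The helper name `load_events` and the constant `N_COLUMNS` are illustrative and do not come from the task text.

```python
import numpy as np

N_COLUMNS = 46  # expected number of feature columns per event


def load_events(path):
    """Load a .npy file and check that it is a 2-D array with 46 columns."""
    arr = np.load(path)
    if arr.ndim != 2 or arr.shape[1] != N_COLUMNS:
        raise ValueError(
            f"{path}: expected shape (N, {N_COLUMNS}), got {arr.shape}"
        )
    return arr
```

The function returns the full array unchanged, so later steps can overwrite columns 32, 34, 44 and 45 in place.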
|
|
| --- |
|
|
| Step 2: MC Signal Weight Update (signal_raw.npy only) |
|
|
| Normalization: |
|
|
| - Use luminosity = 10,000 pb^{-1}. |
| - For each event, compute the normalization factor as: |
| (cross_section * luminosity) / sum_of_weights |
| - The values of cross_section and sum_of_weights are found in columns 34 and 33, respectively. |
| - Important: If the cross-section value is 2.64338632e-06 pb (corresponding to ttH SM Higgs production), replace it with 0.000116 pb (the correct SM Higgs → γγ cross-section). |
| - This correction should be applied only to events where the cross-section matches 2.64338632e-06 pb, and the corrected value should overwrite the original in column 34. |
| - Use the corrected cross-section value when computing normalization. |
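The normalization rule above could be sketched as follows. The tolerant comparison via `np.isclose` is an assumption: the task says "matches", and a stored float may not be bit-identical to the literal, so switch to an exact `==` if that is what is required. The function and constant names are hypothetical.

```python
import numpy as np

LUMINOSITY_PB = 10_000.0        # integrated luminosity in pb^-1
TTH_XSEC_RAW = 2.64338632e-06   # ttH value as stored in the file (pb)
TTH_XSEC_FIXED = 0.000116       # replacement value from the task (pb)
COL_SUMW, COL_XSEC = 33, 34


def normalization(signal):
    """Correct the ttH cross section in place and return the per-event factor."""
    # Assumption: a relative tolerance of 1e-6 identifies the ttH events.
    is_tth = np.isclose(signal[:, COL_XSEC], TTH_XSEC_RAW, rtol=1e-6, atol=0.0)
    signal[is_tth, COL_XSEC] = TTH_XSEC_FIXED
    # (cross_section * luminosity) / sum_of_weights, using the corrected value
    return signal[:, COL_XSEC] * LUMINOSITY_PB / signal[:, COL_SUMW]
```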
|
|
| Scale factors: |
|
|
| - For each event, compute the product of the following scale factors: |
| - scaleFactor_PILEUP (column 37) |
| - scaleFactor_PHOTON (column 38) |
| - scaleFactor_PhotonTRIGGER (column 39) |
| - scaleFactor_ELE (column 40) |
| - scaleFactor_MUON (column 41) |
| - scaleFactor_LepTRIGGER (column 42) |
| - scaleFactor_BTAG (column 43) |
| - Remove any event where any of these scale factors is exactly zero. |
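One possible way to express the scale-factor handling with NumPy is shown below. The helper names are illustrative. Dropping the zero-scale-factor events before computing anything else keeps all per-event arrays the same length.

```python
import numpy as np

SF_COLUMNS = [37, 38, 39, 40, 41, 42, 43]  # the seven scale-factor columns


def drop_zero_scale_factors(signal):
    """Return only the events in which none of the seven scale factors is zero."""
    keep = np.all(signal[:, SF_COLUMNS] != 0, axis=1)
    return signal[keep]


def scale_factor_product(signal):
    """Per-event product of the seven scale factors."""
    return np.prod(signal[:, SF_COLUMNS], axis=1)
```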
|
|
| Final weight: |
|
|
| - Compute the final event weight as: |
| final_weight = mcWeight * normalization * (product of all scale factors) |
| - Here, mcWeight is taken from column 32. |
| - Store the computed final weight back into column 32, replacing the original mcWeight. |
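As a sketch, the final weight can be written back with a single vectorized assignment. Here `norm` and `sf_product` are assumed to be per-event arrays computed on the same (already filtered) set of events; the function name is hypothetical.

```python
import numpy as np

COL_WEIGHT = 32  # holds mcWeight on input, final weight on output


def apply_final_weight(signal, norm, sf_product):
    """Overwrite column 32 with mcWeight * normalization * scale-factor product."""
    signal[:, COL_WEIGHT] = signal[:, COL_WEIGHT] * norm * sf_product
    return signal
```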
|
|
| --- |
|
|
| Step 3: Kinematic Calculations and Preselection (for both MC and data) |
|
|
| - For each event, compute diphoton invariant mass and transverse momentum using ROOT.TLorentzVector (do not use the vector module). |
| - Store the diphoton invariant mass in column 44 (m_yy). |
| - Store the diphoton transverse momentum in column 45 (pt_yy). |
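The kinematic step might look like the sketch below, assuming photons are treated as massless. `SetPtEtaPhiM`, `M()` and `Pt()` are standard `TLorentzVector` methods. The pure-Python branch is not part of the task; it is an equivalent four-vector sum included only so the example runs where PyROOT is not installed. Function names are illustrative.

```python
import math

import numpy as np

try:
    import ROOT  # PyROOT; the task requires ROOT.TLorentzVector
except ImportError:  # fallback so this sketch still runs without ROOT
    ROOT = None


def diphoton_kinematics(pt1, eta1, phi1, pt2, eta2, phi2):
    """Return (m_yy, pt_yy) for two massless photons."""
    if ROOT is not None:
        g1, g2 = ROOT.TLorentzVector(), ROOT.TLorentzVector()
        g1.SetPtEtaPhiM(pt1, eta1, phi1, 0.0)
        g2.SetPtEtaPhiM(pt2, eta2, phi2, 0.0)
        yy = g1 + g2
        return yy.M(), yy.Pt()
    # Illustrative fallback: sum the Cartesian four-vector components by hand.
    px = pt1 * math.cos(phi1) + pt2 * math.cos(phi2)
    py = pt1 * math.sin(phi1) + pt2 * math.sin(phi2)
    pz = pt1 * math.sinh(eta1) + pt2 * math.sinh(eta2)
    energy = pt1 * math.cosh(eta1) + pt2 * math.cosh(eta2)
    m2 = energy**2 - px**2 - py**2 - pz**2
    return math.sqrt(max(m2, 0.0)), math.hypot(px, py)


def fill_diphoton_columns(events):
    """Write m_yy into column 44 and pt_yy into column 45, in place."""
    for row in events:  # each row is a view into `events`
        m_yy, pt_yy = diphoton_kinematics(*(float(x) for x in row[0:6]))
        row[44], row[45] = m_yy, pt_yy
    return events
```

The explicit `float(...)` conversion is a precaution in case the arrays are stored as `float32`.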
|
|
| Apply the following preselection cuts to both MC and data: |
|
|
| - Photon pseudorapidity (|eta|): |eta| < 1.37 or 1.52 < |eta| < 2.37 (for each photon) |
| - Photon transverse momentum: pT > 25,000 MeV (both photons) |
| - Leading photon: (pT / m_yy) > 0.35 |
| - Subleading photon: (pT / m_yy) > 0.25 |
| - Diphoton invariant mass: 105,000 MeV < m_yy < 160,000 MeV |
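A sketch of the preselection as a boolean mask. It assumes the transverse-momentum cuts act on the individual photon pT values (columns 0 and 3) rather than on the diphoton pT, which is what "both photons", "leading photon" and "subleading photon" suggest. The relative cuts are written as `pt > fraction * m_yy` to avoid dividing by m_yy. Function names are hypothetical.

```python
import numpy as np


def photon_eta_ok(eta):
    """True where |eta| < 1.37 or 1.52 < |eta| < 2.37."""
    abs_eta = np.abs(eta)
    return (abs_eta < 1.37) | ((abs_eta > 1.52) & (abs_eta < 2.37))


def preselection_mask(events):
    """Boolean mask of events passing the preselection (all energies in MeV)."""
    pt1, eta1 = events[:, 0], events[:, 1]   # leading photon
    pt2, eta2 = events[:, 3], events[:, 4]   # subleading photon
    m_yy = events[:, 44]
    return (
        photon_eta_ok(eta1) & photon_eta_ok(eta2)
        & (pt1 > 25_000) & (pt2 > 25_000)
        # Assumption: relative cuts use per-photon pT, not pt_yy.
        & (pt1 > 0.35 * m_yy) & (pt2 > 0.25 * m_yy)
        & (m_yy > 105_000) & (m_yy < 160_000)
    )
```

Applying it is then `events = events[preselection_mask(events)]` for both the MC and the data array.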
|
|
| --- |
|
|
| Step 4a: Final Signal Selection (MC only) |
|
|
| From the preselected MC events: |
|
|
| - Keep only events where both photons pass tight photon ID. |
| - Keep only events within the signal region: 123,000 MeV < m_yy < 127,000 MeV |
|
|
| Save the selected events to: |
|
|
| - {BASE_DIR}/arrays/signal.npy |
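The signal selection could be sketched as below. The encoding of the tight-ID flags is an assumption (1 for pass, 0 for fail); the task text does not state it. The function name is illustrative.

```python
import numpy as np


def select_signal(mc):
    """Keep tight-tight events with 123,000 MeV < m_yy < 127,000 MeV."""
    # Assumption: columns 35 and 36 hold 1 (pass) or 0 (fail).
    both_tight = (mc[:, 35] == 1) & (mc[:, 36] == 1)
    in_window = (mc[:, 44] > 123_000) & (mc[:, 44] < 127_000)
    return mc[both_tight & in_window]
```

The result can then be written with `np.save`, after making sure the output directory exists (for example with `os.makedirs(..., exist_ok=True)`).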
|
|
| --- |
|
|
| Step 4b: Background Modeling and Normalization (real data only) |
|
|
| Using preselected data events: |
|
|
| Region definitions: |
|
|
| - Signal region: 123,000 MeV < m_yy < 127,000 MeV |
| - Sideband region: 105,000 MeV < m_yy < 120,000 MeV or 130,000 MeV < m_yy < 160,000 MeV |
|
|
| Photon ID categories: |
|
|
| - TI (tight ID): both photons pass tight photon ID |
| - NTI (non-tight ID): photons fail tight ID but pass loose ID |
|
|
| Steps: |
|
|
| 1. Compute yields (sum of weights) for: |
| - NTI sideband |
| - NTI signal region |
| - TI sideband |
| 2. Calculate scale factors: |
| - SF1 = (TI sideband) / (NTI sideband) |
| - SF2 = (NTI signal region) / (NTI sideband) |
| 3. Estimate expected yield in TI signal region: |
| - expected_yield = SF1 * SF2 * (NTI sideband) |
| 4. Retain only NTI sideband events. |
| 5. Rescale their weights so that the total weight matches expected_yield. |
| 6. Save the result to: |
| - {BASE_DIR}/arrays/bkgd.npy |
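The background steps above can be sketched as follows, under three labeled assumptions: NTI is taken to mean "not both photons tight" (no loose-ID column is listed, so every stored event is assumed to pass loose ID), the tight-ID flags are encoded as 1/0, and the per-event data weight is read from column 32. All names are hypothetical.

```python
import numpy as np

COL_WEIGHT, COL_MYY = 32, 44


def build_background(data):
    """Return NTI sideband events rescaled to the expected TI signal-region yield."""
    m_yy, w = data[:, COL_MYY], data[:, COL_WEIGHT]
    tight = (data[:, 35] == 1) & (data[:, 36] == 1)   # TI
    non_tight = ~tight                                 # NTI (assumed definition)
    signal_region = (m_yy > 123_000) & (m_yy < 127_000)
    sideband = ((m_yy > 105_000) & (m_yy < 120_000)) | (
        (m_yy > 130_000) & (m_yy < 160_000)
    )

    # Yields are sums of weights in each category.
    nti_sb = w[non_tight & sideband].sum()
    nti_sr = w[non_tight & signal_region].sum()
    ti_sb = w[tight & sideband].sum()
    if nti_sb == 0:
        raise ValueError("NTI sideband is empty; scale factors are undefined")

    sf1 = ti_sb / nti_sb
    sf2 = nti_sr / nti_sb
    expected_yield = sf1 * sf2 * nti_sb

    # Keep only NTI sideband events and rescale so their weights sum to the target.
    bkgd = data[non_tight & sideband].copy()
    bkgd[:, COL_WEIGHT] *= expected_yield / nti_sb
    return bkgd, expected_yield
```

Because SF1 * SF2 * (NTI sideband) simplifies to (TI sideband) * (NTI signal region) / (NTI sideband), the rescaling is a single multiplicative factor applied to every retained event.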
|
|
| --- |
|
|
| Final Output Summary: |
|
|
| - signal.npy – MC signal events passing preselection, signal region, and tight ID cuts |
| - bkgd.npy – Real data events (NTI sideband) rescaled to match expected background |