Your task is to write a Python script that reads each ROOT file listed in {BASE_DIR}/solution/arrays/file_list.txt using uproot. For each file, extract the specified observables and store them in a NumPy array.
The naming of the output NumPy file should follow these rules:
- If the input ROOT file listed in file_list.txt contains "data_A.GamGam.root", name the output file: {BASE_DIR}/arrays/data_A_raw.npy
- If the input ROOT file listed in file_list.txt contains "mc_345318.WpH125J_Wincl_gamgam.GamGam.root", name the output file: {BASE_DIR}/arrays/signal_WH_raw.npy
- For other files, do not process or generate any output.
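The naming rules above can be sketched as a small lookup. This is a minimal illustration, not a required implementation: the names `OUTPUT_STEMS` and `output_path` are hypothetical, and `base_dir` stands in for the base directory used in the paths above.

```python
from pathlib import Path
from typing import Optional

# (substring of the ROOT file name, stem of the output file), taken from the rules above.
OUTPUT_STEMS = (
    ("data_A.GamGam.root", "data_A_raw"),
    ("mc_345318.WpH125J_Wincl_gamgam.GamGam.root", "signal_WH_raw"),
)


def output_path(root_file: str, base_dir: str) -> Optional[Path]:
    """Return the .npy path for a listed ROOT file, or None if the file must be skipped."""
    for pattern, stem in OUTPUT_STEMS:
        if pattern in root_file:
            return Path(base_dir) / "arrays" / (stem + ".npy")
    return None  # any other file: no processing, no output
```

A caller would skip every entry of file_list.txt for which `output_path` returns `None`.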
Refer to the ROOT file summary provided below to identify the correct tree and branch names. Be precise: instruct the worker exactly which trees and branches to extract.
Note: Some branches (for example, photon, lepton, and jet observables) are arrays containing multiple entries per event, ordered by descending pT.
Important: Do not loop over events. Use uproot to load entire branches at once for efficient processing.
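As an illustration of selecting the leading objects per event without an event loop, the sketch below uses NumPy only. It assumes the jagged branch is available as a flat array of values plus a per-event count; the function name `leading_entries` and the toy numbers are hypothetical. With Awkward Array, `ak.pad_none` followed by `ak.fill_none` is likely the more usual route to the same result.

```python
import numpy as np


def leading_entries(flat, counts, n_keep):
    """Return an (n_events, n_keep) array with the first n_keep entries of each event.

    flat   : per-object values, concatenated event after event
    counts : number of objects in each event
    Events with fewer than n_keep objects are filled with NaN.
    """
    flat = np.asarray(flat, dtype=float)
    counts = np.asarray(counts)
    starts = np.cumsum(counts) - counts        # position of each event's first object
    slot = np.arange(n_keep)                   # 0 .. n_keep - 1
    valid = slot[None, :] < counts[:, None]    # slots that exist in each event
    index = starts[:, None] + slot[None, :]    # position in flat for every slot
    out = np.full((len(counts), n_keep), np.nan)
    out[valid] = flat[index[valid]]            # one vectorized assignment, no event loop
    return out


# Toy input: three events containing 0, 1 and 3 photons (already ordered by descending pT).
photon_pt_flat = np.array([45.0, 80.0, 60.0, 30.0])
photon_counts = np.array([0, 1, 3])
leading_two = leading_entries(photon_pt_flat, photon_counts, 2)
```

Because the entries are ordered by descending pT, taking the first two (or six) entries per event gives the highest-pT objects directly.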
For each event, you should save:
- pT, eta, phi of each of the two photons
- pT, eta, phi of the two highest-pT leptons in the event
- pT, eta, phi of the six highest-pT jets in the event
- pT and phi of the MET
- Event weight (just MC weight, not multiplied by any extra scale factors)
- Flag for each photon indicating whether tight ID requirements are satisfied
- Cross section
- Sum of weights in ROOT file
- Scale factors for photon, electron, muon, b-tagging, pileup, lepton trigger, and photon trigger
The indices should be as follows (note that these names may not correspond to the branch names in the ROOT files):
0: leading photon pT
1: leading photon eta
2: leading photon phi
3: subleading photon pT
4: subleading photon eta
5: subleading photon phi
6: leading lepton pT
7: leading lepton eta
8: leading lepton phi
9: subleading lepton pT
10: subleading lepton eta
11: subleading lepton phi
12: jet 1 pT
13: jet 1 eta
14: jet 1 phi
15: jet 2 pT
16: jet 2 eta
17: jet 2 phi
18: jet 3 pT
19: jet 3 eta
20: jet 3 phi
21: jet 4 pT
22: jet 4 eta
23: jet 4 phi
24: jet 5 pT
25: jet 5 eta
26: jet 5 phi
27: jet 6 pT
28: jet 6 eta
29: jet 6 phi
30: met ET
31: met phi
32: MC weight
33: sum of weights
34: cross section
35: tight ID of leading photon
36: tight ID of subleading photon
37: scaleFactor_PILEUP
38: scaleFactor_PHOTON
39: scaleFactor_PhotonTRIGGER
40: scaleFactor_ELE
41: scaleFactor_MUON
42: scaleFactor_LepTRIGGER
43: scaleFactor_BTAG
44: NaN
45: NaN
Fill columns 44 and 45 (the last two columns of the array) with NaN values. They are placeholders for the diphoton invariant mass and transverse momentum, which will be computed later.
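One way to keep the 46-column layout consistent is to allocate the whole array as NaN first and then fill columns by position, so the two placeholder columns stay NaN by construction. In this sketch the labels in `COLUMN_NAMES` and the helper `empty_table` are hypothetical; only the positions come from the index list above.

```python
import numpy as np

# Hypothetical labels; they need not match the branch names in the ROOT files.
OBJECTS = ("photon1", "photon2", "lep1", "lep2",
           "jet1", "jet2", "jet3", "jet4", "jet5", "jet6")
COLUMN_NAMES = [obj + "_" + var for obj in OBJECTS for var in ("pt", "eta", "phi")]
COLUMN_NAMES += ["met_et", "met_phi", "mc_weight", "sum_weights", "cross_section",
                 "photon1_tight", "photon2_tight"]
COLUMN_NAMES += ["scaleFactor_PILEUP", "scaleFactor_PHOTON", "scaleFactor_PhotonTRIGGER",
                 "scaleFactor_ELE", "scaleFactor_MUON", "scaleFactor_LepTRIGGER",
                 "scaleFactor_BTAG"]
COLUMN_NAMES += ["placeholder_44", "placeholder_45"]  # filled later, NaN for now


def empty_table(n_events):
    """Allocate an (n_events, 46) float64 array pre-filled with NaN."""
    return np.full((n_events, len(COLUMN_NAMES)), np.nan, dtype=np.float64)


# Toy usage: fill one column by name and leave everything else as NaN.
table = empty_table(4)
table[:, COLUMN_NAMES.index("met_et")] = [10.0, 20.0, 30.0, 40.0]
```

Looking up positions through `COLUMN_NAMES.index(...)` is a design choice for readability; hard-coded integer indices would work equally well.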
# Implementation Details (required for correct column mapping)
- Use the TTree named "mini" and load branches via `uproot.open(...)["mini"].arrays()`. (`uproot.lazy()` is available only in uproot 4; it was removed in uproot 5.)
- Branch-to-column mapping:
* Columns 0–2: `photon_pt[0]`, `photon_eta[0]`, `photon_phi[0]`
* Columns 3–5: `photon_pt[1]`, `photon_eta[1]`, `photon_phi[1]`
* Columns 6–8: `lep_pt[0]`, `lep_eta[0]`, `lep_phi[0]`
* Columns 9–11: `lep_pt[1]`, `lep_eta[1]`, `lep_phi[1]`
* Columns 12–14: `jet_pt[0]`, `jet_eta[0]`, `jet_phi[0]` (and so on through index 29 for jets 0–5)
* Column 30: `met_et`
* Column 31: `met_phi`
* Column 32: `mcWeight`
* Column 33: `SumWeights`
* Column 34: `XSection`
* Column 35: `photon_isTightID[0]`
* Column 36: `photon_isTightID[1]`
* Columns 37–43: scale factors in the order `[scaleFactor_PILEUP, scaleFactor_PHOTON, scaleFactor_PhotonTRIGGER, scaleFactor_ELE, scaleFactor_MUON, scaleFactor_LepTRIGGER, scaleFactor_BTAG]`
- Jagged arrays must be truncated or padded with `np.nan` to a fixed length: 2 for photons and leptons, 6 for jets.
- After saving, print file path, array shape, dtype, and per-column NaN counts.
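The final reporting step could look like the sketch below. The helper name `save_and_report` and the demo table are hypothetical, and the demo writes to a temporary directory rather than to the real output location.

```python
import os
import tempfile

import numpy as np


def save_and_report(path, table):
    """Save the table as .npy, then print path, shape, dtype and per-column NaN counts."""
    np.save(path, table)
    nan_counts = np.isnan(table).sum(axis=0)  # one count per column
    print("saved:", path)
    print("shape:", table.shape)
    print("dtype:", table.dtype)
    for col, n_nan in enumerate(nan_counts):
        print("column", col, "NaN count:", int(n_nan))
    return nan_counts


# Demo with a toy table: 3 events, 46 columns, only column 0 filled.
demo = np.full((3, 46), np.nan)
demo[:, 0] = 1.0
demo_path = os.path.join(tempfile.mkdtemp(), "demo_raw.npy")
demo_nan_counts = save_and_report(demo_path, demo)
```

The per-column NaN counts make it easy to check that columns 44 and 45 are entirely NaN and that padded object columns contain NaN only where events have fewer objects than slots.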