	Your task is to write a Python script that reads each ROOT file listed in {BASE_DIR}/solution/arrays/file_list.txt using uproot. For each file, extract the specified observables and store them in a NumPy array.

	The naming of the output NumPy file should follow these rules:
	- If the input ROOT file listed in file_list.txt contains "data_A.GamGam.root", name the output file: {BASE_DIR}/arrays/data_A_raw.npy
	- If the input ROOT file listed in file_list.txt contains "mc_345318.WpH125J_Wincl_gamgam.GamGam.root", name the output file: {BASE_DIR}/arrays/signal_WH_raw.npy
	- For other files, do not process or generate any output.
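
The naming rules above can be sketched as a small lookup. This is only an illustration: the helper name `output_path` and the `OUTPUT_NAMES` table are hypothetical, and `base_dir` stands in for `{BASE_DIR}`.

```python
from typing import Optional

# Substring of the input path -> output file name, as listed in the rules above.
OUTPUT_NAMES = {
    "data_A.GamGam.root": "data_A_raw.npy",
    "mc_345318.WpH125J_Wincl_gamgam.GamGam.root": "signal_WH_raw.npy",
}


def output_path(input_path: str, base_dir: str) -> Optional[str]:
    """Return the output .npy path for a ROOT file, or None if the file is skipped."""
    for pattern, name in OUTPUT_NAMES.items():
        if pattern in input_path:
            return f"{base_dir}/arrays/{name}"
    return None  # any other file: no processing, no output
```

Returning `None` for unmatched files lets the caller skip them with a single check before opening anything.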

	Refer to the ROOT file summary provided below to identify the correct tree and branch names. Be precise — instruct the worker exactly which trees and branches to extract.

	Note: Some branches (for example, photon, lepton, and jet observables) are arrays containing multiple entries per event, ordered by descending pT.
	Important: Do not loop over events. Use uproot to load entire branches at once for efficient processing.
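
One way to handle multi-entry branches without an event loop is to pad each one to a fixed width in a single vectorized step. The sketch below uses only NumPy and assumes the branch is available as a flat array of values plus a per-event count; the helper name `pad_jagged` and that input layout are assumptions made for illustration. With Awkward Array, `ak.pad_none(..., clip=True)` followed by `ak.fill_none(..., np.nan)` should give the same result.

```python
import numpy as np


def pad_jagged(flat, counts, width):
    """Pad or truncate a jagged branch to shape (n_events, width), filling gaps with NaN.

    flat   : 1-D values of all events concatenated (hypothetical input layout)
    counts : number of entries in each event
    width  : number of objects to keep per event
    """
    flat = np.asarray(flat, dtype=float)
    counts = np.asarray(counts, dtype=int)
    starts = np.cumsum(counts) - counts      # offset of each event's first entry
    cols = np.arange(width)
    valid = cols[None, :] < counts[:, None]  # True where the event has that object
    index = starts[:, None] + cols[None, :]  # position in `flat` for each slot
    out = np.full((len(counts), width), np.nan)
    out[valid] = flat[index[valid]]          # copy only the slots that exist
    return out


# Three events with 3, 0, and 1 jets; keep the two leading ones per event.
jet_pt = pad_jagged([50.0, 40.0, 30.0, 25.0], counts=[3, 0, 1], width=2)
```

Because the entries are already ordered by descending pT, keeping the first `width` slots keeps the highest-pT objects.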

	For each event, save the following:
	- pT, eta, phi of each of the two photons
	- pT, eta, phi of the two highest-pT leptons in the event
	- pT, eta, phi of the six highest-pT jets in the event
	- pT and phi of the MET
	- Event weight (just MC weight, not multiplied by any extra scale factors)
	- Flag for each photon indicating whether tight ID requirements are satisfied
	- Cross section
	- Sum of weights in ROOT file
	- Scale factors for pileup, photon, photon trigger, electron, muon, lepton trigger, and b-tagging

	The indices should be as follows (note that these names may not correspond to the branch names in the ROOT files):
	0: leading photon pT
	1: leading photon eta
	2: leading photon phi
	3: subleading photon pT
	4: subleading photon eta
	5: subleading photon phi
	6: leading lepton pT
	7: leading lepton eta
	8: leading lepton phi
	9: subleading lepton pT
	10: subleading lepton eta
	11: subleading lepton phi
	12: jet 1 pT
	13: jet 1 eta
	14: jet 1 phi
	15: jet 2 pT
	16: jet 2 eta
	17: jet 2 phi
	18: jet 3 pT
	19: jet 3 eta
	20: jet 3 phi
	21: jet 4 pT
	22: jet 4 eta
	23: jet 4 phi
	24: jet 5 pT
	25: jet 5 eta
	26: jet 5 phi
	27: jet 6 pT
	28: jet 6 eta
	29: jet 6 phi
	30: met ET
	31: met phi
	32: MC weight
	33: sum of weights
	34: cross section
	35: tight ID of leading photon
	36: tight ID of subleading photon
	37: scaleFactor_PILEUP
	38: scaleFactor_PHOTON
	39: scaleFactor_PhotonTRIGGER
	40: scaleFactor_ELE
	41: scaleFactor_MUON
	42: scaleFactor_LepTRIGGER
	43: scaleFactor_BTAG
	44: NaN
	45: NaN

	Fill indices 44 and 45 (the last two columns) with NaN values. They serve as placeholders for the diphoton invariant mass and transverse momentum, which will be computed later.
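
The layout above can be written down once as a list of names, so that column indices are looked up rather than hard-coded. This is a minimal sketch: the labels in `COLUMNS` and the helper `empty_table` are hypothetical and are not branch names from the ROOT files.

```python
import numpy as np

# Column labels in index order; illustrative names only, not ROOT branch names.
COLUMNS = []
for obj in ("photon1", "photon2", "lepton1", "lepton2"):
    COLUMNS += [f"{obj}_pt", f"{obj}_eta", f"{obj}_phi"]
for i in range(1, 7):
    COLUMNS += [f"jet{i}_pt", f"jet{i}_eta", f"jet{i}_phi"]
COLUMNS += ["met_et", "met_phi", "mc_weight", "sum_weights", "cross_section",
            "photon1_tight", "photon2_tight"]
COLUMNS += ["sf_pileup", "sf_photon", "sf_photon_trigger", "sf_ele",
            "sf_muon", "sf_lep_trigger", "sf_btag"]
COLUMNS += ["diphoton_mass", "diphoton_pt"]  # placeholders, computed later

INDEX = {name: i for i, name in enumerate(COLUMNS)}


def empty_table(n_events):
    """Allocate an (n_events, 46) float array pre-filled with NaN.

    Columns 44 and 45 simply stay NaN until the diphoton quantities are filled in.
    """
    return np.full((n_events, len(COLUMNS)), np.nan)
```

Starting from an all-NaN table means the placeholder columns need no special handling, and any column that is accidentally left unfilled shows up in the NaN counts.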

	# Implementation Details (required for correct column mapping)
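	- Use the TTree named "mini" and load branches via `uproot.open(...)["mini"].arrays()`. (`uproot.lazy()` is an alternative in uproot 4 only; uproot 5 removed it in favor of `uproot.dask()`.)
	- Branch-to-column mapping (here `[i]` denotes the i-th object within each event, not the i-th event):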
	* Columns 0–2: `photon_pt[0]`, `photon_eta[0]`, `photon_phi[0]`
	* Columns 3–5: `photon_pt[1]`, `photon_eta[1]`, `photon_phi[1]`
	* Columns 6–8: `lep_pt[0]`, `lep_eta[0]`, `lep_phi[0]`
	* Columns 9–11: `lep_pt[1]`, `lep_eta[1]`, `lep_phi[1]`
	* Columns 12–14: `jet_pt[0]`, `jet_eta[0]`, `jet_phi[0]` (and likewise for jets `[1]` through `[5]`, filling columns 15–29)
	* Column 30: `met_et`
	* Column 31: `met_phi`
	* Column 32: `mcWeight`
	* Column 33: `SumWeights`
	* Column 34: `XSection`
	* Column 35: `photon_isTightID[0]`
	* Column 36: `photon_isTightID[1]`
	* Columns 37–43: scale factors in the order `[scaleFactor_PILEUP, scaleFactor_PHOTON, scaleFactor_PhotonTRIGGER, scaleFactor_ELE, scaleFactor_MUON, scaleFactor_LepTRIGGER, scaleFactor_BTAG]`
	- Jagged arrays must be padded with `np.nan` to a fixed length: 2 for photons and leptons, 6 for jets. Events with more objects keep only the leading (highest-pT) entries.
	- After saving, print file path, array shape, dtype, and per-column NaN counts.
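
The final reporting step might look like the sketch below. The function name `save_and_report` and the toy table are hypothetical; only `np.save` and `np.isnan` are standard NumPy calls.

```python
import os
import tempfile

import numpy as np


def save_and_report(path, table):
    """Save `table` to `path` and print path, shape, dtype, and per-column NaN counts."""
    np.save(path, table)
    nan_counts = np.isnan(table).sum(axis=0)  # one count per column
    print(f"Saved: {path}")
    print(f"Shape: {table.shape}, dtype: {table.dtype}")
    for col, n in enumerate(nan_counts):
        print(f"  column {col}: {n} NaN")
    return nan_counts


# Toy table: 4 events, 46 columns, with the two placeholder columns left as NaN.
toy = np.zeros((4, 46))
toy[:, 44:] = np.nan
toy_path = os.path.join(tempfile.mkdtemp(), "toy_raw.npy")
counts = save_and_report(toy_path, toy)
```

For a correctly filled table, columns 44 and 45 should report one NaN per event, while the photon, lepton, and jet columns report however many events lacked that object.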