ho22joshua committed · Commit e1f04e9 · 1 Parent(s): 0f17633

updated readme

Files changed (1): root_gnn_dgl/README.md +21 -1
root_gnn_dgl/README.md CHANGED
@@ -50,7 +50,6 @@ The entire demo can be ran with the command
  source run_demo.sh
  ```
 
-
  ## Data Preparation
  The first step in the process is to convert the events stored in ROOT files into DGL graph objects. This conversion is handled automatically by the Dataset objects during their creation, provided the graph data has not already been saved to disk. To accomplish this, a simple script is used to initialize the relevant Dataset object and then exit. This script needs to be executed for each data chunk in each dataset being used for training.
 
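The diff elides the actual per-chunk loop (it ends at the `done` visible in the next hunk's context). Conceptually it is a nested loop over datasets and chunks; the sketch below is illustrative only — the script name, dataset names, and chunk count are hypothetical, and `echo` is used as a dry run:

```shell
# Sketch only: initialize the Dataset object once per chunk so the graphs
# are built and cached to disk. Script name, dataset names, and chunk
# count are hypothetical, not taken from the repository.
for dataset in train val; do
  for chunk in 0 1 2 3; do
    echo "python scripts/build_dataset.py --dataset ${dataset} --chunk ${chunk} --shuffle_mode"
  done
done
```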
@@ -71,3 +70,24 @@ done
  The `--shuffle_mode` flag performs shuffling and pre-batches the graphs in each chunk, since holding the entire dataset in memory and shuffling it together can be prohibitive for large datasets.
 
  ## Training
+ Training is run by `scripts/training_script.py`. `--preshuffle` tells it to use the preshuffled and batched graphs rather than shuffling and batching on the fly, and `--restart` can be used to force training to start from the beginning rather than from the last available checkpoint.
+
+ ```bash
+ python scripts/training_script.py --config configs/demo/pretraining_multiclass.yaml --preshuffle --nocompile --lazy
+ ```
+
+ This step should produce the training directory `trainings/demo/pretraining_multiclass/` containing a copy of the config file, checkpoints (`model_epoch_*.pt`) with the model weights after each epoch of training, npz files with the GNN outputs for each event after each epoch, and two files, `training.log` and `training.png`, which summarize model performance and convergence.
+
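The per-epoch npz outputs can be inspected with plain NumPy. A minimal sketch — the file name and the `scores` key are placeholders, since the actual keys written by the training script are not shown above:

```python
import numpy as np

# Placeholder file/key names: the real npz files live under
# trainings/demo/pretraining_multiclass/ and their keys are set by the script.
scores = np.linspace(0.0, 1.0, 5)
np.savez("example_epoch.npz", scores=scores)
with np.load("example_epoch.npz") as f:
    print(sorted(f.files))            # available array names
    print(float(f["scores"].mean()))  # quick sanity check on the values
```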
+ ## Inference
+ Inference is done by `scripts/inference.py`. This script applies the model defined by `--config` to the samples located at `--target`. A new set of samples, with the GNN scores saved as the `--branch` in the ntuples, is created at `--destination`. The `--chunks` argument splits the inference into the specified number of chunks, and `--chunkno` selects which chunk to process.
+
+ ```bash
+ python scripts/inference.py \
+ --target "/global/cfs/projectdirs/atlas/joshua/root_gnn/root_gnn_dgl/data/ntuples/Hyy_pretraining/multilabel_10K/ttH_NLO.root" \
+ --destination "/global/cfs/projectdirs/atlas/joshua/GNN4Colliders/root_gnn_dgl/scores/ttH_NLO.root" \
+ --config "configs/demo/finetuning_ttH_CP_Even_vs_Odd.yaml" \
+ --chunks 1 \
+ --chunkno 0 \
+ --write \
+ --branch 'GNN_Score'
+ ```
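The `--chunks`/`--chunkno` pair lets the sample be split across independent jobs. Conceptually the partitioning looks like the sketch below — this illustrates the idea only and is not `inference.py`'s actual code:

```python
def chunk_bounds(n_events: int, chunks: int, chunkno: int) -> tuple[int, int]:
    """Return the half-open [start, stop) event range for chunk `chunkno` of `chunks`.

    Sketch only: illustrates the chunked-inference idea, not the script's real logic.
    """
    per = -(-n_events // chunks)  # ceiling division: events per chunk
    start = min(chunkno * per, n_events)
    stop = min(start + per, n_events)
    return start, stop

# With --chunks 1 --chunkno 0 the whole sample is processed in one job.
print(chunk_bounds(1000, 1, 0))  # -> (0, 1000)
```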