Commit e1f04e9 (parent: 0f17633): updated readme

root_gnn_dgl/README.md (+21 -1)

@@ -50,7 +50,6 @@ The entire demo can be run with the command
source run_demo.sh
```

## Data Preparation
The first step in the process is to convert the events stored in ROOT files into DGL graph objects. This conversion is handled automatically by the Dataset objects during their creation, provided the graph data has not already been saved to disk. To accomplish this, a simple script initializes the relevant Dataset object and then exits; it must be run once for each data chunk in each dataset used for training.
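The create-and-exit pattern described above can be sketched as follows. `GraphDataset`, its constructor arguments, and the pickle-based cache are illustrative stand-ins, not the repository's actual Dataset class:

```python
# Hypothetical sketch of the create-and-exit conversion pattern: constructing
# the dataset object converts the events to graphs unless a cached copy of the
# graphs already exists on disk.
import os
import pickle
import tempfile

class GraphDataset:  # stand-in for the repository's Dataset class
    def __init__(self, source_events, cache_path):
        if os.path.exists(cache_path):
            # Graph data already saved to disk: just load it, no conversion.
            with open(cache_path, "rb") as f:
                self.graphs = pickle.load(f)
        else:
            # Placeholder for the real ROOT -> DGL conversion of each event.
            self.graphs = [{"nodes": ev} for ev in source_events]
            with open(cache_path, "wb") as f:
                pickle.dump(self.graphs, f)

cache = os.path.join(tempfile.mkdtemp(), "chunk0.pkl")
GraphDataset(range(3), cache)       # first call converts and caches, then exits
ds = GraphDataset(range(3), cache)  # later calls only load from disk
print(len(ds.graphs))  # 3
```

Running such a script once per chunk leaves the converted graphs on disk, so the training job itself never pays the conversion cost.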

@@ -71,3 +70,24 @@ done
The `--shuffle_mode` flag performs shuffling and pre-batches the graphs in each chunk, since holding the entire dataset in memory and shuffling it all at once can be prohibitive for large datasets.
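A generic sketch of this chunk-wise approach (the function name and batch size are illustrative assumptions, not the repository's implementation):

```python
# Each chunk is shuffled and split into batches independently, so only one
# chunk ever needs to fit in memory at a time.
import random

def preshuffle_chunk(chunk, batch_size, seed=0):
    rng = random.Random(seed)
    shuffled = chunk[:]
    rng.shuffle(shuffled)  # shuffle within this chunk only
    return [shuffled[i:i + batch_size]
            for i in range(0, len(shuffled), batch_size)]

batches = preshuffle_chunk(list(range(10)), batch_size=4)
print([len(b) for b in batches])  # [4, 4, 2]
```

The trade-off is that graphs are only ever mixed within a chunk, not across the whole dataset, which is usually acceptable when the chunks themselves are assigned randomly.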

## Training
Training is run by `scripts/training_script.py`. `--preshuffle` tells it to use the preshuffled and batched graphs rather than shuffling and batching on the fly, and `--restart` can be used to force the training to start from the beginning rather than from the last available checkpoint.

```bash
python scripts/training_script.py --config configs/demo/pretraining_multiclass.yaml --preshuffle --nocompile --lazy
```

This step should produce the training directory `trainings/demo/pretraining_multiclass/` containing a copy of the config file, checkpoints (`model_epoch_*.pt`) with the model weights after each epoch of training, npz files with the GNN outputs for each event after each epoch, and two files, `training.log` and `training.png`, which summarize the model's performance and convergence.
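The resume-by-default behaviour implied by `--restart` can be sketched generically. Selecting the newest `model_epoch_*.pt` checkpoint might look like this (illustrative only, not the repository's actual logic):

```python
# Pick the checkpoint with the highest epoch number, unless a restart is
# requested or no checkpoint exists yet.
import re

def latest_checkpoint(files, restart=False):
    epochs = []
    for f in files:
        m = re.fullmatch(r"model_epoch_(\d+)\.pt", f)
        if m:
            epochs.append((int(m.group(1)), f))
    if restart or not epochs:
        return None  # start training from scratch
    return max(epochs)[1]

print(latest_checkpoint(["model_epoch_1.pt", "model_epoch_10.pt", "training.log"]))
# model_epoch_10.pt
print(latest_checkpoint(["model_epoch_3.pt"], restart=True))  # None
```

Note the numeric comparison: naive string sorting would rank `model_epoch_9.pt` above `model_epoch_10.pt`.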

## Inference
Inference is done by `scripts/inference.py`. This script applies the model defined by `--config` to the samples located at `--target`. A new set of samples, with the GNN scores saved as the `--branch` in the ntuples, is created at `--destination`. The `--chunks` argument runs the inference in the specified number of chunks (with `--chunkno` selecting the chunk handled by this invocation).

```bash
python scripts/inference.py \
    --target "/global/cfs/projectdirs/atlas/joshua/root_gnn/root_gnn_dgl/data/ntuples/Hyy_pretraining/multilabel_10K/ttH_NLO.root" \
    --destination "/global/cfs/projectdirs/atlas/joshua/GNN4Colliders/root_gnn_dgl/scores/ttH_NLO.root" \
    --config "configs/demo/finetuning_ttH_CP_Even_vs_Odd.yaml" \
    --chunks 1 \
    --chunkno 0 \
    --write \
    --branch 'GNN_Score'
```
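One way such a `--chunks`/`--chunkno` pair can partition a sample is sketched below (illustrative, assuming a near-even split of events; not necessarily how `scripts/inference.py` divides its work):

```python
# Compute the event range one invocation handles, so that running the script
# once per chunkno in [0, chunks) covers every event exactly once.
def chunk_bounds(n_events, chunks, chunkno):
    """Return the [start, stop) event range handled by chunk `chunkno`."""
    base, extra = divmod(n_events, chunks)
    start = chunkno * base + min(chunkno, extra)
    stop = start + base + (1 if chunkno < extra else 0)
    return start, stop

# 10 events split across 3 invocations:
print([chunk_bounds(10, 3, i) for i in range(3)])  # [(0, 4), (4, 7), (7, 10)]
```

This lets a large sample be processed as independent jobs (one per chunk) whose output ntuples can be merged afterwards.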
|