============================================
Training a Model & Configuration Explanation
============================================
This tutorial shows how one can use ``EasyTPP`` to train the implemented models.

First, we need to set up a YAML config file containing all the input configuration that guides the training and evaluation process. The overall structure of a config file is shown below:

.. code-block:: yaml

    pipeline_config_id: ... # name of the config for guiding the pipeline

    data:
      [DATASET ID]: # name of the dataset, e.g., taxi
        ...

    [EXPERIMENT ID]: # name of the experiment to run
      base_config:
        ...
      model_config:
        ...

After the config file is set up, we can run the script, specifying the `config directory` and `experiment id`, to start the pipeline. We currently provide a preset script at `examples/train_nhp.py`.

Step 1: Set up the config file containing data and model configs
================================================================
To be specific, one needs to define the following entries in the config file:

- **pipeline_config_id**: the registered name of an EasyTPP.Config object, such as `runner_config` or `hpo_runner_config`. Based on this entry, the corresponding configuration class is loaded to construct the pipeline.

.. code-block:: yaml

    pipeline_config_id: runner_config

- **data**: dataset specifics. One can put multiple dataset specifics in the config file, but only one will be used in an experiment.

  - *[DATASET ID]*: name of the dataset, e.g., taxi.
  - *train_dir, valid_dir, test_dir*: directories of the data files. For the moment we only accept pkl files (please see `Dataset <./dataset.html>`_ for details).
  - *data_spec*: defines the event type information.

.. code-block:: yaml

    data:
      taxi:
        data_format: pkl
        train_dir: ../data/taxi/train.pkl
        valid_dir: ../data/taxi/dev.pkl
        test_dir: ../data/taxi/test.pkl
        data_spec:
          num_event_types: 7 # num of event types, excluding pad events
          pad_token_id: 6 # event type index for pad events
          padding_side: right # pad at the right end of the sequence
          truncation_side: right # truncate at the right end of the sequence
          max_len: 100 # max sequence length used as model input
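
To make the padding and truncation rules above concrete, here is a minimal illustrative helper (not part of EasyTPP's API) that applies ``max_len``, ``pad_token_id``, ``padding_side`` and ``truncation_side`` to a single sequence of event-type ids:

.. code-block:: python

    def pad_and_truncate(event_types, max_len, pad_token_id,
                         padding_side='right', truncation_side='right'):
        """Pad or truncate one sequence of event-type ids (illustration only)."""
        if len(event_types) > max_len:
            # Keep the first (or last) max_len events.
            event_types = (event_types[:max_len] if truncation_side == 'right'
                           else event_types[-max_len:])
        pad = [pad_token_id] * (max_len - len(event_types))
        # Attach pad events at the configured end of the sequence.
        return event_types + pad if padding_side == 'right' else pad + event_types

For example, with ``pad_token_id: 6`` as in the taxi config, right-padding ``[0, 1, 2]`` to length 5 yields ``[0, 1, 2, 6, 6]``.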

- **[EXPERIMENT ID]**: name of the experiment to run in the pipeline. It contains three blocks of configs: *base_config*, *model_config* and *trainer_config*.

*base_config* contains the pipeline framework related specifications.

.. code-block:: yaml

    base_config:
      stage: train # train, eval or generate
      backend: tensorflow # tensorflow or torch
      dataset_id: conttime # name of the dataset
      runner_id: std_tpp # registered name of the pipeline runner
      model_id: RMTPP # registered name of the implemented model
      base_dir: './checkpoints/' # base dir to save the logs and models

*model_config* contains the model related specifications.

.. code-block:: yaml

    model_config:
      hidden_size: 32
      time_emb_size: 16
      num_layers: 2
      num_heads: 2
      mc_num_sample_per_step: 20
      sharing_param_layer: False
      loss_integral_num_sample_per_step: 20
      dropout: 0.0
      use_ln: False
      thinning_params: # thinning algorithm for event sampling
        num_seq: 10
        num_sample: 1
        num_exp: 500 # number of i.i.d. Exp(intensity_bound) draws at one time in the thinning algorithm
        look_ahead_time: 10
        patience_counter: 5 # the maximum iteration used in adaptive thinning
        over_sample_rate: 5
        num_samples_boundary: 5
        dtime_max: 5
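
The ``thinning_params`` control the thinning (rejection) sampler used to draw event times from the learned intensity. As a rough sketch of the idea, assuming a user-supplied intensity function and a precomputed intensity upper bound (EasyTPP's actual sampler is batched and more involved):

.. code-block:: python

    import numpy as np

    def thinning_next_event(intensity_fn, t0, intensity_bound,
                            num_exp=500, dtime_max=5.0, rng=None):
        """Sample the next event time after t0 by thinning (illustration only).

        intensity_fn: callable t -> conditional intensity (a stand-in here).
        intensity_bound: upper bound on the intensity over the window,
            e.g. a boundary estimate inflated by over_sample_rate.
        """
        rng = rng or np.random.default_rng()
        # Draw num_exp i.i.d. Exp(intensity_bound) gaps at once (cf. num_exp above).
        gaps = rng.exponential(1.0 / intensity_bound, size=num_exp)
        candidates = t0 + np.cumsum(gaps)
        # Accept each candidate t with probability intensity(t) / intensity_bound.
        unif = rng.uniform(size=num_exp)
        intensities = np.array([intensity_fn(t) for t in candidates])
        accepted = candidates[(unif * intensity_bound < intensities)
                              & (candidates <= t0 + dtime_max)]
        # If nothing is accepted, fall back to the truncation horizon (cf. dtime_max).
        return accepted[0] if accepted.size > 0 else t0 + dtime_max

Here ``dtime_max`` truncates the sampled inter-event time, mirroring its role in the config above.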

*trainer_config* contains the training related specifications.

.. code-block:: yaml

    trainer_config: # trainer arguments
      seed: 2019
      gpu: 0
      batch_size: 256
      max_epoch: 10
      shuffle: False
      optimizer: adam
      learning_rate: 1.e-3
      valid_freq: 1
      use_tfb: False
      metrics: ['acc', 'rmse']
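
For reference, the two metrics listed above measure next-event-type accuracy (``acc``) and the RMSE of the predicted inter-event times (``rmse``). A back-of-the-envelope version of the computation (illustrative only, not EasyTPP's internal implementation):

.. code-block:: python

    import math

    def next_event_metrics(pred_types, true_types, pred_dtimes, true_dtimes):
        """Type-prediction accuracy and inter-event-time RMSE (illustration only)."""
        acc = sum(p == t for p, t in zip(pred_types, true_types)) / len(true_types)
        mse = sum((p - t) ** 2 for p, t in zip(pred_dtimes, true_dtimes)) / len(true_dtimes)
        return acc, math.sqrt(mse)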

A complete example of these files can be found at *examples/example_config*.

Step 2: Run the training script
===============================
To run the training process, we simply need to make two calls:

1. ``Config``: reads the config directory specified in Step 1 and does some processing to form a complete configuration.
2. ``Runner``: reads the configuration and sets up the whole pipeline for training, evaluation and generation.

The following code is an example, copied from *examples/train_nhp.py*.

.. code-block:: python

    import argparse

    from easy_tpp.config_factory import Config
    from easy_tpp.runner import Runner


    def main():
        parser = argparse.ArgumentParser()
        parser.add_argument('--config_dir', type=str, required=False, default='configs/experiment_config.yaml',
                            help='Dir of configuration yaml to train and evaluate the model.')
        parser.add_argument('--experiment_id', type=str, required=False, default='RMTPP_train',
                            help='Experiment id in the config file.')
        args = parser.parse_args()

        config = Config.build_from_yaml_file(args.config_dir, experiment_id=args.experiment_id)
        model_runner = Runner.build_from_config(config)
        model_runner.run()


    if __name__ == '__main__':
        main()

Check the output
================
During training, the log, the best model (selected by validation set performance) and the complete configuration file are all saved. The directory of the saved files is specified by `base_dir` in the *base_config* block, i.e., `./checkpoints/` in our example.

In the `./checkpoints/` folder, one finds the correct subfolder by concatenating the `experiment_id` and the running timestamp. Inside that subfolder, there is a complete configuration file, e.g., ``RMTPP_train_output.yaml``, which records all the information used in the pipeline:

.. code-block:: yaml

    data_config:
      train_dir: ../data/conttime/train.pkl
      valid_dir: ../data/conttime/dev.pkl
      test_dir: ../data/conttime/test.pkl
      specs:
        num_event_types_pad: 6
        num_event_types: 5
        event_pad_index: 5
      data_format: pkl
    base_config:
      stage: train
      backend: tensorflow
      dataset_id: conttime
      runner_id: std_tpp
      model_id: RMTPP
      base_dir: ./checkpoints/
      exp_id: RMTPP_train
      log_folder: ./checkpoints/98888_4299965824_221205-153425
      saved_model_dir: ./checkpoints/98888_4299965824_221205-153425/models/saved_model
      saved_log_dir: ./checkpoints/98888_4299965824_221205-153425/log
      output_config_dir: ./checkpoints/98888_4299965824_221205-153425/RMTPP_train_output.yaml
    model_config:
      hidden_size: 32
      time_emb_size: 16
      num_layers: 2
      num_heads: 2
      mc_num_sample_per_step: 20
      sharing_param_layer: false
      loss_integral_num_sample_per_step: 20
      dropout: 0.0
      use_ln: false
      seed: 2019
      gpu: 0
      thinning_params:
        num_seq: 10
        num_sample: 1
        num_exp: 500
        look_ahead_time: 10
        patience_counter: 5
        over_sample_rate: 5
        num_samples_boundary: 5
        dtime_max: 5
        num_step_gen: 1
      trainer:
        batch_size: 256
        max_epoch: 10
        shuffle: false
        optimizer: adam
        learning_rate: 0.001
        valid_freq: 1
        use_tfb: false
        metrics:
        - acc
        - rmse
        seq_pad_end: true
        is_training: true
      num_event_types_pad: 6
      num_event_types: 5
      event_pad_index: 5
      model_id: RMTPP

If we set ``use_tfb`` to ``true``, we can launch TensorBoard to track the training process; see `Running Tensorboard <../advanced/tensorboard.html>`_ for details.