============================================
Training a Model & Configuration Explanation
============================================
This tutorial shows how one can use ``EasyTPP`` to train the implemented models.
In principle, we first need to set up a config YAML file containing all the input configurations that guide the training and evaluation process. The overall structure of a config file is shown below:
.. code-block:: yaml

    pipeline_config_id: ... # name of the config for guiding the pipeline

    data:
      [DATASET ID]: # name of the dataset, e.g., taxi
        ...

    [EXPERIMENT ID]: # name of the experiment to run
      base_config:
        ...
      model_config:
        ...
      trainer_config:
        ...

After the config file is set up, we can start the pipeline by running the script with the `config directory` and `experiment id` specified. We currently provide a preset script at `examples/train_nhp.py`.
Step 1: Set up the config file containing data and model configs
================================================================
To be specific, one needs to define the following entries in the config file:
- **pipeline_config_id**: the registered name of an ``EasyTPP.Config`` object, such as `runner_config` or `hpo_runner_config`. Based on this entry, the corresponding configuration class is loaded to construct the pipeline.
.. code-block:: yaml

    pipeline_config_id: runner_config

- **data**: dataset specifications. One can put multiple dataset specifications in the config file, but only one is used per experiment.

  - *[DATASET ID]*: name of the dataset, e.g., taxi.
  - *train_dir, valid_dir, test_dir*: directories of the data files. For the moment we only accept pkl files (please see `Dataset <./dataset.html>`_ for details).
  - *data_spec*: defines the event type information.
.. code-block:: yaml

    data:
      taxi:
        data_format: pkl
        train_dir: ../data/taxi/train.pkl
        valid_dir: ../data/taxi/dev.pkl
        test_dir: ../data/taxi/test.pkl
        data_spec:
          num_event_types: 7      # num of event types, excluding the pad event
          pad_token_id: 6         # event type index for pad events
          padding_side: right     # pad at the right end of the sequence
          truncation_side: right  # truncate at the right end of the sequence
          max_len: 100            # max sequence length used as model input

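To make the ``data_spec`` entries concrete, the sketch below shows how one sequence of event-type indices would be truncated and right-padded under the settings above. This is our own illustration of the configured behavior, not EasyTPP's preprocessing code; the helper ``pad_and_truncate`` is hypothetical, and we use ``max_len=8`` instead of 100 for readability.

.. code-block:: python

    def pad_and_truncate(event_types, max_len, pad_token_id,
                         padding_side='right', truncation_side='right'):
        # truncation_side: right -> drop events beyond max_len from the right end
        if len(event_types) > max_len:
            event_types = (event_types[:max_len] if truncation_side == 'right'
                           else event_types[-max_len:])
        # padding_side: right -> append pad events until the sequence has max_len entries
        pad = [pad_token_id] * (max_len - len(event_types))
        return event_types + pad if padding_side == 'right' else pad + event_types

    seq = [0, 3, 5, 1, 2]                                    # event-type indices of one sequence
    print(pad_and_truncate(seq, max_len=8, pad_token_id=6))  # [0, 3, 5, 1, 2, 6, 6, 6]
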
- **[EXPERIMENT ID]**: name of the experiment to run in the pipeline. It contains three blocks of configs, described below.
*base_config* contains the specifications related to the pipeline framework.
.. code-block:: yaml

    base_config:
      stage: train                # train, eval or generate
      backend: tensorflow         # tensorflow or torch
      dataset_id: conttime        # name of the dataset
      runner_id: std_tpp          # registered name of the pipeline runner
      model_id: RMTPP             # registered name of the implemented model
      base_dir: './checkpoints/'  # base dir to save the logs and models

*model_config* contains the specifications related to the model.
.. code-block:: yaml

    model_config:
      hidden_size: 32
      time_emb_size: 16
      num_layers: 2
      num_heads: 2
      mc_num_sample_per_step: 20
      sharing_param_layer: False
      loss_integral_num_sample_per_step: 20
      dropout: 0.0
      use_ln: False
      thinning_params:         # thinning algorithm for event sampling
        num_seq: 10
        num_sample: 1
        num_exp: 500           # number of i.i.d. Exp(intensity_bound) draws at one time in the thinning algorithm
        look_ahead_time: 10
        patience_counter: 5    # the maximum number of iterations used in adaptive thinning
        over_sample_rate: 5
        num_samples_boundary: 5
        dtime_max: 5

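As background for the ``thinning_params`` above, here is a minimal sketch of one round of thinning (rejection) sampling for drawing the next event time, written purely for illustration; it is not EasyTPP's implementation, and ``intensity_fn`` and ``intensity_bound`` are hypothetical stand-ins for the model's intensity and its upper bound (which, per ``over_sample_rate``, is usually an inflated estimate).

.. code-block:: python

    import numpy as np

    def thinning_step(intensity_fn, t_current, intensity_bound,
                      num_exp=500, dtime_max=5, seed=None):
        """One illustrative round of thinning: propose candidate times from a
        homogeneous Poisson process with rate intensity_bound, then accept a
        candidate t with probability intensity_fn(t) / intensity_bound."""
        rng = np.random.default_rng(seed)
        # num_exp i.i.d. Exp(intensity_bound) draws at one time, as in the config above
        dtimes = np.cumsum(rng.exponential(scale=1.0 / intensity_bound, size=num_exp))
        for dt in dtimes:
            if dt > dtime_max:               # stop beyond the configured horizon
                break
            t = t_current + dt
            if rng.uniform() < intensity_fn(t) / intensity_bound:
                return t                     # accepted: the sampled next event time
        return t_current + dtime_max         # fallback when no candidate is accepted

``patience_counter`` (the maximum number of iterations in adaptive thinning, per the comment above) would bound how many times such a round is retried.
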
*trainer_config* contains the specifications related to training.
.. code-block:: yaml

    trainer_config:              # trainer arguments
      seed: 2019
      gpu: 0
      batch_size: 256
      max_epoch: 10
      shuffle: False
      optimizer: adam
      learning_rate: 1.e-3
      valid_freq: 1
      use_tfb: False
      metrics: ['acc', 'rmse']

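The two entries in ``metrics`` are the standard metrics for next-event prediction in TPPs: event-type accuracy and event-time RMSE. A minimal sketch of how such metrics are typically computed (our illustration, not EasyTPP's evaluation code):

.. code-block:: python

    import numpy as np

    def type_accuracy(pred_types, true_types):
        # 'acc': fraction of next-event types predicted correctly
        return float(np.mean(np.asarray(pred_types) == np.asarray(true_types)))

    def time_rmse(pred_dtimes, true_dtimes):
        # 'rmse': root mean squared error of predicted (inter-)event times
        err = np.asarray(pred_dtimes, dtype=float) - np.asarray(true_dtimes, dtype=float)
        return float(np.sqrt(np.mean(err ** 2)))
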
A complete example of such a config file can be found at *examples/example_config*.
Step 2: Run the training script
===============================================
To run the training process, we simply need to use two components:

1. ``Config``: reads the config file specified in Step 1 and does some processing to form a complete configuration.
2. ``Runner``: reads the configuration and sets up the whole pipeline for training, evaluation and generation.

The following code is an example, copied from *examples/train_nhp.py*.
.. code-block:: python

    import argparse

    from easy_tpp.config_factory import Config
    from easy_tpp.runner import Runner


    def main():
        parser = argparse.ArgumentParser()
        parser.add_argument('--config_dir', type=str, required=False,
                            default='configs/experiment_config.yaml',
                            help='Dir of configuration yaml to train and evaluate the model.')
        parser.add_argument('--experiment_id', type=str, required=False,
                            default='RMTPP_train',
                            help='Experiment id in the config file.')
        args = parser.parse_args()

        config = Config.build_from_yaml_file(args.config_dir, experiment_id=args.experiment_id)
        model_runner = Runner.build_from_config(config)
        model_runner.run()


    if __name__ == '__main__':
        main()

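The same pipeline can also be launched programmatically, e.g., from a notebook; the snippet below is just the body of the script above with the example defaults filled in (the config path and experiment id should match your own setup).

.. code-block:: python

    from easy_tpp.config_factory import Config
    from easy_tpp.runner import Runner

    # Equivalent to:
    #   python examples/train_nhp.py --config_dir configs/experiment_config.yaml --experiment_id RMTPP_train
    config = Config.build_from_yaml_file('configs/experiment_config.yaml',
                                         experiment_id='RMTPP_train')
    model_runner = Runner.build_from_config(config)
    model_runner.run()
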
Check the output
================
During training, the log, the best model (selected on validation-set performance) and the complete configuration file are all saved. The directory of the saved files is determined by ``base_dir`` in ``base_config``: in the `./checkpoints/` folder, one finds the corresponding subfolder by concatenating the `experiment_id` and the run timestamps. Inside that subfolder there is a complete configuration file, e.g., ``RMTPP_train_output.yaml``, that records all the information used in the pipeline:
.. code-block:: yaml

    data_config:
      train_dir: ../data/conttime/train.pkl
      valid_dir: ../data/conttime/dev.pkl
      test_dir: ../data/conttime/test.pkl
      specs:
        num_event_types_pad: 6
        num_event_types: 5
        event_pad_index: 5
      data_format: pkl
    base_config:
      stage: train
      backend: tensorflow
      dataset_id: conttime
      runner_id: std_tpp
      model_id: RMTPP
      base_dir: ./checkpoints/
      exp_id: RMTPP_train
      log_folder: ./checkpoints/98888_4299965824_221205-153425
      saved_model_dir: ./checkpoints/98888_4299965824_221205-153425/models/saved_model
      saved_log_dir: ./checkpoints/98888_4299965824_221205-153425/log
      output_config_dir: ./checkpoints/98888_4299965824_221205-153425/RMTPP_train_output.yaml
    model_config:
      hidden_size: 32
      time_emb_size: 16
      num_layers: 2
      num_heads: 2
      mc_num_sample_per_step: 20
      sharing_param_layer: false
      loss_integral_num_sample_per_step: 20
      dropout: 0.0
      use_ln: false
      seed: 2019
      gpu: 0
      thinning_params:
        num_seq: 10
        num_sample: 1
        num_exp: 500
        look_ahead_time: 10
        patience_counter: 5
        over_sample_rate: 5
        num_samples_boundary: 5
        dtime_max: 5
        num_step_gen: 1
      trainer:
        batch_size: 256
        max_epoch: 10
        shuffle: false
        optimizer: adam
        learning_rate: 0.001
        valid_freq: 1
        use_tfb: false
        metrics:
        - acc
        - rmse
        seq_pad_end: true
      is_training: true
      num_event_types_pad: 6
      num_event_types: 5
      event_pad_index: 5
      model_id: RMTPP

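To double-check what the pipeline actually used, the output file can be read back with any YAML parser. A small sketch, assuming PyYAML is installed and using the timestamped folder from the example run above (substitute your own):

.. code-block:: python

    import yaml

    out_path = './checkpoints/98888_4299965824_221205-153425/RMTPP_train_output.yaml'
    with open(out_path) as f:
        saved = yaml.safe_load(f)

    print(saved['base_config']['saved_model_dir'])  # where the best model was written
    print(saved['model_config']['hidden_size'])     # 32
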
If we set ``use_tfb`` to ``true``, we can launch TensorBoard to track the training process; see `Running Tensorboard <../advanced/tensorboard.html>`_ for details.