ConvLab/lava-policy-multiwoz20

nflubis committed on Dec 1, 2022

Commit 64a1ea2 · 1 Parent(s): af9a20c

Update README.md

Files changed (1)

README.md +46 -0

@@ -1,3 +1,49 @@
 ---
 license: apache-2.0
 ---

 ---
+language:
+- en
 license: apache-2.0
+tags:
+- dialogue policy
+- task-oriented dialog
 ---
+# lava-policy-multiwoz
+This is the best-performing LAVA_kl model from the [LAVA paper](https://aclanthology.org/2020.coling-main.41/). It can be used as a word-level policy module in a ConvLab-3 pipeline.
+Refer to [ConvLab-3](https://github.com/ConvLab/ConvLab-3) for model description and usage.
+## Training procedure
+The model was trained on MultiWOZ 2.0 data using the [LAVA codebase](https://gitlab.cs.uni-duesseldorf.de/general/dsml/lava-public). Training started with VAE pre-training, continued with fine-tuning using an informative-prior KL loss, and ended with corpus-based RL using REINFORCE.
+### Training hyperparameters
+The following hyperparameters were used during supervised learning (SL) training:
+- y_size: 10
+- k_size: 20
+- beta: 0.1
+- simple_posterior: true
+- contextual_posterior: false
+- learning_rate: 1e-03
+- max_vocab_size: 1000
+- max_utt_len: 50
+- max_dec_len: 30
+- backward_size: 2
+- train_batch_size: 128
+- seed: 58
+- optimizer: Adam
+- num_epoch: 100 (with early stopping based on the validation set)
+The following hyperparameters were used during RL training:
+- tune_pi_only: false
+- max_words: 100
+- temperature: 1.0
+- episode_repeat: 1.0
+- rl_lr: 0.01
+- momentum: 0.0
+- nesterov: false
+- gamma: 0.99
+- rl_clip: 5.0
+- random_seed: 38
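
To illustrate how two of the RL hyperparameters listed above might enter a REINFORCE update, here is a minimal standard-library sketch. It is not taken from the LAVA codebase. It assumes that `gamma` is the reward discount factor and that `rl_clip` is a maximum gradient norm; the function names `discounted_returns` and `clip_by_norm` are hypothetical and chosen only for this example.

```python
import math
from typing import List

# Values copied from the RL hyperparameter list above.
GAMMA = 0.99    # assumed to be the reward discount factor
RL_CLIP = 5.0   # assumed to be a maximum gradient L2 norm


def discounted_returns(rewards: List[float], gamma: float = GAMMA) -> List[float]:
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns: List[float] = []
    running = 0.0
    # Walk the episode backwards so each step reuses the return of the next step.
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def clip_by_norm(grad: List[float], max_norm: float = RL_CLIP) -> List[float]:
    """Rescale a gradient vector so its L2 norm does not exceed max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm:
        return list(grad)
    scale = max_norm / norm
    return [g * scale for g in grad]


if __name__ == "__main__":
    # A three-turn dialogue that is rewarded only at the final turn.
    print(discounted_returns([0.0, 0.0, 1.0]))
    # A gradient with norm 10 is scaled down to norm 5.
    print(clip_by_norm([6.0, 8.0]))
```

In REINFORCE, each step's log-probability gradient is weighted by its return, so a discount below 1 gives earlier turns a smaller share of a reward that arrives only at the end of the dialogue. Clipping bounds the size of any single update, which may help with the high variance of this estimator.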