|
|
=================================== |
|
|
Model Implementation Details |
|
|
=================================== |
|
|
|
|
|
Basic structure |
|
|
=================================== |
|
|
|
|
|
In the model folder, `torch_basemodel` / `tf_basemodel` implements the loglikelihood computation and sampling procedures that are common
to all the TPP models. The inherited classes define the models with specific structures, which are explained in the sections below.
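
As a rough illustration, the relation between the base model and an inherited model can be sketched as below. The class and method names here are simplified and do not exactly match the code; the sketch only shows where the shared and the model-specific logic live.

.. code-block:: python

    import torch.nn as nn


    class TorchBaseModel(nn.Module):
        """Functionality shared by all TPP models (simplified sketch)."""

        def compute_loglikelihood(self, lambda_at_event, lambdas_loss_samples,
                                  time_delta_seq, seq_mask, lambda_type_mask):
            # event and non-event loglikelihood computed from the intensities;
            # see the next section for details
            ...

        def forward(self, time_seqs, time_delta_seqs, type_seqs):
            # to be overridden by each concrete model
            raise NotImplementedError


    class SomeTPPModel(TorchBaseModel):
        """An inherited model only defines how the intensities are computed."""

        def forward(self, time_seqs, time_delta_seqs, type_seqs):
            # model-specific encoder producing intensities at the events and at
            # times sampled between events
            ...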
|
|
|
|
|
|
|
|
Computing the loglikelihood of non-pad event sequence |
|
|
------------------------------------------------------ |
|
|
|
|
|
The loglikelihood computation follows the definition in Equation 8 of `The Neural Hawkes Process: A Neurally Self-Modulating Multivariate Point Process <https://arxiv.org/abs/1612.09328>`_.
|
|
|
|
|
It takes `time_delta_seqs`, `lambda_at_event`, `lambdas_loss_samples`, `seq_mask` and
`lambda_type_mask` as inputs and outputs the loglikelihood terms; please see `torch_basemodel` / `tf_basemodel`
for details.
|
|
|
|
|
Note that:
|
|
|
|
|
1. Sequential prediction: because we perform sequential prediction, i.e., predict the next event given the previous ones, we do not consider the last event as it has no label. To implement the `forward` function, we take `time_seqs[:, :-1]`
   and `type_seqs[:, :-1]` as input. `time_delta_seqs` is handled differently; please see the next point.
|
|
|
|
|
|
|
|
|
|
|
2. Continuous-time evolution: recall the definition in `dataset <./dataset.rst>`_ and suppose we have a sequence of 4 events and 1 pad event
   at the end, i.e.,
|
|
|
|
|
.. code-block:: bash |
|
|
|
|
|
index: 0, 1, 2, 3, 4 |
|
|
dtimes: 0, t_1-t_0, t_2-t_1, t_3-t_2, pad |
|
|
types: e_0, e_1, e_2, e_3, pad |
|
|
non_pad_mask: True, True, True, True, False |
|
|
|
|
|
For the i-th event, the i-th dtime denotes the time evolution (e.g., decay in NHP) up to the current event and
the (i+1)-th dtime denotes the time evolution to the next event. To compute the non-event loglikelihood,
we should consider the time evolution after the event happens. Therefore we should use `time_delta_seqs[:, 1:]`, with the masks specified in the next point.
|
|
|
|
|
3. Masking: suppose we have predictions for the 0,1,2,3-th events and their labels are the 1,2,3,4-th events,
   where the 4-th event is a pad event that needs to be masked. So we should set the sequence mask to `True, True, True, False`, i.e., `seq_mask=batch_non_pad_mask[:, 1:]`.
   The same logic applies to the attention mask and the event type mask.
|
|
|
|
|
Therefore the following code is a typical example of calling the loglikelihood computation: |
|
|
|
|
|
|
|
|
.. code-block:: python |
|
|
|
|
|
event_ll, non_event_ll, num_events = self.compute_loglikelihood(lambda_at_event=lambda_at_event, # seq_len = max_len - 1 |
|
|
lambdas_loss_samples=lambda_t_sample, # seq_len = max_len - 1 |
|
|
time_delta_seq=time_delta_seq[:, 1:], |
|
|
seq_mask=batch_non_pad_mask[:, 1:], |
|
|
lambda_type_mask=type_mask[:, 1:]) |
|
|
|
|
|
|
|
|
|
|
|
Computing the integral inside the loglikelihood |
|
|
----------------------------------------------- |
|
|
|
|
|
|
|
|
The loglikelihood of the parameters is the sum of the log-intensities of the events that happened, at the times they happened,
minus the integral of the total intensity over the observation interval [0, T]:
|
|
|
|
|
.. math:: |
|
|
|
|
|
    \sum_i \log \lambda(t_i) - \int_0^T \lambda(t) \, dt
|
|
|
|
|
The first term is the event loglikelihood and the second term (including the negative sign) is the non-event loglikelihood.
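
For most neural TPPs the integral has no closed form, so it is estimated by Monte Carlo sampling inside each inter-event interval. The following is a minimal sketch of such an estimate, assuming the tensor shapes described above; the actual code in `torch_basemodel` / `tf_basemodel` may differ in details.

.. code-block:: python

    import torch

    def estimate_intensity_integral(lambdas_loss_samples, time_delta_seq, seq_mask):
        """Monte Carlo estimate of the integral of the total intensity over [0, T].

        lambdas_loss_samples: [batch, seq_len, num_samples, num_event_types],
            intensities at times sampled inside each inter-event interval.
        time_delta_seq: [batch, seq_len], length of each interval.
        seq_mask: [batch, seq_len], True for real (non-pad) positions.
        """
        # total intensity at each sampled time: sum over event types
        total_lambda = lambdas_loss_samples.sum(dim=-1)   # [batch, seq_len, num_samples]
        # averaging over the samples gives the mean intensity on each interval
        mean_lambda = total_lambda.mean(dim=-1)           # [batch, seq_len]
        # scale by the interval length and mask out the padded positions
        integral = (mean_lambda * time_delta_seq * seq_mask.float()).sum(dim=-1)
        # the non-event loglikelihood is the negative of this value
        return integral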
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Neural Hawkes Process (NHP) |
|
|
=================================== |
|
|
|
|
|
We implement NHP based on the author's official PyTorch code `Github:nce-mpp <https://github.com/hongyuanmei/nce-mpp/blob/main/ncempp/models/nhp.py>`_.
|
|
|
|
|
1. A continuous-time LSTM is introduced, with the code mainly coming from `Github:nce-mpp <https://github.com/hongyuanmei/nce-mpp/blob/main/ncempp/models/nhp.py>`_.
|
|
2. A `forward` function in the NHP class recursively updates the states: we compute the event embedding, pass it to the LSTM cell and then let the state decay afterwards. Note that for the i-th event, we should use the (i+1)-th dt for the decay, so we do not consider the last event as it has no decay time.
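
The decay in step 2 follows the continuous-time LSTM of the NHP paper: between two events, the cell state decays exponentially from its current value towards a target value, and the hidden state is read out from the decayed cell. A minimal sketch of this step (not the exact code of the repo):

.. code-block:: python

    import torch

    def decay(cell, cell_bar, decay_rate, output_gate, dt):
        """Continuous-time LSTM decay between two events.

        cell, cell_bar, decay_rate, output_gate: [batch, hidden_dim]
        dt: [batch, 1], time elapsed since the last event.
        """
        # the cell state decays from `cell` towards the target `cell_bar`
        c_t = cell_bar + (cell - cell_bar) * torch.exp(-decay_rate * dt)
        # the hidden state is read out from the decayed cell state
        h_t = output_gate * torch.tanh(c_t)
        return c_t, h_t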
|
|
|
|
|
Attentive Neural Hawkes Process (AttNHP) |
|
|
======================================== |
|
|
|
|
|
|
|
|
We implement AttNHP based on the authors' official PyTorch code `Github:anhp-andtt <https://github.com/yangalan123/anhp-andtt>`_
and, similar to NHP, we factorize it into a base model and an inherited model.
|
|
|
|
|
The forward function is implemented faithfully to that of the authors' repo.
|
|
|
|
|
|
|
|
Transformer Hawkes Process (THP) |
|
|
======================================== |
|
|
|
|
|
We implement THP based on a fixed version of the PyTorch code `Github:anhp-andtt/thp <https://github.com/yangalan123/anhp-andtt/tree/master/thp>`_
and we factorize it into a base model and an inherited model.
|
|
|
|
|
|
|
|
Self-Attentive Hawkes Process (SAHP) |
|
|
======================================== |
|
|
|
|
|
We implement SAHP based on a fixed version of the PyTorch code `Github:anhp-andtt/sahp <https://github.com/yangalan123/anhp-andtt/tree/master/sahp>`_
and we factorize it into a base model and an inherited model.
|
|
|
|
|
`SAHP` shares a very similar structure with `THP`.
|
|
|
|
|
|
|
|
|
|
|
Recurrent Marked Temporal Point Processes (RMTPP) |
|
|
==================================================== |
|
|
|
|
|
We implement RMTPP faithfully following the author's paper.
|
|
|
|
|
|
|
|
Intensity Free Learning of Temporal Point Process (IntensityFree) |
|
|
================================================================== |
|
|
|
|
|
We implement the model based on the author's PyTorch code `Github:ifl-tpp <https://github.com/shchur/ifl-tpp>`_.
|
|
|
|
|
A small difference between our implementation and the author's is that we ignore the `context_init` (the initial state of the RNN), because in our data setup we do not need a learnable initial RNN state. This modification generally has little impact on the learning process.
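
Concretely, instead of expanding a learnable `context_init` parameter into the initial hidden state, we let the RNN start from its default zero state. The sketch below illustrates the difference; the sizes and the GRU module are illustrative, not the actual configuration.

.. code-block:: python

    import torch
    import torch.nn as nn

    rnn = nn.GRU(input_size=32, hidden_size=64, batch_first=True)
    event_emb = torch.randn(8, 20, 32)  # [batch, seq_len, emb_dim]

    # authors' setup (roughly): a learnable initial state broadcast over the batch
    context_init = nn.Parameter(torch.zeros(1, 1, 64))
    h0 = context_init.expand(1, event_emb.size(0), 64).contiguous()
    out_learnable, _ = rnn(event_emb, h0)

    # our setup: the default zero initial state, no extra parameter
    out_zero, _ = rnn(event_emb)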
|
|
|
|
|
It is worth noting that the thinning algorithm cannot be applied to this model because it is intensity-free. When comparing the performance of this model, we only look at its log-likelihood learning curve.
|
|
|
|
|
|
|
|
Fully Neural Network based Model for General Temporal Point Processes (FullyNN) |
|
|
=============================================================================== |
|
|
|
|
|
We implement the model based on the author's Keras code `Github:NeuralNetworkPointProcess <https://github.com/omitakahiro/NeuralNetworkPointProcess>`_.
|
|
|
|
|
|
|
|
ODE-based Temporal Point Process (ODETPP) |
|
|
========================================= |
|
|
|
|
|
We implement a TPP with Neural ODE state evolution, which is a simplified version of `Neural Spatio-Temporal Point Processes <https://arxiv.org/abs/2011.04583>`_. The ODE implementation uses the code from the `blog <https://msurtsukov.github.io/Neural-ODE/>`_.
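
Conceptually, instead of the closed-form exponential decay used in NHP, the hidden state evolves between two events according to a learned ODE that is solved numerically. Below is a minimal sketch with a fixed-step Euler solver; it is a simplification, and the actual implementation may use a different solver.

.. code-block:: python

    import torch
    import torch.nn as nn

    class ODEFunc(nn.Module):
        """dh/dt = f(h): a small network defining the state dynamics."""

        def __init__(self, hidden_dim):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(hidden_dim, hidden_dim), nn.Tanh(),
                                     nn.Linear(hidden_dim, hidden_dim))

        def forward(self, h):
            return self.net(h)

    def evolve_state(func, h, dt, num_steps=10):
        """Euler integration of the hidden state over an interval of length dt."""
        step = dt / num_steps                   # [batch, 1]
        for _ in range(num_steps):
            h = h + step * func(h)              # one Euler step
        return h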
|
|
|
|
|
|
|
|
Attentive Neural Hawkes Network (ANHN) |
|
|
====================================== |
|
|
|
|
|
We implement the model based on the author's paper: the attentive model without the graph regularizer is named ANHN. |
|
|
|