Notes on Speed
==============

Under `tests/`, the `profile_xfms` script tests the speed of the DTCWT at several depths on a moderately sized input
:math:`X \in \mathbb{R}^{10 \times 10 \times 128 \times 128}`. As a reference, an 11 by 11
convolution takes 2.53ms for a tensor of this size.
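
Timings like these can be gathered with a simple wall-clock loop. The sketch below is illustrative only: the `timed`
helper and the toy workload are assumptions, not what `profile_xfms` actually does.

```python
import time

def timed(fn, *args, repeats=100):
    """Return the average wall-clock time of fn(*args) in milliseconds."""
    # Warm-up call so one-off setup costs are not measured.
    fn(*args)
    start = time.perf_counter()
    for _ in range(repeats):
        fn(*args)
    return (time.perf_counter() - start) / repeats * 1e3

# Toy stand-in for a convolution call; substitute any transform here.
avg_ms = timed(sum, range(10000))
print(f"{avg_ms:.3f} ms per call")
```

For GPU work, note that a plain wall-clock loop like this only gives meaningful numbers if the calls are synchronous.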

A single-layer DTCWT using the `near_sym_a` filters (lengths 5 and 7) makes 6 convolutional calls. I timed these at
238us each, for a total of 1.43ms. Unfortunately, there is also some overhead in calculating the DTCWT, and not all
non-convolutional operations are free. In addition to the 6 convolutions, there were:

- 6 move ops @ 119us = 714us
- 10 pointwise add ops @ 122us = 465us
- 12 copy ops @ 35us = 381us
- 6 different add ops @ 38us = 232us
- 6 subtraction ops @ 37us = 220us
- 3 constant division ops @ 57us = 173us
- 6 more move ops @ 28us = 171us
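
As a sanity check on the bookkeeping, the per-op totals above sum back to the quoted figures (times in microseconds,
taken directly from the list; the labels are informal):

```python
# Overhead totals (us) from the profile above.
overheads = {
    "move": 714,
    "pointwise add": 465,
    "copy": 381,
    "add": 232,
    "subtract": 220,
    "constant divide": 173,
    "move (2nd set)": 171,
}
conv_us = 6 * 238                      # 6 convolutions at ~238us each
overhead_us = sum(overheads.values())  # ~2.3ms of overhead
total_ms = (conv_us + overhead_us) / 1e3
print(overhead_us, total_ms)  # → 2356 3.784
```
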

This makes the overhead roughly 2.3ms, and the total time 3.7ms.

For a two-layer DTCWT there are 12 convolutional ops. The second-layer kernels are slightly larger (10 taps each), so
although they act on 1/4 as many samples, they add an extra 1.1ms (2.5ms total for the 12 convolutions). The overhead
of the non-convolution operations is 4.4ms, making 6.9ms overall: roughly 3 times as long as an 11 by 11 convolution.

There is an option to skip calculating the highpass coefficients for the first scale, as these often carry little
useful information (see the `skip_hps` option). For a two-scale transform, this brings the convolution run time down
to 1.13ms and the overhead down to 2.49ms, totaling 3.6ms, or roughly the same time as the single-layer transform.

A single-layer inverse transform takes 1.43ms (conv) + 2.7ms (overhead), totaling 4.1ms: slightly longer than the
3.7ms for the forward transform.

A two-layer inverse transform takes 2.24ms (conv) + 5.9ms (overhead), totaling 8.1ms: again slightly longer than the
6.9ms for the forward transform.

A single-layer end-to-end transform takes 2.86ms (conv) + 5.8ms (overhead) = 8.6ms :math:`\approx` 3.7ms (forward) +
4.1ms (inverse). Similarly, a two-layer end-to-end transform takes 4.4ms (conv) + 10.4ms (overhead) = 14.8ms
:math:`\approx` 6.9ms (forward) + 8.1ms (inverse).
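
The end-to-end figures are consistent with composing the separate forward and inverse timings. A quick check, using
the quoted numbers in milliseconds (the 10% tolerance is my choice, not a measured bound):

```python
# Quoted timings in ms: layers -> (forward, inverse, end-to-end).
timings = {1: (3.7, 4.1, 8.6), 2: (6.9, 8.1, 14.8)}
for layers, (fwd, inv, e2e) in timings.items():
    composed = fwd + inv
    # End-to-end should be within ~10% of forward + inverse.
    assert abs(e2e - composed) / e2e < 0.10
    print(layers, round(composed, 1), e2e)
```
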

If we use the `near_sym_b` filters for the first layer (13 and 19 taps), the overhead doesn't increase, but the time
taken for each convolution unsurprisingly roughly triples to 600us (up from the ~240us measured above for
`near_sym_a`).