In the paper's Supplement, section S3.1: "This input passes through a 1D convolutional layer with an input size of 12×5000, where 64 filters of size 64×64 are applied..."
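The input is 12 one-dimensional channels of length 5000. In a 1D convolution, each filter spans all input channels, so its shape should be 12 × k (channels × kernel length), not a square 64×64. Is the 64×64 kernel size a typo?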
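To make the shape argument concrete, here is a minimal sketch of the bookkeeping for a standard 1D convolution (groups=1, no dilation). The weight layout (out_channels, in_channels, kernel_size) follows the convention used by PyTorch's `Conv1d`. The kernel length of 64, stride 1 and zero padding are assumptions for illustration only (one possible reading of "size 64"); the supplement does not confirm them.

```python
def conv1d_shapes(in_channels, length, out_channels, kernel_size, stride=1, padding=0):
    """Return (weight_shape, output_shape) for a plain 1D convolution.

    Assumes groups=1 and dilation=1. Each filter covers every input
    channel, so a single filter has shape (in_channels, kernel_size).
    """
    weight_shape = (out_channels, in_channels, kernel_size)
    # Standard output-length formula for a 1D convolution without dilation.
    out_len = (length + 2 * padding - kernel_size) // stride + 1
    return weight_shape, (out_channels, out_len)


# Hypothetical reading: 64 filters, each 12 channels x 64 samples long.
w, y = conv1d_shapes(12, 5000, 64, 64)
print(w)  # (64, 12, 64)
print(y)  # (64, 4937)
```

Under this reading each filter is 12×64, and the full weight tensor is 64×12×64. A 64×64 filter would only fit a layer whose input already has 64 channels, such as a later layer fed by these 64 feature maps.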