ncnn / docs /developer-guide /how-to-write-a-neon-optimized-op-kernel.md
camenduru's picture
thanks to ncnn ❤
be903e2
# benchmark
op
# naive C with openmp
for for for
# unroll, first try
h
# register allocation
kernels
# unroll, second try
simd
# neon intrinsics
optional
# naive neon assembly with pld
asm
# pipeline optimize, first try
more register load mla
# pipeline optimize, second try
interleave load mla
# pipeline optimize, third try
loop tail
# usual practice, load/save
233
# usual practice, unroll
233
# usual practice, save register
233