# benchmark
op

# naive C with openmp
for for for
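
A rough sketch of what the three nested loops might look like. The notes do not say which op is benchmarked, so this assumes a hypothetical per-channel multiply-accumulate over a `channels x h x w` float tensor; the name `mla_naive` and the memory layout are illustrative only. The `#pragma omp parallel for` spreads the outer loop across threads when built with `-fopenmp` and is ignored otherwise.

```c
#include <stddef.h>

/* Hypothetical op (assumption, not from the notes):
 * out[c][y][x] += in[c][y][x] * k[c] for a channels x h x w float tensor
 * stored contiguously, channel by channel. */
void mla_naive(const float *in, const float *k, float *out,
               int channels, int h, int w)
{
    /* Channels are independent, so the outer loop is safe to parallelize. */
    #pragma omp parallel for
    for (int c = 0; c < channels; c++) {
        const float *inptr = in + (size_t)c * h * w;
        float *outptr = out + (size_t)c * h * w;
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                outptr[y * w + x] += inptr[y * w + x] * k[c];
            }
        }
    }
}
```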

# unroll, first try
h
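
If `h` here means unrolling along the height dimension (an assumption; the note does not say), one possible shape is to handle two rows per outer iteration and clean up the odd row afterwards. The op and the name `mla_unroll_h` are hypothetical, reusing the simple multiply-accumulate as a stand-in.

```c
/* Hypothetical op (assumption): out[y][x] += in[y][x] * k on an h x w plane. */
void mla_unroll_h(const float *in, float k, float *out, int h, int w)
{
    int y = 0;
    /* Two rows per iteration: half as many outer-loop iterations, and two
     * independent accumulate streams inside the inner loop. */
    for (; y + 1 < h; y += 2) {
        const float *in0 = in + y * w;
        const float *in1 = in0 + w;
        float *out0 = out + y * w;
        float *out1 = out0 + w;
        for (int x = 0; x < w; x++) {
            out0[x] += in0[x] * k;
            out1[x] += in1[x] * k;
        }
    }
    /* Leftover row when h is odd. */
    for (; y < h; y++) {
        for (int x = 0; x < w; x++) {
            out[y * w + x] += in[y * w + x] * k;
        }
    }
}
```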

# register allocation
kernels

# unroll, second try
simd

# neon intrinsics
optional
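
A minimal sketch of the intrinsics route, again on an assumed multiply-accumulate over one row (the helper `mla_row` is hypothetical). It uses `vdupq_n_f32`, `vld1q_f32`, `vmlaq_f32` and `vst1q_f32` from `arm_neon.h`, processes four floats per iteration, and falls back to plain C for the remainder or when NEON is not available.

```c
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper (assumption): out[i] += in[i] * k for n floats. */
void mla_row(const float *in, float k, float *out, int n)
{
    int i = 0;
#if defined(__ARM_NEON)
    float32x4_t _k = vdupq_n_f32(k);           /* broadcast k to 4 lanes */
    for (; i + 3 < n; i += 4) {
        float32x4_t _in = vld1q_f32(in + i);   /* load 4 inputs */
        float32x4_t _out = vld1q_f32(out + i); /* load 4 accumulators */
        _out = vmlaq_f32(_out, _in, _k);       /* _out += _in * _k */
        vst1q_f32(out + i, _out);              /* store 4 results */
    }
#endif
    /* Scalar loop: the leftover elements on NEON, everything elsewhere. */
    for (; i < n; i++) {
        out[i] += in[i] * k;
    }
}
```

Keeping the scalar loop as the shared tail means the same source builds and gives the same results on non-ARM hosts, which is handy for checking the NEON path against a reference.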

# naive neon assembly with pld
asm

# pipeline optimize, first try
use more registers for load and mla

# pipeline optimize, second try
interleave load and mla

# pipeline optimize, third try
loop tail

# usual practice, load/save
233

# usual practice, unroll
233

# usual practice, save register
233