Comparison between Hikey960 and Hikey970

I have recently been comparing GPU performance between Hikey960 and Hikey970.
The results look strange to me. Any ideas about them?

Model              TinkerBoard  Hikey960  Hikey970
alexnet                    135        31        31
googlenet                  224        46        52
inception_v3               802       177       195
inception_v4               n/a       542       525
lenet                        8         9        14
mobilenet                  124        21        32
mobilenet_qasymm8          121        26       178
resnet50                   546        96       112
resnext50                  907       221       252
squeezenet                 132        23       159
squeezenet_v1_1             70        19       110
vgg16                      887       180       188
vgg19                     1069       244       232
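
To make the gap easier to see, here is a minimal sketch that computes the Hikey970/Hikey960 time ratio per model. The numbers are copied from the table above; the variable names are only illustrative.

```python
# Times in ms copied from the table above: (Hikey960, Hikey970).
times_ms = {
    "alexnet": (31, 31),
    "googlenet": (46, 52),
    "inception_v3": (177, 195),
    "inception_v4": (542, 525),
    "lenet": (9, 14),
    "mobilenet": (21, 32),
    "mobilenet_qasymm8": (26, 178),
    "resnet50": (96, 112),
    "resnext50": (221, 252),
    "squeezenet": (23, 159),
    "squeezenet_v1_1": (19, 110),
    "vgg16": (180, 188),
    "vgg19": (244, 232),
}

# A ratio above 1 means Hikey970 took longer than Hikey960.
ratios = {name: t970 / t960 for name, (t960, t970) in times_ms.items()}

# Print the models from the largest slowdown to the smallest.
for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<18} {ratio:5.2f}x")
```

With these numbers, squeezenet, mobilenet_qasymm8 and squeezenet_v1_1 stand out at roughly 5.8x to 6.9x slower on Hikey970, while every other model stays below about 1.6x.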

All test cases are the examples from https://github.com/ARM-software/ComputeLibrary. Every value is in milliseconds and is the minimum over 100 executions.
Hikey960 and Hikey970 ran AOSP, and TinkerBoard ran TinkerOS; all used OpenCL.
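
For reference, the "minimum over 100 executions" measurement can be sketched as below. This only illustrates the method and is not the harness actually used: the Compute Library examples are C++ binaries, and `min_time_ms` and `dummy_workload` are hypothetical names standing in for one inference run.

```python
import time


def min_time_ms(run_once, runs=100):
    """Return the fastest wall-clock time of run_once over `runs` calls, in ms."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        run_once()
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        best = min(best, elapsed_ms)
    return best


def dummy_workload():
    # Hypothetical stand-in for a single inference run.
    sum(i * i for i in range(10_000))


print(f"min over 100 runs: {min_time_ms(dummy_workload):.3f} ms")
```

Taking the minimum rather than the mean filters out slow outliers such as first-run kernel compilation, but it also hides thermal throttling that builds up over the run.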

I can understand that the Mali-G71 in Hikey960 is much faster than the Mali-T764 in TinkerBoard. But the Mali-G72 in Hikey970 is only equal to or slower than the Mali-G71 in Hikey960.

It may just be a matter of software tuning.
I expected the Mali-G72 to be faster because it has more shader cores.
Does anybody have an idea where such results come from?


Hi no_maddo, I have found a significant performance regression (and its cause!) with ArmCL v18.05 compared to ArmCL v18.03, especially for 1x1 convolutions used e.g. in SqueezeNet and MobileNets. As your results show a big slowdown between HiKey960 and HiKey970 especially for these models, I am wondering if it's due to the different library versions deployed on your platforms?

Thanks. In my case, I only used v18.05.

@no_maddo Could you please share how you built Compute Library on Hikey970? Have you tried any TensorFlow models?