Build TensorFlow for HiKey 960

I am trying to build TensorFlow with GPU support for the HiKey 960 (Python 3.6).
I followed several tutorials, but ran into issues with each of them.
Cross-compiling on my host computer using the toolchain as described here:
does not work for Python 3.6, because I can't find a toolchain for Python 3.6.
With Python 2.7 the build fails with this error:
//tensorflow/python:framework/fast_tensor_util.so’ failed (Exit 1): compute failed: error executing command

Building on the HiKey itself as described here:
(after adding a swap file because of memory issues)
fails with errors because cpuid.h is missing.

Any suggestions would be most welcome.
Thanks,
Omer