Could you give me advice on fixing problems when building TensorFlow?


#1

Hello.
I ran into some problems while building a machine learning environment (TensorFlow + Jupyter notebook) in Docker on SynQuacer, so I would like to share them. :slight_smile:
Could you give me some advice on fixing these problems?

  1. Failed to install TensorFlow from pip
  2. “pip install” command is too slow
  3. Failed to build TensorFlow because of -mfpu option
  4. local_resources option is not working when building TF with bazel

Here is the Dockerfile for reproduction:

And here is my SynQuacer spec:

RAM: 4GB (I couldn’t find a compatible DIMM. Which one should I use?)
HDD: 1TB
Host OS: Debian
Kernel: 4.14.32.linaro.281-1
Container OS: linaro/base-arm64-ubuntu:xenial

1. Failed to install TensorFlow from pip:

I tried to install TensorFlow from pip, but it failed.

Here is the error log:

root@5e2a3172b85b:~# pip3 install tensorflow
Collecting tensorflow
  Could not find a version that satisfies the requirement tensorflow (from versions: )
No matching distribution found for tensorflow
You are using pip version 8.1.1, however version 18.0 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.

It seems there is no prebuilt aarch64 TensorFlow binary, so I need to build it from source.
However, I would prefer to install it from pip, because building from source is not easy.

2. “pip install” command is too slow:

I ran the command below, but I had to wait about 3 hours for it to finish. :frowning:

# pip3 --no-cache-dir install Pillow ipykernel  jupyter gast grpcio absl-py protobuf tensorboard scipy

Here is the top output while the pip install command was running:

[Image: top output during pip install]

The pip install command runs cc1plus on a single core (to build native extensions?).
I think this is the cause of the problem.

Similar problems are discussed on StackOverflow, but I could not find a good answer.

I want to fix this problem to shorten the install time on low-power multicore machines (like SynQuacer).
However, I don’t know where to fix it. Could someone help me solve it?
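One thing that may be worth trying (a sketch, not a verified fix): some native extensions are compiled through make or through numpy.distutils, which read the MAKEFLAGS and NPY_NUM_BUILD_JOBS environment variables. Whether each package in the list above honors them is an assumption; packages built by plain setuptools may still compile on a single core.

```shell
# Assumption: the build backends of the packages being installed honor
# these variables. Packages that ignore them will still build on one core.
JOBS="$(nproc)"                      # number of available cores
export MAKEFLAGS="-j${JOBS}"         # read by make-based builds
export NPY_NUM_BUILD_JOBS="${JOBS}"  # read by numpy.distutils-based builds (e.g. scipy)
echo "native builds may use up to ${JOBS} jobs"

# Then run the same install command in this shell, for example:
# pip3 --no-cache-dir install Pillow ipykernel jupyter gast grpcio absl-py protobuf tensorboard scipy
```

Watching top during the install again would show whether more than one cc1plus process is running.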

3. Failed to build TensorFlow because of -mfpu option:

I couldn’t build TensorFlow because TensorFlow’s build system passes the -mfpu=neon option to gcc.

Here is the error log:

 /usr/bin/gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -B/usr/bin -B/usr/bin -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG -ffunction-sections -fdata-sections '-std=c++0x' -MD -MF bazel-out/arm-opt/bin/tensorflow/contrib/lite/kernels/internal/_objs/neon_tensor_utils/tensorflow/contrib/lite/kernels/internal/reference/portable_tensor_utils.pic.d '-frandom-seed=bazel-out/arm-opt/bin/tensorflow/contrib/lite/kernels/internal/_objs/neon_tensor_utils/tensorflow/contrib/lite/kernels/internal/reference/portable_tensor_utils.pic.o' -fPIC -iquote . -iquote bazel-out/arm-opt/genfiles -iquote external/bazel_tools -iquote bazel-out/arm-opt/genfiles/external/bazel_tools -iquote external/arm_neon_2_x86_sse -iquote bazel-out/arm-opt/genfiles/external/arm_neon_2_x86_sse -iquote external/gemmlowp -iquote bazel-out/arm-opt/genfiles/external/gemmlowp -funsafe-math-optimizations -ftree-vectorize -fomit-frame-pointer -O3 '-mfpu=neon' -fno-canonical-system-headers -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -c tensorflow/contrib/lite/kernels/internal/reference/portable_tensor_utils.cc -o bazel-out/arm-opt/bin/tensorflow/contrib/lite/kernels/internal/_objs/neon_tensor_utils/tensorflow/contrib/lite/kernels/internal/reference/portable_tensor_utils.pic.o)
gcc: error: unrecognized command line option '-mfpu=neon'

I created an ad hoc patch to avoid this problem, but it is not a fundamental solution.

I think the cause is that TensorFlow’s build system recognizes SynQuacer as an ARM32 environment.

4. local_resources option is not working when building TF with bazel:

I passed the local_resources option to bazel to limit memory usage, because my SynQuacer has only 4GB of RAM. (I’m using bazel 0.50)

Here is the command to build TensorFlow:

# bazel build -c opt \
     --copt="-mcpu=cortex-a53+fp" \
     --verbose_failures tensorflow/tools/pip_package:build_pip_package \
     --local_resources 3072,24.0,1.0

However, the local_resources option does not seem to work.

virtual memory exhausted: Cannot allocate memory
Target //tensorflow/tools/pip_package:build_pip_package failed to build
INFO: Elapsed time: 24530.257s, Critical Path: 23750.57s
INFO: 180 processes: 180 local.
FAILED: Build did NOT complete successfully
root@949cda07f529:~/tensorflow-1.9.0-rc2#

I got the “Cannot allocate memory” error when swap was enabled,
and the docker process crashed when swap was disabled.

I changed the build command to avoid this problem, but this is not a good solution.

bazel build -c opt \
     --copt="-mcpu=cortex-a53+fp" \
     --verbose_failures tensorflow/tools/pip_package:build_pip_package \
     -j 3

I think bazel should adjust this by itself.
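As a rough illustration of the kind of adjustment meant here, the job count could be derived from the available RAM instead of being hard-coded. This is only a sketch: the function name jobs_for_ram is hypothetical, and the 1024 MB per compile job is an assumed figure, not something measured on this build.

```shell
# Hypothetical helper: pick a job count that fits in RAM.
# Arguments: RAM budget in MB, assumed RAM per compile job in MB, core count.
jobs_for_ram() {
  ram_mb=$1
  per_job_mb=$2
  cores=$3
  jobs=$(( ram_mb / per_job_mb ))
  # Always run at least one job.
  if [ "$jobs" -lt 1 ]; then
    jobs=1
  fi
  # Never run more jobs than there are cores.
  if [ "$jobs" -gt "$cores" ]; then
    jobs=$cores
  fi
  echo "$jobs"
}

# With a 3072 MB budget, an assumed 1024 MB per job, and 24 cores:
jobs_for_ram 3072 1024 24
```

The printed value could then be passed to bazel as -j, which is what the -j 3 workaround above does by hand.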


#2

Neon/ASIMD was optional in ARMv7 and is mandatory in ARMv8 [1]. AFAIK this is why it’s not a valid option with the aarch64 toolchain.

[1] http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.den0024a/CJHECGIH.html
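To make the distinction concrete, here is a minimal sketch of how a build script might choose the flag from the machine name reported by uname -m. The function name neon_copt is hypothetical and this is not how TensorFlow's build system actually decides; it only illustrates that -mfpu=neon belongs on 32-bit ARM and should be left out on aarch64.

```shell
# Hypothetical helper: print the NEON-related gcc flag for a machine name.
neon_copt() {
  case "$1" in
    aarch64|arm64)
      # ARMv8 64-bit: NEON/ASIMD is always present and gcc rejects -mfpu.
      echo ""
      ;;
    armv7*)
      # ARMv7 32-bit: NEON is optional, so it has to be requested.
      echo "-mfpu=neon"
      ;;
    *)
      # Anything else (e.g. x86_64): no ARM FPU flag.
      echo ""
      ;;
  esac
}

echo "flag for this machine ($(uname -m)): '$(neon_copt "$(uname -m)")'"
```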


#3

I recently built TensorFlow, Keras and Jupyter for Developerbox and experienced pretty much the same set of problems you did.

I was not as heavily impacted by bazel’s memory usage because I have 8GB installed in my Developerbox. Having said that, with the default bazel arguments even an 8GB system was still swapping heavily. I chose to use --ram_utilization_factor to limit the parallelism and avoid swapping.


#4

I chose to use --ram_utilization_factor to limit the parallelism and avoid swapping.

I tried --ram_utilization_factor 50; however, the result is the same as with --local_resources 3072,24.0,1.0.


#5

I chose to use --ram_utilization_factor to limit the parallelism and avoid swapping.

I tried --ram_utilization_factor 50; however, the result is the same as with --local_resources 3072,24.0,1.0.

According to the documentation, the RAM usage estimation is extremely
crude (i.e. known to be inaccurate), so both --ram_utilization_factor and
--local_resources are tuning knobs (they don’t put a hard limit on the
amount of RAM used).

Given that you have less RAM than I do, you should try a more aggressive
value (-j3 is very aggressive on a 24 core system) for one or both of
these options.

Maybe even try something extreme, such as setting the RAM utilization to
10 and leaving it running overnight with some logging to help detect thrashing.
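For the logging, something along these lines might be enough. It is a sketch that assumes a Linux /proc/meminfo; the function names swap_used_kb and log_swap and the log path are made up for illustration.

```shell
# Hypothetical helper: print swap in use (kB) from a meminfo-style file.
# Defaults to /proc/meminfo when no file is given.
swap_used_kb() {
  awk '/^SwapTotal:/ {total=$2} /^SwapFree:/ {free=$2} END {print total - free}' \
    "${1:-/proc/meminfo}"
}

# Hypothetical helper: append a timestamped sample to a log file forever.
# Arguments: interval in seconds, log file path.
log_swap() {
  while true; do
    printf '%s swap_used_kb=%s\n' "$(date '+%F %T')" "$(swap_used_kb)" >> "$2"
    sleep "$1"
  done
}

# Usage: start the logger in the background before the build, e.g.
# log_swap 60 /tmp/swap.log &
```

Swap usage that keeps climbing through the night would be a sign that the build is thrashing rather than progressing.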