PYNQ vs ZYNQ: Performance?

I have an Ultra96 board. I am developing a custom IP which would accelerate a neural network. I have two choices:

  1. Use my Ultra96 board as a PYNQ platform and build my IP into an overlay.
  2. Build a regular ZYNQ project and run PetaLinux on the ARM cores.

What are the pros and cons of each approach? Option 1 sounds relatively easy, but I am worried about two things:

  1. Would Python significantly slow down my overall application?
  2. Would I lose control over the DDR4 memory interface? Would random reads/writes from the PS (running PYNQ) interrupt my burst data flow between the PL and DDR4?
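
To make the first concern concrete, here is the rough cost model I have in mind. It is only a sketch: the function name is made up for illustration, and the 100 microsecond per-call overhead and 1 GB/s PL-to-DDR4 bandwidth are placeholder assumptions, not measurements from an Ultra96.

```python
def python_overhead_fraction(n_calls, per_call_overhead_s,
                             bytes_per_call, bandwidth_bytes_per_s):
    """Fraction of total run time spent in Python-side call overhead.

    Assumes each accelerator invocation pays a fixed software cost and
    then moves `bytes_per_call` bytes at a constant bandwidth.
    """
    overhead = n_calls * per_call_overhead_s
    transfer = n_calls * bytes_per_call / bandwidth_bytes_per_s
    return overhead / (overhead + transfer)


# Placeholder numbers (assumptions, not measured values).
PER_CALL_OVERHEAD_S = 100e-6   # hypothetical Python cost per accelerator call
BANDWIDTH_BYTES_PER_S = 1e9    # hypothetical sustained PL <-> DDR4 bandwidth

# Many small transfers: the fixed per-call cost dominates.
frac_small = python_overhead_fraction(1000, PER_CALL_OVERHEAD_S,
                                      4 * 1024, BANDWIDTH_BYTES_PER_S)

# Few large transfers: the per-call cost is amortized away.
frac_large = python_overhead_fraction(10, PER_CALL_OVERHEAD_S,
                                      16 * 1024 * 1024, BANDWIDTH_BYTES_PER_S)

print(f"small transfers: {frac_small:.1%} of time in Python overhead")
print(f"large transfers: {frac_large:.1%} of time in Python overhead")
```

If this model is roughly right, Python would only hurt when the application makes many small calls into the accelerator, and batching work into large bursts would hide most of the cost. I would like to know whether that matches real PYNQ behavior.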