I have to say up front that I disagree entirely that the boot flow for hikey is bloated. Being broken down into small reusable components is absolutely not the same thing as bloat.
However, if you are determined that code from the u-boot repo must boot before any TF-A code runs, then take a look at how the Allwinner parts boot: https://github.com/ARM-software/arm-trusted-firmware/blob/master/docs/plat/allwinner.rst
On these platforms u-boot SPL is used instead of TF-A BL1/BL2. The u-boot tree also supplies an l-loader equivalent as pre-assembled hex bytes, which avoids any dependency on a 32-bit toolset.
u-boot SPL and TF-A BL1/BL2 end up with more or less the same role, so swapping one for the other offers little technical benefit. It just means you now need TF-A to build u-boot instead of needing u-boot to build TF-A. Any reduction in build complexity comes from having the AArch32 shim pre-assembled, not from the swap itself (and a pre-assembled shim could equally have been added to TF-A anyway).
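To make the comparison concrete, here is a rough sketch of the two boot flows as data, listing which source tree supplies each stage. The stage lists are my simplification for illustration (in particular, I am assuming u-boot is the final non-secure payload on both platforms), and the names Stage, HIKEY_FLOW, ALLWINNER_FLOW and source_trees are made up for this example rather than taken from either project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str     # boot stage, in execution order
    project: str  # source tree assumed to supply it


# Simplified, assumed stage lists; real flows have more detail.
HIKEY_FLOW = [
    Stage("AArch32 shim (l-loader)", "l-loader"),
    Stage("BL1/BL2", "TF-A"),
    Stage("BL31", "TF-A"),
    Stage("u-boot proper", "u-boot"),
]

ALLWINNER_FLOW = [
    Stage("AArch32 shim (pre-assembled hex)", "u-boot"),
    Stage("SPL", "u-boot"),
    Stage("BL31", "TF-A"),
    Stage("u-boot proper", "u-boot"),
]


def source_trees(flow):
    """Return the distinct source trees a flow needs, in first-use order."""
    seen = []
    for stage in flow:
        if stage.project not in seen:
            seen.append(stage.project)
    return seen


if __name__ == "__main__":
    for label, flow in (("hikey", HIKEY_FLOW), ("allwinner", ALLWINNER_FLOW)):
        print(label, "->", ", ".join(source_trees(flow)))
```

In this model both flows still need TF-A and u-boot. The only tree that drops out is the separate shim, which is the pre-assembly point rather than anything to do with SPL replacing BL1/BL2.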
Of course, even if there is no technical benefit, trying to replace BL1/BL2 with u-boot SPL is a great learning opportunity!