How to reduce huge FastRPC overhead on Linaro

Hi,
I followed this guide to test the Hexagon DSP. It works, but running the profiler example from the SDK shows FastRPC overhead that grows with the buffer size:

[Test_Name Buffer_Size]  Avg FastRPC overhead per iteration(usecs)
[noop        0K]          75us
[inbuf      32K]          238us
[routbuf    32K]          226us
[inbuf      64K]          360us
[routbuf    64K]          318us
[inbuf     128K]          663us
[routbuf   128K]          597us
[inbuf       1M]          4016us
[routbuf     1M]          3027us
[inbuf       4M]          13957us
[routbuf     4M]          8668us
[inbuf       8M]          27249us
[routbuf     8M]          16635us

This implies data is copied instead of shared between the DSP and host.
But rpcmem_alloc should allocate a shared DMA-buffer.
Does anyone have some leads on how to fix this? Thanks!
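
To make the scaling concrete, here is a small sketch that subtracts the fixed 75us noop cost from each inbuf row above and divides by the buffer size. The struct and function names are made up for illustration; only the numbers come from the profiler output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* One row of the profiler output quoted above. */
struct sample {
    unsigned size_kib;    /* buffer size in KiB */
    double   overhead_us; /* average FastRPC overhead per iteration */
};

/* Fixed per-call cost, taken from the [noop 0K] row. */
static const double NOOP_US = 75.0;

/* The inbuf rows of the first table. */
static const struct sample INBUF[] = {
    {   32,   238.0 },
    {   64,   360.0 },
    {  128,   663.0 },
    { 1024,  4016.0 },
    { 4096, 13957.0 },
    { 8192, 27249.0 },
};
static const size_t INBUF_COUNT = sizeof(INBUF) / sizeof(INBUF[0]);

/* Size-dependent part of the overhead, in microseconds per KiB. */
double us_per_kib(const struct sample *s)
{
    assert(s != NULL && s->size_kib > 0);
    return (s->overhead_us - NOOP_US) / (double)s->size_kib;
}

/* Print the per-KiB cost for every inbuf row. */
void print_scaling(void)
{
    for (size_t i = 0; i < INBUF_COUNT; i++)
        printf("inbuf %5uK  %.2f us/KiB\n",
               INBUF[i].size_kib, us_per_kib(&INBUF[i]));
}
```

Calling print_scaling() shows the per-KiB cost staying between roughly 3.3 and 5.1 us while the buffer grows 256-fold, which is what a per-byte copy would look like. A shared mapping would be expected to stay much closer to the noop figure.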

Hi,
The issue I see here has to do with how we link the rpcmem library. Most of the Hexagon SDK apps link statically against rpcmem. By default that library uses ION memory on Android; on non-Android systems it falls back to a simple malloc and copy.

In upstream FastRPC we do support the dma-buf mechanism to share buffers between userspace and the DSP without copying. This is also supported in our userspace library; see rpcmem_android.c under src in working/qualcomm/fastrpc.git. To use it, the test app should link rpcmem dynamically so that it picks up the symbols from the Linux libadsprpc.so or libcdsprpc.so library.

It should be easy to fix this by editing the UbuntuARM.min file in your Hexagon SDK test app: remove the entry that statically links rpcmem and link against the libraries built from working/qualcomm/fastrpc.git instead.

thanks,
srini

@srini
Using the rpcmem_alloc from https://git.linaro.org/landing-teams/working/qualcomm/fastrpc.git does not help. To make 100% sure the correct library is used I explicitly loaded the symbols via dlopen from libcdsprpc.so.
Is the released 5.15.0 kernel+debian sufficient or do I need to build my own?
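
For reference, the dlopen check described above can be sketched roughly as follows. The rpcmem_alloc prototype is an assumption modelled on the Hexagon SDK's rpcmem.h, and load_symbol / load_rpcmem_alloc are hypothetical helper names, not part of any library.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed prototype, modelled on rpcmem.h from the Hexagon SDK. */
typedef void *(*rpcmem_alloc_fn)(int heapid, uint32_t flags, int size);

/*
 * Open lib_path and look up symbol in it.
 * On success the library handle is stored in *handle_out (the caller
 * should dlclose() it later) and the symbol address is returned.
 * On failure *handle_out is set to NULL and NULL is returned.
 */
void *load_symbol(const char *lib_path, const char *symbol, void **handle_out)
{
    assert(lib_path != NULL && symbol != NULL && handle_out != NULL);

    *handle_out = dlopen(lib_path, RTLD_NOW | RTLD_LOCAL);
    if (*handle_out == NULL) {
        const char *err = dlerror();
        fprintf(stderr, "dlopen(%s) failed: %s\n", lib_path,
                err ? err : "unknown error");
        return NULL;
    }

    void *addr = dlsym(*handle_out, symbol);
    if (addr == NULL) {
        const char *err = dlerror();
        fprintf(stderr, "dlsym(%s) failed: %s\n", symbol,
                err ? err : "symbol resolved to NULL");
        dlclose(*handle_out);
        *handle_out = NULL;
    }
    return addr;
}

/* Resolve rpcmem_alloc from the shared library instead of a static copy. */
rpcmem_alloc_fn load_rpcmem_alloc(void **handle_out)
{
    /* Casting dlsym's void * result to a function pointer is the usual POSIX idiom. */
    return (rpcmem_alloc_fn)load_symbol("libcdsprpc.so", "rpcmem_alloc",
                                        handle_out);
}
```

Calling the allocator through the returned pointer bypasses whatever rpcmem_alloc the linker placed in the binary itself. On older glibc versions the program has to be linked with -ldl.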

@Caradhras if you look at rpcmem_alloc() in rpcmem_android.c (under src in working/qualcomm/fastrpc.git), we allocate a DMA buffer using FASTRPC_IOCTL_ALLOC_DMA_BUFF by default.
You could try LD_PRELOAD to load the symbols from the library.
But if rpcmem is statically linked into the test binary, this might not work. In that case you should recompile the test binary without the static link.
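
One way to tell which copy of a function a binary actually ends up calling is to ask the dynamic loader where its address lives. The sketch below uses glibc's dladdr(); symbol_origin is a hypothetical helper name, and the approach assumes a dynamically linked executable.

```c
#define _GNU_SOURCE /* dladdr() and Dl_info are GNU extensions */
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/*
 * Return the path of the loaded object (executable or shared library)
 * that contains addr, or NULL if the loader cannot attribute it.
 */
const char *symbol_origin(const void *addr)
{
    Dl_info info;

    assert(addr != NULL);
    if (dladdr(addr, &info) == 0)
        return NULL;
    return info.dli_fname;
}
```

In the test app, printing symbol_origin((const void *)rpcmem_alloc) should give the path of libcdsprpc.so or libadsprpc.so when the dynamic library is in use. If it gives the test binary itself, the static copy is most likely still linked in, and LD_PRELOAD will not override it.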

–srini

@srini
I did remove the static link from UbuntuARM.min and redeployed everything. FASTRPC_IOCTL_ALLOC_DMA_BUFF is definitely called by the application. I added printf("using nice allocator\n") to rpcmem_android.c’s rpcmem_alloc to verify.
Still no shared memory:

[noop        0K]          85us
[inbuf      32K]    
using nice allocator!
948us
[routbuf    32K]      
using nice allocator!
236us
[inbuf      64K]        
using nice allocator!
379us
[routbuf    64K]     
using nice allocator!
377us
[inbuf     128K]        
using nice allocator!
682us
[routbuf   128K]         
using nice allocator!
664us
[inbuf       1M]         
using nice allocator!
3996us
[routbuf     1M]         
using nice allocator!
2862us
[inbuf       4M]    
using nice allocator!
13757us
[routbuf     4M]       
using nice allocator!

@srini
It appears that rpcmem_android.c does not register the buffer for mapping during FastRPC calls, so the data is copied regardless.
I tried to register it manually with remote_register_buf. This creates a mapping during fastrpc_create_maps, but subsequent calls crash with:

Sep 22 15:48:23 linaro-gnome kernel: Unable to handle kernel access to user memory outside uaccess routines at virtual address 0000fffff7fef000
Sep 22 15:48:23 linaro-gnome kernel: Mem abort info:
Sep 22 15:48:23 linaro-gnome kernel:   ESR = 0x9600000f
Sep 22 15:48:23 linaro-gnome kernel:   EC = 0x25: DABT (current EL), IL = 32 bits
Sep 22 15:48:23 linaro-gnome kernel:   SET = 0, FnV = 0
Sep 22 15:48:23 linaro-gnome kernel:   EA = 0, S1PTW = 0
Sep 22 15:48:23 linaro-gnome kernel:   FSC = 0x0f: level 3 permission fault
Sep 22 15:48:23 linaro-gnome kernel: Data abort info:
Sep 22 15:48:23 linaro-gnome kernel:   ISV = 0, ISS = 0x0000000f
Sep 22 15:48:23 linaro-gnome kernel:   CM = 0, WnR = 0
Sep 22 15:48:23 linaro-gnome kernel: user pgtable: 4k pages, 48-bit VAs, pgdp=000000014658e000
Sep 22 15:48:23 linaro-gnome kernel: [0000fffff7fef000] pgd=080000014641c003, p4d=080000014641c003, pud=08000001219da003, pmd=0800000104e90003, pte=0068000101767fcb
Sep 22 15:48:23 linaro-gnome kernel: Internal error: Oops: 9600000f [#2] PREEMPT SMP
Sep 22 15:48:23 linaro-gnome kernel: Modules linked in: michael_mic rfcomm af_alg snd_soc_wsa881x regmap_sdw bnep q6asm_dai q6routing q6afe_dai q6afe_clocks q6adm q6asm q6afe q6dsp_common q6core snd_soc_hdmi_code>
Sep 22 15:48:23 linaro-gnome kernel:  qcom_usb_vbus_regulator spi_geni_qcom i2c_qcom_geni pinctrl_lpass_lpi
Sep 22 15:48:23 linaro-gnome kernel: CPU: 5 PID: 2006 Comm: profiling Tainted: G      D W         5.15.0-qcomlt-arm64 #252
Sep 22 15:48:23 linaro-gnome kernel: Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT)
Sep 22 15:48:23 linaro-gnome kernel: pstate: 20400005 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
Sep 22 15:48:23 linaro-gnome kernel: pc : __arch_copy_to_user+0x180/0x220
Sep 22 15:48:23 linaro-gnome kernel: lr : fastrpc_internal_invoke+0xa60/0xd90 [fastrpc]
Sep 22 15:48:23 linaro-gnome kernel: sp : ffff80001179bc60
Sep 22 15:48:23 linaro-gnome kernel: x29: ffff80001179bc60 x28: 0000000000000018 x27: ffff800020ce9000
Sep 22 15:48:23 linaro-gnome kernel: x26: 0000000000000002 x25: 000000007fffffff x24: 0000ffffffffffff
Sep 22 15:48:23 linaro-gnome kernel: x23: ffff5ea20305b660 x22: ffff5ea2218d4880 x21: ffff5ea20305be30
Sep 22 15:48:23 linaro-gnome kernel: x20: ffff5ea20305be00 x19: ffff5ea20305b600 x18: 0000000000000000
Sep 22 15:48:23 linaro-gnome kernel: x17: 0000000000000000 x16: ffffbaca24366740 x15: 0000fffff7fef000
Sep 22 15:48:23 linaro-gnome kernel: x14: 000000000000011b x13: 0000000000000051 x12: 071c71c71c71c71c
Sep 22 15:48:23 linaro-gnome kernel: x11: 0000000000000051 x10: 0000000000000a20 x9 : ffff80001179bae0
Sep 22 15:48:23 linaro-gnome kernel: x8 : ffff5ea2218d5300 x7 : ffff5ea202862200 x6 : 0000fffff7fef000
Sep 22 15:48:23 linaro-gnome kernel: x5 : 0000fffff7ff7000 x4 : 0000000000000000 x3 : 0000fffff7fef000
Sep 22 15:48:23 linaro-gnome kernel: x2 : 0000000000007f80 x1 : 0000fffff7fef000 x0 : 0000fffff7fef000
Sep 22 15:48:23 linaro-gnome kernel: Call trace:
Sep 22 15:48:23 linaro-gnome kernel:  __arch_copy_to_user+0x180/0x220
Sep 22 15:48:23 linaro-gnome kernel:  fastrpc_device_ioctl+0x570/0x844 [fastrpc]
Sep 22 15:48:23 linaro-gnome kernel:  __arm64_sys_ioctl+0xac/0xf0
Sep 22 15:48:23 linaro-gnome kernel:  invoke_syscall+0x48/0x114
Sep 22 15:48:23 linaro-gnome kernel:  el0_svc_common.constprop.0+0x44/0xfc
Sep 22 15:48:23 linaro-gnome kernel:  do_el0_svc+0x2c/0x94
Sep 22 15:48:23 linaro-gnome kernel:  el0_svc+0x28/0x80
Sep 22 15:48:23 linaro-gnome kernel:  el0t_64_sync_handler+0xa8/0x130
Sep 22 15:48:23 linaro-gnome kernel:  el0t_64_sync+0x1a0/0x1a4
Sep 22 15:48:23 linaro-gnome kernel: Code: d503201f d503201f d503201f d503201f (a8c12027) 
Sep 22 15:48:23 linaro-gnome kernel: ---[ end trace bbd663a568ccad6d ]---

Any idea what I am doing wrong?
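
For context, the manual registration described above looks roughly like the sketch below. The two prototypes are assumptions modelled on rpcmem.h and remote.h from the Hexagon SDK, register_shared_buffer is a hypothetical helper name, and whether the library in use exports rpcmem_to_fd is also an assumption. This shows what was attempted; it is not a fix for the crash.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Assumed prototypes, modelled on rpcmem.h and remote.h from the Hexagon SDK. */
typedef int  (*rpcmem_to_fd_fn)(void *po);
typedef void (*remote_register_buf_fn)(void *buf, int size, int fd);

/*
 * Register an rpcmem buffer with FastRPC so that later invokes can pass
 * its dma-buf fd instead of copying the data.
 * lib is a handle obtained from dlopen() on libcdsprpc.so.
 * Returns the dma-buf fd on success, -1 on failure.
 */
int register_shared_buffer(void *lib, void *buf, int size)
{
    assert(lib != NULL && buf != NULL && size > 0);

    rpcmem_to_fd_fn to_fd =
        (rpcmem_to_fd_fn)dlsym(lib, "rpcmem_to_fd");
    remote_register_buf_fn reg =
        (remote_register_buf_fn)dlsym(lib, "remote_register_buf");
    if (to_fd == NULL || reg == NULL)
        return -1; /* the library does not export the expected symbols */

    int fd = to_fd(buf);
    if (fd < 0)
        return -1; /* no fd is associated with this buffer */

    reg(buf, size, fd);
    return fd;
}
```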

@Caradhras yes, you seem to be doing the correct thing by passing the dmabuf fd in the next invoke. It looks like we are crashing somewhere in fastrpc_put_args. Can you recheck the structure that is passed to invoke and make sure every member is correctly initialised? That is the only thing I can think of. Only buffers can be copied back; if some of the invoke args are not initialised, we might be copying to an incorrect address.
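
A minimal sketch of what "every member correctly initialised" could look like. The struct below is a local stand-in that assumes the 5.15 layout of the kernel's struct fastrpc_invoke_args (ptr, length, fd, reserved); real code should use the definition from the kernel's misc/fastrpc.h header. init_invoke_arg is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Local stand-in for the kernel's struct fastrpc_invoke_args
 * (assumed 5.15 layout, for illustration only).
 */
struct invoke_arg {
    uint64_t ptr;      /* userspace address of the buffer */
    uint64_t length;   /* buffer length in bytes */
    int32_t  fd;       /* dma-buf fd, or -1 for a plain buffer */
    uint32_t reserved; /* kept at zero */
};

/*
 * Fill one argument slot. The memset clears every member first, so no
 * field keeps stack garbage even if the struct gains members later.
 */
void init_invoke_arg(struct invoke_arg *arg, void *buf, size_t len, int fd)
{
    assert(arg != NULL);

    memset(arg, 0, sizeof(*arg));
    arg->ptr = (uint64_t)(uintptr_t)buf;
    arg->length = (uint64_t)len;
    arg->fd = (int32_t)fd;
}
```

Passing -1 as fd marks a slot as an ordinary buffer; a registered rpcmem buffer would carry its dma-buf fd instead. An args array declared on the stack and only partly filled in is the kind of thing that could produce a stray address like the one in the oops.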