Hikey960 Kernel panic - not syncing: Out of memory and no killable processes


#1

Hi

I am consistently hitting this issue (Kernel panic - not syncing: Out of memory and no killable processes…) when running multiple programs/processes on the Hikey960.
Could you please tell me how to prevent this kernel panic, and why it occurs even with 3GB of RAM?

Thanks


#2

You need to collect and share more information so that we can help you, starting with the OS (is it Android?).

For example sharing the output of:
$ top
$ cat /proc/meminfo
$ cat /proc/slabinfo
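
Since a single reading only shows one moment, it may help to log a few meminfo fields periodically while the workload runs and see which one drifts. A minimal sketch, assuming a POSIX shell with awk on the board; `mem_snapshot` and the log path in the usage comment are hypothetical names, not part of any existing tool:

```shell
# Print "epoch MemFree Slab SUnreclaim" (values in kB) from a
# meminfo-format file. $1 defaults to /proc/meminfo.
mem_snapshot() {
    awk -v now="$(date +%s)" '
        $1 == "MemFree:"    { free = $2 }
        $1 == "Slab:"       { slab = $2 }
        $1 == "SUnreclaim:" { unrecl = $2 }
        END { print now, free, slab, unrecl }
    ' "${1:-/proc/meminfo}"
}

# Usage (hypothetical log path), one sample per minute during the test:
#   while true; do mem_snapshot >> /data/local/tmp/mem.log; sleep 60; done
```

If SUnreclaim keeps growing while MemFree shrinks, that would point towards kernel-side allocations rather than user processes.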

In case you are able to rebuild the kernel, enabling kmemleak could be a great help:
https://www.kernel.org/doc/html/latest/dev-tools/kmemleak.html


#3

Hi

This issue is seen after running iperf for about 2 hours.
I am using Android AOSP master with kernel 4.4.82.

cat /proc/meminfo
MemTotal: 3032416 kB
MemFree: 153108 kB

Please find the kernel logs below:

<0>[ 4149.645181] Kernel panic - not syncing: Out of memory and no killable processes…
<0>[ 4149.645181]
<4>[ 4149.645183] CPU: 6 PID: 1437 Comm: sh Tainted: G W L 4.4.82-g8a1c18d-dirty #1
<4>[ 4149.645184] Hardware name: HiKey960 (DT)
<4>[ 4149.645184] Call trace:
<4>[ 4149.645187] [] dump_backtrace+0x0/0x1e8
<4>[ 4149.645189] [] show_stack+0x20/0x28
<4>[ 4149.645191] [] dump_stack+0xa8/0xe0
<4>[ 4149.645195] [] panic+0xf4/0x240
<4>[ 4149.645197] [] pagefault_out_of_memory+0x0/0x88
<4>[ 4149.645198] [] __alloc_pages_nodemask+0x9f0/0xa24
<4>[ 4149.645200] [] alloc_kmem_pages_node+0x38/0xa4
<4>[ 4149.645202] [] copy_process.isra.43+0x178/0x1644
<4>[ 4149.645203] [] _do_fork+0x78/0x39c
<4>[ 4149.645204] [] SyS_clone+0x44/0x50
<4>[ 4149.645206] [] el0_svc_naked+0x24/0x28
<2>[ 4149.645210] CPU4: stopping
<4>[ 4149.645212] CPU: 4 PID: 15305 Comm: sh Tainted: G W L 4.4.82-g8a1c18d-dirty #1
<4>[ 4149.645213] Hardware name: HiKey960 (DT)
<4>[ 4149.645213] Call trace:
<4>[ 4149.645216] [] dump_backtrace+0x0/0x1e8
<4>[ 4149.645218] [] show_stack+0x20/0x28
<4>[ 4149.645221] [] dump_stack+0xa8/0xe0
<4>[ 4149.645223] [] handle_IPI+0x2e8/0x2f4
<4>[ 4149.645226] [] gic_handle_irq+0x90/0xa8
<4>[ 4149.645228] Exception stack(0xffffffc0740c7700 to 0xffffffc0740c7830)
<4>[ 4149.645231] 7700: 0000000004208040 0000000000000000 ffffff80089856ac 0000000000000000
<4>[ 4149.645233] 7720: ffffff8009278000 ffffffc0b61a5548 ffffffc0be144090 0000000000000000
<4>[ 4149.645235] 7740: 0000000000000003 ffffffc0740c7740 ffffffbdc1180540 ffffffbdc1180540
<4>[ 4149.645236] 7760: 0000000000000000 0000000000000001 0000000000000005 0000000000000000
<4>[ 4149.645238] 7780: 0000000000000005 00000000008f4dd1 0000000000a03613 0000000000000056
<4>[ 4149.645239] 77a0: ffffff800934c000 000000000000980f ffffff80092889d0 0000000000000384
<4>[ 4149.645240] 77c0: 0000000000000000 ffffffc0b61a5100 ffffffc0708a8080 ffffffc089dc7080
<4>[ 4149.645242] 77e0: 000000000000d800 ffffffc0740c7850 ffffff80089856ac ffffffc0740c7830
<4>[ 4149.645243] 7800: ffffff8008985550 0000000020000105 ffffffc0740c7850 ffffff80089856ac
<4>[ 4149.645245] 7820: 0000008000000000 ffffff800898551c
<4>[ 4149.645246] [] el1_irq+0xbc/0x134
<4>[ 4149.645252] [] lowmem_scan+0x250/0x538
<4>[ 4149.645255] [] shrink_slab.part.41+0x1ec/0x48c
<4>[ 4149.645258] [] shrink_zone+0x30c/0x310
<4>[ 4149.645259] [] do_try_to_free_pages+0x158/0x468
<4>[ 4149.645261] [] try_to_free_pages+0xcc/0x220
<4>[ 4149.645262] [] __alloc_pages_nodemask+0x560/0xa24
<4>[ 4149.645264] [] alloc_kmem_pages_node+0x38/0xa4
<4>[ 4149.645265] [] copy_process.isra.43+0x178/0x1644
<4>[ 4149.645267] [] _do_fork+0x78/0x39c
<4>[ 4149.645268] [] SyS_clone+0x44/0x50
<4>[ 4149.645270] [] el0_svc_naked+0x24/0x28
<2>[ 4149.645271] CPU7: stopping
<4>[ 4149.645274] CPU: 7 PID: 14541 Comm: sh Tainted: G W L 4.4.82-g8a1c18d-dirty #1
<4>[ 4149.645275] Hardware name: HiKey960 (DT)
<4>[ 4149.645275] Call trace:
<4>[ 4149.645278] [] dump_backtrace+0x0/0x1e8
<4>[ 4149.645280] [] show_stack+0x20/0x28
<4>[ 4149.645283] [] dump_stack+0xa8/0xe0
<4>[ 4149.645284] [] handle_IPI+0x2e8/0x2f4
<4>[ 4149.645286] [] gic_handle_irq+0x90/0xa8
<4>[ 4149.645287] Exception stack(0xffffffc04806f700 to 0xffffffc04806f830)
<4>[ 4149.645289] f700: 0000000004208060 0000000000000000 ffffff80089856ac 0000000000000000
<4>[ 4149.645291] f720: ffffff8009278000 ffffffc0b5ab35c8 ffffffc0be183090 0000000000000000
<4>[ 4149.645292] f740: ffffffc04806f8b8 ffffffc04806f620 00000000000009f0 ffffffbdc1180540
<4>[ 4149.645294] f760: 0000000000000000 0000000000000001 000000000000002e 0000000000000000
<4>[ 4149.645296] f780: 0000000000000001 00000000008dc2b4 0000000000825689 0000000000000056
<4>[ 4149.645297] f7a0: ffffff800934c000 000000000000980f ffffff80092889d0 0000000000000384
<4>[ 4149.645299] f7c0: 0000000000000000 ffffffc0b5ab3180 ffffffc048539100 ffffffc089dc7080
<4>[ 4149.645300] f7e0: 000000000000d800 ffffffc04806f850 ffffff80089856ac ffffffc04806f830
<4>[ 4149.645301] f800: ffffff8008985540 0000000020000145 ffffffc04806f850 ffffff80089856ac
<4>[ 4149.645302] f820: 0000008000000000 ffffff800898551c
<4>[ 4149.645304] [] el1_irq+0xbc/0x134
<4>[ 4149.645306] [] lowmem_scan+0x240/0x538
<4>[ 4149.645308] [] shrink_slab.part.41+0x1ec/0x48c
<4>[ 4149.645310] [] shrink_zone+0x30c/0x310
<4>[ 4149.645312] [] do_try_to_free_pages+0x158/0x468
<4>[ 4149.645313] [] try_to_free_pages+0xcc/0x220
<4>[ 4149.645315] [] __alloc_pages_nodemask+0x560/0xa24
<4>[ 4149.645316] [] alloc_kmem_pages_node+0x38/0xa4
<4>[ 4149.645318] [] copy_process.isra.43+0x178/0x1644
<4>[ 4149.645320] [] _do_fork+0x78/0x39c
<4>[ 4149.645322] [] SyS_clone+0x44/0x50
<4>[ 4149.645323] [] el0_svc_naked+0x24/0x28
<2>[ 4149.645333] CPU1: stopping
<4>[ 4149.645348] CPU: 1 PID: 4488 Comm: sh Tainted: G W L 4.4.82-g8a1c18d-dirty #1
<4>[ 4149.645352] Hardware name: HiKey960 (DT)

Thanks


#4

I also tried changing the watermark level (min_free_kbytes) from 22528kB to 7348kB,
but the same issue is still seen.
Do any other memory parameters need tuning? Please suggest.
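
For reference, the current values of the tunables in question can be dumped like this. This is only a sketch: `show_tunable` and `show_vm_tunables` are hypothetical helper names, and the lowmemorykiller paths assume the Android staging lowmemorykiller driver is built in (the `lowmem_scan` frames in the trace above suggest it is).

```shell
# Print a tunable's current value, or "unavailable" if the file
# does not exist or cannot be read on this kernel.
show_tunable() {
    if [ -r "$1" ]; then
        printf '%s = %s\n' "$1" "$(cat "$1")"
    else
        printf '%s = unavailable\n' "$1"
    fi
}

# Dump the watermark and the lowmemorykiller thresholds in one go.
show_vm_tunables() {
    show_tunable /proc/sys/vm/min_free_kbytes
    show_tunable /sys/module/lowmemorykiller/parameters/minfree
    show_tunable /sys/module/lowmemorykiller/parameters/adj
}

# To change the watermark as root (value in kB):
#   echo 22528 > /proc/sys/vm/min_free_kbytes
```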


#5

It seems there is no user task left to kill, so it could be a memory leak in the kernel itself.
Does it only occur during iperf tests (network usage)?
Do you see the same panic with kernel 4.9?
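
One rough way to see where kernel memory is going is to rank the slab caches by footprint and compare the ranking at the start and near the end of the run. A minimal sketch, assuming the slabinfo 2.x column layout (name, active_objs, num_objs, objsize, ...); `slab_top` is a hypothetical helper name, and the size is only an approximation (num_objs * objsize):

```shell
# Rank slab caches by approximate footprint in kB, largest first.
# $1: slabinfo-format file (default /proc/slabinfo, usually needs root)
# $2: number of caches to show (default 10)
slab_top() {
    # Skip the two header lines, then print "<kB> <cache name>".
    awk 'NR > 2 { printf "%d %s\n", $3 * $4 / 1024, $1 }' \
        "${1:-/proc/slabinfo}" | sort -rn | head -n "${2:-10}"
}
```

A cache that climbs steadily during the network test would be a good candidate to look at first.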


#6

Yes, it only occurs while the iperf test is running, with dhcp and wpa_supplicant processes running in parallel.

Do you see the same panic with kernel 4.9?
I have not tried the 4.9 kernel because, when we tried it some time ago, lspci itself was not working (no PCI devices were shown, even though one was connected).
Is 4.9 stable? If so, I will try it.

I think it would be better if you could share any patches or tuning parameters that need to be applied to the 4.4 kernel to fix this issue.


#7

I need to instrument the system/kernel and try to reproduce the issue. Could you share the reproduction steps?
I encourage you to try enabling kmemleak on your side ;-).


#8

CONFIG_HAVE_DEBUG_KMEMLEAK is already enabled.
But why does the 4.9 kernel not show any connected PCI devices?
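
Note that CONFIG_HAVE_DEBUG_KMEMLEAK only says the architecture supports kmemleak; the option that actually builds the detector is CONFIG_DEBUG_KMEMLEAK. A quick check could look like the sketch below, where `kmemleak_built` is a hypothetical helper name and reading /proc/config.gz assumes the kernel was built with CONFIG_IKCONFIG_PROC:

```shell
# Read kernel config text on stdin; succeed only if the kmemleak
# detector itself (not just architecture support) is enabled.
kmemleak_built() {
    grep -q '^CONFIG_DEBUG_KMEMLEAK=y$'
}

# Usage on the board:
#   zcat /proc/config.gz | kmemleak_built && echo "kmemleak is built in"
#   ls /sys/kernel/debug/kmemleak    # present only when kmemleak is available
```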


#9

Just curious whether you can execute the commands below for memory leak analysis [1]:

echo clear > /sys/kernel/debug/kmemleak
Run some test case
echo scan > /sys/kernel/debug/kmemleak
cat /sys/kernel/debug/kmemleak

[1] https://wiki.linaro.org/Linux-kernel-debug-workshop-hikey
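
The sequence above could be wrapped in a small helper so that the same steps are repeated for every test run. This is only a sketch: `kmemleak_run` is a hypothetical name, the iperf arguments in the usage comment are placeholders, and it assumes debugfs is mounted at /sys/kernel/debug and the shell is running as root.

```shell
# Clear old kmemleak reports, run a test command, trigger a scan,
# and print whatever kmemleak reports afterwards.
# $1: kmemleak control file; remaining arguments: the test command.
kmemleak_run() {
    ctl=$1
    shift
    echo clear > "$ctl" || return 1   # forget leaks reported so far
    "$@"                              # run the workload under test
    echo scan > "$ctl" || return 1    # trigger an immediate scan
    cat "$ctl"                        # print suspected leaks with backtraces
}

# Usage on the board (placeholder iperf arguments):
#   mount -t debugfs nodev /sys/kernel/debug/   # only if not mounted yet
#   kmemleak_run /sys/kernel/debug/kmemleak iperf -c <server> -t 7200
```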