Kexec fails to work


#1

I was not aware of Petitboot before reading the following thread, so thank you, ribalda.
https://discuss.96boards.org/t/bootloader-with-ufs-support/5481

A little research indicated that it would be just the thing for booting from my hard drive. Petitboot hands over to the target kernel with kexec, so I too started to investigate kexec.

I installed kexec-tools on my Debian system (linaro-buster-alip-dragonboard-820c-221.img):
root@linaro-alip:~# apt-get install kexec-tools
root@linaro-alip:~# kexec -v
kexec-tools 2.0.16

The kexec userspace tool does not seem to correctly handle device tree reserved-memory nodes that have an alloc-ranges property.
Little Kernel (LK) does appear to handle them correctly: I have examined /proc/device-tree/reserved-memory/rmtfs@86700000 and it looks correct.
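To double-check what each kernel is given, the raw properties under /proc/device-tree can be decoded by hand. Every property is a sequence of big-endian 32-bit cells, and each reg or alloc-ranges entry is an (address, size) pair. The sketch below assumes #address-cells = #size-cells = 2 for /reserved-memory (usual on arm64, but check your tree); the helper names are hypothetical and not part of kexec-tools. The output can be compared with `dtc -I dtb -O dts` applied to the .dtb passed to kexec.

```python
import struct
from pathlib import Path

# Assumption: /reserved-memory uses #address-cells = <2> and #size-cells = <2>.
ADDRESS_CELLS = 2
SIZE_CELLS = 2


def be_cells(raw: bytes) -> tuple:
    """Split a raw DT property into big-endian 32-bit cells."""
    if len(raw) % 4:
        raise ValueError("property length is not a multiple of 4 bytes")
    return struct.unpack(f">{len(raw) // 4}I", raw)


def join_cells(cells) -> int:
    """Combine consecutive cells (most significant first) into one integer."""
    value = 0
    for cell in cells:
        value = (value << 32) | cell
    return value


def decode_ranges(raw: bytes, address_cells: int = ADDRESS_CELLS,
                  size_cells: int = SIZE_CELLS) -> list:
    """Decode a reg or alloc-ranges property into (base, size) pairs."""
    cells = be_cells(raw)
    stride = address_cells + size_cells
    if len(cells) % stride:
        raise ValueError("property does not hold whole (address, size) entries")
    return [
        (join_cells(cells[i:i + address_cells]),
         join_cells(cells[i + address_cells:i + stride]))
        for i in range(0, len(cells), stride)
    ]


def dump_reserved_node(node_dir: str) -> None:
    """Print reg, alloc-ranges and size for one reserved-memory child node."""
    node = Path(node_dir)
    for prop in ("reg", "alloc-ranges"):
        path = node / prop
        if path.exists():
            for base, size in decode_ranges(path.read_bytes()):
                print(f"{prop}: base=0x{base:x} size=0x{size:x}")
    size_path = node / "size"
    if size_path.exists():
        # A dynamically allocated region's "size" holds size cells only.
        print(f"size: 0x{join_cells(be_cells(size_path.read_bytes())):x}")


if __name__ == "__main__":
    dump_reserved_node("/proc/device-tree/reserved-memory/rmtfs@86700000")
```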

If I transfer the Image.gz, initramfs and .dtb files and /lib/modules/4.18.0 from:
https://snapshots.linaro.org/96boards/dragonboard820c/linaro/linux-integration/107/

and boot the dragonboard by fastboot with:
# fastboot boot boot-rootfs-linux-integration-v4.18-273-gdc6c774913bc-107-dragonboard820c.img

then load the Image.gz with:
root@linaro-alip:~# kexec -d -l /boot/Image.gz --initrd=/boot/initramfs-bootrr-image-dragonboard-820c-20180829152612-44.rootfs.cpio.gz --command-line="root=/dev/disk/by-partlabel/rootfs rw rootwait console=ttyMSM0,115200n8 androidboot.bootdevice=624000.ufshc androidboot.serialno=683393bf androidboot.baseband=apq mdss_mdp.panel=0 earlyprintk earlycon=msm_serial_dm,0x75b0000 debug memblock=debug ignore_loglevel ftrace=function ftrace_dump_on_oops" --dtb=/boot/apq8096-db820c.dtb

then execute the kernel using:
root@linaro-alip:~# systemctl kexec

the second kernel fails to reserve memory for the node with the alloc-ranges property, rmtfs@86700000, and a panic ensues. The other nodes in /reserved-memory seem to be handled correctly, according to memblock=debug.

...
[    0.000000] OF: reserved mem: failed to allocate memory for node 'rmtfs@86700000'
[    0.000000] Reserved memory: created DMA memory pool at 0x0000000090b00000, size 10 MiB
[    0.000000] OF: reserved mem: initialized node gpu@8f200000, compatible id shared-dma-pool
[    0.000000] cma: Failed to reserve 16 MiB
[    0.000000] Kernel panic - not syncing: ERROR: Failed to allocate 0x0000000000001000 bytes below 0x0000000000000000.
...

However, if I omit --dtb from kexec -l, so that kexec presumably reuses the already loaded device tree, the second kernel does allocate memory for rmtfs@86700000.
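The two experiments differ only in whether --dtb is passed. A minimal sketch that builds the same kexec -l invocation with the device tree optional, using only the flags from the command above; kexec_load_args, load_and_reboot and the initramfs path are hypothetical, and actually loading a kernel requires root.

```python
import shlex
import subprocess
from typing import List, Optional


def kexec_load_args(kernel: str, initrd: str, cmdline: str,
                    dtb: Optional[str] = None) -> List[str]:
    """Build argv for `kexec -l`; omit --dtb when dtb is None.

    Without --dtb, kexec presumably falls back to the running kernel's
    device tree (the behaviour observed in the post).
    """
    args = ["kexec", "-d", "-l", kernel,
            f"--initrd={initrd}", f"--command-line={cmdline}"]
    if dtb is not None:
        args.append(f"--dtb={dtb}")
    return args


def load_and_reboot(args: List[str]) -> None:
    """Stage the new kernel, then hand over via systemd (needs root)."""
    print("running:", shlex.join(args))
    subprocess.run(args, check=True)
    subprocess.run(["systemctl", "kexec"], check=True)


if __name__ == "__main__":
    # Only prints the command; call load_and_reboot() to really kexec.
    print(shlex.join(kexec_load_args(
        "/boot/Image.gz",
        "/boot/initramfs.cpio.gz",  # hypothetical path; use your initramfs
        "root=/dev/disk/by-partlabel/rootfs rw rootwait console=ttyMSM0,115200n8",
    )))
```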

The second kernel still fails to boot, but at a later stage, while bringing up the secondary CPUs, as ribalda also found. I am guessing that this is because the interrupt controller has not been set up correctly.

The boot messages from the first kernel, at this point, are:
[ 0.047809] smp: Bringing up secondary CPUs ...
[ 0.080347] Detected PIPT I-cache on CPU1
[ 0.080376] GICv3: CPU1: found redistributor 1 region 0:0x0000000009c40000
[ 0.080411] CPU1: Booted secondary processor 0x0000000001 [0x511f2112]
[ 0.115085] Detected PIPT I-cache on CPU2
[ 0.115962] GICv3: CPU2: found redistributor 100 region 0:0x0000000009c80000
[ 0.116822] CPU2: Booted secondary processor 0x0000000100 [0x511f2052]
[ 0.154786] Detected PIPT I-cache on CPU3
[ 0.155607] GICv3: CPU3: found redistributor 101 region 0:0x0000000009cc0000
[ 0.156422] CPU3: Booted secondary processor 0x0000000101 [0x511f2052]
[ 0.159716] smp: Brought up 1 node, 4 CPUs
The line ‘smp: Bringing up secondary CPUs ...’ is the last line printed by the second kernel.
At the end of the first kernel's shutdown, I get:
[ 52.365624] kexec_core: Starting new kernel
[ 52.372260] Disabling non-boot CPUs ...
[ 52.448981] CPU1: shutdown
[ 52.455721] psci: CPU1 killed.
[ 52.520984] CPU2: shutdown
[ 52.527117] psci: CPU2 killed.
[ 52.568144] cpu cpu3: opp_migrate_dentry: Failed to rename link from: cpu2 to cpu3
[ 52.576075] CPU3: shutdown
[ 52.583083] psci: CPU3 killed.
[ 52.606336] Bye!
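With both kernels' serial console output captured to a file, a small scan can confirm which secondary CPUs the first kernel reported as killed through PSCI before "Bye!" and which ones the second kernel managed to bring up. A minimal sketch that matches only the message formats shown in the logs above; the helper names and the default CPU set (1 to 3, from "Brought up 1 node, 4 CPUs") are assumptions.

```python
import re
import sys

BOOTED = re.compile(r"CPU(\d+): Booted secondary processor")
KILLED = re.compile(r"psci: CPU(\d+) killed")


def cpus_reported(log: str, pattern: "re.Pattern") -> set:
    """CPU numbers for which `pattern` matched somewhere in the log."""
    return {int(m.group(1)) for m in pattern.finditer(log)}


def missing(log: str, pattern: "re.Pattern", expected=(1, 2, 3)) -> list:
    """Secondary CPUs that never produced the expected message."""
    return sorted(set(expected) - cpus_reported(log, pattern))


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as f:
        text = f.read()
    print("not killed before kexec:", missing(text, KILLED))
    print("not brought up:", missing(text, BOOTED))
```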

There has been some recent traffic on the kexec mailing list (http://lists.infradead.org/mailman/listinfo/kexec) related to this, but I cannot tell whether a fix is imminent. The posts appear to be more concerned with the effect on kdump, which looks difficult to resolve.

Can anyone shed light on whether, and when, kexec will work?


#2

Now I am confused:
$ …/mkbootimg_tools/mkboot boot-rootfs-linux-integration-v4.18-273-gdc6c774913bc-107-dragonboard820c.img kernel
Unpack & decompress boot-rootfs-linux-integration-v4.18-273-gdc6c774913bc-107-dragonboard820c.img to kernel
kernel : kernel
ramdisk : ramdisk
page size : 4096
kernel size : 8923011
ramdisk size : 18678085
base : 0x80000000
kernel offset : 0x00008000
ramdisk offset : 0x04000000
tags offset : 0x00000100
cmd line : root=/dev/ram0 init=/init rw console=tty0 console=ttyMSM0,115200n8
ramdisk is gzip format.
Unpack completed.

There is no separate .dtb section (no dtb size or dt.img) in these images.

However, if I build the kernel at commit dc6c774913bc myself, with the same ramdisk and a QCDT device tree blob, the image does have a separate dtb section:
$ …/mkbootimg_tools/mkboot boot.img-4.18.0-60569-gdc6c774913bc kernel
Unpack & decompress boot.img-4.18.0-60569-gdc6c774913bc to kernel
kernel : kernel
ramdisk : ramdisk
page size : 4096
kernel size : 9784530
ramdisk size : 18678085
dtb size : 208896
base : 0x80000000
kernel offset : 0x00008000
ramdisk offset : 0x02000000
tags offset : 0x00000100
dtb img : dt.img
cmd line : root=/dev/disk/by-partlabel/rootfs rw rootwait console=tty0 console=ttyMSM0,115200n8
ramdisk is gzip format.
Unpack completed.

With this image I get the same results as with snapshot 107, with the caveat that it frequently hangs during the first boot at ‘random: crng init done’ or ‘qcom_adsp_pil adsp-pil: start timed out’.
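The mkboot output above is a rendering of the Android boot image header. A minimal sketch of reading that header directly, assuming the classic v0 layout (little-endian fields after the ANDROID! magic, each section padded to a page boundary after a one-page header) and, as a further assumption, that the QCDT variant stores dt_size in the slot after page_size. The base of 0x80000000 is taken from mkboot's output, and parse_boot_header is a hypothetical helper, not part of mkbootimg_tools.

```python
import struct
import sys
from pathlib import Path

BOOT_MAGIC = b"ANDROID!"
# magic, ten u32 fields, name[16], cmdline[512]. Assumption: in the QCDT
# variant the u32 after page_size is dt_size (unused in plain AOSP v0).
HEADER = struct.Struct("<8s10I16s512s")


def page_round(n: int, page: int) -> int:
    """Round n up to a multiple of the page size."""
    return (n + page - 1) // page * page


def parse_boot_header(data: bytes, base: int = 0x80000000) -> dict:
    (magic, kernel_size, kernel_addr, ramdisk_size, ramdisk_addr,
     second_size, _second_addr, tags_addr, page_size, dt_size,
     _os_version, _name, cmdline) = HEADER.unpack_from(data)
    if magic != BOOT_MAGIC:
        raise ValueError("not an Android boot image")
    # Sections follow the header page, each padded to a page boundary.
    kernel_off = page_size
    ramdisk_off = kernel_off + page_round(kernel_size, page_size)
    second_off = ramdisk_off + page_round(ramdisk_size, page_size)
    dt_off = second_off + page_round(second_size, page_size)
    return {
        "kernel_offset": kernel_addr - base,
        "ramdisk_offset": ramdisk_addr - base,
        "tags_offset": tags_addr - base,
        "dt_size": dt_size,
        "dt_file_offset": dt_off if dt_size else None,
        "cmdline": cmdline.split(b"\0", 1)[0].decode(),
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    print(parse_boot_header(Path(sys.argv[1]).read_bytes()))
```

For the snapshot 107 image, dt_size would be 0, consistent with mkboot finding no dtb section; for the self-built image, the QCDT blob would start at the returned dt_file_offset.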


#3

Well, I have answered one question.
The file kernel/kernel generated by:
$ ../mkbootimg_tools/mkboot boot-rootfs-linux-integration-v4.18-273-gdc6c774913bc-107-dragonboard820c.img kernel
is Image.gz with apq8096-db820c.dtb appended (diff prints nothing, so the files are identical):
$ cat Image.gz apq8096-db820c.dtb > kernel1
$ diff kernel1 kernel/kernel
$
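Going the other way, the appended DTB can be split off without knowing the original file sizes: every flattened device tree starts with the big-endian magic 0xd00dfeed followed by a big-endian totalsize field. A minimal sketch, assuming exactly one DTB is appended; split_appended_dtb is a hypothetical helper and the output file names are only illustrative.

```python
import struct
import sys
from pathlib import Path

FDT_MAGIC = b"\xd0\x0d\xfe\xed"  # 0xd00dfeed, first word of every FDT header


def split_appended_dtb(blob: bytes):
    """Return (kernel, dtb) for an image with exactly one DTB appended.

    Scans backwards for the FDT magic and accepts the first hit whose
    header totalsize reaches exactly to the end of the blob, so stray
    magic bytes inside the compressed kernel are unlikely to be mistaken
    for the DTB.
    """
    pos = blob.rfind(FDT_MAGIC)
    while pos != -1:
        if pos + 8 <= len(blob):
            (totalsize,) = struct.unpack_from(">I", blob, pos + 4)
            if pos + totalsize == len(blob):
                return blob[:pos], blob[pos:]
        pos = blob.rfind(FDT_MAGIC, 0, pos)
    raise ValueError("no appended DTB found")


if __name__ == "__main__" and len(sys.argv) > 1:
    kernel, dtb = split_appended_dtb(Path(sys.argv[1]).read_bytes())
    Path("Image.gz").write_bytes(kernel)
    Path("apq8096-db820c.dtb").write_bytes(dtb)
```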

I wondered whether I could use kexec -s with this file (or, indeed, whether that was the intention), but CONFIG_KEXEC_FILE is not set in these builds, and it appears that kexec_file_load() is intended to enforce reuse of the first kernel’s .dtb (https://lkml.org/lkml/2018/8/1/163).
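Whether kexec -s is even an option depends on CONFIG_KEXEC_FILE in the running kernel. A minimal sketch that parses a kernel config; it assumes the config is exposed as /proc/config.gz, which only exists when CONFIG_IKCONFIG_PROC=y (a Debian-style /boot/config-&lt;release&gt; file can be fed to parse_kconfig instead). The helper names are hypothetical.

```python
import gzip
from pathlib import Path


def parse_kconfig(text: str) -> dict:
    """Map CONFIG_* names to values; '# ... is not set' lines map to 'n'."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
        elif line.startswith("# CONFIG_") and line.endswith(" is not set"):
            options[line[2:-len(" is not set")]] = "n"
    return options


def running_kernel_config() -> str:
    """Read the running kernel's config (needs CONFIG_IKCONFIG_PROC=y)."""
    return gzip.decompress(Path("/proc/config.gz").read_bytes()).decode()


if __name__ == "__main__" and Path("/proc/config.gz").exists():
    opts = parse_kconfig(running_kernel_config())
    for name in ("CONFIG_KEXEC", "CONFIG_KEXEC_FILE"):
        print(name, "=", opts.get(name, "n (absent)"))
```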


#4

You can use the dbootimg tool [1] to extract the DTB and kernel from a boot image file or partition:

git clone https://github.com/96boards/dt-update.git
cd dt-update
make && make install
dbootimg /dev/disk/by-partlabel/boot -x dtb > apq8096-db820c.dtb
dbootimg /dev/disk/by-partlabel/boot -x kernel > Image.gz

[1] https://github.com/96boards/dt-update

#5

Thanks for that, Loic. I learn a little more every day. 🙂