Kexec hang still not resolved

I have been investigating why a kexec-booted kernel always hangs in the early boot phase on the DragonBoard 820c. The hang has persisted since my first attempt to use kexec, across numerous kernels and many different command lines, so I thought it was time to try to nail it down. Unfortunately, I have failed.

I use the same Image.gz and initrd for the first and second kernels.
The OS installed on the board is Debian snapshot 355:

boot-linaro-buster-dragonboard-820c-355.img.gz
linaro-buster-alip-dragonboard-820c-355.img.gz

The initrd used comes from this Debian snapshot with /lib/firmware added and /lib/modules replaced by that built with the kernel. The kernel comes from:

$ git clone https://git.linaro.org/landing-teams/working/qualcomm/kernel.git
$ cd kernel
$ git checkout -t origin/integration-linux-qcomlt

This is the kernel build information:

Linux version 5.2.0-02469-g44b4395655ff-dirty (dixon@computer2) (gcc version 8.3.0 (GNU Toolchain for the A-profile Architecture 8.3-2019.03 (arm-rel-8.36))) #2 SMP PREEMPT Mon Jul 29 10:38:30 BST 2019

The boot image for the first kernel is built and booted with:

$ ../skales/dtbTool -o dt.img -s 4096 ./arch/arm64/boot/dts/qcom/
$ ../skales/mkbootimg --output boot.img --cmdline "root=/dev/disk/by-partlabel/rootfs rw rootwait console=tty0 console=ttyMSM0,115200n8" --base 0x80000000 --pagesize 4096 --kernel ./arch/arm64/boot/Image.gz --dt dt.img --ramdisk ../snapshot-355/contents/ramdisk.new.packed
# fastboot boot boot.img

The first kernel seems to boot fine. (Although, recently, a problem with the ahci driver has appeared, but that is for a separate investigation.)

The second kernel is loaded and run with:

root@linaro-alip:~# kexec -v
kexec-tools 2.0.18
root@linaro-alip:~# kexec -d -l /boot/Image.gz --initrd=/boot/ramdisk.new.packed --command-line='maxcpus=1 root=/dev/disk/by-partlabel/rootfs rw rootwait console=tty0 console=ttyMSM0,115200n8 earlycon earlyprintk initcall_debug debug reset_devices ignore_loglevel dynamic_debug.verbose=1 dyndbg="file regmap-mmio.c +p; file regmap.c +p; file qcom-apcs-ipc-mailbox.c +p; file mailbox.c +p; file qcom_glink_native.c +p; file qcom_glink_rpm.c +p; file dd.c +p; file bus.c +p; file driver.c +p; file platform.c +p"'
root@linaro-alip:~# systemctl kexec

and so uses the device tree from the first kernel.

This is the tail of the boot log from the second kernel. The first line is normally the last line seen before the hang when the debugging kernel parameters are not given:

[    9.056670] EDAC MC: Ver: 3.0.0
[    9.060870] bus: 'edac': registered
[    9.063754] bus: 'edac': add device mc
[    9.067694] EDAC DEBUG: edac_mc_sysfs_init: device mc created
[    9.075382] initcall edac_init+0x0/0x90 returned 0 after 15625 usecs
[    9.076790] calling  mmc_init+0x0/0x48 @ 1
[    9.083584] bus: 'mmc': registered
[    9.087952] bus: 'sdio': registered
[    9.090481] initcall mmc_init+0x0/0x48 returned 0 after 3906 usecs
[    9.094162] calling  leds_init+0x0/0x68 @ 1
[    9.100447] initcall leds_init+0x0/0x68 returned 0 after 0 usecs
[    9.104532] calling  qcom_scm_init+0x0/0x20 @ 1
[    9.110589] __platform_driver_register
[    9.114833] driver_register
[    9.118503] bus: 'platform': add driver qcom_scm
[    9.121359] bus_add_driver
[    9.126156] driver_attach
[    9.128570] bus_for_each_dev
[    9.131377] __driver_attach
[    9.134216] device_driver_attach
[    9.136830] driver_probe_device
[    9.140207] bus: 'platform': driver_probe_device: matched device firmware:scm with driver qcom_scm
[    9.143175] really_probe_debug
[    9.152135] bus: 'platform': really_probe: probing driver qcom_scm with device firmware:scm
[    9.155275] really_probe
[    9.163692] driver: 'qcom_scm': driver_bound: bound to device 'firmware:scm'
[    9.166441] bus: 'platform': really_probe: bound device firmware:scm to driver qcom_scm
[    9.173516] probe of firmware:scm returned 1 after 12000 usecs
[    9.181580] initcall qcom_scm_init+0x0/0x20 returned 0 after 39062 usecs
[    9.187207] calling  scmi_bus_init+0x0/0x44 @ 1
[    9.194063] bus: 'scmi_protocol': registered
[    9.198273] initcall scmi_bus_init+0x0/0x44 returned 0 after 3906 usecs
[    9.202613] calling  scmi_clock_init+0x0/0x20 @ 1
[    9.208988] initcall scmi_clock_init+0x0/0x20 returned 20 after 0 usecs
[    9.213809] calling  scmi_perf_init+0x0/0x20 @ 1
[    9.220197] initcall scmi_perf_init+0x0/0x20 returned 19 after 0 usecs
[    9.225060] calling  scmi_power_init+0x0/0x20 @ 1
[    9.231380] initcall scmi_power_init+0x0/0x20 returned 17 after 0 usecs
[    9.236110] calling  scmi_sensors_init+0x0/0x20 @ 1
[    9.242520] initcall scmi_sensors_init+0x0/0x20 returned 21 after 0 usecs
[    9.247397] calling  glink_rpm_init+0x0/0x20 @ 1
[    9.254321] __platform_driver_register
[    9.259003] driver_register
[    9.262513] bus: 'platform': add driver qcom_glink_rpm
[    9.265280] bus_add_driver
[    9.270444] driver_attach
[    9.273101] bus_for_each_dev
[    9.275873] __driver_attach
[    9.278747] device_driver_attach
[    9.281301] driver_probe_device
[    9.284739] bus: 'platform': driver_probe_device: matched device rpm-glink with driver qcom_glink_rpm
[    9.287707] really_probe_debug
[    9.297014] bus: 'platform': really_probe: probing driver qcom_glink_rpm with device rpm-glink
[    9.300073] really_probe
[    9.308676] glink_rpm_probe ffff000012ca0804 ffff000012ca0c00
[    9.311457] qcom_glink_native_probe
[    9.316905] qcom_glink_send_version
[    9.320177] qcom_glink_tx
[    9.323628] mbox_send_message
[    9.326406] msg_submit
[    9.329349] qcom_apcs_ipc_send_data
[    9.331630] regmap_write
[    9.334994] _regmap_write
[    9.337769] _regmap_bus_reg_write
[    9.340294] regmap_mmio_write
[    9.343589] regmap_mmio_write32le ffff00001000d000 10 1

At this point the watchdog times out, the primary boot loader starts and the normal boot sequence ensues.

Much of the output in the log comes from pr_debug calls that I inserted to trace the calls. I can pastebin the full boot logs from the first and second kernels if anyone is willing to read them.

This is the source for the last C function before the hang, from ./drivers/base/regmap/regmap-mmio.c:

static void regmap_mmio_write32le(struct regmap_mmio_context *ctx,
				  unsigned int reg,
				  unsigned int val)
{
	pr_debug("%s %llx %x %x\n", __func__, (u64)ctx->regs, reg, val);
	writel(val, ctx->regs + reg);
}

Disassembling ./drivers/base/regmap/regmap-mmio.o with Radare2 + Cutter gives the following for this function:

/ (fcn) sym.regmap_mmio_write32le 64
|   sym.regmap_mmio_write32le (int32_t arg3, int32_t arg2, int32_t arg1, int32_t arg_20h);
|           ; arg int32_t arg_20h @ sp+0x20
|           ; arg int32_t arg3 @ x2
|           ; arg int32_t arg2 @ x1
|           ; arg int32_t arg1 @ x0
|           0x080002d8      stp x29, x30, [sp, -0x30]!
|           0x080002dc      mov x29, sp
|           0x080002e0      stp x19, x20, [sp, 0x10]
|           0x080002e4      mov w19, w1 ; arg2
|           0x080002e8      mov w20, w2 ; arg3
|           0x080002ec      str x21, [sp + arg_20h]
|           0x080002f0      mov x21, x0 ; arg1
|           0x080002f4      nop
|           0x080002f8      dsb st
|           0x080002fc      ldr x0, [x21] ; [0x8000000:4]=0x464c457f ; loc.imp.__devm_regmap_init
|           0x08000300      add x19, x0, w19, uxtw
|           0x08000304      str w20, [x19]
|           0x08000308      ldp x19, x20, [sp, 0x10]
|           0x0800030c      ldr x21, [sp + arg_20h] ; [0x20:4]=-1 ; 32
|           0x08000310      ldp x29, x30, [sp], 0x30
\           0x08000314      ret

This is the call stack, determined in the most labour-intensive manner imaginable because I do not know any better. :-)

hangs in	./arch/arm64/include/asm/io.h:writel
called by	./drivers/base/regmap/regmap-mmio.c:regmap_mmio_write32le
called through	ctx->reg_write(ctx, reg, val)
called by	./drivers/base/regmap/regmap-mmio.c:regmap_mmio_write
called through	map->bus->reg_write(map->bus_context, reg, val)
called by	./drivers/base/regmap/regmap.c:_regmap_bus_reg_write
called through	map->reg_write(context, reg, val)
called by	./drivers/base/regmap/regmap.c:_regmap_write
called by	./drivers/base/regmap/regmap.c:regmap_write
called by	./drivers/mailbox/qcom-apcs-ipc-mailbox.c:qcom_apcs_ipc_send_data
called through	chan->mbox->ops->send_data(chan, chan->msg_data[idx])
called by	./drivers/mailbox/mailbox.c:msg_submit
called by	./drivers/mailbox/mailbox.c:mbox_send_message
called by	./drivers/rpmsg/qcom_glink_native.c:qcom_glink_tx
called by	./drivers/rpmsg/qcom_glink_native.c:qcom_glink_send_version
called by	./drivers/rpmsg/qcom_glink_native.c:qcom_glink_native_probe
called by	./drivers/rpmsg/qcom_glink_rpm.c:glink_rpm_probe
called through	ret = drv->probe(dev)
called by	./drivers/base/dd.c:really_probe
called by	./drivers/base/dd.c:really_probe_debug
called by	./drivers/base/dd.c:driver_probe_device
called by	./drivers/base/dd.c:device_driver_attach
called by	./drivers/base/dd.c:__driver_attach
called through	fn(dev, data)
called by	./drivers/base/bus.c:bus_for_each_dev
called by	./drivers/base/dd.c:driver_attach
called by	./drivers/base/bus.c:bus_add_driver
called by	./drivers/base/driver.c:driver_register
called by	./drivers/base/platform.c:__platform_driver_register
called as	./include/linux/platform_device.h:platform_driver_register
called by	./drivers/rpmsg/qcom_glink_rpm.c:glink_rpm_init
called through initcall mechanism

There are two calls of writel immediately before the call to qcom_glink_native_probe in ./drivers/rpmsg/qcom_glink_rpm.c:glink_rpm_probe. The pr_debug, inserted by me, is included to show where the values in the log come from.

	writel(0, tx_pipe->head);
	writel(0, rx_pipe->tail);

	pr_debug("%s %llx %llx\n", __func__, (u64)tx_pipe->head, (u64)rx_pipe->tail);

	glink = qcom_glink_native_probe(&pdev->dev,
					0,
					&rx_pipe->native,
					&tx_pipe->native,
					true);

These calls compile to:

|  || ||:   0x08000448      dsb st
|  || ||:   0x0800044c      ldr x0, [x20, 0x30] ; [0x30:4]=-1 ; '0' ; 48
|  || ||:   0x08000450      str wzr, [x0]
|  || ||:   0x08000454      dsb st
|  || ||:   0x08000458      ldr x4, [x19, 0x28] ; [0x28:4]=-1 ; '(' ; 40
|  || ||:   0x0800045c      str wzr, [x4]

Prior to that, the only calls to regmap_mmio_write32le occur as follows:

[    6.983167] driver_probe_device
[    6.988049] bus: 'platform': driver_probe_device: matched device 9820000.mailbox with driver qcom_apcs_ipc
[    6.990926] really_probe_debug
[    7.000688] bus: 'platform': really_probe: probing driver qcom_apcs_ipc with device 9820000.mailbox
[    7.003749] really_probe
[    7.012957] driver: 'qcom_apcs_ipc': driver_bound: bound to device '9820000.mailbox'
[    7.015611] bus: 'platform': really_probe: bound device 9820000.mailbox to driver qcom_apcs_ipc
[    7.023595] probe of 9820000.mailbox returned 1 after 12000 usecs
[    7.031655] OF:    create child: /soc/clock-controller@300000
[    7.038113] bus: 'platform': add device 300000.clock-controller
[    7.043887] driver_probe_device
[    7.049483] bus: 'platform': driver_probe_device: matched device 300000.clock-controller with driver gcc-msm8996
[    7.052388] really_probe_debug
[    7.062832] bus: 'platform': really_probe: probing driver gcc-msm8996 with device 300000.clock-controller
[    7.065730] really_probe
[    7.075721] _regmap_write
[    7.077826] _regmap_bus_reg_write
[    7.080345] regmap_mmio_write
[    7.083642] regmap_mmio_write32le ffff000012300000 7d024 80282001
[    7.086785] _regmap_write
[    7.092664] _regmap_bus_reg_write
[    7.095274] regmap_mmio_write
[    7.098570] regmap_mmio_write32le ffff000012300000 7d024 80000000
[    7.101683] _regmap_write
[    7.107595] _regmap_bus_reg_write
[    7.110206] regmap_mmio_write
[    7.113501] regmap_mmio_write32le ffff000012300000 7d034 80282000
[    7.116609] _regmap_write
[    7.122525] _regmap_bus_reg_write
[    7.125135] regmap_mmio_write
[    7.128432] regmap_mmio_write32le ffff000012300000 7d038 80282001
[    7.131519] _regmap_write
[    7.137456] _regmap_bus_reg_write
[    7.140067] regmap_mmio_write
[    7.143361] regmap_mmio_write32le ffff000012300000 7d038 80000000
[    7.152943] driver: 'gcc-msm8996': driver_bound: bound to device '300000.clock-controller'
[    7.153333] bus: 'platform': really_probe: bound device 300000.clock-controller to driver gcc-msm8996
[    7.160904] probe of 300000.clock-controller returned 1 after 36000 usecs

At this point, I am stuck. I have insufficient understanding of the Arm architecture.
I am guessing that the data synchronisation barrier is hanging because kexec has omitted some necessary action, such as modifying a system register, when preparing for the second kernel.

Any suggestions on what I could check next would be very welcome, as would pointers to where I should be posting to find someone who could help.


Pinging @bamse, just in case. It looks like, after kexec, the kernel hangs when accessing the RPM memory device (qcom_glink_send_version). I am not sure why, but we are probably not in the same state as after a standard boot (via LK). Maybe dumping and comparing the RPM and/or clock registers (standard boot via LK vs kexec) would help? You could also have a look at LK [1] and check whether you see any related commit or piece of code (e.g. reinitialising the glink channel…).

[1] working/qualcomm/lk.git
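
The register comparison suggested above could be automated with a small helper along these lines. This is only a sketch: it assumes the two dumps have already been captured into arrays of 32-bit words (how to capture them is not covered here), and the function name diff_reg_dumps and its parameter names are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Compare two snapshots of the same register window, one taken after a
 * standard LK boot and one after kexec, and print every byte offset whose
 * value differs. Returns the number of differing registers.
 * (Hypothetical helper, not part of the kernel or LK.)
 */
static size_t diff_reg_dumps(const uint32_t *lk_boot,
			     const uint32_t *kexec_boot,
			     size_t nregs)
{
	size_t ndiff = 0;

	for (size_t i = 0; i < nregs; i++) {
		if (lk_boot[i] != kexec_boot[i]) {
			printf("offset 0x%04zx: LK boot 0x%08x, kexec 0x%08x\n",
			       i * sizeof(uint32_t),
			       (unsigned int)lk_boot[i],
			       (unsigned int)kexec_boot[i]);
			ndiff++;
		}
	}
	return ndiff;
}
```

Calling it on dumps of the same block taken in both boot paths gives a short list of offsets to look up in the driver source, which is easier to read than two raw hex dumps side by side.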

Thanks for that, Loic. I had already started to look at LK, as it obviously sets the SoC up correctly. The problem I was having is that the hang occurs while setting up glink-rpm, which is a Qualcomm-specific thing, and I have been unable to find any information on glink, beyond the source code in the kernel and LK, that would give me some idea of what that code is trying to do.
I was going to try building LK with the debug messages turned on to see if they could give me some more clues.

BTW don’t worry too much about the dsb instruction… it enforces ordering (in this case waiting for preceding stores in the instruction stream). I think you can still read the code at the C level: writing zero (wzr - 32-bit word sized zero register) to the address makes the system fail.
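
To make the barrier-then-store pattern concrete, here is a rough user-space model of what the disassembly shows. It is a sketch under assumptions, not the kernel's code: mock_regs stands in for the ioremap()ed mailbox window, __sync_synchronize() is a compiler builtin used as a stand-in for "dsb st", and mock_writel/mock_write32le are illustrative names.

```c
#include <stddef.h>
#include <stdint.h>

/* Mock register window standing in for the mapped mailbox region. */
static uint32_t mock_regs[8];

/*
 * Simplified model of writel(): a barrier, then one plain 32-bit store.
 * On the real hardware the store is the access that never completes.
 */
static void mock_writel(uint32_t val, volatile void *addr)
{
	__sync_synchronize();             /* stand-in for "dsb st" */
	*(volatile uint32_t *)addr = val; /* models "str w20, [x19]" */
}

/* Mirrors regmap_mmio_write32le(): store val at base + byte offset reg. */
static void mock_write32le(void *base, unsigned int reg, unsigned int val)
{
	mock_writel(val, (uint8_t *)base + reg);
}
```

With the values from the log (offset 0x10, value 1), mock_write32le(mock_regs, 0x10, 1) sets the fifth word of the window. In the model the store trivially succeeds; on the board the same store targets a device register, so whether it completes depends on the state of the hardware behind it.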

Dumb question, but what happens if you modify one of the kernels’ DTs so that the drivers that crash when they are reinitialized without a reset don’t get reinitialized? For kernel-as-bootloader use cases it is probably best to disable them in the first kernel… for kdump use cases it is probably best to disable them in the second kernel.

Maybe losing a comms system will cause too many dependent components to fail, but it might be worth a try to get unblocked…

Thanks Daniel. I will try that in a while.

I have been trying to compile LK origin/release/LA.HB.1.3.2-19600-8x96.0 with debug messages with gcc-arm-8.3-2019.03-x86_64-arm-eabi.

I presume this toolchain treats warnings as errors, because it throws a lot of errors. I have been investigating them: some are serious, others are possibly serious because the intent is unclear, one or two are cosmetic, and there is a configuration issue with MDTP requiring VERIFIED_BOOT.

I am currently trying to resolve incompatible function casts and will report when I get a clean compile.

Eventually, I did get a clean build but, when I flash it to the board, I get the following from the secondary boot loader:

Error code 125 at boot_config_process_entry Line 244

I also notice that there is no EXIDX (Exception Index Table) segment in the .mbn. Would this be enough to cause this error? If so, how do I generate one? -funwind-tables does not do it.

Rather late in the day, I found:
https://git.linaro.org/ci/job/configs.git/tree/lt-qcom-bootloader-dragonboard820c.yaml

So I guess you have to use a six-year-old toolchain with linker scripts ‘Modified for Android’ and who knows what other ‘proprietary enhancements’:
https://source.codeaurora.org/quic/la/platform/prebuilts/gcc/linux-x86/arm/arm-eabi-4.8/
Note, there is a tag LA.HB.1.3.2-19600-8x96.0 in this repository. It seems all tags point to the same commit:
26e93f6 [linux-x86] Refresh arm-eabi-gcc 4.8 to fix kernel compilation regression.

This is what I used to compile:

$ git clone https://git.linaro.org/landing-teams/working/qualcomm/lk.git
$ cd lk
$ git fetch --tags
$ git checkout -t origin/release/LA.HB.1.3.2-19600-8x96.0
$ export PATH=/path-to-toolchain/gcc-arm-8.3-2019.03-x86_64-arm-eabi/bin:$PATH
$ export DEBUG=2
$ export VERIFIED_BOOT=1
$ make msm8996 clean
$ make -k -j4 msm8996
$ git clone https://git.linaro.org/landing-teams/working/qualcomm/signlk.git
$ mv build-msm8996/emmc_appsboot.mbn build-msm8996/emmc_appsboot_unsigned.mbn
$ ./signlk/signlk.sh -i=./build-msm8996/emmc_appsboot_unsigned.mbn -o=./build-msm8996/emmc_appsboot.mbn

First, the makefile fails to set BUILD_VERSION using ‘git describe’.

fatal: No tags can describe '15a6532d9fe4ac14e376036eee792cf85c9ada57'.
Try --always, or create some tags.

So I created a tag.

$ git tag -m 'for kexec testing' debian-qcom-dragonboard820c-LA.HB.1.3.2-19600-8x96.0

These are the files I had to modify to get a clean build:

$ git status
On branch release/LA.HB.1.3.2-19600-8x96.0
Your branch is up to date with 'origin/release/LA.HB.1.3.2-19600-8x96.0'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

	modified:   app/aboot/mdtp.h
	modified:   app/aboot/recovery.c
	modified:   app/tests/kauth_test.c
	modified:   dev/pmic/pmi8994/pm_app_smbchg.c
	modified:   dev/pmic/pmi8994/pm_smbchg_bat_if.c
	modified:   dev/qpnp_wled/qpnp_wled.c
	modified:   include/string.h
	modified:   lib/openssl/crypto/asn1/t_x509.c
	modified:   lib/openssl/crypto/bf/bf_locl.h
	modified:   lib/openssl/crypto/bio/b_print.c
	modified:   lib/openssl/crypto/bn/bn_lib.c
	modified:   lib/openssl/crypto/bn/bn_mul.c
	modified:   lib/openssl/crypto/evp/evp_enc.c
	modified:   lib/openssl/crypto/pkcs7/pk7_asn1.c
	modified:   lib/openssl/crypto/rc2/rc2_locl.h
	modified:   lib/openssl/crypto/rc4/rc4_enc.c
	modified:   lib/zlib_inflate/inflate.c
	modified:   platform/msm_shared/glink/glink_core_intentless_xport.c
	modified:   platform/msm_shared/i2c_qup.c
	modified:   platform/msm_shared/include/glink_os_type.h
	modified:   platform/msm_shared/include/partition_parser.h
	modified:   platform/msm_shared/partition_parser.c
	modified:   platform/msm_shared/shutdown_detect.c
	modified:   platform/msm_shared/spmi.c

Untracked files:
  (use "git add <file>..." to include in what will be committed)

	signlk/

no changes added to commit (use "git add" and/or "git commit -a")

This is what I had to do to those files.

This change is probably nitpicking.

diff --git a/app/aboot/mdtp.h b/app/aboot/mdtp.h
index 4958a4ea..704a13ff 100644
--- a/app/aboot/mdtp.h
+++ b/app/aboot/mdtp.h
@@ -48,7 +48,7 @@
 
 #ifdef MDTP_SUPPORT
 #ifndef VERIFIED_BOOT
-#error MDTP feature requires VERIFIED_BOOT feature
+#error "MDTP feature requires VERIFIED_BOOT feature"
 #endif
 #endif
 

This one is fairly cosmetic.
-Werror=unused-const-variable=

diff --git a/app/aboot/recovery.c b/app/aboot/recovery.c
index 96441e23..3ed91332 100644
--- a/app/aboot/recovery.c
+++ b/app/aboot/recovery.c
@@ -52,7 +52,7 @@
 #define UPDATE_STATUS	2
 #define ROUND_TO_PAGE(x,y) (((x) + (y)) & (~(y)))
 
-static const int MISC_PAGES = 3;			// number of pages to save
+/*static const int MISC_PAGES = 3; NotUsed*/			// number of pages to save
 static const int MISC_COMMAND_PAGE = 1;		// bootloader command is this page
 static char buf[4096];
 

Clearly, this is not a pure function.
-Werror=attributes

diff --git a/include/string.h b/include/string.h
index 1d987c35..0d704f00 100644
--- a/include/string.h
+++ b/include/string.h
@@ -57,7 +57,7 @@ char       *strtok_r(char *s, const char *delim, char **last);
 int         strcoll(const char *s1, const char *s2) __PURE;
 size_t      strxfrm(char *dest, const char *src, size_t n) __PURE;
 char       *strdup(const char *str) __MALLOC;
-void        strrev(unsigned char *str) __PURE;
+void        strrev(unsigned char *str);
 
 #ifdef __cplusplus
 } /* extern "C" */

I think I have got this one the right way round. The original is always true and rather worrying.
-Werror=tautological-compare

diff --git a/dev/pmic/pmi8994/pm_app_smbchg.c b/dev/pmic/pmi8994/pm_app_smbchg.c
index 7ee25d97..603b811e 100644
--- a/dev/pmic/pmi8994/pm_app_smbchg.c
+++ b/dev/pmic/pmi8994/pm_app_smbchg.c
@@ -807,7 +807,7 @@ static void pm_app_pmi8994_read_voltage(uint32_t *voltage)
 	pm_comm_read_byte(sid, 0x4440, &val, 0);
 
 	//Request for FG access
-	if ((val & BIT(7)) != 1)
+	if (!(val & BIT(7)))
 		pm_comm_write_byte(sid, 0x4440, 0x80, 0);
 
 	pm_comm_read_byte(sid, 0x4410, &val, 0);
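
A small stand-alone illustration of why the original test is always true: masking with BIT(7) can only yield 0 or 0x80, so the result can never equal 1. The BIT macro below is defined locally and the function names are made up for the example; which behaviour the original author intended is still an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

#define BIT(n) (1U << (n))

/* Original condition: the masked value is 0 or 0x80, never 1. */
static bool needs_access_original(uint8_t val)
{
	unsigned int masked = val & BIT(7);

	return masked != 1; /* true for every possible val */
}

/* Modified condition: true only when bit 7 is clear. */
static bool needs_access_fixed(uint8_t val)
{
	return !(val & BIT(7));
}
```

So with the original code the write to 0x4440 happens on every call, while the modified code skips it when bit 7 is already set.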

I think these two are correct.
-Werror=misleading-indentation

diff --git a/dev/qpnp_wled/qpnp_wled.c b/dev/qpnp_wled/qpnp_wled.c
index 9cedbb7e..f3652249 100644
--- a/dev/qpnp_wled/qpnp_wled.c
+++ b/dev/qpnp_wled/qpnp_wled.c
@@ -38,7 +38,7 @@ static int fls(uint16_t n)
 {
 	int i = 0;
 	for (; n; n >>= 1, i++);
-	  return i;
+	return i;
 }
 
 static struct qpnp_wled *gwled;

-Werror=misleading-indentation

diff --git a/app/tests/kauth_test.c b/app/tests/kauth_test.c
index e92922e7..0242dab0 100644
--- a/app/tests/kauth_test.c
+++ b/app/tests/kauth_test.c
@@ -121,7 +121,7 @@ void kauth_test(const char *arg, void *data, unsigned sz)
 	{
 		if (vboot_ret[i] != vboot_expected[i])
 			test_pass = false;
-			ret = i;
+		ret = i;
 	}
 
 	if (test_pass)
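
To show what the compiler actually does with the unbraced form, here is a sketch that models the loop both ways. The struct and function names are invented for the example, and the braced variant is only a guess at what the indentation might have meant; the change above keeps the behaviour the compiler already produced.

```c
#include <stdbool.h>
#include <stddef.h>

struct check_result {
	bool pass;
	int ret;
};

/* As compiled: only "pass = false" is guarded by the if. */
static struct check_result check_unbraced(const int *got, const int *want,
					  size_t n)
{
	struct check_result r = { true, -1 };

	for (size_t i = 0; i < n; i++) {
		if (got[i] != want[i])
			r.pass = false;
		r.ret = (int)i; /* runs on every iteration */
	}
	return r;
}

/* What the indentation suggested: record the index of a mismatch only. */
static struct check_result check_braced(const int *got, const int *want,
					size_t n)
{
	struct check_result r = { true, -1 };

	for (size_t i = 0; i < n; i++) {
		if (got[i] != want[i]) {
			r.pass = false;
			r.ret = (int)i;
		}
	}
	return r;
}
```

With a single mismatch at index 1 of three entries, the unbraced version ends with ret equal to 2 (the last index), while the braced version ends with ret equal to 1.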

Now, this is serious, in crypto code. What is the intention?
-Werror=misleading-indentation

diff --git a/lib/openssl/crypto/asn1/t_x509.c b/lib/openssl/crypto/asn1/t_x509.c
index e061f2ff..37676fdc 100644
--- a/lib/openssl/crypto/asn1/t_x509.c
+++ b/lib/openssl/crypto/asn1/t_x509.c
@@ -296,7 +296,7 @@ int X509_signature_print(BIO *bp, X509_ALGOR *sigalg, ASN1_STRING *sig)
 		{
 		if ((i%18) == 0)
 			if (BIO_write(bp,"\n        ",9) <= 0) return 0;
-			if (BIO_printf(bp,"%02x%s",s[i],
+		if (BIO_printf(bp,"%02x%s",s[i],
 				((i+1) == n)?"":":") <= 0) return 0;
 		}
 	if (BIO_write(bp,"\n",1) != 1) return 0;

This break should be included or, if not, there should be a comment to explain why not.
-Werror=implicit-fallthrough=

diff --git a/dev/pmic/pmi8994/pm_smbchg_bat_if.c b/dev/pmic/pmi8994/pm_smbchg_bat_if.c
index 5e8253a2..ede72993 100644
--- a/dev/pmic/pmi8994/pm_smbchg_bat_if.c
+++ b/dev/pmic/pmi8994/pm_smbchg_bat_if.c
@@ -390,6 +390,7 @@ pm_err_flag_type pm_smbchg_bat_if_get_min_sys_volt(uint32 device_index, uint32 *
             break;
         case 0x01:
             *min_sys_millivolt = 354;
+            break;
         default:
             *min_sys_millivolt = 360;
         }
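
The effect of the missing break can be seen in a cut-down model of the switch. Only the two cases visible in the hunk are reproduced; the function names are made up, and the earlier cases of the real function are left out because they are not shown here.

```c
#include <stdint.h>

/* Without the break, case 0x01 stores 354 and then runs the default too. */
static uint32_t min_sys_mv_original(uint8_t sel)
{
	uint32_t mv = 0;

	switch (sel) {
	case 0x01:
		mv = 354;
		/* falls through - this is the missing break */
	default:
		mv = 360;
	}
	return mv;
}

/* With the break added, case 0x01 keeps its own value. */
static uint32_t min_sys_mv_fixed(uint8_t sel)
{
	uint32_t mv = 0;

	switch (sel) {
	case 0x01:
		mv = 354;
		break;
	default:
		mv = 360;
	}
	return mv;
}
```

In the original form the value 354 is overwritten immediately, so selector 0x01 reports 360, the same as the default.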

I think the next six are fairly clear. But still, it is crypto code!
-Werror=implicit-fallthrough=

diff --git a/lib/openssl/crypto/bf/bf_locl.h b/lib/openssl/crypto/bf/bf_locl.h
index cc7c3ec9..2ce56fd2 100644
--- a/lib/openssl/crypto/bf/bf_locl.h
+++ b/lib/openssl/crypto/bf/bf_locl.h
@@ -111,12 +111,19 @@
 			l1=l2=0; \
 			switch (n) { \
 			case 8: l2 =((unsigned long)(*(--(c))))    ; \
+				__attribute__ ((fallthrough))      ; \
 			case 7: l2|=((unsigned long)(*(--(c))))<< 8; \
+				__attribute__ ((fallthrough))      ; \
 			case 6: l2|=((unsigned long)(*(--(c))))<<16; \
+				__attribute__ ((fallthrough))      ; \
 			case 5: l2|=((unsigned long)(*(--(c))))<<24; \
+				__attribute__ ((fallthrough))      ; \
 			case 4: l1 =((unsigned long)(*(--(c))))    ; \
+				__attribute__ ((fallthrough))      ; \
 			case 3: l1|=((unsigned long)(*(--(c))))<< 8; \
+				__attribute__ ((fallthrough))      ; \
 			case 2: l1|=((unsigned long)(*(--(c))))<<16; \
+				__attribute__ ((fallthrough))      ; \
 			case 1: l1|=((unsigned long)(*(--(c))))<<24; \
 				} \
 			}
@@ -126,12 +133,19 @@
 			c+=n; \
 			switch (n) { \
 			case 8: *(--(c))=(unsigned char)(((l2)    )&0xff); \
+				__attribute__ ((fallthrough))            ; \
 			case 7: *(--(c))=(unsigned char)(((l2)>> 8)&0xff); \
+				__attribute__ ((fallthrough))            ; \
 			case 6: *(--(c))=(unsigned char)(((l2)>>16)&0xff); \
+				__attribute__ ((fallthrough))            ; \
 			case 5: *(--(c))=(unsigned char)(((l2)>>24)&0xff); \
+				__attribute__ ((fallthrough))            ; \
 			case 4: *(--(c))=(unsigned char)(((l1)    )&0xff); \
+				__attribute__ ((fallthrough))            ; \
 			case 3: *(--(c))=(unsigned char)(((l1)>> 8)&0xff); \
+				__attribute__ ((fallthrough))            ; \
 			case 2: *(--(c))=(unsigned char)(((l1)>>16)&0xff); \
+				__attribute__ ((fallthrough))            ; \
 			case 1: *(--(c))=(unsigned char)(((l1)>>24)&0xff); \
 				} \
 			}
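
For contrast with the accidental case, this is a sketch of why the fallthrough in these macros is deliberate: each case consumes one more byte, so entering the switch at n runs exactly n steps. The function below is a simplified, hypothetical 4-byte version of the pattern, not the OpenSSL macro itself, and it assumes a compiler that accepts the fallthrough statement attribute (GCC 7 or later, or a recent Clang).

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Pack the first n (0..4) bytes of buf into the high-order end of a
 * 32-bit word, reading backwards from buf + n as the macros do.
 */
static uint32_t load_partial_be(const unsigned char *buf, size_t n)
{
	const unsigned char *c = buf + n;
	uint32_t l = 0;

	switch (n) {
	case 4: l  = (uint32_t)*--c;
		__attribute__((fallthrough));
	case 3: l |= (uint32_t)*--c << 8;
		__attribute__((fallthrough));
	case 2: l |= (uint32_t)*--c << 16;
		__attribute__((fallthrough));
	case 1: l |= (uint32_t)*--c << 24;
	}
	return l;
}
```

Adding a break after any case would drop the remaining bytes, which is why the attribute, rather than a break, is the appropriate way to silence the warning here.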

-Werror=implicit-fallthrough=

diff --git a/lib/openssl/crypto/rc2/rc2_locl.h b/lib/openssl/crypto/rc2/rc2_locl.h
index 565cd176..472737c4 100644
--- a/lib/openssl/crypto/rc2/rc2_locl.h
+++ b/lib/openssl/crypto/rc2/rc2_locl.h
@@ -69,12 +69,19 @@
 			l1=l2=0; \
 			switch (n) { \
 			case 8: l2 =((unsigned long)(*(--(c))))<<24L; \
+				__attribute__ ((fallthrough))       ; \
 			case 7: l2|=((unsigned long)(*(--(c))))<<16L; \
+				__attribute__ ((fallthrough))       ; \
 			case 6: l2|=((unsigned long)(*(--(c))))<< 8L; \
+				__attribute__ ((fallthrough))       ; \
 			case 5: l2|=((unsigned long)(*(--(c))));     \
+				__attribute__ ((fallthrough))       ; \
 			case 4: l1 =((unsigned long)(*(--(c))))<<24L; \
+				__attribute__ ((fallthrough))       ; \
 			case 3: l1|=((unsigned long)(*(--(c))))<<16L; \
+				__attribute__ ((fallthrough))       ; \
 			case 2: l1|=((unsigned long)(*(--(c))))<< 8L; \
+				__attribute__ ((fallthrough))       ; \
 			case 1: l1|=((unsigned long)(*(--(c))));     \
 				} \
 			}
@@ -91,12 +98,19 @@
 			c+=n; \
 			switch (n) { \
 			case 8: *(--(c))=(unsigned char)(((l2)>>24L)&0xff); \
+				__attribute__ ((fallthrough))             ; \
 			case 7: *(--(c))=(unsigned char)(((l2)>>16L)&0xff); \
+				__attribute__ ((fallthrough))             ; \
 			case 6: *(--(c))=(unsigned char)(((l2)>> 8L)&0xff); \
+				__attribute__ ((fallthrough))             ; \
 			case 5: *(--(c))=(unsigned char)(((l2)     )&0xff); \
+				__attribute__ ((fallthrough))             ; \
 			case 4: *(--(c))=(unsigned char)(((l1)>>24L)&0xff); \
+				__attribute__ ((fallthrough))             ; \
 			case 3: *(--(c))=(unsigned char)(((l1)>>16L)&0xff); \
+				__attribute__ ((fallthrough))             ; \
 			case 2: *(--(c))=(unsigned char)(((l1)>> 8L)&0xff); \
+				__attribute__ ((fallthrough))             ; \
 			case 1: *(--(c))=(unsigned char)(((l1)     )&0xff); \
 				} \
 			}
@@ -107,12 +121,19 @@
 			l1=l2=0; \
 			switch (n) { \
 			case 8: l2 =((unsigned long)(*(--(c))))    ; \
+				__attribute__ ((fallthrough))      ; \
 			case 7: l2|=((unsigned long)(*(--(c))))<< 8; \
+				__attribute__ ((fallthrough))      ; \
 			case 6: l2|=((unsigned long)(*(--(c))))<<16; \
+				__attribute__ ((fallthrough))      ; \
 			case 5: l2|=((unsigned long)(*(--(c))))<<24; \
+				__attribute__ ((fallthrough))      ; \
 			case 4: l1 =((unsigned long)(*(--(c))))    ; \
+				__attribute__ ((fallthrough))      ; \
 			case 3: l1|=((unsigned long)(*(--(c))))<< 8; \
+				__attribute__ ((fallthrough))      ; \
 			case 2: l1|=((unsigned long)(*(--(c))))<<16; \
+				__attribute__ ((fallthrough))      ; \
 			case 1: l1|=((unsigned long)(*(--(c))))<<24; \
 				} \
 			}
@@ -122,12 +143,19 @@
 			c+=n; \
 			switch (n) { \
 			case 8: *(--(c))=(unsigned char)(((l2)    )&0xff); \
+				__attribute__ ((fallthrough))            ; \
 			case 7: *(--(c))=(unsigned char)(((l2)>> 8)&0xff); \
+				__attribute__ ((fallthrough))            ; \
 			case 6: *(--(c))=(unsigned char)(((l2)>>16)&0xff); \
+				__attribute__ ((fallthrough))            ; \
 			case 5: *(--(c))=(unsigned char)(((l2)>>24)&0xff); \
+				__attribute__ ((fallthrough))            ; \
 			case 4: *(--(c))=(unsigned char)(((l1)    )&0xff); \
+				__attribute__ ((fallthrough))            ; \
 			case 3: *(--(c))=(unsigned char)(((l1)>> 8)&0xff); \
+				__attribute__ ((fallthrough))            ; \
 			case 2: *(--(c))=(unsigned char)(((l1)>>16)&0xff); \
+				__attribute__ ((fallthrough))            ; \
 			case 1: *(--(c))=(unsigned char)(((l1)>>24)&0xff); \
 				} \
 			}

-Werror=implicit-fallthrough=

diff --git a/lib/openssl/crypto/bio/b_print.c b/lib/openssl/crypto/bio/b_print.c
index 143a7cfe..d89f051a 100644
--- a/lib/openssl/crypto/bio/b_print.c
+++ b/lib/openssl/crypto/bio/b_print.c
@@ -346,6 +346,7 @@ _dopr(
                 break;
             case 'E':
                 flags |= DP_F_UP;
+		__attribute__ ((fallthrough));
             case 'e':
                 if (cflags == DP_C_LDOUBLE)
                     fvalue = va_arg(args, LDOUBLE);
@@ -354,6 +355,7 @@ _dopr(
                 break;
             case 'G':
                 flags |= DP_F_UP;
+		__attribute__ ((fallthrough));
             case 'g':
                 if (cflags == DP_C_LDOUBLE)
                     fvalue = va_arg(args, LDOUBLE);

-Werror=implicit-fallthrough=

diff --git a/lib/openssl/crypto/bn/bn_lib.c b/lib/openssl/crypto/bn/bn_lib.c
index 5470fbe6..76de2474 100644
--- a/lib/openssl/crypto/bn/bn_lib.c
+++ b/lib/openssl/crypto/bn/bn_lib.c
@@ -362,8 +362,11 @@ static BN_ULONG *bn_expand_internal(const BIGNUM *b, int words)
 		switch (b->top&3)
 			{
 		case 3:	A[2]=B[2];
+			__attribute__ ((fallthrough));
 		case 2:	A[1]=B[1];
+			__attribute__ ((fallthrough));
 		case 1:	A[0]=B[0];
+			__attribute__ ((fallthrough));
 		case 0: /* workaround for ultrix cc: without 'case 0', the optimizer does
 		         * the switch table by doing a=top&3; a--; goto jump_table[a];
 		         * which fails for top== 0 */
@@ -517,8 +520,11 @@ BIGNUM *BN_copy(BIGNUM *a, const BIGNUM *b)
 	switch (b->top&3)
 		{
 		case 3: A[2]=B[2];
+			__attribute__ ((fallthrough));
 		case 2: A[1]=B[1];
+			__attribute__ ((fallthrough));
 		case 1: A[0]=B[0];
+			__attribute__ ((fallthrough));
 		case 0: ; /* ultrix cc workaround, see comments in bn_expand_internal */
 		}
 #else

-Werror=implicit-fallthrough=

diff --git a/lib/openssl/crypto/bn/bn_mul.c b/lib/openssl/crypto/bn/bn_mul.c
index ce7ca1d8..e61c23dc 100644
--- a/lib/openssl/crypto/bn/bn_mul.c
+++ b/lib/openssl/crypto/bn/bn_mul.c
@@ -168,9 +168,11 @@ BN_ULONG bn_sub_part_words(BN_ULONG *r,
 				case 1:
 					r[1] = a[1];
 					if (--dl <= 0) break;
+					__attribute__ ((fallthrough));
 				case 2:
 					r[2] = a[2];
 					if (--dl <= 0) break;
+					__attribute__ ((fallthrough));
 				case 3:
 					r[3] = a[3];
 					if (--dl <= 0) break;
@@ -264,9 +266,11 @@ BN_ULONG bn_add_part_words(BN_ULONG *r,
 				case 1:
 					r[1] = b[1];
 					if (++dl >= 0) break;
+					__attribute__ ((fallthrough));
 				case 2:
 					r[2] = b[2];
 					if (++dl >= 0) break;
+					__attribute__ ((fallthrough));
 				case 3:
 					r[3] = b[3];
 					if (++dl >= 0) break;
@@ -340,9 +344,11 @@ BN_ULONG bn_add_part_words(BN_ULONG *r,
 				case 1:
 					r[1] = a[1];
 					if (--dl <= 0) break;
+					__attribute__ ((fallthrough));
 				case 2:
 					r[2] = a[2];
 					if (--dl <= 0) break;
+					__attribute__ ((fallthrough));
 				case 3:
 					r[3] = a[3];
 					if (--dl <= 0) break;

-Werror=implicit-fallthrough=

diff --git a/lib/openssl/crypto/rc4/rc4_enc.c b/lib/openssl/crypto/rc4/rc4_enc.c
index 8c4fc6c7..14a25c60 100644
--- a/lib/openssl/crypto/rc4/rc4_enc.c
+++ b/lib/openssl/crypto/rc4/rc4_enc.c
@@ -187,12 +187,19 @@ void RC4(RC4_KEY *key, size_t len, const unsigned char *indata,
 				switch (len&(sizeof(RC4_CHUNK)-1))
 					{
 					case 7:	otp  = RC4_STEP<<i, i-=8;
+						__attribute__ ((fallthrough));
 					case 6:	otp |= RC4_STEP<<i, i-=8;
+						__attribute__ ((fallthrough));
 					case 5:	otp |= RC4_STEP<<i, i-=8;
+						__attribute__ ((fallthrough));
 					case 4:	otp |= RC4_STEP<<i, i-=8;
+						__attribute__ ((fallthrough));
 					case 3:	otp |= RC4_STEP<<i, i-=8;
+						__attribute__ ((fallthrough));
 					case 2:	otp |= RC4_STEP<<i, i-=8;
+						__attribute__ ((fallthrough));
 					case 1:	otp |= RC4_STEP<<i, i-=8;
+						__attribute__ ((fallthrough));
 					case 0: ; /*
 						   * it's never the case,
 						   * but it has to be here
@@ -240,12 +247,19 @@ void RC4(RC4_KEY *key, size_t len, const unsigned char *indata,
 				switch (len&(sizeof(RC4_CHUNK)-1))
 					{
 					case 7:	otp  = RC4_STEP,    i+=8;
+						__attribute__ ((fallthrough));
 					case 6:	otp |= RC4_STEP<<i, i+=8;
+						__attribute__ ((fallthrough));
 					case 5:	otp |= RC4_STEP<<i, i+=8;
+						__attribute__ ((fallthrough));
 					case 4:	otp |= RC4_STEP<<i, i+=8;
+						__attribute__ ((fallthrough));
 					case 3:	otp |= RC4_STEP<<i, i+=8;
+						__attribute__ ((fallthrough));
 					case 2:	otp |= RC4_STEP<<i, i+=8;
+						__attribute__ ((fallthrough));
 					case 1:	otp |= RC4_STEP<<i, i+=8;
+						__attribute__ ((fallthrough));
 					case 0: ; /*
 						   * it's never the case,
 						   * but it has to be here

But these three? What was the intention?
-Werror=implicit-fallthrough=

diff --git a/lib/openssl/crypto/evp/evp_enc.c b/lib/openssl/crypto/evp/evp_enc.c
index bead6a21..b6186e68 100644
--- a/lib/openssl/crypto/evp/evp_enc.c
+++ b/lib/openssl/crypto/evp/evp_enc.c
@@ -204,6 +204,7 @@ skip_to_init:
 			case EVP_CIPH_OFB_MODE:
 
 			ctx->num = 0;
+			__attribute__ ((fallthrough));
 
 			case EVP_CIPH_CBC_MODE:
 

-Werror=implicit-fallthrough=

diff --git a/lib/openssl/crypto/pkcs7/pk7_asn1.c b/lib/openssl/crypto/pkcs7/pk7_asn1.c
index b7ec2883..7a5705dc 100644
--- a/lib/openssl/crypto/pkcs7/pk7_asn1.c
+++ b/lib/openssl/crypto/pkcs7/pk7_asn1.c
@@ -90,6 +90,7 @@ static int pk7_cb(int operation, ASN1_VALUE **pval, const ASN1_ITEM *it,
 		case ASN1_OP_STREAM_PRE:
 		if (PKCS7_stream(&sarg->boundary, *pp7) <= 0)
 			return 0;
+		__attribute__ ((fallthrough));
 		case ASN1_OP_DETACHED_PRE:
 		sarg->ndef_bio = PKCS7_dataInit(*pp7, sarg->out);
 		if (!sarg->ndef_bio)

-Werror=implicit-fallthrough=
-Werror=shift-negative-value

diff --git a/lib/zlib_inflate/inflate.c b/lib/zlib_inflate/inflate.c
index e3413009..0292e237 100644
--- a/lib/zlib_inflate/inflate.c
+++ b/lib/zlib_inflate/inflate.c
@@ -818,6 +818,7 @@ int flush;
             strm->adler = state->check = ZSWAP32(hold);
             INITBITS();
             state->mode = DICT;
+	    __attribute__ ((fallthrough));
         case DICT:
             if (state->havedict == 0) {
                 RESTORE();
@@ -825,8 +826,10 @@ int flush;
             }
             strm->adler = state->check = adler32(0L, Z_NULL, 0);
             state->mode = TYPE;
+	    __attribute__ ((fallthrough));
         case TYPE:
             if (flush == Z_BLOCK || flush == Z_TREES) goto inf_leave;
+	    __attribute__ ((fallthrough));
         case TYPEDO:
             if (state->last) {
                 BYTEBITS();
@@ -877,8 +880,10 @@ int flush;
             INITBITS();
             state->mode = COPY_;
             if (flush == Z_TREES) goto inf_leave;
+	    __attribute__ ((fallthrough));
         case COPY_:
             state->mode = COPY;
+	    __attribute__ ((fallthrough));
         case COPY:
             copy = state->length;
             if (copy) {
@@ -1018,8 +1023,10 @@ int flush;
             Tracev((stderr, "inflate:       codes ok\n"));
             state->mode = LEN_;
             if (flush == Z_TREES) goto inf_leave;
+	    __attribute__ ((fallthrough));
         case LEN_:
             state->mode = LEN;
+	    __attribute__ ((fallthrough));
         case LEN:
             if (have >= 6 && left >= 258) {
                 RESTORE();
@@ -1069,6 +1076,7 @@ int flush;
             }
             state->extra = (unsigned)(here.op) & 15;
             state->mode = LENEXT;
+	    __attribute__ ((fallthrough));
         case LENEXT:
             if (state->extra) {
                 NEEDBITS(state->extra);
@@ -1079,6 +1087,7 @@ int flush;
             Tracevv((stderr, "inflate:         length %u\n", state->length));
             state->was = state->length;
             state->mode = DIST;
+	    __attribute__ ((fallthrough));
         case DIST:
             for (;;) {
                 here = state->distcode[BITS(state->distbits)];
@@ -1106,6 +1115,7 @@ int flush;
             state->offset = (unsigned)here.val;
             state->extra = (unsigned)(here.op) & 15;
             state->mode = DISTEXT;
+	    __attribute__ ((fallthrough));
         case DISTEXT:
             if (state->extra) {
                 NEEDBITS(state->extra);
@@ -1122,6 +1132,7 @@ int flush;
 #endif
             Tracevv((stderr, "inflate:         distance %u\n", state->offset));
             state->mode = MATCH;
+	    __attribute__ ((fallthrough));
         case MATCH:
             if (left == 0) goto inf_leave;
             copy = out - left;
@@ -1210,6 +1221,7 @@ int flush;
             }
 #endif
             state->mode = DONE;
+	    __attribute__ ((fallthrough));
         case DONE:
             ret = Z_STREAM_END;
             goto inf_leave;
@@ -1506,7 +1518,8 @@ z_streamp strm;
 {
     struct inflate_state FAR *state;
 
-    if (strm == Z_NULL || strm->state == Z_NULL) return -1L << 16;
+    /*if (strm == Z_NULL || strm->state == Z_NULL) return -1L << 16;*/
+    if (strm == Z_NULL || strm->state == Z_NULL) return 0xFFFF0000;
     state = (struct inflate_state FAR *)strm->state;
     return ((long)(state->back) << 16) +
         (state->mode == COPY ? state->length :
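
A note on the last hunk: left-shifting a negative value is undefined behaviour in C, which is what -Werror=shift-negative-value objects to. The literal 0xFFFF0000 only gives the old result (-65536) where long is 32 bits wide, and even there the conversion is implementation-defined. A minimal sketch of one alternative, with hypothetical function names that are not part of zlib:

```c
#include <assert.h>

/* Hypothetical helper: shift the positive constant first, then negate.
 * No negative value is shifted, and the result is -65536 whatever the
 * width of long. */
static long inflate_mark_error(void)
{
    return -(1L << 16);
}

/* The literal used in the patch, converted to long for comparison.
 * Where long is 64 bits this is 4294901760, not -65536. */
static long literal_as_long(void)
{
    return (long)0xFFFF0000;
}

/* Small self-check showing the values discussed above. */
static void self_check(void)
{
    assert(inflate_mark_error() == -65536L);
    if (sizeof(long) == 8)
        assert(literal_as_long() == 4294901760L);
}
```

For a 32-bit LK build the two forms should agree with GCC, so this only matters if the code is ever compiled for a 64-bit target.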

Now, the following actually creates the error that the warning is intended to guard against, at platform/msm_shared/partition_parser.c:1087.
-Werror=multistatement-macros

diff --git a/platform/msm_shared/include/partition_parser.h b/platform/msm_shared/include/partition_parser.h
index af69e033..86af786e 100644
--- a/platform/msm_shared/include/partition_parser.h
+++ b/platform/msm_shared/include/partition_parser.h
@@ -132,19 +132,19 @@
             ((uint32_t)*(x+2) << 16) | \
             ((uint32_t)*(x+3) << 24))
 
-#define PUT_LONG(x, y)   *(x) = y & 0xff;     \
+#define PUT_LONG(x, y) do { *(x) = y & 0xff;     \
     *(x+1) = (y >> 8) & 0xff;     \
     *(x+2) = (y >> 16) & 0xff;    \
-    *(x+3) = (y >> 24) & 0xff;
+    *(x+3) = (y >> 24) & 0xff; } while (0)
 
-#define PUT_LONG_LONG(x,y)    *(x) =(y) & 0xff; \
+#define PUT_LONG_LONG(x,y) do { *(x) =(y) & 0xff; \
      *((x)+1) = (((y) >> 8) & 0xff);    \
      *((x)+2) = (((y) >> 16) & 0xff);   \
      *((x)+3) = (((y) >> 24) & 0xff);   \
      *((x)+4) = (((y) >> 32) & 0xff);   \
      *((x)+5) = (((y) >> 40) & 0xff);   \
      *((x)+6) = (((y) >> 48) & 0xff);   \
-     *((x)+7) = (((y) >> 56) & 0xff);
+     *((x)+7) = (((y) >> 56) & 0xff); } while (0)
 
 /* Unified mbr and gpt entry types */
 struct partition_entry {
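
To illustrate why the do { } while (0) wrapper matters: when a multi-statement macro follows an unbraced if, only its first statement is conditional and the remaining ones always run. The sketch below uses a hypothetical macro PUT_LONG_SAFE and a hypothetical caller put_long_if; the extra parentheses around the arguments are an addition of this sketch, not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Same four byte stores as PUT_LONG, wrapped so that the macro expands
 * to exactly one statement. */
#define PUT_LONG_SAFE(x, y) do {                        \
        *((x) + 0) = (uint8_t)((y) & 0xff);             \
        *((x) + 1) = (uint8_t)(((y) >> 8) & 0xff);      \
        *((x) + 2) = (uint8_t)(((y) >> 16) & 0xff);     \
        *((x) + 3) = (uint8_t)(((y) >> 24) & 0xff);     \
    } while (0)

/* Hypothetical caller: an unbraced if guarding the macro. With the
 * wrapper, all four stores are skipped when cond is 0. With the
 * unwrapped macro only the first store would be conditional. */
static void put_long_if(int cond, uint8_t *buf, uint32_t value)
{
    assert(buf != 0);
    if (cond)
        PUT_LONG_SAFE(buf, value);
}
```

Calling put_long_if(0, buf, v) leaves buf untouched, which is the behaviour an unbraced caller would expect.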

I presume the following is what was intended.
-Werror=switch-unreachable

diff --git a/platform/msm_shared/partition_parser.c b/platform/msm_shared/partition_parser.c
index 65d58d4a..ddd376cc 100644
--- a/platform/msm_shared/partition_parser.c
+++ b/platform/msm_shared/partition_parser.c
@@ -802,8 +802,8 @@ unsigned int write_partition(unsigned size, unsigned char *partition)
 static void
 mbr_fill_name(struct partition_entry *partition_ent, unsigned int type)
 {
+	memset(partition_ent->name, 0, MAX_GPT_NAME_SIZE);
 	switch (type) {
-		memset(partition_ent->name, 0, MAX_GPT_NAME_SIZE);
 	case MBR_MODEM_TYPE:
 	case MBR_MODEM_TYPE2:
 		/* if already assigned last name available then return */
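
The reason for moving the memset: a statement placed between `switch (type) {` and the first case label is never executed, because control jumps straight to a label. A cut-down sketch, where fill_name, NAME_SIZE and the type value 1 are hypothetical stand-ins rather than the LK definitions:

```c
#include <assert.h>
#include <string.h>

#define NAME_SIZE 8  /* hypothetical, stands in for MAX_GPT_NAME_SIZE */

/* Hypothetical miniature of mbr_fill_name(): the name is cleared before
 * the switch, so the clear runs for every type, known or not. */
static void fill_name(char *name, unsigned int type)
{
    assert(name != 0);
    memset(name, 0, NAME_SIZE);
    switch (type) {
    case 1:
        memcpy(name, "modem", sizeof("modem"));
        break;
    default:
        break;  /* unknown type: name stays empty */
    }
}
```

Whether clearing the name unconditionally is what the original authors wanted is still a presumption; the sketch only shows what the moved line now does.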

Finally, the incompatible function types.
-Werror=cast-function-type

diff --git a/platform/msm_shared/glink/glink_core_intentless_xport.c b/platform/msm_shared/glink/glink_core_intentless_xport.c
index 2feba42a..7fd4ad4c 100644
--- a/platform/msm_shared/glink/glink_core_intentless_xport.c
+++ b/platform/msm_shared/glink/glink_core_intentless_xport.c
@@ -84,7 +84,7 @@ LOCAL FUNCTION DEFINITIONS
 ===========================================================================*/
 
 /*===========================================================================
-  FUNCTION      glink_core_stub_intentless
+  FUNCTION      glink_channel_init_stub_intentless
 ===========================================================================*/
 /**
 
@@ -96,11 +96,29 @@ LOCAL FUNCTION DEFINITIONS
 */
 /*=========================================================================*/
 
-static glink_err_type glink_core_stub_intentless(void)
+static glink_err_type glink_channel_init_stub_intentless(glink_channel_ctx_type *arg)
 {
   return GLINK_STATUS_SUCCESS;
 }
 
+/*===========================================================================
+  FUNCTION      glink_channel_cleanup_stub_intentless
+===========================================================================*/
+/**
+
+  Stub for intentless transport functionality.
+
+  @return     void
+
+  @sideeffects  None.
+*/
+/*=========================================================================*/
+
+static void glink_channel_cleanup_stub_intentless(glink_channel_ctx_type *arg)
+{
+  return ;
+}
+
 /*===========================================================================
   FUNCTION      glink_verify_open_cfg_intentless
 ===========================================================================*/
@@ -220,8 +238,8 @@ void glink_core_setup_intentless_xport(glink_transport_if_type *if_ptr)
 {
   if_ptr->glink_core_if_ptr = glink_core_get_intentless_interface();
   if_ptr->glink_core_priv->verify_open_cfg = glink_verify_open_cfg_intentless;
-  if_ptr->glink_core_priv->channel_init = (channel_init_fn)glink_core_stub_intentless;
-  if_ptr->glink_core_priv->channel_cleanup = (channel_cleanup_fn)glink_core_stub_intentless;
+  if_ptr->glink_core_priv->channel_init = glink_channel_init_stub_intentless;
+  if_ptr->glink_core_priv->channel_cleanup = glink_channel_cleanup_stub_intentless;
   if_ptr->glink_core_priv->channel_submit_pkt = glink_channel_submit_pkt_intentless;
   if_ptr->glink_core_priv->channel_receive_pkt = glink_channel_receive_pkt_intentless;
 }
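
The pattern in this hunk, reduced to a sketch: -Wcast-function-type fires when a function pointer is cast to an incompatible function type, and calling through such a pointer is undefined behaviour. Giving each slot a stub with the exact signature removes the need for any cast. All type and function names below are hypothetical stand-ins for the glink ones.

```c
#include <assert.h>

/* Hypothetical stand-ins for the glink types. */
typedef int err_type;
typedef struct channel_ctx { int id; } channel_ctx;
typedef err_type (*channel_init_fn)(channel_ctx *ctx);
typedef void (*channel_cleanup_fn)(channel_ctx *ctx);

struct core_priv {
    channel_init_fn channel_init;
    channel_cleanup_fn channel_cleanup;
};

/* One stub per callback type, each matching its typedef exactly. */
static err_type init_stub(channel_ctx *ctx)
{
    (void)ctx;
    return 0;
}

static void cleanup_stub(channel_ctx *ctx)
{
    (void)ctx;
}

/* No casts needed: the compiler now checks both assignments. */
static void setup(struct core_priv *priv)
{
    assert(priv != 0);
    priv->channel_init = init_stub;
    priv->channel_cleanup = cleanup_stub;
}
```

The cost is one extra stub function; the gain is that a future signature change in either typedef becomes a compile error instead of a silent mismatch.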

-Werror=cast-function-type

diff --git a/platform/msm_shared/i2c_qup.c b/platform/msm_shared/i2c_qup.c
index dc8481de..560b7dd0 100644
--- a/platform/msm_shared/i2c_qup.c
+++ b/platform/msm_shared/i2c_qup.c
@@ -146,7 +146,7 @@ static inline void qup_print_status(struct qup_i2c_dev *dev)
 }
 #endif
 
-static irqreturn_t qup_i2c_interrupt(void)
+static irqreturn_t qup_i2c_interrupt(void *arg)
 {
 	struct qup_i2c_dev *dev = dev_addr;
 	if (!dev) {

-Werror=cast-function-type

diff --git a/platform/msm_shared/include/glink_os_type.h b/platform/msm_shared/include/glink_os_type.h
index c904ee4f..042a15d0 100644
--- a/platform/msm_shared/include/glink_os_type.h
+++ b/platform/msm_shared/include/glink_os_type.h
@@ -43,6 +43,7 @@ IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 #include <sys/types.h>
 #include <string.h>
 #include <assert.h>
+#include "glink.h"
 
 /*===========================================================================
                         MACRO DEFINITIONS
@@ -87,7 +88,8 @@ typedef struct
   DALSYSEventObj    dal_obj_memory;
 }os_event_type;
 
-typedef void ( *os_isr_cb_fn )( void *cb_data );
+/*typedef void ( *os_isr_cb_fn )( void *cb_data );*/
+typedef glink_err_type (*os_isr_cb_fn)(void *cb_data);
 
 typedef struct os_ipc_intr_struct
 {

-Werror=cast-function-type

diff --git a/platform/msm_shared/shutdown_detect.c b/platform/msm_shared/shutdown_detect.c
index 9bde63d6..bac61dd3 100644
--- a/platform/msm_shared/shutdown_detect.c
+++ b/platform/msm_shared/shutdown_detect.c
@@ -90,7 +90,7 @@ static uint32_t is_pwrkey_pon_reason()
  * (PWRKEY_LONG_PRESS_COUNT/MPM_SLEEP_TIMETICK_COUNT) seconds.
  */
 static enum handler_return long_press_pwrkey_timer_func(struct timer *p_timer,
-	void *arg)
+	time_t arg1, void *arg2)
 {
 	uint32_t sclk_count = platform_get_sclk_count();
 

-Werror=cast-function-type

diff --git a/platform/msm_shared/spmi.c b/platform/msm_shared/spmi.c
index 72ec7cb1..1ddf034b 100644
--- a/platform/msm_shared/spmi.c
+++ b/platform/msm_shared/spmi.c
@@ -363,7 +363,7 @@ int spmi_acc_irq(uint32_t periph_acc_irq, uint32_t status)
 		return 0;
 }
 
-void spmi_irq()
+enum handler_return spmi_irq(void *arg)
 {
 	int i;
 	uint32_t status;
@@ -377,9 +377,10 @@ void spmi_irq()
 		if (status)
 			if (!spmi_acc_irq(i, status))
 				/* Not the correct interrupt, continue to wait */
-				return;
+				return 0;
 	}
 	mask_interrupt(EE0_KRAIT_HLOS_SPMI_PERIPH_IRQ);
+	return 0;
 }
 
 /* Enable interrupts on a particular peripheral: periph_id */

Hi,

I am very interested in this. Could you share your code as a git repo somewhere? It would be much easier to review and test.

Thanks!

Hi ribalda. It was a post of yours that set me on this path. :slight_smile:

I am aiming for LinuxBoot as it seems the most flexible and open project.

This stuff is all new to me. The last time I did any kernel work was about 15 years ago so everything I am doing now is like starting from scratch. There was no git in those days.

The situation at present, as I see it, is that LK is acting as the interface between the Android world and the Linux world.
As LK is loaded by the secondary boot loader, which is proprietary and firmly in the Android world, it is going to have to be built using the six-year-old Qualcomm toolchain, unless ARM or Linaro release something suitable.

I am currently replacing the __attribute__ ((fallthrough)) inserts with comments, as gcc-4.8 does not define this attribute. Then I will compile LK with the old toolchain and see if I can get some more information on Glink from it.
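
An alternative to editing every insert by hand would be a macro that expands to the attribute only on compilers that know it. This is a sketch, not what I actually did; the macro name LK_FALLTHROUGH and the function count_from are hypothetical, and it assumes the attribute is available from GCC 7 onwards.

```c
#include <assert.h>

/* Hypothetical macro. Newer GCC gets the statement attribute; older
 * compilers such as gcc-4.8 get an empty statement instead. */
#if defined(__GNUC__) && (__GNUC__ >= 7)
#define LK_FALLTHROUGH __attribute__ ((fallthrough))
#else
#define LK_FALLTHROUGH do { } while (0)
#endif

/* Counts how many cases run when entering the switch at `start`,
 * in the style of the cascading cases in the bn_mul.c hunks. */
static int count_from(int start)
{
    int n = 0;

    switch (start) {
    case 1:
        n++;
        LK_FALLTHROUGH;
    case 2:
        n++;
        LK_FALLTHROUGH;
    case 3:
        n++;
        break;
    default:
        break;
    }
    return n;
}

/* Small self-check of the fall-through behaviour. */
static void self_check(void)
{
    assert(count_from(1) == 3);
    assert(count_from(3) == 1);
}
```

The same source would then build with both the old and the new toolchain, at the price of one more macro in a shared header.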
Curiously, the makefile default value for DEBUG is 0 but the official build contains INFO dprintf strings (DEBUG=1) but not SPEW strings (DEBUG=2).
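
Why the level would decide which strings end up in the binary can be sketched as follows. This assumes dprintf is a macro that compares its level with a compile-time constant, which I have not verified line by line in the LK headers; every name below (sketch_dprintf, SKETCH_DEBUGLEVEL, LVL_*) is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

#define LVL_CRITICAL 0
#define LVL_INFO     1
#define LVL_SPEW     2

/* Compile-time level, standing in for the value derived from DEBUG. */
#ifndef SKETCH_DEBUGLEVEL
#define SKETCH_DEBUGLEVEL 1
#endif

static int lines_printed;

/* The comparison is between two constants, so an optimising compiler
 * can drop a filtered call together with its format string. */
#define sketch_dprintf(level, ...) do {             \
        if ((level) <= SKETCH_DEBUGLEVEL) {         \
            printf(__VA_ARGS__);                    \
            lines_printed++;                        \
        }                                           \
    } while (0)

/* Emits one message per level and returns how many got through. */
static int demo(void)
{
    lines_printed = 0;
    sketch_dprintf(LVL_CRITICAL, "critical message\n");
    sketch_dprintf(LVL_INFO, "info message\n");
    sketch_dprintf(LVL_SPEW, "spew message\n");
    assert(lines_printed <= 3);
    return lines_printed;
}
```

With the level at 1, demo() lets the CRITICAL and INFO messages through and filters the SPEW one, which would match a binary that contains INFO strings but no SPEW strings.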

Then it is a matter of trawling through the LK source code to find what it is doing that the kernel or purgatory kexec code is not.

If I come up with something useful I will try to publish it somewhere. But that is another thing I have not done before, so it will take some time.

The LK I built using:

$ export PATH=/path-to-toolchain/arm-eabi-4.8/bin:$PATH
$ export DEBUG=2
$ export VERIFIED_BOOT=1
$ make msm8996 clean
$ make -k -j4 msm8996
$ mv build-msm8996/emmc_appsboot.mbn build-msm8996/emmc_appsboot_unsigned.mbn
$ ./signlk/signlk.sh -i=./build-msm8996/emmc_appsboot_unsigned.mbn -o=./build-msm8996/emmc_appsboot.mbn

boots, and goes on to boot the Linux kernel:

D -     34770 - APPSBL Image Loaded, Delta - (748772 Bytes)
B -    807609 - SBL1, End
D -    724863 - SBL1, Delta
S - Flash Throughput, 95000 KB/s  (2959080 Bytes,  31068 us)
S - DDR Frequency, 1017 MHz
Android Bootloader - UART_DM Initialized!!!
[0] BUILD_VERSION=debian-qcom-dragonboard820c-LA_HB_1_3_2-19600-8x96_0-dirty
[0] BUILD_DATE=11:43:02 - Aug  5 2019
[0] welcome to lk

But it does not report anything more than the official build. Moreover, emmc_appsboot_unsigned.mbn contains INFO dprintf strings but not the SPEW strings I was hoping to see. So setting DEBUG=2 does not seem to have had any effect, or is being overridden, or I have misunderstood its purpose.
Perhaps this is the offending code in ./project/msm8996.mk:

ifeq ($(TARGET_BUILD_VARIANT),user)
DEBUG := 0
else
DEBUG := 1
endif

So we comment it out and:

D -     35563 - APPSBL Image Loaded, Delta - (769612 Bytes)
B -    804041 - SBL1, End
D -    721295 - SBL1, Delta
S - Flash Throughput, 96000 KB/s  (2979920 Bytes,  31038 us)
S - DDR Frequency, 1017 MHz
Android Bootloader - UART_DM Initialized!!!
[0] BUILD_VERSION=debian-qcom-dragonboard820c-LA_HB_1_3_2-19600-8x96_0-dirty
[0] BUILD_DATE=17:06:27 - Aug  5 2019
[0] welcome to lk

[0] calling constructors
[0] initializing heap
[0] initializing threads
[0] initializing dpc
[0] initializing timers
[0] creating bootstrap completion thread
[10] top of bootstrap2()
[10] initializing platform
[10] platform_init()
[10] initializing target
[10] target_init()
[10] RPM GLink Init
[20] xport_rpm_init:1025: RPM Transport INIT
[20] xport_rpm_init:1061: Initialize Edges
[20] Register interrupt: 200
[30] xport_rpm_send_event:184: Notify RPM with IPC interrupt
[30] xport_rpm_send_event:184: Notify RPM with IPC interrupt
[30] xport_rpm_send_event:184: Notify RPM with IPC interrupt
[30] Opening RPM Glink Port success
[30] xport_rpm_send_event:184: Notify RPM with IPC interrupt
[30] Opening SSR Glink Port success
[40] xport_rpm_send_event:184: Notify RPM with IPC interrupt
[40] xport_rpm_send_event:184: Notify RPM with IPC interrupt
[40] Glink Connection between APPS and RPM established
[40] Glink Connection between APPS and RPM established
[60] GPIO 2 status is 1
[60] UFS init success
[100] qseecom_init called
[100] Qseecom Init Done in Appsbl
[100] secure app region addr=0x86600000 size=0x2200000[110] qseecom_scm_call called
[110] qseecom_scm_call2 called
[110] allocate_extra_arg_buffer called
[120] allocate_extra_arg_buffer:fn_id:838861061, desc->arginfo:34 desc->args[0]:2254438400 desc->args[1]:35651584 desc->args[2]:0 desc->args[3]:0 desc->args[4]:0
[130] allocate_extra_arg_buffer:ret:0, desc->ret[0]]:0 desc->ret[1]:0 desc->ret[2]:0
[140] svc_id = 0xfc, tz_cmd_id = 0x1, qseos_cmd_id = 0xd, smc_id = 0x32000105, param_id = 0x22
[150] scm_resp->result = 0x0, scm_resp->resp_type = 0x0, scm_resp->data = 0x0
[160] TZ App region notif returned with status:0 addr:86600000 size:35651584
[160] __qseecom_uvirt_to_kphys called
[170] qseecom_scm_call called
[170] qseecom_scm_call2 called
[170] allocate_extra_arg_buffer called
[180] allocate_extra_arg_buffer:fn_id:838861062, desc->arginfo:34 desc->args[0]:2439876608 desc->args[1]:4096 desc->args[2]:0 desc->args[3]:0 desc->args[4]:0
[190] allocate_extra_arg_buffer:ret:0, desc->ret[0]]:0 desc->ret[1]:0 desc->ret[2]:0
[200] svc_id = 0xfc, tz_cmd_id = 0x1, qseos_cmd_id = 0xe, smc_id = 0x32000106, param_id = 0x22
[200] scm_resp->result = 0x0, scm_resp->resp_type = 0x0, scm_resp->data = 0x0
[210] _disp_log_stats called
[210] TZ App log region register returned with status:0 addr:916d9000 size:4096
[220] Qseecom TZ Init Done in Appsbl
[230] qseecom_register_listener called
[230] __qseecom_check_listener_exists called
[230] __qseecom_uvirt_to_kphys called
[240] qseecom_scm_call called
[240] qseecom_scm_call2 called
[240] allocate_extra_arg_buffer called
[240] allocate_extra_arg_buffer:fn_id:838861313, desc->arginfo:131 desc->args[0]:8192 desc->args[1]:2439884800 desc->args[2]:25600 desc->args[3]:0 desc->args[4]:0
[260] allocate_extra_arg_buffer:ret:0, desc->ret[0]]:0 desc->ret[1]:0 desc->ret[2]:0
[270] svc_id = 0xfc, tz_cmd_id = 0x1, qseos_cmd_id = 0x4, smc_id = 0x32000201, param_id = 0x83
[270] scm_resp->result = 0x0, scm_resp->resp_type = 0x0, scm_resp->data = 0x0
[280] _disp_log_stats called
[280] qseecom_start_app called
[290] qseecom_load_commonlib_image called
[290] __qseecom_uvirt_to_kphys called
[300] qseecom_scm_call called
[300] qseecom_scm_call2 called
[300] QSEE_LOAD_SERV_IMAGE_COMMAND mdt_len:0 img_len:262144 phy_addr:2439917568
[310] allocate_extra_arg_buffer called
[310] allocate_extra_arg_buffer:fn_id:838861063, desc->arginfo:3 desc->args[0]:0 desc->args[1]:262144 desc->args[2]:2439917568 desc->args[3]:0 desc->args[4]:0
[340] allocate_extra_arg_buffer:ret:0, desc->ret[0]]:0 desc->ret[1]:0 desc->ret[2]:0
[350] svc_id = 0xfc, tz_cmd_id = 0x1, qseos_cmd_id = 0xb, smc_id = 0x32000107, param_id = 0x3
[350] scm_resp->result = 0x0, scm_resp->resp_type = 0x0, scm_resp->data = 0x0
[360] _disp_log_stats called
[360] Loading cmnlib done
[370] __qseecom_check_app_exists called
[370] qseecom_start_app: Loading app keymaster for the first time'
[370] __qseecom_load_app called
[380] __qseecom_uvirt_to_kphys called
[390] phy_addr:2439917568 img_len:524288
[390] qseecom_scm_call called
[390] qseecom_scm_call2 called
[390] args[0]:0 args[1]:524288 args[2]:2439917568
[400] mdt_len:0 img_len:524288 phy_addr:2439917568
[400] allocate_extra_arg_buffer called
[410] allocate_extra_arg_buffer:fn_id:838861057, desc->arginfo:3 desc->args[0]:0 desc->args[1]:524288 desc->args[2]:2439917568 desc->args[3]:0 desc->args[4]:0
[440] allocate_extra_arg_buffer:ret:0, desc->ret[0]]:0 desc->ret[1]:60929 desc->ret[2]:4
[450] svc_id = 0xfc, tz_cmd_id = 0x1, qseos_cmd_id = 0x1, smc_id = 0x32000101, param_id = 0x3
[450] scm_resp->result = 0x0, scm_resp->resp_type = 0xee01, scm_resp->data = 0x4
[460] _disp_log_stats called
[460] <8>keymaster: "\"KEYMASTER Init \""
[470] __qseecom_add_app_entry called
[470] __qseecom_add_app_entry: Adding app:keymaster app_id:4 to list
[480] calling apps_init()
[490] Setting display_panel to none
[490] Display Init: Start
[490] Selected panel: none
Skip panel configuration
[500] Display Init: Done
[500] serial number: 683393bf
[500] poff_reason1: 0
[500] poff_reason2: 0
[500] pm8x41_get_is_cold_boot: cold boot
[510] Unable to locate /bootselect partition
[520] boot_verifier: Device is in ORANGE boot state.
[520] Device is unlocked! Skipping verification...
[520] Loading (boot) image (13623296): start
[580] Loading (boot) image (13623296): done
[580] use_signed_kernel=1, is_unlocked=1, is_tampered=0.
[590] Your device has been unlocked and can't be trusted.
[590] mdtp: mdtp_img loaded
[600] mdtp: is_test_mode: test mode is set to 1
[600] mdtp: read_metadata: SUCCESS 
[620] LK SEC APP Handle: 0x1
[620] xport_rpm_send_event:184: Notify RPM with IPC interrupt
[620] Received SUCCESS REQ ACK 
[620] Return value from recv_data: 14
[630] xport_rpm_send_event:184: Notify RPM with IPC interrupt
[630] Received SUCCESS REQ ACK 
[630] Return value from recv_data: 14
[640] xport_rpm_send_event:184: Notify RPM with IPC interrupt
[640] Received SUCCESS REQ ACK 
[640] Return value from recv_data: 14
[650] Offset value is 8 
[650] Offset value is 16 
[650] Data length for BAM transfer is 36
[650] Data length for BAM transfer is 128
[660] Offset value is 24 
[660] Offset value is 8 
[660] Offset value is 32 
[660] Digest: [670] 0x5c11bc11 [670] 0x9c322602 [670] 0xfea7359a [670] 0x1f9cd2e1 [670] 0xf7e04852 [670] 0x82571d84 [670] 0x8227509b [680] 0x13911463 [680] 
[680] Sending Root of Trust to trustzone: start
[680] qseecom_send_command called
[680] __qseecom_check_handle_exists called
[690] __qseecom_send_cmd called
[690] __qseecom_uvirt_to_kphys called
[690] __qseecom_uvirt_to_kphys called
[700] qseecom_scm_call called
[700] qseecom_scm_call2 called
[700] allocate_extra_arg_buffer called
[710] allocate_extra_arg_buffer:fn_id:805306369, desc->arginfo:2181 desc->args[0]:4 desc->args[1]:2443182080 desc->args[2]:44 desc->args[3]:2443186176 desc->args[4]:4
[720] allocate_extra_arg_buffer:ret:0, desc->ret[0]]:0 desc->ret[1]:60929 desc->ret[2]:4
[730] svc_id = 0xfc, tz_cmd_id = 0x1, qseos_cmd_id = 0x6, smc_id = 0x30000001, param_id = 0x885
[740] scm_resp->result = 0x0, scm_resp->resp_type = 0xee01, scm_resp->data = 0x4
[750] _disp_log_stats called
[750] sending cmd_req->rsp size: 4, ptr: 0x0x916d13f4
[750] Sending Root of Trust to trustzone: end
[760] decompressing kernel image: start
[990] decompressing kernel image: done
[1000] qcom,msm-id entry not found
[1000] Only one appended non-skales DTB, select it.
[1000] cmdline: root=/dev/disk/by-partlabel/rootfs rw rootwait console=tty0 console=ttyMSM0,115200n8 androidboot.bootdevice=624000.ufshc androidboot.verifiedbootstate=orange androidboot.veritymode=enforcing androidboot.serialno=683393bf androidboot.baseband=apq [1030] Updating device tree: start
[1030] smem ram ptable found: ver: 2 len: 5
[1030] Updating device tree: done
[1040] qseecom_send_command called
[1040] __qseecom_check_handle_exists called
[1040] __qseecom_send_cmd called
[1050] __qseecom_uvirt_to_kphys called
[1050] __qseecom_uvirt_to_kphys called
[1050] qseecom_scm_call called
[1050] qseecom_scm_call2 called
[1060] allocate_extra_arg_buffer called
[1060] allocate_extra_arg_buffer:fn_id:805306369, desc->arginfo:2181 desc->args[0]:4 desc->args[1]:2443182080 desc->args[2]:4 desc->args[3]:2443186176 desc->args[4]:4
[1080] allocate_extra_arg_buffer:ret:0, desc->ret[0]]:0 desc->ret[1]:60929 desc->ret[2]:4
[1080] svc_id = 0xfc, tz_cmd_id = 0x1, qseos_cmd_id = 0x6, smc_id = 0x30000001, param_id = 0x885
[1090] scm_resp->result = 0x0, scm_resp->resp_type = 0xee01, scm_resp->data = 0x4
[1100] _disp_log_stats called
[1100] sending cmd_req->rsp size: 4, ptr: 0x0x916d13f0
[1110] Offset value is 40 
[1110] xport_rpm_send_event:184: Notify RPM with IPC interrupt
[1120] Received SUCCESS REQ ACK 
[1120] Return value from recv_data: 14
[1130] RPM GLINK UnInit
[1130] xport_rpm_send_event:184: Notify RPM with IPC interrupt
[1130] qseecom_deregister_listener called
[1140] __qseecom_check_listener_exists called
[1140] qseecom_scm_call called
[1140] qseecom_scm_call2 called
[1150] allocate_extra_arg_buffer called
[1150] allocate_extra_arg_buffer:fn_id:838861314, desc->arginfo:1 desc->args[0]:8192 desc->args[1]:0 desc->args[2]:0 desc->args[3]:0 desc->args[4]:0
[1160] allocate_extra_arg_buffer:ret:0, desc->ret[0]]:0 desc->ret[1]:0 desc->ret[2]:0
[1170] svc_id = 0xfc, tz_cmd_id = 0x1, qseos_cmd_id = 0x5, smc_id = 0x32000202, param_id = 0x1
[1180] scm_resp->result = 0x0, scm_resp->resp_type = 0x0, scm_resp->data = 0x0
[1180] _disp_log_stats called
[1190] qseecom_exit called
[1190] Qseecom De-Init Done in Appsbl
[1190] booting linux @ 0x80080000, ramdisk @ 0x84000000 (5675133), tags/device tree @ 0x83e00000
[1200] Jumping to kernel via monitor
[    0.000000] Booting Linux on physical CPU 0x0
[    0.000000] Linux version 4.14.0-qcomlt-arm64 (abuild@worksonarm) (gcc version 8.2.0 (Debian 8.2.0-15)) #1 SMP PREEMPT Wed Jan 30 04:14:16 UTC 2019

It's a start. :slight_smile:


Adding the following to ./project/msm8996.mk gives a few more messages from Glink

# Get more Glink debug messages
DEFINES += DEBUG_GLINK=1

The message 'RPM GLINK UnInit' indicates that LK calls
./platform/msm_shared/rpm-glink.c:rpm_glink_uninit
just before jumping to the kernel.
It sends a teardown request on ssr_glink_port, which seems to be a separate channel for teardown.
I do not think the kernel kexec code does this. So that might explain the hang if the second kernel tries to open a glink channel to the RPM when the RPM thinks it is already open.
So I made the following mods to LK (I should have reduced the loop count as well :slight_smile:):

diff --git a/platform/msm_shared/rpm-glink.c b/platform/msm_shared/rpm-glink.c
index f564480a..66a87126 100644
--- a/platform/msm_shared/rpm-glink.c
+++ b/platform/msm_shared/rpm-glink.c
@@ -370,7 +370,8 @@ void rpm_glink_uninit()
 {
 	rpm_ssr_req req;
 	glink_err_type ret;
-	uint32_t len_to_rpm, loop = 100000;
+	/*uint32_t len_to_rpm, loop = 100000;*/
+	uint32_t loop = 100000;
 
 	// update ssr request
 	req.version = 0;
@@ -379,14 +380,14 @@ void rpm_glink_uninit()
 	memset(req.name, 0, sizeof(req.name));
 	strlcpy(req.name, "apss", sizeof(req.name));
 	req.namelength = strlen(req.name);
-	len_to_rpm = sizeof(rpm_ssr_req);
+	/*len_to_rpm = sizeof(rpm_ssr_req);*/
 	dprintf(INFO, "RPM GLINK UnInit\n");
-	ret = glink_tx(ssr_glink_port, NULL, (const void *)&req, len_to_rpm, 0);
-
+	/*ret = glink_tx(ssr_glink_port, NULL, (const void *)&req, len_to_rpm, 0);*/
+	ret = 1;
 	if (ret)
 	{
 		dprintf(CRITICAL, "Glink SSR Channel: Tx for link tear down request failure with error code: 0x%x\n", ret);
-		ASSERT(0);
+		/*ASSERT(0);*/
 	}
 
 #ifdef DEBUG_GLINK
@@ -402,6 +403,6 @@ void rpm_glink_uninit()
 	if (!loop)
 	{
 		dprintf(INFO, "%s:%d, Tearing down Glink SSR Channel Timed out\n", __func__, __LINE__);
-		ASSERT(0);
+		/*ASSERT(0);*/
 	}
 }
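
With the transmit disabled there is nothing for the RPM to acknowledge, so the wait loop has to run its full count before giving up. The loop body is not shown in the diff, so the following is only a sketch of a bounded wait of that general shape; wait_for_ack, ack_poll_fn and the two poll functions are hypothetical, and the per-pass delay is left out so the sketch runs instantly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical poll callback: returns nonzero once the peer has answered. */
typedef int (*ack_poll_fn)(void *ctx);

/* Poll for an acknowledgement at most max_loops times.
 * Returns the number of polls left; 0 means the wait timed out. */
static uint32_t wait_for_ack(ack_poll_fn poll, void *ctx, uint32_t max_loops)
{
    uint32_t loop = max_loops;

    assert(poll != 0);
    while (loop && !poll(ctx))
        loop--;
    return loop;
}

/* Peer that never answers, as when the request was never sent. */
static int never_acks(void *ctx)
{
    (void)ctx;
    return 0;
}

/* Peer that answers on the n-th poll; ctx points to the counter n. */
static int acks_after(void *ctx)
{
    int *remaining = ctx;
    return --(*remaining) <= 0;
}
```

If each pass of the real loop delays for about a millisecond, a count of 100000 gives a wait of roughly 100 seconds, which is why a smaller count would have been the sensible choice for this experiment.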

I flashed it to the board and booted the kernel with fastboot:

# fastboot --cmdline 'maxcpus=1 root=/dev/disk/by-partlabel/rootfs rw rootwait console=tty0 console=ttyMSM0,115200n8 earlycon initcall_debug debug ignore_loglevel dynamic_debug.verbose=1 dyndbg="file regmap-mmio.c +p; file regmap.c +p; file qcom-apcs-ipc-mailbox.c +p; file mailbox.c +p; file qcom_glink_native.c +p; file qcom_glink_rpm.c +p; file dd.c +p; file bus.c +p; file driver.c +p; file platform.c +p"' boot boot.img

with the following result:
First from LK:

[6530] RPM GLINK UnInit
[6530] Glink SSR Channel: Tx for link tear down request failure with error code: 0x1
[6540] rpm_glink_uninit:394, Wait till we receive response from RPM
[106950] rpm_glink_uninit:405, Tearing down Glink SSR Channel Timed out
[106950] qseecom_deregister_listener called
[106960] __qseecom_check_listener_exists called
[106960] qseecom_scm_call called
[106960] qseecom_scm_call2 called
[106970] allocate_extra_arg_buffer called
[106970] allocate_extra_arg_buffer:fn_id:838861314, desc->arginfo:1 desc->args[0]:8192 desc->args[1]:0 desc->args[2]:0 desc->args[3]:0 desc->args[4]:0
[106980] allocate_extra_arg_buffer:ret:0, desc->ret[0]]:0 desc->ret[1]:0 desc->ret[2]:0
[106990] svc_id = 0xfc, tz_cmd_id = 0x1, qseos_cmd_id = 0x5, smc_id = 0x32000202, param_id = 0x1
[107000] scm_resp->result = 0x0, scm_resp->resp_type = 0x0, scm_resp->data = 0x0
[107010] _disp_log_stats called
[107010] qseecom_exit called
[107010] Qseecom De-Init Done in Appsbl
[107020] booting linux @ 0x80080000, ramdisk @ 0x84000000 (17662594), tags/device tree @ 0x83e00000
[107030] Jumping to kernel via monitor
[    0.000000] Booting Linux on physical CPU 0x0000000000 [0x511f2112]
[    0.000000] Linux version 5.2.0-02469-g44b4395655ff-dirty (dixon@computer2) (gcc version 8.3.0 (GNU Toolchain for the A-profile Architecture 8.3-2019.03 (arm-rel-8.36))) #2 SMP PREEMPT Mon Jul 29 10:38:30 BST 2019

then the tail of the kernel log:

[    9.297308] EDAC MC: Ver: 3.0.0
[    9.301518] bus: 'edac': registered
[    9.304397] bus: 'edac': add device mc
[    9.308359] EDAC DEBUG: edac_mc_sysfs_init: device mc created
[    9.316027] initcall edac_init+0x0/0x90 returned 0 after 15625 usecs
[    9.317429] calling  mmc_init+0x0/0x48 @ 1
[    9.324223] bus: 'mmc': registered
[    9.328632] bus: 'sdio': registered
[    9.331119] initcall mmc_init+0x0/0x48 returned 0 after 3906 usecs
[    9.334807] calling  leds_init+0x0/0x68 @ 1
[    9.341093] initcall leds_init+0x0/0x68 returned 0 after 0 usecs
[    9.345183] calling  qcom_scm_init+0x0/0x20 @ 1
[    9.351229] __platform_driver_register
[    9.355474] driver_register
[    9.359140] bus: 'platform': add driver qcom_scm
[    9.361999] bus_add_driver
[    9.366794] driver_attach
[    9.369207] bus_for_each_dev
[    9.372018] __driver_attach
[    9.374851] device_driver_attach
[    9.377468] driver_probe_device
[    9.380845] bus: 'platform': driver_probe_device: matched device firmware:scm with driver qcom_scm
[    9.383815] really_probe_debug
[    9.392773] bus: 'platform': really_probe: probing driver qcom_scm with device firmware:scm
[    9.395917] really_probe
[    9.404358] driver: 'qcom_scm': driver_bound: bound to device 'firmware:scm'
[    9.407095] bus: 'platform': really_probe: bound device firmware:scm to driver qcom_scm
[    9.414176] probe of firmware:scm returned 1 after 12000 usecs
[    9.422243] initcall qcom_scm_init+0x0/0x20 returned 0 after 39062 usecs
[    9.427867] calling  scmi_bus_init+0x0/0x44 @ 1
[    9.434716] bus: 'scmi_protocol': registered
[    9.438928] initcall scmi_bus_init+0x0/0x44 returned 0 after 3906 usecs
[    9.443257] calling  scmi_clock_init+0x0/0x20 @ 1
[    9.449632] initcall scmi_clock_init+0x0/0x20 returned 20 after 0 usecs
[    9.454450] calling  scmi_perf_init+0x0/0x20 @ 1
[    9.460833] initcall scmi_perf_init+0x0/0x20 returned 19 after 0 usecs
[    9.465698] calling  scmi_power_init+0x0/0x20 @ 1
[    9.472016] initcall scmi_power_init+0x0/0x20 returned 17 after 0 usecs
[    9.476748] calling  scmi_sensors_init+0x0/0x20 @ 1
[    9.483157] initcall scmi_sensors_init+0x0/0x20 returned 21 after 0 usecs
[    9.488033] calling  glink_rpm_init+0x0/0x20 @ 1
[    9.494956] __platform_driver_register
[    9.499643] driver_register
[    9.503151] bus: 'platform': add driver qcom_glink_rpm
[    9.505917] bus_add_driver
[    9.511083] driver_attach
[    9.513738] bus_for_each_dev
[    9.516514] __driver_attach
[    9.519382] device_driver_attach
[    9.521938] driver_probe_device
[    9.525374] bus: 'platform': driver_probe_device: matched device rpm-glink with driver qcom_glink_rpm
[    9.528348] really_probe_debug
[    9.537650] bus: 'platform': really_probe: probing driver qcom_glink_rpm with device rpm-glink
[    9.540719] really_probe
[    9.549319] glink_rpm_probe ffff000012cb0804 ffff000012cb0c00
[    9.552109] qcom_glink_native_probe
[    9.557547] qcom_glink_send_version
[    9.560816] qcom_glink_tx
[    9.564264] mbox_send_message
[    9.567042] msg_submit
[    9.569986] qcom_apcs_ipc_send_data
[    9.572268] regmap_write
[    9.575632] _regmap_write
[    9.578406] _regmap_bus_reg_write
[    9.580930] regmap_mmio_write
[    9.584226] regmap_mmio_write32le ffff00001000d000 10 1
[    9.587360] qcom_glink_rpm rpm-glink: Invalid open ack packet
[    9.592301] qcom_glink_rpm rpm-glink: Invalid open ack packet
[    9.598253] qcom_glink_tx
[    9.603846] mbox_send_message
[    9.606450] msg_submit
[    9.609395] qcom_apcs_ipc_send_data
[    9.611665] regmap_write
[    9.615040] _regmap_write
[    9.617815] _regmap_bus_reg_write
[    9.620340] regmap_mmio_write
[    9.623636] regmap_mmio_write32le ffff00001000d000 10 1
[    9.626693] driver: 'qcom_glink_rpm': driver_bound: bound to device 'rpm-glink'
[    9.631965] bus: 'platform': really_probe: bound device rpm-glink to driver qcom_glink_rpm
[    9.639260] probe of rpm-glink returned 1 after 36000 usecs
[    9.647849] initcall glink_rpm_init+0x0/0x20 returned 0 after 62500 usecs
[    9.653261] bus: 'rpmsg': add device rpm-glink.rpm_requests.-1.-1
[    9.659880] calling  qcom_smd_init+0x0/0x20 @ 1
[    9.665859] __platform_driver_register
[    9.670607] driver_probe_device
[    9.673907] bus: 'rpmsg': driver_probe_device: matched device rpm-glink.rpm_requests.-1.-1 with driver qcom_smd_rpm
[    9.677295] driver_register
[    9.687470] really_probe_debug
[    9.690145] bus: 'rpmsg': really_probe: probing driver qcom_smd_rpm with device rpm-glink.rpm_requests.-1.-1
[    9.693518] qcom_glink_tx
[    9.703239] mbox_send_message
[    9.705755] msg_submit
[    9.708702] qcom_apcs_ipc_send_data
[    9.710972] regmap_write
[    9.714346] _regmap_write
[    9.717121] _regmap_bus_reg_write
[    9.719647] regmap_mmio_write
[    9.722941] regmap_mmio_write32le ffff00001000d000 10 1
[    9.726044] bus: 'platform': add driver qcom-smd
[    9.731007] bus_add_driver
[    9.735825] driver_attach
[    9.738304] bus_for_each_dev
[    9.741534] initcall qcom_smd_init+0x0/0x20 returned 0 after 31250 usecs
[    9.743994] calling  rpmsg_init+0x0/0x48 @ 1
[    9.750909] driver_register
[    9.754992] bus: 'virtio': add driver virtio_rpmsg_bus
[    9.757445] bus_add_driver
[    9.762724] qcom_glink_tx
[    9.765300] mbox_send_message
[    9.767994] msg_submit
[    9.770939] qcom_apcs_ipc_send_data
[    9.773208] regmap_write
[    9.776583] _regmap_write
[    9.779361] _regmap_bus_reg_write
[    9.781886] regmap_mmio_write
[    9.785182] regmap_mmio_write32le ffff00001000d000 10 1
[    9.788282] driver_attach
[    9.793205] bus_for_each_dev
[    9.796157] initcall rpmsg_init+0x0/0x48 returned 0 after 15625 usecs
[    9.799168] calling  devfreq_init+0x0/0xbc @ 1
[    9.809426] initcall devfreq_init+0x0/0xbc returned 0 after 3906 usecs
[    9.809618] calling  devfreq_event_init+0x0/0x70 @ 1
[    9.816592] initcall devfreq_event_init+0x0/0x70 returned 0 after 0 usecs
[    9.821611] calling  devfreq_simple_ondemand_init+0x0/0x1c @ 1
[    9.828113] initcall devfreq_simple_ondemand_init+0x0/0x1c returned 0 after 0 usecs
[    9.833849] calling  devfreq_performance_init+0x0/0x1c @ 1
[    9.841327] initcall devfreq_performance_init+0x0/0x1c returned 0 after 0 usecs

Format: Log Type - Time(microsec) - Message - Optional Info
Log Type: B - Since Boot(Power On Reset),  D - Delta,  S - Statistic
S - QC_IMAGE_VERSION_STRING=BOOT.XF.1.0-00301
S - IMAGE_VARIANT_STRING=M8996LAB
S - OEM_IMAGE_VERSION_STRING=crm-ubuntu68

it careers on a little further than before but I think it could be considered a little damning.

I have made some significant progress in my continuing investigations.
First, there is a glink ssr driver. The patch submission is here:
http://lkml.iu.edu/hypermail/linux/kernel/1707.3/01050.html
and several files have subsequently been renamed:
http://lkml.iu.edu/hypermail/linux/kernel/1809.3/01769.html
I have this built into my kernel. The driver is probed, but the registered notifier, qcom_glink_ssr_notify, is never called at halt or reboot. A call to it is what is needed, before a kexec, to tear down the fifos.
The following insertion reports nothing, without the additional changes further below:

diff --git a/drivers/soc/qcom/glink_ssr.c b/drivers/soc/qcom/glink_ssr.c
index d7babe3d67bc..c75198334a25 100644
--- a/drivers/soc/qcom/glink_ssr.c
+++ b/drivers/soc/qcom/glink_ssr.c
@@ -98,6 +98,8 @@ static int qcom_glink_ssr_notify(struct notifier_block *nb, unsigned long event,
        msg.name_len = cpu_to_le32(strlen(ssr_name));
        strlcpy(msg.name, ssr_name, sizeof(msg.name));
 
+       pr_debug("%s %s %d\n", __func__, msg.name, msg.seq_num);
+
        ret = rpmsg_send(ssr->ept, &msg, sizeof(msg));
        if (ret < 0)
                dev_err(ssr->dev, "failed to send cleanup message\n");

As far as I can see, there is only one driver for the MSM8996 that uses RPM Glink, qcom_q6v5_pas, so I thought I would try to get it to issue an SSR notification at shutdown.
There seem to be few drivers with a shutdown method, and qcom_q6v5_pas is no exception, so, after a few variations on this theme, I made the following modification:

diff --git a/drivers/remoteproc/qcom_q6v5_pas.c b/drivers/remoteproc/qcom_q6v5_pas.c
index db4b3c4bacd7..4af778b8ea1a 100644
--- a/drivers/remoteproc/qcom_q6v5_pas.c
+++ b/drivers/remoteproc/qcom_q6v5_pas.c
@@ -335,6 +335,15 @@ static int adsp_remove(struct platform_device *pdev)
        return 0;
 }
 
+static void adsp_shutdown(struct platform_device *pdev)
+{
+       struct qcom_adsp *adsp = platform_get_drvdata(pdev);
+
+       pr_debug("%s\n", __func__);
+       qcom_remove_glink_subdev(adsp->rproc, &adsp->glink_subdev);
+       pr_debug("%s\n", __func__);
+}
+
 static const struct adsp_data adsp_resource_init = {
                .crash_reason_smem = 423,
                .firmware_name = "adsp.mdt",
@@ -390,6 +399,7 @@ MODULE_DEVICE_TABLE(of, adsp_of_match);
 static struct platform_driver adsp_driver = {
        .probe = adsp_probe,
        .remove = adsp_remove,
+       .shutdown = adsp_shutdown,
        .driver = {
                .name = "qcom_q6v5_pas",
                .of_match_table = adsp_of_match,

This got me past the hang in ‘calling glink_rpm_init+0x0/0x20 @ 1’, as far as the return from ‘qcom_smd_init+0x0/0x20’:

[    9.037000] EDAC MC: Ver: 3.0.0
[    9.041185] bus: 'edac': registered
[    9.044075] bus: 'edac': add device mc
[    9.048027] EDAC DEBUG: edac_mc_sysfs_init: device mc created
[    9.055709] initcall edac_init+0x0/0x90 returned 0 after 15625 usecs
[    9.057117] calling  mmc_init+0x0/0x48 @ 1
[    9.063912] bus: 'mmc': registered
[    9.068281] bus: 'sdio': registered
[    9.070808] initcall mmc_init+0x0/0x48 returned 0 after 3906 usecs
[    9.074491] calling  leds_init+0x0/0x68 @ 1
[    9.080772] initcall leds_init+0x0/0x68 returned 0 after 0 usecs
[    9.084862] calling  qcom_scm_init+0x0/0x20 @ 1
[    9.090921] __platform_driver_register
[    9.095160] driver_register
[    9.098831] bus: 'platform': add driver qcom_scm
[    9.101685] bus_add_driver
[    9.106483] driver_attach
[    9.108897] bus_for_each_dev
[    9.111703] __driver_attach
[    9.114542] device_driver_attach
[    9.117160] driver_probe_device
[    9.120536] bus: 'platform': driver_probe_device: matched device firmware:scm with driver qcom_scm
[    9.123503] really_probe_debug
[    9.132463] bus: 'platform': really_probe: probing driver qcom_scm with device firmware:scm
[    9.135601] really_probe
[    9.144013] driver: 'qcom_scm': driver_bound: bound to device 'firmware:scm'
[    9.146773] bus: 'platform': really_probe: bound device firmware:scm to driver qcom_scm
[    9.153842] probe of firmware:scm returned 1 after 12000 usecs
[    9.161919] initcall qcom_scm_init+0x0/0x20 returned 0 after 39062 usecs
[    9.167530] calling  scmi_bus_init+0x0/0x44 @ 1
[    9.174383] bus: 'scmi_protocol': registered
[    9.178597] initcall scmi_bus_init+0x0/0x44 returned 0 after 3906 usecs
[    9.182946] calling  scmi_clock_init+0x0/0x20 @ 1
[    9.189316] initcall scmi_clock_init+0x0/0x20 returned 20 after 0 usecs
[    9.194140] calling  scmi_perf_init+0x0/0x20 @ 1
[    9.200522] initcall scmi_perf_init+0x0/0x20 returned 19 after 0 usecs
[    9.205386] calling  scmi_power_init+0x0/0x20 @ 1
[    9.211705] initcall scmi_power_init+0x0/0x20 returned 17 after 0 usecs
[    9.216437] calling  scmi_sensors_init+0x0/0x20 @ 1
[    9.222849] initcall scmi_sensors_init+0x0/0x20 returned 21 after 0 usecs
[    9.227725] calling  glink_rpm_init+0x0/0x20 @ 1
[    9.234647] __platform_driver_register
[    9.239332] driver_register
[    9.242839] bus: 'platform': add driver qcom_glink_rpm
[    9.245608] bus_add_driver
[    9.250772] driver_attach
[    9.253428] bus_for_each_dev
[    9.256202] __driver_attach
[    9.259073] device_driver_attach
[    9.261629] driver_probe_device
[    9.265066] bus: 'platform': driver_probe_device: matched device rpm-glink with driver qcom_glink_rpm
[    9.268035] really_probe_debug
[    9.277341] bus: 'platform': really_probe: probing driver qcom_glink_rpm with device rpm-glink
[    9.280387] really_probe
[    9.288950] glink_rpm_probe 68000 6dfff memory@68000 200 0
[    9.291623] glink_rpm_probe ffff000012cb8804 ffff000012cb8c00
[    9.296907] qcom_glink_native_probe rpm-glink
[    9.302912] qcom_glink_native_probe
[    9.307036] qcom_glink_send_version
[    9.310315] mbox_send_message
[    9.313766] msg_submit
[    9.316882] qcom_apcs_ipc_send_data
[    9.319163] regmap_write
[    9.322527] _regmap_write
[    9.325301] _regmap_bus_reg_write
[    9.327826] regmap_mmio_write
[    9.331121] regmap_mmio_write32le ffff00001000d000 10 1
[    9.334138] driver: 'qcom_glink_rpm': driver_bound: bound to device 'rpm-glink'
[    9.339441] bus: 'platform': really_probe: bound device rpm-glink to driver qcom_glink_rpm
[    9.346723] probe of rpm-glink returned 1 after 32000 usecs
[    9.355302] initcall glink_rpm_init+0x0/0x20 returned 0 after 58593 usecs
[    9.360586] calling  qcom_smd_init+0x0/0x20 @ 1
[    9.367303] __platform_driver_register
[    9.371642] driver_register
[    9.375307] bus: 'platform': add driver qcom-smd
[    9.378155] bus_add_driver
[    9.382963] driver_attach
[    9.385373] bus_for_each_dev
[    9.388599] initcall qcom_smd_init+0x0/0x20 returned 0 after 15625 usecs

Format: Log Type - Time(microsec) - Message - Optional Info
Log Type: B - Since Boot(Power On Reset),  D - Delta,  S - Statistic
S - QC_IMAGE_VERSION_STRING=BOOT.XF.1.0-00301
S - IMAGE_VARIANT_STRING=M8996LAB
S - OEM_IMAGE_VERSION_STRING=crm-ubuntu68

Investigating the log of the first kernel I see that the next initcall is ‘calling rpmsg_init+0x0/0x48 @ 1’.
Crucially, there are two rpmsg_inits:

$ grep -r "rpmsg_init" .
...
./drivers/rpmsg/virtio_rpmsg_bus.c:static int __init rpmsg_init(void)
./drivers/rpmsg/virtio_rpmsg_bus.c:subsys_initcall(rpmsg_init);
./drivers/rpmsg/rpmsg_core.c:static int __init rpmsg_init(void)
./drivers/rpmsg/rpmsg_core.c:postcore_initcall(rpmsg_init);
...
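
To illustrate why two functions with the same name can show up at different points in the boot log, here is a small user-space sketch in C of initcalls being ordered by level. The level numbers follow the kernel's initcall levels (postcore is 2, subsys is 4). The table, struct and function names are made up for this illustration and are not kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Kernel initcall levels, lowest runs first. */
enum initcall_level {
    LEVEL_PURE = 0, LEVEL_CORE = 1, LEVEL_POSTCORE = 2, LEVEL_ARCH = 3,
    LEVEL_SUBSYS = 4, LEVEL_FS = 5, LEVEL_DEVICE = 6, LEVEL_LATE = 7
};

/* Hypothetical record of one registered initcall. */
struct initcall {
    enum initcall_level level;
    const char *name;   /* symbol name as printed by initcall_debug */
    const char *file;   /* source file that registered it */
};

/*
 * Put the table into run order: ascending level, keeping the original
 * (link) order within a level. Insertion sort is used because it is stable.
 */
void order_initcalls(struct initcall *tab, size_t n)
{
    assert(tab != NULL || n == 0);
    for (size_t i = 1; i < n; i++) {
        struct initcall key = tab[i];
        size_t j = i;

        while (j > 0 && tab[j - 1].level > key.level) {
            tab[j] = tab[j - 1];
            j--;
        }
        tab[j] = key;
    }
}
```

With the two entries from the grep output in the table, the rpmsg_core.c one sorts first, so an rpmsg_init that is logged after the postcore level has finished would have to be the virtio_rpmsg_bus.c one.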

The initcall in question is the subsys initcall of virtio_rpmsg_bus. So I tried removing all the virtio drivers from the kernel, and the boot got a lot further:

[   23.428506] NET: Registered protocol family 17
[   23.428634] initcall packet_init+0x0/0x94 returned 0 after 8134 usecs
[   23.431905] calling  init_rpcsec_gss+0x0/0x7c @ 1
[   23.448516] initcall init_rpcsec_gss+0x0/0x7c returned 0 after 9751 usecs
[   23.448614] calling  strp_dev_init+0x0/0x44 @ 1
[   23.464813] initcall strp_dev_init+0x0/0x44 returned 0 after 10197 usecs
[   23.464916] calling  init_dns_resolver+0x0/0x128 @ 1
[   23.476467] Key type dns_resolver registered
[   23.476564] initcall init_dns_resolver+0x0/0x128 returned 0 after 5756 usecs
[   23.479886] calling  qcom_smd_qrtr_driver_init+0x0/0x20 @ 1
[   23.496441] driver_register
[   23.496495] bus: 'rpmsg': add driver qcom_smd_qrtr
[   23.498104] bus_add_driver
[   23.508433] driver_attach
[   23.508485] bus_for_each_dev
[   23.512457] initcall qcom_smd_qrtr_driver_init+0x0/0x20 returned 0 after 24908 usecs
[   23.513061] calling  qrtr_tun_init+0x0/0x3c @ 1
[   23.532730] initcall qrtr_tun_init+0x0/0x3c returned 0 after 11649 usecs
[   23.532837] calling  arm_smmu_legacy_bus_init+0x0/0x30 @ 1
[   23.544504] initcall arm_smmu_legacy_bus_init+0x0/0x30 returned 0 after 5762 usecs
[   23.544692] calling  init_oops_id+0x0/0x40 @ 1
[   23.556450] initcall init_oops_id+0x0/0x40 returned 0 after 4857 usecs
[   23.556547] calling  sched_init_debug+0x0/0x50 @ 1
[   23.568505] initcall sched_init_debug+0x0/0x50 returned 0 after 5992 usecs
[   23.568600] calling  pm_qos_power_init+0x0/0xb8 @ 1
[   23.582172] initcall pm_qos_power_init+0x0/0xb8 returned 0 after 7625 usecs
[   23.582275] calling  pm_debugfs_init+0x0/0x34 @ 1
[   23.596561] initcall pm_debugfs_init+0x0/0x34 returned 0 after 8303 usecs
[   23.596662] calling  printk_late_init+0x0/0x144 @ 1
[   23.608453] initcall printk_late_init+0x0/0x144 returned 0 after 5870 usecs
[   23.608555] calling  rcu_verify_early_boot_tests+0x0/0x68 @ 1
[   23.620439] initcall rcu_verify_early_boot_tests+0x0/0x68 returned 0 after 5982 usecs
[   23.620547] calling  swiotlb_create_debugfs+0x0/0x8c @ 1
[   23.636513] initcall swiotlb_create_debugfs+0x0/0x8c returned 0 after 8288 usecs
[   23.636617] calling  tk_debug_sleep_time_init+0x0/0x34 @ 1
[   23.648469] initcall tk_debug_sleep_time_init+0x0/0x34 returned 0 after 5277 usecs
[   23.648575] calling  taskstats_init+0x0/0x4c @ 1
[   23.664481] registered taskstats version 1
[   23.664552] initcall taskstats_init+0x0/0x4c returned 0 after 8446 usecs
[   23.667531] calling  load_system_certificate_list+0x0/0x11c @ 1
[   23.684433] Loading compiled-in X.509 certificates
[   23.684510] initcall load_system_certificate_list+0x0/0x11c returned 0 after 9853 usecs
[   23.688192] calling  fault_around_debugfs+0x0/0x34 @ 1
[   23.708469] initcall fault_around_debugfs+0x0/0x34 returned 0 after 12077 usecs
[   23.708572] calling  max_swapfiles_check+0x0/0x8 @ 1
[   23.720442] initcall max_swapfiles_check+0x0/0x8 returned 0 after 5640 usecs
[   23.720541] calling  split_huge_pages_debugfs+0x0/0x38 @ 1
[   23.732465] initcall split_huge_pages_debugfs+0x0/0x38 returned 0 after 5423 usecs
[   23.732571] calling  check_early_ioremap_leak+0x0/0x5c @ 1
[   23.748442] initcall check_early_ioremap_leak+0x0/0x5c returned 0 after 8477 usecs
[   23.748548] calling  pstore_init+0x0/0x1c @ 1
[   23.760470] initcall pstore_init+0x0/0x1c returned 0 after 5363 usecs
[   23.760559] calling  init_root_keyring+0x0/0x14 @ 1
[   23.772641] initcall init_root_keyring+0x0/0x14 returned 0 after 6509 usecs
[   23.772742] calling  integrity_fs_init+0x0/0x68 @ 1
[   23.784536] initcall integrity_fs_init+0x0/0x68 returned 0 after 5857 usecs
[   23.784634] calling  prandom_reseed+0x0/0x40 @ 1
[   23.796496] initcall prandom_reseed+0x0/0x40 returned 0 after 5963 usecs
[   23.796595] calling  pci_resource_alignment_sysfs_init+0x0/0x24 @ 1
[   23.808460] initcall pci_resource_alignment_sysfs_init+0x0/0x24 returned 0 after 5947 usecs
[   23.808574] calling  pci_sysfs_init+0x0/0x60 @ 1
[   23.824443] initcall pci_sysfs_init+0x0/0x60 returned 0 after 7592 usecs
[   23.824537] calling  clk_debug_init+0x0/0x128 @ 1
[   24.006599] initcall clk_debug_init+0x0/0x128 returned 0 after 172158 usecs
[   24.006713] calling  deferred_probe_initcall+0x0/0xb0 @ 1
[   66.512445] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 44s!
[   66.512631] Showing busy workqueues and worker pools:
[   66.519391] workqueue events: flags=0x0
[   66.524536]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=5/256
[   66.528143]     in-flight: 12:request_firmware_work_func
[   66.534376]     pending: delayed_fput, vmstat_shepherd, free_work, deferred_probe_work_func BAR(1)
[   66.539761] workqueue mm_percpu_wq: flags=0x8
[   66.548467]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256
[   66.552873]     pending: vmstat_update
[   66.558993] workqueue ipv6_addrconf: flags=0x40008
[   66.562538]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/1
[   66.567280]     pending: addrconf_verify_work
[   66.573111] pool 0: cpus=0 node=0 flags=0x0 nice=0 hung=44s workers=3 idle: 908 5
[   97.232432] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 75s!
...
[  250.893155] pool 0: cpus=0 node=0 flags=0x0 nice=0 hung=229s workers=3 idle: 908 5
[  250.897683] INFO: task swapper/0:1 blocked for more than 120 seconds.
[  250.912435]       Not tainted 5.2.0-02466-g3e09d51cf314-dirty #2
[  250.912521] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  250.924436] swapper/0       D    0     1      0 0x00000028
[  250.925346] Call trace:
[  250.936445]  __switch_to+0x94/0xd0
[  250.936508]  __schedule+0x31c/0x858
[  250.938763]  schedule+0x38/0xd8
[  250.948442]  schedule_timeout+0x250/0x350
[  250.948507]  wait_for_common+0x140/0x168
[  250.951460]  wait_for_completion+0x14/0x20
[  250.960439]  __flush_work+0x21c/0x438
[  250.960500]  flush_work+0x10/0x18
[  250.963105]  deferred_probe_initcall+0x58/0xb0
[  250.972441]  do_one_initcall+0x88/0x418
[  250.972509]  kernel_init_freeable+0x46c/0x510
[  250.975117]  kernel_init+0x10/0xfc
[  250.988436]  ret_from_fork+0x10/0x18
[  250.988516] 
[  250.988516] Showing all locks held in the system:
[  250.991137] 4 locks held by kworker/0:1/12:
[  251.004442] 1 lock held by khungtaskd/398:
[  251.004507]  #0: (____ptrval____) (rcu_read_lock){....}, at: debug_show_all_locks+0x2c/0x18c
[  251.007513] 
[  251.028435] =============================================
[  251.028435] 
[  281.552429] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 259s!

The workqueue lockup continues indefinitely.
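The call trace shows that ‘deferred_probe_initcall’ puts ‘deferred_probe_work_func’ on the ‘events’ workqueue (aka ‘system_wq’ and ‘kernel-global’) and then waits in ‘flush_work’ for it to complete. The workqueue dump shows it still pending while ‘request_firmware_work_func’ is in flight.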

Unfortunately, when I put some pr_debugs in ‘_request_firmware’ and ‘request_firmware_nowait’ the second kernel hangs at ‘initcall qcom_smd_init+0x0/0x20 returned 0 after 15625 usecs’ again and, so far, I have been unable to reproduce the workqueue lockup.

I have made more progress in that I have booted the second kernel through the systemd init process to a root prompt from the OS. But I do not think I have everything working yet.

On further research, it seems that qcom_q6v5_pas.c is not using RPM Glink. There is no glink-edge in the device tree, but there is an smd-edge. Perhaps it uses SMD or SMEM Glink and the SSR mechanism is more general than just Glink. The commit messages and mailing list entries indicate that it interacts with the TrustZone code.

However, I have managed to get it to call ‘ssr_notify_stop’ at shutdown. The following patch calls ‘qcom_glink_ssr_notify’ through the notify list, ‘ssr_notifiers’. (The comment below the pr_debug shows the actual output to the log.)

diff --git a/drivers/remoteproc/qcom_q6v5_pas.c b/drivers/remoteproc/qcom_q6v5_pas.c
index db4b3c4bacd7..d5c013708583 100644
--- a/drivers/remoteproc/qcom_q6v5_pas.c
+++ b/drivers/remoteproc/qcom_q6v5_pas.c
@@ -335,6 +335,17 @@ static int adsp_remove(struct platform_device *pdev)
        return 0;
 }
 
+static void adsp_shutdown(struct platform_device *pdev)
+{
+       struct qcom_adsp *adsp = platform_get_drvdata(pdev);
+       struct rproc_subdev *subdev = &adsp->ssr_subdev.subdev;
+
+       pr_debug("%s %pS\n", __func__, subdev->stop); 
+       /*adsp_shutdown ssr_notify_stop+0x0/0x28*/
+
+       subdev->stop(subdev, 0);
+}
+
 static const struct adsp_data adsp_resource_init = {
                .crash_reason_smem = 423,
                .firmware_name = "adsp.mdt",
@@ -390,6 +401,7 @@ MODULE_DEVICE_TABLE(of, adsp_of_match);
 static struct platform_driver adsp_driver = {
        .probe = adsp_probe,
        .remove = adsp_remove,
+       .shutdown = adsp_shutdown,
        .driver = {
                .name = "qcom_q6v5_pas",
                .of_match_table = adsp_of_match,

The secondary boot loader, xbl.elf, loads both the TrustZone QSEE image, tz.mbn, and the RPM image, rpm.mbn, so interactions with both of these firmware blobs have to be negotiated through the transition from the first kernel to the second. The kernel cannot reload or reset them.

I then tried calling ‘qcom_glink_ssr_notify’ through a notifier list in ‘kernel_kexec’.
The following patch does the registration in qcom_glink_ssr_probe. (The dev_info and pr_debug insertions were made for earlier investigations.)

diff --git a/drivers/soc/qcom/glink_ssr.c b/drivers/soc/qcom/glink_ssr.c
index d7babe3d67bc..e6e038c7b122 100644
--- a/drivers/soc/qcom/glink_ssr.c
+++ b/drivers/soc/qcom/glink_ssr.c
@@ -10,6 +10,8 @@
 #include <linux/rpmsg.h>
 #include <linux/remoteproc/qcom_rproc.h>
 
+extern int register_kexec_notifier(struct notifier_block *);
+
 /**
  * struct do_cleanup_msg - The data structure for an SSR do_cleanup message
  * version:     The G-Link SSR protocol version
@@ -60,6 +62,8 @@ static int qcom_glink_ssr_callback(struct rpmsg_device *rpdev,
        struct cleanup_done_msg *msg = data;
        struct glink_ssr *ssr = dev_get_drvdata(&rpdev->dev);
 
+       dev_info(ssr->dev, "%s %d %d %d\n", __func__, msg->version, msg->response, msg->seq_num);
+
        if (len < sizeof(*msg)) {
                dev_err(ssr->dev, "message too short\n");
                return -EINVAL;
@@ -98,6 +102,8 @@ static int qcom_glink_ssr_notify(struct notifier_block *nb, unsigned long event,
        msg.name_len = cpu_to_le32(strlen(ssr_name));
        strlcpy(msg.name, ssr_name, sizeof(msg.name));
 
+       pr_debug("%s %s %d\n", __func__, msg.name, msg.seq_num);
+
        ret = rpmsg_send(ssr->ept, &msg, sizeof(msg));
        if (ret < 0)
                dev_err(ssr->dev, "failed to send cleanup message\n");
@@ -112,6 +118,7 @@ static int qcom_glink_ssr_notify(struct notifier_block *nb, unsigned long event,
 static int qcom_glink_ssr_probe(struct rpmsg_device *rpdev)
 {
        struct glink_ssr *ssr;
+       int ret;
 
        ssr = devm_kzalloc(&rpdev->dev, sizeof(*ssr), GFP_KERNEL);
        if (!ssr)
@@ -125,6 +132,7 @@ static int qcom_glink_ssr_probe(struct rpmsg_device *rpdev)
 
        dev_set_drvdata(&rpdev->dev, ssr);
 
+       ret = register_kexec_notifier(&ssr->nb);
        return qcom_register_ssr_notifier(&ssr->nb);
 }

The following patch implements and calls the notifier list in ‘kernel_kexec’:

diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index d5870723b8ad..7e53714e45de 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -1110,6 +1110,23 @@ static int __init crash_notes_memory_init(void)
 subsys_initcall(crash_notes_memory_init);
 
 
+static BLOCKING_NOTIFIER_HEAD(kexec_notifier_list);
+/**
+ *     register_kexec_notifier - Register function to be called at kexec time
+ *     @nb: Info about notifier function to be called
+ *
+ *     Registers a function with the list of functions
+ *     to be called at kexec time.
+ *
+ *     Currently always returns zero, as blocking_notifier_chain_register()
+ *     always returns zero.
+ */
+int register_kexec_notifier(struct notifier_block *nb)
+{
+       return blocking_notifier_chain_register(&kexec_notifier_list, nb);
+}
+EXPORT_SYMBOL(register_kexec_notifier);
+
 /*
  * Move into place and start executing a preloaded standalone
  * executable.  If nothing was preloaded return an error.
@@ -1160,6 +1177,7 @@ int kernel_kexec(void)
        {
                kexec_in_progress = true;
                kernel_restart_prepare(NULL);
+               blocking_notifier_call_chain(&kexec_notifier_list, 0, "apss");
                migrate_to_reboot_cpu();
 
                /*

This seems to work, but there is no response to the cleanup message as there is when it is called from qcom_q6v5_pas. Then I thought it could be because all the drivers should be shut down at this point, so I used the same technique in ‘qcom_glink_rpm’. This also worked, but there was still no response to the cleanup message. Either the remoteproc/rpmsg callback ‘qcom_glink_ssr_callback’ is not being found or the RPM does not respond.
I have only a vague understanding of the plumbing involved with all this. There seem to be so many subsystems/frameworks involved that I have yet to get my head around it all. Perhaps I need to register this callback somewhere else, although, as the notifier seems to be getting through, you would have thought that the callback should be picked up.
I have used the SSR name, “apss” (Application Processor Subsystem), as that is what LK uses in ‘rpm_glink_uninit’ and it is the subsystem that will be going down. But, as far as I can see, nothing has opened anything in that context in the kernel, so perhaps I need to do that, somehow, to get a response to the cleanup message.
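
For anyone following along, the notifier list used in the kexec patch above boils down to something like the following user-space sketch in C. It is a simplification under stated assumptions: the kernel inserts by priority and takes a lock, which this sketch does not, and all the names here (struct notifier, chain_register, chain_call, record_ssr_name, last_ssr_name) are invented for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One callback on a chain. Like the kernel's struct notifier_block it has
 * a single next link, so a node can sit on only one chain at a time. */
struct notifier {
    int (*call)(struct notifier *self, unsigned long event, void *data);
    struct notifier *next;
};

struct chain {
    struct notifier *head;
};

/* Simplified registration: push on the front (the kernel sorts by priority). */
void chain_register(struct chain *c, struct notifier *n)
{
    assert(n->call != NULL);
    n->next = c->head;
    c->head = n;
}

/* Call every notifier on the chain; returns how many were called. */
int chain_call(struct chain *c, unsigned long event, void *data)
{
    int calls = 0;

    for (struct notifier *n = c->head; n != NULL; n = n->next) {
        n->call(n, event, data);
        calls++;
    }
    return calls;
}

/* Hypothetical stand-in for qcom_glink_ssr_notify: it only records the
 * subsystem name it was handed instead of sending a cleanup message. */
char last_ssr_name[16];

int record_ssr_name(struct notifier *self, unsigned long event, void *data)
{
    (void)self;
    (void)event;
    strncpy(last_ssr_name, (const char *)data, sizeof(last_ssr_name) - 1);
    last_ssr_name[sizeof(last_ssr_name) - 1] = '\0';
    return 0;
}
```

Calling chain_call(&kexec_chain, 0, "apss") on a chain holding one such node mirrors the blocking_notifier_call_chain() line added to ‘kernel_kexec’. The sketch uses a node dedicated to the kexec chain because of the single next link.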

The main thing that is not working in the second kernel is PCI. The first PCI port, qcom,pcie@600000, does seem to work. The WiFi/BT chip QCA6174 driver, ath10k, puts out the same messages to the log in the second kernel as in the first, with the exception of assigning IRQs 76 and 77 rather than 77 and 78. Interestingly, the configuration of this port is deferred, so it occurs after the other two, and the configuration space is unaltered, (according to kernel parameter pci=earlydump), so the second kernel sees the same configuration space as the first. The configuration of the two other ports, qcom,pcie@608000 and qcom,pcie@610000, is not deferred and their configuration spaces are altered by the first kernel with the result that my ahci controller, ASM1062, and the gigabit ethernet controller, AR8151, are not seen at all by the second kernel.

All LK does, on issuing the SSR in ‘rpm_glink_uninit’, is poll for all four fifo indices, (head and tail for tx and rx), to be set to zero by the RPM by calling ./platform/msm_shared/glink/xport_rpm.c:xport_rpm_wait_link_down through ssr_glink_port->if_ptr->wait_link_down.
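
The polling that LK does can be pictured like this. This is a user-space sketch in C, not LK's code: the struct layout, the poll limit and the hook that stands in for a delay are assumptions made for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of one channel's indices: head and tail for tx and rx. */
struct fifo_indices {
    volatile uint32_t tx_head, tx_tail, rx_head, rx_tail;
};

/* The link counts as down once all four indices read zero. */
bool link_is_down(const struct fifo_indices *f)
{
    assert(f != NULL);
    return f->tx_head == 0 && f->tx_tail == 0 &&
           f->rx_head == 0 && f->rx_tail == 0;
}

/*
 * Poll up to max_polls times. poll_hook stands in for the delay between
 * polls; in a test it can also play the part of the RPM.
 */
bool wait_link_down(struct fifo_indices *f, unsigned int max_polls,
                    void (*poll_hook)(struct fifo_indices *))
{
    for (unsigned int i = 0; i < max_polls; i++) {
        if (link_is_down(f))
            return true;
        if (poll_hook != NULL)
            poll_hook(f);
    }
    return link_is_down(f);
}

/* Test stand-in for the RPM resetting the fifo after the SSR message. */
void fake_rpm_reset(struct fifo_indices *f)
{
    f->tx_head = f->tx_tail = 0;
    f->rx_head = f->rx_tail = 0;
}
```

If the RPM never clears the indices, wait_link_down() returns false once the poll limit is reached, which is the case a kernel-side equivalent would need to report.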

In the kernel, by inserting pr_debugs into ‘glink_rpm_parse_toc_entry’, I have found that the first kernel starts with an RPM_CMD_VERSION, (le16)0, GLINK_VERSION_1, (le16)1, features = (le32)0, in three of the fifos from the RPM: the one to the Modem Subsystem, “mpss” (magic ‘r2mp’), the one to the Low Power Audio Subsystem, “lpass” (magic ‘r2ad’), and a third with magic ‘r2sc’, for which I do not know the corresponding subsystem. All other fifos are empty, with all four indices zero. In particular, that includes the fifos to and from “apss”, which presumably were reset in response to the SSR message from LK. Interestingly, the second kernel starts with the fifos in the same state, except that the fifo from the RPM to “lpass” is also empty, presumably as a result of the SSR message from qcom_q6v5_pas.
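
As a sketch of what that version message looks like in a fifo, the C below decodes an 8-byte little-endian command into the three fields listed above. The two constants take the values quoted above. The struct and function names are hypothetical, and the 16/16/32-bit layout is assumed from the field sizes given.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RPM_CMD_VERSION 0   /* command value quoted above */
#define GLINK_VERSION_1 1   /* version value quoted above */

/* Hypothetical decoded form of one command header. */
struct glink_cmd {
    uint16_t cmd;     /* e.g. RPM_CMD_VERSION */
    uint16_t param1;  /* for a version command: the G-Link version */
    uint32_t param2;  /* for a version command: the feature bits */
};

static uint16_t get_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode one header from buf. Returns 0 on success, -1 if buf is too short. */
int glink_decode_cmd(const uint8_t *buf, size_t len, struct glink_cmd *out)
{
    assert(buf != NULL && out != NULL);
    if (len < 8)
        return -1;
    out->cmd = get_le16(buf);
    out->param1 = get_le16(buf + 2);
    out->param2 = get_le32(buf + 4);
    return 0;
}
```

Feeding it the bytes 00 00 01 00 00 00 00 00 gives cmd = RPM_CMD_VERSION, param1 = GLINK_VERSION_1 and param2 = 0, matching what the pr_debugs reported for the ‘r2mp’, ‘r2ad’ and ‘r2sc’ fifos.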

So, perhaps the glink-SSR implementation in ./drivers/soc/qcom/glink_ssr.c is incomplete.