Booting with no HDMI module

Hi,
I have a question about the best way to solve this issue:
Boot a 410 SOM without an HDMI module?

My issue:
When I boot Debian and attach screen to the debug UART port, I see the kernel booting and crashing repeatedly whenever the HDMI device is not connected. It seems that the display setup in the pmic device tree is being torn down because the MIPI serial, i2c, or i2s link isn’t there.

Any suggestions on how I should set up the dtsi/driver so that it fails gracefully and the boot continues?

I will not always have the HDMI connector, and so ideally I would like it to be available to load on demand.

There are a few approaches that I see immediately:

  1. use a headless build (maybe not ideal for the immediate future) as I still need the display for a few weeks - but there’s still a display dependency, I think…
  2. remove it from the pmic device tree (but this means that the display will be disabled, and the wifi ssh bug prevents me from configuring it back without either re-flashing or setting up a constant ping in rc.local to keep ssh alive)
  3. maybe some other option I’m not thinking of to detect and ignore/make optional load of hdmi?

--NORMAL_BOOT------------------------------------------------------------------------------

[ 2.481316] NET: Registered protocol family 17
[ 2.481515] Bluetooth: RFCOMM TTY layer initialized
[ 2.481534] Bluetooth: RFCOMM socket layer initialized
[ 2.481570] Bluetooth: RFCOMM ver 1.11
[ 2.481604] Bluetooth: HIDP (Human Interface Emulation) ver 1.2
[ 2.481687] Bluetooth: HIDP socket layer initialized
[ 2.482012] 9pnet: Installing 9P2000 support
[ 2.482226] Key type dns_resolver registered
[ 2.484115] registered taskstats version 1
[ 2.491355] 1a98000.qcom,mdss_dsi supply gdsc not found, using dummy regulator
[ 2.492168] 1a98000.qcom,mdss_dsi supply gdsc not found, using dummy regulator

[ 2.515129] msm 1a00000.qcom,mdss_mdp: bound 1a98000.qcom,mdss_dsi (ops dsi_ops)

[ 2.515173] msm 1a00000.qcom,mdss_mdp: bound 1c00000.qcom,adreno-3xx (ops a3xx_ops)
[ 2.515246] 1a00000.qcom,mdss_mdp supply vdd not found, using dummy regulator
[ 2.534337] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[ 2.534341] [drm] No driver support for vblank timestamp query.
[ 2.781186] mmc0: new high speed MMC card at address 0001
[ 2.783302] mmcblk0: mmc0:0001 H8G1e 7.28 GiB
[ 2.783556] mmcblk0boot0: mmc0:0001 H8G1e partition 1 4.00 MiB
[ 2.783797] mmcblk0boot1: mmc0:0001 H8G1e partition 2 4.00 MiB
[ 2.784029] mmcblk0rpmb: mmc0:0001 H8G1e partition 3 4.00 MiB
[ 2.787966] Alternate GPT is invalid, using primary GPT.
[ 2.788005] mmcblk0: p1 p2 p3 p4 p5 p6 p7 p8 p9 p10
[ 2.958413] Console: switching to colour frame buffer device 240x75
[ 3.045427] msm 1a00000.qcom,mdss_mdp: fb0: msm frame buffer device
[ 3.051369] msm 1a00000.qcom,mdss_mdp: registered panic notifier
[ 3.087718] [drm] Initialized msm 1.0.0 20130625 on minor 0
[ 3.089452] msm_otg 78d9000.phy: OTG regs = ffffff800051c000
[ 3.093104] msm_otg 78d9000.phy: no vddcx
[ 3.213597] msm_hsusb_host 78d9000.ehci: EHCI Host Controller
[ 3.215937] msm_hsusb_host 78d9000.ehci: new USB bus registered, assigned bus number 1

--CRASH_BOOT-------------------------------------------------------------------------------

[ 2.522402] NET: Registered protocol family 17
[ 2.522768] Bluetooth: RFCOMM TTY layer initialized
[ 2.522788] Bluetooth: RFCOMM socket layer initialized
[ 2.522826] Bluetooth: RFCOMM ver 1.11
[ 2.522860] Bluetooth: HIDP (Human Interface Emulation) ver 1.2
[ 2.522924] Bluetooth: HIDP socket layer initialized
[ 2.523259] 9pnet: Installing 9P2000 support
[ 2.523473] Key type dns_resolver registered
[ 2.525360] registered taskstats version 1
[ 2.533198] 1a98000.qcom,mdss_dsi supply gdsc not found, using dummy regulator
[ 2.534079] 1a98000.qcom,mdss_dsi supply gdsc not found, using dummy regulator

[ 2.589670] i2c_qup 78b8000.i2c: NACK from 39
[ 2.590501] adv7533: probe of 1a98000.qcom,mdss_dsi.0 failed with error -5
[ 2.590554] msm_dsi_manager_register: failed to register mipi dsi host for DSI 0
[ 2.592377] msm 1a00000.qcom,mdss_mdp: failed to bind 1a98000.qcom,mdss_dsi (ops dsi_ops): -517
[ 2.596175] msm 1a00000.qcom,mdss_mdp: master bind failed: -517

[ 2.598118] msm_otg 78d9000.phy: OTG regs = ffffff8000512000
[ 2.598883] msm_otg 78d9000.phy: no vddcx
[ 2.773990] msm_hsusb_host 78d9000.ehci: EHCI Host Controller
[ 2.775715] msm_hsusb_host 78d9000.ehci: new USB bus registered, assigned bus number 1
[ 2.779465] msm_hsusb_host 78d9000.ehci: irq 144, io mem 0x078d9000
[ 2.795666] CPR closed loop is enabled
[ 2.796457] CPR is enabled!
[ 2.798382] msm_hsusb_host 78d9000.ehci: USB 2.0 started, EHCI 1.00
[ 2.802396] usb usb1: New USB device found, idVendor=1d6b, idProduct=0002
[ 2.807309] usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[ 2.814253] usb usb1: Product: EHCI Host Controller

[ 2.818682] qcom-apq8016-sbc 7702000.sound: error getting codec dai name
[ 2.818688] qcom-apq8016-sbc 7702000.sound: Error resolving dai links: -517

[ 2.822038] 1a98000.qcom,mdss_dsi supply gdsc not found, using dummy regulator
[ 2.822334] 1a98000.qcom,mdss_dsi supply gdsc not found, using dummy regulator
[ 2.847065] usb usb1: Manufacturer: Linux 4.2.4-linaro-lt-qcom ehci_hcd
[ 2.854241] usb usb1: SerialNumber: 78d9000.ehci

[ 2.859459] i2c i2c-3: Failed to register i2c client dummy at 0x39 (-16)
[ 2.859804] adv7533: probe of 1a98000.qcom,mdss_dsi.0 failed with error -12
[ 2.859829] msm_dsi_manager_register: failed to register mipi dsi host for DSI 0
[ 2.860681] msm 1a00000.qcom,mdss_mdp: failed to bind 1a98000.qcom,mdss_dsi (ops dsi_ops): -517
[ 2.862747] msm 1a00000.qcom,mdss_mdp: master bind failed: -517

hi,

which release/build are you using?

thanks

@ndec
It’s the latest Debian flavor, from Dec 04.
4.2.4 kernel.

I can see that the device tree loads mdss_dsi0:
arch/arm64/boot/dts/qcom/apq8016-sbc.dtsi : 204

And also per the pmic:
arch/arm64/boot/dts/qcom/msm8916-mdss.dtsi : 73

I assume it’s because the adv7533_probe failed?
drivers/gpu/drm/i2c/adv7511.c : 1333

-517 would imply EPROBE_DEFER?
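For anyone decoding the negative numbers in these logs: they are kernel errno values. Below is a minimal sketch of a lookup for the codes that appear in this thread. `errname` is a hypothetical helper, not a kernel API, and `EPROBE_DEFER` is defined locally on the assumption that the userspace `<errno.h>` does not provide it (it is a kernel-internal code).

```c
#include <errno.h>   /* EIO, ENOMEM, EBUSY */
#include <stdlib.h>  /* abs */

/* Kernel-internal code; defined here only for illustration because
 * userspace <errno.h> normally does not carry it. */
#ifndef EPROBE_DEFER
#define EPROBE_DEFER 517
#endif

/* Hypothetical helper: map a (possibly negative) return code from the
 * boot log to its symbolic name. */
const char *errname(int code)
{
    switch (abs(code)) {
    case EIO:          return "EIO";          /* -5   */
    case ENOMEM:       return "ENOMEM";       /* -12  */
    case EBUSY:        return "EBUSY";        /* -16  */
    case EPROBE_DEFER: return "EPROBE_DEFER"; /* -517 */
    default:           return "unknown";
    }
}
```

With this, `errname(-517)` yields `"EPROBE_DEFER"`, which matches the reading above: the bind is not failing outright, it is asking to be retried later.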

From the schematic, it looks like there’s an i2c, i2s, and MIPI connection to the chip.

I was looking at something else today-

I noticed that the dmesg of a Debian build with the HDMI connector attached seems to have most of the same -517 errors as above.

It looks like the key difference is the i2c errors above… which seem to lead to the adv7533 probe failure.

Hi,

Could you confirm what the topmost commit of your kernel is? It should ideally be “wcn36xx: remove references to IFF_PROMISC” (id: 3f51812)

https://git.linaro.org/landing-teams/working/qualcomm/kernel.git/shortlog/refs/heads/release/qcomlt-4.2

In particular, there is an ADV7533 patch called “drm/i2c: adv7511: Init regulators” (54f82a0) that fixes i2c NACK issues seen with ADV7533.

Thanks,
Archit

@architt
Thanks for the reply.

As I stated, I am using the release from Dec 04:
http://builds.96boards.org/releases/dragonboard410c/linaro/debian/latest/

I can see in the kernel git log, the patches were merged from debian-qcom-dragonboard410c-15.11 on Dec 03… so, I would think they would be in the release.

I have also already pulled the most recent commit, which is not in the release: “…IFF_PROMISC” (3f51812), the one you mentioned, is in my local tree.

I can attempt to build the kernel module, and modprobe what’s there to see if I get a different result. I should be able to override KERNELRELEASE=$(uname -r) without any instruction, and see if I can successfully install it.

…that quick attempt didn’t work out so well. I need to flash my board now, and get some sleep.

I did everything quick-like, and for some unknown reason built the wifi module(?) (well… probably because I was looking at wifi stuff, and didn’t really think about it) Ha! Anyway, totally unrelated, sorry. The result was missing symvers, so there are no dependencies or modversions… which is probably not a good thing at all, but I thought hey… I’d give it a wrecking ball try at 11p before I call it quits.

I’ll give it another look with fresh eyes…

@ndec
whoo-hoo! - one step closer to headless. Removing FIF_PROMISC_IN_BSS from WCN36XX_SUPPORTED_FILTERS seems to have fixed the ping/ssh issues I was previously having. I installed the kernel module and it works much better.

I don’t know if it was entirely necessary, but I killed a bunch of services prior to rebooting… and still had issues even rebooting, because the kernel was not too happy with the changes on shutdown. But after the system booted back up, I was able to ping and ssh the 410 from multiple machines without any prior outbound traffic. Yay!

@architt
I took another look. I think it’s actually in the adv7533_probe, as I stated before.

drivers/gpu/drm/i2c/adv7511.c:1462:
adv->i2c_main = i2c_new_dummy(adapter, main_i2c_addr >> 1);

drivers/i2c/i2c-core.c:1001:
dev_err(&adap->dev, "Failed to register i2c client %s at 0x%02x (%d)\n", client->name, client->addr, status);

Since, main_i2c_addr == 0x72
0x72 >> 1 = 0x39
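The shift above is the usual conversion from an 8-bit I2C address (7-bit address plus the R/W bit in bit 0) to the 7-bit address the Linux i2c core works with. A minimal sketch, with `i2c_7bit_addr` as a hypothetical helper name used only for illustration:

```c
#include <stdint.h>

/* Convert an 8-bit I2C address (R/W flag in bit 0) into the 7-bit
 * form by dropping the R/W bit. Hypothetical helper, not a kernel API. */
uint8_t i2c_7bit_addr(uint8_t addr8)
{
    return (uint8_t)(addr8 >> 1);
}
```

So `i2c_7bit_addr(0x72)` gives `0x39`, the address reported in the failing log line.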

Just before it crashed we see:
[ 2.859459] i2c i2c-3: Failed to register i2c client dummy at 0x39 (-16)

-16 implying EBUSY, but the adv7533 is not connected… which probably looks like it’s not responding I suppose…

@hhony

Thanks for the additional data on the ADV7533 issue. I’m not able to reproduce this on my device for some reason.

Is it possible for you to share the entire kernel boot log? Also, it would be handy if you could add “debug drm.debug=0x1f” in the kernel bootargs.

Also, I noticed a possible issue where the ADV7533 driver registers the client at 0x39, and then defers without unregistering it, which could result in the probe to fail the next time. Could you try out this patch too?

http://paste.ubuntu.com/15098695/
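To illustrate the leak described above (this is a toy model, not the contents of the linked patch and not the real i2c API): if a probe claims address 0x39 and then bails out without releasing it, the next probe attempt finds the address busy. All `toy_*` names are hypothetical, and `EPROBE_DEFER` is defined locally for the sketch.

```c
#include <errno.h>    /* EBUSY, EINVAL */
#include <stdbool.h>

#define EPROBE_DEFER 517   /* kernel-internal value, defined here for the sketch */
#define ADDR_COUNT   128   /* 7-bit address space */

static bool claimed[ADDR_COUNT];   /* toy stand-in for the adapter's client list */

/* Toy registration: refuses an address that is already claimed. */
int toy_register(unsigned addr)
{
    if (addr >= ADDR_COUNT)
        return -EINVAL;
    if (claimed[addr])
        return -EBUSY;
    claimed[addr] = true;
    return 0;
}

void toy_unregister(unsigned addr)
{
    if (addr < ADDR_COUNT)
        claimed[addr] = false;
}

/* Toy probe: claims 0x39, then pretends a later step asks to defer.
 * With cleanup == false the claim leaks into the next attempt. */
int toy_probe(bool cleanup)
{
    int ret = toy_register(0x39);
    if (ret)
        return ret;            /* leaked claim from an earlier attempt */

    ret = -EPROBE_DEFER;       /* simulated failure after registration */
    if (cleanup)
        toy_unregister(0x39);  /* error path releases the address */
    return ret;
}
```

In this model, two calls to `toy_probe(false)` return -517 and then -16, while repeated calls to `toy_probe(true)` keep returning -517, which is the behavior the cleanup is meant to restore.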

Merci! Absolutely, I can test that tomorrow.

In my case, it may be a little more severe… I might have to go as far as the drm_bridge_remove. I missed that label in my first pass.

It’s really more of an electrical issue (likely), as the bus is low. I’m using a som or breakout equivalent, and including the hdmi bridge isn’t entirely necessary always.

@architt
So it looks like that patch does not fix the issue, although it did introduce some different behavior. Now, instead of rebooting constantly, the kernel tries to unregister the devices forever. You will find a log here.

I also changed a few other places in the code, when patching my kernel and this patch can be found here. I expanded what you sent me a little bit, but please note that the additional edits in adv7511.c do not matter for this issue.

It appears that the failure is actually on the attempt from err_i2c_unregister_main label, after
ret = regmap_read(adv->regmap, ADV7511_REG_CHIP_REVISION, &val);
I verified this is the exact spot by adding some printk messages of my own, which do not appear.

The part of the log which is concerning:
[ 2.622764] i2c_qup 78b8000.i2c: NACK from 39
[ 2.622814] device_unregister: attempt to unregister device: ‘3-0039’
[ 2.625053] adv7533: probe of 1a98000.qcom,mdss_dsi.0 failed with error -5
[ 2.625103] msm_dsi_manager_register: failed to register mipi dsi host for DSI 0
[ 2.625109] device_unregister: attempt to unregister device: ‘1a98000.qcom,mdss_dsi.0’

So, I think -5 implies EIO returned from drivers/i2c/busses/i2c-qup.c

I added a printk inside the device_unregister function in core that reveals the dev_name as 3-0039. So, what’s that?

I checked arch/arm64/boot/dts/qcom/msm8916.dtsi:1313: blsp_i2c4: i2c@78b8000, and I find no mention of this. Is this part of the dummy registration?

Silly question in retrospect. The dev_name function is returning kobject_name(&dev->kobj), since dev->init_name does not exist. i2c4… duh… so bus 3, device 39… yea, may have needed more coffee this morning.
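For reference, the name can be reproduced with a small sketch. It assumes the i2c core names 7-bit clients as the adapter number, a dash, and the address as four hex digits; `toy_i2c_dev_name` is a hypothetical helper, not a kernel function.

```c
#include <stdio.h>

/* Format an i2c client name as "<adapter>-<addr as 4 hex digits>".
 * Returns the number of characters snprintf would have written. */
int toy_i2c_dev_name(char *buf, size_t len, int adapter, unsigned addr)
{
    return snprintf(buf, len, "%d-%04x", adapter, addr);
}
```

Calling it with adapter 3 (the `i2c-3` seen in the log) and address 0x39 produces `3-0039`, matching the name printed by the added printk.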

So, why does it keep hitting drm_platform_init? At first I thought it could be that put_device is not updating platform fully…

But that still didn’t explain the -517… and then I found this:

[ 1.973483] [drm:msm_drm_register] init

[ 2.043086] [drm:dsi_init] dsi probed=ffffffc036aeb018
[ 2.046877] 1a98000.qcom,mdss_dsi supply gdsc not found, using dummy regulator
[ 2.051737] 1a98000.qcom,mdss_dsi supply gdsc not found, using dummy regulator
[ 2.058723] l2: supplied by s3
[ 2.066556] [drm:msm_dsi_host_init] Dsi Host 0 initialized
[ 2.069150] msm_dsi_manager_register: failed to register mipi dsi host for DSI 0
[ 2.074019] device_unregister: attempt to unregister device: ‘1a98000.qcom,mdss_dsi.0’
[ 2.082421] [drm:msm_dsi_host_destroy]
[ 2.092847] [drm:dsi_bind] yikes, there’s your problem (-517).
[ 2.096914] msm 1a00000.qcom,mdss_mdp: failed to bind 1a98000.qcom,mdss_dsi (ops dsi_ops): -517
[ 2.105428] msm 1a00000.qcom,mdss_mdp: master bind failed: -517

Working out a solution now…

There is a limitation with the DSI driver at the moment where it always tries to defer if it can’t find a connected bridge chip or panel. In the case when ADV7533 probe fails, the bind failure and deferral that we see is expected.

However, I didn’t expect that we defer indefinitely if probe fails. I tried to reproduce this in the linaro 4.4 kernel (along with some tiny fixes), and that doesn’t have this issue. This branch here is the qcomlt-4.4 branch with 2 additional patches from me. Could you give this a try on your side?

About the 4.2 kernel, I haven’t tried to debug what might be causing the indefinite probe. I’ll try this over the weekend.

I did find another issue with the adv7533 probe sequence which can be problematic. I squashed your patch with the new changes here:

http://paste.ubuntu.com/15129569/

Thanks,
Archit

@architt:
Funny that you bring up the issue about the audio_init, I saw that too. Sure I can test those patches for you… but I don’t think they will fix the original issue… because… I fixed it! :)

So, it appears that the probe defers indefinitely from the perspective of the drivers/gpu/drm/msm/msm_drv.c code. See patch below:

Patch to fix booting headless without adv7533:
http://paste.ubuntu.com/15136192/

Log of session (booted without adv7533 chip, ssh’d in, shutdown):
http://paste.ubuntu.com/15136066/

The issue is not in the adv7511 code, after all… It is the fact that the adv7533 chip is just missing, and therefore the connectors are missing, and the drm platform device is the one who receives the deferred probe constantly as a result.

The device_unregister call only unregisters the sub-level host device, but the issue is actually the underlying hardware connectors in msm8916-mdss.dtsi:60. You will see that the parent device is the mdss_mdp, and mdss_dsi0 is actually the ‘connectors’ entry in this parent device… You might also see in apq8016-sbc.dtsi:207 that &mdss_dsi0 is called out to be the adv7533. Further, 1a98000.qcom,mdss_dsi.0 is the msm_dsi_host device and just a symptom of the underlying issue… which is that the hdmi bridge is not even on the board.

You will notice I added:
[ 2.086816] [drm:msm_pdev_probe] component_master_add_with_match for device: 1a00000.qcom,mdss_mdp, parent: soc, (-517)

You cannot do a platform_device_unregister from msm_pdev_probe, unless you like crashing your kernel entirely. You also will not fix this by any means from adv7511, because that’s much too late in the driver load to do anything. msm_drv is the correct spot to fix this because it looks for the ‘connectors’ from the device tree.


So one additional thing I did notice in the above log is the sound defers a probe:

[    2.834575] qcom-apq8016-sbc 7702000.sound: error getting codec dai name
[    2.834581] qcom-apq8016-sbc 7702000.sound: Error resolving dai links: -517

There’s also a nasty thing that happens on shutdown (possibly as a result):

2524: [FAILED] Failed to start Store Sound Card State.
2525: See 'systemctl status alsa-store.service' for details.
2526-2559: systemd-journald[1327]: Failed to forward syslog message: Connection refused

Also note:
There’s also an optical mouse connected to the system which I did not touch the entire time and there are several ‘usb disconnect messages’ along with the 7702000 errors…

Neither of the above is related to the original issue (which I consider working), and you should consider it closed with my patch. If you would like me to start a new thread about the ‘sound dai’ or ‘usb disconnect’ I would be happy to collaborate.

Best,
Hans

@hhony

Thanks for working on this.

About the deferred probe issue fix, I’m afraid it will break the driver when an hdmi bridge is really there and a defer happens because of a missing resource like regulators/pinctrl etc.

I tried to look into why we retry probing indefinitely here. In normal circumstances, the kernel’s device driver core eventually stops trying to re-probe drivers that request deferrals once all other drivers are bound. Unfortunately, in our case, whenever we try to probe the adv7511 driver, we end up registering 3 dummy i2c device/drivers, resulting in the core attempting to retry deferred drivers again. This results in a sort of never-ending loop. I can’t think of a way to fix this.
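The loop can be pictured with a toy model (a deliberately simplified sketch, not the actual driver-core code; every `toy_*` name is hypothetical). The assumption modelled here is the one stated above: any new binding makes the core give the deferred list another pass, and the deferring probe itself creates new bindings each time it runs.

```c
#include <stdbool.h>

#define EPROBE_DEFER 517   /* kernel-internal value, defined here for the sketch */

/* Count of bindings made so far; a change during a pass means the
 * deferred list is retried. */
static unsigned binds;

/* Toy bridge probe: the chip is absent, so it always defers. When
 * registers_dummies is true it first binds three dummy clients. */
int toy_bridge_probe(bool registers_dummies)
{
    if (registers_dummies)
        binds += 3;
    return -EPROBE_DEFER;
}

/* Retry the deferred probe until a pass makes no new bindings, or
 * until max_passes is reached. Returns the number of passes run. */
unsigned toy_run_deferred(bool registers_dummies, unsigned max_passes)
{
    unsigned passes = 0;
    unsigned seen;

    do {
        seen = binds;
        toy_bridge_probe(registers_dummies);
        passes++;
    } while (binds != seen && passes < max_passes);

    return passes;
}
```

Without the dummy registrations the model settles after one pass; with them it only stops because of the `max_passes` cap, which the real boot sequence does not have.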

In the 4.4 kernel, the adv7533 is no longer a child under mdss_dsi in DT. It’s a node under the i2c adapter blsp_i2c4. This little change prevents the loop from kicking in.

If you’re using a different/customized board, I think it would make more sense to remove the ADV7533 and external-dai-link nodes from the dtsi since your platform doesn’t have it.

Thanks,
Archit

@architt
Yes, you may be correct about the regulators or pinctrl… but I’m also not likely to be the only person in the world who will spin their own board without the hdmi bridge and try to run linaro.

I thought about doing a substr (in C) to check that the mdss_mdp is actually the device which is failing. In the event that it fails, say 5 or 10 times, it’s much more likely that it’s failing on the bind of the i2c, since the dummy regulators exist already - this was my thought.

You could conceivably give the msm device 5 or 10 tries, and then boot without the hdmi bridge (aka ‘connectors’). You are correct in that it might take more time for the regulators or pinctrl to come up, and in this case, I would think 5 or 10 failures might do it based on a normal boot sequence.
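A rough sketch of that retry-budget idea, under stated assumptions: this is not the patch linked earlier, `toy_msm_state`, `toy_handle_bind`, and `MAX_DEFERRALS` are hypothetical names, and the limit of 10 is just the upper figure mentioned above.

```c
#define EPROBE_DEFER  517   /* kernel-internal value, defined here for the sketch */
#define MAX_DEFERRALS 10    /* hypothetical budget ("5 or 10" tries) */

struct toy_msm_state {
    unsigned deferrals;   /* how many times the bind has deferred so far */
    int headless;         /* set once we give up on the connectors */
};

/* Decide what to return for one bind attempt. bind_result is whatever
 * the component bind returned on this attempt. */
int toy_handle_bind(struct toy_msm_state *st, int bind_result)
{
    if (bind_result != -EPROBE_DEFER)
        return bind_result;        /* success or hard error: pass through */

    if (++st->deferrals < MAX_DEFERRALS)
        return -EPROBE_DEFER;      /* still within budget: try again later */

    st->headless = 1;              /* budget spent: carry on without connectors */
    return 0;
}
```

The trade-off raised earlier still applies: a fixed budget can give up too early on a board where the bridge is present but a regulator or pinctrl dependency is slow to appear.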

Either way, I have patched my fork, and I’ll likely remove the mdss_dsi0 references and children from the dtsi’s in my fork… as this was my original question. Nonetheless, for the community, this thread could always serve as a ‘work-around’ I suppose, if applying the patch may break other things.

While I was in that code, I did see some overlap with the gpu, which I haven’t had time to explore yet. In theory, the devices should be isolated other than sharing the same base msm device. But I’ll need to do some more testing for confidence in that statement.

Thanks,
Hans