Blank monitor after software update

I have just reflashed my board with
dragonboard-820c-bootloader-ufs-linux-36
using QDL, and:
linaro-buster-alip-dragonboard-820c-221.img
boot-linaro-buster-dragonboard-820c-221.img
using fastboot.

The board boots but HDMI no longer functions and user LED 0 is flashing in pairs.

The display worked with the installation as delivered, which was:
Linux version 4.11.0-qcomlt (abuild@r2-a19) (gcc version 7.2.1 20171025 (Debian 7.2.0-12) ) #1 SMP PREEMPT Mon Nov 6 23:24:36 UTC 2017
LXQt 0.11.1
root@linaro-alip:~# cat /etc/debian_version
buster/sid

I am using an old DELL 1505FP VGA monitor (1024x768@75) with one of these:
StarTech HD2VGAMICRA

The only suspicious report in the console, that I can see, is that the adapter does not support HDCP:
[ 12.079589] msm_hdmi_hdcp_read_validate_aksv: AKSV QFPROM doesn’t have 20 1’s, 20 0’s
[ 12.079597] msm_hdmi_hdcp_read_validate_aksv: QFPROM AKSV chk failed (AKSV=0800000000)
[ 12.079643] msm_hdmi_hdcp_auth_prepare: ASKV validation failed
[ 12.079645] msm_hdmi_hdcp_auth_work: auth prepare failed -524
[ 12.079647] msm_hdmi_hdcp_auth_work: hdcp is not supported
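
For reference, the first two messages describe a simple validity check: an HDCP key selection vector (KSV) is 40 bits wide and must contain exactly twenty 1 bits and twenty 0 bits. The sketch below reproduces that check in Python on the value from the log. The function name is made up for illustration, and this is not the driver's code.

```python
def aksv_is_valid(aksv_hex: str) -> bool:
    """Return True if a 40-bit KSV has exactly twenty 1 bits (and so twenty 0 bits)."""
    value = int(aksv_hex, 16)
    if value >> 40:
        # Anything wider than 40 bits cannot be a KSV.
        return False
    return bin(value).count("1") == 20


# AKSV value reported in the log above
print(aksv_is_valid("0800000000"))
```

The logged value has only one bit set, so the check fails. Since the AKSV is the transmitter's key and the log says it is read from QFPROM, the message probably concerns the board's own key rather than the adapter.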

Currently, I have no other monitors that I can try.
Have there been any further developments on why some monitors do not work with some builds?

It just occurred to me to check the initial boot log from the old system, and I find this entry:
[ 4.234447] hdmi_msm 9a0000.hdmi-tx: failed to init hdcp: disabled

The obvious thing for me to try next is to disable hdcp to see if that will free up the monitor. So I look for a kernel parameter. I cannot find one, but, after searching the commits for kernel 4.14.52, I find a commit from Rob Clark which suggests there is a kernel parameter, or possibly something to set in /sys:

feb46f02c3fa70e6d3e5307cb105cc69c60a3fe3
drm/msm: make HDCP support optional
It is already optional at runtime. But this at least simplifies backports to kernels without QCOM_SCM.

Well, if there is, I have failed, miserably, to find it. All I can find is the config parameter in that commit, CONFIG_DRM_MSM_HDMI_HDCP, which is set to ‘y’ in the linaro build.

So, I guess I have to make my own kernel build. I have not done that for many years. :slight_smile:

That should work anyway.
Do you have display modes detected in /sys/class/drm/card0-HDMI-A-1/modes?

root@linaro-alip:~# cat /sys/class/drm/card0-HDMI-A-1/modes
1024x768
1920x1080
1920x1080
1280x720
1280x720
1024x768
800x600
800x600
640x480
640x480
640x480
720x400

and /var/log/Xorg.0.log claims to be using the correct mode. Xorg is getting the EDID from the monitor through the adapter but also gets the modes the adapter is capable of.
[ 30.178] (II) modeset(0): First detailed timing is preferred mode
[ 30.178] (II) modeset(0): redX: 0.631 redY: 0.347 greenX: 0.306 greenY: 0.590
[ 30.178] (II) modeset(0): blueX: 0.150 blueY: 0.088 whiteX: 0.313 whiteY: 0.329
[ 30.178] (II) modeset(0): Supported established timings:
[ 30.178] (II) modeset(0): 720x400@70Hz
[ 30.178] (II) modeset(0): 640x480@60Hz
[ 30.178] (II) modeset(0): 640x480@75Hz
[ 30.178] (II) modeset(0): 800x600@60Hz
[ 30.178] (II) modeset(0): 800x600@75Hz
[ 30.178] (II) modeset(0): 1024x768@60Hz
[ 30.178] (II) modeset(0): 1024x768@75Hz
[ 30.178] (II) modeset(0): Manufacturer’s mask: 0
[ 30.178] (II) modeset(0): Supported detailed timing:
[ 30.178] (II) modeset(0): clock: 65.0 MHz Image Size: 304 x 228 mm
[ 30.179] (II) modeset(0): h_active: 1024 h_sync: 1048 h_sync_end 1184 h_blank_end 1344 h_border: 0
[ 30.179] (II) modeset(0): v_active: 768 v_sync: 771 v_sync_end 777 v_blanking: 806 v_border: 0
[ 30.179] (II) modeset(0): Serial No: W4901546GABY
[ 30.179] (II) modeset(0): Monitor name: DELL 1505FP
[ 30.179] (II) modeset(0): Ranges: V min: 56 V max: 76 Hz, H min: 30 H max: 61 kHz, PixClock max 329 MHz
[ 30.179] (II) modeset(0): Unknown vendor-specific block 0
[ 30.179] (II) modeset(0): Supported detailed timing:
[ 30.179] (II) modeset(0): clock: 328.3 MHz Image Size: 0 x 0 mm
[ 30.179] (II) modeset(0): h_active: 1077 h_sync: 1077 h_sync_end 1105 h_blank_end 1077 h_border: 0
[ 30.179] (II) modeset(0): v_active: 132 v_sync: 132 v_sync_end 132 v_blanking: 231 v_border: 0
[ 30.179] (II) modeset(0): Number of EDID sections to follow: 1

After reading:
https://discuss.96boards.org/t/fastboot-installation-method-for-the-db820c-issue-with-the-hdmi/4457
I thought I would try the suggestions of ndec in his first post in that thread.
(I get the stream of MultiMedia1: ASoC: reports at the end of every boot-up also.)

root@linaro-alip:~# systemctl stop sddm
root@linaro-alip:~# [   33.287932]  MultiMedia1: ASoC: no backend DAIs enabled for MultiMedia1
[   33.288037]  MultiMedia1: ASoC: no backend DAIs enabled for MultiMedia1
[   33.296419]  MultiMedia1: ASoC: no backend DAIs enabled for MultiMedia1
[   33.300807]  MultiMedia1: ASoC: no backend DAIs enabled for MultiMedia1
[   33.308563]  MultiMedia1: ASoC: no backend DAIs enabled for MultiMedia1
[   33.315900] q6asm-dai q6asm-dai: DSP returned error[1]
repeated many times

root@linaro-alip:~# fbset

mode "1024x768"
    geometry 1024 768 1024 768 32
    timings 0 0 0 0 0 0 0
    accel true
    rgba 8/16,8/8,8/0,0/0
endmode

root@linaro-alip:~# fbset -t 15385 160 24 29 3 136 6
ioctl FBIOPUT_VSCREENINFO: Invalid argument
root@linaro-alip:~# fbset

mode "1024x768"
    geometry 1024 768 1024 768 32
    timings 0 0 0 0 0 0 0
    accel true
    rgba 8/16,8/8,8/0,0/0
endmode

root@linaro-alip:~# X&
[1] 2678
root@linaro-alip:~# 
X.Org X Server 1.19.6
Release Date: 2017-12-20
X Protocol Version 11, Revision 0
Build Operating System: Linux 4.9.0-5-arm64 aarch64 Debian
Current Operating System: Linux linaro-alip 4.14.0-qcomlt-arm64 #1 SMP PREEMPT Fri May 25 18:39:49 UTC 2018 aarch64
Kernel command line: root=/dev/disk/by-partlabel/rootfs rw rootwait console=tty0 console=ttyMSM0,115200n8 androidboot.bootdevice=624000.ufshc androidboot.verifiedbootstate=orange androidboot.veritymode=enforcing androidboot.serialno=683393bf androidboot.baseband=apq mdss_mdp.panel=0
Build Date: 26 January 2018  04:21:37PM
xorg-server 2:1.19.6-1 (https://www.debian.org/support) 
Current version of pixman: 0.34.0
	Before reporting problems, check http://wiki.x.org
	to make sure that you have the latest version.
Markers: (--) probed, (**) from config file, (==) default setting,
	(++) from command line, (!!) notice, (II) informational,
	(WW) warning, (EE) error, (NI) not implemented, (??) unknown.
(==) Log file: "/var/log/Xorg.0.log", Time: Sun Jul  8 10:01:37 2018
(==) Using system config directory "/usr/share/X11/xorg.conf.d"

root@linaro-alip:~# export DISPLAY=:0 

root@linaro-alip:~# xrandr
Screen 0: minimum 320 x 200, current 1024 x 768, maximum 65535 x 65535
HDMI-1 connected primary 1024x768+0+0 (normal left inverted right x axis y axis) 304mm x 228mm
   1024x768      60.00*+  75.03  
   1920x1080     60.00    59.94  
   1280x720      60.00    59.94  
   800x600       75.00    60.32  
   640x480       75.00    60.00    59.94  
   720x400       70.08  
root@linaro-alip:~# xrandr --output HDMI-1 --mode 1024x768
root@linaro-alip:~# 

The last command has no effect. My monitor will not do 1920x1080.
The fbset -t command applies the ‘First detailed timings’ from /var/log/Xorg.0.log (as shown in the post above) translated into fb.mode format.
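
For the record, the translation can be sketched in Python as below. It assumes the usual `fbset -t` argument order (pixclock in picoseconds, then left, right, upper and lower margins, then horizontal and vertical sync lengths); the function names are only illustrative.

```python
def fb_timings(clock_hz, h_active, h_sync, h_sync_end, h_total,
               v_active, v_sync, v_sync_end, v_total):
    """Convert X-style detailed timings to the seven `fbset -t` arguments."""
    pixclock_ps = round(1e12 / clock_hz)  # pixel period in picoseconds
    left = h_total - h_sync_end           # horizontal back porch
    right = h_sync - h_active             # horizontal front porch
    upper = v_total - v_sync_end          # vertical back porch
    lower = v_sync - v_active             # vertical front porch
    hslen = h_sync_end - h_sync           # horizontal sync length
    vslen = v_sync_end - v_sync           # vertical sync length
    return (pixclock_ps, left, right, upper, lower, hslen, vslen)


def refresh_hz(clock_hz, h_total, v_total):
    """Vertical refresh rate implied by the pixel clock and total frame size."""
    return clock_hz / (h_total * v_total)


# First detailed timing from Xorg.0.log (65.0 MHz, 1024x768)
timings = fb_timings(65_000_000, 1024, 1048, 1184, 1344, 768, 771, 777, 806)
print(*timings)  # 15385 160 24 29 3 136 6
print(round(refresh_hz(65_000_000, 1344, 806), 2))
```

The same numbers give a refresh rate of about 60 Hz, which agrees with the 1024x768 mode that xrandr marks as current.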

I guess this does not get me much further, but I thought I should post it for the record.

I tried another VGA monitor, with the adapter, that is capable of 1920x1080 with the same result. The monitor remains blank. So the problem would not appear to be resolution related. I also tested the adapter with my PC, in case it was defective, but it worked perfectly with this monitor, which is the one normally connected to my PC directly.

Clearly, at least some of the hdmi hardware on the board was working as the EDID is read and everything I can think of testing in the X system is working as expected, including receiving events from the keyboard and mouse, but perhaps there has been some subtle hardware failure.

At this point I discovered that there were old kernel branches in the git repository I had cloned, working/qualcomm/kernel.git - Qualcomm Landing Team kernel.
I have very little experience with git, so it took a while. :slight_smile:
So I did:
$ git checkout -t origin/release/db820c/qcomlt-4.11
and I got 4.11.12-30705-g0e82eeffbc29.
I built it, transferred the modules across and booted it with fastboot. The monitor burst into life with linaro-buster-alip-dragonboard-820c-221 visible for the first time since installing it.

So I suppose I now have to try to bisect the commit that caused the problem. I have never done this and, to complicate matters, I understand that there are a number of patches to the kernel, that are not upstream, which are necessary for it to work.
Are there any howtos that might help me?

I am really struggling here:

$ git bisect log
# bad: [4579f43d0d8868e2b4615ee3c64dc49fd54ddb79] wcn36xx: Add support for Factory Test Mode (FTM)
# good: [0e82eeffbc29de09d7ba237812692bd2326618cd] clk: qcom: Add ACD path to CPU clock driver for msm8996
git bisect start '4579f43d0d88' '0e82eeffbc29'
# skip: [a351e9b9fc24e982ec2f0e76379a49826036da12] Linux 4.11
git bisect skip a351e9b9fc24e982ec2f0e76379a49826036da12
# skip: [426b8eeb058a16c63759b3f48394601e1ed74e31] Merge tag 'rpmsg-v4.13' of git://github.com/andersson/remoteproc
git bisect skip 426b8eeb058a16c63759b3f48394601e1ed74e31
# skip: [b134165eadd6dd07c49f8db40b218185ca3130b0] Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/scottwood/linux into fixes
git bisect skip b134165eadd6dd07c49f8db40b218185ca3130b0
# skip: [89f23d51defcb94a5026d4b5da13faf4e1150a6f] uas: Add US_FL_IGNORE_RESIDUE for Initio Corporation INIC-3069
git bisect skip 89f23d51defcb94a5026d4b5da13faf4e1150a6f
# skip: [cb590700e04d4f59179c44f360217f5ad04ae262] scsi: qla2xxx: Fix recursive loop during target mode configuration for ISP25XX leaving system unresponsive
git bisect skip cb590700e04d4f59179c44f360217f5ad04ae262

Either the kernel fails to build or it will not boot and xbl loads XBLRamDump.
I am guessing that only a few commits have a chance of booting so is there a way to tell git bisect which ones they are?
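
One partial answer, as a hedged sketch: `git bisect run` treats an exit code of 125 from its test script as "this commit cannot be tested, skip it", 0 as good, and other codes from 1 to 127 as bad. The Python helper below only shows that mapping. The three inputs are hypothetical results that a wrapper script would have to obtain itself (build the kernel, boot it with fastboot, check the monitor), and the helper does nothing to make an unbootable commit bootable.

```python
GOOD, BAD, SKIP = 0, 1, 125  # exit codes understood by `git bisect run`


def verdict(build_ok: bool, boots: bool, hdmi_ok: bool) -> int:
    """Map the outcome of testing one commit to a `git bisect run` exit code."""
    if not build_ok or not boots:
        # The commit cannot be judged either way, so ask bisect to skip it.
        return SKIP
    return GOOD if hdmi_ok else BAD


# A commit that builds but drops into XBLRamDump instead of booting
print(verdict(build_ok=True, boots=False, hdmi_ok=False))  # 125
```

A wrapper would finish with `sys.exit(verdict(...))`. `git bisect skip` also accepts a commit range, which is quicker than skipping known-unbootable commits one at a time.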

Bisecting out-of-tree board support code is really difficult because the
bisect algorithm doesn’t know how to keep the board bootable in the
intermediate steps. For example if the bisect lands near v4.12 then
there is not enough code present to boot the DB820C.

If you see this through then you will end up needing to do a lot of
rebasing since you end up having to rebase the v4.11 code base (or its
underlying feature branches) into the kernel under test for each bisect
step. Whilst git rerere will help, this is still a huge amount of
detail-oriented work.

Runtime tests and source code inspection, especially of freedreno driver
and related clock code, might be quicker. For example, trying to
side-by-side compare the v4.11 and v4.14 kernel logs with drm_debug=0xff
might yield some clues.

Thanks for that, Daniel. It is what I was suspecting. I tried drm_debug=0xff with the v4.14 code before finding v4.11 but could see nothing suspicious. I guess I now must try it with v4.11 and compare.

I noticed the builds at:

https://snapshots.linaro.org/96boards/dragonboard820c/linaro/linux-integration/latest/

I thought I would see if this issue had been resolved, so I copied the modules across and used fastboot as below:

# fastboot -c "root=/dev/disk/by-partlabel/rootfs rw rootwait console=tty0 console=ttyMSM0,115200n8" boot boot-rootfs-linux-integration-v4.18-rc8-256-g554d918dd75d-100-dragonboard820c.img

It booted fine, but still no hdmi.

I also built it myself, but still no hdmi.

I also checked that 4.11.12-30705-g0e82eeffbc29 still works. It does.

Not sure it’s related, but I had a similar issue in the past with the DB820C when switching from 4.11 to 4.14. I did not take a deep look, but I noticed that the pixel clock generated for HDMI was different between these kernel versions… forcing the old pixel clock value fixed my problem.

If you are able to build 4.11 and 4.14, it would be interesting to add some debug output to the msm_hdmi_bridge_mode_set function in drivers/gpu/drm/msm/hdmi/hdmi_bridge.c, to print and compare the applied hdmi->pixclock value.

Thanks for that, Loic.
I had previously made some runs with drm_debug=0xff. Searching for pixclock in the logs I recorded returned the following:

v4.11.12
...
Jul 17 18:05:24 linaro-alip kernel: [    4.024807] [drm:msm_hdmi_audio_update] video: power_on=0, pixclock=65000000
...
Jul 17 18:05:24 linaro-alip kernel: [    4.025042] [drm:msm_hdmi_bridge_pre_enable] pixclock: 65000000
...
Jul 17 18:05:24 linaro-alip kernel: [    4.027001] [drm:msm_hdmi_audio_update] video: power_on=1, pixclock=65000000
...

v4.14.15
...
Jul 17 17:52:16 linaro-alip kernel: [   12.491828] [drm:msm_hdmi_audio_update [msm]] video: power_on=0, pixclock=65000000
...
Jul 17 17:52:16 linaro-alip kernel: [   12.495013] [drm:msm_hdmi_bridge_pre_enable [msm]] pixclock: 65000000
...
Jul 17 17:52:16 linaro-alip kernel: [   12.511671] [drm:msm_hdmi_audio_update [msm]] video: power_on=1, pixclock=65000000
...
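
The comparison can be repeated on complete logs with a few lines of Python. This is a minimal sketch: the regular expression only assumes the two spellings seen above (`pixclock=` and `pixclock:`), and the two strings stand in for the full log files.

```python
import re

# Matches both "pixclock=65000000" and "pixclock: 65000000"
PIXCLOCK_RE = re.compile(r"pixclock[:=]\s*(\d+)")


def pixclocks(log_text: str) -> set:
    """Return the set of pixclock values (in Hz) mentioned in a kernel log."""
    return {int(m.group(1)) for m in PIXCLOCK_RE.finditer(log_text)}


log_v411 = """\
[    4.024807] [drm:msm_hdmi_audio_update] video: power_on=0, pixclock=65000000
[    4.025042] [drm:msm_hdmi_bridge_pre_enable] pixclock: 65000000
"""
log_v414 = """\
[   12.491828] [drm:msm_hdmi_audio_update [msm]] video: power_on=0, pixclock=65000000
[   12.495013] [drm:msm_hdmi_bridge_pre_enable [msm]] pixclock: 65000000
"""

print(pixclocks(log_v411) == pixclocks(log_v414))  # True
```

Both kernels report the same requested value here. That says nothing about the clock the PLL actually generates, which these messages do not show.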

I might try putting an additional DBG in msm_hdmi_bridge_mode_set but the above results indicate to me that I probably would find nothing.

I seem to remember that some functionality in the drm code had been transferred to the Power Management code between these versions, so the problem could be there or, indeed, almost anywhere. That is why I was hoping to bisect it.

Many years ago, I had a problem with the firmware for a USB ADSL modem which started loading intermittently after a kernel update. Sometimes the firmware loaded at boot, sometimes it did not. I could find nothing wrong with the device driver, which was quite simple. The issue was deeper in the kernel, almost certainly a race condition. I never did solve that one before replacing the modem with a modem/router. :slight_smile:
This feels like that.

I have obtained some diagnostics from linux-integration-146 which should shine some light on this issue.
Using:
# fastboot boot boot-rootfs-linux-integration-v4.19-256-ge3361db33191-146-dragonboard820c.img

I get the following with the adapter plugged in:
$ pastebinit -i boot-4.19.0.log -b http://pastebin.com
https://pastebin.com/jGrAAbeg

and the following with the adapter removed:
$ pastebinit -i boot-4.19.0_nomonitor.log -b http://pastebin.com
https://pastebin.com/xy68NSiT

Should I report this in the Bugzilla?