Stuck in Initramfs

custom_board

#1

Hello,

I have built my custom board with external memories: 4GB emmc (ref. micron MTFC4GACAJCN-1M WT) and 1GB lpddr3 RAM (ref. MT52L256M32D1PF).

I can get my board in fastboot mode and can flash into emmc. I tried linaro 18.01, official and “custom” versions.
However, when booting in both cases, the board gets stuck in “initramfs”:

ALERT! /dev/disk/by-partlabel/rootfs does not exist. Dropping to a shell!
(initramfs)

See boot logs here (18.01) and here (custom 18.01).

Any idea what could be the cause of the wrong boot? I guess it is something with the configuration but I am a bit stuck right now. I will try to boot from SD card but couldn’t make it yet (no SD card was planned on my board).

Thank you for your help,


#2

Try using the shell provided by the initramfs to have a poke about in /dev/disk/by-partlabel and see what is found there!

Basically this directory is populated from the labels found in the GPT for each partition. It looks like the kernel has not been able to read the “rootfs” label properly.
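
For reference, a minimal shell sketch of that check, assuming the initramfs shell provides basename and readlink (BusyBox normally does). The function name list_partlabels is made up for illustration; it prints each label symlink and the device it points to, or reports that the directory is missing.

```shell
# Print "label -> target" for every symlink in the partlabel directory.
# The directory can be passed as an argument; it defaults to the udev path.
list_partlabels() {
    dir="${1:-/dev/disk/by-partlabel}"
    if [ ! -d "$dir" ]; then
        echo "missing: $dir" >&2
        return 1
    fi
    for link in "$dir"/*; do
        # Skip anything that is not a symlink (including an unmatched glob).
        [ -L "$link" ] || continue
        printf '%s -> %s\n' "$(basename "$link")" "$(readlink "$link")"
    done
}

# Usage from the (initramfs) prompt:
#   list_partlabels
```

If the function reports the directory as missing, udev never created the symlinks at all, which is a different failure from a single wrong label.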


#3

Yes, and by the way, if you know which partition is the rootfs one, you can change the kernel command line to point directly to the correct partition (e.g. root=/dev/mmcblk0p10 instead of root=/dev/disk/by-partlabel/rootfs) and see what happens. If I’m not wrong, the by-partlabel directory is populated by the initramfs/udev, which generates the symlinks. So the next step is to understand why the symlink is not generated (bad GPT, udev rules…).


#4

From the kernel log, we can see that the kernel does not detect any partition on the eMMC.

[ 2.077204] mmc0: new HS200 MMC card at address 0001
[ 2.077741] mmcblk0: mmc0:0001 Q2J54A 3.64 GiB
[ 2.078015] mmcblk0boot0: mmc0:0001 Q2J54A partition 1 2.00 MiB
[ 2.078271] mmcblk0boot1: mmc0:0001 Q2J54A partition 2 2.00 MiB
[ 2.078529] mmcblk0rpmb: mmc0:0001 Q2J54A partition 3 512 KiB

You might want to check how you ‘flash your eMMC’ and check for errors. You might need to tweak the gpt_*.bin files… From the boot log you seem to have an SD card, so I would recommend making an SD card image and booting Linux from SD. From there you can inspect the eMMC more carefully and try to understand why the flashing failed.
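
One way to confirm this from a shell, sketched under the assumption that /proc/partitions is available and awk is present: count the entries named like mmcblk0pN. count_partitions is a hypothetical helper, not part of any release.

```shell
# Count partitions of a disk (names like <disk>p1, <disk>p2, ...) in a
# /proc/partitions-style table. boot0/boot1/rpmb areas are not counted.
count_partitions() {
    disk="${1:-mmcblk0}"
    table="${2:-/proc/partitions}"
    awk -v d="$disk" '$4 ~ ("^" d "p[0-9]+$") { n++ } END { print n + 0 }' "$table"
}

# Usage:
#   count_partitions mmcblk0
```

A result of 0 would match the log above, where only mmcblk0boot0, mmcblk0boot1 and mmcblk0rpmb show up.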


#5

I blamed that on a logging error ;-). It looked like there is an interruption to the logs at this point and in the other logs (which have similar symptoms) the partitions detect OK.

@Vix: Any clue what’s happening with the missing bits of the first log?


#6

Hello, thank you for your answers;
@danielt @Loic I tried to change the root to point to root=/dev/mmcblk0p10, see the command line below:
(not sure how I am supposed to know that mmcblk0p10 is the rootfs, though)

mkbootimg --kernel Image.gz+dtb \
  --ramdisk initrd.img \
  --output boot-db410c.img \
  --pagesize 2048 \
  --base 0x80000000 \
  --cmdline "root=/dev/mmcblk0p10 rw rootwait console=ttyMSM0,115200n8"

But I still couldn’t get a correct boot (full logs here):

run-init: opening console: No such file or directory
No init found. Try passing init= bootarg.
(initramfs)

You can also notice in the logs that I have no “disk” directory in /dev nor “udev” directory in initramfs…

For information, here are a few details about what I do:

sudo …/qdl prog_emmc_firehose_8916.mbn rawprogram.xml patch.xml

  • then I can access fastboot mode and successfully flash my board with the 18.01 release, after which I get the problems described earlier

@ndec I couldn’t get a boot from SD yet but I am working on it.

About the missing bits in the logs: I notice that sometimes the boot is completely different; maybe rebooting from the above state causes more boot errors… The boot logs given above are those I get directly after the fastboot flash.

One last piece of information that could be interesting: on my board, power comes either from the USB port or from a battery. The logs above were captured when powering from the USB port (that could be an issue, as USB_HS_ID is then high). When I boot from battery (USB_HS_ID is low), I get the following logs. They are a bit different, but the error is the same.

Sorry if something is not clear, I am not a linux expert and might miss something basic; hence do not hesitate, any help is welcome. Thank you very much !


#7

Looks to me like you haven’t correctly populated your eMMC. Partition 10 is clearly there but it has the wrong partition label and it looks like its content is corrupt or missing. I also note your eMMC is half the size of the DB410C one. Are you sure you have shrunk everything down correctly?


#8

Hello @danielt

Thank you, that is probably the problem indeed.
I am not sure I shrank everything as needed. Could you please point out the different things that need to be modified, or give me some hints about how to do that exactly?

That would be very helpful to me
Thank you for your help,


#9

I think it’s just a matter of correcting the partition tables (gpt*
files) so that the rootfs partition doesn’t extend past the end of the
disk.

Note that the GPT gets written to the first and last sectors of the
disk. Depending on how you are populating your eMMC you might have
to update whatever it is that writes the GPT to the last sectors.

The rootfs image automatically resizes to fill the partition but you
should check that your partition is large enough to hold the raw image
(gunzip the .img.gz, then use simg2img to convert the sparse image to a
raw image, finally check the raw image fits into whatever you have
allocated for the rootfs).
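
A rough shell sketch of that last check, assuming the partition size is given in 512-byte sectors (the unit used by the sysfs size files). The file names in the comments are placeholders and image_fits is a hypothetical helper name.

```shell
# Preparation (run once; file names are placeholders):
#   gunzip -k rootfs.img.gz          # keeps the .gz, writes rootfs.img
#   simg2img rootfs.img rootfs.raw   # sparse image -> raw image

# Succeeds if the raw image is no larger than the partition.
# $1 = raw image file, $2 = partition size in 512-byte sectors.
image_fits() {
    image="$1"
    part_sectors="$2"
    image_bytes=$(( $(wc -c < "$image") ))
    part_bytes=$(( part_sectors * 512 ))
    if [ "$image_bytes" -le "$part_bytes" ]; then
        echo "fits: $image_bytes <= $part_bytes bytes"
    else
        echo "too big: $image_bytes > $part_bytes bytes"
        return 1
    fi
}

# Usage:
#   image_fits rootfs.raw <partition size in sectors>
```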


#10

I think my disk is large enough; for example when flashing 17.06.1 developer rootfs and boot.img, the raw image for rootfs is 2G, even with 4GB emmc I should have plenty of space…

I tried several methods for flashing: QFIL, QDL, fastboot. They all work but I always fail at boot with typical boot logs as here

Note that I sometimes get random errors in the logs, such as segment violations…

I am not sure how to correct the partition table files (gpt_backup0.bin and gpt_main0.bin, or gpt_both0.bin when flashing with fastboot). The only way I see is using db-boot-tools, which can generate gpt_both0.bin from a partitions.txt file. But even when trying this way, I couldn’t get a proper boot… yet

Note: weirdly, I couldn’t get a boot from SD card; it gets stuck at boot:

Error code 3038 at boot_elf_loader.c Line 334

as described here


#11

I am starting to think that it is a problem related to my hardware config.

I tried to boot from SD using U-Boot, following the instructions given here.

Even that way I couldn’t get a proper boot, it ends in kernel panic, see full logs (not always exactly the same, but always the same kind of errors)

So my bet is that something in the config makes the boot fail.
I guess I have to disable features in the boot config and test until I get a clean boot? Or is there something else I am missing?
Thank you,


#12

Firstly I’m a little surprised you are still working with 17.06… I doubt that’s causing your immediate problem but you should plan to upgrade.

On the hardware side it does look like the board cannot reliably access the SD card (perhaps try an older SD card without UHS-1 support); however, there is no equivalent evidence that the eMMC is failing for any reason other than that its contents are badly authored.

It is clear you’ve not got the eMMC partition table right:

[    2.242598] Alternate GPT is invalid, using primary GPT.

The message above indicates that gpt_backup0.bin is either incorrect or has not been written to the right part of the disk.

As I mentioned before, you should use the prompt on the initramfs to explore the system to determine what is happening. The contents of directories such as /dev/disk/by-partlabel are useful to see if the kernel detected the partition labels correctly. Similarly, the contents of /sys/devices/platform/soc/7824900.sdhci/mmc_host/mmc0/mmc0:0001/block/mmcblk0 are also likely to be interesting. From here (for example) you can examine mmcblk0p10/size to see if you did manage to update the size correctly. You can also experimentally mount filesystems, etc.

BTW the other useful trick to see how badly broken things are is to create an initramfs using buildroot and replace the one in the boot image. This gives you a tiny busybox rootfs that allows you to explore more easily and use more advanced tools (such as e2fsck or memory tests) to explore what does and doesn’t work on your board.


#13

I am now working with 18.01, but while debugging I also tried previous versions.

My SD has been soldered by hand, and as it is a bit dirty, this could explain why the board cannot access it reliably. I’ll try to improve that.

In the previous post, I used a custom GPT that was given to me (this explains the GPT error message).
A bit of a summary could be useful:

  1. Flashing with original GPT

Here, I remove the SD, and flash the emmc with fastboot, using original GPT from dragonboard410c_bootloader_emmc_linux-88, either:

  • with original kernel command line root=/dev/disk/by-partlabel/rootfs (boot logs)
  • or with modified kernel command line root=/dev/mmcblk0p10 (boot logs)
    At boot:
    -> I have no /disk in /dev
    -> the size of mmcblk0p10 in /sys/devices/platform/soc/7824900.sdhci/mmc_host/mmc0/mmc0:0001/block/mmcblk0 is 7237567 sectors, which gives 3.7GB if I’m right.
  2. Flashing with custom GPT

As I have no idea (yet) how to modify/customize the gpt_**.bin files, I tried a custom GPT that I was given; it is named emmc3o5_gpt.bin and I flashed it with QFIL.
Boot still fails in initramfs (logs for reference)

  3. Flashing with my custom GPT

This is what I want to do, but I still have to figure out how to modify the gpt_main/gpt_backup.bin … I have never touched hex binary files and I am having a hard time figuring out how to do that correctly!

I will keep investigating, using the hints you have given as well, and hope to figure this out! Thank you
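
For the size reported under item 1, the conversion can be sketched in shell as below. It assumes the sysfs size value is in 512-byte sectors; the helper names are made up for illustration.

```shell
# Convert a sysfs "size" value (512-byte sectors) to bytes and to MiB.
sectors_to_bytes() { echo $(( $1 * 512 )); }
sectors_to_mib()   { echo $(( $1 / 2048 )); }   # 2048 sectors = 1 MiB

# Usage with the value quoted above:
#   sectors_to_bytes 7237567   # 3705634304 bytes, i.e. about 3.7 GB (decimal)
#   sectors_to_mib 7237567     # 3533 MiB (rounded down)
```

Note that the 3.7 figure is in decimal gigabytes, while the kernel log reports the whole eMMC as 3.64 GiB in binary units, so the two numbers are not directly comparable.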


#14

I had a deeper look at the GPT.

First, when flashing with QDL, the secondary GPT is written “automatically” to the last sectors, as seen in the rawprogram.xml by the line:

start_sector="NUM_DISK_SECTORS-33

So this should not be a problem with my 4GB emmc.
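
As a sanity check on that offset, here is the arithmetic as a small shell sketch. It assumes 512-byte sectors and the usual GPT layout of 128 partition entries of 128 bytes each, followed by the backup header in the last sector. The function names are hypothetical, and the sector count in the usage line is only an example value, not read from this board.

```shell
# Number of sectors occupied by the backup GPT: entry array plus one header.
# Defaults: 128 entries, 128 bytes per entry, 512-byte sectors.
gpt_backup_sectors() {
    entries="${1:-128}"
    entry_size="${2:-128}"
    sector="${3:-512}"
    echo $(( (entries * entry_size + sector - 1) / sector + 1 ))
}

# First sector of the backup GPT for a disk with $1 sectors in total.
backup_gpt_start() {
    echo $(( $1 - $(gpt_backup_sectors) ))
}

# Usage (total sector count can be read from /sys/block/mmcblk0/size):
#   gpt_backup_sectors         # 33
#   backup_gpt_start 7634944   # 7634911 (example disk size only)
```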

Then, I tried to update the last-sector address of the rootfs partition in gpt_backup0.bin and gpt_main0.bin, but this does not result in a correct boot yet.

When I type blkid I get the following results:

(initramfs) blkid
/dev/mmcblk0p10: LABEL="rootfs" UUID="b1026c73-62e0-42fc-a94f-0a02dc1ccef4" TYPE="ext4" PARTLABEL="rootfs" PARTUUID="30d9c2ab-3507-a876-eaa8-ec59e5048b20"
/dev/mmcblk0: PTUUID="98101b32-bbe2-4bf2-a06e-2bb33d000c20" PTTYPE="gpt"
/dev/mmcblk0p1: PARTLABEL="cdt" PARTUUID="4cdc17cb-23fd-55ef-af7e-44760994d255"
/dev/mmcblk0p2: PARTLABEL="sbl1" PARTUUID="39840824-d4da-2c14-5776-7f3d7088ba88"
/dev/mmcblk0p3: PARTLABEL="rpm" PARTUUID="6a563696-ad51-02ea-ce78-bbb555315750"
/dev/mmcblk0p4: PARTLABEL="tz" PARTUUID="80cca449-efd0-9190-e451-2a815991aa35"
/dev/mmcblk0p5: PARTLABEL="hyp" PARTUUID="c048e013-801a-a8cf-f6fe-dc679e035355"
/dev/mmcblk0p6: PARTLABEL="sec" PARTUUID="b339503b-ebf9-7913-5291-0daf5ab3f87b"
/dev/mmcblk0p7: PARTLABEL="aboot" PARTUUID="5db473d0-3cce-598b-d028-c947638fb5f7"
/dev/mmcblk0p8: PARTLABEL="boot" PARTUUID="10019f1b-9b0f-5bb2-1a5a-e47af678998a"
/dev/mmcblk0p9: PARTLABEL="devinfo" PARTUUID="e97897fc-ba7d-2902-b148-7c90c646bdf3"
(initramfs)

The partuuid for rootfs is correct (same as in gpt file).

Not sure if this can help, but I also get this error at boot, which seems strange as I have no SD device plugged in:

Assertion 'path_startswith(device->syspath, "/sys/")' failed at …/src/libsystemd/sd-device/sd-device.c:681, function sd_device_get_syspath(). Aborting.


#15

I still do not understand why the symlinks are not generated (no /disk in /dev); from @Loic’s previous post it can be either a bad GPT or the udev rules. How can I check whether those udev rules are correct?


#16

If you are using the initramfs from DB410C then you can be pretty confident the udev rules are correct.

Also note that the “sd” in “sd-device” stands for systemd and does not necessarily have anything to do with an SD card.


#17

Thanks for the help.

When comparing logs between my custom board and a dev board, my logs start to fail after the error mentioned above:

Assertion 'path_startswith(device->syspath, "/sys/")' failed at …/src/libsystemd/sd-device/sd-device.c:681, function sd_device_get_syspath(). Aborting.
Aborted

Any idea what this error message is about? I have only found this bug report…
It seems linked to “systemd-gpt-auto-generator”


#18

I am giving it a try. From what I understood, I need to generate rootfs.cpio with buildroot, use it as the ramdisk when generating a new boot.img that I can flash with fastboot, and then use the new tools in the initramfs to check my emmc, etc…

Is that correct?
What do you suggest to test the emmc?

Edit: OK, I managed to set up buildroot and can now run more tests.


#19

Using buildroot console, I tried fsck on /dev/mmcblk0p10 but got an error

fsck -t ext4 /dev/mmcblk0p10
fsck from util-linux 2.32
fsck: error 2 (No such file or directory) while executing fsck.ext4 for /dev/mmcblk0p10

I can use mmc tool to inspect my emmc but this doesn’t help my boot.

May I ask how you can see that the partition label is wrong?

I could also “manually” mount the rootfs and then access it. But no correct full boot yet.

mount -t ext4 /dev/mmcblk0p10 /root
[ 238.639330] EXT4-fs (mmcblk0p10): mounted filesystem with ordered data mode. Opts: (null)

Also, for the record, I tried to boot without an initrd (empty file), and with the command line modified to root=/dev/mmcblk0p10. I then get a more repeatable boot error: a systemd-gpt-auto-generator error, see logs.

Lastly, could my troubles be due to the fact that I use emmc5.0 (and not 4.5)? I do not think so, though…


#20

hi,

the gpt_both.bin file is generated from

https://git.linaro.org/landing-teams/working/qualcomm/db-boot-tools.git/

See:

gpt_both.bin is used when flashing the GPT with ‘fastboot flash partition’. When you are using QDL, you are using gpt_main.bin and gpt_backup.bin, and these files are released as binaries, and you can’t rebuild them yourself. So if you want to generate your own GPT, you have to use gpt_both.bin and fastboot to flash it.

The missing links in /dev could be related to this error in the boot:

[ 2.277884] systemd-udevd[1511]: unhandled level 2 translation fault (11) at 0xaaaaec000000, esr 0x92000006, in libc-2.26.so[ffff85483000+141000]

Things look really odd in that thread… I am tempted to say that the root cause is something different… like an underpowered CPU/board… Have you ever found a build that worked well (even a QCOM Android build)?

In any case, the best thing to do right now, is to boot into a ramdisk and stress test the system and the eMMC from the initrd (create/modify GPT on eMMC, read/write stress tests, … ).
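
A very small write/read-back check along those lines could look like the sketch below. It assumes a filesystem on the eMMC is already mounted somewhere writable and that dd, cmp and mktemp exist in the ramdisk; rw_check is a hypothetical helper, and this is a smoke test rather than a proper stress test.

```shell
# Write random data to a directory, read it back and compare.
# $1 = directory on the device under test, $2 = size in KiB (default 1024).
rw_check() {
    dir="$1"
    size_kib="${2:-1024}"
    src=$(mktemp) || return 1          # reference copy, kept outside $dir
    dst="$dir/rw_check.$$"
    dd if=/dev/urandom of="$src" bs=1024 count="$size_kib" 2>/dev/null
    cp "$src" "$dst" && sync
    # Note: the read-back may be served from the page cache. As root,
    # "echo 3 > /proc/sys/vm/drop_caches" before cmp forces device reads.
    if cmp -s "$src" "$dst"; then
        echo "ok"
        status=0
    else
        echo "mismatch"
        status=1
    fi
    rm -f "$src" "$dst"
    return "$status"
}

# Usage, with the rootfs partition mounted on /root:
#   rw_check /root 4096
```

Running it in a loop with different sizes gives a first idea of whether the failures are random (pointing to power or signal integrity) or repeatable.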

Oh… and just now, I noticed your last sentence about emmc5.0… I don’t think it has been tested yet… see eMMC 4.5 vs eMMC 5.0