Hikey 620 shutdown after heating up

96boards

#1

Hi,

  • I am using a HiKey 620 with U-Boot. The board gets very hot and goes into thermal shutdown after just five minutes.

  • I tried both the 4.4 and 4.9 kernels, and the same issue occurs with each.

Thanks


#2

Hi @ivid,

Are you using Debian?

If you are using Android, I think it should not have this kind of issue, because the Android 4.4 and 4.9 kernels use the “power_allocator” governor by default.

If you are using Debian, you have two options to fix this issue:


#3

@leo-yan

Thank you very much.
I am using a normal Image and dtb on a BusyBox-based rootfs. I am not using Android.

I will check as you suggested and let you know the results.
Is there any way to monitor the temperature variations in the kernel so that I can compare the results?


#4

Hi @ivid, you are welcome.

Yes, Jorge previously gave a method to monitor the temperature:

watch -n1 cat /sys/class/thermal/thermal_zone0/temp

I usually use the command below to monitor the temperature:

while true; do cat /sys/class/thermal/thermal_zone0/temp; sleep 1; done
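
To compare results between runs, the readings can also be written to a file with timestamps. Below is a minimal sketch, assuming the sysfs value is reported in millidegrees Celsius. The function names read_temp_c and log_temp are hypothetical helpers made up for this example, not existing tools.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of any kernel or distro tooling.

# Print one thermal zone reading in whole degrees Celsius.
# $1: path to a sysfs "temp" file (value assumed to be in millidegrees C).
read_temp_c() {
    temp_file=${1:-/sys/class/thermal/thermal_zone0/temp}
    milli=$(cat "$temp_file") || return 1
    echo $((milli / 1000))
}

# Append "<epoch seconds>,<degrees C>" lines to a log file.
# $1: log file, $2: number of samples, $3: interval in seconds (default 1),
# $4: sysfs temp file (default thermal_zone0).
log_temp() {
    log_file=$1
    samples=$2
    interval=${3:-1}
    temp_file=${4:-/sys/class/thermal/thermal_zone0/temp}
    i=0
    while [ "$i" -lt "$samples" ]; do
        echo "$(date +%s),$(read_temp_c "$temp_file")" >> "$log_file"
        i=$((i + 1))
        # No need to wait after the final sample.
        if [ "$i" -lt "$samples" ]; then
            sleep "$interval"
        fi
    done
}
```

For example, `log_temp /tmp/step_wise.csv 300` records about five minutes of readings. Repeating it with a different file name after changing the governor gives two logs that can be compared side by side.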


#5

@leo-yan Thank you for the info.

I tried two approaches as suggested by you and both did not work.

  1. First, I applied your patch “cpufreq: hisilicon: add acpu driver”, since it was not present in my 4.9 kernel source, and compiled it as built-in.
  • Then, since I was using the step_wise governor, I switched the thermal governor to power_allocator as below:

ivid@ivid:~# cat /sys/class/thermal/thermal_zone0/policy
step_wise
echo "power_allocator" > /sys/class/thermal/thermal_zone0/policy

  • I watched the temperature, but it kept rising to 100 degrees and then the board rebooted.

  • My config is below:
    > CONFIG_THERMAL=y
    > CONFIG_THERMAL_HWMON=y
    > CONFIG_THERMAL_OF=y
    > # CONFIG_THERMAL_WRITABLE_TRIPS is not set
    > CONFIG_THERMAL_DEFAULT_GOV_STEP_WISE=y
    > # CONFIG_THERMAL_DEFAULT_GOV_FAIR_SHARE is not set
    > # CONFIG_THERMAL_DEFAULT_GOV_USER_SPACE is not set
    > # CONFIG_THERMAL_DEFAULT_GOV_POWER_ALLOCATOR is not set
    > # CONFIG_THERMAL_GOV_FAIR_SHARE is not set
    > CONFIG_THERMAL_GOV_STEP_WISE=y
    > # CONFIG_THERMAL_GOV_BANG_BANG is not set
    > # CONFIG_THERMAL_GOV_USER_SPACE is not set
    > CONFIG_THERMAL_GOV_POWER_ALLOCATOR=y
    > CONFIG_CPU_THERMAL=y
    > # CONFIG_CLOCK_THERMAL is not set
    > CONFIG_THERMAL_EMULATION=y
    > CONFIG_HISI_THERMAL=y
    > # CONFIG_HI3660_THERMAL is not set
    > # CONFIG_MAX77620_THERMAL is not set
    > # CONFIG_QORIQ_THERMAL is not set
    > # CONFIG_ROCKCHIP_THERMAL is not set
    > # CONFIG_RCAR_THERMAL is not set
    > # CONFIG_ARMADA_THERMAL is not set
    > CONFIG_MTK_THERMAL=y
    > CONFIG_EXYNOS_THERMAL=y
    > # CONFIG_GENERIC_ADC_THERMAL is not set
    > CONFIG_ACPI_THERMAL=y
    > CONFIG_CPU_FREQ=y
    > CONFIG_CPU_FREQ_STAT=y
    > CONFIG_CPU_FREQ_STAT_DETAILS=y
    > CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE=y
    > # CONFIG_CPU_FREQ_DEFAULT_GOV_POWERSAVE is not set
    > # CONFIG_CPU_FREQ_DEFAULT_GOV_USERSPACE is not set
    > # CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND is not set
    > # CONFIG_CPU_FREQ_DEFAULT_GOV_CONSERVATIVE is not set
    > # CONFIG_CPU_FREQ_DEFAULT_GOV_SCHEDUTIL is not set
    > CONFIG_CPU_FREQ_GOV_PERFORMANCE=y
    > # CONFIG_CPU_FREQ_GOV_POWERSAVE is not set
    > # CONFIG_CPU_FREQ_GOV_USERSPACE is not set
    > # CONFIG_CPU_FREQ_GOV_ONDEMAND is not set
    > # CONFIG_CPU_FREQ_GOV_CONSERVATIVE is not set
    > # CONFIG_CPU_FREQ_GOV_SCHEDUTIL is not set

  • I got the backtrace below in dmesg:

[ 0.691544] sysfs: cannot create duplicate filename '/devices/platform/cpufreq-dt'
[ 0.691706] WARNING: CPU: 6 PID: 1 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x5c/0x78
[ 0.694593] [] hisi_acpu_cpufreq_driver_init+0x4c/0x60
[ 0.694900] kobject_add_internal failed for cpufreq-dt with -EEXIST, don't try to register things with the same name in the same directory.
[ 0.695091] WARNING: CPU: 6 PID: 1 at lib/kobject.c:240 kobject_add_internal+0x210/0x290
[ 0.697877] [] hisi_acpu_cpufreq_driver_init+0x4c/0x60
[ 0.786476] ledtrig-cpu: registered to indicate activity on CPUs

  2. I added the fix patch for the step_wise governor, but the result is the same.

Could you please check the above observations?

Thanks
Ivid


#6

I suspect you are missing two other dependency drivers: the mailbox driver and the stub clock driver. These two drivers are fundamental and are used by the cpufreq driver to change the CPU frequency. Could you check whether they exist in your code base?

https://android.googlesource.com/kernel/hikey-linaro/+/android-hikey-linaro-4.9/drivers/clk/hisilicon/clk-hi6220-stub.c
https://android.googlesource.com/kernel/hikey-linaro/+/android-hikey-linaro-4.9/drivers/mailbox/hi6220-mailbox.c

DT binding:
https://android.googlesource.com/kernel/hikey-linaro/+/android-hikey-linaro-4.9/arch/arm64/boot/dts/hisilicon/hi6220.dtsi#280
https://android.googlesource.com/kernel/hikey-linaro/+/android-hikey-linaro-4.9/arch/arm64/boot/dts/hisilicon/hi6220.dtsi#793


#7

@leo-yan I checked for the “mailbox driver and stub clock driver” as you suggested, and I already have them in my source. But I still face the thermal issue, and the board goes into shutdown within just 5 minutes.

For your information, there are xenomai-3 arm64 patches applied.

  • I have ported ipipe and xenomai patches. Are there any hikey specific patches to port? I could not find them in xenomai-jro

Dmesg log:

[ 0.670433] i2c /dev entries driver
[ 0.676729] hisi_thermal f7030700.tsensor: THERMAL ALARM: T > 0
[ 0.676801] hisi_thermal f7030700.tsensor: failed to register sensor id 0: -19
[ 0.676879] hisi_thermal f7030700.tsensor: failed to register thermal sensor: -19
[ 0.676976] hisi_thermal f7030700.tsensor: failed to register sensor id 1: -19
[ 0.677057] hisi_thermal f7030700.tsensor: failed to register thermal sensor: -19
[ 0.692252] hisi_thermal f7030700.tsensor: failed to register sensor id 3: -19
[ 0.692327] hisi_thermal f7030700.tsensor: failed to register thermal sensor: -19
[ 0.703694] sysfs: cannot create duplicate filename '/devices/platform/cpufreq-dt'
[ 0.703794] ------------[ cut here ]------------
[ 0.703857] WARNING: CPU: 5 PID: 1 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x5c/0x78
[ 0.704138] I-pipe domain: Linux
[ 0.704175] task: ffff800005f90000 task.stack: ffff800005f98000
[ 0.704237] PC is at sysfs_warn_dup+0x5c/0x78
[ 0.704286] LR is at sysfs_warn_dup+0x5c/0x78
[ 0.704334] pc : [] lr : [] pstate: 60000045
[ 0.704408] sp : ffff800005f9bb40
[ 0.704445] x29: ffff800005f9bb40 x28: 0000000000000000
[ 0.704506] x27: ffff000008e038a8 x26: ffff000008d95368
[ 0.704565] x25: ffff000008d343b8 x24: ffff000008d40450
[ 0.704625] x23: ffff800033089aa8 x22: 0000000000000000
[ 0.704686] x21: ffff800005f2d780 x20: ffff8000331d0900
[ 0.704747] x19: ffff800033d60000 x18: 0000000000000010
[ 0.704808] x17: 0000000000000000 x16: 0000000000000000
[ 0.704868] x15: ffff000088f30c27 x14: 72667570632f6d72
[ 0.704931] x13: 6f6674616c702f73 x12: 6563697665642f27
[ 0.704993] x11: 20656d616e656c69 x10: 00000000000000e2
[ 0.705058] x9 : 7075642065746165 x8 : 726320746f6e6e61
[ 0.705125] x7 : 63203a7366737973 x6 : ffff000008f30c7d
[ 0.705191] x5 : ffff000008514b80 x4 : 0000000000000001
[ 0.705273] x3 : ffff800035f788e8 x2 : ffff000008e61f30
[ 0.705332] x1 : 0000000000000046 x0 : 0000000000000046

[ 0.705411] ---[ end trace 8a8129406cdb171a ]---
[ 0.705457] Call trace:
[ 0.705487] Exception stack(0xffff800005f9b960 to 0xffff800005f9ba90)
[ 0.705555] b960: ffff800033d60000 0001000000000000 ffff800005f9bb40 ffff00000827d864
[ 0.705631] b980: 0000000060000045 0000000000000000 0000000000000006 0000000000000000
[ 0.705708] b9a0: 0000000000000000 0000000000000056 0000000100000046 ffff000008f35050
[ 0.705787] b9c0: ffff800005f9ba60 ffff000008100b78 ffff800033d60000 ffff8000331d0900
[ 0.705865] b9e0: ffff800005f2d780 0000000000000000 ffff800033089aa8 ffff000008d40450
[ 0.705942] ba00: ffff000008d343b8 ffff000008d95368 0000000000000046 0000000000000046
[ 0.706022] ba20: ffff000008e61f30 ffff800035f788e8 0000000000000001 ffff000008514b80
[ 0.706100] ba40: ffff000008f30c7d 63203a7366737973 726320746f6e6e61 7075642065746165
[ 0.706179] ba60: 00000000000000e2 20656d616e656c69 6563697665642f27 6f6674616c702f73
[ 0.706256] ba80: 72667570632f6d72 ffff000088f30c27
[ 0.706308] [] sysfs_warn_dup+0x5c/0x78
[ 0.706366] [] sysfs_create_dir_ns+0x9c/0xb0
[ 0.706429] [] kobject_add_internal+0x90/0x290
[ 0.706491] [] kobject_add+0x84/0xd0
[ 0.706550] [] device_add+0xcc/0x568
[ 0.706605] [] platform_device_add+0x180/0x240
[ 0.706667] [] platform_device_register_full+0xe8/0x110
[ 0.706740] [] hisi_acpu_cpufreq_driver_init+0x4c/0x60
[ 0.706811] [] do_one_initcall+0x38/0x128
[ 0.706871] [] kernel_init_freeable+0x1ac/0x24c
[ 0.706937] [] kernel_init+0x10/0x100
[ 0.706992] [] ret_from_fork+0x14/0x30
[ 0.707049] kobject_add_internal failed for cpufreq-dt with -EEXIST, don't try to register things with the same name in the same directory.
[ 0.707187] ------------[ cut here ]------------
[ 0.707239] WARNING: CPU: 5 PID: 1 at lib/kobject.c:240 kobject_add_internal+0x210/0x290
[ 0.707528] I-pipe domain: Linux
[ 0.707564] task: ffff800005f90000 task.stack: ffff800005f98000
[ 0.707625] PC is at kobject_add_internal+0x210/0x290
[ 0.707681] LR is at kobject_add_internal+0x210/0x290
[ 0.707735] pc : [] lr : [] pstate: 60000045
[ 0.707808] sp : ffff800005f9bba0
[ 0.707845] x29: ffff800005f9bba0 x28: 0000000000000000
[ 0.707904] x27: ffff000008e038a8 x26: ffff000008d95368
[ 0.707964] x25: ffff000008d343b8 x24: ffff000008d40450
[ 0.708024] x23: ffff800033089aa8 x22: 0000000000000000
[ 0.708084] x21: ffff000008ee7550 x20: 00000000ffffffef
[ 0.708142] x19: ffff800033089820 x18: 0000000000000010
[ 0.708201] x17: 0000000000000000 x16: 0000000000000000
[ 0.708261] x15: ffff000088f30c27 x14: 747369676572206f
[ 0.708321] x13: 7420797274207427 x12: 6e6f64202c545349
[ 0.708380] x11: 5845452d20687469 x10: 0000000000000118
[ 0.708440] x9 : 6675706320726f66 x8 : 6f74636572696420
[ 0.708499] x7 : 656d617320656874 x6 : ffff000008f30cb6
[ 0.708561] x5 : ffff000008514b80 x4 : 0000000000000001
[ 0.708623] x3 : ffff800035f788e8 x2 : ffff000008e61f30
[ 0.708684] x1 : 000000000000007f x0 : 000000000000007f

[ 0.708764] ---[ end trace 8a8129406cdb171b ]---
[ 0.708816] Call trace:
[ 0.708846] Exception stack(0xffff800005f9b9c0 to 0xffff800005f9baf0)
[ 0.708917] b9c0: ffff800033089820 0001000000000000 ffff800005f9bba0 ffff0000083a3a08
[ 0.709006] b9e0: 0000000060000045 0000000000000000 0000000000000002 0000000000000000
[ 0.709094] ba00: 0000000000000000 000000000000008f 000000010000007f ffff000008f35050
[ 0.709184] ba20: ffff800005f9bac0 ffff000008100b78 ffff800033089820 00000000ffffffef
[ 0.709286] ba40: ffff000008ee7550 0000000000000000 ffff800033089aa8 ffff000008d40450
[ 0.709366] ba60: ffff000008d343b8 ffff000008d95368 000000000000007f 000000000000007f
[ 0.709445] ba80: ffff000008e61f30 ffff800035f788e8 0000000000000001 ffff000008514b80
[ 0.709527] baa0: ffff000008f30cb6 656d617320656874 6f74636572696420 6675706320726f66
[ 0.709606] bac0: 0000000000000118 5845452d20687469 6e6f64202c545349 7420797274207427
[ 0.709684] bae0: 747369676572206f ffff000088f30c27
[ 0.709736] [] kobject_add_internal+0x210/0x290
[ 0.709802] [] kobject_add+0x84/0xd0
[ 0.709857] [] device_add+0xcc/0x568
[ 0.709912] [] platform_device_add+0x180/0x240
[ 0.709977] [] platform_device_register_full+0xe8/0x110
[ 0.710047] [] hisi_acpu_cpufreq_driver_init+0x4c/0x60
[ 0.710116] [] do_one_initcall+0x38/0x128
[ 0.710175] [] kernel_init_freeable+0x1ac/0x24c
[ 0.710241] [] kernel_init+0x10/0x100
[ 0.710296] [] ret_from_fork+0x14/0x30
[ 0.711387] sdhci: Secure Digital Host Controller Interface driver
[ 0.711451] sdhci: Copyright(c) Pierre Ossman
[ 0.711693] Synopsys Designware Multimedia Card Interface Driver

Thanks
Ivid


#8

@ldts

Any suggestions on this?


#9

Hi @ivid, could you check whether the CPUFreq driver has been enabled?

root@linaro-developer:~# ls /sys/devices/system/cpu/cpu0/cpufreq
affected_cpus cpuinfo_max_freq cpuinfo_transition_latency scaling_available_frequencies scaling_cur_freq scaling_governor scaling_min_freq
cpuinfo_cur_freq cpuinfo_min_freq related_cpus scaling_available_governors scaling_driver scaling_max_freq scaling_setspeed
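
The same check can be scripted. This is only a sketch: cpufreq_status is a hypothetical helper name, and it reads just the scaling_driver, scaling_governor and scaling_cur_freq attributes from the listing above.

```shell
#!/bin/sh
# Hypothetical helper: report whether cpufreq is active for one CPU.
# $1: CPU sysfs directory (default /sys/devices/system/cpu/cpu0).
cpufreq_status() {
    cpu_dir=${1:-/sys/devices/system/cpu/cpu0}
    if [ ! -d "$cpu_dir/cpufreq" ]; then
        echo "cpufreq: not enabled (no $cpu_dir/cpufreq)"
        return 1
    fi
    # Print the attributes that identify the driver and its current state.
    for attr in scaling_driver scaling_governor scaling_cur_freq; do
        if [ -r "$cpu_dir/cpufreq/$attr" ]; then
            echo "$attr: $(cat "$cpu_dir/cpufreq/$attr")"
        fi
    done
}
```

Running `cpufreq_status` with no argument checks cpu0. A non-zero exit status means the cpufreq folder is missing, so no CPUFreq driver has registered for that CPU.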


#10

I checked the 4.9 kernel: you don’t need to apply the patch “cpufreq: hisilicon: add acpu driver”. The 4.9 kernel already added “hi6220” support in the file drivers/cpufreq/cpufreq-dt-platdev.c, so the CPUFreq driver should be enabled by default. Please check whether the folder “/sys/devices/system/cpu/cpu0/cpufreq” exists.

From the kernel configuration you posted, I can see you have enabled “CONFIG_CPU_THERMAL”, which lets thermal management cap CPU capacity when the SoC temperature crosses a trip point:

CONFIG_CPU_THERMAL=y

Could you check the folder “/sys/class/thermal/thermal_zone0/”? If you see the node “/sys/class/thermal/thermal_zone0/cdev0/” there, it means the CPU cooling device was bound successfully.

root@linaro-developer:~# ls /sys/class/thermal/thermal_zone0/
available_policies cdev0_weight k_d k_pu policy subsystem trip_point_0_hyst trip_point_1_hyst type
cdev0 emul_temp k_i mode power sustainable_power trip_point_0_temp trip_point_1_temp uevent
cdev0_trip_point integral_cutoff k_po offset slope temp trip_point_0_type trip_point_1_type
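
A small sketch of that check follows. The helper name thermal_zone_report is hypothetical, and the sketch assumes each bound cooling device appears as a cdevN directory (or a symlink to one) containing a type attribute.

```shell
#!/bin/sh
# Hypothetical helper: summarise one thermal zone's governor and cooling devices.
# $1: thermal zone sysfs directory (default thermal_zone0).
# Returns non-zero when the zone is missing or no cooling device is bound.
thermal_zone_report() {
    zone=${1:-/sys/class/thermal/thermal_zone0}
    if [ ! -d "$zone" ]; then
        echo "no such thermal zone: $zone"
        return 1
    fi
    if [ -r "$zone/policy" ]; then
        echo "policy: $(cat "$zone/policy")"
    fi
    if [ -r "$zone/available_policies" ]; then
        echo "available_policies: $(cat "$zone/available_policies")"
    fi
    bound=0
    for cdev in "$zone"/cdev[0-9]*; do
        # Skip plain attribute files such as cdev0_weight and cdev0_trip_point.
        [ -d "$cdev" ] || continue
        bound=$((bound + 1))
        if [ -r "$cdev/type" ]; then
            echo "$(basename "$cdev"): $(cat "$cdev/type")"
        fi
    done
    echo "cooling devices bound: $bound"
    [ "$bound" -gt 0 ]
}
```

If it reports “cooling devices bound: 0”, the thermal governor has nothing to throttle, whichever policy is selected.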


#11

Hi @leo-yan

Thank you very much for your help.

I was able to solve the issue yesterday after enabling the cpufreq driver. I saw that “ARM_HISI_ACPU_CPUFREQ” has been removed by 3920be471ce7f “cpufreq: hisilicon: Use generic platdev driver”, so I was getting a backtrace when trying hisi-acpu-cpufreq.

The “/sys/devices/system/cpu/cpu0/cpufreq” folder you asked about did not exist.

The solution was to enable “CONFIG_STUB_CLK_HI6220=y”. It had been disabled, and the cpufreq driver depends on it. I did not need to set power_allocator as the thermal governor. I still use step_wise, and the temperature does not go above 80 degrees at all.
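
For anyone hitting the same problem, here is a minimal sketch for checking a kernel .config for such options. The helper name check_kconfig is hypothetical. Only CONFIG_STUB_CLK_HI6220 is confirmed in this thread; any other option name passed to it (for example for the mailbox driver) is an assumption to verify against your own tree.

```shell
#!/bin/sh
# Hypothetical helper: report whether Kconfig options are enabled in a .config.
# $1: config file; remaining arguments: option names, e.g. CONFIG_STUB_CLK_HI6220.
# Returns non-zero if any listed option is not set to y or m.
check_kconfig() {
    config=$1
    shift
    missing=0
    for opt in "$@"; do
        if grep -Eq "^${opt}=(y|m)$" "$config"; then
            echo "$opt: enabled"
        else
            echo "$opt: NOT enabled"
            missing=1
        fi
    done
    return "$missing"
}
```

Usage from the kernel build directory: `check_kconfig .config CONFIG_STUB_CLK_HI6220`.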

Thanks,
Ivid


#12

@leo-yan,

Sorry for the trouble, but I recently ran a hackbench test and the HiKey started heating up very fast, even with the cpufreq driver enabled. I am using the ondemand cpufreq governor and the power_allocator thermal governor.

Thanks,
Ivid


#13

Could you tell me which kernel version you are using?

AFAIK, if you are using a mainline kernel, you need to move to the latest ARM-TF for the CPU errata workaround.


#14

I am using the mainline 4.9 kernel with U-Boot.

Could you please give me a link to this latest ARM-TF with the CPU errata workaround so that I can compile ATF?

Thanks,
Ivid


#15

If you are using mainline kernel v4.9, I don’t think this CPU erratum is the main root cause. I observed that the mainline kernel has one patch, “arm64: add basic VMAP_STACK support”, which introduces a regression: it causes a kernel panic, after which thermal management can no longer work [1].

For your case, I think it is unlikely to be related to this. Do you see any kernel error logs during the hackbench test? For example, is there any log from the mailbox about a data transfer failure?

Could you check whether these DT bindings are in your code base? The “power allocator” governor needs these two bindings to work well:
https://android.googlesource.com/kernel/hikey-linaro/+/android-hikey-linaro-4.9/arch/arm64/boot/dts/hisilicon/hi6220.dtsi#95
https://android.googlesource.com/kernel/hikey-linaro/+/android-hikey-linaro-4.9/arch/arm64/boot/dts/hisilicon/hi6220.dtsi#897

[1] http://archive.armlinux.org.uk/lurker/message/20171017.003054.898d0678.en.html
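
To check the running device tree rather than the source, a sketch like the one below could help. It is built on assumptions: that the two bindings meant above are the CPU nodes’ dynamic-power-coefficient property and the thermal zone’s sustainable-power property (names taken from the upstream bindings, not confirmed in this thread), and that the kernel exposes the live tree under /proc/device-tree. The helper name dt_prop_count is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count how many nodes in a device tree carry a property.
# $1: property name; $2: device-tree root (default /proc/device-tree).
dt_prop_count() {
    prop=$1
    root=${2:-/proc/device-tree}
    # -L follows symlinks, since /proc/device-tree is usually a symlink.
    find -L "$root" -name "$prop" -type f 2>/dev/null | wc -l | tr -d ' '
}
```

For example, `dt_prop_count dynamic-power-coefficient` and `dt_prop_count sustainable-power` should each print a non-zero count if those properties made it into the dtb that was actually booted.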


#16

Hi @leo-yan

Thank you for the info. I checked the dtb entries and they are already present.

However, there is no entry like the one below:

sched-energy-costs = <&CPU_COST &CLUSTER_COST>;


#17

This is not necessary: that binding is for energy-aware scheduling and has nothing to do with thermal management. If possible, could you upload your kernel tree so I can do a quick check of it?

Otherwise I have no more hints for this.