GPU Crash During Specific Temperature Range

That’s an extremely convincing test. Was the temperature reading true?

Also, I notice that you picked a 4.14 snapshot. Are you able to reproduce with 5.4, 5.10, and 5.13?

I loaded the DragonBoard 820c with 4.14 because that is the version we are using on our custom board and we wanted a side-by-side (or close) comparison. No, we haven't tried any newer versions on either the DB820c or the custom board.

We were reading the temperature from here: /sys/class/thermal/thermal_zone0/temp

Thanks,
Kim
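
For anyone comparing readings, here is a minimal sketch of reading every thermal zone rather than only thermal_zone0. It assumes the standard sysfs thermal layout, where each zone directory has a `temp` file and usually a `type` file. Mainline kernels document `temp` in millidegrees Celsius, but vendor kernels may use a different unit, so the `scale` parameter is an assumption to check against a known temperature. The function names are made up for illustration.

```python
from pathlib import Path


def read_zone(zone_dir, scale=1000.0):
    """Return (zone type, temperature in deg C) for one thermal zone directory.

    scale is the divisor applied to the raw value; 1000.0 assumes the
    kernel reports millidegrees Celsius (an assumption, verify on your board).
    """
    zone_dir = Path(zone_dir)
    raw = int((zone_dir / "temp").read_text().strip())
    type_file = zone_dir / "type"
    # Fall back to the directory name if the zone has no "type" file.
    name = type_file.read_text().strip() if type_file.exists() else zone_dir.name
    return name, raw / scale


def read_all_zones(root="/sys/class/thermal", scale=1000.0):
    """Return {zone directory name: (zone type, deg C)} for all readable zones."""
    readings = {}
    for zone_dir in sorted(Path(root).glob("thermal_zone*")):
        try:
            readings[zone_dir.name] = read_zone(zone_dir, scale)
        except (OSError, ValueError):
            # Some zones refuse reads or return non-numeric data; skip them.
            continue
    return readings
```

Calling `read_all_zones()` on the board and printing the result next to the chamber temperature makes it easier to tell whether a raw value such as 8 means 8 degrees or something else. If the raw values are already whole degrees, pass `scale=1.0`.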

  1. Try other kernels to see whether the problem is fixed in any of them.

  2. I asked you if the temperature was true, not where you read it from. For example, is it actually 25 degrees when you read 8? Or is it actually 8?

Hi @doitright,

  1. We will try the latest kernel on the DB820C soon.

  2. I discussed this with the hardware folks here. We are reading the internal temperature of the chip, which should be more accurate than a sensor placed on the chip. The chamber is around -5C when the crash occurs, and the chip is at a higher temperature than that. Can you please advise?

Thanks,
Kim

We are not having a lot of luck with getting other versions to boot on the DB820C. We tried 5.13 (build 820) and 5.10 (build 800). Neither booted successfully. We will try 5.7 next. If anyone can point me to which of these builds we should try, that would be great: Linaro Snapshots

We got Linux version 5.7 to run on the DragonBoard 820c. It does not crash!

Good news then. You're better off in every way with the newer kernel. Do you think you could get the boot logs for the newer kernels? It would be nice to see why they aren't booting. 5.10 especially, since it's an LTS.

It looks like the kernel versions map to builds as follows:
5.7: 572 through 743 inclusive.
5.10: 744 through 817 inclusive.

If you could bisect that and see what exactly was the first build that didn’t boot, it would help get things straightened out.
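
The bisection could be sketched as a binary search over build numbers. This is only an illustration, not a tool from Linaro: `first_failing_build` and the `boots` callback are hypothetical names, and the sketch assumes that every build before the first failing one boots and every build after it fails. It also assumes every build number in the range is actually available to download, which may not be the case.

```python
def first_failing_build(good, bad, boots):
    """Return the lowest build number that fails to boot.

    good  -- a build number known to boot
    bad   -- a later build number known NOT to boot
    boots -- callable taking a build number and returning True if it boots
             (hypothetical: in practice this means flashing and testing by hand)
    """
    if good >= bad:
        raise ValueError("good must be lower than bad")
    # Invariant: 'good' boots, 'bad' does not.
    while bad - good > 1:
        mid = (good + bad) // 2
        if boots(mid):
            good = mid
        else:
            bad = mid
    return bad


def ask_user(build):
    """Manual test step: flash the build yourself, then answer y or n."""
    answer = input(f"Flash build {build} and try to boot it. Did it boot? [y/n] ")
    return answer.strip().lower().startswith("y")
```

For example, `first_failing_build(743, 800, ask_user)` would need at most six boot attempts. Build 800 is the 5.10 build reported above as not booting; using 743 as the known-good end is an assumption, so substitute whichever 5.7 build was actually confirmed to boot.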