"thermal thermal_zone0: critical temperature reached (95 C), shutting down"

After around 15 minutes of using Alpine Linux, the HoneyComb LX2 (at least mine) prints the message: “thermal thermal_zone0: critical temperature reached (95 C), shutting down”.

The system is not under any heavy load; I am just in a TTY.

Why is that? In this video, for example, the HoneyComb reaches 80 degrees under load with the stock fan: “Heatsink Mod and thermal benchmark on the SolidRun HoneyComb Arm Workstation” (YouTube).

I have the stock fan installed as well.

We have found a recent bug in the I2C driver code, and the newest SoCs from NXP seem to be more susceptible to triggering it. This is most likely not an overheating issue but a false reading. I am working on firmware and Linux driver patches to fix the bug, and I am also reworking the thermal code to be more tolerant of these failures. These patches should all be out in the coming week. Until then, you can add thermal.crt=-1 to your kernel command line to prevent the critical shutdown. There is no danger that this will damage your SoC, as it has an internal safety shutoff at 105 C.
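To confirm that the workaround actually reached the running kernel, the command line can be inspected after a reboot. The following is only a minimal sketch: the helper names are made up for illustration, and it assumes the usual Linux location /proc/cmdline.

```python
from typing import Optional


def parse_thermal_crt(cmdline: str) -> Optional[int]:
    """Return the integer value of thermal.crt on a kernel command line, or None."""
    value = None
    for token in cmdline.split():
        if token.startswith("thermal.crt="):
            try:
                # If the parameter appears more than once, the last one wins here.
                value = int(token.split("=", 1)[1])
            except ValueError:
                value = None
    return value


def critical_trip_disabled(cmdline: str) -> bool:
    """True if thermal.crt=-1 is present, i.e. the workaround from this thread."""
    return parse_thermal_crt(cmdline) == -1


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line of the running kernel (path is an assumption)."""
    with open(path) as f:
        return f.read()
```

On the board itself, critical_trip_disabled(read_cmdline()) should return True once the parameter has been added to the bootloader configuration.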


Thank you Jon. I was also wondering whether the system or the distribution is confusing Celsius with Fahrenheit. But it seems that the file "/sys/class/thermal/thermal_zone0/trip_point_0_temp" is causing this: the value in this file is set to 95000.
This file is different across distributions, isn’t it?
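For reference, the Linux thermal sysfs files report temperatures in millidegrees Celsius, so 95000 corresponds to the 95 C in the shutdown message. A small sketch for reading the zone by hand follows; the function names are illustrative only, and the default path is the one quoted above.

```python
from pathlib import Path


def millidegrees_to_celsius(raw: str) -> float:
    """Convert a sysfs thermal value (millidegrees Celsius) to degrees Celsius."""
    return int(raw.strip()) / 1000.0


def read_zone(zone_dir: str = "/sys/class/thermal/thermal_zone0") -> dict:
    """Read the current temperature and the first trip point of one thermal zone."""
    zone = Path(zone_dir)
    return {
        "temp_c": millidegrees_to_celsius((zone / "temp").read_text()),
        "trip_c": millidegrees_to_celsius((zone / "trip_point_0_temp").read_text()),
    }
```

Calling read_zone() a few times while the machine is idle shows whether the reported temperature is creeping up steadily or jumping around, which helps tell a real cooling problem from a bad reading.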

That temperature is set by the thermal zone (TZ) ACPI table. The issue is that the I2C sensors return invalid values above that temperature, and then Linux shuts itself down.

This is a new issue but I am very close to a published solution.


When I had the problem described by @jnettlet, it also said that the critical temperature had been reached, but it reported 255 C, not 95 C. 255 C is definitely an erroneous reading, but I can imagine 95 C really being reached if the cooling isn’t working right?
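One way to frame the difference between the two readings is a simple plausibility check. This is just a sketch, not how the kernel or firmware handles it: it assumes the sensor is reporting the SoC temperature and uses the 105 C internal shutoff mentioned earlier in the thread as an upper bound, since a running SoC should never legitimately report more than that.

```python
HW_SHUTOFF_C = 105.0  # internal SoC safety shutoff quoted earlier in the thread


def is_plausible(temp_c: float, hw_shutoff_c: float = HW_SHUTOFF_C) -> bool:
    """A reading at or above the hardware shutoff cannot come from a running SoC."""
    return temp_c < hw_shutoff_c


def classify(temp_c: float, critical_c: float = 95.0) -> str:
    """Label a reading as 'bogus', 'critical' or 'ok' (labels are illustrative)."""
    if not is_plausible(temp_c):
        return "bogus"
    if temp_c >= critical_c:
        return "critical"
    return "ok"
```

With this check, 255 C (0xFF, an all-ones byte) is rejected outright, while 95 C cannot be ruled out by its value alone, which is exactly why that case is harder to diagnose.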

There is a bug in the I2C clocking code in both the firmware and the Linux kernel. I have solved this in the firmware and kernel that are currently in testing. First, I have patches for both firmware and kernel that fix the I2C clocking code so that device communication is a lot more reliable. Second, I have split the thermal zone code so that the internal TMU alone is responsible for critical shutdown, removing I2C from this signal completely. I have also moved the full-speed fan trigger to the TMU, so even if I2C is malfunctioning or not supported by the host OS, we will still have basic fan management that doesn’t rely on I2C, just GPIO controlled via system registers.

Additionally, we have removed the secondary temperature sensor and are only using the external PWM controller sensor. Since we need to access the PWM controller over I2C to control the fan speed, we figured using its temperature reading was best. It will also simplify future designs.
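The split described in this post can be pictured roughly as follows. This is only an illustrative model, not the actual firmware or kernel code: the function name, the fan mode labels, and the 85 C full-speed threshold are all hypothetical, and only the 95 C critical value comes from this thread.

```python
from typing import Optional

CRITICAL_C = 95.0    # critical trip point quoted in this thread
FULL_SPEED_C = 85.0  # hypothetical full-speed fan threshold, for illustration only


def plan(tmu_c: float, i2c_c: Optional[float]) -> dict:
    """Decide shutdown and fan mode; i2c_c is None when I2C is unavailable."""
    # Critical shutdown depends on the internal TMU only, never on the I2C value.
    shutdown = tmu_c >= CRITICAL_C
    if tmu_c >= FULL_SPEED_C:
        fan = "full"      # TMU-triggered, GPIO-driven, works without I2C
    elif i2c_c is None:
        fan = "default"   # no I2C, so no fine-grained speed control
    else:
        fan = "pwm"       # speed set via the external PWM controller over I2C
    return {"shutdown": shutdown, "fan": fan}
```

The point of the model is that plan(50.0, 255.0) does not request a shutdown: a bogus I2C reading can at worst affect fan speed, never power the machine off.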