Modifying and building u-boot

elahav · September 6, 2023, 8:04pm

I must be missing something, but I don’t see a LX2160A-based board in Solidrun’s u-boot git repository. Are the instructions somewhere on how to build u-boot for Honeycomb?

Background: We have noticed hangups in specific scenarios involving load-exclusive/store-exclusive instructions. These appear similar to issues observed on a different A72-based board, for which the solution is to set bit 31 in CPUACTLR_EL1 (see Documentation – Arm Developer).

–Elad

elahav · September 7, 2023, 11:21am

Looks like I was misled by the presence of a u-boot tree in SolidRun’s GitHub repository. As far as I can tell, the way to build u-boot for Honeycomb is to clone SolidRun/lx2160a_build and build a full image, which seems a bit heavy.

–Elad

elahav · September 7, 2023, 7:47pm

I finally figured out how to change u-boot and update it on the SD card. Unfortunately, enabling snoop-delayed exclusive handling prevents u-boot from starting. I added the code in the same place as the other errata workarounds that update this register. Not sure what is going on.

–Elad

jnettlet · September 8, 2023, 5:48am

The patch you need to unlock this is here: fix(layerscape): unlock write access for SMMU SMMU_CBn_ACTLR · nxp-qoriq/atf@4414070 · GitHub

We are working on rebasing the lx2160a_build BSP to the latest version that will include it by default.

elahav · September 8, 2023, 9:40am

Thanks, Jon. The patch does not fix the hang, but now I understand that the problem is (likely) that CPUACTLR_EL1 access is restricted to EL3.

Perhaps I can update the ATF code to set that bit?

–Elad

elahav · September 8, 2023, 10:09am

OK, got it to work by patching ATF.

root@honeycomb:~/src$ ./cpuactlr 0x200
cpu=9 cpuactlr_el1=80000180000000 (80000000)
root@honeycomb:~/src$ ./cpuactlr 0x1
cpu=0 cpuactlr_el1=80000180000000 (80000000)
root@honeycomb:~/src$ ./cpuactlr 0x2
cpu=1 cpuactlr_el1=80000180000000 (80000000)

If this indeed fixes the hang problems then you may want to add this patch to the upstream BSP.
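
For readers following the output above, here is a small C sketch of the bit arithmetic involved. It is not the cpuactlr tool or the ATF patch; the function names are hypothetical. It assumes the value in parentheses is the register masked with bit 31, and that the command-line argument is a CPU mask whose lowest set bit selects the core (0x200 gives cpu 9). Both are inferred from the output, not confirmed.

```c
#include <assert.h>
#include <stdint.h>

/* Bit 31 of CPUACTLR_EL1, the bit discussed in this thread. */
#define CPUACTLR_BIT31 (UINT64_C(1) << 31)

/* Value with bit 31 set; on hardware this write has to happen at EL3. */
static uint64_t cpuactlr_set_bit31(uint64_t value)
{
    return value | CPUACTLR_BIT31;
}

/* Isolate bit 31 (assumed to be what the parenthesised column shows). */
static uint64_t cpuactlr_bit31(uint64_t value)
{
    return value & CPUACTLR_BIT31;
}

/* Hypothetical helper: index of the lowest set bit in a CPU mask. */
static unsigned mask_to_cpu(uint64_t mask)
{
    assert(mask != 0); /* an empty mask selects no core */
    unsigned cpu = 0;
    while ((mask & 1u) == 0) {
        mask >>= 1;
        cpu++;
    }
    return cpu;
}
```

Applied to the value printed above, cpuactlr_bit31(0x80000180000000) yields 0x80000000, which is consistent with bit 31 being set on every core that was checked.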

–Elad

elahav · September 9, 2023, 8:59pm

I ran a stress test for 24 hours on two Honeycomb boards (yes, we pretty much bought all of the Canadian stock…). The board that has bit 31 of CPUACTLR_EL1 cleared (the default) froze multiple times and had to be rebooted after each freeze. The board that has bit 31 set with the modified firmware is alive and kicking.
I think this is an issue with A72-based SoCs that should probably be reported to NXP and ARM. TI is already aware, as we have seen the same problem on one of their boards.

–Elad

jnettlet · September 10, 2023, 4:21am

Both ARM and NXP are aware. The erratum has to do with the prefetcher of the MMU. Thanks for testing.

elahav · September 10, 2023, 10:08am

Do you have a reference number for the erratum? The patch you posted above seems unrelated.

jnettlet · September 13, 2023, 4:45am

I apologize for the confusion. My patch is for a different hang erratum. Can you post a test case so I can test and reproduce the bug you are addressing?

elahav · September 13, 2023, 9:48am

The test case won’t help you much unless you have access to QNX 7.1. The hang itself is in the kernel, and requires running in EL1.

What we suspect is happening is that a core fails to acquire a spin lock, getting stuck in WFE forever:

1. Core A reads the value of a spin lock with ldaxr and observes that it is busy
2. Core A issues wfe
3. Core B releases the spin lock by writing 0 using stlr

Step 3 should emit an event, per section B2.9.2 of the ARMv8 manual, as the global monitor transitions to an open state, but that doesn’t seem to happen in some cases. Replacing wfe with yield avoids the hang, as does adding an explicit sev after stlr for releasing the lock.
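
The "explicit sev after the release" workaround can be sketched in C11 as follows. This is not the QNX kernel code; the type and function names are hypothetical. The C11 atomic load is not guaranteed to compile to ldaxr, so the waiter here relies on the explicit sev instead of the global monitor event. The dsb before sev is my addition, following the commonly recommended sequence so that the store is visible before the event is sent; the post only mentions sev. On non-AArch64 hosts the two hints compile to nothing.

```c
#include <assert.h>
#include <stdatomic.h>

#if defined(__aarch64__)
#define cpu_wait_for_event() __asm__ volatile("wfe" ::: "memory")
#define cpu_send_event()     __asm__ volatile("dsb ish\n\tsev" ::: "memory")
#else
#define cpu_wait_for_event() ((void)0) /* plain busy-wait elsewhere */
#define cpu_send_event()     ((void)0)
#endif

typedef struct {
    atomic_uint locked; /* 0 = free, 1 = held */
} demo_spinlock;

/* Try once; returns 1 if the lock was acquired, 0 if it was busy. */
static int demo_spin_trylock(demo_spinlock *l)
{
    return atomic_exchange_explicit(&l->locked, 1u, memory_order_acquire) == 0u;
}

static void demo_spin_lock(demo_spinlock *l)
{
    while (!demo_spin_trylock(l)) {
        /* Lock is busy: sleep until some core signals an event, then re-check. */
        while (atomic_load_explicit(&l->locked, memory_order_relaxed) != 0u)
            cpu_wait_for_event();
    }
}

static void demo_spin_unlock(demo_spinlock *l)
{
    assert(atomic_load_explicit(&l->locked, memory_order_relaxed) != 0u);
    /* Release store, then wake waiters explicitly instead of relying on
       the event generated when the global monitor is cleared. */
    atomic_store_explicit(&l->locked, 0u, memory_order_release);
    cpu_send_event();
}
```

The other workaround mentioned above, replacing wfe with yield, would correspond to changing cpu_wait_for_event() to emit yield, at the cost of the waiting core spinning rather than sleeping.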

Note that 99.9% of the time the system behaves just fine and it takes a kernel-heavy stress test on all cores running for about half an hour before the hang happens.

When we saw a similar issue on another board it turned out that the hardware provider has their own cache coherency implementation and not the ARM CCI. I don’t know if that is the case here.

–Elad

jnettlet · September 13, 2023, 10:10am

Thanks for the update. The LX2160A uses ARM’s CCN-508 CHI implementation for coherency. I doubt there are any modifications. There is an erratum regarding entering and exiting wfi/wfe and atomic operations. I will review that again and look at NXP’s assembly to verify it meets their recommended workaround.
