Modifying and building u-boot

I must be missing something, but I don’t see an LX2160A-based board in SolidRun’s u-boot git repository. Are there instructions somewhere on how to build u-boot for Honeycomb?

Background: We have noticed hangups in specific scenarios involving load-exclusive/store-exclusive instructions. These appear similar to issues observed on a different A72-based board, for which the solution is to set bit 31 in CPUACTLR_EL1 (see Documentation – Arm Developer).

–Elad

Looks like I was misled by the presence of a u-boot tree in SolidRun’s GitHub repository. As far as I can tell, the way to build u-boot for Honeycomb is to clone SolidRun/lx2160a_build and build a full image, which seems a bit heavy.

–Elad

I finally figured out how to change u-boot and update it on the SD card. Unfortunately, enabling snoop-delayed exclusive handling prevents u-boot from starting. I added the code in the same place as the other errata workarounds that update this register. Not sure what is going on.

–Elad

The patch you need to unlock this is here: "fix(layerscape): unlock write access for SMMU SMMU_CBn_ACTLR" (nxp-qoriq/atf@4414070 on GitHub).

We are working on rebasing the lx2160a_build BSP to the latest version that will include it by default.

Thanks, Jon. The patch does not fix the hang, but now I understand that the problem is (likely) that CPUACTLR_EL1 access is restricted to EL3.

Perhaps I can update the ATF code to set that bit?

–Elad

OK, got it to work by patching ATF.

root@honeycomb:~/src$ ./cpuactlr 0x200
cpu=9 cpuactlr_el1=80000180000000 (80000000)
root@honeycomb:~/src$ ./cpuactlr 0x1
cpu=0 cpuactlr_el1=80000180000000 (80000000)
root@honeycomb:~/src$ ./cpuactlr 0x2
cpu=1 cpuactlr_el1=80000180000000 (80000000)
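For anyone reading the output above, here is a minimal C sketch of how bit 31 can be checked in a CPUACTLR_EL1 value. It assumes the number in parentheses is the register value masked down to bit 31; the source of the cpuactlr tool is not shown in this thread, so that reading is a guess, and the helper names below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 31 of CPUACTLR_EL1, the bit discussed in this thread. */
#define CPUACTLR_BIT31 (UINT64_C(1) << 31)

/* Compile-time sanity check on the mask value. */
static_assert(CPUACTLR_BIT31 == UINT64_C(0x80000000), "bit 31 mask");

/* Hypothetical helper: return the register value masked down to bit 31. */
static inline uint64_t cpuactlr_bit31_field(uint64_t cpuactlr)
{
    return cpuactlr & CPUACTLR_BIT31;
}

/* Hypothetical helper: true if bit 31 is set in the given register value. */
static inline bool cpuactlr_bit31_is_set(uint64_t cpuactlr)
{
    return cpuactlr_bit31_field(cpuactlr) != 0;
}
```

Applied to the value printed above, 0x80000180000000 masked with bit 31 gives 0x80000000, which matches the figure in parentheses.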

If this indeed fixes the hang problems then you may want to add this patch to the upstream BSP.

–Elad

I ran a stress test for 24 hours on two Honeycomb boards (yes, we pretty much bought all of the Canadian stock…). The board that has bit 31 of CPUACTLR_EL1 cleared (the default) froze multiple times and had to be rebooted after each freeze. The board that has bit 31 set with the modified firmware is alive and kicking.
I think this is an issue with A72-based SoCs that should probably be reported to NXP and ARM. TI is already aware, as we have seen the same problem on one of their boards.

–Elad

Both ARM and NXP are aware. The erratum has to do with the prefetcher of the MMU. Thanks for testing.

Do you have a reference number for the erratum? The patch you posted above seems unrelated.

I apologize for the confusion. My patch is for a different hang erratum. Can you post a test case so I can test and reproduce the bug you are addressing?

The test case won’t help you much unless you have access to QNX 7.1. The hang itself is in the kernel, and requires running in EL1.

What we suspect is happening is that a core fails to acquire a spin lock, getting stuck in WFE forever:

  1. Core A reads the value of a spin lock with ldaxr and observes that it is busy
  2. Core A issues wfe
  3. Core B releases the spin lock by writing 0 using stlr

Step 3 should emit an event, per section B2.9.2 of the ARMv8 manual, as the global monitor transitions to an open state, but that doesn’t seem to happen in some cases. Replacing wfe with yield avoids the hang, as does adding an explicit sev after stlr for releasing the lock.
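To make the sequence concrete, here is a rough C11 sketch of a spin lock that applies the second workaround (an explicit sev after the releasing store). This is not the QNX kernel code, which is not shown in this thread; all names are hypothetical. Unlike the ldaxr-based wait described above, the waiter here polls with a plain atomic load, so it depends entirely on the explicit sev for wake-up. The dsb before sev is an assumption on my part, added so the store is visible before the event is sent. On non-AArch64 targets the wfe/sev macros compile to nothing and the lock simply busy-waits.

```c
#include <assert.h>
#include <stdatomic.h>

#if defined(__aarch64__)
/* Wait for an event; returns immediately if the event register is already set. */
#define cpu_wait_for_event() __asm__ volatile("wfe" ::: "memory")
/* Make prior stores visible, then signal an event to all cores. */
#define cpu_send_event()     __asm__ volatile("dsb ishst\n\tsev" ::: "memory")
#else
/* Fallback for other targets: plain busy-wait, no event signalling. */
#define cpu_wait_for_event() ((void)0)
#define cpu_send_event()     ((void)0)
#endif

typedef struct {
    atomic_uint held; /* 0 = free, 1 = busy */
} sketch_spinlock_t;

static inline void sketch_spin_init(sketch_spinlock_t *l)
{
    atomic_init(&l->held, 0u);
}

/* Try once to take the lock; returns nonzero on success. */
static inline int sketch_spin_trylock(sketch_spinlock_t *l)
{
    unsigned expected = 0u;
    return atomic_compare_exchange_strong_explicit(
        &l->held, &expected, 1u,
        memory_order_acquire, memory_order_relaxed);
}

static inline void sketch_spin_lock(sketch_spinlock_t *l)
{
    while (!sketch_spin_trylock(l)) {
        /* Steps 1 and 2: observe the lock as busy, then wait for an event. */
        while (atomic_load_explicit(&l->held, memory_order_relaxed) != 0u)
            cpu_wait_for_event();
    }
}

static inline void sketch_spin_unlock(sketch_spinlock_t *l)
{
    /* Debug check: only a held lock may be released. */
    assert(atomic_load_explicit(&l->held, memory_order_relaxed) == 1u);
    /* Step 3: release by writing 0 (a store-release, stlr on AArch64). */
    atomic_store_explicit(&l->held, 0u, memory_order_release);
    /* Workaround: do not rely on the global monitor to generate the event. */
    cpu_send_event();
}
```

The other workaround mentioned above would amount to replacing the wfe in cpu_wait_for_event() with yield, which trades the low-power wait for a plain polling loop.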

Note that 99.9% of the time the system behaves just fine and it takes a kernel-heavy stress test on all cores running for about half an hour before the hang happens.

When we saw a similar issue on another board, it turned out that the hardware provider had their own cache coherency implementation rather than the ARM CCI. I don’t know if that is the case here.

–Elad

Thanks for the update. The LX2160A uses ARM’s CCN-508 CHI implementation for coherency. I doubt there are any modifications. There is an erratum regarding entering and exiting wfi/wfe and atomic operations. I will review that again and look at NXP’s assembly to verify it meets their recommended workaround.