Hi,
We have an LX2160 HoneyComb board with a network interface PCI card plugged into its PCI slot. What we have observed is that, after a reboot, the interrupts sent from the PCI card to the host CPU are no longer seen by the host CPU (/proc/interrupts shows zero counts for the card’s interrupts). We have tried both our own custom PCI network card and an off-the-shelf Intel 1Gb NIC, and both exhibit the same “loss of interrupts”.
Note that a full power cycle restores operation.
It looks as though a different code path is taken to initialise the hardware on a reboot than on a power cycle. The only evidence I have for this is that, on a reboot, the following message appears in the boot log but is not present after a power cycle:
Re-Distributor 0 LPI is already enabled
which comes from arch/arm/lib/gic-v3-its.c in the U-Boot sources. This may or may not be relevant.
Any help solving this issue would be appreciated.
Antony.
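For context on the U-Boot message quoted above, here is a rough model of what it appears to check, assuming (from a reading of U-Boot's gic-v3-its.c, not verified against every version) that it tests the EnableLPIs bit (bit 0) of each redistributor's GICR_CTLR register. LPIs are how MSIs are delivered once the GICv3 ITS has translated them, and the message appearing only on a warm reboot would be consistent with that bit surviving the reset. The helper names and register values are made up for illustration.

```python
# A model of the U-Boot check, ASSUMING it tests GICR_CTLR.EnableLPIs (bit 0)
# of each GICv3 redistributor. Register values below are hypothetical.
from typing import List

GICR_CTLR_ENABLE_LPIS = 1 << 0  # EnableLPIs bit in GICR_CTLR


def lpis_already_enabled(gicr_ctlr: int) -> bool:
    """Return True if the EnableLPIs bit is set in a GICR_CTLR value."""
    return bool(gicr_ctlr & GICR_CTLR_ENABLE_LPIS)


def lpi_messages(gicr_ctlr_values: List[int]) -> List[str]:
    """Messages that would be logged for each redistributor with LPIs left on."""
    return [
        f"Re-Distributor {index} LPI is already enabled"
        for index, value in enumerate(gicr_ctlr_values)
        if lpis_already_enabled(value)
    ]


# Hypothetical cold boot: the bit comes up clear, so nothing is logged.
print(lpi_messages([0x0]))  # prints []
# Hypothetical warm reboot: the bit is still set, so the message appears.
print(lpi_messages([0x1]))  # prints ['Re-Distributor 0 LPI is already enabled']
```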
Can you provide a pastebin of your dmesg output from an unsuccessful boot, as well as the output of cat /proc/interrupts?
Thanks
Following is the console output from the board on a bad reboot: SolidRun LX2160 Bad Reboot - Pastebin.com (and this is the console output on a good boot after a power cycle: SolidRun LX2160 Good Boot - Pastebin.com). The output from cat /proc/interrupts:
CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7 CPU8 CPU9 CPU10 CPU11 CPU12 CPU13 CPU14 CPU15
9: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 GICv3 25 Level vgic
11: 4103 3063 4548 3384 3501 4431 4950 3255 3326 3784 4181 3416 1773 1778 1808 1857 GICv3 30 Level arch_timer
12: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 GICv3 27 Level kvm guest vtimer
14: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 GICv3 23 Level arm-pmu
19: 4436 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 GICv3 66 Level 2000000.i2c
20: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 GICv3 67 Level 2020000.i2c
21: 58 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 GICv3 106 Level 2040000.i2c
22: 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 GICv3 57 Level 20c0000.spi
23: 6547 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 GICv3 60 Level mmc0
24: 265 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 GICv3 95 Level mmc1
25: 1535 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 GICv3 64 Level uart-pl011
27: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 GICv3 68 Level gpio-cascade, gpio-cascade
28: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 GICv3 69 Level gpio-cascade, gpio-cascade
30: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 GICv3 76 Level 2800000.timer
31: 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 GICv3 112 Level xhci-hcd:usb1
32: 115 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 GICv3 113 Level xhci-hcd:usb3
33: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 GICv3 165 Level ahci-qoriq[3200000.sata]
34: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 GICv3 168 Level ahci-qoriq[3210000.sata]
35: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 GICv3 129 Level ahci-qoriq[3220000.sata]
36: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 GICv3 132 Level ahci-qoriq[3230000.sata]
37: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 GICv3 150 Level PCIe PME, aerdrv
38: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 GICv3 160 Level PCIe PME, aerdrv
39: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 GICv3 45 Level arm-smmu global fault
40: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 GICv3 46 Level arm-smmu global fault
41: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 GICv3 47 Level arm-smmu global fault
42: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 GICv3 48 Level arm-smmu global fault
43: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 GICv3 243 Level arm-smmu global fault
44: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 GICv3 244 Level arm-smmu global fault
45: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 GICv3 245 Level arm-smmu global fault
46: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 GICv3 246 Level arm-smmu global fault
47: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 GICv3 247 Level arm-smmu global fault
48: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 GICv3 248 Level arm-smmu global fault
49: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 GICv3 249 Level arm-smmu global fault
50: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 GICv3 250 Level arm-smmu global fault
51: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 GICv3 251 Level arm-smmu global fault
52: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 GICv3 252 Level arm-smmu global fault
121: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ITS-fMSI 230000 Edge dpmac.10
122: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ITS-fMSI 230001 Edge dpmac.9
123: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ITS-fMSI 230002 Edge dpmac.8
129: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ITS-fMSI 230008 Edge dprtc.0
130: 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ITS-fMSI 230009 Edge dpio.15
131: 0 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ITS-fMSI 230010 Edge dpio.14
132: 0 0 4 0 0 0 0 0 0 0 0 0 0 0 0 0 ITS-fMSI 230011 Edge dpio.13
133: 0 0 0 4 0 0 0 0 0 0 0 0 0 0 0 0 ITS-fMSI 230012 Edge dpio.12
134: 0 0 0 0 44 0 0 0 0 0 0 0 0 0 0 0 ITS-fMSI 230013 Edge dpio.11
135: 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 ITS-fMSI 230014 Edge dpio.10
136: 0 0 0 0 0 0 40 0 0 0 0 0 0 0 0 0 ITS-fMSI 230015 Edge dpio.9
137: 0 0 0 0 0 0 0 15 0 0 0 0 0 0 0 0 ITS-fMSI 230016 Edge dpio.8
138: 0 0 0 0 0 0 0 0 27 0 0 0 0 0 0 0 ITS-fMSI 230017 Edge dpio.7
139: 0 0 0 0 0 0 0 0 0 91 0 0 0 0 0 0 ITS-fMSI 230018 Edge dpio.6
140: 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 ITS-fMSI 230019 Edge dpio.5
141: 0 0 0 0 0 0 0 0 0 0 0 4 0 0 0 0 ITS-fMSI 230020 Edge dpio.4
142: 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 ITS-fMSI 230021 Edge dpio.3
143: 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 0 ITS-fMSI 230022 Edge dpio.2
144: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 ITS-fMSI 230023 Edge dpio.1
145: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 ITS-fMSI 230024 Edge dpio.0
146: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ITS-fMSI 230025 Edge dprc.1
147: 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 ITS-fMSI 230026 Edge dpni.0
148: 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 ITS-fMSI 230027 Edge dpni.1
377: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 mpc8xxx-gpio 0 Edge sfp-0-mod-def0
378: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 mpc8xxx-gpio 9 Edge sfp-1-mod-def0
379: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 mpc8xxx-gpio 10 Edge sfp-2-mod-def0
380: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 mpc8xxx-gpio 11 Edge sfp-3-mod-def0
381: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 mpc8xxx-gpio 6 Edge power
382: 485 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 GICv3 172 Level 8010000.jr
383: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 GICv3 173 Level 8020000.jr
384: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 GICv3 174 Level 8030000.jr
385: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ITS-MSI 524288 Edge bh2-bmi
386: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ITS-MSI 524289 Edge bh2-rx
417: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ITS-MSI 526336 Edge bh2-bmi
418: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ITS-MSI 526337 Edge bh2-rx
IPI0: 1002 1272 1556 1071 1866 1067 1057 896 985 1306 3999 1225 17 18 19 18 Rescheduling interrupts
IPI1: 953 301 204 177 185 173 166 227 159 170 187 181 153 153 153 154 Function call interrupts
IPI2: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 CPU stop interrupts
IPI3: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 CPU stop (for crash dump) interrupts
IPI4: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Timer broadcast interrupts
IPI5: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IRQ work interrupts
IPI6: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 CPU wake-up interrupts
Err: 0
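With sixteen CPU columns, a listing like this is easy to misread by eye. Below is a small sketch that sums the per-CPU counts on each row and lists the rows that are still zero. It assumes the usual /proc/interrupts layout (a CPU header row, then a label, one count per CPU, and a description) and skips rows such as Err: that carry a single total. The sample is three rows from the listing above, cut down to two CPUs; the function names are illustrative.

```python
from typing import Dict, List, Tuple


def parse_interrupts(text: str) -> Dict[str, Tuple[int, str]]:
    """Map each IRQ label to (total count across CPUs, description)."""
    lines = [line for line in text.splitlines() if line.strip()]
    num_cpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    result: Dict[str, Tuple[int, str]] = {}
    for line in lines[1:]:
        label, _, rest = line.strip().partition(":")
        fields = rest.split()
        counts = fields[:num_cpus]
        # Rows such as "Err: 0" have a single total instead of per-CPU counts.
        if len(counts) < num_cpus or not all(c.isdigit() for c in counts):
            continue
        result[label] = (sum(int(c) for c in counts), " ".join(fields[num_cpus:]))
    return result


def silent_interrupts(text: str, chip: str) -> List[str]:
    """Labels of interrupts whose description mentions `chip` and never fired."""
    return sorted(
        label
        for label, (total, description) in parse_interrupts(text).items()
        if total == 0 and chip in description
    )


sample = """\
           CPU0       CPU1
 11:       4103       3063   GICv3  30 Level     arch_timer
385:          0          0   ITS-MSI 524288 Edge      bh2-bmi
386:          0          0   ITS-MSI 524289 Edge      bh2-rx
Err:          0
"""
print(silent_interrupts(sample, "ITS-MSI"))  # prints ['385', '386']
# On the board itself: silent_interrupts(open("/proc/interrupts").read(), "ITS-MSI")
```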
Note that the kernel configuration we are using is not the stock version: we have disabled everything related to sound and graphics and enabled most networking options (I can provide the config if needed). We are also running a minimal Ubuntu 22.04 LTS as the distribution.
Can you also provide the output of lspci -vvv in both cases? Either way, your devices should be using MSI interrupts, so I don’t believe the error message you are seeing is relevant.
In both cases the cards are detected, so my guess is that this is most likely PCIe power saving misbehaving. Could you please also boot with pcie_aspm=off added to the kernel command line to see if that makes a difference?
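To confirm that an option such as pcie_aspm=off (or pci=nomsi) actually reached the kernel after editing the boot configuration, /proc/cmdline can be checked. A minimal sketch of that check follows; the function names and the example command line are hypothetical, and it also handles comma-separated values such as pci=nomsi,noaer.

```python
import shlex
from typing import Dict, Optional


def cmdline_params(cmdline: str) -> Dict[str, Optional[str]]:
    """Split a kernel command line into key -> value (None for bare flags)."""
    params: Dict[str, Optional[str]] = {}
    for token in shlex.split(cmdline):
        key, sep, value = token.partition("=")
        params[key] = value if sep else None  # later occurrences overwrite earlier ones here
    return params


def has_param(cmdline: str, key: str, value: Optional[str] = None) -> bool:
    """True if `key` is present and, when given, `value` is one of its comma-separated values."""
    params = cmdline_params(cmdline)
    if key not in params:
        return False
    if value is None:
        return True
    current = params[key]
    return current is not None and value in current.split(",")


example = "console=ttyAMA0,115200 root=/dev/mmcblk0p2 rw pcie_aspm=off"  # hypothetical
print(has_param(example, "pcie_aspm", "off"))  # prints True
print(has_param(example, "pci", "nomsi"))      # prints False
# On the board itself: has_param(open("/proc/cmdline").read(), "pcie_aspm", "off")
```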
See SolidRun LX2160 Good lspci -vvv - Pastebin.com for the lspci -vvv output after a good boot, and SolidRun LX2160 Bad lspci -vvv - Pastebin.com for the output after a bad reboot. This is with our custom PCI card installed (rather than the Intel NIC). The only difference I see in the output is the following change:
- Good:
Masking: fffffff8 Pending: 00000000
- Bad:
Masking: fffffffc Pending: 00000000
in the MSI capability section of the PCI card’s configuration space.
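Those values come from the MSI capability’s per-vector Mask Bits register, where each bit masks one vector (1 = masked). A small sketch decoding the two values: after a good boot vectors 0 to 2 are unmasked, while after a bad reboot vector 2 is masked as well. The helper name is illustrative, and the decode alone does not say whether the difference is a cause or a consequence of the missing interrupts.

```python
from typing import List


def unmasked_vectors(mask: int, width: int = 32) -> List[int]:
    """Vectors whose per-vector mask bit is clear (1 = masked, 0 = may fire)."""
    return [vector for vector in range(width) if not (mask >> vector) & 1]


good = unmasked_vectors(0xFFFFFFF8)
bad = unmasked_vectors(0xFFFFFFFC)
print(good)  # prints [0, 1, 2]
print(bad)   # prints [0, 1]
print(sorted(set(good) - set(bad)))  # prints [2]
```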
I added pcie_aspm=off to the kernel boot options, but this did not make any difference (the problem is still present).
Does the card work if MSI interrupts are disabled (pci=nomsi)?
Oh, I just noticed: are you using SR-IOV on these cards?
The answers to your questions:
- Does it work with pci=nomsi? No
- Are you using SR-IOV? No
Hi,
We have run into the exact same issue, even with LSDK21.08.
Does anyone have a solution to this?
Cheers,
Neil
Is the card detected correctly, and does it show up in lspci? My initial thought is that the card is having issues by not getting a hard reset on reboot.
The card is detected fine and enumerated correctly (all BARs allocated correctly), and the driver loads correctly, but no interrupts trigger when the card writes to its MSI address. The card only supports MSI interrupts and does not support ASPM; this is correctly advertised in the PCIe configuration space and reported by lspci. The card is reset correctly and we can communicate with it; it seems the host is ignoring the write to the MSI address. Power cycle the board and all is fine.
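One way to narrow this down is to compare what the host programmed into the card’s MSI capability (enable bit, vector count, target address and data) after a good boot and after a bad reboot. Below is a rough sketch that pulls those fields out of lspci -vvv text and reports only the fields that differ. The regexes assume lspci’s usual MSI layout, and the sample address and count are placeholders (only the two mask values come from this thread). If address and data match in both cases, that would point at the host side rather than at how the card is configured.

```python
import re
from typing import Dict, Tuple

MSI_HEADER = re.compile(r"MSI: Enable([+-]) Count=(\d+)/(\d+)")
ADDR_DATA = re.compile(r"Address: ([0-9a-f]+)\s+Data: ([0-9a-f]+)")
MASK_PEND = re.compile(r"Masking: ([0-9a-f]+)\s+Pending: ([0-9a-f]+)")


def parse_msi(lspci_text: str) -> Dict[str, int]:
    """Extract MSI capability fields from one device's `lspci -vvv` output."""
    lines = lspci_text.splitlines()
    for i, line in enumerate(lines):
        header = MSI_HEADER.search(line)
        if header is None:
            continue
        info = {
            "enabled": int(header.group(1) == "+"),
            "count": int(header.group(2)),
            "max": int(header.group(3)),
        }
        for body in lines[i + 1:]:
            if "Capabilities:" in body:  # the next capability starts here
                break
            match = ADDR_DATA.search(body)
            if match:
                info["address"] = int(match.group(1), 16)
                info["data"] = int(match.group(2), 16)
            match = MASK_PEND.search(body)
            if match:
                info["mask"] = int(match.group(1), 16)
                info["pending"] = int(match.group(2), 16)
        return info
    return {}


def msi_diff(good_text: str, bad_text: str) -> Dict[str, Tuple[object, object]]:
    """Fields whose values differ between the good and bad captures."""
    good, bad = parse_msi(good_text), parse_msi(bad_text)
    return {
        key: (good.get(key), bad.get(key))
        for key in sorted(good.keys() | bad.keys())
        if good.get(key) != bad.get(key)
    }


# Hypothetical capture; only the mask values are taken from this thread.
GOOD = """\
\tCapabilities: [50] MSI: Enable+ Count=4/4 Maskable+ 64bit+
\t\tAddress: 0000000012340040  Data: 0000
\t\tMasking: fffffff8  Pending: 00000000
\tCapabilities: [70] Express (v2) Endpoint, MSI 00
"""
BAD = GOOD.replace("fffffff8", "fffffffc")
print({k: (hex(a), hex(b)) for k, (a, b) in msi_diff(GOOD, BAD).items()})
# prints {'mask': ('0xfffffff8', '0xfffffffc')}
```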
Thanks for resurrecting this thread…
My initial thought is that the card is having issues by not getting a hard reset on reboot.
If this is the case, what can be done to get the PCIe reset (i.e. the PERSTn signal) asserted on reboot? Do we need a CPLD/FPGA firmware update?
Without a working PERSTn, a power cycle will be needed after every reboot.
Note that, in addition to NICs (our own design and off the shelf), we have seen a similar “does not work after reboot” problem on an NVMe card (a Sabrent in this case).
I was curious about PERST, because our reset for PCIe and all the other external devices is tied to the system reset. You can review this under the Reset Logic and Boot Select section of the CEX7 simplified schematic; the SYSRST_OUT signal is tied into most of the hardware reset signals. Some PCIe devices, such as FPGAs, require a longer time to reset and load firmware, and the timing of PERST relative to the clock, etc., can cause issues on reset.
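If someone can put a scope on the slot, the reset sequence seen on a warm reboot could be checked against the PCIe CEM minimums. This is a sketch under stated assumptions: the three limits are as I recall them from the CEM specification (PERST# asserted for at least 100 µs, REFCLK stable at least 100 µs before PERST# is released, power stable at least 100 ms before PERST# is released) and should be verified against the revision that applies. The ResetCapture fields and all timestamps are hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Minimum timings in seconds, as recalled from the PCIe CEM specification
# (ASSUMPTION: verify against the revision that applies to your card).
T_PERST_MIN = 100e-6      # PERST# asserted for at least this long
T_PERST_CLK_MIN = 100e-6  # REFCLK stable before PERST# is released
T_PVPERL_MIN = 100e-3     # power stable before PERST# is released


@dataclass
class ResetCapture:
    """Hypothetical edge timestamps (seconds) read off a scope capture."""
    power_stable: float
    perst_asserted: float
    refclk_stable: float
    perst_released: float


def timing_violations(c: ResetCapture) -> List[str]:
    """List which of the assumed minimum timings the capture fails."""
    problems = []
    if c.perst_released - c.perst_asserted < T_PERST_MIN:
        problems.append("PERST# asserted for too short a time")
    if c.perst_released - c.refclk_stable < T_PERST_CLK_MIN:
        problems.append("REFCLK not stable long enough before PERST# release")
    if c.perst_released - c.power_stable < T_PVPERL_MIN:
        problems.append("power not stable long enough before PERST# release")
    return problems


# Hypothetical warm reboot: rails never drop, PERST# pulses for only 50 us.
warm = ResetCapture(power_stable=0.0, perst_asserted=10.0,
                    refclk_stable=9.0, perst_released=10.00005)
print(timing_violations(warm))  # prints ['PERST# asserted for too short a time']
```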
If you don’t mind (this isn’t a recommended solution, just a debugging step), it would be interesting to know whether the issue also occurs when you use our UEFI firmware.
Otherwise we will probably need some more debug output from the IRQ and PCIe initialization to dig into this further.