vader
December 13, 2022, 11:50am
1
I installed open-gpu-kernel-modules version 525.60.13 and the NVIDIA driver of the same version, and merged “LX2160a support for Nvidia open-gpu-kernel-modules” (GitHub).
But I still cannot get GPU info with the command “nvidia-smi”. The response is as follows:
# nvidia-smi
No devices were found
The PCIe and NVIDIA kernel module info is as follows:
root@localhost:/lib/firmware/nvidia/525.60.13# lspci
0000:00:00.0 PCI bridge: Freescale Semiconductor Inc Device 8d80 (rev 20)
0001:00:00.0 PCI bridge: Freescale Semiconductor Inc Device 8d80 (rev 20)
0001:01:00.0 VGA compatible controller: NVIDIA Corporation Device 2208 (rev a1)
0001:01:00.1 Audio device: NVIDIA Corporation Device 1aef (rev a1)
root@localhost:/lib/firmware/nvidia/525.60.13# lsmod
Module Size Used by
nvidia_drm 65536 0
nvidia_modeset 1417216 1 nvidia_drm
fsl_jr_uio 20480 0
caam_jr 229376 0
nvidia 6107136 1 nvidia_modeset
caamkeyblob_desc 16384 1 caam_jr
crypto_engine 16384 1 caam_jr
rng_core 24576 2 caam_jr
dpaa2_caam 114688 0
caamhash_desc 16384 2 caam_jr,dpaa2_caam
caamalg_desc 40960 2 caam_jr,dpaa2_caam
crct10dif_ce 20480 1
libdes 24576 2 caam_jr,dpaa2_caam
caam 45056 1 caam_jr
error 24576 7 caamalg_desc,caamkeyblob_desc,caamhash_desc,caam,caam_jr,fsl_jr_uio,dpaa2_caam
lm90 28672 0
at24 24576 0
rtc_pcf2127 24576 0
root@localhost:/lib/firmware/nvidia/525.60.13#
Checking the kernel messages indicates that the GSP cannot be initialized:
[ 1432.182518] NVRM nvAssertFailedNoLog: Assertion failed: status == NV_OK @ kernel_gsp_ga102.c:235
[ 1432.182524] NVRM kgspInitRm_IMPL: cannot bootstrap riscv/gsp: 0xffff
[ 1432.182531] NVRM RmInitAdapter: Cannot initialize GSP firmware RM
[ 1432.186336] NVRM: GPU 0001:01:00.0: RmInitAdapter failed! (0x62:0xffff:1622)
[ 1432.189462] NVRM: GPU 0001:01:00.0: rm_init_adapter failed, device minor number 0
[ 2114.862624] NVRM s_executeBooterUcode_TU102: Booter failed with non-zero error code: 0xa
[ 2114.862631] NVRM kgspExecuteBooterUnloadIfNeeded_TU102: failed to execute Booter Unload: 0xffff
[ 2114.887078] NVRM s_executeFwsec_TU102: failed to execute FWSEC for FRTS: FRTS error code 0xbe
[ 2114.887087] NVRM nvAssertOkFailedNoLog: Assertion failed: Failure: Generic Error [NV_ERR_GENERIC] (0x0000FFFF) returned from kgspExecuteFwsecFrts_HAL(pGpu, pKernelGsp, pKernelGsp->pFwsecUcode, pKernelGsp->pWprMeta->frtsOffset) @ kernel_gsp_ga102.c:164
You appear to be using a device-tree-based bootloader and kernel. Currently the NVIDIA drivers are only tested and working with edk2 and ACPI-based firmware.
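To confirm which firmware interface the running kernel actually booted with, one option is to look under /sys/firmware. The following is a minimal sketch, assuming the usual sysfs layout where an ACPI boot exposes /sys/firmware/acpi/tables and a device-tree boot exposes /sys/firmware/devicetree/base. The function name and its optional root-prefix argument are hypothetical and exist only to make the check easy to try against a test directory.

```shell
#!/bin/sh
# Report whether the running kernel was booted with ACPI or device tree.
# $1 is an optional root prefix (hypothetical, for testing); default is "/".
boot_firmware_type() {
    root="${1:-}"
    if [ -d "$root/sys/firmware/acpi/tables" ]; then
        # ACPI tables are only exported when the kernel uses ACPI.
        echo "acpi"
    elif [ -d "$root/sys/firmware/devicetree/base" ]; then
        echo "devicetree"
    else
        echo "unknown"
    fi
}

boot_firmware_type
```

If this prints "devicetree" on the LX2160A board, the kernel is not using the edk2/ACPI path described in the reply above.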
vader
December 15, 2022, 8:07am
3
I’m new to device tree. Could you give me some guidance?
I would need to look at NVIDIA’s kernel module, but currently it relies on some specific configurations that are ACPI only. The driver is designed mostly for SystemReady ES/SR systems.
vader
March 21, 2023, 12:05pm
7
I added
echo "options nvidia NVreg_OpenRmEnableUnsupportedGpus=1" | sudo tee /etc/modprobe.d/nvreg_fix.conf > /dev/null
as a module parameter, and then ran
nvidia-smi
I successfully got GPU info. But when I ran “nvidia-smi” again, it reported “No devices were found” and the kernel messages were as follows:
[ 28.205046] NVRM objClInitPcieChipset: *** Chipset Setup Function Error!
[ 28.205053] NVRM: Chipset not recognized (vendor ID 0x1957, device ID 0x8d80)
[ 28.205055] The NVIDIA GPU driver for AArch64 has not been qualified on this platform
and therefore it is not recommended or intended for use in any production
environment.
[ 30.024859] NVRM gpuInitOptimusSettings_IMPL: SBIOS did not acknowledge cfg space owner change
[ 30.053677] NVRM: GPU at PCI:0001:01:00: GPU-7ed4ded5-ab03-6b23-0d18-8ed689f3a43c
[ 30.053683] NVRM: Xid (PCI:0001:01:00): 31, pid=672, name=nvidia-smi, Ch 00000000, intr 00000000. MMU Fault: ENGINE HOST2 HUBCLIENT_ESC faulted @ 0xff_dfff7000. Fault is of type FAULT_PDE ACCESS_TYPE_VIRT_READ
[ 34.169090] NVRM _threadNodeCheckTimeout: _threadNodeCheckTimeout: currentTime: 3d08c20ec7c200 >= 3d08c20ec7c200
[ 34.169094] NVRM _threadNodeCheckTimeout: _threadNodeCheckTimeout: Timeout was set to: 4000 msecs!
[ 34.169101] NVRM scrubberDestruct: Timed out when waiting for the scrub to complete the pending work .
[ 34.416926] NVRM s_executeBooterUcode_TU102: Booter failed with non-zero error code: 0xa
[ 34.416930] NVRM kgspExecuteBooterUnloadIfNeeded_TU102: failed to execute Booter Unload: 0xffff
[ 34.416945] NVRM nvAssertFailedNoLog: Assertion failed: rmStatus == NV_OK @ osinit.c:1926
[ 35.454071] NVRM s_executeBooterUcode_TU102: Booter failed with non-zero error code: 0xa
[ 35.454077] NVRM kgspExecuteBooterUnloadIfNeeded_TU102: failed to execute Booter Unload: 0xffff
[ 35.478470] NVRM s_executeFwsec_TU102: failed to execute FWSEC for FRTS: FRTS error code 0xbe
[ 35.478478] NVRM nvAssertOkFailedNoLog: Assertion failed: Failure: Generic Error [NV_ERR_GENERIC] (0x0000FFFF) returned from kgspExecuteFwsecFrts_HAL(pGpu, pKernelGsp, pKernelGsp->pFwsecUcode, pKernelGsp->pWprMeta->frtsOffset) @ kernel_gsp_ga102.c:164
[ 35.478488] NVRM nvAssertFailedNoLog: Assertion failed: status == NV_OK @ kernel_gsp_ga102.c:235
[ 35.478494] NVRM kgspInitRm_IMPL: cannot bootstrap riscv/gsp: 0xffff
[ 35.478501] NVRM RmInitAdapter: Cannot initialize GSP firmware RM
[ 35.482315] NVRM: GPU 0001:01:00.0: RmInitAdapter failed! (0x62:0xffff:1622)
[ 35.485417] NVRM: GPU 0001:01:00.0: rm_init_adapter failed, device minor number 0
[ 35.708031] NVRM s_executeBooterUcode_TU102: Booter failed with non-zero error code: 0xa
[ 35.708038] NVRM kgspExecuteBooterUnloadIfNeeded_TU102: failed to execute Booter Unload: 0xffff
[ 35.732499] NVRM s_executeFwsec_TU102: failed to execute FWSEC for FRTS: FRTS error code 0xbe
[ 35.732508] NVRM nvAssertOkFailedNoLog: Assertion failed: Failure: Generic Error [NV_ERR_GENERIC] (0x0000FFFF) returned from kgspExecuteFwsecFrts_HAL(pGpu, pKernelGsp, pKernelGsp->pFwsecUcode, pKernelGsp->pWprMeta->frtsOffset) @ kernel_gsp_ga102.c:164
[ 35.732518] NVRM nvAssertFailedNoLog: Assertion failed: status == NV_OK @ kernel_gsp_ga102.c:235
[ 35.732523] NVRM kgspInitRm_IMPL: cannot bootstrap riscv/gsp: 0xffff
[ 35.732530] NVRM RmInitAdapter: Cannot initialize GSP firmware RM
[ 35.736281] NVRM: GPU 0001:01:00.0: RmInitAdapter failed! (0x62:0xffff:1622)
[ 35.739376] NVRM: GPU 0001:01:00.0: rm_init_adapter failed, device minor number 0
[ 37.779296] NVRM s_executeBooterUcode_TU102: Booter failed with non-zero error code: 0xa
[ 37.779302] NVRM kgspExecuteBooterUnloadIfNeeded_TU102: failed to execute Booter Unload: 0xffff
[ 37.803730] NVRM s_executeFwsec_TU102: failed to execute FWSEC for FRTS: FRTS error code 0xbe
[ 37.803738] NVRM nvAssertOkFailedNoLog: Assertion failed: Failure: Generic Error [NV_ERR_GENERIC] (0x0000FFFF) returned from kgspExecuteFwsecFrts_HAL(pGpu, pKernelGsp, pKernelGsp->pFwsecUcode, pKernelGsp->pWprMeta->frtsOffset) @ kernel_gsp_ga102.c:164
[ 37.803748] NVRM nvAssertFailedNoLog: Assertion failed: status == NV_OK @ kernel_gsp_ga102.c:235
[ 37.803753] NVRM kgspInitRm_IMPL: cannot bootstrap riscv/gsp: 0xffff
[ 37.803760] NVRM RmInitAdapter: Cannot initialize GSP firmware RM
[ 37.807511] NVRM: GPU 0001:01:00.0: RmInitAdapter failed! (0x62:0xffff:1622)
[ 37.810560] NVRM: GPU 0001:01:00.0: rm_init_adapter failed, device minor number 0
[ 38.032031] NVRM s_executeBooterUcode_TU102: Booter failed with non-zero error code: 0xa
[ 38.032037] NVRM kgspExecuteBooterUnloadIfNeeded_TU102: failed to execute Booter Unload: 0xffff
[ 38.056500] NVRM s_executeFwsec_TU102: failed to execute FWSEC for FRTS: FRTS error code 0xbe
[ 38.056507] NVRM nvAssertOkFailedNoLog: Assertion failed: Failure: Generic Error [NV_ERR_GENERIC] (0x0000FFFF) returned from kgspExecuteFwsecFrts_HAL(pGpu, pKernelGsp, pKernelGsp->pFwsecUcode, pKernelGsp->pWprMeta->frtsOffset) @ kernel_gsp_ga102.c:164
[ 38.056517] NVRM nvAssertFailedNoLog: Assertion failed: status == NV_OK @ kernel_gsp_ga102.c:235
[ 38.056522] NVRM kgspInitRm_IMPL: cannot bootstrap riscv/gsp: 0xffff
[ 38.056529] NVRM RmInitAdapter: Cannot initialize GSP firmware RM
[ 38.060293] NVRM: GPU 0001:01:00.0: RmInitAdapter failed! (0x62:0xffff:1622)
[ 38.063350] NVRM: GPU 0001:01:00.0: rm_init_adapter failed, device minor number 0
[ 38.976709] NVRM s_executeBooterUcode_TU102: Booter failed with non-zero error code: 0xa
[ 38.976715] NVRM kgspExecuteBooterUnloadIfNeeded_TU102: failed to execute Booter Unload: 0xffff
[ 39.001214] NVRM s_executeFwsec_TU102: failed to execute FWSEC for FRTS: FRTS error code 0xbe
[ 39.001223] NVRM nvAssertOkFailedNoLog: Assertion failed: Failure: Generic Error [NV_ERR_GENERIC] (0x0000FFFF) returned from kgspExecuteFwsecFrts_HAL(pGpu, pKernelGsp, pKernelGsp->pFwsecUcode, pKernelGsp->pWprMeta->frtsOffset) @ kernel_gsp_ga102.c:164
[ 39.001233] NVRM nvAssertFailedNoLog: Assertion failed: status == NV_OK @ kernel_gsp_ga102.c:235
[ 39.001238] NVRM kgspInitRm_IMPL: cannot bootstrap riscv/gsp: 0xffff
[ 39.001245] NVRM RmInitAdapter: Cannot initialize GSP firmware RM
[ 39.004970] NVRM: GPU 0001:01:00.0: RmInitAdapter failed! (0x62:0xffff:1622)
[ 39.008121] NVRM: GPU 0001:01:00.0: rm_init_adapter failed, device minor number 0
[ 39.229826] NVRM s_executeBooterUcode_TU102: Booter failed with non-zero error code: 0xa
[ 39.229832] NVRM kgspExecuteBooterUnloadIfNeeded_TU102: failed to execute Booter Unload: 0xffff
[ 39.254299] NVRM s_executeFwsec_TU102: failed to execute FWSEC for FRTS: FRTS error code 0xbe
[ 39.254308] NVRM nvAssertOkFailedNoLog: Assertion failed: Failure: Generic Error [NV_ERR_GENERIC] (0x0000FFFF) returned from kgspExecuteFwsecFrts_HAL(pGpu, pKernelGsp, pKernelGsp->pFwsecUcode, pKernelGsp->pWprMeta->frtsOffset) @ kernel_gsp_ga102.c:164
[ 39.254318] NVRM nvAssertFailedNoLog: Assertion failed: status == NV_OK @ kernel_gsp_ga102.c:235
[ 39.254323] NVRM kgspInitRm_IMPL: cannot bootstrap riscv/gsp: 0xffff
[ 39.254330] NVRM RmInitAdapter: Cannot initialize GSP firmware RM
[ 39.258142] NVRM: GPU 0001:01:00.0: RmInitAdapter failed! (0x62:0xffff:1622)
[ 39.261188] NVRM: GPU 0001:01:00.0: rm_init_adapter failed, device minor number 0
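The log above repeats the same failure sequence several times with different timestamps, which makes the distinct errors hard to pick out. A minimal sketch for collapsing such output, assuming the bracketed timestamp format shown above; the function name is hypothetical:

```shell
#!/bin/sh
# Collapse repeated NVRM kernel messages: keep only NVRM lines, strip the
# "[ timestamp]" prefix, then count identical lines in first-seen order.
summarize_nvrm() {
    grep 'NVRM' \
      | sed -E 's/^\[ *[0-9]+\.[0-9]+\] *//' \
      | awk '{ if (!($0 in n)) order[++k] = $0; n[$0]++ }
             END { for (i = 1; i <= k; i++) printf "%5d  %s\n", n[order[i]], order[i] }'
}

# Usage: dmesg | summarize_nvrm
```

Each distinct message is printed once, prefixed by how often it occurred, so the first failure in the sequence stays at the top.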
You should also set nvidia-drm.modeset=1 and make sure the nouveau module is prevented from loading.
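The setting above is written in kernel command-line form. A minimal sketch of the modprobe.d equivalent, in the same style as the earlier NVreg option, might look like the following. The function name, its directory argument, and the two file names are hypothetical choices for illustration; only the `options` and `blacklist` lines themselves are standard modprobe.d syntax.

```shell
#!/bin/sh
# Write modprobe.d fragments that enable nvidia-drm modesetting and keep
# nouveau from loading. $1 is the target directory (default /etc/modprobe.d).
write_nvidia_modprobe_conf() {
    dir="${1:-/etc/modprobe.d}"
    # Same effect as nvidia-drm.modeset=1 on the kernel command line.
    printf 'options nvidia-drm modeset=1\n' > "$dir/nvidia-drm-modeset.conf"
    # Stop nouveau from being auto-loaded and disable its modesetting.
    printf 'blacklist nouveau\noptions nouveau modeset=0\n' > "$dir/blacklist-nouveau.conf"
}

# Usage (as root): write_nvidia_modprobe_conf
```

If nouveau is included in the initramfs, the initramfs likely needs to be regenerated before a reboot for the blacklist to take effect.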