Hello @jnettlet and @YazanShhadySR,
Thank you for your answers. We have run many tests since then, so I have some news and more questions on this topic.
We tried the NXP 5.4 kernel from codeaurora. The original problem has apparently disappeared on our platform (which embeds the SolidRun SOM): no more U-Boot and freeze when exiting suspend. This is great news.
However, two other issues, which were initially very rare, are now very common. The consequences are the same: randomly, the i.MX does not wake up properly from suspend-to-RAM. It stays in an “idle” state where nothing can be done to interact with the system, except disconnecting and reconnecting power to make it boot again.
Case n°1: error -84
[03:10:49.768217 1.105460] [ 8484.359835] PM: suspend entry (deep)
[03:10:49.773508 0.005295] [ 8484.363646] Filesystems sync: 0.000 seconds
[03:10:49.775322 0.001815] [ 8484.368561] Freezing user space processes … (elapsed 0.001 seconds) done.
[03:10:49.783931 0.008608] [ 8484.377170] OOM killer disabled.
[03:10:49.786007 0.002077] [ 8484.380455] Freezing remaining freezable tasks … (elapsed 0.001 seconds) done.
[03:10:49.790556 0.004549] [ 8484.389204] printk: Suspending console(s) (use no_console_suspend to debug)
[03:14:43.134265 233.343704] [ 8484.423579] fec 2188000.ethernet eth0: Link is Down
[03:14:43.138717 0.004456] [ 8484.427205] Disabling non-boot CPUs …
[03:14:43.140497 0.001781] [ 8484.428654] Enabling non-boot CPUs …
[03:14:43.149932 0.009434] [ 8484.429470] CPU1 is up
[03:14:43.151450 0.001519] [ 8484.461285] OOM killer enabled.
[03:14:43.153331 0.001881] [ 8484.464455] Restarting tasks … done.
[03:14:43.155550 0.002219] [ 8484.470775] PM: suspend exit
[03:14:43.534233 0.378677] [ 8484.845584] mmc1: tuning execution failed: -84
[03:14:43.538275 0.004047] [ 8484.850105] mmc1: error -84 doing runtime resume
[03:14:43.540420 0.002146] [ 8484.856967] EXT4-fs warning (device mmcblk1p1): ext4_end_bio:315: I/O error 10 writing to inode 134822 (offset 0 size 4096 starting block 627296)
[03:14:43.557243 0.016821] [ 8484.870143] Buffer I/O error on device mmcblk1p1, logical block 624223
[03:14:43.567174 0.009930] [ 8484.878993] EXT4-fs warning (device mmcblk1p1): ext4_end_bio:315: I/O error 10 writing to inode 134825 (offset 0 size 4096 starting block 722997)
[03:14:43.583654 0.016478] [ 8484.879466] mmc1: card 59b4 removed
[03:14:43.587762 0.004111] [ 8484.892161] Buffer I/O error on device mmcblk1p1, logical block 719924
[03:14:43.595249 0.007488] [ 8484.902267] EXT4-fs warning (device mmcblk1p1): ext4_end_bio:315: I/O error 10 writing to inode 8570 (offset 0 size 0 starting block 1790998)
[03:14:43.606834 0.011585] [ 8484.915050] Buffer I/O error on device mmcblk1p1, logical block 1787925
[03:14:43.610353 0.003520] [ 8484.921825] EXT4-fs warning (device mmcblk1p1): ext4_end_bio:315: I/O error 10 writing to inode 8518 (offset 0 size 0 starting block 1270168)
[03:14:43.618609 0.008255] [ 8484.934619] Buffer I/O error on device mmcblk1p1, logical block 1267095
[03:14:43.630478 0.011868] [ 8484.941356] EXT4-fs warning (device mmcblk1p1): ext4_end_bio:315: I/O error 10 writing to inode 8518 (offset 3768320 size 4096 starting block 1270169)
[03:14:43.646438 0.015960] [ 8484.954935] Buffer I/O error on device mmcblk1p1, logical block 1267096
[03:14:43.648368 0.001932] [ 8484.961674] EXT4-fs warning (device mmcblk1p1): ext4_end_bio:315: I/O error 10 writing to inode 8528 (offset 0 size 0 starting block 1191875)
[03:14:43.663168 0.014795] [ 8484.974461] Buffer I/O error on device mmcblk1p1, logical block 1188802
[03:14:43.665632 0.002468] [ 8484.981210] EXT4-fs warning (device mmcblk1p1): ext4_end_bio:315: I/O error 10 writing to inode 8528 (offset 3944448 size 4096 starting block 1191876)
[03:14:43.680338 0.014706] [ 8484.994778] Buffer I/O error on device mmcblk1p1, logical block 1188803
[03:14:43.682265 0.001928] [ 8485.001989] JBD2: Detected IO errors while flushing file data on mmcblk1p1-8
[03:14:43.696030 0.013763] [ 8485.010307] Aborting journal on device mmcblk1p1-8.
[03:14:43.697559 0.001531] [ 8485.015418] JBD2: Error -5 detected when updating journal superblock for mmcblk1p1-8.
[03:14:43.711773 0.014212] [ 8485.019025] EXT4-fs (mmcblk1p1): I/O error while writing superblock
[03:14:43.713703 0.001932] [ 8485.029683] EXT4-fs error (device mmcblk1p1): ext4_journal_check_start:61: Detected aborted journal
[03:14:43.727718 0.014005] [ 8485.038803] EXT4-fs (mmcblk1p1): Remounting filesystem read-only
[03:14:43.742761 0.015052] [ 8485.065542] EXT4-fs (mmcblk1p1): I/O error while writing superblock
[03:14:44.142696 0.399930] [ 8485.462667] EXT4-fs error (device mmcblk1p1): __ext4_find_entry:1532: inode #8100: comm systemd-udevd: reading directory lblock 0
[03:14:44.162843 0.020150] [ 8485.474523] EXT4-fs (mmcblk1p1): I/O error while writing superblock
[03:14:44.169231 0.006389] [ 8485.481023] EXT4-fs error (device mmcblk1p1): __ext4_find_entry:1532: inode #8098: comm sysnav-pr8011-m: reading directory lblock 0
[03:14:44.185245 0.016012] [ 8485.481665] EXT4-fs error (device mmcblk1p1): __ext4_find_entry:1532: inode #5694: comm systemd-udevd: reading directory lblock 0
[03:14:44.195698 0.010455] [ 8485.502718] EXT4-fs error (device mmcblk1p1): __ext4_find_entry:1532: inode #1907: comm kworker/u4:2: reading directory lblock 0
[03:14:44.206260 0.010563] [ 8485.516637] EXT4-fs error (device mmcblk1p1): __ext4_find_entry:1532: inode #8100: comm systemd-udevd: reading directory lblock 0
[03:14:44.210681 0.004417] [ 8485.528586] EXT4-fs error (device mmcblk1p1): __ext4_find_entry:1532: inode #8100: comm systemd-udevd: reading directory lblock 0
[03:14:44.224456 0.013780] [ 8485.541318] EXT4-fs error (device mmcblk1p1): __ext4_find_entry:1532: inode #158: comm systemd-udevd: reading directory lblock 0
[03:14:44.239494 0.015036] [ 8485.553158] EXT4-fs error (device mmcblk1p1): __ext4_find_entry:1532: inode #1907: comm systemd-udevd: reading directory lblock 0
[03:14:44.254443 0.014951] [ 8485.565565] EXT4-fs error (device mmcblk1p1): __ext4_find_entry:1532: inode #8098: comm systemd-udevd: reading directory lblock 0
[03:14:44.257817 0.003374] [ 8485.577924] EXT4-fs error (device mmcblk1p1): __ext4_find_entry:1532: inode #8100: comm systemd-udevd: reading directory lblock 0
[03:14:44.301742 0.043922] [ 8485.623240] mmc1: host does not support reading read-only switch, assuming write-enable
[03:14:44.479202 0.177457] [ 8485.799751] mmc1: new ultra high speed SDR104 SDXC card at address 59b4
[03:14:44.498736 0.019536] [ 8485.809512] mmcblk1: mmc1:59b4 ED2S5 119 GiB
[03:14:44.509619 0.010886] [ 8485.818745] mmcblk1: p1
[03:14:44.750141 0.240517] [ 8486.066056] EXT4-fs warning (device mmcblk1p1): dx_probe:761: inode #130615: lblock 0: comm python3: error -5 reading directory block
[03:14:44.768298 0.018160] [ 8486.078438] EXT4-fs warning (device mmcblk1p1): dx_probe:761: inode #130615: lblock 0: comm python3: error -5 reading directory block
[03:14:44.781667 0.013369] [ 8486.090868] EXT4-fs warning (device mmcblk1p1): dx_probe:761: inode #130615: lblock 0: comm python3: error -5 reading directory block
[03:14:46.207127 1.425457] [ 8487.515586] fec 2188000.ethernet eth0: Link is Up - 100Mbps/Full - flow control off
[03:14:48.751008 2.543880] [ 8490.066028] EXT4-fs warning: 290 callbacks suppressed
[03:14:48.758837 0.007835] [ 8490.066043] EXT4-fs warning (device mmcblk1p1): dx_probe:761: inode #130615: lblock 0: comm python3: error -5 reading directory block
[03:14:48.770059 0.011222] [ 8490.083858] EXT4-fs warning (device mmcblk1p1): dx_probe:761: inode #130615: lblock 0: comm python3: error -5 reading directory block
[03:14:48.787349 0.017290] [ 8490.096472] EXT4-fs warning (device mmcblk1p1): dx_probe:761: inode #130615: lblock 0: comm python3: error -5 reading directory block
[03:14:48.798209 0.010860] [ 8490.108726] EXT4-fs warning (device mmcblk1p1): dx_probe:761: inode #130615: lblock 0: comm python3: error -5 reading directory block
[03:14:48.813594 0.015385] [ 8490.122140] EXT4-fs warning (device mmcblk1p1): dx_probe:761: inode #130615: lblock 0: comm python3: error -5 reading directory block
[03:14:48.817005 0.003411] [ 8490.134378] EXT4-fs warning (device mmcblk1p1): dx_probe:761: inode #130615: lblock 0: comm python3: error -5 reading directory block
[03:14:48.832828 0.015822] [ 8490.147012] EXT4-fs warning (device mmcblk1p1): dx_probe:761: inode #130615: lblock 0: comm python3: error -5 reading directory block
[03:14:48.847179 0.014352] [ 8490.159386] EXT4-fs warning (device mmcblk1p1): dx_probe:761: inode #130615: lblock 0: comm python3: error -5 reading directory block
[03:14:48.862323 0.015143] [ 8490.171781] EXT4-fs warning (device mmcblk1p1): dx_probe:761: inode #130615: lblock 0: comm python3: error -5 reading directory block
[03:14:48.868357 0.006034] [ 8490.184132] EXT4-fs warning (device mmcblk1p1): dx_probe:761: inode #130615: lblock 0: comm python3: error -5 reading directory block
[03:14:49.245687 0.377327] [ 8490.559277] EXT4-fs error: 623 callbacks suppressed
[03:14:49.251395 0.005712] [ 8490.559289] EXT4-fs error (device mmcblk1p1): __ext4_find_entry:1532: inode #132307: comm redis-server: reading directory lblock 0
[03:14:49.758943 0.507544] [ 8491.066550] EXT4-fs error (device mmcblk1p1): __ext4_find_entry:1532: inode #8100: comm python3: reading directory lblock 0
[03:14:50.750966 0.992022] [ 8492.067228] EXT4-fs error (device mmcblk1p1): __ext4_find_entry:1532: inode #8100: comm python3: reading directory lblock 0
[03:14:50.877846 0.126881] [ 8492.196600] EXT4-fs error (device mmcblk1p1): __ext4_find_entry:1532: inode #8100: comm sysnav-pr8011-r: reading directory lblock 0
[03:14:50.900332 0.022488] [ 8492.214039] EXT4-fs error (device mmcblk1p1): __ext4_find_entry:1532: inode #8100: comm sysnav-pr8011-r: reading directory lblock 0
[03:14:50.918995 0.018662] [ 8492.229741] EXT4-fs error (device mmcblk1p1): __ext4_find_entry:1532: inode #8100: comm sysnav-pr8011-r: reading directory lblock 0
[03:14:50.933549 0.014551] [ 8492.245426] EXT4-fs error (device mmcblk1p1): __ext4_find_entry:1532: inode #8100: comm sysnav-pr8011-r: reading directory lblock 0
[03:14:50.945258 0.011716] [ 8492.261175] EXT4-fs error (device mmcblk1p1): __ext4_find_entry:1532: inode #8100: comm sysnav-pr8011-r: reading directory lblock 0
[03:14:50.960112 0.014851] [ 8492.276780] EXT4-fs error (device mmcblk1p1): __ext4_find_entry:1532: inode #8100: comm sysnav-pr8011-r: reading directory lblock 0
[03:14:50.976722 0.016612] [ 8492.292440] EXT4-fs error (device mmcblk1p1): __ext4_find_entry:1532: inode #8100: comm sysnav-pr8011-r: reading directory lblock 0
[03:14:53.757771 2.781044] [ 8495.071244] EXT4-fs warning: 340 callbacks suppressed
[03:14:53.766294 0.008527] [ 8495.071258] EXT4-fs warning (device mmcblk1p1): dx_probe:761: inode #130615: lblock 0: comm python3: error -5 reading directory block
[03:14:53.776383 0.010089] [ 8495.088695] EXT4-fs warning (device mmcblk1p1): dx_probe:761: inode #130615: lblock 0: comm python3: error -5 reading directory block
[03:14:53.791296 0.014914] [ 8495.101173] EXT4-fs warning (device mmcblk1p1): dx_probe:761: inode #130615: lblock 0: comm python3: error -5 reading directory block
[03:14:53.799384 0.008087] [ 8495.113426] EXT4-fs warning (device mmcblk1p1): dx_probe:761: inode #130615: lblock 0: comm python3: error -5 reading directory block
[03:14:53.813276 0.013892] [ 8495.126699] EXT4-fs warning (device mmcblk1p1): dx_probe:761: inode #130615: lblock 0: comm python3: error -5 reading directory block
[03:14:53.827488 0.014212] [ 8495.138958] EXT4-fs warning (device mmcblk1p1): dx_probe:761: inode #130615: lblock 0: comm python3: error -5 reading directory block
[03:14:53.840785 0.013298] [ 8495.151337] EXT4-fs warning (device mmcblk1p1): dx_probe:761: inode #130615: lblock 0: comm python3: error -5 reading directory block
[03:14:53.854485 0.013699] [ 8495.163609] EXT4-fs warning (device mmcblk1p1): dx_probe:761: inode #130615: lblock 0: comm python3: error -5 reading directory block
[03:14:53.862683 0.008199] [ 8495.175954] EXT4-fs warning (device mmcblk1p1): dx_probe:761: inode #130615: lblock 0: comm python3: error -5 reading directory block
[03:14:53.876890 0.014205] [ 8495.188297] EXT4-fs warning (device mmcblk1p1): dx_probe:761: inode #130615: lblock 0: comm python3: error -5 reading directory block
[03:14:54.750019 0.873126] [ 8496.070112] EXT4-fs error: 222 callbacks suppressed
[03:14:54.767738 0.017723] [ 8496.070126] EXT4-fs error (device mmcblk1p1): __ext4_find_entry:1532: inode #8100: comm python3: reading directory lblock 0
[03:14:54.894596 0.126856] [ 8496.205677] EXT4-fs error (device mmcblk1p1): __ext4_find_entry:1532: inode #8100: comm sysnav-pr8011-r: reading directory lblock 0
[03:14:54.909272 0.014677] [ 8496.222906] EXT4-fs error (device mmcblk1p1): __ext4_find_entry:1532: inode #8100: comm sysnav-pr8011-r: reading directory lblock 0
[03:14:54.926467 0.017195] [ 8496.238667] EXT4-fs error (device mmcblk1p1): __ext4_find_entry:1532: inode #8100: comm sysnav-pr8011-r: reading directory lblock 0
[03:14:54.942645 0.016177] [ 8496.254409] EXT4-fs error (device mmcblk1p1): __ext4_find_entry:1532: inode #8100: comm sysnav-pr8011-r: reading directory lblock 0
[03:14:54.957815 0.015172] [ 8496.268656] EXT4-fs error (device mmcblk1p1): __ext4_find_entry:1532: inode #8100: comm systemd-journal: reading directory lblock 0
[03:14:54.973249 0.015434] [ 8496.281561] EXT4-fs error (device mmcblk1p1): __ext4_find_entry:1532: inode #8100: comm systemd-journal: reading directory lblock 0
[03:14:54.979630 0.006377] [ 8496.293561] EXT4-fs error (device mmcblk1p1): __ext4_find_entry:1532: inode #8100: comm sysnav-pr8011-r: reading directory lblock 0
[03:14:54.993227 0.013601] [ 8496.305548] EXT4-fs error (device mmcblk1p1): __ext4_find_entry:1532: inode #8100: comm systemd-journal: reading directory lblock 0
[03:14:55.007274 0.014048] [ 8496.317532] EXT4-fs error (device mmcblk1p1): __ext4_find_entry:1532: inode #8100: comm sysnav-pr8011-r: reading directory lblock 0
[03:14:58.765976 3.758698] [ 8500.074233] EXT4-fs warning: 411 callbacks suppressed
[03:14:58.774299 0.008327] [ 8500.074247] EXT4-fs warning (device mmcblk1p1): dx_probe:761: inode #155: lblock 0: comm bash: error -5 reading directory block
[03:14:58.781857 0.007559] [ 8500.078198] EXT4-fs warning (device mmcblk1p1): dx_probe:761: inode #130615: lblock 0: comm python3: error -5 reading directory block
[03:14:58.787444 0.005585] [ 8500.089259] EXT4-fs warning (device mmcblk1p1): dx_probe:761: inode #155: lblock 0: comm bash: error -5 reading directory block
[03:14:58.802013 0.014570] [ 8500.092576] EXT4-fs warning (device mmcblk1p1): dx_probe:761:
And so on…
Case n°2: freeze at PM suspend exit
[11:38:02.865070 0.001584] [55577.217222] CPU1 is up
[11:38:02.866361 0.001287] [55577.249707] OOM killer enabled.
[11:38:02.875042 0.008682] [55577.252879] Restarting tasks ... done.
[11:38:02.889130 0.014085] [55577.268599] PM: suspend exit
And then nothing more.
Some explanations:
Case n°1: the system reports error -84, then continuously outputs error messages, as you can see above.
Case n°2: the system appears to go through the suspend exit sequence normally, but freezes after printing the line “PM: suspend exit”.
Frequency of occurrence:
On our systems under test (which enter and exit deep sleep every 3 minutes to provoke the issue as quickly as possible), either of the two cases may occur, apparently at random.
We may have to wait anywhere from 30 minutes to 60 hours for the system to crash (case 1 or 2). As I said, it is pretty random.
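For reference, a suspend/resume stress loop of this kind can be sketched as follows. This is only an illustration, not necessarily the exact loop running on our test systems: it assumes `rtcwake` from util-linux is installed, that the RTC can wake the board, and that `mem` maps to deep sleep on the target. The function names and the 60 s asleep / 120 s awake split are hypothetical; only the 3-minute period comes from the description above.

```python
import subprocess
import time


def rtcwake_cmd(sleep_seconds):
    """Build the rtcwake command line (assumes util-linux rtcwake).

    -m mem : suspend-to-RAM
    -s N   : program the RTC alarm to wake the system after N seconds
    """
    return ["rtcwake", "-m", "mem", "-s", str(sleep_seconds)]


def run_cycles(count, sleep_seconds=60, awake_seconds=120,
               runner=subprocess.run, pause=time.sleep):
    """Run `count` suspend/resume cycles and return how many completed.

    The asleep/awake split is an assumed value; together they give the
    3-minute period. `runner` and `pause` are injectable so the loop can
    be exercised without actually suspending a machine.
    """
    completed = 0
    for _ in range(count):
        # Blocks until the system has resumed; raises if rtcwake fails.
        runner(rtcwake_cmd(sleep_seconds), check=True)
        completed += 1
        pause(awake_seconds)
    return completed
```

On the target, calling `run_cycles(1000)` as root would run until a cycle fails or, in case n°2, until the board freezes.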
Strategy we would like to implement:
- Before working on the root cause, we would like to implement a watchdog in the OS. Because it resumes early after deep sleep exit, this watchdog will reboot the system in case n°2, and in case n°1 as well (in case n°1, we detect error -84 and trigger a kernel panic; the reboot is then performed by the watchdog).
- Once we have ensured by these means that the system does not stay stuck when case 1 or case 2 occurs, we will tackle the root cause.
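The detection step for case n°1 can be sketched as below. This is a minimal sketch under stated assumptions, not our exact implementation: the two patterns are taken from the mmc1 messages in the log above, the helper names are hypothetical, and the panic is assumed to be triggered by writing `c` to `/proc/sysrq-trigger` as root (which crashes the kernel on purpose).

```python
import re

# Signatures copied from the case n°1 log (mmc1 lines right after resume).
ERROR_84_PATTERNS = (
    re.compile(r"mmc\d+: tuning execution failed: -84"),
    re.compile(r"mmc\d+: error -84 doing runtime resume"),
)


def is_error_84(line):
    """Return True if a kernel log line matches one of the -84 signatures."""
    return any(p.search(line) for p in ERROR_84_PATTERNS)


def trigger_panic(sysrq_path="/proc/sysrq-trigger"):
    """Crash the kernel deliberately via SysRq 'c' (requires root)."""
    with open(sysrq_path, "w") as f:
        f.write("c")


def watch(lines, on_error=trigger_panic):
    """Scan an iterable of kernel log lines.

    Calls `on_error` on the first -84 signature and returns True;
    returns False if the iterable ends without a match.
    """
    for line in lines:
        if is_error_84(line):
            on_error()
            return True
    return False
```

The `lines` argument can be any iterable of kernel log lines, for example the output of `dmesg --follow` where that option is available. After the panic, the reboot itself is left to the watchdog, as described above.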
Questions about our platform and the HummingBoard
- We implemented the watchdog and the kernel panic trigger. However, the watchdog does not reboot the system when the SOM is mounted on our platform. With the same SOM mounted on the HummingBoard, the watchdog does perform the reboot. Our platform fully follows the SOM implementation datasheet. Is there any rule of thumb applied on the HummingBoard for operating the SOM that is not specified in the datasheets?
- We noticed that some SD cards (onto which the OS images are flashed) allow the watchdog to reboot our platform, and some do not. On the HummingBoard, all SD cards seem to work. As with the SOM, our platform design fully follows the SD implementation datasheets. Do you know why we run into such surprises?
- Regarding the SD card: we captured an eye diagram on one of the SD card data lines. On our platform, the diagram is not very clean (the traces are blurry). However, it looks the same on the HummingBoard. I’m afraid we may be on the edge of a malfunction that has no consequences on the HummingBoard but does affect our platform. What are your thoughts on this?
I can provide more information if necessary.
I’m afraid this topic is not yet solved.
Thanks a lot for the help,
Frank