ClearFog CX LX2 - SATA bandwidth is "shared" with network, reducing performance

Hello,
I’ve noticed that bandwidth, or perhaps some kind of kernel resource, is shared between the SATA ports and the SFP+ ports.
Running two iperf3 servers on two of the SFP+ ports, two parallel iperf3 clients from a separate ClearFog board get close to a consistent 9.9 Gb/s on each port. However, writing to a SATA SSD at the same time (e.g. simply with cp) decreases performance to about 8-9 Gb/s on each SFP+ port. Writing to multiple SATA disks has only a minor additional effect. SATA write performance does not appear to be affected.
There is plenty of CPU to spare (i.e. 1 core for each iperf3 instance, and 1 core for writing to disk).
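For reference, the test looks roughly like this (the addresses and mount point are placeholders, not my actual configuration):

```shell
# On the board under test: one iperf3 server per SFP+ port
iperf3 -s -p 5201 &
iperf3 -s -p 5202 &

# On the second ClearFog board: one client per port, in parallel
iperf3 -c 10.0.1.1 -p 5201 -t 60 &
iperf3 -c 10.0.2.1 -p 5202 -t 60 &

# Meanwhile, on the board under test, write to the SATA SSD
cp /tmp/large-file /mnt/ssd/
```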

Is this as expected? Or is there perhaps some kind of kernel setting I can adjust to resolve this issue?

The only physically shared hardware between them is the SMMU and the DDR. Most likely the performance decrease is being caused by interrupt contention on CPU core 0.

From my understanding, interrupts may not be the cause of the issue.
Running two instances of iperf3 alongside a disk-writing benchmark I wrote, there doesn’t appear to be a high load on the ksoftirqd processes.

top - 12:14:33 up 11 min,  4 users,  load average: 13.82, 4.85, 1.98
Tasks: 266 total,  10 running, 256 sleeping,   0 stopped,   0 zombie
%Cpu(s):  8.8 us, 42.3 sy,  0.0 ni,  8.0 id, 27.4 wa,  2.2 hi, 11.3 si,  0.0 st
MiB Mem :  29969.5 total,    206.9 free,  12871.8 used,  16890.7 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.  16742.5 avail Mem 

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
   2826 user      20   0   12.3g   9.9g   7988 R 531.2  34.0   4:28.44 DiskWriteBenchm
   1663 user      20   0    5248   2060   1588 R  93.8   0.0   2:47.95 iperf3
   1672 user      20   0    5248   2140   1668 R  93.8   0.0   2:43.90 iperf3
    187 root      20   0       0      0      0 S  50.0   0.0   0:35.18 kswapd0
   2245 root      20   0       0      0      0 R  25.0   0.0   0:09.32 kworker/u32:0+flush-8:0
    154 root      20   0       0      0      0 D  18.8   0.0   0:09.36 kworker/u32:1+flush-8:48
    202 root      20   0       0      0      0 R  18.8   0.0   0:09.59 kworker/u32:4+flush-8:16
    206 root      20   0       0      0      0 D  12.5   0.0   0:09.16 kworker/u32:6+flush-8:32
   2939 user      20   0    6116   2892   2272 R  12.5   0.0   0:00.03 top
     10 root      20   0       0      0      0 R   6.2   0.0   0:00.23 rcu_preempt
     55 root      20   0       0      0      0 S   6.2   0.0   0:02.28 ksoftirqd/9
    327 root      20   0       0      0      0 R   6.2   0.0   0:00.21 kworker/2:2-events
      1 root      20   0  166796   6164   3332 S   0.0   0.0   0:07.15 systemd

Looking at /proc/interrupts, however, ahci-qoriq[3200000.sata], ahci-qoriq[3210000.sata], ahci-qoriq[3220000.sata] and ahci-qoriq[3230000.sata] were each seeing about 500 interrupts/second, all on core 0.
The network interrupts (dpio.0 to dpio.15 I believe?) were distributed across all cores.

Regardless, the SATA interrupts seemed like a lot, so I installed irqbalance and confirmed that the SATA interrupts were being rebalanced. However, the issue of reduced network performance when using the SATA disks remained.
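For completeness, IRQ affinity can also be checked and pinned by hand instead of relying on irqbalance; something along these lines (the IRQ number 55 is just an example, the real numbers come from /proc/interrupts):

```shell
# Watch the per-CPU interrupt counts for the SATA and DPIO sources
grep -E 'sata|dpio' /proc/interrupts

# Pin a given IRQ to CPU 4 (hex bitmask 0x10); requires root,
# and the IRQ number is an example, not the actual one
echo 10 > /proc/irq/55/smp_affinity
```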

Is there anything else I should look at?

Can you test and collect output from mpstat -P ALL 1? That should give a better per-core view of resource usage. It is also possible that you are hitting a bandwidth limitation within the CCN of the LX2160A. You could test this by temporarily running your system bus overclocked to 800MHz; the default is 700MHz.

Running both iperf3 instances along with the disk benchmark, I get the following output from mpstat -P ALL 1. It doesn’t look bottlenecked to me, but I could be missing something.

00:33:53     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
00:33:54     all    6.56    0.00   42.61   31.71    2.41   11.44    0.00    0.00    0.00    5.28
00:33:54       0    0.00    0.00    1.04    0.00   13.54   66.67    0.00    0.00    0.00   18.75
00:33:54       1    0.98    0.00   89.22    0.00    2.94    3.92    0.00    0.00    0.00    2.94
00:33:54       2    3.16    0.00   54.74   29.47    1.05    2.11    0.00    0.00    0.00    9.47
00:33:54       3    3.16    0.00   38.95   51.58    0.00    6.32    0.00    0.00    0.00    0.00
00:33:54       4    6.74    0.00   42.70   38.20    2.25    4.49    0.00    0.00    0.00    5.62
00:33:54       5   12.22    0.00   40.00   40.00    1.11    5.56    0.00    0.00    0.00    1.11
00:33:54       6    7.78    0.00   38.89   45.56    0.00    0.00    0.00    0.00    0.00    7.78
00:33:54       7    7.61    0.00   32.61   58.70    0.00    0.00    0.00    0.00    0.00    1.09
00:33:54       8    9.89    0.00   43.96   41.76    1.10    0.00    0.00    0.00    0.00    3.30
00:33:54       9    7.78    0.00   41.11   48.89    1.11    1.11    0.00    0.00    0.00    0.00
00:33:54      10    3.30    0.00   46.15   41.76    0.00    8.79    0.00    0.00    0.00    0.00
00:33:54      11    5.32    0.00   51.06   32.98    0.00   10.64    0.00    0.00    0.00    0.00
00:33:54      12    6.32    0.00   57.89   29.47    2.11    1.05    0.00    0.00    0.00    3.16
00:33:54      13   27.96    0.00   23.66   39.78    1.08    0.00    0.00    0.00    0.00    7.53
00:33:54      14    1.04    0.00    3.12    0.00   10.42   65.62    0.00    0.00    0.00   19.79
00:33:54      15    3.12    0.00   72.92   16.67    1.04    3.12    0.00    0.00    0.00    3.12

00:33:54     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
00:33:55     all    7.08    0.00   41.35   31.45    2.29   11.27    0.00    0.00    0.00    6.55
00:33:55       0    0.00    0.00    7.29    4.17   11.46   63.54    0.00    0.00    0.00   13.54
00:33:55       1    3.09    0.00   62.89   27.84    1.03    3.09    0.00    0.00    0.00    2.06
00:33:55       2    4.17    0.00   46.88   23.96    0.00    1.04    0.00    0.00    0.00   23.96
00:33:55       3    6.38    0.00   37.23   45.74    0.00    3.19    0.00    0.00    0.00    7.45
00:33:55       4    7.45    0.00   46.81   39.36    0.00    6.38    0.00    0.00    0.00    0.00
00:33:55       5    5.38    0.00   60.22   26.88    1.08    6.45    0.00    0.00    0.00    0.00
00:33:55       6    9.28    0.00   32.99   50.52    0.00    5.15    0.00    0.00    0.00    2.06
00:33:55       7    6.38    0.00   40.43   47.87    1.06    3.19    0.00    0.00    0.00    1.06
00:33:55       8   16.33    0.00   37.76   28.57    3.06    1.02    0.00    0.00    0.00   13.27
00:33:55       9    7.45    0.00   32.98   52.13    0.00    1.06    0.00    0.00    0.00    6.38
00:33:55      10    9.38    0.00   48.96   32.29    1.04    6.25    0.00    0.00    0.00    2.08
00:33:55      11    8.51    0.00   38.30   41.49    1.06    9.57    0.00    0.00    0.00    1.06
00:33:55      12   20.21    0.00   38.30   31.91    1.06    0.00    0.00    0.00    0.00    8.51
00:33:55      13    8.60    0.00   36.56   53.76    0.00    0.00    0.00    0.00    0.00    1.08
00:33:55      14    0.00    0.00    2.08    0.00   12.50   66.67    0.00    0.00    0.00   18.75
00:33:55      15    1.00    0.00   90.00    0.00    3.00    3.00    0.00    0.00    0.00    3.00

00:33:55     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
00:33:56     all    7.35    0.00   42.36   31.17    2.25   10.99    0.00    0.00    0.00    5.89
00:33:56       0    0.00    0.00    4.17    1.04   12.50   63.54    0.00    0.00    0.00   18.75
00:33:56       1    7.69    0.00   53.85   35.16    1.10    1.10    0.00    0.00    0.00    1.10
00:33:56       2   37.37    0.00   20.20   25.25    1.01    1.01    0.00    0.00    0.00   15.15
00:33:56       3    7.69    0.00   39.56   49.45    1.10    2.20    0.00    0.00    0.00    0.00
00:33:56       4    8.42    0.00   52.63   28.42    1.05    7.37    0.00    0.00    0.00    2.11
00:33:56       5    5.43    0.00   45.65   43.48    0.00    3.26    0.00    0.00    0.00    2.17
00:33:56       6    5.38    0.00   49.46   35.48    1.08    5.38    0.00    0.00    0.00    3.23
00:33:56       7    5.26    0.00   51.58   33.68    0.00    4.21    0.00    0.00    0.00    5.26
00:33:56       8    6.32    0.00   49.47   38.95    1.05    1.05    0.00    0.00    0.00    3.16
00:33:56       9    5.26    0.00   40.00   49.47    0.00    4.21    0.00    0.00    0.00    1.05
00:33:56      10    4.12    0.00   67.01   22.68    1.03    5.15    0.00    0.00    0.00    0.00
00:33:56      11    6.38    0.00   40.43   44.68    0.00    5.32    0.00    0.00    0.00    3.19
00:33:56      12    4.12    0.00   64.95   11.34    4.12    2.06    0.00    0.00    0.00   13.40
00:33:56      13    5.32    0.00   45.74   38.30    1.06    1.06    0.00    0.00    0.00    8.51
00:33:56      14    1.03    0.00    7.22    1.03   10.31   64.95    0.00    0.00    0.00   15.46
00:33:56      15    6.67    0.00   47.78   44.44    0.00    1.11    0.00    0.00    0.00    0.00

How do I overclock the system bus?

Looking at mpstat, I don’t believe you are hitting a hardware limitation, but rather a kernel limitation. Between that and your top output, I see many of the cores sitting around in iowait. This is most likely due to the kernel using multi-queue access for the drives. You also have very limited memory available, as nearly everything has been allocated to buffer/cache; you can also see this in the CPU utilization of kswapd0 and the kworkers.

Overall it looks like you just need to tune your kernel configuration for this specific workload.
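One place to start tuning is the dirty-page writeback behavior, since the flush kworkers are prominent in the top output. These values are purely illustrative, not a recommendation for this board:

```shell
# Start background writeback earlier and cap dirty pages lower,
# so flushes happen in smaller, smoother bursts (example values)
sysctl -w vm.dirty_background_ratio=5
sysctl -w vm.dirty_ratio=10

# Make the settings persistent across reboots
cat >> /etc/sysctl.conf <<'EOF'
vm.dirty_background_ratio = 5
vm.dirty_ratio = 10
EOF
```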

Running the iperf3 instances alone does not appear to affect iowait or buffer/cache, but I can imagine how their utilization being high could put strain on the kernel and affect networking.

Would a high buffer/cache value not indicate sufficient memory availability, given that Linux can drop it from the cache at a moment’s notice if another process needs it? Or perhaps I’m misunderstanding things, and the large size of buffer/cache means that there’s some kind of memory pressure.

Which kernel settings should I look at for this workload?

The buffers are most likely being created by your disk benchmark. The kernel can unmap and remap them, but that adds overhead to the network stack, which could limit the throughput. This is something Google’s MGLRU, merged in the 6.1 kernel, could help with. You can also try different I/O schedulers for your drives to see if it makes a difference.
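Switching the scheduler per drive is done through sysfs, roughly like this (sda as an example; the list of available schedulers depends on your kernel config):

```shell
# List the available schedulers; the active one is shown in brackets
cat /sys/block/sda/queue/scheduler
# e.g.: [mq-deadline] kyber bfq none

# Switch to "none" to bypass I/O scheduling overhead entirely
echo none > /sys/block/sda/queue/scheduler
```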

I’ve installed 64GB of RAM, but there is no change in behavior. buff/cache seems to just consume as much memory as it possibly can.
Changing I/O schedulers also had no noticeable impact. Do you have any other suggestions for how this may be worked around?
Here is the output of mpstat -P ALL 1:

Linux 5.4.47-00007-g9c7b74fdbb19 (clearfog1)    11/15/22        _aarch64_       (16 CPU)

12:45:41     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
12:45:42     all   10.22    0.00   26.12    6.37    2.21    8.45    0.00    0.00    0.00   46.62
12:45:42       0    9.00    0.00   28.00   16.00    2.00    5.00    0.00    0.00    0.00   40.00
12:45:42       1    8.00    0.00   35.00   23.00    1.00    0.00    0.00    0.00    0.00   33.00
12:45:42       2    7.14    0.00   32.65    0.00    1.02    0.00    0.00    0.00    0.00   59.18
12:45:42       3    7.22    0.00   36.08    0.00    1.03    0.00    0.00    0.00    0.00   55.67
12:45:42       4    3.96    0.00   23.76   11.88    0.99    1.98    0.00    0.00    0.00   57.43
12:45:42       5    7.84    0.00   39.22   16.67    0.98    3.92    0.00    0.00    0.00   31.37
12:45:42       6   25.77    0.00   14.43    0.00    1.03    0.00    0.00    0.00    0.00   58.76
12:45:42       7   18.37    0.00   40.82    0.00    1.02    1.02    0.00    0.00    0.00   38.78
12:45:42       8   17.35    0.00   17.35   19.39    3.06    4.08    0.00    0.00    0.00   38.78
12:45:42       9   28.43    0.00   18.63    3.92    0.98    2.94    0.00    0.00    0.00   45.10
12:45:42      10    7.92    0.00   40.59    0.00    0.99    0.00    0.00    0.00    0.00   50.50
12:45:42      11   10.00    0.00   35.00    0.00    0.00    0.00    0.00    0.00    0.00   55.00
12:45:42      12    1.04    0.00    4.17    0.00   10.42   58.33    0.00    0.00    0.00   26.04
12:45:42      13    1.05    0.00    3.16    0.00   10.53   58.95    0.00    0.00    0.00   26.32
12:45:42      14    5.94    0.00   27.72    4.95    0.99    0.00    0.00    0.00    0.00   60.40
12:45:42      15    4.04    0.00   19.19    5.05    0.00    3.03    0.00    0.00    0.00   68.69

12:45:42     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
12:45:43     all   12.59    0.00   24.86    5.40    1.53    8.52    0.00    0.00    0.00   47.11
12:45:43       0    9.09    0.00   25.25   20.20    1.01    5.05    0.00    0.00    0.00   39.39
12:45:43       1   11.11    0.00   39.39    0.00    0.00    1.01    0.00    0.00    0.00   48.48
12:45:43       2   31.31    0.00   17.17    0.00    0.00    1.01    0.00    0.00    0.00   50.51
12:45:43       3   15.00    0.00   26.00    0.00    0.00    1.00    0.00    0.00    0.00   58.00
12:45:43       4   19.19    0.00   21.21    0.00    0.00    0.00    0.00    0.00    0.00   59.60
12:45:43       5   30.93    0.00   18.56    9.28    0.00    0.00    0.00    0.00    0.00   41.24
12:45:43       6    5.15    0.00   16.49    0.00    0.00    1.03    0.00    0.00    0.00   77.32
12:45:43       7   20.20    0.00   31.31   14.14    0.00    2.02    0.00    0.00    0.00   32.32
12:45:43       8    7.14    0.00   44.90    0.00    0.00    0.00    0.00    0.00    0.00   47.96
12:45:43       9    5.00    0.00   12.00    6.00    0.00    2.00    0.00    0.00    0.00   75.00
12:45:43      10   16.33    0.00   28.57   17.35    0.00    5.10    0.00    0.00    0.00   32.65
12:45:43      11   14.14    0.00   51.52   11.11    1.01    1.01    0.00    0.00    0.00   21.21
12:45:43      12    0.00    0.00    5.26    1.05   11.58   60.00    0.00    0.00    0.00   22.11
12:45:43      13    0.00    0.00    4.17    0.00   11.46   58.33    0.00    0.00    0.00   26.04
12:45:43      14   13.27    0.00   33.67    7.14    0.00    2.04    0.00    0.00    0.00   43.88
12:45:43      15    3.00    0.00   21.00    0.00    0.00    0.00    0.00    0.00    0.00   76.00

12:45:43     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
12:45:44     all   10.54    0.00   23.24    3.43    2.03    8.00    0.00    0.00    0.00   52.76
12:45:44       0    7.07    0.00   36.36    0.00    6.06    3.03    0.00    0.00    0.00   47.47
12:45:44       1    2.02    0.00   11.11   19.19    1.01    0.00    0.00    0.00    0.00   66.67
12:45:44       2   30.61    0.00   17.35    0.00    0.00    0.00    0.00    0.00    0.00   52.04
12:45:44       3   15.15    0.00   35.35    0.00    1.01    0.00    0.00    0.00    0.00   48.48
12:45:44       4   11.46    0.00   18.75    0.00    0.00    0.00    0.00    0.00    0.00   69.79
12:45:44       5   10.89    0.00   20.79    0.00    0.99    0.00    0.00    0.00    0.00   67.33
12:45:44       6   10.10    0.00   21.21    0.00    1.01    0.00    0.00    0.00    0.00   67.68
12:45:44       7    5.94    0.00   41.58    0.00    0.00    0.99    0.00    0.00    0.00   51.49
12:45:44       8   12.12    0.00   39.39    0.00    1.01    1.01    0.00    0.00    0.00   46.46
12:45:44       9    8.16    0.00   24.49    0.00    1.02    0.00    0.00    0.00    0.00   66.33
12:45:44      10   11.34    0.00   21.65   30.93    0.00    6.19    0.00    0.00    0.00   29.90
12:45:44      11   17.00    0.00   28.00    5.00    0.00    4.00    0.00    0.00    0.00   46.00
12:45:44      12    2.06    0.00    6.19    0.00    9.28   55.67    0.00    0.00    0.00   26.80
12:45:44      13    1.04    0.00    8.33    0.00   10.42   58.33    0.00    0.00    0.00   21.88
12:45:44      14   18.56    0.00   31.96    0.00    0.00    0.00    0.00    0.00    0.00   49.48
12:45:44      15    5.05    0.00    8.08    0.00    1.01    1.01    0.00    0.00    0.00   84.85

And here is the output to top:

top - 12:45:05 up 18 min,  3 users,  load average: 7.01, 3.46, 1.42
Tasks: 266 total,   2 running, 264 sleeping,   0 stopped,   0 zombie
%Cpu(s): 11.9 us, 19.9 sy,  0.0 ni, 34.6 id, 22.4 wa,  1.4 hi,  9.8 si,  0.0 st
MiB Mem :  62113.3 total,    384.9 free,   8810.2 used,  52918.2 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.  52694.5 avail Mem 

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
   3823 user      20   0 5882772   3.0g  20252 S 218.8   4.9   7:11.86 NetAndDiskTest
   3822 user      20   0 5890968   3.1g  20108 S 187.5   5.1   8:25.19 NetAndDiskTest
    154 root      20   0       0      0      0 D  25.0   0.0   0:08.48 kworker/u32:1+flush-8:32
   5114 root      20   0       0      0      0 D  25.0   0.0   0:10.15 kworker/u32:0+flush-8:16
    202 root      20   0       0      0      0 D  18.8   0.0   0:06.67 kworker/u32:4+flush-8:48
    206 root      20   0       0      0      0 D  18.8   0.0   0:12.55 kworker/u32:6+flush-8:0
     70 root      20   0       0      0      0 R  12.5   0.0   0:10.08 ksoftirqd/12
     75 root      20   0       0      0      0 S   6.2   0.0   0:09.88 ksoftirqd/13
    805 root      20   0       0      0      0 S   6.2   0.0   0:00.06 jbd2/sdd1-8
   5838 user      20   0    6116   2820   2256 R   6.2   0.0   0:00.03 top
      1 root      20   0  168840  10128   7232 S   0.0   0.0   0:14.80 systemd

@jnettlet Sorry for the bump, but do you have any further theories as to what the issue could be, considering that 64GB of memory did not help?

I didn’t expect more memory to solve the issue. The issue isn’t a lack of memory but the overhead in Linux of processing that memory. This is what Google has worked on with the MGLRU patch.

Generally you are going to need to profile the kernel and see what is bottlenecking the system. I would also recommend testing with another filesystem benchmark like fio or stress-ng that allows more customizable workloads and see if that makes a difference.
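A hypothetical fio job approximating the sequential-write side of this workload might look like the following (the mount point and sizes are placeholders):

```shell
# Sequential buffered 1 MiB writes to the SSD mount, with a final
# fsync so the dirty-page flush is included in the measurement
fio --name=seqwrite --directory=/mnt/ssd \
    --rw=write --bs=1M --size=4G \
    --ioengine=psync --numjobs=1 \
    --end_fsync=1 --group_reporting
```

Varying --ioengine (e.g. libaio with --direct=1) would also show whether the slowdown is tied to the page cache specifically.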

Intel once had a kernel patch + userland tool called “latencytop”; I wonder if it could be ported to ARM. It would also need to be maintained again (it’s 8+ years old, IIRC).