SFP Optics module debugging

Hello,
I purchased an optics module from fs.com and am seeing the link “flap” while the system is simply running. I do not see the issue on the other side of the link, which uses the same optic. Could the module being dual rate (1 and 10 Gbps) be causing the problem? Other than that, it seems to match the modules listed here
Is there any advice for debugging an SFP module?

Can you please provide logs?

The SFP module is shown in the dmesg output. The device tree file used is from the SolidRun GitHub. I am wondering if there are updates to the mvpp2 driver that would stop this. I do not see this behaviour on the Clearfog GT8K I have at the other end of the link.

root@OpenWrt:/# dmesg
[    0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd083]
[    0.000000] Linux version 5.15.79 (lblyth@build) (aarch64-openwrt-linux-musl-gcc (OpenWrt GCC 11.3.0 r20153-22ffbbf04a) 11.3.0, GNU ld (GNU Binutils) 2.37) #0 SMP Tue Nov 22 21:33:53 2022
[    0.000000] Machine model: SolidRun CN9130 based SOM Clearfog Pro
[    0.000000] Zone ranges:
[    0.000000]   DMA      [mem 0x0000000000000000-0x00000000ffffffff]
[    0.000000]   DMA32    empty
[    0.000000]   Normal   [mem 0x0000000100000000-0x000000013fffffff]
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000000000000-0x0000000003ffffff]
[    0.000000]   node   0: [mem 0x0000000004000000-0x00000000041fffff]
[    0.000000]   node   0: [mem 0x0000000004200000-0x00000000bfffffff]
[    0.000000]   node   0: [mem 0x0000000100000000-0x000000013fffffff]
[    0.000000] Initmem setup node 0 [mem 0x0000000000000000-0x000000013fffffff]
[    0.000000] psci: probing for conduit method from DT.
[    0.000000] psci: PSCIv1.1 detected in firmware.
[    0.000000] psci: Using standard PSCI v0.2 function IDs
[    0.000000] psci: MIGRATE_INFO_TYPE not supported.
[    0.000000] psci: SMC Calling Convention v1.2
[    0.000000] percpu: Embedded 25 pages/cpu s64984 r8192 d29224 u102400
[    0.000000] pcpu-alloc: s64984 r8192 d29224 u102400 alloc=25*4096
[    0.000000] pcpu-alloc: [0] 0 [0] 1 [0] 2 [0] 3 
[    0.000000] Detected PIPT I-cache on CPU0
[    0.000000] CPU features: detected: Spectre-v2
[    0.000000] CPU features: detected: Spectre-BHB
[    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 1032192
[    0.000000] Kernel command line: root=/dev/mmcblk1p2 rootwait
[    0.000000] Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes, linear)
[    0.000000] Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes, linear)
[    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
[    0.000000] software IO TLB: mapped [mem 0x00000000bc000000-0x00000000c0000000] (64MB)
[    0.000000] Memory: 4039848K/4194304K available (9088K kernel code, 1366K rwdata, 2452K rodata, 512K init, 287K bss, 154456K reserved, 0K cma-reserved)
[    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
[    0.000000] trace event string verifier disabled
[    0.000000] rcu: Hierarchical RCU implementation.
[    0.000000] 	Tracing variant of Tasks RCU enabled.
[    0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 10 jiffies.
[    0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
[    0.000000] GIC: Adjusting CPU interface base to 0x00000000f022f000
[    0.000000] Root IRQ handler: gic_handle_irq
[    0.000000] GIC: Using split EOI/Deactivate mode
[    0.000000] GICv2m: DT overriding V2M MSI_TYPER (base:160, num:32)
[    0.000000] GICv2m: range[mem 0xf0280000-0xf0280fff], SPI[160:191]
[    0.000000] GICv2m: DT overriding V2M MSI_TYPER (base:192, num:32)
[    0.000000] GICv2m: range[mem 0xf0290000-0xf0290fff], SPI[192:223]
[    0.000000] GICv2m: DT overriding V2M MSI_TYPER (base:224, num:32)
[    0.000000] GICv2m: range[mem 0xf02a0000-0xf02a0fff], SPI[224:255]
[    0.000000] GICv2m: DT overriding V2M MSI_TYPER (base:256, num:32)
[    0.000000] GICv2m: range[mem 0xf02b0000-0xf02b0fff], SPI[256:287]
[    0.000000] arch_timer: cp15 timer(s) running at 25.00MHz (phys).
[    0.000000] clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0x5c40939b5, max_idle_ns: 440795202646 ns
[    0.000000] sched_clock: 56 bits at 25MHz, resolution 40ns, wraps every 4398046511100ns
[    0.000087] Calibrating delay loop (skipped), value calculated using timer frequency.. 50.00 BogoMIPS (lpj=250000)
[    0.000092] pid_max: default: 32768 minimum: 301
[    0.000158] Mount-cache hash table entries: 8192 (order: 4, 65536 bytes, linear)
[    0.000176] Mountpoint-cache hash table entries: 8192 (order: 4, 65536 bytes, linear)
[    0.000738] rcu: Hierarchical SRCU implementation.
[    0.000870] dyndbg: Ignore empty _ddebug table in a CONFIG_DYNAMIC_DEBUG_CORE build
[    0.001011] smp: Bringing up secondary CPUs ...
[    0.001406] Detected PIPT I-cache on CPU1
[    0.001439] CPU1: Booted secondary processor 0x0000000001 [0x410fd083]
[    0.001857] Detected PIPT I-cache on CPU2
[    0.001880] CPU2: Booted secondary processor 0x0000000100 [0x410fd083]
[    0.002287] Detected PIPT I-cache on CPU3
[    0.002303] CPU3: Booted secondary processor 0x0000000101 [0x410fd083]
[    0.002335] smp: Brought up 1 node, 4 CPUs
[    0.002346] SMP: Total of 4 processors activated.
[    0.002349] CPU features: detected: 32-bit EL0 Support
[    0.002352] CPU features: detected: CRC32 instructions
[    0.002376] CPU features: emulated: Privileged Access Never (PAN) using TTBR0_EL1 switching
[    0.002380] CPU: All CPU(s) started at EL2
[    0.002393] alternatives: patching kernel code
[    0.004073] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
[    0.004084] futex hash table entries: 1024 (order: 4, 65536 bytes, linear)
[    0.004134] pinctrl core: initialized pinctrl subsystem
[    0.004420] NET: Registered PF_NETLINK/PF_ROUTE protocol family
[    0.004634] DMA: preallocated 512 KiB GFP_KERNEL pool for atomic allocations
[    0.004714] DMA: preallocated 512 KiB GFP_KERNEL|GFP_DMA pool for atomic allocations
[    0.004787] DMA: preallocated 512 KiB GFP_KERNEL|GFP_DMA32 pool for atomic allocations
[    0.004893] thermal_sys: Registered thermal governor 'step_wise'
[    0.005145] cpuidle: using governor ladder
[    0.005169] hw-breakpoint: found 6 breakpoint and 4 watchpoint registers.
[    0.005200] ASID allocator initialised with 65536 entries
[    0.010548] cryptd: max_cpu_qlen set to 1000
[    0.011522] ap0_sd_vccq: Bringing 3300000uV into 1800000-1800000uV
[    0.011689] SCSI subsystem initialized
[    0.011752] libata version 3.00 loaded.
[    0.011803] usbcore: registered new interface driver usbfs
[    0.011816] usbcore: registered new interface driver hub
[    0.011829] usbcore: registered new device driver usb
[    0.011919] usb_phy_generic cp0_usb3_phy@0: dummy supplies not allowed for exclusive requests
[    0.012007] usb_phy_generic cp0_usb3_phy@1: dummy supplies not allowed for exclusive requests
[    0.012436] clocksource: Switched to clocksource arch_sys_counter
[    0.020112] NET: Registered PF_INET protocol family
[    0.020265] IP idents hash table entries: 65536 (order: 7, 524288 bytes, linear)
[    0.021115] tcp_listen_portaddr_hash hash table entries: 2048 (order: 3, 32768 bytes, linear)
[    0.021142] Table-perturb hash table entries: 65536 (order: 6, 262144 bytes, linear)
[    0.021148] TCP established hash table entries: 32768 (order: 6, 262144 bytes, linear)
[    0.021253] TCP bind hash table entries: 32768 (order: 7, 524288 bytes, linear)
[    0.021459] TCP: Hash tables configured (established 32768 bind 32768)
[    0.021532] UDP hash table entries: 2048 (order: 4, 65536 bytes, linear)
[    0.021576] UDP-Lite hash table entries: 2048 (order: 4, 65536 bytes, linear)
[    0.021675] NET: Registered PF_UNIX/PF_LOCAL protocol family
[    0.021690] PCI: CLS 0 bytes, default 64
[    0.033938] workingset: timestamp_bits=46 max_order=20 bucket_order=0
[    0.035331] squashfs: version 4.0 (2009/01/31) Phillip Lougher
[    0.035334] jffs2: version 2.2 (NAND) (SUMMARY) (LZMA) (RTIME) (CMODE_PRIORITY) (c) 2001-2006 Red Hat, Inc.
[    0.037042] armada-ap806-pinctrl f06f4000.system-controller:pinctrl: registered pinctrl driver
[    0.037340] armada-cp110-pinctrl f2440000.system-controller:pinctrl: registered pinctrl driver
[    0.037727] gpio-59 (phy_reset): hogged as output/high
[    0.039046] mv_xor_v2 f0400000.xor: Marvell Version 2 XOR driver
[    0.039323] mv_xor_v2 f0420000.xor: Marvell Version 2 XOR driver
[    0.039606] mv_xor_v2 f0440000.xor: Marvell Version 2 XOR driver
[    0.039875] mv_xor_v2 f0460000.xor: Marvell Version 2 XOR driver
[    0.040189] mv_xor_v2 f26a0000.xor: Marvell Version 2 XOR driver
[    0.040462] mv_xor_v2 f26c0000.xor: Marvell Version 2 XOR driver
[    0.040563] Serial: 8250/16550 driver, 2 ports, IRQ sharing disabled
[    0.040818] printk: console [ttyS0] disabled
[    0.060949] f0512000.serial: ttyS0 at MMIO 0xf0512000 (irq = 16, base_baud = 12500000) is a 16550A
[    0.788683] printk: console [ttyS0] enabled
[    0.793226] omap_rng f2760000.trng: Random Number Generator ver. 203b34c
[    0.793548] random: crng init done
[    0.804441] loop: module loaded
[    0.807966] ahci f2540000.sata: supply ahci not found, using dummy regulator
[    0.815101] ahci f2540000.sata: supply phy not found, using dummy regulator
[    0.822191] platform f2540000.sata:sata-port@0: supply target not found, using dummy regulator
[    0.830936] platform f2540000.sata:sata-port@1: supply target not found, using dummy regulator
[    0.839669] ahci f2540000.sata: masking port_map 0x3 -> 0x3
[    0.845294] ahci f2540000.sata: AHCI 0001.0000 32 slots 2 ports 6 Gbps 0x3 impl platform mode
[    0.853859] ahci f2540000.sata: flags: 64bit ncq sntf led only pmp fbs pio slum part sxs 
[    0.862336] scsi host0: ahci
[    0.865393] scsi host1: ahci
[    0.868331] ata1: SATA max UDMA/133 mmio [mem 0xf2540000-0xf256ffff] port 0x100 irq 34
[    0.876287] ata2: SATA max UDMA/133 mmio [mem 0xf2540000-0xf256ffff] port 0x180 irq 34
[    0.885753] spi-nor spi2.0: s25fl064k (8192 Kbytes)
[    0.891052] spi-nor spi2.1: unrecognized JEDEC id bytes: ff ff ff ff ff ff
[    0.897969] spi-nor: probe of spi2.1 failed with error -2
[    0.904582] hwmon hwmon0: temp1_input not attached to any thermal zone
[    0.911552] mv88e6085 f212a200.mdio-mii:04: switch 0x1760 detected: Marvell 88E6176, revision 1
[    1.151844] mvpp2 f2000000.ethernet: using 8 per-cpu buffers
[    1.162928] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
[    1.169483] ehci-platform: EHCI generic platform driver
[    1.174791] ehci-orion: EHCI orion driver
[    1.178989] xhci-hcd f2500000.usb: xHCI Host Controller
[    1.184248] xhci-hcd f2500000.usb: new USB bus registered, assigned bus number 1
[    1.191718] xhci-hcd f2500000.usb: hcc params 0x0a000990 hci version 0x100 quirks 0x0000000000010010
[    1.200915] xhci-hcd f2500000.usb: irq 35, io mem 0xf2500000
[    1.206651] xhci-hcd f2500000.usb: xHCI Host Controller
[    1.211898] xhci-hcd f2500000.usb: new USB bus registered, assigned bus number 2
[    1.219329] xhci-hcd f2500000.usb: Host supports USB 3.0 SuperSpeed
[    1.223682] ata2: SATA link down (SStatus 0 SControl 300)
[    1.225777] hub 1-0:1.0: USB hub found
[    1.231056] ata1: SATA link down (SStatus 0 SControl 300)
[    1.234818] hub 1-0:1.0: 1 port detected
[    1.244284] usb usb2: We don't know the algorithms for LPM for this host, disabling LPM.
[    1.252524] hub 2-0:1.0: USB hub found
[    1.256295] hub 2-0:1.0: 1 port detected
[    1.260425] usbcore: registered new interface driver usb-storage
[    1.266636] armada38x-rtc f2284000.rtc: registered as rtc0
[    1.272152] armada38x-rtc f2284000.rtc: setting system clock to 2030-05-20T15:00:03 UTC (1905519603)
[    1.281391] i2c_dev: i2c /dev entries driver
[    1.285999] pca953x 0-0020: supply vcc not found, using dummy regulator
[    1.292696] pca953x 0-0020: using no AI
[    1.297900] gpio-496 (pcie1.0-clkreq): hogged as input
[    1.304098] gpio-499 (pcie1.0-w-disable): hogged as output/low
[    1.310258] gpio-500 (pcie2.0-clkreq): hogged as input
[    1.316033] gpio-503 (pcie2.0-w-disable): hogged as output/low
[    1.322191] gpio-501 (usb3-current-limit): hogged as input
[    1.328314] gpio-502 (usb3-power): hogged as output/high
[    1.335100] gpio-507 (m.2 devslp): hogged as output/low
[    1.353743] sdhci: Secure Digital Host Controller Interface driver
[    1.359952] sdhci: Copyright(c) Pierre Ossman
[    1.364374] sdhci-pltfm: SDHCI platform and OF driver helper
[    1.370585] NET: Registered PF_INET6 protocol family
[    1.375881] Segment Routing with IPv6
[    1.379565] In-situ OAM (IOAM) with IPv6
[    1.383531] NET: Registered PF_PACKET protocol family
[    1.388661] 8021q: 802.1Q VLAN Support v1.8
[    1.394558] hw perfevents: enabled with armv8_cortex_a72 PMU driver, 7 counters available
[    1.401174] mmc0: SDHCI controller on f06e0000.mmc [f06e0000.mmc] using ADMA 64-bit
[    1.403272] gpio-59 (phy_reset): hogged as output/high
[    1.416185] armada8k-pcie f2620000.pcie: host bridge /cp0/pcie@f2620000 ranges:
[    1.423548] armada8k-pcie f2620000.pcie:      MEM 0x00e0000000..0x00e0efffff -> 0x00e0000000
[    1.432048] armada8k-pcie f2620000.pcie: iATU unroll: disabled
[    1.437907] armada8k-pcie f2620000.pcie: Detected iATU regions: 8 outbound, 8 inbound
[    1.494450] mmc0: new HS200 MMC card at address 0001
[    1.499691] mmcblk0: mmc0:0001 8GTF4R 7.28 GiB 
[    1.504740] mmcblk0boot0: mmc0:0001 8GTF4R 4.00 MiB 
[    1.509942] mmcblk0boot1: mmc0:0001 8GTF4R 4.00 MiB 
[    1.515038] mmcblk0rpmb: mmc0:0001 8GTF4R 512 KiB, chardev (248:0)
[    2.442397] armada8k-pcie f2620000.pcie: Phy link never came up
[    2.448392] armada8k-pcie f2620000.pcie: PCI host bridge to bus 0000:00
[    2.455039] pci_bus 0000:00: root bus resource [bus 00-ff]
[    2.460548] pci_bus 0000:00: root bus resource [mem 0xe0000000-0xe0efffff]
[    2.467469] pci 0000:00:00.0: [11ab:0110] type 01 class 0x060400
[    2.473509] pci 0000:00:00.0: reg 0x10: [mem 0x00000000-0x000fffff]
[    2.479840] pci 0000:00:00.0: supports D1 D2
[    2.484129] pci 0000:00:00.0: PME# supported from D0 D1 D3hot
[    2.490834] pci 0000:00:00.0: BAR 0: assigned [mem 0xe0000000-0xe00fffff]
[    2.497658] pci 0000:00:00.0: PCI bridge to [bus 01-ff]
[    3.117562] pcieport 0000:00:00.0: AER: enabled with IRQ 42
[    3.123462] armada8k-pcie f2640000.pcie: host bridge /cp0/pcie@f2640000 ranges:
[    3.130813] armada8k-pcie f2640000.pcie:      MEM 0x00e1000000..0x00e1efffff -> 0x00e1000000
[    3.139317] armada8k-pcie f2640000.pcie: iATU unroll: disabled
[    3.145176] armada8k-pcie f2640000.pcie: Detected iATU regions: 8 outbound, 8 inbound
[    4.152396] armada8k-pcie f2640000.pcie: Phy link never came up
[    4.158370] armada8k-pcie f2640000.pcie: PCI host bridge to bus 0001:00
[    4.165018] pci_bus 0001:00: root bus resource [bus 00-ff]
[    4.170528] pci_bus 0001:00: root bus resource [mem 0xe1000000-0xe1efffff]
[    4.177446] pci 0001:00:00.0: [11ab:0110] type 01 class 0x060400
[    4.183485] pci 0001:00:00.0: reg 0x10: [mem 0x00000000-0x000fffff]
[    4.189811] pci 0001:00:00.0: supports D1 D2
[    4.194099] pci 0001:00:00.0: PME# supported from D0 D1 D3hot
[    4.200768] pci 0001:00:00.0: BAR 0: assigned [mem 0xe1000000-0xe10fffff]
[    4.207591] pci 0001:00:00.0: PCI bridge to [bus 01-ff]
[    4.827437] pcieport 0001:00:00.0: AER: enabled with IRQ 43
[    4.834266] sfp sfp-eth@0: Host maximum power 2.0W
[    4.841916] mv88e6085 f212a200.mdio-mii:04: switch 0x1760 detected: Marvell 88E6176, revision 1
[    5.035312] mvpp2 f2000000.ethernet: using 8 per-cpu buffers
[    5.059721] mvpp2 f2000000.ethernet eth0: Using random mac address da:cb:fd:16:73:15
[    5.070253] mvpp2 f2000000.ethernet eth1: Using random mac address 3e:cc:6d:e4:76:21
[    5.079576] mvpp2 f2000000.ethernet eth2: Using random mac address d6:64:04:de:13:81
[    5.163798] xenon-sdhci f2780000.sdhci: Got CD GPIO
[    5.163941] mv88e6085 f212a200.mdio-mii:04: switch 0x1760 detected: Marvell 88E6176, revision 1
[    5.183113] sfp sfp-eth@0: module FS               SFP-10GSR-85     rev      sn S2203037721      dc 220326  
[    5.199628] mmc1: SDHCI controller on f2780000.sdhci [f2780000.sdhci] using ADMA 64-bit
[    5.233444] hwmon hwmon5: temp1_input not attached to any thermal zone
[    5.244080] mmc1: new high speed SDHC card at address aaaa
[    5.249840] mmcblk1: mmc1:aaaa SL16G 14.8 GiB 
[    5.258958]  mmcblk1: p1 p2
[    6.136292] mv88e6085 f212a200.mdio-mii:04: configuring for fixed/ link mode
[    6.156515] mv88e6085 f212a200.mdio-mii:04: Link is Up - 1Gbps/Full - flow control off
[    6.259235] mv88e6085 f212a200.mdio-mii:04 lan5 (uninitialized): PHY [mv88e6xxx-2:00] driver [Marvell 88E1540] (irq=69)
[    6.276062] mvpp2 f2000000.ethernet: all ports have a low MTU, switching to per-cpu buffers
[    6.305515] mvpp2 f2000000.ethernet: using 8 per-cpu buffers
[    6.416998] mv88e6085 f212a200.mdio-mii:04 lan4 (uninitialized): PHY [mv88e6xxx-2:01] driver [Marvell 88E1540] (irq=70)
[    6.539235] mv88e6085 f212a200.mdio-mii:04 lan3 (uninitialized): PHY [mv88e6xxx-2:02] driver [Marvell 88E1540] (irq=71)
[    6.656999] mv88e6085 f212a200.mdio-mii:04 lan2 (uninitialized): PHY [mv88e6xxx-2:03] driver [Marvell 88E1540] (irq=72)
[    6.774775] mv88e6085 f212a200.mdio-mii:04 lan1 (uninitialized): PHY [mv88e6xxx-2:04] driver [Marvell 88E1540] (irq=73)
[    6.791145] DSA: tree 0 setup
[    6.806259] EXT4-fs (mmcblk1p2): mounted filesystem without journal. Opts: (null). Quota mode: disabled.
[    6.815803] VFS: Mounted root (ext4 filesystem) readonly on device 179:26.
[    6.822825] Freeing unused kernel memory: 512K
[    6.882467] Run /sbin/init as init process
[    6.886580]   with arguments:
[    6.886582]     /sbin/init
[    6.886583]   with environment:
[    6.886584]     HOME=/
[    6.886586]     TERM=linux
[    6.942601] init: Console is alive
[    7.020795] kmodloader: loading kernel modules from /etc/modules-boot.d/*
[    7.035798] kmodloader: done loading kernel modules from /etc/modules-boot.d/*
[    7.046790] init: - preinit -
[    7.230354] mvpp2 f2000000.ethernet eth0: configuring for inband/10gbase-r link mode
[    7.292441] mvpp2 f2000000.ethernet eth0: Link is Up - 10Gbps/Full - flow control rx
[    7.300234] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[    9.266963] mount_root: mounting /dev/root
[    9.292229] EXT4-fs (mmcblk1p2): re-mounted. Opts: (null). Quota mode: disabled.
[    9.335870] EXT4-fs (mmcblk1p1): mounted filesystem without journal. Opts: (null). Quota mode: disabled.
[    9.410199] urandom-seed: Seed file not found (/etc/urandom.seed)
[    9.430921] mvpp2 f2000000.ethernet eth0: Link is Down
[    9.446087] procd: - early -
[    9.965178] procd: - ubus -
[   10.019510] procd: - init -
[   10.085199] urngd: v1.0.2 started.
[   10.088785] kmodloader: loading kernel modules from /etc/modules.d/*
[   10.101919] GACT probability on
[   10.105420] Mirror/redirect action on
[   10.111182] u32 classifier
[   10.114049]     input device check on
[   10.117724]     Actions configured
[   10.142944] PPP generic driver version 2.4.2
[   10.147513] NET: Registered PF_PPPOX protocol family
[   10.153771] kmodloader: done loading kernel modules from /etc/modules.d/*
[   13.855813] mvpp2 f2000000.ethernet eth0: configuring for inband/10gbase-r link mode
[   13.863980] mvpp2 f2000000.ethernet eth0: Link is Up - 10Gbps/Full - flow control rx
[   13.872139] br-lan: port 1(eth0) entered blocking state
[   13.877416] br-lan: port 1(eth0) entered disabled state
[   13.882834] device eth0 entered promiscuous mode
[   13.888083] br-lan: port 1(eth0) entered blocking state
[   13.893353] br-lan: port 1(eth0) entered forwarding state
[   14.882417] IPv6: ADDRCONF(NETDEV_CHANGE): br-lan: link becomes ready
[   36.962405] cp0-xhci1-vbus: disabling
[   36.966090] cp0_sd_vccq: disabling
[  114.515055] mvpp2 f2000000.ethernet eth0: Link is Down
[  114.520254] mvpp2 f2000000.ethernet eth0: Link is Up - 10Gbps/Full - flow control rx
[  114.528182] br-lan: port 1(eth0) entered disabled state
[  114.533562] br-lan: port 1(eth0) entered blocking state
[  114.538817] br-lan: port 1(eth0) entered forwarding state
[  128.608889] mvpp2 f2000000.ethernet eth0: Link is Down
[  128.614089] mvpp2 f2000000.ethernet eth0: Link is Up - 10Gbps/Full - flow control rx
[  128.621971] br-lan: port 1(eth0) entered disabled state
[  128.627436] br-lan: port 1(eth0) entered blocking state
[  128.632691] br-lan: port 1(eth0) entered forwarding state
[  144.748608] br-lan: port 1(eth0) entered disabled state
[  144.754791] device eth0 left promiscuous mode
[  144.759252] br-lan: port 1(eth0) entered disabled state
[  144.874119] mvpp2 f2000000.ethernet eth0: Link is Down
[  146.207939] mvpp2 f2000000.ethernet eth1: configuring for fixed/sgmii link mode
[  146.216111] mvpp2 f2000000.ethernet eth1: Link is Up - 1Gbps/Full - flow control off
[  146.223280] mv88e6085 f212a200.mdio-mii:04 lan1: configuring for phy/gmii link mode
[  146.234322] IPv6: ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
[  146.294588] br-lan: port 1(lan1) entered blocking state
[  146.299843] br-lan: port 1(lan1) entered disabled state
[  146.333869] device lan1 entered promiscuous mode
[  146.338507] device eth1 entered promiscuous mode
[  146.486812] mv88e6085 f212a200.mdio-mii:04 lan2: configuring for phy/gmii link mode
[  146.558597] br-lan: port 2(lan2) entered blocking state
[  146.563858] br-lan: port 2(lan2) entered disabled state
[  146.583267] mv88e6085 f212a200.mdio-mii:04: Link is Down
[  146.633038] mv88e6085 f212a200.mdio-mii:04: Link is Up - 1Gbps/Full - flow control off
[  146.633514] device lan2 entered promiscuous mode
[  146.664256] mv88e6085 f212a200.mdio-mii:04 lan3: configuring for phy/gmii link mode
[  146.732874] br-lan: port 3(lan3) entered blocking state
[  146.738123] br-lan: port 3(lan3) entered disabled state
[  146.787850] device lan3 entered promiscuous mode
[  146.811150] mv88e6085 f212a200.mdio-mii:04 lan4: configuring for phy/gmii link mode
[  146.867849] br-lan: port 4(lan4) entered blocking state
[  146.873102] br-lan: port 4(lan4) entered disabled state
[  146.924276] device lan4 entered promiscuous mode
[  146.947458] mv88e6085 f212a200.mdio-mii:04 lan5: configuring for phy/gmii link mode
[  147.012321] br-lan: port 5(lan5) entered blocking state
[  147.017574] br-lan: port 5(lan5) entered disabled state
[  147.068588] device lan5 entered promiscuous mode
[  147.102341] mvpp2 f2000000.ethernet eth0: configuring for inband/10gbase-r link mode
[  147.110728] mvpp2 f2000000.ethernet eth0: Link is Up - 10Gbps/Full - flow control rx
[  147.193356] mvpp2 f2000000.ethernet eth2: PHY [f212a200.mdio-mii:00] driver [Marvell 88E1510] (irq=POLL)
[  147.202930] mvpp2 f2000000.ethernet eth2: configuring for phy/rgmii-id link mode
[  147.812408] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 1166.669027] mvpp2 f2000000.ethernet eth0: Link is Down
[ 1166.674228] mvpp2 f2000000.ethernet eth0: Link is Up - 10Gbps/Full - flow control rx
[ 2196.428425] mvpp2 f2000000.ethernet eth0: Link is Down
[ 2196.433625] mvpp2 f2000000.ethernet eth0: Link is Up - 10Gbps/Full - flow control rx
[ 3813.652973] mvpp2 f2000000.ethernet eth0: Link is Down
[ 3813.658168] mvpp2 f2000000.ethernet eth0: Link is Up - 10Gbps/Full - flow control rx
[ 5302.362389] mvpp2 f2000000.ethernet eth0: Link is Down
[ 5302.367590] mvpp2 f2000000.ethernet eth0: Link is Up - 10Gbps/Full - flow control rx
[ 5442.268959] mvpp2 f2000000.ethernet eth0: Link is Down
[ 5442.274200] mvpp2 f2000000.ethernet eth0: Link is Up - 10Gbps/Full - flow control rx
[ 5854.806767] mvpp2 f2000000.ethernet eth0: Link is Down
[ 5854.811963] mvpp2 f2000000.ethernet eth0: Link is Up - 10Gbps/Full - flow control rx
[ 8457.873078] mvpp2 f2000000.ethernet eth0: Link is Down
[ 8457.878274] mvpp2 f2000000.ethernet eth0: Link is Up - 10Gbps/Full - flow control rx
[ 9735.173150] mvpp2 f2000000.ethernet eth0: Link is Down
[ 9735.178345] mvpp2 f2000000.ethernet eth0: Link is Up - 10Gbps/Full - flow control rx
[11464.058805] mvpp2 f2000000.ethernet eth0: Link is Down
[11464.064005] mvpp2 f2000000.ethernet eth0: Link is Up - 10Gbps/Full - flow control rx
[13004.658511] mvpp2 f2000000.ethernet eth0: Link is Down
[13004.663713] mvpp2 f2000000.ethernet eth0: Link is Up - 10Gbps/Full - flow control rx
[15488.660302] mvpp2 f2000000.ethernet eth0: Link is Down
[15488.665501] mvpp2 f2000000.ethernet eth0: Link is Up - 10Gbps/Full - flow control rx
[17754.959881] mvpp2 f2000000.ethernet eth0: Link is Down
[17754.965082] mvpp2 f2000000.ethernet eth0: Link is Up - 10Gbps/Full - flow control rx
[18902.116343] mvpp2 f2000000.ethernet eth0: Link is Down
[18902.121541] mvpp2 f2000000.ethernet eth0: Link is Up - 10Gbps/Full - flow control rx
[23043.328595] mvpp2 f2000000.ethernet eth0: Link is Down
[23043.333797] mvpp2 f2000000.ethernet eth0: Link is Up - 10Gbps/Full - flow control rx
[25151.969498] mvpp2 f2000000.ethernet eth0: Link is Down
[25151.974698] mvpp2 f2000000.ethernet eth0: Link is Up - 10Gbps/Full - flow control rx
[27511.046791] mvpp2 f2000000.ethernet eth0: Link is Down
[27511.051986] mvpp2 f2000000.ethernet eth0: Link is Up - 10Gbps/Full - flow control rx
[27529.473798] mvpp2 f2000000.ethernet eth0: Link is Down
[27529.478990] mvpp2 f2000000.ethernet eth0: Link is Up - 10Gbps/Full - flow control rx
[28975.922164] mvpp2 f2000000.ethernet eth0: Link is Down
[28975.927364] mvpp2 f2000000.ethernet eth0: Link is Up - 10Gbps/Full - flow control rx
[30412.622540] mvpp2 f2000000.ethernet eth0: Link is Down
[30412.627736] mvpp2 f2000000.ethernet eth0: Link is Up - 10Gbps/Full - flow control rx
[30793.514448] mvpp2 f2000000.ethernet eth0: Link is Down
[30793.519645] mvpp2 f2000000.ethernet eth0: Link is Up - 10Gbps/Full - flow control rx
[32138.031340] mvpp2 f2000000.ethernet eth0: Link is Down
[32138.031340] mvpp2 f2000000.ethernet eth0: Link is Up - 10Gbps/Full - flow control rx
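
To put numbers on the flapping in a log like the one above, a small script can pull out the `Link is Down` timestamps and show how far apart they are. This is only a minimal sketch: the function names and the `SAMPLE` excerpt (three flaps copied from the log above) are illustrative, and the regular expression assumes the mvpp2 message format exactly as it appears in this dmesg output.

```python
import re

# Matches kernel log lines such as:
# [  114.515055] mvpp2 f2000000.ethernet eth0: Link is Down
LINK_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+mvpp2 \S+ (\S+): Link is (Up|Down)")


def link_down_times(log_text, iface="eth0"):
    """Return the timestamps (seconds since boot) of each 'Link is Down' event."""
    times = []
    for line in log_text.splitlines():
        m = LINK_RE.search(line)
        if m and m.group(2) == iface and m.group(3) == "Down":
            times.append(float(m.group(1)))
    return times


def flap_intervals(times):
    """Return the seconds between consecutive link-down events."""
    return [round(b - a, 3) for a, b in zip(times, times[1:])]


# Hypothetical input: a short excerpt copied from the dmesg output above.
SAMPLE = """\
[ 1166.669027] mvpp2 f2000000.ethernet eth0: Link is Down
[ 1166.674228] mvpp2 f2000000.ethernet eth0: Link is Up - 10Gbps/Full - flow control rx
[ 2196.428425] mvpp2 f2000000.ethernet eth0: Link is Down
[ 2196.433625] mvpp2 f2000000.ethernet eth0: Link is Up - 10Gbps/Full - flow control rx
[ 3813.652973] mvpp2 f2000000.ethernet eth0: Link is Down
[ 3813.658168] mvpp2 f2000000.ethernet eth0: Link is Up - 10Gbps/Full - flow control rx
"""

if __name__ == "__main__":
    downs = link_down_times(SAMPLE)
    print("link-down events:", len(downs))
    print("seconds between flaps:", flap_intervals(downs))
```

Feeding it the full `dmesg` text instead of `SAMPLE` would show whether the flaps are periodic or irregular, which may help separate a timer-driven cause from a marginal optical signal.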

I will try and replicate this locally. Are you using the BSP kernel, or mainline?

We have observed similar problems with all Marvell devices we own (CN9130 Base, CN9130 Pro, GT-8K). For that we made a table of various combinations.

One key insight we had is that DACs seem to work without issue, and the problems with optics seem to disappear when the other side is NOT connected to another Marvell chip.

The disconnects were partially caused by the enclosure, since it doesn’t allow the sfp modules to fully seat.

I attached a table (hopefully it renders correctly) with a couple of combinations we tested. Rows that indicate disconnects show the same dmesg messages as above. In addition, we observed CRC errors that cause packets to drop (around 0.1-1%). The errors look like:

[ 7221.999308] mvpp2 f2000000.ethernet eth0: Link is Up - 10Gbps/Full - flow control rx
[ 7232.691563] mvpp2 f2000000.ethernet eth0: Link is Down
[ 7241.237238] mvpp2 f2000000.ethernet eth0: Link is Up - 10Gbps/Full - flow control rx
[ 7262.187663] mvpp2 f2000000.ethernet eth0: bad rx status 43048a10 (crc error), size=1516
[ 7262.839740] mvpp2 f2000000.ethernet eth0: bad rx status 43048a10 (crc error), size=790
[ 7263.548694] mvpp2 f2000000.ethernet eth0: bad rx status 43048a10 (crc error), size=1516
[ 7263.974951] mvpp2 f2000000.ethernet eth0: bad rx status 43048a10 (crc error), size=1516
[ 7265.089051] mvpp2 f2000000.ethernet eth0: bad rx status 43048a10 (crc error), size=1516
[ 7266.933865] mvpp2 f2000000.ethernet eth0: bad rx status 43048a10 (crc error), size=1516
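
A rough way to tally these errors from a saved log is sketched below. The helper names and the `SAMPLE` text (the six error lines above) are illustrative only, and the pattern assumes the `bad rx status ... (crc error), size=N` format shown here. The kernel log buffer is finite, so a count taken this way may undercount the real number of bad frames.

```python
import re
from collections import Counter

# Matches lines such as:
# [ 7262.187663] mvpp2 f2000000.ethernet eth0: bad rx status 43048a10 (crc error), size=1516
CRC_RE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+mvpp2 \S+ (\S+): "
    r"bad rx status ([0-9a-f]+) \(crc error\), size=(\d+)"
)


def crc_errors(log_text):
    """Return (timestamp, interface, frame_size) for each CRC error line."""
    found = []
    for line in log_text.splitlines():
        m = CRC_RE.search(line)
        if m:
            found.append((float(m.group(1)), m.group(2), int(m.group(4))))
    return found


def summarize(errors):
    """Count errors per frame size and report the time span they cover."""
    by_size = Counter(size for _, _, size in errors)
    span = errors[-1][0] - errors[0][0] if len(errors) > 1 else 0.0
    return {"count": len(errors), "by_size": dict(by_size), "span_s": span}


# Hypothetical input: the error lines quoted above.
SAMPLE = """\
[ 7262.187663] mvpp2 f2000000.ethernet eth0: bad rx status 43048a10 (crc error), size=1516
[ 7262.839740] mvpp2 f2000000.ethernet eth0: bad rx status 43048a10 (crc error), size=790
[ 7263.548694] mvpp2 f2000000.ethernet eth0: bad rx status 43048a10 (crc error), size=1516
[ 7263.974951] mvpp2 f2000000.ethernet eth0: bad rx status 43048a10 (crc error), size=1516
[ 7265.089051] mvpp2 f2000000.ethernet eth0: bad rx status 43048a10 (crc error), size=1516
[ 7266.933865] mvpp2 f2000000.ethernet eth0: bad rx status 43048a10 (crc error), size=1516
"""

if __name__ == "__main__":
    print(summarize(crc_errors(SAMPLE)))
```
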
Summary
| Clearfog | SFP Module | Module Firmware | Detected | Medium | SFP Module | Module Firmware | Detected | Peer | Link? / Disconnects? / CRC Error |
|---|---|---|---|---|---|---|---|---|---|
| CN9130 Pro | Flexoptix P.1396.10 | Nvidia MFM1T02A-LR | Yes | SM | Flexoptix P.1396.10 | MikroTik S+31DLC10D | Yes | CR328-24P-4S+RM | Yes Yes |
| CN9130 Pro | SFP-10GLRM-31 | MK | Yes | SM | SFP-10GLRM-31 | Generic | Yes | CR328-24P-4S+RM | Yes Yes |
| CN9130 Pro | Finisar FTLX8571D3BCL | Generic | Yes | MM | Finisar FTLX8571D3BCL | Generic | Yes | CR328-24P-4S+RM | Yes Yes |
| CN9130 Pro | Flexoptix P.C30.1 | Generic | Yes | DAC | Flexoptix P.C30.1 | Generic | Yes | CR328-24P-4S+RM | No |
| CN9130 Pro | Flexoptix P.1396.10 | Nvidia MFM1T02A-LR | Yes | SM | Flexoptix P.1396.10 | Aruba J9152A | Yes | Aruba 2930F JL258A | Yes No |
| CN9130 Base | Flexoptix P.1396.10 | Nvidia MFM1T02A-LR | Yes | SM | Flexoptix P.1396.10 | MikroTik S+31DLC10D | Yes | CR328-24P-4S+RM | Yes Yes |
| CN9130 Base | Flexoptix P.1396.10 | Intel E10GSFPLR | Yes | SM | Flexoptix P.1396.10 | MikroTik S+31DLC10D | Yes | CR328-24P-4S+RM | Yes Yes |
| CN9130 Base | Flexoptix P.1396.10 | Intel E10GSFPSR | Yes | SM | Flexoptix P.1396.10 | MikroTik S+31DLC10D | Yes | CR328-24P-4S+RM | Yes Yes |
| CN9130 Base | Flexoptix P.1396.10 | MSA Standard P.1396.10 | Yes | SM | Flexoptix P.1396.10 | MikroTik S+31DLC10D | Yes | CR328-24P-4S+RM | Yes Yes |
| CN9130 Base | Flexoptix P.1396.10 | Penguin Computing SFP+ LR | Yes | SM | Flexoptix P.1396.10 | MikroTik S+31DLC10D | Yes | CR328-24P-4S+RM | Yes Yes |
| CN9130 Base | Flexoptix P.1396.10 | Quanta SFP+ LR | Yes | SM | Flexoptix P.1396.10 | MikroTik S+31DLC10D | Yes | CR328-24P-4S+RM | Yes Yes |
| CN9130 Base | Flexoptix P.1396.10 | Synology SFP+ LR | Yes | SM | Flexoptix P.1396.10 | MikroTik S+31DLC10D | Yes | CR328-24P-4S+RM | Yes Yes |
| CN9130 Base | Flexoptix P.1396.10 | Zyxel SFP10G-LR | Yes | SM | Flexoptix P.1396.10 | MikroTik S+31DLC10D | Yes | CR328-24P-4S+RM | Yes Yes |
| CN9130 Base | Finisar FTLX8571D3BCL | Generic | Yes | SM | Finisar FTLX8571D3BCL | Generic | Yes | CR328-24P-4S+RM | Yes Yes |
| CN9130 Pro | Flexoptix P.C30.1 | Nvidia MC3309130-001 | | DAC | Flexoptix P.C30.1 | Aruba J9281B | | Aruba 2930F JL258A | |
| CN9130 Base | Flexoptix P.1396.10 | Nvidia MFM1T02A-LR | | SM | Flexoptix P.1396.10 | Aruba J9152A | | Aruba 2930F JL258A | |
| CN9130 Base | Flexoptix P.1396.10 | Ubiquiti | Yes | SMF | Flexoptix P.1396.10 | Ubiquiti | Yes | CN9130 Pro | Yes No Yes |
| CN9130 Base | Flexoptix P.1396.10 | Mellanox | Yes | SMF | Flexoptix P.1396.10 | Mellanox | Yes | CN9130 Pro | Yes No Yes |
| CN9130 Base | Flexoptix P.1396.10 | ZTE | Yes | SMF | Flexoptix P.1396.10 | ZTE | Yes | CN9130 Pro | Yes No Yes |
| CN9130 Base | Flexoptix P.1396.10 | Mikrotik | Yes | SMF | Flexoptix P.1396.10 | Mikrotik | Yes | CN9130 Pro | No |
| CN9130 Base | Flexoptix P.1396.10 | Intel | Yes | SMF | Flexoptix P.1396.10 | Intel | Yes | CN9130 Pro | No |
| CN9130 Base | Flexoptix P.1396.10 | Ubiquiti 1G | Yes | SMF | Flexoptix P.1396.10 | Ubiquiti 1G | Yes | CN9130 Pro | No |
| CN9130 Base | FS SFPP-PC01 | Generic | Yes | DAC | FS SFPP-PC01 | Generic | Yes | CN9130 Pro | Yes No No |
| CN9130 Base | SFP-10GLRM-31 | Generic | Yes | SMF | SFP-10GLRM-31 | Generic | Yes | CN9130 Pro | Yes No Yes |
| CN9130 Base | SFP-10GLRM-31 | Generic | Yes | SMF | SFP-10GLRM-31 | Generic | Yes | Mellanox Connect-X 3 | Yes No No |
| CN9130 Base | Flexoptix P.1396.10 | Ubiquiti | Yes | SMF | Flexoptix P.1396.10 | Ubiquiti | Yes | Mellanox Connect-X 3 | Yes No No |
| CN9130 Base | Flexoptix P.1396.10 | Mellanox | Yes | SMF | Flexoptix P.1396.10 | Mellanox | Yes | Mellanox Connect-X 3 | Yes No No |
| CN9130 Base | Flexoptix P.1396.10 | ZTE | Yes | SMF | Flexoptix P.1396.10 | ZTE | Yes | Mellanox Connect-X 3 | Yes No No |

I am using OpenWrt mainline, but I have configured the build for their testing kernel. I saw the same behavior on their official kernel. I am happy to try more images or DTS additions / subtractions.

Based on the comment above, I attempted to re-seat my SFP module. When I plugged the module back in, the kernel had an Oops.

root@OpenWrt:/# ethtool eth0
Settings for eth0:
	Supported ports: [ FIBRE ]
	Supported link modes:   2500baseX/Full
	                        1000baseX/Full
	                        10000baseSR/Full
	Supported pause frame use: Symmetric Receive-only
	Supports auto-negotiation: Yes
	Supported FEC modes: Not reported
	Advertised link modes:  10000baseSR/Full
	Advertised pause frame use: Symmetric Receive-only
	Advertised auto-negotiation: Yes
	Advertised FEC modes: Not reported
	Speed: 10000Mb/s
	Duplex: Full
	Auto-negotiation: on
	Port: FIBRE
	PHYAD: 0
	Transceiver: internal
	Link detected: yes
root@OpenWrt:/# [588940.101007] mvpp2 f2000000.ethernet eth0: Link is Down
[588940.106294] mvpp2 f2000000.ethernet eth0: Link is Up - 10Gbps/Full - flow control rx

root@OpenWrt:/# 
root@OpenWrt:/# 
root@OpenWrt:/# [590180.428000] mvpp2 f2000000.ethernet eth0: Link is Down
[590180.433319] mvpp2 f2000000.ethernet eth0: Link is Up - 10Gbps/Full - flow control rx
[590180.443154] mvpp2 f2000000.ethernet eth0: Link is Down
[590180.448432] mvpp2 f2000000.ethernet eth0: Link is Up - 10Gbps/Full - flow control rx
[590180.457362] mvpp2 f2000000.ethernet eth0: Link is Down
[590180.462638] mvpp2 f2000000.ethernet eth0: Link is Up - 10Gbps/Full - flow control rx
[590180.471599] mvpp2 f2000000.ethernet eth0: Link is Down
[590182.444816] mvpp2 f2000000.ethernet eth0: port 0: cleaning queue 0 timed out
[590182.470123] mvpp2 f2000000.ethernet eth0: configuring for inband/10gbase-r link mode
[590196.713757] sfp sfp-eth@0: module removed
[590205.853733] sfp sfp-eth@0: module FS               SFP-10GSR-85     rev      sn S2203037721      dc 220326  
[590205.893460] hwmon hwmon5: temp1_input not attached to any thermal zone
[590217.736096] mvpp2 f2000000.ethernet eth0: Link is Up - 10Gbps/Full - flow control rx
[590219.202739] mvpp2 f2000000.ethernet eth0: Tx stop timed out, status=0x00000101
[590219.210087] mvpp2 f2000000.ethernet eth0: Link is Down
[590239.822609] mvpp2 f2000000.ethernet eth0: Link is Up - 10Gbps/Full - flow control rx
[590240.892417] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[590241.052455] ------------[ cut here ]------------
[590241.057180] refcount_t: underflow; use-after-free.
[590241.062087] WARNING: CPU: 0 PID: 12 at refcount_warn_saturate+0xec/0x140
[590241.068912] Modules linked in: pppoe ppp_async nft_fib_inet nf_flow_table_ipv6 nf_flow_table_ipv4 nf_flow_table_inet pppox ppp_generic nft_reject_ipv6 nft_reject_ipv4 nft_reject_inet nft_reject nft_redir nft_quota nft_objref nft_numgen nft_nat nft_masq nft_log nft_limit nft_hash nft_flow_offload nft_fib_ipv6 nft_fib_ipv4 nft_fib nft_ct nft_counter nft_chain_nat nf_tables nf_nat nf_flow_table nf_conntrack slhc nfnetlink nf_reject_ipv6 nf_reject_ipv4 nf_log_syslog nf_defrag_ipv6 nf_defrag_ipv4 mdio_gpio mdio_bitbang libcrc32c crc_ccitt sch_tbf sch_ingress sch_htb sch_hfsc em_u32 cls_u32 cls_tcindex cls_route cls_matchall cls_fw cls_flow cls_basic act_skbedit act_mirred act_gact netlink_diag gpio_button_hotplug gpio_cascade mux_core
[590241.133973] CPU: 0 PID: 12 Comm: ksoftirqd/0 Not tainted 5.15.79 #0
[590241.140353] Hardware name: SolidRun CN9130 based SOM Clearfog Pro (DT)
[590241.146993] pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[590241.154072] pc : refcount_warn_saturate+0xec/0x140
[590241.158968] lr : refcount_warn_saturate+0xec/0x140
[590241.163865] sp : ffffffc008e53a30
[590241.167279] x29: ffffffc008e53a30 x28: ffffffc008be6000 x27: 0000000000000000
[590241.174534] x26: ffffff8107e68000 x25: ffffffc008e53b00 x24: ffffff8102078800
[590241.181788] x23: 0000000000003fe3 x22: 0000000000000400 x21: ffffff8107e68000
[590241.189042] x20: ffffff8101139904 x19: ffffff81011397c0 x18: 0000000000000483
[590241.196296] x17: 0000000000000013 x16: 00004c0b104c3f1c x15: ffffffc008bfdd98
[590241.203550] x14: 0000000000000d89 x13: 0000000000000483 x12: ffffffc008e53758
[590241.210803] x11: ffffffc008c55d98 x10: 00000000fffff000 x9 : ffffffc008c55d98
[590241.218056] x8 : 0000000000000000 x7 : ffffffc008bfdd98 x6 : 0000000000000001
[590241.225310] x5 : ffffff813ff5e738 x4 : 0000000000000000 x3 : 0000000000000027
[590241.232562] x2 : 0000000000000027 x1 : 0000000000000023 x0 : 0000000000000026
[590241.239816] Call trace:
[590241.242357]  refcount_warn_saturate+0xec/0x140
[590241.246906]  sock_wfree+0xe4/0xf0
[590241.250322]  skb_release_head_state+0x38/0xc0
[590241.254783]  consume_skb+0x40/0xe0
[590241.258285]  __dev_kfree_skb_any+0x3c/0x4c
[590241.262483]  mvpp2_txq_bufs_free.constprop.0+0x6c/0x140
[590241.267818]  mvpp2_txq_done+0xb8/0x110
[590241.271670]  mvpp2_tx_done+0x8c/0xd0
[590241.275346]  mvpp2_poll+0x1ec/0x200
[590241.278935]  __napi_poll+0x34/0x1d0
[590241.282525]  net_rx_action+0x2cc/0x320
[590241.286375]  _stext+0x13c/0x370
[590241.289615]  run_ksoftirqd+0x4c/0x60
[590241.293292]  smpboot_thread_fn+0x13c/0x17c
[590241.297492]  kthread+0x11c/0x130
[590241.300819]  ret_from_fork+0x10/0x20
[590241.304496] ---[ end trace 8468808817bf0168 ]---
[590241.309481] ------------[ cut here ]------------
[590241.314214] refcount_t: saturated; leaking memory.
[590241.319120] WARNING: CPU: 0 PID: 12365 at refcount_warn_saturate+0x6c/0x140
[590241.326199] Modules linked in: pppoe ppp_async nft_fib_inet nf_flow_table_ipv6 nf_flow_table_ipv4 nf_flow_table_inet pppox ppp_generic nft_reject_ipv6 nft_reject_ipv4 nft_reject_inet nft_reject nft_redir nft_quota nft_objref nft_numgen nft_nat nft_masq nft_log nft_limit nft_hash nft_flow_offload nft_fib_ipv6 nft_fib_ipv4 nft_fib nft_ct nft_counter nft_chain_nat nf_tables nf_nat nf_flow_table nf_conntrack slhc nfnetlink nf_reject_ipv6 nf_reject_ipv4 nf_log_syslog nf_defrag_ipv6 nf_defrag_ipv4 mdio_gpio mdio_bitbang libcrc32c crc_ccitt sch_tbf sch_ingress sch_htb sch_hfsc em_u32 cls_u32 cls_tcindex cls_route cls_matchall cls_fw cls_flow cls_basic act_skbedit act_mirred act_gact netlink_diag gpio_button_hotplug gpio_cascade mux_core
[590241.391241] CPU: 0 PID: 12365 Comm: kworker/0:0 Tainted: G        W         5.15.79 #0
[590241.399277] Hardware name: SolidRun CN9130 based SOM Clearfog Pro (DT)
[590241.405917] Workqueue: mld mld_ifc_work
[590241.409857] pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[590241.416934] pc : refcount_warn_saturate+0x6c/0x140
[590241.421831] lr : refcount_warn_saturate+0x6c/0x140
[590241.426728] sp : ffffffc00be13b00
[590241.430141] x29: ffffffc00be13b00 x28: 0000000000000000 x27: 0000000000000000
[590241.437395] x26: 0000000000000000 x25: ffffffc00be13c34 x24: ffffffc00807e460
[590241.444649] x23: ffffffc00be13bc0 x22: ffffff8107c4cc80 x21: ffffff8102078000
[590241.451902] x20: ffffff8106a55d00 x19: ffffff81011397c0 x18: 00000000000004a9
[590241.459155] x17: 0002bd5f0a42edb2 x16: 0000000000000015 x15: ffffffc008bfdd98
[590241.466408] x14: 0000000000000dfb x13: 00000000000004a9 x12: ffffffc00be13828
[590241.473662] x11: ffffffc008c55d98 x10: 00000000fffff000 x9 : ffffffc008c55d98
[590241.480916] x8 : 0000000000000000 x7 : ffffffc008bfdd98 x6 : 0000000000000001
[590241.488169] x5 : ffffff813ff5e738 x4 : 0000000000000000 x3 : 0000000000000027
[590241.495423] x2 : 0000000000000027 x1 : 0000000000000023 x0 : 0000000000000026
[590241.502675] Call trace:
[590241.505216]  refcount_warn_saturate+0x6c/0x140
[590241.509764]  skb_set_owner_w+0xb4/0x104
[590241.513701]  sock_alloc_send_pskb+0x21c/0x230
[590241.518161]  sock_alloc_send_skb+0x1c/0x24
[590241.522359]  mld_newpack.isra.0+0x70/0x1f0
[590241.526557]  add_grhead+0x94/0xa4
[590241.529971]  add_grec+0x43c/0x460
[590241.533386]  mld_ifc_work+0x314/0x490
[590241.537148]  process_one_work+0x21c/0x480
[590241.541262]  worker_thread+0x74/0x4d4
[590241.545026]  kthread+0x11c/0x130
[590241.548353]  ret_from_fork+0x10/0x20
[590241.552029] ---[ end trace 8468808817bf0169 ]---
[590242.385473] Unable to handle kernel read from unreadable memory at virtual address 000000000000006f
[590242.394649] Mem abort info:
[590242.397538]   ESR = 0x96000005
[590242.400690]   EC = 0x25: DABT (current EL), IL = 32 bits
[590242.406114]   SET = 0, FnV = 0
[590242.409266]   EA = 0, S1PTW = 0
[590242.412509]   FSC = 0x05: level 1 translation fault
[590242.417492] Data abort info:
[590242.420468]   ISV = 0, ISS = 0x00000005
[590242.424408]   CM = 0, WnR = 0
[590242.427472] user pgtable: 4k pages, 39-bit VAs, pgdp=00000001076d8000
[590242.434031] [000000000000006f] pgd=0800000107341003, p4d=0800000107341003, pud=0800000107341003, pmd=0000000000000000
[590242.444780] Internal error: Oops: 96000005 [#1] SMP
[590242.449766] Modules linked in: pppoe ppp_async nft_fib_inet nf_flow_table_ipv6 nf_flow_table_ipv4 nf_flow_table_inet pppox ppp_generic nft_reject_ipv6 nft_reject_ipv4 nft_reject_inet nft_reject nft_redir nft_quota nft_objref nft_numgen nft_nat nft_masq nft_log nft_limit nft_hash nft_flow_offload nft_fib_ipv6 nft_fib_ipv4 nft_fib nft_ct nft_counter nft_chain_nat nf_tables nf_nat nf_flow_table nf_conntrack slhc nfnetlink nf_reject_ipv6 nf_reject_ipv4 nf_log_syslog nf_defrag_ipv6 nf_defrag_ipv4 mdio_gpio mdio_bitbang libcrc32c crc_ccitt sch_tbf sch_ingress sch_htb sch_hfsc em_u32 cls_u32 cls_tcindex cls_route cls_matchall cls_fw cls_flow cls_basic act_skbedit act_mirred act_gact netlink_diag gpio_button_hotplug gpio_cascade mux_core
[590242.514807] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G        W         5.15.79 #0
[590242.522320] Hardware name: SolidRun CN9130 based SOM Clearfog Pro (DT)
[590242.528960] pstate: 20400005 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[590242.536037] pc : kmem_cache_alloc+0xa4/0x4bc
[590242.540413] lr : kmem_cache_alloc+0x38/0x4bc
[590242.544786] sp : ffffffc008003b90
[590242.548198] x29: ffffffc008003b90 x28: ffffff810016c480 x27: 0000000000000004
[590242.555451] x26: ffffff8102078800 x25: ffffffc0086fabc8 x24: 0000000000000a20
[590242.562705] x23: 0000000000000007 x22: 0000000000000000 x21: ffffff81002df300
[590242.569958] x20: ffffffc008d69000 x19: ffffffc008be8000 x18: 0000000000000000
[590242.577211] x17: ffffffc137399000 x16: ffffffc008004000 x15: 0000000000004000
[590242.584464] x14: ffffffffffffffff x13: 0000000000000038 x12: 0101010101010101
[590242.591717] x11: 0000000000000098 x10: 00000000000008f0 x9 : ffffffc008be3cd0
[590242.598970] x8 : ffffff813ff6a380 x7 : 0000000000000000 x6 : 0000000000001520
[590242.606223] x5 : 0000000000000000 x4 : fffffffe041a9540 x3 : ffffffc137399000
[590242.613477] x2 : ffffff813ff711d0 x1 : 00000000000ebdac x0 : 0000000000000068
[590242.620730] Call trace:
[590242.623271]  kmem_cache_alloc+0xa4/0x4bc
[590242.627295]  __build_skb+0x28/0x60
[590242.630796]  build_skb+0x18/0x70
[590242.634123]  mvpp2_rx+0x5fc/0xb30
[590242.637539]  mvpp2_poll+0x100/0x200
[590242.641129]  __napi_poll+0x34/0x1d0
[590242.644717]  net_rx_action+0x2cc/0x320
[590242.648567]  _stext+0x13c/0x370
[590242.651806]  irq_exit+0x80/0xac
[590242.655047]  handle_domain_irq+0x60/0x90
[590242.659073]  gic_handle_irq+0x70/0xa0
[590242.662836]  call_on_irq_stack+0x28/0x40
[590242.666862]  do_interrupt_handler+0x4c/0x54
[590242.671149]  el1_interrupt+0x2c/0x4c
[590242.674824]  el1h_64_irq_handler+0x14/0x20
[590242.679023]  el1h_64_irq+0x78/0x7c
[590242.682524]  arch_cpu_idle+0x14/0x20
[590242.686200]  default_idle_call+0x44/0x110
[590242.690313]  do_idle+0x1b0/0x1e0
[590242.693641]  cpu_startup_entry+0x20/0x60
[590242.697667]  rest_init+0xc4/0xd0
[590242.700994]  arch_call_rest_init+0xc/0x14
[590242.705107]  start_kernel+0x678/0x698
[590242.708870]  __primary_switched+0xa0/0xa8
[590242.712983] Code: f100009f fa401ae4 54001a60 b9402aa0 (f8606ae3) 
[590242.719188] ---[ end trace 8468808817bf016a ]---
[590242.723910] Kernel panic - not syncing: Oops: Fatal exception in interrupt
[590242.730899] SMP: stopping secondary CPUs
[590242.734925] Kernel Offset: disabled
[590242.738513] CPU features: 0x0,00000120,00000802
[590242.743148] Memory Limit: none
[590242.746300] Rebooting in 3 seconds..

I don’t know if the Oops / refcount issue is related to the link flapping but I figured it should be mentioned.

That is a fantastic table!
I had the generic firmware from FS. Feel free to add my info to the table if that is useful.

It is going to take me a day or so to free up the hardware I need to replicate this. I will update the forum after testing.

Any updates? Were you able to get the hardware for a reproducer?

I see the same issue here with a 10Gtek AXS85-192-M3. The link flaps usually last between 0.00003 and 0.00004 s and occur anywhere from once a minute to once a second.

The Clearfog is connected to a Mikrotik switch.
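
To put numbers on flap length and frequency, the Down/Up pairs can be pulled out of the kernel log. Below is a minimal Python sketch, assuming the mvpp2 "Link is Down" / "Link is Up" message format shown earlier in this thread; the function name flap_durations is hypothetical, and the pattern may need adjusting for other drivers.

```python
import re

# Matches kernel log lines such as:
# [590180.428000] mvpp2 f2000000.ethernet eth0: Link is Down
LINK_RE = re.compile(r"\[\s*(\d+\.\d+)\] .* (\S+): Link is (Up|Down)")


def flap_durations(lines, iface="eth0"):
    """Return (down_timestamp, seconds_down) for each Down -> Up pair."""
    flaps = []
    down_at = None
    for line in lines:
        m = LINK_RE.search(line)
        if not m or m.group(2) != iface:
            continue
        ts = float(m.group(1))
        if m.group(3) == "Down":
            down_at = ts
        elif down_at is not None:
            # An Up that follows a Down closes one flap.
            flaps.append((down_at, ts - down_at))
            down_at = None
    return flaps


if __name__ == "__main__":
    import sys

    # Usage: dmesg | python3 this_script.py
    flaps = flap_durations(sys.stdin)
    print(f"{len(flaps)} flaps")
    for start, length in flaps:
        print(f"down at {start:.6f} for {length * 1000:.3f} ms")
```

A trailing "Link is Down" without a matching "Link is Up" is ignored, so an outage still in progress is not counted as a flap.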

What kernel / version are you using? uname -a if you are at a prompt.

At the moment 5.10.0-21-arm64 #1 SMP Debian 5.10.162-1 (2023-01-21) aarch64 GNU/Linux. dtb from the kernel provided by Solidrun.

Let me double check with a kernel from a solidrun provided boot image.

Confirmed that it also happens with 5.8.0-00006-g6da046907c97-dirty #4 SMP PREEMPT Thu Jun 24 16:44:14 IDT 2021 aarch64 GNU/Linux from https://solid-run-images.sos-de-fra-1.exo.io/CN913x/cn9130-cf-base_config_1_ubuntu-4cbe176.img.xz .

It seems that with 5.15.94 the frequency of the link flap is reduced (>>10min instead of <<1min).

I installed the dts and cpufreq/thermal patches from solidrun. The DPDK/MUSDK patches did not install cleanly.

Update: Seems to hold steady at ~5 flaps in the past 12 hours.

It has remained an issue for me with all Debian kernels, including 6.0.12-1~bpo11+1.

Another update: with the patched 5.15.94, the system has held steady at 4-5 flaps per day over the past week. So not perfect, but significantly better.

After reading about the OpenWRT effort here

Booting OpenWrt on ClearFog CN9130-Pro - #13 by solarnetone

we reran some tests on a plain Linux 6.2.8 kernel with only the new devicetrees added. Again, connecting two CN9130 boards with fiber optics caused very frequent link losses, on the order of up to 100 per second:

[   64.370308] mvpp2 f2000000.ethernet eth0: Link is Up - 10Gbps/Full - flow control rx
[   64.379168] mvpp2 f2000000.ethernet eth0: Link is Down
[   64.384398] mvpp2 f2000000.ethernet eth0: Link is Up - 10Gbps/Full - flow control rx
[   64.393332] mvpp2 f2000000.ethernet eth0: Link is Down
[   64.398535] mvpp2 f2000000.ethernet eth0: Link is Up - 10Gbps/Full - flow control rx
[   64.407406] mvpp2 f2000000.ethernet eth0: Link is Down
[   64.412612] mvpp2 f2000000.ethernet eth0: Link is Up - 10Gbps/Full - flow control rx
[   64.421522] mvpp2 f2000000.ethernet eth0: Link is Down
[   64.426819] mvpp2 f2000000.ethernet eth0: Link is Up - 10Gbps/Full - flow control rx
[   64.435701] mvpp2 f2000000.ethernet eth0: Link is Down

At this point we haven’t managed to find a single working configuration. The only remaining options are to use DACs or a link partner that does not use Marvell hardware.

Can you please provide the full dmesg output? It is possible that the fiber modules have a marginal pull-up/pull-down on the pins that Linux uses to determine link.
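
One way to cross-check the pin theory, on modules that implement it, is the optional status/control byte in the SFF-8472 diagnostics page (A2h, byte 110), which mirrors signals such as TX Fault and Rx_LOS. Below is a minimal Python sketch of decoding that byte. How the byte is obtained is an assumption left open here (for example from a raw `ethtool -m` dump, if the driver exposes the diagnostics page), and the function name decode_status is hypothetical. Many modules leave these optional bits unimplemented, so an all-zero byte proves little.

```python
# Bit names for SFF-8472 diagnostics page (A2h) byte 110, "Status/Control".
STATUS_BITS = {
    7: "TX Disable State",
    6: "Soft TX Disable Select",
    5: "RS(1) State",
    4: "Rate_Select State",
    3: "Soft Rate_Select Select",
    2: "TX Fault State",
    1: "Rx_LOS State",
    0: "Data_Ready_Bar State",
}


def decode_status(byte):
    """Return the names of the bits set in the status/control byte, MSB first."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a single byte (0-255)")
    return [
        name
        for bit, name in sorted(STATUS_BITS.items(), reverse=True)
        if byte & (1 << bit)
    ]


if __name__ == "__main__":
    # Example value only: bits 2 and 1 set.
    print(decode_status(0x06))
```

Polling this byte while the link flaps could show whether Rx_LOS or TX Fault toggles at the same moments as the kernel's link messages.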

I changed from a Mikrotik to an older Brocade ICX switch, with all modules and cables staying the same. I have seen neither CRC errors nor link flaps since…
I did not see any other difference in the configuration (speed, duplex, flow control, …) between the switches.

The peering partner seems to have a clear impact, as suggested by the table in one of the comments above. I assume the issue is something in the lower-level Ethernet protocols that manifests with the MikroTik switches.
Given that the behavior changed without the module changing, I assume it is not related to the fiber modules.