LX2160A DPDK Example Use Case - Software Packet Distributor

Hi all,

We’re releasing SPD v1.0.5 — a DPDK-based, software-only Ethernet packet distributor that adds a
Greedy Reshaper to reduce worker imbalance under elephant-flow skew.

What it is
• Software-only & portable: no NIC-specific features; all reshaping is done in user space. Suitable for SDN.
• Bounded, in-place edits: each interval flips a small number of RETA entries to move hot buckets
from overloaded workers to cold ones, keeping overhead predictable (sketched below).
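
To make the bounded edit concrete, here is a minimal sketch of one reshaping interval. It is illustrative only: the table sizes, the counter names (bucket_pkts, worker_load), and the MAX_FLIPS_PER_INTERVAL bound are assumptions for the sketch, not the repo's actual API.

/* Sketch of one bounded reshaping interval (illustrative names/sizes). */
#include <stdint.h>

#define RETA_SIZE              128  /* software indirection table entries */
#define NUM_WORKERS              8
#define MAX_FLIPS_PER_INTERVAL   4  /* bound on per-interval RETA edits */

static uint8_t  reta[RETA_SIZE];          /* bucket -> worker core */
static uint64_t bucket_pkts[RETA_SIZE];   /* packets per bucket this interval */
static uint64_t worker_load[NUM_WORKERS]; /* packets per worker this interval */

static void reshape_interval(void)
{
    for (int flip = 0; flip < MAX_FLIPS_PER_INTERVAL; flip++) {
        /* Rank workers: find the most (hot) and least (cold) loaded. */
        int hot = 0, cold = 0;
        for (int w = 1; w < NUM_WORKERS; w++) {
            if (worker_load[w] > worker_load[hot])  hot = w;
            if (worker_load[w] < worker_load[cold]) cold = w;
        }

        /* Find the hottest bucket currently mapped to the hot worker. */
        int best = -1;
        for (int b = 0; b < RETA_SIZE; b++)
            if (reta[b] == hot &&
                (best < 0 || bucket_pkts[b] > bucket_pkts[best]))
                best = b;

        /* Greedy move, but only if it actually shrinks the hot/cold gap
         * (moving a bucket heavier than the gap would invert the skew). */
        uint64_t gap = worker_load[hot] - worker_load[cold];
        if (best < 0 || bucket_pkts[best] == 0 || bucket_pkts[best] >= gap)
            break;

        reta[best] = (uint8_t)cold;               /* in-place RETA flip */
        worker_load[hot]  -= bucket_pkts[best];
        worker_load[cold] += bucket_pkts[best];
    }
}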

Repo & docs
• Repository: https://github.com/mikechang-engr/software-packet-distributor. A software-only, flow-aware Ethernet packet distribution framework for DPDK that dynamically reshapes traffic and workload using a greedy algorithm to improve fairness and load balance across multiple cores in an embedded networking system. Validated on the NXP LX2160A platform.
• README: overview, quick start, start script knobs (TARGET_MPPS/GBPS, ELEPHANTS, GREEDY),
metrics path (/var/log/software-packet-distributor/worker_stats_v105.csv), and core layout.

Overview
The Software Packet Distributor (SPD) is a DPDK-based packet distribution framework for embedded multicore networking systems. It addresses the limitations of static RSS by introducing a Greedy Reshaper that adaptively reassigns flow buckets to worker cores based on runtime telemetry—improving fairness, utilization, and stability without relying on NIC-specific features.

Motivation
Traditional RSS-based packet steering is fast but fundamentally static. While RSS classifies traffic by 5-tuple flows, it ignores the weight imbalance between flows. In real network conditions, a small number of high-volume (elephant) flows can overwhelm specific CPU cores, leaving others underutilized—hurting throughput, fairness, and stability.
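
As a toy illustration of the pinning problem, the snippet below hashes one 5-tuple into a static indirection table. The FNV-1a hash is a stand-in for the NIC's Toeplitz function and the sizes are made up for the example; the point is that every packet of the flow maps to the same bucket, and the bucket's worker never changes:

#include <stdint.h>
#include <stdio.h>

#define RETA_SIZE   128
#define NUM_WORKERS   8

struct five_tuple {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t  proto;
};

/* FNV-1a over the tuple fields: a toy stand-in for Toeplitz/RSS. */
static uint32_t toy_rss_hash(const struct five_tuple *t)
{
    uint32_t words[4] = {
        t->src_ip, t->dst_ip,
        ((uint32_t)t->src_port << 16) | t->dst_port,
        t->proto,
    };
    const uint8_t *p = (const uint8_t *)words;
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < sizeof(words); i++)
        h = (h ^ p[i]) * 16777619u;
    return h;
}

int main(void)
{
    /* Static RETA: bucket i -> worker i % NUM_WORKERS, fixed forever. */
    uint8_t reta[RETA_SIZE];
    for (int i = 0; i < RETA_SIZE; i++)
        reta[i] = (uint8_t)(i % NUM_WORKERS);

    struct five_tuple elephant = {
        .src_ip = 0x0a000001, .dst_ip = 0x0a000002,
        .src_port = 40000, .dst_port = 443, .proto = 6, /* TCP */
    };

    /* Every packet of the flow maps to the same bucket and worker,
     * no matter how heavy the flow becomes. */
    uint32_t bucket = toy_rss_hash(&elephant) % RETA_SIZE;
    printf("elephant flow -> bucket %u -> worker %u\n",
           bucket, reta[bucket]);
    return 0;
}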

Hardware solutions offer limited help: NIC indirection tables are vendor-specific and coarse-grained. Kernel-level steering mechanisms are not suitable for high-speed, user-space DPDK pipelines, where latency and overhead must remain tightly bounded.

The Software Packet Distributor (SPD) is motivated by the need for a fully software-defined, adaptable distribution path that remains portable and predictable across platforms. Its design is guided by three principles:

• Software-only — No dependence on NIC-specific capabilities or offloads; deterministic behavior across DPDK environments.
• Flow-aware — Continuously observes per-flow and per-bucket characteristics to identify hotspots and imbalances.
• Workload-aware — Dynamically reshapes bucket-to-core assignments based on real-time worker utilization, stabilizing system behavior under skewed traffic (a telemetry sketch follows this list).
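
A minimal sketch of the telemetry behind the third principle: an EWMA of each worker's busy-cycle fraction. The DPDK calls (rte_rdtsc, rte_get_tsc_hz, rte_eth_rx_burst, rte_pktmbuf_free) are real, but the stats struct, smoothing weight, and update period are illustrative assumptions rather than SPD's actual code.

#include <stdint.h>
#include <rte_cycles.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>

struct worker_stats {
    uint64_t busy_cycles;  /* TSC cycles spent processing bursts */
    double   util_ewma;    /* smoothed utilization in [0, 1] */
};

#define BURST_SIZE 32
#define EWMA_ALPHA 0.2     /* smoothing weight, illustrative */

static void worker_poll_loop(uint16_t port, uint16_t queue,
                             struct worker_stats *st)
{
    struct rte_mbuf *burst[BURST_SIZE];
    uint64_t interval_start = rte_rdtsc();

    for (;;) {
        uint64_t t0 = rte_rdtsc();
        uint16_t n = rte_eth_rx_burst(port, queue, burst, BURST_SIZE);
        if (n > 0) {
            for (uint16_t i = 0; i < n; i++)
                rte_pktmbuf_free(burst[i]); /* stand-in for real work */
            st->busy_cycles += rte_rdtsc() - t0;
        }

        /* Roughly every 100 ms of TSC time, fold the busy fraction into
         * the EWMA that the reshaper reads when ranking workers. */
        uint64_t elapsed = rte_rdtsc() - interval_start;
        if (elapsed > rte_get_tsc_hz() / 10) {
            double util = (double)st->busy_cycles / (double)elapsed;
            st->util_ewma = EWMA_ALPHA * util
                          + (1.0 - EWMA_ALPHA) * st->util_ewma;
            st->busy_cycles = 0;
            interval_start = rte_rdtsc();
        }
    }
}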

Validated setup (Arm)
• Board: NXP LX2160A-RDB (16×A72 @2.2GHz); LSDK 21.08; Linux 5.10.35; DPDK 19.11.7 (PCAP/NULL vdev).
• Hugepages: 1GiB (preferred) with 2MiB fallback.

What’s new in v1.0.5
• Build portability: nested function removed (lcg32_local now at file scope); runtime behavior unchanged.
• Performance CSV: new filename and location; start script: improved signal escalation and newline-safe logging.

Call for feedback
• Looking for testers on additional SoCs/NICs and discussion about congestion-aware bucket ranking.
• If you can share worker_stats_v105.csv from a 60–120s run, we’ll generate comparison plots.

Thanks!
—Mike (author), repo link above

Very interesting, thanks for sharing your project. Do you have any benchmarks showing the throughput and latency achieved on the LX2160A?

Hi Jon,

I am a student in my first year of high school. I did this project to prepare for my science fair competition.

I have already tested my code and generated initial performance data on the LX2160A-RDB. I plan to post the test results by next week.

It is also possible to reproduce the test on the SolidRun HoneyComb platform. My test is based on user-space DPDK and is hardware-independent: it runs entirely in user space and does not depend on DPAA2. I will be very happy if you can test my project on the HoneyComb.

This is my development plan for my science fair projects:

  • City Science Fair (in March): Using the project I made about the hash algorithm to achieve flow-aware packet distribution.

  • National Science Fair (in July): Using the Greedy Reshaper to achieve workload-aware packet distribution.

  • International Science Fair (in November): Using an MLP-based Edge AI model to replace the Greedy Reshaper algorithm and improve overall system efficiency.

Thank you, Jon, for your guidance.

Regards,

Mike