
[net-next,v6,00/21] ice: add PFCP filter support

Message ID 20240327152358.2368467-1-aleksander.lobakin@intel.com

Message

Alexander Lobakin March 27, 2024, 3:23 p.m. UTC
Add support for creating PFCP filters in switchdev mode. Add a pfcp module
that allows creating a PFCP-type netdev. The netdev can then be passed to
tc when creating a filter to indicate that a PFCP filter should be created.

To add a PFCP filter, a special netdev must be created and passed to tc
command:

  ip link add pfcp0 type pfcp
  tc filter add dev eth0 ingress prio 1 flower pfcp_opts \
    1:12ab/ff:fffffffffffffff0 skip_hw action mirred egress redirect \
    dev pfcp0

Changes in iproute2 [1] are required to use pfcp_opts in tc.

The ICE COMMS package is required, as it contains the PFCP profiles.

Part of this patchset modifies IP_TUNNEL_*_OPTs, which were previously
stored in a __be16. All 16 possible bits have already been used, making
it impossible to add new ones.

* 1-3: add the new bitmap_{read,write}(), which are used later in the IP
       tunnel flags code (from Alexander's ARM64 MTE series [2]);
* 4-14: some bitmap code preparations also used later in IP tunnels;
* 15-17: convert IP tunnel flags from __be16 to a bitmap;
* 18-21: add PFCP module and support for it in ice.

[1] https://lore.kernel.org/netdev/20230614091758.11180-1-marcin.szycik@linux.intel.com
[2] https://lore.kernel.org/linux-kernel/20231218124033.551770-1-glider@google.com

Alexander Lobakin (14):
  bitops: add missing prototype check
  bitops: make BYTES_TO_BITS() treewide-available
  bitops: let the compiler optimize {__,}assign_bit()
  linkmode: convert linkmode_{test,set,clear,mod}_bit() to macros
  s390/cio: rename bitmap_size() -> idset_bitmap_size()
  fs/ntfs3: add prefix to bitmap_size() and use BITS_TO_U64()
  btrfs: rename bitmap_set_bits() -> btrfs_bitmap_set_bits()
  tools: move alignment-related macros to new <linux/align.h>
  bitmap: introduce generic optimized bitmap_size()
  bitmap: make bitmap_{get,set}_value8() use bitmap_{read,write}()
  lib/bitmap: add compile-time test for __assign_bit() optimization
  ip_tunnel: use a separate struct to store tunnel params in the kernel
  ip_tunnel: convert __be16 tunnel flags to bitmaps
  net: net_test: add tests for IP tunnel flags conversion helpers

Alexander Potapenko (2):
  lib/test_bitmap: add tests for bitmap_{read,write}()
  lib/test_bitmap: use pr_info() for non-error messages

Marcin Szycik (2):
  ice: refactor ICE_TC_FLWR_FIELD_ENC_OPTS
  ice: Add support for PFCP hardware offload in switchdev

Michal Swiatkowski (1):
  pfcp: always set pfcp metadata

Syed Nayyar Waris (1):
  lib/bitmap: add bitmap_{read,write}()

Wojciech Drewek (1):
  pfcp: add PFCP module

 drivers/net/Kconfig                           |  13 +
 drivers/net/Makefile                          |   1 +
 net/core/Makefile                             |   2 +-
 .../net/ethernet/intel/ice/ice_flex_type.h    |   4 +-
 .../ethernet/intel/ice/ice_protocol_type.h    |  12 +
 drivers/net/ethernet/intel/ice/ice_switch.h   |   2 +
 drivers/net/ethernet/intel/ice/ice_tc_lib.h   |   8 +-
 .../ethernet/mellanox/mlx5/core/en/tc_tun.h   |   2 +-
 .../ethernet/mellanox/mlxsw/spectrum_ipip.h   |   2 +-
 fs/ntfs3/ntfs_fs.h                            |   4 +-
 include/linux/bitmap.h                        |  95 ++++--
 include/linux/bitops.h                        |  23 +-
 include/linux/cpumask.h                       |   2 +-
 include/linux/linkmode.h                      |  27 +-
 include/linux/netdevice.h                     |   7 +-
 include/net/dst_metadata.h                    |  10 +-
 include/net/flow_dissector.h                  |   2 +-
 include/net/gre.h                             |  70 ++--
 include/net/ip6_tunnel.h                      |   4 +-
 include/net/ip_tunnels.h                      | 139 ++++++--
 include/net/pfcp.h                            |  90 ++++++
 include/net/udp_tunnel.h                      |   4 +-
 include/uapi/linux/if_tunnel.h                |  36 +++
 include/uapi/linux/pkt_cls.h                  |  14 +
 tools/include/linux/align.h                   |  12 +
 tools/include/linux/bitmap.h                  |   9 +-
 tools/include/linux/bitops.h                  |   2 +
 tools/include/linux/mm.h                      |   5 +-
 drivers/md/dm-clone-metadata.c                |   5 -
 drivers/net/bareudp.c                         |  19 +-
 drivers/net/ethernet/intel/ice/ice_ddp.c      |   9 +
 drivers/net/ethernet/intel/ice/ice_switch.c   |  85 +++++
 drivers/net/ethernet/intel/ice/ice_tc_lib.c   |  68 +++-
 .../mellanox/mlx5/core/en/tc_tun_encap.c      |   6 +-
 .../mellanox/mlx5/core/en/tc_tun_geneve.c     |  12 +-
 .../mellanox/mlx5/core/en/tc_tun_gre.c        |   8 +-
 .../mellanox/mlx5/core/en/tc_tun_vxlan.c      |   9 +-
 .../net/ethernet/mellanox/mlx5/core/en_tc.c   |  16 +-
 .../ethernet/mellanox/mlxsw/spectrum_ipip.c   |  56 ++--
 .../ethernet/mellanox/mlxsw/spectrum_span.c   |  10 +-
 .../ethernet/netronome/nfp/flower/action.c    |  27 +-
 drivers/net/geneve.c                          |  44 ++-
 drivers/net/pfcp.c                            | 302 ++++++++++++++++++
 drivers/net/vxlan/vxlan_core.c                |  14 +-
 drivers/s390/cio/idset.c                      |  12 +-
 fs/btrfs/free-space-cache.c                   |   8 +-
 fs/ntfs3/bitmap.c                             |   4 +-
 fs/ntfs3/fsntfs.c                             |   2 +-
 fs/ntfs3/index.c                              |  11 +-
 fs/ntfs3/super.c                              |   2 +-
 kernel/trace/trace_probe.c                    |   2 -
 lib/math/prime_numbers.c                      |   2 -
 lib/test_bitmap.c                             | 203 ++++++++++--
 net/bridge/br_vlan_tunnel.c                   |   9 +-
 net/core/filter.c                             |  26 +-
 net/core/flow_dissector.c                     |  20 +-
 net/core/{gso_test.c => net_test.c}           | 129 +++++++-
 net/ipv4/fou_bpf.c                            |   2 +-
 net/ipv4/gre_demux.c                          |   2 +-
 net/ipv4/ip_gre.c                             | 144 +++++----
 net/ipv4/ip_tunnel.c                          | 109 +++++--
 net/ipv4/ip_tunnel_core.c                     |  82 +++--
 net/ipv4/ip_vti.c                             |  41 ++-
 net/ipv4/ipip.c                               |  33 +-
 net/ipv4/ipmr.c                               |   2 +-
 net/ipv4/udp_tunnel_core.c                    |   5 +-
 net/ipv6/addrconf.c                           |   3 +-
 net/ipv6/ip6_gre.c                            |  85 ++---
 net/ipv6/ip6_tunnel.c                         |  14 +-
 net/ipv6/sit.c                                |  38 ++-
 net/netfilter/ipvs/ip_vs_core.c               |   6 +-
 net/netfilter/ipvs/ip_vs_xmit.c               |  20 +-
 net/netfilter/nft_tunnel.c                    |  44 +--
 net/openvswitch/flow_netlink.c                |  61 ++--
 net/psample/psample.c                         |  26 +-
 net/sched/act_tunnel_key.c                    |  36 +--
 net/sched/cls_flower.c                        | 134 +++++++-
 tools/perf/util/probe-finder.c                |   4 +-
 78 files changed, 1988 insertions(+), 624 deletions(-)
 create mode 100644 include/net/pfcp.h
 create mode 100644 tools/include/linux/align.h
 create mode 100644 drivers/net/pfcp.c
 rename net/core/{gso_test.c => net_test.c} (67%)

---
From v5[3]:
* 6.9-rc1;
* move IP tunnel flags KUnit test from the bitmap test suite to the
  networking KUnit (CONFIG_NET_TEST) (Yury);
* pick tags (Yury, Peter); now the entire series is fully tagged
  (except the above tests);
* no functional changes besides the tests.

From v4[4]:
* rebase on top of 6.8-rc1;
* collect all the dependencies together in one series (did I get it
  right? :s);
* no functional changes.

[3] https://lore.kernel.org/netdev/20240201122216.2634007-1-aleksander.lobakin@intel.com
[4] https://lore.kernel.org/netdev/20231207164911.14330-1-marcin.szycik@linux.intel.com

v3: https://lore.kernel.org/intel-wired-lan/20230721071532.613888-1-marcin.szycik@linux.intel.com
v2: https://lore.kernel.org/intel-wired-lan/20230607112606.15899-1-marcin.szycik@linux.intel.com
v1: https://lore.kernel.org/intel-wired-lan/20230601131929.294667-1-marcin.szycik@linux.intel.com

Comments

patchwork-bot+netdevbpf@kernel.org April 1, 2024, 10 a.m. UTC | #1
Hello:

This series was applied to netdev/net-next.git (main)
by David S. Miller <davem@davemloft.net>:

On Wed, 27 Mar 2024 16:23:37 +0100 you wrote:
> Add support for creating PFCP filters in switchdev mode. Add a pfcp module
> that allows creating a PFCP-type netdev. The netdev can then be passed to
> tc when creating a filter to indicate that a PFCP filter should be created.
> 
> To add a PFCP filter, a special netdev must be created and passed to tc
> command:
> 
> [...]

Here is the summary with links:
  - [net-next,v6,01/21] lib/bitmap: add bitmap_{read,write}()
    https://git.kernel.org/netdev/net-next/c/63c15822b8dd
  - [net-next,v6,02/21] lib/test_bitmap: add tests for bitmap_{read,write}()
    https://git.kernel.org/netdev/net-next/c/991e5583647d
  - [net-next,v6,03/21] lib/test_bitmap: use pr_info() for non-error messages
    https://git.kernel.org/netdev/net-next/c/f3e28876b6e0
  - [net-next,v6,04/21] bitops: add missing prototype check
    https://git.kernel.org/netdev/net-next/c/72cc1980a0ef
  - [net-next,v6,05/21] bitops: make BYTES_TO_BITS() treewide-available
    (no matching commit)
  - [net-next,v6,06/21] bitops: let the compiler optimize {__,}assign_bit()
    https://git.kernel.org/netdev/net-next/c/5259401ef8f4
  - [net-next,v6,07/21] linkmode: convert linkmode_{test,set,clear,mod}_bit() to macros
    https://git.kernel.org/netdev/net-next/c/8fab6a9d72e4
  - [net-next,v6,08/21] s390/cio: rename bitmap_size() -> idset_bitmap_size()
    https://git.kernel.org/netdev/net-next/c/c1023f5634b9
  - [net-next,v6,09/21] fs/ntfs3: add prefix to bitmap_size() and use BITS_TO_U64()
    (no matching commit)
  - [net-next,v6,10/21] btrfs: rename bitmap_set_bits() -> btrfs_bitmap_set_bits()
    https://git.kernel.org/netdev/net-next/c/4ca532d64648
  - [net-next,v6,11/21] tools: move alignment-related macros to new <linux/align.h>
    https://git.kernel.org/netdev/net-next/c/10a04ff09bcc
  - [net-next,v6,12/21] bitmap: introduce generic optimized bitmap_size()
    (no matching commit)
  - [net-next,v6,13/21] bitmap: make bitmap_{get,set}_value8() use bitmap_{read,write}()
    https://git.kernel.org/netdev/net-next/c/b44759705f7d
  - [net-next,v6,14/21] lib/bitmap: add compile-time test for __assign_bit() optimization
    https://git.kernel.org/netdev/net-next/c/7adaf37f7f10
  - [net-next,v6,15/21] ip_tunnel: use a separate struct to store tunnel params in the kernel
    (no matching commit)
  - [net-next,v6,16/21] ip_tunnel: convert __be16 tunnel flags to bitmaps
    (no matching commit)
  - [net-next,v6,17/21] net: net_test: add tests for IP tunnel flags conversion helpers
    https://git.kernel.org/netdev/net-next/c/5b2be2ab76d1
  - [net-next,v6,18/21] pfcp: add PFCP module
    (no matching commit)
  - [net-next,v6,19/21] pfcp: always set pfcp metadata
    (no matching commit)
  - [net-next,v6,20/21] ice: refactor ICE_TC_FLWR_FIELD_ENC_OPTS
    (no matching commit)
  - [net-next,v6,21/21] ice: Add support for PFCP hardware offload in switchdev
    (no matching commit)

You are awesome, thank you!
Harald Welte May 15, 2024, 11:55 a.m. UTC | #2
Dear Alexander, Wojciech,

forgive me for being late to the party, but I just saw the PFCP support
hitting Linus' git repo in 1b294a1f35616977caddaddf3e9d28e576a1adbc
and was trying to figure out what it is all about.  Is there some kind
of article, kernel documentation or other explanation about it?

I have a prehistoric background in Linux kernel networking, and have
been spending much of the last two decades in creating open source
implementations of 3GPP specifications.

So I'm very familiar with what PFCP is, and what it does, and how it is
used as a protocol by the 3GPP control plane to control the user/data
plane.

Conceptually it seems very odd to me to have something like *pfcp
net-devices*.  PFCP is just a control plane protocol, not a tunnel
mechanism.

From the Kconfig:

> +config PFCP
> +	tristate "Packet Forwarding Control Protocol (PFCP)"
> +	depends on INET
> +	select NET_UDP_TUNNEL
> +	help
> +	  This allows one to create PFCP virtual interfaces that allows to
> +	  set up software and hardware offload of PFCP packets.

I'm curious to understand why are *pfcp* packets hardware offloaded?
PFCP is just the control plane, similar to how you can consider netlink the
control plane by which userspace programs control the data plane.

I can fully understand that GTP-U packets are offloaded to kernel space or
hardware, and that then some control plane mechanism like PFCP is needed
to control that data plane.  But offloading packets of that control
protocol?

I also see the following in the patch:

> +MODULE_DESCRIPTION("Interface driver for PFCP encapsulated traffic");

PFCP is not an encapsulation protocol for user plane traffic.  It is not
a tunneling protocol.  GTP-U is the tunneling protocol, whose
implementations (typically UPFs) are remote-controlled by PFCP.

> +	  Note that this module does not support PFCP protocol in the kernel space.
> +	  There is no support for parsing any PFCP messages.

If I may be frank, why do we introduce something called "pfcp" to the
kernel, if it doesn't actually implement any of the PFCP specification
3GPP TS 29.244 (which is specifying a very concrete protocol)?

Once again, I just try to understand what you're trying to do here. It's
very much within my professional field, but I somehow cannot align what
I see within this patch set with my existing world view of what PFCP is
and how it works.

If anyone else has a better grasp of the architecture of this kernel
PFCP support, or has any pointers, I'd be very happy to follow up
on that.

Thanks for your time,
	Harald
Michal Swiatkowski May 16, 2024, 10:44 a.m. UTC | #3
On Wed, May 15, 2024 at 01:55:42PM +0200, Harald Welte wrote:
> Dear Alexander, Wojciech,

Hi Harald,

> 
> forgive me for being late to the party, but I just saw the PFCP support
> hitting Linus' git repo in 1b294a1f35616977caddaddf3e9d28e576a1adbc
> and was trying to figure out what it is all about.  Is there some kind
> of article, kernel documentation or other explanation about it?
> 
> I have a prehistoric background in Linux kernel networking, and have
> been spending much of the last two decades in creating open source
> implementations of 3GPP specifications.
> 
> So I'm very familiar with what PFCP is, and what it does, and how it is
> used as a protocol by the 3GPP control plane to control the user/data
> plane.
> 
> Conceptually it seems very odd to me to have something like *pfcp
> net-devices*.  PFCP is just a control plane protocol, not a tunnel
> mechanism.
> 
> From the Kconfig:
> 
> > +config PFCP
> > +	tristate "Packet Forwarding Control Protocol (PFCP)"
> > +	depends on INET
> > +	select NET_UDP_TUNNEL
> > +	help
> > +	  This allows one to create PFCP virtual interfaces that allows to
> > +	  set up software and hardware offload of PFCP packets.
> 
> I'm curious to understand why are *pfcp* packets hardware offloaded?
> PFCP is just the control plane, similar to you can consider netlink the
> control plane by which userspace programs control the data plane.
> 
> I can fully understand that GTP-U packets are offloaded to kernel space or
> hardware, and that then some control plane mechanism like PFCP is needed
> to control that data plane.  But offloading packets of that control
> protocol?

It is hard for me to answer your concerns, because opposite to you, I
don't have any experience with telco implementations. We had a client
that wanted to add an offload rule for PFCP in the same way as for GTP.
Intel hardware supports matching on specific PFCP packet parts. We spent
some time looking at possible implementations. As you said, it is a
little odd to follow the same scheme for GTP and PFCP, but it looks to
me like a reasonable solution.

The main idea is to allow the user to match in tc flower on
PFCP-specific parts. To do that we need a PFCP device (which is like a
dummy device for now; it isn't doing anything related to the PFCP
specification, only parsing).

Do you have a better idea for that?

> 
> I also see the following in the patch:
> 
> > +MODULE_DESCRIPTION("Interface driver for PFCP encapsulated traffic");
> 
> PFCP is not an encapsulation protocol for user plane traffic.  It is not
> a tunneling protocol.  GTP-U is the tunneling protocol, whose
> implementations (typically UPFs) are remote-controlled by PFCP.
> 

Agreed, it is done like that to simplify the implementation and reuse
the kernel software stack.

> > +	  Note that this module does not support PFCP protocol in the kernel space.
> > +	  There is no support for parsing any PFCP messages.
> 
> If I may be frank, why do we introduce something called "pfcp" to the
> kernel, if it doesn't actually implement any of the PFCP specification
> 3GPP TS 29.244 (which is specifying a very concrete protocol)?
> 
> Once again, I just try to understand what you're trying to do here. It's
> very much within my professional field, but I somehow cannot align what
> I see within this patch set with my existing world view of what PFCP is
> and how it works.
> 
> If anyone else has a better grasp of the architecture of this kernel
> PFCP support, or has any pointers, I'd be very happy to follow up
> on that.

This is the way to allow the user to steer PFCP packets based on
specific opts (type and seid) using tc flower. If you have a better
solution for that, I will probably agree with you and will be willing
to help you with a better implementation.

I assume the biggest problem here is with treating PFCP as a tunnel
(done for simplification and reuse) and the lack of any functionality
in the PFCP device (moving the PFCP specification implementation into
the kernel probably isn't a good idea and may never be accepted).

Offloading doesn't sound problematic. If there is a user that wants to
use that (to offload for better performance, or even to steer between
VFs based on PFCP-specific parts), why not allow them to do that?

In your opinion, should there not be a pfcp device in the kernel, or
should the device have more functionality (or any functionality,
because now it is only parsing)?

> 
> Thanks for your time,
> 	Harald
>

Thanks,
Michal

> -- 
> - Harald Welte <laforge@gnumonks.org>          https://laforge.gnumonks.org/
> ============================================================================
> "Privacy in residential applications is a desirable marketing option."
>                                                   (ETSI EN 300 175-7 Ch. A6)
Harald Welte May 16, 2024, 9:30 p.m. UTC | #4
Hi Michal,

thanks for your response.

On Thu, May 16, 2024 at 12:44:13PM +0200, Michal Swiatkowski wrote:

> > I'm curious to understand why are *pfcp* packets hardware offloaded?
> > PFCP is just the control plane, similar to you can consider netlink the
> > control plane by which userspace programs control the data plane.
> > 
> > I can fully understand that GTP-U packets are offloaded to kernel space or
> > hardware, and that then some control plane mechanism like PFCP is needed
> > to control that data plane.  But offloading packets of that control
> > protocol?
> 
> It is hard for me to answer your concerns, because opposite to you, I
> don't have any experience with telco implementations. We had client that
> want to add offload rule for PFCP in the same way as for GTP. 

I've meanwhile done some reading and it seems there are indeed some
papers suggesting that in specific implementations of control/data plane
splits, the transaction rate between control and data plane (to set up /
tear down / modify tunnels) is too low.  As a work-around, the entire
PFCP parsing is then off-loaded into (e.g. P4 capable) hardware.

So it seems at least there appears to be equipment where PFCP offloading
is useful to significantly increase performance.

For those curious, https://faculty.iiitd.ac.in/~rinku/resources/slides/2022-sosr-accelupf-slides.pdf
seems to cover one such configuration where offloading processing the
control-plane protocol into the P4 hardware switch has massively
improved the previously poor PFCP processing rate.

> Intel
> hardware support matching on specific PFCP packet parts. We spent some
> time looking at possible implementations. As you said, it is a little
> odd to follow the same scheme for GTP and PFCP, but it look for me like
> reasonable solution.

Based on what I understand, I am not sure I would agree with the
"reasonable solution" part.  But then of course, it is a subjective
point of view.

I understand and appreciate the high-level goal of giving the user some
way to configure a specific feature of an intel NIC.

However, I really don't think that this should come at the expense of
introducing tunnel devices (or other net-devs) for things that are not
tunnels, and by calling things PFCP which are not an implementation of
PFCP.

You are introducing something called "pfcp" into the kernel,
which is not pfcp.  What if somebody else at some point wanted to
introduce some actual PFCP support in some form?  How should they call
their sub-systems / Kconfigs / etc?  They could no longer call it simply
"pfcp" as you already used this very generic term for (from the
perspective of PFCP) a specific niche use case of configuring a NIC to
handle all of PFCP.

> Do you have better idea for that?

I am not familiar with the use cases and the intel NICs and what kind of
tooling or third party software might be out there wanting to configure
it.  It's really not my domain of expertise and as such I have no
specific proposal, sorry.

It just seems mind-boggling to me that we would introduce
* a net-device for something that's not a net-device
* a tunnel for something that does no tunneling whatsoever
* code mentioning "PFCP encapsulated traffic" when in fact it is
  impossible to encapsulate any traffic in PFCP, and the code does
  not - to the best of my understanding - do any such encapsulation
  or even configure hardware to perform any such non-existent PFCP
  encapsulation

[and possibly more] *just* so that a user can use 'tc' to configure a
hardware offloading feature in their NIC.

IMHO, there must be a better way.

> > I also see the following in the patch:
> > 
> > > +MODULE_DESCRIPTION("Interface driver for PFCP encapsulated traffic");
> > 
> > PFCP is not an encapsulation protocol for user plane traffic.  It is not
> > a tunneling protocol.  GTP-U is the tunneling protocol, whose
> > implementations (typically UPFs) are remote-controlled by PFCP.
> > 
> 
> Agree, it is done like that to simplify implementation and reuse kernel
> software stack.

My comment was about the module description.  It claims something that
makes no sense and that it does not actually implement.  Unless I'm not
understanding what the code does, it is outright wrong and misleading.

Likewise is the comment on top of the drivers/net/pfcp.c file says:

> * PFCP according to 3GPP TS 29.244

While the code is in fact *not* implementing any part of TS 29.244.  The
code is using the udp_tunnel infrastructure for something that's not a
tunnel.  From what I can tell it is creating an in-kernel UDP socket
(which can be done without any relation to udp_tunnel) and it is
configuring hardware offload features in a NIC.  The fact that the
payload of those UDP packets may be PFCP is circumstantial - nothing in
the code actually implements even the tiniest bit of PFCP protocol
parsing.

> This is the way to allow user to steer PFCP packet based on specific
> opts (type and seid) using tc flower. If you have better solution for
> that I will probably agree with you and will be willing to help you
> with better implementation.

Sadly, neither tc nor Intel NIC hardware offload capabilities are my
domain of expertise, so I am unable to propose details of a better
solution.

> I assume the biggest problem here is with treating PFCP as a tunnel
> (done for simplification and reuse) and lack of any functionality of
> PFCP device

I don't think the kernel should introduce a net-device (here
specifically a tunnel device) for something that is not a tunnel.  This
device [nor the hardware accelerator it controls] will never encapsulate
or decapsulate any PFCP packet, as PFCP is not a tunnel / encapsulation
protocol.

> (moving the PFCP specification implementation into kernel
> probably isn't good idea and may never be accepted).

I would agree, but that is a completely separate discussion.  Even if
one ignores this, the hypothetical kernel PFCP implementation would not
be a tunnel, and not be a net-device at all.  It would simply be some
kind of in-kernel UDP socket parsing and responding to packets, similar
to the in-kernel nfs daemon or whatever other in-kernel UDP users exist.

Also: The fact that you or we think an actual PFCP implementation would
not be accepted in the kernel should *not* be an argument in favor of
accepting something else into the kernel, call it PFCP and create tunnel
devices for things that are not tunnels :)

> Offloading doesn't sound like problematic. If there is a user that want
> to use that (to offload for better performance, or even to steer between
> VFs based on PFCP specific parts) why not allow to do that?

The papers I read convinced me that there is a use case.  I very much
believe the use case is a work-around for a different problem (the
inefficiency of the control->data plane protocol in this case), but
my opinion on that doesn't matter here.  I do agree with you that there
are apparently people who would want to make use of such a feature, and
that there is nothing wrong with providing them the means to do this.

However, the *how* is what I strongly object to.  Once again, I may miss
some part of your architecture, and I am happy to be proven otherwise.

But if all of this is *just* to configure hardware offloading in a nic,
I don't think there should be net-devices or tunnels that never
encap/decap a single packet or for protocols / use cases that clearly do
not encap or decap packets.

I also think this sets a very bad precedent.  What about other protocols
in the future?  Will we see new tunnels in the kernel for things that
are not tunnels at all, every time there is some new protocol that gains
hardware offloading capability in some NIC? Surely this kind of
proliferation is not the kind of software architecture we want to have?

Once again, I do apologize for raising my concerns at such a late stage.
I am not a kernel developer anymore these days, and I do not follow any
of the related mailing lists.  It was pure coincidence that the net-next
merge of some GTP improvements I was involved in specifying also
contained the PFCP code and I started digging what this was all about.

Regards,
	Harald
Michal Swiatkowski May 17, 2024, 2:01 p.m. UTC | #5
On Thu, May 16, 2024 at 11:30:00PM +0200, Harald Welte wrote:
> Hi Michal,
> 
> thanks for your response.
> 
> On Thu, May 16, 2024 at 12:44:13PM +0200, Michal Swiatkowski wrote:
> 
> > > I'm curious to understand why are *pfcp* packets hardware offloaded?
> > > PFCP is just the control plane, similar to you can consider netlink the
> > > control plane by which userspace programs control the data plane.
> > > 
> > > I can fully understand that GTP-U packets are offloaded to kernel space or
> > > hardware, and that then some control plane mechanism like PFCP is needed
> > > to control that data plane.  But offloading packets of that control
> > > protocol?
> > 
> > It is hard for me to answer your concerns, because opposite to you, I
> > don't have any experience with telco implementations. We had client that
> > want to add offload rule for PFCP in the same way as for GTP. 
> 
> I've meanwhile done some reading and it seems there are indeed some
> papers suggesting that in specific implementations of control/data plane
> splits, the transaction rate between control and data plane (to set up /
> tear down / modify tunnels) is too low.  As a work-around, the entire
> PFCP parsing is then off-loaded into (e.g. P4 capable) hardware.
> 
> So it seems at least there appears to be equipment where PFCP offloading
> is useful to significantly increase performance.
> 
> For those curious, https://faculty.iiitd.ac.in/~rinku/resources/slides/2022-sosr-accelupf-slides.pdf
> seems to cover one such configuration where offloading processing the
> control-plane protocol into the P4 hardware switch has massively
> improved the previously poor PFCP processing rate.

Thanks for the interesting link.

> 
> > Intel
> > hardware support matching on specific PFCP packet parts. We spent some
> > time looking at possible implementations. As you said, it is a little
> > odd to follow the same scheme for GTP and PFCP, but it look for me like
> > reasonable solution.
> 
> Based on what I understand, I am not sure I would agree with the
> "reasonable solution" part.  But then of course, it is a subjective
> point of view.
> 
> I understand and appreciate the high-level goal of giving the user some
> way to configure a specific feature of an intel NIC.
> 
> However, I really don't think that this should come at the expense of
> introducing tunnel devices (or other net-devs) for things that are not
> tunnels, and by calling things PFCP which are not an implementation of
> PFCP.
> 
> You are introducing something called "pfcp" into the kernel,
> which is not pfcp.  What if somebody else at some point wanted to
> introduce some actual PFCP support in some form?  How should they call
> their sub-systems / Kconfigs / etc?  They could no longer call it simply
> "pfcp" as you already used this very generic term for (from the
> perspective of PFCP) a specific niche use case of configuring a NIC to
> handle all of PFCP.

What about changing the name and the description to better reflect what
this new device is for? I don't have any idea for a good-fitting name.

> 
> > Do you have better idea for that?
> 
> I am not familiar with the use cases and the intel NICs and what kind of
> tooling or third party software might be out there wanting to configure
> it.  It's really not my domain of expertise and as such I have no
> specific proposal, sorry.
> 
> It just seems mind-boggling to me that we would introduce
> * a net-device for something that's not a net-device
> * a tunnel for something that does no tunneling whatsoever
> * code mentioning "PFCP encapsulated traffic" when in fact it is
>   impossible to encapsulate any traffic in PFCP, and the code does
>   not - to the best of my understanding - do any such encapsulation
>   or even configure hardware to perform any such non-existent PFCP
>   encapsulation
> 
> [and possibly more] *just* so that a user can use 'tc' to configure a
> hardware offloading feature in their NIC.
> 
> IMHO, there must be a better way.

Yeah, let's say we didn't find a better way. I agree with you that it
isn't the best solution. The problem is that we couldn't find a better
one.

In short, if I remember correctly (because we sent the first RFC more
than a year ago), to be sure that the packet is PFCP and has
PFCP-specific fields to match, the UDP port needs to be checked.
Currently, any matching on a UDP port in the flower kernel code means
matching a tunnel. Because of that, we treated it as a tunnel, only to
simplify this matching.

This simplification is also why we need a specific net-device. All
tunnels have their own net-devices, so in the tc code the decision
whether protocol-specific fields can be matched is made based on the
net-device type. If PFCP had a specific IP protocol (for example), there
wouldn't be any problem, but since the only distinguishing feature is
the UDP port, we ended up with this messy solution.
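For reference, this is the workflow from the cover letter: the pfcp
netdev is created first, and tc flower resolves whether pfcp_opts can be
matched based on the type of the device the filter redirects to (the
interface names and the type/SEID values are just the cover letter's
example):

```shell
# Create the special PFCP netdev; its device type is what lets tc
# accept pfcp_opts on a flower filter redirecting to it.
ip link add pfcp0 type pfcp

# Match PFCP packets by message type and SEID (with masks) and
# redirect them to pfcp0. Requires iproute2 with pfcp_opts support.
tc filter add dev eth0 ingress prio 1 flower pfcp_opts \
    1:12ab/ff:fffffffffffffff0 skip_hw action mirred egress redirect \
    dev pfcp0
```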

> 
> > > I also see the following in the patch:
> > > 
> > > > +MODULE_DESCRIPTION("Interface driver for PFCP encapsulated traffic");
> > > 
> > > PFCP is not an encapsulation protocol for user plane traffic.  It is not
> > > a tunneling protocol.  GTP-U is the tunneling protocol, whose
> > > implementations (typically UPFs) are remote-controlled by PFCP.
> > > 
> > 
> > Agree, it is done like that to simplify implementation and reuse kernel
> > software stack.
> 
> My comment was about the module description.  It claims something that
> makes no sense and that it does not actually implement.  Unless I'm not
> understanding what the code does, it is outright wrong and misleading.
> 
> Likewise, the comment on top of the drivers/net/pfcp.c file says:
> 
> > * PFCP according to 3GPP TS 29.244
> 
> While the code is in fact *not* implementing any part of TS 29.244.  The
> code is using the udp_tunnel infrastructure for something that's not a
> tunnel.  From what I can tell it is creating an in-kernel UDP socket
> (which can be done without any relation to udp_tunnel) and it is
> configuring hardware offload features in a NIC.  The fact that the
> payload of those UDP packets may be PFCP is circumstantial - nothing in
> the code actually implements even the tiniest bit of PFCP protocol
> parsing.
> 

Like I said, it is there only to allow tc to match on the PFCP-specific
fields.

> > This is the way to allow users to steer PFCP packets based on specific
> > opts (type and seid) using tc flower. If you have a better solution for
> > that, I will probably agree with you and will be willing to help you
> > with a better implementation.
> 
> Sadly, neither tc nor intel NIC hardware offload capabilities are my
> domain of expertise, so I am unable to propose details of a better
> solution.
> 
> > I assume the biggest problem here is with treating PFCP as a tunnel
> > (done for simplification and reuse) and the lack of any actual
> > functionality in the PFCP device
> 
> I don't think the kernel should introduce a net-device (here
> specifically a tunnel device) for something that is not a tunnel.  This
> device [nor the hardware accelerator it controls] will never encapsulate
> or decapsulate any PFCP packet, as PFCP is not a tunnel / encapsulation
> protocol.
> 

Good point. I hope to treat PFCP as the only such exception, and even to
remove it once we have a better solution for allowing tc to match PFCP
packets.

> > (moving the PFCP specification implementation into the kernel
> > probably isn't a good idea and may never be accepted).
> 
> I would agree, but that is a completely separate discussion.  Even if
> one ignores this, the hypothetical kernel PFCP implementation would not
> be a tunnel, and not be a net-device at all.  It would simply be some
> kind of in-kernel UDP socket parsing and responding to packets, similar
> to the in-kernel nfs daemon or whatever other in-kernel UDP users exist.
> 
> Also: The fact that you or we think an actual PFCP implementation would
> not be accepted in the kernel should *not* be an argument in favor of
> accepting something else into the kernel, calling it PFCP and creating
> tunnel devices for things that are not tunnels :)
> 
> > Offloading doesn't sound problematic. If there is a user that wants
> > to use it (to offload for better performance, or even to steer between
> > VFs based on PFCP-specific parts), why not allow them to do that?
> 
> The papers I read convinced me that there is a use case.  I very much
> believe the use case is a work-around for a different problem (the
> inefficiency of the control->data plane protocol in this case), but
> my opinion on that doesn't matter here.  I do agree with you that there
> are apparently people who would want to make use of such a feature, and
> that there is nothing wrong with providing them means to do this.
> 
> However, the *how* is what I strongly object to.  Once again, I may miss
> some part of your architecture, and I am happy to be proven otherwise.
> 

It will be hard. As you said, it is done "just to configure hardware
offloading". Can't that be the argument here?

> But if all of this is *just* to configure hardware offloading in a nic,
> I don't think there should be net-devices or tunnels that never
> encap/decap a single packet or for protocols / use cases that clearly do
> not encap or decap packets.
> 
> I also think this sets a very bad precedent.  What about other protocols
> in the future?  Will we see new tunnels in the kernel for things that
> are not tunnels at all, every time there is some new protocol that gains
> hardware offloading capability in some NIC? Surely this kind of
> proliferation is not the kind of software architecture we want to have?
> 
> Once again, I do apologize for raising my concerns at such a late stage.
> I am not a kernel developer anymore these days, and I do not follow any
> of the related mailing lists.  It was pure coincidence that the net-next
> merge of some GTP improvements I was involved in specifying also
> contained the PFCP code and I started digging what this was all about.
> 

No problem, and sorry for not CCing you during all the revisions of
these changes.

Probably I can't convince you. We agree that the solution with an
additional netdev and treating PFCP as a tunnel is ugly. It is done only
to allow tc flower matching (in hardware, with a fallback to software).
Currently I don't have any idea for a more suitable solution.

I'm willing to work on a better idea if anybody has one.

> Regards,
> 	Harald
> 

Thanks,
Michal

> -- 
> - Harald Welte <laforge@gnumonks.org>          https://laforge.gnumonks.org/
> ============================================================================
> "Privacy in residential applications is a desirable marketing option."
>                                                   (ETSI EN 300 175-7 Ch. A6)