[nf-next,RFC,0/5] Netfilter egress hook

Message ID	cover.1572528496.git.lukas@wunner.de
Headers	show Return-Path: <netdev-owner@vger.kernel.org> Message-Id: <cover.1572528496.git.lukas@wunner.de> From: Lukas Wunner <lukas@wunner.de> Date: Thu, 31 Oct 2019 14:41:00 +0100 Subject: [PATCH nf-next,RFC 0/5] Netfilter egress hook To: "Pablo Neira Ayuso" <pablo@netfilter.org>, Jozsef Kadlecsik <kadlec@netfilter.org>, Florian Westphal <fw@strlen.de> Cc: netfilter-devel@vger.kernel.org, coreteam@netfilter.org, netdev@vger.kernel.org, Martin Mares <mj@ucw.cz>, Daniel Borkmann <daniel@iogearbox.net> Sender: netdev-owner@vger.kernel.org Precedence: bulk
Series	Netfilter egress hook \| expand [nf-next,RFC,0/5] Netfilter egress hook [nf-next,RFC,1/5] netfilter: Clean up unnecessary #ifdef [nf-next,RFC,2/5] netfilter: Document ingress hook [nf-next,RFC,3/5] netfilter: Rename ingress hook include file [nf-next,RFC,4/5] netfilter: Generalize ingress hook [nf-next,RFC,5/5] netfilter: Introduce egress hook

Message ID

cover.1572528496.git.lukas@wunner.de

Headers

Message-Id: <cover.1572528496.git.lukas@wunner.de>
From: Lukas Wunner <lukas@wunner.de>
Date: Thu, 31 Oct 2019 14:41:00 +0100
Subject: [PATCH nf-next,RFC 0/5] Netfilter egress hook
To: "Pablo Neira Ayuso" <pablo@netfilter.org>,
	Jozsef Kadlecsik <kadlec@netfilter.org>, Florian Westphal <fw@strlen.de>
Cc: netfilter-devel@vger.kernel.org, coreteam@netfilter.org,
	netdev@vger.kernel.org, Martin Mares <mj@ucw.cz>,
	Daniel Borkmann <daniel@iogearbox.net>
Sender: netdev-owner@vger.kernel.org
Precedence: bulk

Series

Netfilter egress hook | expand

Message

Lukas Wunner Oct. 31, 2019, 1:41 p.m. UTC

Introduce a netfilter egress hook to complement the existing ingress hook.

User space support for nft is submitted in a separate patch.

The need for this arose because I had to filter egress packets which do
not match a specific ethertype.  The most common solution appears to be
to enslave the interface to a bridge and use ebtables, but that's
cumbersome to configure and comes with a (small) performance penalty.
An alternative approach is tc, but that doesn't afford equivalent
matching options as netfilter.  A bit of googling reveals that more
people have expressed a desire for egress filtering in the past:

https://www.spinics.net/lists/netfilter/msg50038.html
https://unix.stackexchange.com/questions/512371

I am first performing traffic control with sch_handle_egress() before
performing filtering with nf_egress().  That order is identical to
ingress processing.  I'm wondering whether an inverse order would be
more logical or more beneficial.  Among other things it would allow
marking packets with netfilter on egress before performing traffic
control based on that mark.  Thoughts?

Lukas Wunner (5):
  netfilter: Clean up unnecessary #ifdef
  netfilter: Document ingress hook
  netfilter: Rename ingress hook include file
  netfilter: Generalize ingress hook
  netfilter: Introduce egress hook

 include/linux/netdevice.h         |   5 ++
 include/linux/netfilter_ingress.h |  58 -----------------
 include/linux/netfilter_netdev.h  | 102 ++++++++++++++++++++++++++++++
 include/uapi/linux/netfilter.h    |   1 +
 net/core/dev.c                    |  31 ++++++---
 net/netfilter/Kconfig             |   8 +++
 net/netfilter/core.c              |  24 +++++--
 net/netfilter/nft_chain_filter.c  |   4 +-
 8 files changed, 161 insertions(+), 72 deletions(-)
 delete mode 100644 include/linux/netfilter_ingress.h
 create mode 100644 include/linux/netfilter_netdev.h

Comments

Pablo Neira Ayuso Nov. 7, 2019, 10:51 p.m. UTC | #1

Hi,

On Thu, Oct 31, 2019 at 02:41:00PM +0100, Lukas Wunner wrote:
> Introduce a netfilter egress hook to complement the existing ingress hook.
> 
> User space support for nft is submitted in a separate patch.
> 
> The need for this arose because I had to filter egress packets which do
> not match a specific ethertype.  The most common solution appears to be
> to enslave the interface to a bridge and use ebtables, but that's
> cumbersome to configure and comes with a (small) performance penalty.
> An alternative approach is tc, but that doesn't afford equivalent
> matching options as netfilter.  A bit of googling reveals that more
> people have expressed a desire for egress filtering in the past:
> 
> https://www.spinics.net/lists/netfilter/msg50038.html
> https://unix.stackexchange.com/questions/512371
> 
> I am first performing traffic control with sch_handle_egress() before
> performing filtering with nf_egress().  That order is identical to
> ingress processing.  I'm wondering whether an inverse order would be
> more logical or more beneficial.  Among other things it would allow
> marking packets with netfilter on egress before performing traffic
> control based on that mark.  Thoughts?

Would you provide some numbers on the performance impact for this new
hook?

Thanks.

Lukas Wunner Nov. 23, 2019, 1:11 p.m. UTC | #2

On Thu, Nov 07, 2019 at 11:51:49PM +0100, Pablo Neira Ayuso wrote:
> On Thu, Oct 31, 2019 at 02:41:00PM +0100, Lukas Wunner wrote:
> > Introduce a netfilter egress hook to complement the existing ingress hook.
> 
> Would you provide some numbers on the performance impact for this new
> hook?

For some reason the numbers are slightly better with this series.

Could be caused by the __always_inline in patch 4, I'd have to compare
the disassembly to confirm this hunch.


* Without this patch:
  Result: OK: 34205373(c34202809+d2564) usec, 100000000 (60byte,0frags)
  2923517pps 1403Mb/sec (1403288160bps) errors: 0

* With this patch:
  Result: OK: 34106013(c34103172+d2841) usec, 100000000 (60byte,0frags)
  2932034pps 1407Mb/sec (1407376320bps) errors: 0


* Without this patch + tc egress:
  Result: OK: 37848652(c37846140+d2511) usec, 100000000 (60byte,0frags)
  2642102pps 1268Mb/sec (1268208960bps) errors: 0

* With this patch + tc egress:
  Result: OK: 37784817(c37782026+d2791) usec, 100000000 (60byte,0frags)
  2646565pps 1270Mb/sec (1270351200bps) errors: 0


* With this patch + nft egress:
  Result: OK: 43911936(c43908932+d3003) usec, 100000000 (60byte,0frags)
  2277285pps 1093Mb/sec (1093096800bps) errors: 0


Tested on a bare-metal Core i7-3615QM, each measurement was performed
twice to verify that the numbers are stable.

Commands to perform a measurement:
modprobe pktgen
echo "add_device lo@3" > /proc/net/pktgen/kpktgend_3
samples/pktgen/pktgen_bench_xmit_mode_queue_xmit.sh -i 'lo@3' -n 100000000

Commands for testing tc egress:
tc qdisc add dev lo clsact
tc filter add dev lo egress protocol ip prio 1 u32 match ip dst 4.3.2.1/32

Commands for testing nft egress:
nft add table netdev t
nft add chain netdev t co \{ type filter hook egress device lo priority 0 \; \}
nft add rule netdev t co ip daddr 4.3.2.1/32 drop

All testing was performed on the loopback interface to avoid distorting
measurements by the packet handling in the low-level Ethernet driver.
This required the following small patch:


diff --git a/net/core/pktgen.c b/net/core/pktgen.c
index bb99152..020c825 100644
--- a/net/core/pktgen.c
+++ b/net/core/pktgen.c
@@ -2003,8 +2003,8 @@ static int pktgen_setup_dev(const struct pktgen_net *pn,
 		return -ENODEV;
 	}
 
-	if (odev->type != ARPHRD_ETHER) {
-		pr_err("not an ethernet device: \"%s\"\n", ifname);
+	if (odev->type != ARPHRD_ETHER && odev->type != ARPHRD_LOOPBACK) {
+		pr_err("not an ethernet or loopback device: \"%s\"\n", ifname);
 		err = -EINVAL;
 	} else if (!netif_running(odev)) {
 		pr_err("device is down: \"%s\"\n", ifname);

Lukas Wunner March 4, 2020, 9:50 a.m. UTC | #3

On Thu, Nov 07, 2019 at 11:51:49PM +0100, Pablo Neira Ayuso wrote:
> On Thu, Oct 31, 2019 at 02:41:00PM +0100, Lukas Wunner wrote:
> > Introduce a netfilter egress hook to complement the existing ingress hook.
> > 
> > User space support for nft is submitted in a separate patch.
> > 
> > The need for this arose because I had to filter egress packets which do
> > not match a specific ethertype.  The most common solution appears to be
> > to enslave the interface to a bridge and use ebtables, but that's
> > cumbersome to configure and comes with a (small) performance penalty.
> > An alternative approach is tc, but that doesn't afford equivalent
> > matching options as netfilter.  A bit of googling reveals that more
> > people have expressed a desire for egress filtering in the past:
> > 
> > https://www.spinics.net/lists/netfilter/msg50038.html
> > https://unix.stackexchange.com/questions/512371
> > 
> > I am first performing traffic control with sch_handle_egress() before
> > performing filtering with nf_egress().  That order is identical to
> > ingress processing.  I'm wondering whether an inverse order would be
> > more logical or more beneficial.  Among other things it would allow
> > marking packets with netfilter on egress before performing traffic
> > control based on that mark.  Thoughts?
> 
> Would you provide some numbers on the performance impact for this new
> hook?

Just a gentle ping for this series.  I'd still be very interested to
get it upstream.  When posting the above-quoted RFC, Daniel NAK'ed it
saying that "weak justification" was provided "for something that sits
in critical fastpath".

However I followed up with numbers showing that the series actually
results in a speedup rather than a slowdown if the feature isn't used:

https://lore.kernel.org/netfilter-devel/20191123131108.dlnrbutabh5i55ix@wunner.de/

So what's the consensus?  Shall I post a non-RFC version, rebased on
current nf-next/master?

Thanks!

Lukas

Pablo Neira Ayuso March 4, 2020, 12:31 p.m. UTC | #4

On Wed, Mar 04, 2020 at 10:50:32AM +0100, Lukas Wunner wrote:
[...]
> So what's the consensus?  Shall I post a non-RFC version, rebased on
> current nf-next/master?

Please, move on and rebase on top of nf-next/master.

Thank you.