[nf-next,v3,0/3] Netfilter egress hook

Message ID cover.1598517739.git.lukas@wunner.de

Message

Lukas Wunner Aug. 27, 2020, 8:55 a.m. UTC
Introduce a netfilter egress hook to allow filtering outbound AF_PACKETs
such as DHCP and to prepare for in-kernel NAT64/NAT46.

An earlier version of this series was applied by Pablo Neira Ayuso back
in March and subsequently reverted by Daniel Borkmann over performance
concerns.  I've now reworked the series following a discussion between
Daniel and Florian Westphal:

https://lore.kernel.org/netdev/20200318123315.GI979@breakpoint.cc/

Briefly, traffic control and netfilter handling are moved out of the
__dev_queue_xmit() hot path into a noinline function which is dynamically
patched in using a static_key.  In that function, tc and nft are each
patched in with an additional static_key.
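
The layering described above can be modelled in user space.  The following
is only a rough sketch, not code from the series: plain bool flags stand in
for the kernel's static keys (which patch the branch into the instruction
stream rather than testing a variable), and every name in it (struct pkt,
xmit(), egress_slow_path(), enable_tc(), enable_nf(), the 1500-byte drop
rule) is hypothetical.  The order in which tc and nft run is likewise an
assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for static keys: ordinary flags, not code patching. */
static bool egress_needed;     /* outer key: set when tc or nft is in use */
static bool tc_egress_needed;  /* inner key for traffic control */
static bool nf_egress_needed;  /* inner key for netfilter */

enum verdict { VERDICT_ACCEPT, VERDICT_DROP };

/* Hypothetical packet; the *_seen fields record which handlers ran. */
struct pkt {
	int len;
	bool tc_seen;
	bool nf_seen;
};

enum verdict tc_egress(struct pkt *p)
{
	p->tc_seen = true;
	return VERDICT_ACCEPT;
}

enum verdict nf_egress(struct pkt *p)
{
	p->nf_seen = true;
	/* Made-up rule for illustration: drop anything above 1500 bytes. */
	return p->len > 1500 ? VERDICT_DROP : VERDICT_ACCEPT;
}

/* Kept out of line so the common path carries only one test. */
__attribute__((noinline))
enum verdict egress_slow_path(struct pkt *p)
{
	/* Assumed order: tc first, then netfilter. */
	if (tc_egress_needed && tc_egress(p) == VERDICT_DROP)
		return VERDICT_DROP;
	if (nf_egress_needed && nf_egress(p) == VERDICT_DROP)
		return VERDICT_DROP;
	return VERDICT_ACCEPT;
}

/* Models the transmit hot path: one check when neither tc nor nft is used. */
enum verdict xmit(struct pkt *p)
{
	assert(p != NULL);
	if (egress_needed && egress_slow_path(p) == VERDICT_DROP)
		return VERDICT_DROP;
	return VERDICT_ACCEPT;	/* a real path would enqueue the packet here */
}

void enable_tc(void)
{
	tc_egress_needed = true;
	egress_needed = true;
}

void enable_nf(void)
{
	nf_egress_needed = true;
	egress_needed = true;
}
```

With no flag set, xmit() never reaches the out-of-line function.  Once
enable_tc() has been called, every packet pays for the outer check, the
function call and the inner checks, which corresponds to the tc-only
slowdown discussed in the next paragraph.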

Thus, if neither tc nor nft is used, performance improves compared to
the status quo (see measurements in patch [3/3]).  However if tc is
used, performance degrades a little due to the "noinline", the additional
outer static key and the added netfilter code.  That's kind of a bummer.
If anyone has ideas how to mitigate this performance degradation, please
come forward.

To test the new netfilter egress hook, apply this nft patch to add rules
from user space:

https://lore.kernel.org/netfilter-devel/d6b6896fdd8408e4ddbd66ab524709e5cf82ea32.1583929080.git.lukas@wunner.de/

Thanks!

Lukas Wunner (3):
  netfilter: Rename ingress hook include file
  netfilter: Generalize ingress hook
  netfilter: Introduce egress hook

 include/linux/netdevice.h         |   8 +++
 include/linux/netfilter_ingress.h |  58 -----------------
 include/linux/netfilter_netdev.h  | 102 ++++++++++++++++++++++++++++++
 include/linux/rtnetlink.h         |   2 +-
 include/uapi/linux/netfilter.h    |   1 +
 net/core/dev.c                    |  56 +++++++++++++---
 net/netfilter/Kconfig             |   8 +++
 net/netfilter/core.c              |  24 +++++--
 net/netfilter/nft_chain_filter.c  |   4 +-
 net/sched/Kconfig                 |   3 +
 10 files changed, 194 insertions(+), 72 deletions(-)
 delete mode 100644 include/linux/netfilter_ingress.h
 create mode 100644 include/linux/netfilter_netdev.h

Comments

nevola Aug. 27, 2020, 10:36 a.m. UTC | #1
Hi Lukas, thank you for your patches.

On Thu, Aug 27, 2020 at 10:55 AM Lukas Wunner <lukas@wunner.de> wrote:
>
> Introduce a netfilter egress hook to allow filtering outbound AF_PACKETs
> such as DHCP and to prepare for in-kernel NAT64/NAT46.
>

Actually, we've found 2 additional use cases in container-based nodes
that use the egress hook:

1. intra-node DSR load balancing connectivity
2. container-based outbound security policies

We've been using your previous patch in an experimental project and
it's working fine.

Great job!

Daniel Borkmann Aug. 28, 2020, 7:14 a.m. UTC | #2
Hi Lukas,

On 8/27/20 10:55 AM, Lukas Wunner wrote:
> Introduce a netfilter egress hook to allow filtering outbound AF_PACKETs
> such as DHCP and to prepare for in-kernel NAT64/NAT46.

Thinking more about this, how will this allow to sufficiently filter AF_PACKET?
It won't. Any AF_PACKET application can freely set PACKET_QDISC_BYPASS without
additional privileges and then dev_queue_xmit() is being bypassed in the host ns.
This is therefore ineffective and not sufficient. (From container side these can
be caught w/ host veth on ingress, but not in host ns, of course, so hook won't
be invoked.)
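
For illustration, the bypass described above amounts to a single
setsockopt() call on an AF_PACKET socket.  This is a minimal user-space
sketch, not taken from the thread; the function name try_qdisc_bypass()
and its return convention are made up here.  Opening the socket itself
needs CAP_NET_RAW, so the function reports that case separately.

```c
#include <assert.h>
#include <sys/socket.h>
#include <linux/if_packet.h>
#include <unistd.h>

/*
 * Hypothetical helper: open an AF_PACKET socket and request qdisc bypass.
 * Returns  1 if PACKET_QDISC_BYPASS was enabled,
 *          0 if the socket could not be opened (e.g. no CAP_NET_RAW),
 *         -1 if the socket opened but setsockopt() failed.
 */
int try_qdisc_bypass(void)
{
	int one = 1;
	int rc;
	/* Protocol 0: the socket receives nothing; it is only used to send. */
	int fd = socket(AF_PACKET, SOCK_RAW, 0);

	if (fd < 0)
		return 0;

	rc = setsockopt(fd, SOL_PACKET, PACKET_QDISC_BYPASS, &one, sizeof(one));
	assert(rc == 0 || rc == -1);

	close(fd);
	return rc == 0 ? 1 : -1;
}
```

A return value of 1 means that frames subsequently sent on such a socket
would skip the qdisc layer, and with it dev_queue_xmit(), which is the
gap in egress filtering described above.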

Thanks,
Daniel

Eric Dumazet Aug. 28, 2020, 9:14 a.m. UTC | #3
On 8/28/20 12:14 AM, Daniel Borkmann wrote:
> Hi Lukas,
> 
> On 8/27/20 10:55 AM, Lukas Wunner wrote:
>> Introduce a netfilter egress hook to allow filtering outbound AF_PACKETs
>> such as DHCP and to prepare for in-kernel NAT64/NAT46.
> 
> Thinking more about this, how will this allow to sufficiently filter AF_PACKET?
> It won't. Any AF_PACKET application can freely set PACKET_QDISC_BYPASS without
> additional privileges and then dev_queue_xmit() is being bypassed in the host ns.
> This is therefore ineffective and not sufficient. (From container side these can
> be caught w/ host veth on ingress, but not in host ns, of course, so hook won't
> be invoked.)


Presumably dev_direct_xmit() could be augmented to support the hook.

dev_direct_xmit() (packet_direct_xmit()) was introduced to bypass qdisc,
not to bypass everything.