diff mbox series

[nft,1/4] doc: add documentation about list hooks feature

Message ID 20240726015837.14572-2-fw@strlen.de
State Changes Requested
Headers show
Series list hooks refactoring | expand

Commit Message

Florian Westphal July 26, 2024, 1:58 a.m. UTC
Add a brief segment about 'nft list hooks' and a summary
of the output format.

As nft.txt is quite large, split the additonal commands
into their own file.

The existing listing section is removed; list subcommand is
already mentioned in the relevant statement sections.

Reported-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 Makefile.am                 |   1 +
 doc/additional-commands.txt | 115 ++++++++++++++++++++++++++++++++++++
 doc/nft.txt                 |  63 +-------------------
 3 files changed, 117 insertions(+), 62 deletions(-)
 create mode 100644 doc/additional-commands.txt

Comments

Pablo Neira Ayuso July 26, 2024, 9 a.m. UTC | #1
On Fri, Jul 26, 2024 at 03:58:28AM +0200, Florian Westphal wrote:
> Add a brief segment about 'nft list hooks' and a summary
> of the output format.
> 
> As nft.txt is quite large, split the additonal commands
> into their own file.
> 
> The existing listing section is removed; list subcommand is
> already mentioned in the relevant statement sections.
> 
> Reported-by: Phil Sutter <phil@nwl.cc>
> Signed-off-by: Florian Westphal <fw@strlen.de>
> ---
>  Makefile.am                 |   1 +
>  doc/additional-commands.txt | 115 ++++++++++++++++++++++++++++++++++++
>  doc/nft.txt                 |  63 +-------------------
>  3 files changed, 117 insertions(+), 62 deletions(-)
>  create mode 100644 doc/additional-commands.txt
> 
> diff --git a/Makefile.am b/Makefile.am
> index 9088170bfc68..ef198dafcbc8 100644
> --- a/Makefile.am
> +++ b/Makefile.am
> @@ -322,6 +322,7 @@ A2X_OPTS_MANPAGE = \
>  ASCIIDOC_MAIN = doc/nft.txt
>  
>  ASCIIDOC_INCLUDES = \
> +	doc/additional-commands.txt \
>  	doc/data-types.txt \
>  	doc/payload-expression.txt \
>  	doc/primary-expression.txt \
> diff --git a/doc/additional-commands.txt b/doc/additional-commands.txt
> new file mode 100644
> index 000000000000..dd1b3d2d87d4
> --- /dev/null
> +++ b/doc/additional-commands.txt
> @@ -0,0 +1,115 @@
> +LIST HOOKS
> +~~~~~~~~~~
> +
> +This shows the low-level netfilter processing pipeline, including
> +functions registered by kernel modules such as nf_conntrack. +
> +
> +[verse]
> +____
> +*list hooks* ['family']
> +*list hooks netdev device* 'DEVICE_NAME'
> +____
> +
> +*list hooks* is enough to display everything that is active
> +on the system, however, it does currently omit hooks that are
> +tied to a specific network device (netdev family). To obtain
> +those, the network device needs to be queried by name.

IIRC, the idea is to display the ingress path pipeline according to
the device (if specified)

        list hooks netdev eth0

as for egress, as it is not possible to know where the packet is
going, it is probably good to allow the user to specify the output
device, so it gets the entire pipeline for ingress and egress
paths, ie.

list hooks netdev eth0 eth1

Note that this is not implemented. This has limitations, discovering
eth{0,1} belongs to bridge device would need more work (not asking to
do this now, but it could be a nice usability feature to discover the
pipeline?).
Florian Westphal July 26, 2024, 12:31 p.m. UTC | #2
Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > +*list hooks* is enough to display everything that is active
> > +on the system, however, it does currently omit hooks that are
> > +tied to a specific network device (netdev family). To obtain
> > +those, the network device needs to be queried by name.
> 
> IIRC, the idea is to display the ingress path pipeline according to
> the device (if specified)
> 
>         list hooks netdev eth0
> 
> as for egress, as it is not possible to know where the packet is
> going, it is probably good to allow the user to specify the output
> device, so it gets the entire pipeline for ingress and egress
> paths, ie.
> 
> list hooks netdev eth0 eth1

Not really, why would eth0 and eth1 be related here?

What would make more sense to me is to allow

list hooks netdev

and then have nft fetch list of all network devices and then query them
all.

If a packet coming in on devX will be forwarded to devY depends on the
type of packet and the configuration, e.g. arp/ip vs. bridge/routing
or even encapsulation...

> Note that this is not implemented. This has limitations, discovering
> eth{0,1} belongs to bridge device would need more work (not asking to
> do this now, but it could be a nice usability feature to discover the
> pipeline?).

Bridge?  I don't think we have bridge family support for netdev hooks?
AFAIU its only netdev and inet.

This thing should only list the nf hooks registered for the device,
and not start to guess.  So for "list hooks br0", return ingress and
egress hooks for the virtual device, not the bridge ports.
Pablo Neira Ayuso July 28, 2024, 11:19 p.m. UTC | #3
Hi Florian,

On Fri, Jul 26, 2024 at 02:31:52PM +0200, Florian Westphal wrote:
> Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > > +*list hooks* is enough to display everything that is active
> > > +on the system, however, it does currently omit hooks that are
> > > +tied to a specific network device (netdev family). To obtain
> > > +those, the network device needs to be queried by name.
> > 
> > IIRC, the idea is to display the ingress path pipeline according to
> > the device (if specified)
> > 
> >         list hooks netdev eth0
> > 
> > as for egress, as it is not possible to know where the packet is
> > going, it is probably good to allow the user to specify the output
> > device, so it gets the entire pipeline for ingress and egress
> > paths, ie.
> > 
> > list hooks netdev eth0 eth1
> 
> Not really, why would eth0 and eth1 be related here?

Note that you can specify:

  list hooks ip device enp0s25

this shows the hooks that will be exercised for a given packet family,
ie. IPv4 packets will exercise the following hooks.

family ip {
        hook ingress {
                 0000000000 chain netdev x y [nf_tables]
        }
        hook prerouting {
                -0000000400 ipv4_conntrack_defrag [nf_defrag_ipv4]
                -0000000200 ipv4_conntrack_in [nf_conntrack]
        }
        hook input {
                 0000000000 chain ip filter in [nf_tables]
                +2147483647 nf_confirm [nf_conntrack]
        }
        hook forward {
                -0000000225 selinux_ip_forward
        }
        hook output {
                -0000000400 ipv4_conntrack_defrag [nf_defrag_ipv4]
                -0000000225 selinux_ip_output
                -0000000200 ipv4_conntrack_local [nf_conntrack]
        }
        hook postrouting {
                +0000000225 selinux_ip_postroute
                +2147483647 nf_confirm [nf_conntrack]
        }
}

This is _not_ showing the list of hooks for a given family.

What I meant is that user could filter out by ingress and egress
device to fetch the hooks that are traversed in such case, ie.

  list hooks ip iifname eth0 oifname eth1

to get the traversal of hooks for IPv4 packets, assuming eth0 as
ingress device and eth1 as egress device.

> What would make more sense to me is to allow
> 
> list hooks netdev
>
> and then have nft fetch list of all network devices and then query them
> all.

Makes sense, it currently fails with EINVAL because no device has been
specified.

> If a packet coming in on devX will be forwarded to devY depends on the
> type of packet and the configuration, e.g. arp/ip vs. bridge/routing
> or even encapsulation...
> 
> > Note that this is not implemented. This has limitations, discovering
> > eth{0,1} belongs to bridge device would need more work (not asking to
> > do this now, but it could be a nice usability feature to discover the
> > pipeline?).
> 
> Bridge?  I don't think we have bridge family support for netdev hooks?
> AFAIU its only netdev and inet.
>
> This thing should only list the nf hooks registered for the device,
> and not start to guess.  So for "list hooks br0", return ingress and
> egress hooks for the virtual device, not the bridge ports.

OK
Florian Westphal July 28, 2024, 11:37 p.m. UTC | #4
Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > Not really, why would eth0 and eth1 be related here?
> 
> Note that you can specify:
> 
>   list hooks ip device enp0s25
> 
> this shows the hooks that will be exercised for a given packet family,
> ie. IPv4 packets will exercise the following hooks.
> 
> family ip {
>         hook ingress {
>                  0000000000 chain netdev x y [nf_tables]
>         }
>         hook prerouting {
>                 -0000000400 ipv4_conntrack_defrag [nf_defrag_ipv4]
>                 -0000000200 ipv4_conntrack_in [nf_conntrack]
>         }
>         hook input {
>                  0000000000 chain ip filter in [nf_tables]
>                 +2147483647 nf_confirm [nf_conntrack]
>         }
>         hook forward {
>                 -0000000225 selinux_ip_forward
>         }
>         hook output {
>                 -0000000400 ipv4_conntrack_defrag [nf_defrag_ipv4]
>                 -0000000225 selinux_ip_output
>                 -0000000200 ipv4_conntrack_local [nf_conntrack]
>         }
>         hook postrouting {
>                 +0000000225 selinux_ip_postroute
>                 +2147483647 nf_confirm [nf_conntrack]
>         }
> }
> 
> This is _not_ showing the list of hooks for a given family.

I now realize that whats in the tree today is not what I wrote originally.
So this is neither showing the hooks that will be execrised (packet
can't be input and forward...).  But ok.  I don't know what to do now.

> What I meant is that user could filter out by ingress and egress
> device to fetch the hooks that are traversed in such case, ie.
> 
>   list hooks ip iifname eth0 oifname eth1
> 
> to get the traversal of hooks for IPv4 packets, assuming eth0 as
> ingress device and eth1 as egress device.

No idea how to make this, or I fail to understand.

> > What would make more sense to me is to allow
> > 
> > list hooks netdev
> >
> > and then have nft fetch list of all network devices and then query them
> > all.
> 
> Makes sense, it currently fails with EINVAL because no device has been
> specified.

I'll try to add it, but I don't know if I should toss these patches
first or not :/
Pablo Neira Ayuso July 29, 2024, 12:21 a.m. UTC | #5
On Mon, Jul 29, 2024 at 01:37:36AM +0200, Florian Westphal wrote:
> Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > > Not really, why would eth0 and eth1 be related here?
> > 
> > Note that you can specify:
> > 
> >   list hooks ip device enp0s25
> > 
> > this shows the hooks that will be exercised for a given packet family,
> > ie. IPv4 packets will exercise the following hooks.
> > 
> > family ip {
> >         hook ingress {
> >                  0000000000 chain netdev x y [nf_tables]
> >         }
> >         hook prerouting {
> >                 -0000000400 ipv4_conntrack_defrag [nf_defrag_ipv4]
> >                 -0000000200 ipv4_conntrack_in [nf_conntrack]
> >         }
> >         hook input {
> >                  0000000000 chain ip filter in [nf_tables]
> >                 +2147483647 nf_confirm [nf_conntrack]
> >         }
> >         hook forward {
> >                 -0000000225 selinux_ip_forward
> >         }
> >         hook output {
> >                 -0000000400 ipv4_conntrack_defrag [nf_defrag_ipv4]
> >                 -0000000225 selinux_ip_output
> >                 -0000000200 ipv4_conntrack_local [nf_conntrack]
> >         }
> >         hook postrouting {
> >                 +0000000225 selinux_ip_postroute
> >                 +2147483647 nf_confirm [nf_conntrack]
> >         }
> > }
> > 
> > This is _not_ showing the list of hooks for a given family.
> 
> I now realize that whats in the tree today is not what I wrote originally.

We agreed to change it.

> So this is neither showing the hooks that will be execrised (packet
> can't be input and forward...).  But ok.  I don't know what to do now.

As it is not possible to know where the packet is going (input /
forward), this listing represents what hooks can be potentially
exercised for such packet family, hence, netdev hooks are included in
the ip family view.

If you don't like this behaviour and prefer a strict view per hook
family it should be possible to revisit. User will have to get all the
pieces together to understand what the hook pipeline is instead. This
command has not been documented so far.

> > What I meant is that user could filter out by ingress and egress
> > device to fetch the hooks that are traversed in such case, ie.
> > 
> >   list hooks ip iifname eth0 oifname eth1
> > 
> > to get the traversal of hooks for IPv4 packets, assuming eth0 as
> > ingress device and eth1 as egress device.
> 
> No idea how to make this, or I fail to understand.

Not important for this series, this can be extended later.

> > > What would make more sense to me is to allow
> > > 
> > > list hooks netdev
> > >
> > > and then have nft fetch list of all network devices and then query them
> > > all.
> > 
> > Makes sense, it currently fails with EINVAL because no device has been
> > specified.
> 
> I'll try to add it, but I don't know if I should toss these patches
> first or not :/

I think patchset looks fine, but documentation update needs a revamp
if you agree in the existing behaviour.
Florian Westphal July 29, 2024, 3:32 p.m. UTC | #6
Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> On Mon, Jul 29, 2024 at 01:37:36AM +0200, Florian Westphal wrote:
> > Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > > > Not really, why would eth0 and eth1 be related here?
> > > 
> > > Note that you can specify:
> > > 
> > >   list hooks ip device enp0s25
> > > 
> > > this shows the hooks that will be exercised for a given packet family,
> > > ie. IPv4 packets will exercise the following hooks.
> > > 
> > > family ip {
> > >         hook ingress {
> > >                  0000000000 chain netdev x y [nf_tables]
> > >         }
> > >         hook prerouting {
> > >                 -0000000400 ipv4_conntrack_defrag [nf_defrag_ipv4]
> > >                 -0000000200 ipv4_conntrack_in [nf_conntrack]
> > >         }
> > >         hook input {
> > >                  0000000000 chain ip filter in [nf_tables]
> > >                 +2147483647 nf_confirm [nf_conntrack]
> > >         }
> > >         hook forward {
> > >                 -0000000225 selinux_ip_forward
> > >         }
> > >         hook output {
> > >                 -0000000400 ipv4_conntrack_defrag [nf_defrag_ipv4]
> > >                 -0000000225 selinux_ip_output
> > >                 -0000000200 ipv4_conntrack_local [nf_conntrack]
> > >         }
> > >         hook postrouting {
> > >                 +0000000225 selinux_ip_postroute
> > >                 +2147483647 nf_confirm [nf_conntrack]
> > >         }
> > > }
> > > 
> > > This is _not_ showing the list of hooks for a given family.
> > 
> > I now realize that whats in the tree today is not what I wrote originally.
> 
> We agreed to change it.

Sorry.  I'm old and do not remember.
TL;DR: I think we should revert to strict behaviour, i.e.
nft list hooks foo

queries hooks registered as NFPROTO_FOO.

NFPROTO_INET expands to shorthand for 'list hooks ip and ip6'.

AFAICS the problem is that ip, ip6 and arp are both NFPROTO_ values
(protocol families), but also l3 protocols, whereas netdev and bridge
are only families.

Hence, what to do on bridge becomes murky, there is just a large number
of possible l3 protocols that can pass through these hooks.

Netdev is simple because its scoped only to one single interface.

Sorry for the wall of text below, but maybe it helps to understand
why i think the above (no-guesses, treat strictly as nfproto) makes sense.

> > So this is neither showing the hooks that will be execrised (packet
> > can't be input and forward...).  But ok.  I don't know what to do now.
> 
> As it is not possible to know where the packet is going (input /
> forward), this listing represents what hooks can be potentially
> exercised for such packet family, hence, netdev hooks are included in
> the ip family view.

I was confused because it lists netdev hooks when asking for
bridge, and when I made this facility my intent was for this to
return hooks that were registered with NFPROTO_$QUERY.

So i did not expect to see netdev hooks.

list hooks arp device eth0
my expectation:
1. no output OR
2. NFPROTO_ARP in and output hooks OR
3. error, e.g. "device not supported for arp family".

But I get this instead:
  family arp {
          hook ingress {
                   0000000000 chain netdev ingress in_public [nf_tables]
          }
  }

Which I considered a bug but if we say that the query should return
hooks that this protocol family is exposed to, then that might be ok.

Should it list bridge hooks too?  Same rationale would apply as to
why it shows the netdev family ingress hook:
ARP packets will be picked up by bridge hooks if they arrive on
interfaces that serve as a bridge port.

I think that original code was simpler, less guesswork on
userspace side, what-you-ask-for-is-what-you-get, i.e. for
"list hooks arp" shows only hooks registered with NFPROTO_ARP on
kernel side, and nothing else.

For "list hooks arp device eth0", we could either return an error,
or pass this to kernel in case we gain arp+device at some point,
even though i don't see why anyone would add that.

inet ingress is already awkward in my opinion, I'm not sure why it
got added.

> If you don't like this behaviour and prefer a strict view per hook
> family it should be possible to revisit. User will have to get all the
> pieces together to understand what the hook pipeline is instead. This
> command has not been documented so far.

Yes.  In theory it should be possible to do this, so to go with arp example:

  nft list hooks arp device foo

would list:

1. netdev ingress and egress hook for the queried device
2. arp in and output hooks (there is no forward for arp)
3. bridge pre,in,forward,output and postrouting

but does that make sense?  I don't think so, because it gets more
complicated for bridge query:

nft list hooks bridge

1. can't show netdev, we lack device to query -> query all interfaces?
   But why would one clutter output with netdev in/egress when bridge
   was asked for?
2. show pre/in/fwd/out/postrouting NFPROTO_BRIDGE hooks
3. should it show ip/ip6 hooks? They could be relevant in case
   of broute expression in nftables.
   ip/ip6 Output+postrouting are travesed in case of local packets.

and it would have to show arp hooks, no?  for arp packet, we might pass
them up to local stack.  Local arp query pass through arp:output.

So I'm just worried that this gets complicated/murky as to what is shown
(and what is omitted).  For bridge, we would have to end up dumping
everything and treat it like 'list hooks'....

I do admit that it would be nice to have something like:

nft list pipeline meta input eth0 ip saddr 1.2.3.4 ip daddr 5.6.7.8

and then have nft:
1. list NFPROTO_NETDEV for eth0 ingress only (no egress)
2. list NFPROTO_INET hooks for ingress eth0
3. list NFPROTO_IPV4 hooks for prerouting
4. query FIB to get output device for 1.2.3.4->5.6.7.8
5. list NFPROTO_IPV4 for EITHER input or forward+postrouting
6. for forward case, list NFPROTO_NETDEV egress hooks for the
fib-derviced output device

But thats hard, because there might be rules in place that
alter ip daddr or ip saddr, or packet mark etc etc so the
pipeline shown can be a wrong.

And bridge needs extra work, we need to figure out if eth0
is a bridge port and then dump NFPROTO_BRIDGE.
And query l2 address of the ip address to query bridge fib to
figure out of this enters local delivery or forwarding case, or
both...

Or, require use of 'ether daddr' for bridge.  But still, its a lot
of work to make this.

> I think patchset looks fine, but documentation update needs a revamp
> if you agree in the existing behaviour.

I agree that we first need to figure out what 'nft list hooks xxxx'
should do.  I would prefer 'no guesses' approach, i.e.

1. nft list hooks
  dump everything EXCEPT netdev families/devices
2. nft list hooks netdev device foo
   dump ingress/egress netdev hooks,
   INCLUDING inet ingress (its scoped to the device).
3. nft list hooks arp
   dump arp input and output, if any
4. nft list hooks bridge
   dump bridge pre/input/forward/out/postrouting
5. nft list hooks ip
   dump ip pre/input/forward/out/postrouting
6. nft list hooks ip6 -> see above
7. nft list hooks inet -> alias for dumping both ip+ip6 hooks.

This also means that i'd make 'inet device foo' return a warning
that the device will be ignored because "inet ingress" is syntactic
frontend sugar similar to inet pseudo-family.

We could try the 'detect pipeline' later.  But, as per example
above, I don't think its easy, unless one omits all packet rewrites
(stateless nat, dnat, redirect) and everything else that causes
re-routing, e.g. policy routing, tproxy, tc(x), etc.

And then there is l3 and l2 multicasting...

But, admittingly, it might be nice to have something and it might be
good enough if pipeline alterations by ruleset and other entities
are omitted.
Pablo Neira Ayuso July 30, 2024, 11:34 p.m. UTC | #7
On Mon, Jul 29, 2024 at 05:32:11PM +0200, Florian Westphal wrote:
[...]
> TL;DR: I think we should revert to strict behaviour, i.e.
> nft list hooks foo

Agreed.

> queries hooks registered as NFPROTO_FOO.
> 
> NFPROTO_INET expands to shorthand for 'list hooks ip and ip6'.
> 
> AFAICS the problem is that ip, ip6 and arp are both NFPROTO_ values
> (protocol families), but also l3 protocols, whereas netdev and bridge
> are only families.
> 
> Hence, what to do on bridge becomes murky, there is just a large number
> of possible l3 protocols that can pass through these hooks.
> 
> Netdev is simple because its scoped only to one single interface.
> 
> Sorry for the wall of text below, but maybe it helps to understand
> why i think the above (no-guesses, treat strictly as nfproto) makes sense.

Makes sense, thanks for explaining.

[...]
> > If you don't like this behaviour and prefer a strict view per hook
> > family it should be possible to revisit. User will have to get all the
> > pieces together to understand what the hook pipeline is instead. This
> > command has not been documented so far.
> 
> Yes.  In theory it should be possible to do this, so to go with arp example:
> 
>   nft list hooks arp device foo
> 
> would list:
> 
> 1. netdev ingress and egress hook for the queried device
> 2. arp in and output hooks (there is no forward for arp)
> 3. bridge pre,in,forward,output and postrouting
> 
> but does that make sense?  I don't think so, because it gets more
> complicated for bridge query:
> 
> nft list hooks bridge
> 
> 1. can't show netdev, we lack device to query -> query all interfaces?
>    But why would one clutter output with netdev in/egress when bridge
>    was asked for?
> 2. show pre/in/fwd/out/postrouting NFPROTO_BRIDGE hooks
> 3. should it show ip/ip6 hooks? They could be relevant in case
>    of broute expression in nftables.
>    ip/ip6 Output+postrouting are travesed in case of local packets.
> 
> and it would have to show arp hooks, no?  for arp packet, we might pass
> them up to local stack.  Local arp query pass through arp:output.
> 
> So I'm just worried that this gets complicated/murky as to what is shown
> (and what is omitted).  For bridge, we would have to end up dumping
> everything and treat it like 'list hooks'....
> 
> I do admit that it would be nice to have something like:
> 
> nft list pipeline meta input eth0 ip saddr 1.2.3.4 ip daddr 5.6.7.8
> 
> and then have nft:
> 1. list NFPROTO_NETDEV for eth0 ingress only (no egress)
> 2. list NFPROTO_INET hooks for ingress eth0
> 3. list NFPROTO_IPV4 hooks for prerouting
> 4. query FIB to get output device for 1.2.3.4->5.6.7.8
> 5. list NFPROTO_IPV4 for EITHER input or forward+postrouting
> 6. for forward case, list NFPROTO_NETDEV egress hooks for the
> fib-derviced output device
> 
> But thats hard, because there might be rules in place that
> alter ip daddr or ip saddr, or packet mark etc etc so the
> pipeline shown can be a wrong.

Yes, providing a pipeline description is a much wider task than just
listing hooks. Please, move on with restoring list hooks.

Thanks Florian.
Phil Sutter Aug. 13, 2024, 11:06 a.m. UTC | #8
Hi,

On Mon, Jul 29, 2024 at 05:32:11PM +0200, Florian Westphal wrote:
[...]
> inet ingress is already awkward in my opinion, I'm not sure why it
> got added.

I suppose the ability to reuse other inet family ruleset parts for
ingress was a sought-after feature.

In hindsight, making tables family-agnostic and thus pure "containers"
for grouping ruleset parts might be a superior design choice. (Or maybe
I just miss the rub in doing that. :)

[...]
> I agree that we first need to figure out what 'nft list hooks xxxx'
> should do.  I would prefer 'no guesses' approach, i.e.

Let me suggest a deviation which seems more user-friendly:

> 1. nft list hooks
>   dump everything EXCEPT netdev families/devices

Include netdev here, make it really list *all* hooks. Iterating over
the list of currently existing NICs in this netns is no big deal, is
it?

> 2. nft list hooks netdev device foo
>    dump ingress/egress netdev hooks,
>    INCLUDING inet ingress (its scoped to the device).

Drop 'netdev' from the syntax here. The output really is "all hooks
specific to that NIC", not necessarily only netdev ones. (And "device"
is a distinct identifier for network interfaces in nftables syntax.)

> 3. nft list hooks arp
>    dump arp input and output, if any
> 4. nft list hooks bridge
>    dump bridge pre/input/forward/out/postrouting
> 5. nft list hooks ip
>    dump ip pre/input/forward/out/postrouting
> 6. nft list hooks ip6 -> see above
> 7. nft list hooks inet -> alias for dumping both ip+ip6 hooks.

I wonder if this is more confusing than not - on the netfilter hooks
layer, it doesn't quite exist. The only thing which persists a tad
longer is inet ingress, indicated by nf_register_net_hook() passing it
down to __nf_register_net_hook for nf_hook_entry_head() to return the
same value as for netdev ingress. I guess they could be spliced even
further up.

> This also means that i'd make 'inet device foo' return a warning
> that the device will be ignored because "inet ingress" is syntactic
> frontend sugar similar to inet pseudo-family.

Iff my claim holds true, there is no such thing as an inet hook and
thus also no inet device one. :)

> We could try the 'detect pipeline' later.  But, as per example
> above, I don't think its easy, unless one omits all packet rewrites
> (stateless nat, dnat, redirect) and everything else that causes
> re-routing, e.g. policy routing, tproxy, tc(x), etc.
> 
> And then there is l3 and l2 multicasting...
> 
> But, admittingly, it might be nice to have something and it might be
> good enough if pipeline alterations by ruleset and other entities
> are omitted.

I seem to recall us discussing something like that on NFWS, but to
simulate traversal of a packet with given properties through the
ruleset. Which would also identify the hooks involved, I guess?

Cheers, Phil
Pablo Neira Ayuso Aug. 19, 2024, 10:56 a.m. UTC | #9
Hi Phil, Florian,

@Florian, could you push out what you have to flush your queue in this front?

On Tue, Aug 13, 2024 at 01:06:49PM +0200, Phil Sutter wrote:
> On Mon, Jul 29, 2024 at 05:32:11PM +0200, Florian Westphal wrote:
> [...]
> Let me suggest a deviation which seems more user-friendly:
> 
> > 1. nft list hooks
> >   dump everything EXCEPT netdev families/devices
> 
> Include netdev here, make it really list *all* hooks. Iterating over
> the list of currently existing NICs in this netns is no big deal, is
> it?

I like this suggestion.

I think it should be possible to incrementally extend with these
suggestions, it should be possible to submit patches for your review.

> > 2. nft list hooks netdev device foo
> >    dump ingress/egress netdev hooks,
> >    INCLUDING inet ingress (its scoped to the device).
> 
> Drop 'netdev' from the syntax here. The output really is "all hooks
> specific to that NIC", not necessarily only netdev ones. (And "device"
> is a distinct identifier for network interfaces in nftables syntax.)

I think allowing 'device foo' without family would be good.

> > 3. nft list hooks arp
> >    dump arp input and output, if any
> > 4. nft list hooks bridge
> >    dump bridge pre/input/forward/out/postrouting
> > 5. nft list hooks ip
> >    dump ip pre/input/forward/out/postrouting
> > 6. nft list hooks ip6 -> see above
> > 7. nft list hooks inet -> alias for dumping both ip+ip6 hooks.
> 
> I wonder if this is more confusing than not - on the netfilter hooks
> layer, it doesn't quite exist. The only thing which persists a tad
> longer is inet ingress, indicated by nf_register_net_hook() passing it
> down to __nf_register_net_hook for nf_hook_entry_head() to return the
> same value as for netdev ingress. I guess they could be spliced even
> further up.

inet is an "alias". I think dumpg both ip+ip6 hooks should be fine.

> > This also means that i'd make 'inet device foo' return a warning
> > that the device will be ignored because "inet ingress" is syntactic
> > frontend sugar similar to inet pseudo-family.
> 
> Iff my claim holds true, there is no such thing as an inet hook and
> thus also no inet device one. :)
> 
> > We could try the 'detect pipeline' later.  But, as per example
> > above, I don't think its easy, unless one omits all packet rewrites
> > (stateless nat, dnat, redirect) and everything else that causes
> > re-routing, e.g. policy routing, tproxy, tc(x), etc.
> > 
> > And then there is l3 and l2 multicasting...
> > 
> > But, admittingly, it might be nice to have something and it might be
> > good enough if pipeline alterations by ruleset and other entities
> > are omitted.
> 
> I seem to recall us discussing something like that on NFWS, but to
> simulate traversal of a packet with given properties through the
> ruleset. Which would also identify the hooks involved, I guess?

Yes, this all is more complicated and it needs a whole lot of work to
add this feature in a complete way, which makes sense to schedule this
for the _future_.
Florian Westphal Aug. 19, 2024, 12:10 p.m. UTC | #10
Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> Hi Phil, Florian,
> 
> @Florian, could you push out what you have to flush your queue in this front?

OK, I pushed the patches to nftables.git.

> > > 1. nft list hooks
> > >   dump everything EXCEPT netdev families/devices
> > 
> > Include netdev here, make it really list *all* hooks. Iterating over
> > the list of currently existing NICs in this netns is no big deal, is
> > it?
> 
> I like this suggestion.

Fail enough, I will send a patch for this later this week.

> > > 2. nft list hooks netdev device foo
> > >    dump ingress/egress netdev hooks,
> > >    INCLUDING inet ingress (its scoped to the device).
> > 
> > Drop 'netdev' from the syntax here. The output really is "all hooks
> > specific to that NIC", not necessarily only netdev ones. (And "device"
> > is a distinct identifier for network interfaces in nftables syntax.)
> 
> I think allowing 'device foo' without family would be good.

OK,  I'm still unclear however because internally only netdev
families exist at the device level, so I'm not sure how to represent
this.

But dumping the existing network devices and querying them all is not
and issue so I will make a patch for this.
diff mbox series

Patch

diff --git a/Makefile.am b/Makefile.am
index 9088170bfc68..ef198dafcbc8 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -322,6 +322,7 @@  A2X_OPTS_MANPAGE = \
 ASCIIDOC_MAIN = doc/nft.txt
 
 ASCIIDOC_INCLUDES = \
+	doc/additional-commands.txt \
 	doc/data-types.txt \
 	doc/payload-expression.txt \
 	doc/primary-expression.txt \
diff --git a/doc/additional-commands.txt b/doc/additional-commands.txt
new file mode 100644
index 000000000000..dd1b3d2d87d4
--- /dev/null
+++ b/doc/additional-commands.txt
@@ -0,0 +1,115 @@ 
+LIST HOOKS
+~~~~~~~~~~
+
+This shows the low-level netfilter processing pipeline, including
+functions registered by kernel modules such as nf_conntrack. +
+
+[verse]
+____
+*list hooks* ['family']
+*list hooks netdev device* 'DEVICE_NAME'
+____
+
+*list hooks* is enough to display everything that is active
+on the system, however, it does currently omit hooks that are
+tied to a specific network device (netdev family). To obtain
+those, the network device needs to be queried by name.
+Example Usage:
+
+.List all active netfilter hooks in either the ip or ip6 stack
+--------------------------------------------------------------
+% nft list hooks inet
+family ip {
+        hook prerouting {
+                -0000000400 ipv4_conntrack_defrag [nf_defrag_ipv4]
+                -0000000200 ipv4_conntrack_in [nf_conntrack]
+                -0000000100 nf_nat_ipv4_pre_routing [nf_nat]
+        }
+        hook input {
+                 0000000000 chain inet filter input [nf_tables]
+                +0000000100 nf_nat_ipv4_local_in [nf_nat]
+[..]
+--------------------------------------------------------------
+
+The above shows a host that has nat, conntrack and ipv4 packet
+defragmentation enabled.
+For each hook location for the queried family a list of active hooks
+using the format +
+
+*priority* *identifier* [*module_name*]
+
+will be shown.
+
+The *priority* value dictates the order in which the hooks are called.
+The list is sorted, the lowest number is run first.
+
+The priority value of hooks registered by the kernel cannot be changed.
+For basechains registered by nftables, this value corresponds to the
+*priority* value specified in the base chain definition.
+
+After the numerical value, information about the hook is shown.
+For basechains defined in nftables this includes the table family,
+the table name and the basechains name.
+For hooks coming from kernel modules, the function name is used
+instead.
+
+If a *module name* is given, the hook was registered by the kernel
+module with this name.  You can use 'modinfo *module name*' to
+obtain more information about the module.
+
+This functionality requires a kernel built with the option +
+CONFIG_NETFILTER_NETLINK_HOOK
+enabled, either as a module or builtin. The module is named
+*nfnetlink_hook*.
+
+MONITOR
+~~~~~~~
+The monitor command allows you to listen to Netlink events produced by the
+nf_tables subsystem. These are either related to creation and deletion of
+objects or to packets for which *meta nftrace* was enabled. When they
+occur, nft will print to stdout the monitored events in either JSON or
+native nft format. +
+
+[verse]
+____
+*monitor* [*new* | *destroy*] 'MONITOR_OBJECT'
+*monitor* *trace*
+
+'MONITOR_OBJECT' := *tables* | *chains* | *sets* | *rules* | *elements* | *ruleset*
+____
+
+To filter events related to a concrete object, use one of the keywords in
+'MONITOR_OBJECT'.
+
+To filter events related to a concrete action, use keyword *new* or *destroy*.
+
+The second form of invocation takes no further options and exclusively prints
+events generated for packets with *nftrace* enabled.
+
+Hit ^C to finish the monitor operation.
+
+.Listen to all events, report in native nft format
+--------------------------------------------------
+% nft monitor
+--------------------------------------------------
+
+.Listen to deleted rules, report in JSON format
+-----------------------------------------------
+% nft -j monitor destroy rules
+-----------------------------------------------
+
+.Listen to both new and destroyed chains, in native nft format
+-----------------------------------------------------------------
+% nft monitor chains
+-------------------------------
+
+.Listen to ruleset events such as table, chain, rule, set, counters and quotas, in native nft format
+----------------------------------------------------------------------------------------------------
+% nft monitor ruleset
+---------------------
+
+.Trace incoming packets from host 10.0.0.1
+------------------------------------------
+% nft add rule filter input ip saddr 10.0.0.1 meta nftrace set 1
+% nft monitor trace
+------------------------------------------
diff --git a/doc/nft.txt b/doc/nft.txt
index 3f4593a29831..7e8c8695522d 100644
--- a/doc/nft.txt
+++ b/doc/nft.txt
@@ -766,17 +766,6 @@  and subtraction can be used to set relative priority, e.g. filter + 5 equals to
 *destroy*:: Delete the specified flowtable, it does not fail if it does not exist.
 *list*:: List all flowtables.
 
-LISTING
--------
-[verse]
-*list { secmarks | synproxys | flow tables | meters | hooks }* ['family']
-*list { secmarks | synproxys | flow tables | meters | hooks } table* ['family'] 'table'
-*list ct { timeout | expectation | helper | helpers } table* ['family'] 'table'
-
-Inspect configured objects.
-*list hooks* shows the full hook pipeline, including those registered by
-kernel modules, such as nf_conntrack.
-
 STATEFUL OBJECTS
 ----------------
 [verse]
@@ -908,57 +897,7 @@  ADDITIONAL COMMANDS
 -------------------
 These are some additional commands included in nft.
 
-MONITOR
-~~~~~~~~
-The monitor command allows you to listen to Netlink events produced by the
-nf_tables subsystem. These are either related to creation and deletion of
-objects or to packets for which *meta nftrace* was enabled. When they
-occur, nft will print to stdout the monitored events in either JSON or
-native nft format. +
-
-[verse]
-____
-*monitor* [*new* | *destroy*] 'MONITOR_OBJECT'
-*monitor* *trace*
-
-'MONITOR_OBJECT' := *tables* | *chains* | *sets* | *rules* | *elements* | *ruleset*
-____
-
-To filter events related to a concrete object, use one of the keywords in
-'MONITOR_OBJECT'.
-
-To filter events related to a concrete action, use keyword *new* or *destroy*.
-
-The second form of invocation takes no further options and exclusively prints
-events generated for packets with *nftrace* enabled.
-
-Hit ^C to finish the monitor operation.
-
-.Listen to all events, report in native nft format
---------------------------------------------------
-% nft monitor
---------------------------------------------------
-
-.Listen to deleted rules, report in JSON format
------------------------------------------------
-% nft -j monitor destroy rules
------------------------------------------------
-
-.Listen to both new and destroyed chains, in native nft format
------------------------------------------------------------------
-% nft monitor chains
--------------------------------
-
-.Listen to ruleset events such as table, chain, rule, set, counters and quotas, in native nft format
-----------------------------------------------------------------------------------------------------
-% nft monitor ruleset
----------------------
-
-.Trace incoming packets from host 10.0.0.1
-------------------------------------------
-% nft add rule filter input ip saddr 10.0.0.1 meta nftrace set 1
-% nft monitor trace
-------------------------------------------
+include::additional-commands.txt[]
 
 ERROR REPORTING
 ---------------