diff mbox series

[RFC,net-next,2/3] netfilter: nft_fib: Mask upper DSCP bits before FIB lookup

Message ID 20240725131729.1729103-3-idosch@nvidia.com
State RFC, archived
Headers show
Series Preparations for FIB rule DSCP selector | expand

Commit Message

Ido Schimmel July 25, 2024, 1:17 p.m. UTC
As part of its functionality, the nftables FIB expression module
performs a FIB lookup, but unlike other users of the FIB lookup API, it
does so without masking the upper DSCP bits. In particular, this differs
from the equivalent iptables match ("rpfilter") that does mask the upper
DSCP bits before the FIB lookup.

Align the module to other users of the FIB lookup API and mask the upper
DSCP bits using IPTOS_RT_MASK before the lookup.

No regressions in nft_fib.sh:

 # ./nft_fib.sh
 PASS: fib expression did not cause unwanted packet drops
 PASS: fib expression did drop packets for 1.1.1.1
 PASS: fib expression did drop packets for 1c3::c01d
 PASS: fib expression forward check with policy based routing

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
---
 net/ipv4/netfilter/nft_fib_ipv4.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

Comments

Guillaume Nault July 26, 2024, 1:15 p.m. UTC | #1
On Thu, Jul 25, 2024 at 04:17:28PM +0300, Ido Schimmel wrote:
> As part of its functionality, the nftables FIB expression module
> performs a FIB lookup, but unlike other users of the FIB lookup API, it
> does so without masking the upper DSCP bits. In particular, this differs
> from the equivalent iptables match ("rpfilter") that does mask the upper
> DSCP bits before the FIB lookup.
> 
> Align the module to other users of the FIB lookup API and mask the upper
> DSCP bits using IPTOS_RT_MASK before the lookup.

If Florian and Pablo are okay with this change and the long term plan
to allow full DSCP match, then I'm all for it.

Reviewed-by: Guillaume Nault <gnault@redhat.com>
Florian Westphal July 26, 2024, 1:32 p.m. UTC | #2
Ido Schimmel <idosch@nvidia.com> wrote:
> @@ -110,7 +108,7 @@ void nft_fib4_eval(const struct nft_expr *expr, struct nft_regs *regs,
>  	if (priv->flags & NFTA_FIB_F_MARK)
>  		fl4.flowi4_mark = pkt->skb->mark;
>  
> -	fl4.flowi4_tos = iph->tos & DSCP_BITS;
> +	fl4.flowi4_tos = iph->tos & IPTOS_RT_MASK;

If this is supposed to get centralised, wouldn't it
make more sense to not mask it, or will that happen later?

I thought plan was to ditch RT_MASK...
Guillaume Nault July 26, 2024, 3:40 p.m. UTC | #3
On Fri, Jul 26, 2024 at 03:32:48PM +0200, Florian Westphal wrote:
> Ido Schimmel <idosch@nvidia.com> wrote:
> > @@ -110,7 +108,7 @@ void nft_fib4_eval(const struct nft_expr *expr, struct nft_regs *regs,
> >  	if (priv->flags & NFTA_FIB_F_MARK)
> >  		fl4.flowi4_mark = pkt->skb->mark;
> >  
> > -	fl4.flowi4_tos = iph->tos & DSCP_BITS;
> > +	fl4.flowi4_tos = iph->tos & IPTOS_RT_MASK;
> 
> If this is supposed to get centralised, wouldn't it
> make more sense to not mask it, or will that happen later?

I think Ido prefers to have this behaviour introduced in a dedicated
patch, rather than as a side effect of the centralisation done in
patch 3/3.

Once patch 3/3 is applied, the next step would be to remove all those
redundant masks (including this new nft_fib4_eval() one), so that
fib4_rule_match(), fib_table_lookup() and fib_select_default() could
see the full DSCP.

This will allow the final step of allowing IPv4 routes and fib-rules to
be configured for matching either the DSCP bits or only the old TOS ones.

> I thought plan was to ditch RT_MASK...

That was my preference too. But Ido's affraid of potential users who
may depend on fib-rules with tos=0x1c matching packets with
dsfield=0xfc. Centralising the mask would allow us to configure this
behaviour upon route or fib-rule creation.

Here's the original thread that lead to this RFC, if anyone wants more
details on the current plan:
https://lore.kernel.org/netdev/ZnwCWejSuOTqriJc@shredder.mtl.com/
Florian Westphal July 28, 2024, 2:30 a.m. UTC | #4
Ido Schimmel <idosch@nvidia.com> wrote:
>  void nft_fib4_eval_type(const struct nft_expr *expr, struct nft_regs *regs,
>  			const struct nft_pktinfo *pkt)
>  {
> @@ -110,7 +108,7 @@ void nft_fib4_eval(const struct nft_expr *expr, struct nft_regs *regs,
>  	if (priv->flags & NFTA_FIB_F_MARK)
>  		fl4.flowi4_mark = pkt->skb->mark;
>  
> -	fl4.flowi4_tos = iph->tos & DSCP_BITS;
> +	fl4.flowi4_tos = iph->tos & IPTOS_RT_MASK;

I was confused because cover letter talks about allowing both tos or dscp depending on
new nlattr for ipv4, but then this patch makes that impossible because dscp bits get masked.

patch 3 says:
----
A prerequisite for allowing FIB rules to match on DSCP is to adjust all
the call sites to initialize the high order DSCP bits and remove their
masking along the path to the core where the field is matched on.
----

But nft_fib_ipv4.c already does that.

So I would suggest to just drop this patch and then get rid of '&
DSCP_BITS' once everything is in place.

But feel free to handle this as you prefer.
Ido Schimmel July 28, 2024, 10:51 a.m. UTC | #5
On Sun, Jul 28, 2024 at 04:30:40AM +0200, Florian Westphal wrote:
> Ido Schimmel <idosch@nvidia.com> wrote:
> >  void nft_fib4_eval_type(const struct nft_expr *expr, struct nft_regs *regs,
> >  			const struct nft_pktinfo *pkt)
> >  {
> > @@ -110,7 +108,7 @@ void nft_fib4_eval(const struct nft_expr *expr, struct nft_regs *regs,
> >  	if (priv->flags & NFTA_FIB_F_MARK)
> >  		fl4.flowi4_mark = pkt->skb->mark;
> >  
> > -	fl4.flowi4_tos = iph->tos & DSCP_BITS;
> > +	fl4.flowi4_tos = iph->tos & IPTOS_RT_MASK;
> 
> I was confused because cover letter talks about allowing both tos or dscp depending on
> new nlattr for ipv4, but then this patch makes that impossible because dscp bits get masked.
> 
> patch 3 says:
> ----
> A prerequisite for allowing FIB rules to match on DSCP is to adjust all
> the call sites to initialize the high order DSCP bits and remove their
> masking along the path to the core where the field is matched on.
> ----
> 
> But nft_fib_ipv4.c already does that.
> 
> So I would suggest to just drop this patch and then get rid of '&
> DSCP_BITS' once everything is in place.
> 
> But feel free to handle this as you prefer.

My preference is to first align all the users to mask the upper DSCP
bits (patches #1-#2), then move the masking to the core (patch #3) and
finally remove the masking from the various call sites (future
patchsets). Will make it clearer in the cover letter.

Thanks!
diff mbox series

Patch

diff --git a/net/ipv4/netfilter/nft_fib_ipv4.c b/net/ipv4/netfilter/nft_fib_ipv4.c
index 9eee535c64dd..df94bc28c3d7 100644
--- a/net/ipv4/netfilter/nft_fib_ipv4.c
+++ b/net/ipv4/netfilter/nft_fib_ipv4.c
@@ -22,8 +22,6 @@  static __be32 get_saddr(__be32 addr)
 	return addr;
 }
 
-#define DSCP_BITS     0xfc
-
 void nft_fib4_eval_type(const struct nft_expr *expr, struct nft_regs *regs,
 			const struct nft_pktinfo *pkt)
 {
@@ -110,7 +108,7 @@  void nft_fib4_eval(const struct nft_expr *expr, struct nft_regs *regs,
 	if (priv->flags & NFTA_FIB_F_MARK)
 		fl4.flowi4_mark = pkt->skb->mark;
 
-	fl4.flowi4_tos = iph->tos & DSCP_BITS;
+	fl4.flowi4_tos = iph->tos & IPTOS_RT_MASK;
 
 	if (priv->flags & NFTA_FIB_F_DADDR) {
 		fl4.daddr = iph->daddr;