diff mbox series

[v2] xfrm: interface: Don't hide plain packets from netfilter

Message ID 20201207134309.16762-1-phil@nwl.cc
State Superseded
Headers show
Series [v2] xfrm: interface: Don't hide plain packets from netfilter | expand

Commit Message

Phil Sutter Dec. 7, 2020, 1:43 p.m. UTC
With an IPsec tunnel without dedicated interface, netfilter sees locally
generated packets twice as they exit the physical interface: Once as "the
inner packet" with IPsec context attached and once as the encrypted
(ESP) packet.

With xfrm_interface, the inner packet did not traverse NF_INET_LOCAL_OUT
hook anymore, making it impossible to match on both inner header values
and associated IPsec data from that hook.

Fix this by looping packets transmitted from xfrm_interface through
NF_INET_LOCAL_OUT before passing them on to dst_output(), which makes
behaviour consistent again from netfilter's point of view.

Fixes: f203b76d78092 ("xfrm: Add virtual xfrm interfaces")
Signed-off-by: Phil Sutter <phil@nwl.cc>
---
Changes since v1:
- Extend recipients list, no code changes.
---
 net/xfrm/xfrm_interface.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Nicolas Dichtel Dec. 8, 2020, 9:02 a.m. UTC | #1
Le 07/12/2020 à 14:43, Phil Sutter a écrit :
> With an IPsec tunnel without dedicated interface, netfilter sees locally
> generated packets twice as they exit the physical interface: Once as "the
> inner packet" with IPsec context attached and once as the encrypted
> (ESP) packet.
> 
> With xfrm_interface, the inner packet did not traverse NF_INET_LOCAL_OUT
> hook anymore, making it impossible to match on both inner header values
> and associated IPsec data from that hook.
> 
> Fix this by looping packets transmitted from xfrm_interface through
> NF_INET_LOCAL_OUT before passing them on to dst_output(), which makes
> behaviour consistent again from netfilter's point of view.
> 
> Fixes: f203b76d78092 ("xfrm: Add virtual xfrm interfaces")
> Signed-off-by: Phil Sutter <phil@nwl.cc>
> ---
> Changes since v1:
> - Extend recipients list, no code changes.
> ---
>  net/xfrm/xfrm_interface.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/net/xfrm/xfrm_interface.c b/net/xfrm/xfrm_interface.c
> index aa4cdcf69d471..24af61c95b4d4 100644
> --- a/net/xfrm/xfrm_interface.c
> +++ b/net/xfrm/xfrm_interface.c
> @@ -317,7 +317,8 @@ xfrmi_xmit2(struct sk_buff *skb, struct net_device *dev, struct flowi *fl)
>  	skb_dst_set(skb, dst);
>  	skb->dev = tdev;
>  
> -	err = dst_output(xi->net, skb->sk, skb);
> +	err = NF_HOOK(skb_dst(skb)->ops->family, NF_INET_LOCAL_OUT, xi->net,
skb->protocol must be correctly set, maybe better to use it instead of
skb_dst(skb)->ops->family?

> +		      skb->sk, skb, NULL, skb_dst(skb)->dev, dst_output);
And here, tdev instead of skb_dst(skb)->dev ?

>  	if (net_xmit_eval(err) == 0) {
>  		struct pcpu_sw_netstats *tstats = this_cpu_ptr(dev->tstats);
>  
>
Phil Sutter Dec. 8, 2020, 2 p.m. UTC | #2
Hi Nicolas,

On Tue, Dec 08, 2020 at 10:02:16AM +0100, Nicolas Dichtel wrote:
> Le 07/12/2020 à 14:43, Phil Sutter a écrit :
[...]
> > diff --git a/net/xfrm/xfrm_interface.c b/net/xfrm/xfrm_interface.c
> > index aa4cdcf69d471..24af61c95b4d4 100644
> > --- a/net/xfrm/xfrm_interface.c
> > +++ b/net/xfrm/xfrm_interface.c
> > @@ -317,7 +317,8 @@ xfrmi_xmit2(struct sk_buff *skb, struct net_device *dev, struct flowi *fl)
> >  	skb_dst_set(skb, dst);
> >  	skb->dev = tdev;
> >  
> > -	err = dst_output(xi->net, skb->sk, skb);
> > +	err = NF_HOOK(skb_dst(skb)->ops->family, NF_INET_LOCAL_OUT, xi->net,
> skb->protocol must be correctly set, maybe better to use it instead of
> skb_dst(skb)->ops->family?

skb->protocol holds ETH_P_* values in network byte order, NF_HOOK()
expects an NFPROTO_* value, so this would at least not be a simple
replacement. Actually I copied the code from xfrm_output_resume() in
xfrm_output.c, where skb_dst(skb)->ops is dereferenced without checking
as well. Do you think this is risky?

> > +		      skb->sk, skb, NULL, skb_dst(skb)->dev, dst_output);
> And here, tdev instead of skb_dst(skb)->dev ?

Well yes, tdev was set to dst->dev earlier. Likewise I could use dst
directly instead of skb_dst(skb) to simplify the call a bit further.
OTOH I like how in the version above it is clear that skb's dst should
be used, irrespective of the code above (and any later changes that may
receive). No strong opinion though, so if your version is regarded the
preferred one, I'm fine with that.

Thanks, Phil
Nicolas Dichtel Dec. 8, 2020, 2:45 p.m. UTC | #3
Le 08/12/2020 à 15:00, Phil Sutter a écrit :
> Hi Nicolas,
> 
> On Tue, Dec 08, 2020 at 10:02:16AM +0100, Nicolas Dichtel wrote:
>> Le 07/12/2020 à 14:43, Phil Sutter a écrit :
> [...]
>>> diff --git a/net/xfrm/xfrm_interface.c b/net/xfrm/xfrm_interface.c
>>> index aa4cdcf69d471..24af61c95b4d4 100644
>>> --- a/net/xfrm/xfrm_interface.c
>>> +++ b/net/xfrm/xfrm_interface.c
>>> @@ -317,7 +317,8 @@ xfrmi_xmit2(struct sk_buff *skb, struct net_device *dev, struct flowi *fl)
>>>  	skb_dst_set(skb, dst);
>>>  	skb->dev = tdev;
>>>  
>>> -	err = dst_output(xi->net, skb->sk, skb);
>>> +	err = NF_HOOK(skb_dst(skb)->ops->family, NF_INET_LOCAL_OUT, xi->net,
>> skb->protocol must be correctly set, maybe better to use it instead of
>> skb_dst(skb)->ops->family?
> 
> skb->protocol holds ETH_P_* values in network byte order, NF_HOOK()
> expects an NFPROTO_* value, so this would at least not be a simple
Yes, right. I forgot that.

> 
>>> +		      skb->sk, skb, NULL, skb_dst(skb)->dev, dst_output);
>> And here, tdev instead of skb_dst(skb)->dev ?
> 
> Well yes, tdev was set to dst->dev earlier. Likewise I could use dst
> directly instead of skb_dst(skb) to simplify the call a bit further.
> OTOH I like how in the version above it is clear that skb's dst should
> be used, irrespective of the code above (and any later changes that may
> receive). No strong opinion though, so if your version is regarded the
> preferred one, I'm fine with that.
I would vote for tdev, because:
 - the reader don't have to wonder why tdev is used for dst->dev and not
   for NF_HOOK();
 - tdev has probably been declared so that dst->dev is dereferenced only once;
 - using the same variable everywhere in the function eases code reading.


Thank you,
Nicolas
Eyal Birger Dec. 8, 2020, 2:47 p.m. UTC | #4
Hi Phil,

On Mon, Dec 7, 2020 at 4:07 PM Phil Sutter <phil@nwl.cc> wrote:
>
> With an IPsec tunnel without dedicated interface, netfilter sees locally
> generated packets twice as they exit the physical interface: Once as "the
> inner packet" with IPsec context attached and once as the encrypted
> (ESP) packet.
>
> With xfrm_interface, the inner packet did not traverse NF_INET_LOCAL_OUT
> hook anymore, making it impossible to match on both inner header values
> and associated IPsec data from that hook.
>

Why wouldn't locally generated traffic not traverse the
NF_INET_LOCAL_OUT hook via e.g. __ip_local_out() when xmitted on an xfrmi?
I would expect it to appear in netfilter, but without the IPsec
context, as it's not
there yet.

> Fix this by looping packets transmitted from xfrm_interface through
> NF_INET_LOCAL_OUT before passing them on to dst_output(), which makes
> behaviour consistent again from netfilter's point of view.

When an XFRM interface is used when forwarding, why would it be correct
for NF_INET_LOCAL_OUT to observe the inner packet?

What am I missing?

Thanks!
Eyal.
Phil Sutter Dec. 8, 2020, 6:51 p.m. UTC | #5
Hi Eyal,

On Tue, Dec 08, 2020 at 04:47:02PM +0200, Eyal Birger wrote:
> On Mon, Dec 7, 2020 at 4:07 PM Phil Sutter <phil@nwl.cc> wrote:
> >
> > With an IPsec tunnel without dedicated interface, netfilter sees locally
> > generated packets twice as they exit the physical interface: Once as "the
> > inner packet" with IPsec context attached and once as the encrypted
> > (ESP) packet.
> >
> > With xfrm_interface, the inner packet did not traverse NF_INET_LOCAL_OUT
> > hook anymore, making it impossible to match on both inner header values
> > and associated IPsec data from that hook.
> >
> 
> Why wouldn't locally generated traffic not traverse the
> NF_INET_LOCAL_OUT hook via e.g. __ip_local_out() when xmitted on an xfrmi?
> I would expect it to appear in netfilter, but without the IPsec
> context, as it's not
> there yet.

Yes, that's right. Having an iptables rule with LOG target in OUTPUT
chain, a packet sent from the local host is logged multiple times:

| IN= OUT=xfrm SRC=192.168.111.1 DST=192.168.111.2 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=21840 DF
| PROTO=ICMP TYPE=8 CODE=0 ID=56857 SEQ=1
| IN= OUT=eth0 SRC=192.168.111.1 DST=192.168.111.2 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=21840 DF PROTO=ICMP TYPE=8 CODE=0 ID=56857 SEQ=1
| IN= OUT=eth0 SRC=192.168.1.1 DST=192.168.1.2 LEN=140 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=ESP SPI=0x1000

First when being sent to xfrm interface, then two times between xfrm and
eth0, the second time as ESP packet. This is with my patch applied.
Without it, the second log entry is missing. I'm arguing the above is
consistent to IPsec without xfrm interface:

| IN= OUT=eth1 SRC=192.168.112.1 DST=192.168.112.2 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=49341 DF PROTO=ICMP TYPE=8 CODE=0 ID=44114 SEQ=1
| IN= OUT=eth1 SRC=192.168.2.1 DST=192.168.2.2 LEN=140 TOS=0x00 PREC=0x00 TTL=64 ID=37109 DF PROTO=ESP SPI=0x1000

The packet appears twice being sent to eth1, the second time as ESP
packet. I understand xfrm interface as a collector of to-be-xfrmed
packets, dropping those which do not match a policy.

> > Fix this by looping packets transmitted from xfrm_interface through
> > NF_INET_LOCAL_OUT before passing them on to dst_output(), which makes
> > behaviour consistent again from netfilter's point of view.
> 
> When an XFRM interface is used when forwarding, why would it be correct
> for NF_INET_LOCAL_OUT to observe the inner packet?

A valid question, indeed. One could interpret packets being forwarded by
those tunneling devices emit the packets one feeds them from the local
host. I just checked and ip_vti behaves identical to xfrm_interface
prior to my patch, so maybe my patch is crap and the inability to match
on ipsec context data when using any of those devices is just by design.

Thanks, Phil
Eyal Birger Dec. 9, 2020, 2:40 p.m. UTC | #6
Hi Phil,

On Tue, Dec 8, 2020 at 8:51 PM Phil Sutter <phil@nwl.cc> wrote:
>
> Hi Eyal,
>
> On Tue, Dec 08, 2020 at 04:47:02PM +0200, Eyal Birger wrote:
> > On Mon, Dec 7, 2020 at 4:07 PM Phil Sutter <phil@nwl.cc> wrote:
> > >
> > > With an IPsec tunnel without dedicated interface, netfilter sees locally
> > > generated packets twice as they exit the physical interface: Once as "the
> > > inner packet" with IPsec context attached and once as the encrypted
> > > (ESP) packet.
> > >
> > > With xfrm_interface, the inner packet did not traverse NF_INET_LOCAL_OUT
> > > hook anymore, making it impossible to match on both inner header values
> > > and associated IPsec data from that hook.
> > >
> >
> > Why wouldn't locally generated traffic not traverse the
> > NF_INET_LOCAL_OUT hook via e.g. __ip_local_out() when xmitted on an xfrmi?
> > I would expect it to appear in netfilter, but without the IPsec
> > context, as it's not
> > there yet.
>
> Yes, that's right. Having an iptables rule with LOG target in OUTPUT
> chain, a packet sent from the local host is logged multiple times:
>
> | IN= OUT=xfrm SRC=192.168.111.1 DST=192.168.111.2 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=21840 DF
> | PROTO=ICMP TYPE=8 CODE=0 ID=56857 SEQ=1
> | IN= OUT=eth0 SRC=192.168.111.1 DST=192.168.111.2 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=21840 DF PROTO=ICMP TYPE=8 CODE=0 ID=56857 SEQ=1
> | IN= OUT=eth0 SRC=192.168.1.1 DST=192.168.1.2 LEN=140 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=ESP SPI=0x1000
>
> First when being sent to xfrm interface, then two times between xfrm and
> eth0, the second time as ESP packet. This is with my patch applied.
> Without it, the second log entry is missing. I'm arguing the above is
> consistent to IPsec without xfrm interface:
>
> | IN= OUT=eth1 SRC=192.168.112.1 DST=192.168.112.2 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=49341 DF PROTO=ICMP TYPE=8 CODE=0 ID=44114 SEQ=1
> | IN= OUT=eth1 SRC=192.168.2.1 DST=192.168.2.2 LEN=140 TOS=0x00 PREC=0x00 TTL=64 ID=37109 DF PROTO=ESP SPI=0x1000
>
> The packet appears twice being sent to eth1, the second time as ESP
> packet. I understand xfrm interface as a collector of to-be-xfrmed
> packets, dropping those which do not match a policy.
>
> > > Fix this by looping packets transmitted from xfrm_interface through
> > > NF_INET_LOCAL_OUT before passing them on to dst_output(), which makes
> > > behaviour consistent again from netfilter's point of view.
> >
> > When an XFRM interface is used when forwarding, why would it be correct
> > for NF_INET_LOCAL_OUT to observe the inner packet?
>
> A valid question, indeed. One could interpret packets being forwarded by
> those tunneling devices emit the packets one feeds them from the local
> host. I just checked and ip_vti behaves identical to xfrm_interface
> prior to my patch, so maybe my patch is crap and the inability to match
> on ipsec context data when using any of those devices is just by design.
>

I would find such interpretation and behavior to be surprising for an IPsec
forwarder...
I guess some functionality of policy matching is lost with these
devices; although they do offer the ability to match ipsec traffic based on
the destination interface it is possible to have multiple ipsec flows share
the same device so netfilter doesn't provide the ability to distinguish
between different flows on the outbound direction in such cases.

Thanks,
Eyal.
Nicolas Dichtel Dec. 10, 2020, 11:10 a.m. UTC | #7
Le 09/12/2020 à 15:40, Eyal Birger a écrit :
> Hi Phil,
> 
> On Tue, Dec 8, 2020 at 8:51 PM Phil Sutter <phil@nwl.cc> wrote:
>>
>> Hi Eyal,
>>
>> On Tue, Dec 08, 2020 at 04:47:02PM +0200, Eyal Birger wrote:
>>> On Mon, Dec 7, 2020 at 4:07 PM Phil Sutter <phil@nwl.cc> wrote:
[snip]
>>
>> The packet appears twice being sent to eth1, the second time as ESP
>> packet. I understand xfrm interface as a collector of to-be-xfrmed
>> packets, dropping those which do not match a policy.
>>
>>>> Fix this by looping packets transmitted from xfrm_interface through
>>>> NF_INET_LOCAL_OUT before passing them on to dst_output(), which makes
>>>> behaviour consistent again from netfilter's point of view.
>>>
>>> When an XFRM interface is used when forwarding, why would it be correct
>>> for NF_INET_LOCAL_OUT to observe the inner packet?
I think it is valid because:
 - it would be consistent with ip tunnels (see iptunnel_xmit())
 - it would be consistent with the standard xfrm path see [1]
 - from the POV of the forwarder, the packet is locally emitted, the src @ is
   owned by the forwarder.

[1] https://upload.wikimedia.org/wikipedia/commons/3/37/Netfilter-packet-flow.svg

>>
>> A valid question, indeed. One could interpret packets being forwarded by
>> those tunneling devices emit the packets one feeds them from the local
>> host. I just checked and ip_vti behaves identical to xfrm_interface
>> prior to my patch, so maybe my patch is crap and the inability to match
>> on ipsec context data when using any of those devices is just by design.
There was no real design for vti[6] interfaces, it's why xfrmi interfaces have
been added. But they should be consistent I think, so this patch should handle
xfrmi and vti[6] together.


Regards,
Nicolas

>>
> 
> I would find such interpretation and behavior to be surprising for an IPsec
> forwarder...
> I guess some functionality of policy matching is lost with these
> devices; although they do offer the ability to match ipsec traffic based on
> the destination interface it is possible to have multiple ipsec flows share
> the same device so netfilter doesn't provide the ability to distinguish
> between different flows on the outbound direction in such cases.
> 
> Thanks,
> Eyal.
>
Eyal Birger Dec. 10, 2020, 11:48 a.m. UTC | #8
Hi Nicolas,

On Thu, Dec 10, 2020 at 1:10 PM Nicolas Dichtel
<nicolas.dichtel@6wind.com> wrote:
>
> Le 09/12/2020 à 15:40, Eyal Birger a écrit :
> > Hi Phil,
> >
> > On Tue, Dec 8, 2020 at 8:51 PM Phil Sutter <phil@nwl.cc> wrote:
> >>
> >> Hi Eyal,
> >>
> >> On Tue, Dec 08, 2020 at 04:47:02PM +0200, Eyal Birger wrote:
> >>> On Mon, Dec 7, 2020 at 4:07 PM Phil Sutter <phil@nwl.cc> wrote:
> [snip]
> >>
> >> The packet appears twice being sent to eth1, the second time as ESP
> >> packet. I understand xfrm interface as a collector of to-be-xfrmed
> >> packets, dropping those which do not match a policy.
> >>
> >>>> Fix this by looping packets transmitted from xfrm_interface through
> >>>> NF_INET_LOCAL_OUT before passing them on to dst_output(), which makes
> >>>> behaviour consistent again from netfilter's point of view.
> >>>
> >>> When an XFRM interface is used when forwarding, why would it be correct
> >>> for NF_INET_LOCAL_OUT to observe the inner packet?
> I think it is valid because:
>  - it would be consistent with ip tunnels (see iptunnel_xmit())

Are you referring to the flow:
  iptunnel_xmit()
    ip_local_out()
      __ip_local_out()
        nf_hook(.., NF_INET_LOCAL_OUT, ...)

If I understand that flow correctly it operates on the outer packet
as it is called after all the header had been pushed already. no?
Or are you referring to a different flow?

>  - it would be consistent with the standard xfrm path see [1]

In the regular path as well I understand the OUTPUT hooks are called
after xfrm encoding in the forwarding case, so they can't see the inner
packet.

>  - from the POV of the forwarder, the packet is locally emitted, the src @ is
>    owned by the forwarder.

The inner IP source address is not owned by the forwarder to my understanding.

> >>
> >> A valid question, indeed. One could interpret packets being forwarded by
> >> those tunneling devices emit the packets one feeds them from the local
> >> host. I just checked and ip_vti behaves identical to xfrm_interface
> >> prior to my patch, so maybe my patch is crap and the inability to match
> >> on ipsec context data when using any of those devices is just by design.
> There was no real design for vti[6] interfaces, it's why xfrmi interfaces have
> been added. But they should be consistent I think, so this patch should handle
> xfrmi and vti[6] together.

I also think they should be consistent. But it'd still be confusing to me
to get an OUTPUT hook on the inner packet in the forwarding case.

Thanks,
Eyal.
Nicolas Dichtel Dec. 10, 2020, 1:18 p.m. UTC | #9
Le 10/12/2020 à 12:48, Eyal Birger a écrit :
> Hi Nicolas,
Hi Eyal,

> 
> On Thu, Dec 10, 2020 at 1:10 PM Nicolas Dichtel
> <nicolas.dichtel@6wind.com> wrote:
[snip]
> I also think they should be consistent. But it'd still be confusing to me
> to get an OUTPUT hook on the inner packet in the forwarding case.
I re-read the whole thread and I agree with you. There is no reason to pass the
inner packet through the OUTPUT hook (my comment about the consistency with ip
tunnels is still valid ;-)).
Sorry for the confusion.

Phil, with nftables, you can match the 'kind' of the interface, that should be
enough to match packets, isn't it?


Regards,
Nicolas
Phil Sutter Dec. 10, 2020, 5:57 p.m. UTC | #10
Hi Nicolas,

On Thu, Dec 10, 2020 at 02:18:45PM +0100, Nicolas Dichtel wrote:
> Le 10/12/2020 à 12:48, Eyal Birger a écrit :
> > On Thu, Dec 10, 2020 at 1:10 PM Nicolas Dichtel
> > <nicolas.dichtel@6wind.com> wrote:
> [snip]
> > I also think they should be consistent. But it'd still be confusing to me
> > to get an OUTPUT hook on the inner packet in the forwarding case.
> I re-read the whole thread and I agree with you. There is no reason to pass the
> inner packet through the OUTPUT hook (my comment about the consistency with ip
> tunnels is still valid ;-)).
> Sorry for the confusion.
> 
> Phil, with nftables, you can match the 'kind' of the interface, that should be
> enough to match packets, isn't it?

Yes, sure. Also, the inner packet passes POSTROUTING hook with ipsec
context present, it's just not visible in OUTPUT. Of course the broader
question is what do people use ipsec context matches for. If it's really
just to ensure traffic is encrypted, xfrm_interface alone is sufficient.

Originally this was reported as "ipsec match stops working if
xfrm_interface is used" and I suspected it's a bug in the driver.
Knowing the behaviour is expected (and at least consistent with vti),
the case is closed from my side. :)

Thanks, Phil
diff mbox series

Patch

diff --git a/net/xfrm/xfrm_interface.c b/net/xfrm/xfrm_interface.c
index aa4cdcf69d471..24af61c95b4d4 100644
--- a/net/xfrm/xfrm_interface.c
+++ b/net/xfrm/xfrm_interface.c
@@ -317,7 +317,8 @@  xfrmi_xmit2(struct sk_buff *skb, struct net_device *dev, struct flowi *fl)
 	skb_dst_set(skb, dst);
 	skb->dev = tdev;
 
-	err = dst_output(xi->net, skb->sk, skb);
+	err = NF_HOOK(skb_dst(skb)->ops->family, NF_INET_LOCAL_OUT, xi->net,
+		      skb->sk, skb, NULL, skb_dst(skb)->dev, dst_output);
 	if (net_xmit_eval(err) == 0) {
 		struct pcpu_sw_netstats *tstats = this_cpu_ptr(dev->tstats);