Message ID | 20200805024131.2091206-1-liuhangbin@gmail.com |
---|---|
State | Accepted |
Delegated to: | David Miller |
Headers | show |
Series | [net] Revert "vxlan: fix tos value before xmit" | expand |
On Wed, Aug 05, 2020 at 10:41:31AM +0800, Hangbin Liu wrote: > This reverts commit 71130f29979c7c7956b040673e6b9d5643003176. > > In commit 71130f29979c ("vxlan: fix tos value before xmit") we want to > make sure the tos value are filtered by RT_TOS() based on RFC1349. > > 0 1 2 3 4 5 6 7 > +-----+-----+-----+-----+-----+-----+-----+-----+ > | PRECEDENCE | TOS | MBZ | > +-----+-----+-----+-----+-----+-----+-----+-----+ > > But RFC1349 has been obsoleted by RFC2474. The new DSCP field defined like > > 0 1 2 3 4 5 6 7 > +-----+-----+-----+-----+-----+-----+-----+-----+ > | DS FIELD, DSCP | ECN FIELD | > +-----+-----+-----+-----+-----+-----+-----+-----+ > > So with > > IPTOS_TOS_MASK 0x1E > RT_TOS(tos) ((tos)&IPTOS_TOS_MASK) > > the first 3 bits DSCP info will get lost. > > To take all the DSCP info in xmit, we should revert the patch and just push > all tos bits to ip_tunnel_ecn_encap(), which will handling ECN field later. > > Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> I guess an explicit Fixes: 71130f29979c ("vxlan: fix tos value before xmit"). tag would help the -stable maintainers. Apart from that, Acked-by: Guillaume Nault <gnault@redhat.com>
On Wed, Aug 05, 2020 at 10:44:27AM +0200, Guillaume Nault wrote: > On Wed, Aug 05, 2020 at 10:41:31AM +0800, Hangbin Liu wrote: > > This reverts commit 71130f29979c7c7956b040673e6b9d5643003176. > > > > In commit 71130f29979c ("vxlan: fix tos value before xmit") we want to > > make sure the tos value are filtered by RT_TOS() based on RFC1349. > > > > 0 1 2 3 4 5 6 7 > > +-----+-----+-----+-----+-----+-----+-----+-----+ > > | PRECEDENCE | TOS | MBZ | > > +-----+-----+-----+-----+-----+-----+-----+-----+ > > > > But RFC1349 has been obsoleted by RFC2474. The new DSCP field defined like > > > > 0 1 2 3 4 5 6 7 > > +-----+-----+-----+-----+-----+-----+-----+-----+ > > | DS FIELD, DSCP | ECN FIELD | > > +-----+-----+-----+-----+-----+-----+-----+-----+ > > > > So with > > > > IPTOS_TOS_MASK 0x1E > > RT_TOS(tos) ((tos)&IPTOS_TOS_MASK) > > > > the first 3 bits DSCP info will get lost. > > > > To take all the DSCP info in xmit, we should revert the patch and just push > > all tos bits to ip_tunnel_ecn_encap(), which will handling ECN field later. > > > > Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> > > I guess an explicit > Fixes: 71130f29979c ("vxlan: fix tos value before xmit"). > tag would help the -stable maintainers. > > Apart from that, > Acked-by: Guillaume Nault <gnault@redhat.com> > Thanks Guillaume. I saw some revert patches have the Fixes flag and some are not, so I didn't add it. Hi David, Should I re-post the patch with Fixes flag? Thanks Hangbin
From: Hangbin Liu <liuhangbin@gmail.com> Date: Wed, 5 Aug 2020 18:18:07 +0800 > Should I re-post the patch with Fixes flag? No, I took care the Fixes tag and queued this up for -stable. But you do need to explain what kind of testing you even did on this change we are reverting. Did you make this change purely on theoretical grounds and a code audit? Because it is clear now that this commit broke things and did not fix anything at all. Please explain.
On Wed, Aug 05, 2020 at 12:11:10PM -0700, David Miller wrote: > From: Hangbin Liu <liuhangbin@gmail.com> > Date: Wed, 5 Aug 2020 18:18:07 +0800 > > > Should I re-post the patch with Fixes flag? > > No, I took care the Fixes tag and queued this up for -stable. Thanks > > But you do need to explain what kind of testing you even did on this > change we are reverting. Did you make this change purely on > theoretical grounds and a code audit? > > Because it is clear now that this commit broke things and did not fix > anything at all. > > Please explain. Yes, I do have a bug report about this and did testing before post the patch. But the test script is long and the reason for the issue is very clear(3 bits of DSCP are omitted). So I only explained the theory in the commit message. The rough steps are setting vxlan tunnel on OVS. set inner packet tos to 1011 1010 (0xba) and outer vxlan to 1111 1100(0xfc). The outer packet's tos should be 0xfe at latest as it inherit the inner ECN bit. But with RT_TOS(tos) We actually got tos 0x1e as the first 3 bits are omitted. Now here is detailed testing steps: 1. On Host A (which has commit 71130f29979c "vxlan: fix tos value before xmit"): # cat ovs.sh #!/bin/bash remoteip=192.168.1.207 ip link set eth1 up ip addr add 192.168.1.156/24 dev eth1 systemctl restart openvswitch ovs-vsctl --may-exist add-br br-int -- set Bridge br-int datapath_type=system -- br-set-external-id br-int bridge-id br-int ovs-vsctl add-port br-int vxlan0 -- set interface vxlan0 type=vxlan options:remote_ip=$remoteip ip netns add private ip link add name veth-host type veth peer name veth-guest ovs-vsctl add-port br-int veth-host ip link set dev veth-guest netns private ip link set dev veth-host up ip -n private link set dev veth-guest up ip -n private link set dev lo up ip -n private a a dev veth-guest 192.168.123.1/24 ovs-vsctl set interface vxlan0 options:tos=0xfc 2. On Host B (which has reverted commit 71130f29979c) # cat ovs.sh #!/bin/bash remoteip=192.168.1.156 ip link set eth1 up ip addr add 192.168.1.207/24 dev eth1 systemctl restart openvswitch ovs-vsctl --may-exist add-br br-int -- set Bridge br-int datapath_type=system -- br-set-external-id br-int bridge-id br-int ovs-vsctl add-port br-int vxlan0 -- set interface vxlan0 type=vxlan options:remote_ip=$remoteip ip netns add private ip link add name veth-host type veth peer name veth-guest ovs-vsctl add-port br-int veth-host ip link set dev veth-guest netns private ip link set dev veth-host up ip -n private link set dev veth-guest up ip -n private link set dev lo up ip -n private a a dev veth-guest 192.168.123.2/24 ovs-vsctl set interface vxlan0 options:tos=0xfc 3. On Host A, ping host B # ip netns exec private ping 192.168.123.2 -c1 -W1 -Q 0xba 4. Capture the packets from Host B # tcpdump -i eth1 -nn -l -vvv 22:34:37.663803 IP (tos 0x1e,ECT(0), ttl 64, id 63743, offset 0, flags [DF], proto UDP (17), length 134) 192.168.1.156.55502 > 192.168.1.207.4789: [no cksum] VXLAN, flags [I] (0x08), vni 0 ^^ you can see the tos value is 0x1e from Host A IP (tos 0xba,ECT(0), ttl 64, id 37413, offset 0, flags [DF], proto ICMP (1), length 84) 192.168.123.1 > 192.168.123.2: ICMP echo request, id 22930, seq 1, length 64 22:34:37.664624 IP (tos 0xfe,ECT(0), ttl 64, id 8233, offset 0, flags [DF], proto UDP (17), length 134) 192.168.1.207.47657 > 192.168.1.156.4789: [no cksum] VXLAN, flags [I] (0x08), vni 0 ^^ From Host B it's 0xfe IP (tos 0xba,ECT(0), ttl 64, id 42030, offset 0, flags [none], proto ICMP (1), length 84) 192.168.123.2 > 192.168.123.1: ICMP echo reply, id 22930, seq 1, length 64 ^C Thanks Hangbin
From: Hangbin Liu <liuhangbin@gmail.com> Date: Thu, 6 Aug 2020 10:52:41 +0800 > The rough steps are setting vxlan tunnel on OVS. set inner packet tos to > 1011 1010 (0xba) and outer vxlan to 1111 1100(0xfc). The outer packet's tos > should be 0xfe at latest as it inherit the inner ECN bit. But with RT_TOS(tos) > We actually got tos 0x1e as the first 3 bits are omitted. > > Now here is detailed testing steps: This explains why we need to revert the RT_TOS() change. I'm asking what testing you did on the original change that added RT_TOS(), which we reverted, and which didn't fix anything. I want to know how we got into this situation in the first place, adding a change that only added negative effects. Thank you.
On Tue, Aug 11, 2020 at 05:02:23PM -0700, David Miller wrote: > From: Hangbin Liu <liuhangbin@gmail.com> > Date: Thu, 6 Aug 2020 10:52:41 +0800 > > > The rough steps are setting vxlan tunnel on OVS. set inner packet tos to > > 1011 1010 (0xba) and outer vxlan to 1111 1100(0xfc). The outer packet's tos > > should be 0xfe at latest as it inherit the inner ECN bit. But with RT_TOS(tos) > > We actually got tos 0x1e as the first 3 bits are omitted. > > > > Now here is detailed testing steps: > > This explains why we need to revert the RT_TOS() change. > > I'm asking what testing you did on the original change that added > RT_TOS(), which we reverted, and which didn't fix anything. Oh, I know what you mean now. > > I want to know how we got into this situation in the first place, > adding a change that only added negative effects. The reason is still based on the definition of RT_TOS. I have a report about the difference tos action between geneve and vxlan. For geneve: geneve_get_v4_rt() - fl4->flowi4_tos = RT_TOS(tos); geneve_xmit_skb() - tos = ip_tunnel_ecn_encap(fl4.flowi4_tos, ip_hdr(skb), skb); For vxlan: vxlan_xmit_one() - tos = ip_tunnel_ecn_encap(tos, old_iph, skb); So geneve will use RT_TOS(tos) when xmit, while vxlan will take all tos bits. At that time I only read the code and thought we should obey the RT_TOS rule, So I submit the previous patch. Later Petr Machata remind me that we need to take care of DSCP fields. So I asked you if we should change RT_TOS() to DSCP_TOS()[1]. You replied """ The RT_TOS() value elides the two lowest bits so that we can store other pieces of binary state into those two lower bits. So you can't just blindly change the RT_TOS() definition without breaking a bunch of things. """ I'm sorry I didn't take more time to think about the your reply and just give up my thoughts. Since we bring up this topic again. Would you please help explain about what "The RT_TOS() value elides the two lowest bits" means? I'm not sure if you are talking about ECN or not. [1] https://www.spinics.net/lists/netdev/msg631249.html Thanks Hangbin
diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c index a7c3939264b0..35a7d409d8d3 100644 --- a/drivers/net/vxlan.c +++ b/drivers/net/vxlan.c @@ -2722,7 +2722,7 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev, ndst = &rt->dst; skb_tunnel_check_pmtu(skb, ndst, VXLAN_HEADROOM); - tos = ip_tunnel_ecn_encap(RT_TOS(tos), old_iph, skb); + tos = ip_tunnel_ecn_encap(tos, old_iph, skb); ttl = ttl ? : ip4_dst_hoplimit(&rt->dst); err = vxlan_build_skb(skb, ndst, sizeof(struct iphdr), vni, md, flags, udp_sum); @@ -2762,7 +2762,7 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev, skb_tunnel_check_pmtu(skb, ndst, VXLAN6_HEADROOM); - tos = ip_tunnel_ecn_encap(RT_TOS(tos), old_iph, skb); + tos = ip_tunnel_ecn_encap(tos, old_iph, skb); ttl = ttl ? : ip6_dst_hoplimit(ndst); skb_scrub_packet(skb, xnet); err = vxlan_build_skb(skb, ndst, sizeof(struct ipv6hdr),
This reverts commit 71130f29979c7c7956b040673e6b9d5643003176. In commit 71130f29979c ("vxlan: fix tos value before xmit") we want to make sure the tos value are filtered by RT_TOS() based on RFC1349. 0 1 2 3 4 5 6 7 +-----+-----+-----+-----+-----+-----+-----+-----+ | PRECEDENCE | TOS | MBZ | +-----+-----+-----+-----+-----+-----+-----+-----+ But RFC1349 has been obsoleted by RFC2474. The new DSCP field defined like 0 1 2 3 4 5 6 7 +-----+-----+-----+-----+-----+-----+-----+-----+ | DS FIELD, DSCP | ECN FIELD | +-----+-----+-----+-----+-----+-----+-----+-----+ So with IPTOS_TOS_MASK 0x1E RT_TOS(tos) ((tos)&IPTOS_TOS_MASK) the first 3 bits DSCP info will get lost. To take all the DSCP info in xmit, we should revert the patch and just push all tos bits to ip_tunnel_ecn_encap(), which will handling ECN field later. Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> --- drivers/net/vxlan.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)