diff mbox series

[ovs-dev] tnl: fix multiple tunnel pop for native tnl

Message ID 1651582494-27721-1-git-send-email-lic121@chinatelecom.cn
State Not Applicable
Headers show
Series [ovs-dev] tnl: fix multiple tunnel pop for native tnl | expand

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/intel-ovs-compilation success test: success
ovsrobot/github-robot-_Build_and_Test success github build: passed

Commit Message

Cheng Li May 3, 2022, 12:54 p.m. UTC
In compose_output_action__(), terminate_native_tunnel() is called
to test if a pkt matches a tnl port. If yes, a TUNNEL_POP action
is appended. Instead of outputing the pkt to the origin port, pkt
is redirected into the tnl port.

pkt metadata(i.e. nw_proto, dst_ip, dst_port) is considered when
testing a pkt regardless of the out port until Ilya's patch[1].
Patch[1] adds the condition that terminal ports must have IP
address. This patch is to make addtional port ip check that port
ip address must match the pkt dst ip. This check covers the mirror
port case:
To tcpdump underlay pkt, we may create mirror port of dpdk port.
When compose_output_action__() is called on the mirror port,
pkt is redirected into tnl port again. As a result, ping shows
DUP. fast path flow has two tnl_pop actions.

```
[root@ovs1 vagrant]# ping 172.16.100.2
PING 172.16.100.2 (172.16.100.2) 56(84) bytes of data.
64 bytes from 172.16.100.2: icmp_seq=1 ttl=64 time=0.606 ms
64 bytes from 172.16.100.2: icmp_seq=1 ttl=64 time=0.617 ms (DUP!)
```

```
[root@ovs ovs]# ovs-appctl dpctl/dump-flows |grep tnl_pop
recirc_id(0),in_port(3),packet_type(ns=0,id=0),eth(src=52:54:00:a6:c6:de,dst=ea:b4:09:72:44:4c),eth_type(0x0800),ipv4(dst=192.168.124.33,proto=17,frag=no),udp(dst=8472), packets:1, bytes:148, used:0.637s, actions:tnl_pop(2),tnl_pop(2)
```

[1] (d584bb2b6a1 "ofproto-dpif-xlate: Terminate native tunnels only on ports with IP addresses.")

Signed-off-by: lic121 <lic121@chinatelecom.cn>
---
 ofproto/ofproto-dpif-xlate.c | 45 +++++++++++++++++++++++++++-------------
 tests/tunnel-push-pop.at     | 49 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 80 insertions(+), 14 deletions(-)

Comments

Peng He May 3, 2022, 1:31 p.m. UTC | #1
Hi,
this issue has been found and resolved by two independent patches:

http://patchwork.ozlabs.org/project/openvswitch/patch/20190726121839.92497-1-wang.li@139.com/
http://patchwork.ozlabs.org/project/openvswitch/patch/20201009121503.43893-1-hepeng.0320@bytedance.com/

but I recently found my patch is still not fixing all the cases.

would like to have a discussion about this.



lic121 <lic121@chinatelecom.cn> 于2022年5月3日周二 20:55写道:

> In compose_output_action__(), terminate_native_tunnel() is called
> to test if a pkt matches a tnl port. If yes, a TUNNEL_POP action
> is appended. Instead of outputing the pkt to the origin port, pkt
> is redirected into the tnl port.
>
> pkt metadata(i.e. nw_proto, dst_ip, dst_port) is considered when
> testing a pkt regardless of the out port until Ilya's patch[1].
> Patch[1] adds the condition that terminal ports must have IP
> address. This patch is to make addtional port ip check that port
> ip address must match the pkt dst ip. This check covers the mirror
> port case:
> To tcpdump underlay pkt, we may create mirror port of dpdk port.
> When compose_output_action__() is called on the mirror port,
> pkt is redirected into tnl port again. As a result, ping shows
> DUP. fast path flow has two tnl_pop actions.
>
> ```
> [root@ovs1 vagrant]# ping 172.16.100.2
> PING 172.16.100.2 (172.16.100.2) 56(84) bytes of data.
> 64 bytes from 172.16.100.2: icmp_seq=1 ttl=64 time=0.606 ms
> 64 bytes from 172.16.100.2: icmp_seq=1 ttl=64 time=0.617 ms (DUP!)
> ```
>
> ```
> [root@ovs ovs]# ovs-appctl dpctl/dump-flows |grep tnl_pop
> recirc_id(0),in_port(3),packet_type(ns=0,id=0),eth(src=52:54:00:a6:c6:de,dst=ea:b4:09:72:44:4c),eth_type(0x0800),ipv4(dst=192.168.124.33,proto=17,frag=no),udp(dst=8472),
> packets:1, bytes:148, used:0.637s, actions:tnl_pop(2),tnl_pop(2)
> ```
>
> [1] (d584bb2b6a1 "ofproto-dpif-xlate: Terminate native tunnels only on
> ports with IP addresses.")
>
> Signed-off-by: lic121 <lic121@chinatelecom.cn>
> ---
>  ofproto/ofproto-dpif-xlate.c | 45 +++++++++++++++++++++++++++-------------
>  tests/tunnel-push-pop.at     | 49
> ++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 80 insertions(+), 14 deletions(-)
>
> diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
> index 17f7e28..30e636e 100644
> --- a/ofproto/ofproto-dpif-xlate.c
> +++ b/ofproto/ofproto-dpif-xlate.c
> @@ -4107,18 +4107,26 @@ is_neighbor_reply_correct(const struct xlate_ctx
> *ctx, const struct flow *flow)
>  }
>
>  static bool
> -xport_has_ip(const struct xport *xport)
> +xport_has_ip(const struct xport *xport, const struct in6_addr ip6)
>  {
>      struct in6_addr *ip_addr, *mask;
>      int n_in6 = 0;
> +    int i;
> +    bool hasip = false;
>
>      if (netdev_get_addr_list(xport->netdev, &ip_addr, &mask, &n_in6)) {
>          n_in6 = 0;
>      } else {
> +        for (i = 0; i < n_in6; i++) {
> +            if (IN6_ARE_ADDR_EQUAL(ip_addr + i, &ip6)) {
> +                hasip = true;
> +                break;
> +            }
> +        }
>          free(ip_addr);
>          free(mask);
>      }
> -    return n_in6 ? true : false;
> +    return hasip;
>  }
>
>  static bool
> @@ -4127,26 +4135,35 @@ terminate_native_tunnel(struct xlate_ctx *ctx,
> const struct xport *xport,
>                          odp_port_t *tnl_port)
>  {
>      *tnl_port = ODPP_NONE;
> +    bool ip_pkt = true;
> +    struct in6_addr ip_dst;
> +
> +    if (flow->dl_type == htons(ETH_TYPE_IP)) {
> +        ip_dst = in6_addr_mapped_ipv4(flow->nw_dst);
> +    } else if (flow->dl_type == htons(ETH_TYPE_IPV6)) {
> +        ip_dst = flow->ipv6_dst;
> +    } else {
> +        ip_pkt = false;
> +    }
>
>      /* XXX: Write better Filter for tunnel port. We can use in_port
>       * in tunnel-port flow to avoid these checks completely.
>       *
> -     * Port without an IP address cannot be a tunnel termination point.
> -     * Not performing a lookup in this case to avoid unwildcarding extra
> -     * flow fields (dl_dst). */
> -    if (ovs_native_tunneling_is_on(ctx->xbridge->ofproto)
> -        && xport_has_ip(xport)) {
> +     * Not performing a lookup if dst ip not match, to avoid unwildcarding
> +     * extra flow fields (dl_dst). Also avoid duplicate tnl_pop. */
> +    if ((flow->dl_type == htons(ETH_TYPE_ARP) ||
> +                flow->nw_proto == IPPROTO_ICMPV6) &&
> +                is_neighbor_reply_correct(ctx, flow)) {
> +            tnl_neigh_snoop(flow, wc, ctx->xbridge->name,
> +                            ctx->xin->allow_side_effects);
> +
> +    } else if (ovs_native_tunneling_is_on(ctx->xbridge->ofproto)
> +        && ip_pkt && xport_has_ip(xport, ip_dst)) {
>          *tnl_port = tnl_port_map_lookup(flow, wc);
>
>          /* If no tunnel port was found and it's about an ARP or ICMPv6
> packet,
>           * do tunnel neighbor snooping. */
> -        if (*tnl_port == ODPP_NONE &&
> -            (flow->dl_type == htons(ETH_TYPE_ARP) ||
> -             flow->nw_proto == IPPROTO_ICMPV6) &&
> -             is_neighbor_reply_correct(ctx, flow)) {
> -            tnl_neigh_snoop(flow, wc, ctx->xbridge->name,
> -                            ctx->xin->allow_side_effects);
> -        } else if (*tnl_port != ODPP_NONE &&
> +        if (*tnl_port != ODPP_NONE &&
>                     ctx->xin->allow_side_effects &&
>                     dl_type_is_ip_any(flow->dl_type)) {
>              struct eth_addr mac = flow->dl_src;
> diff --git a/tests/tunnel-push-pop.at b/tests/tunnel-push-pop.at
> index c633441..b3a3bf9 100644
> --- a/tests/tunnel-push-pop.at
> +++ b/tests/tunnel-push-pop.at
> @@ -808,6 +808,55 @@ NXST_FLOW reply:
>  OVS_VSWITCHD_STOP
>  AT_CLEANUP
>
> +AT_SETUP([tunnel_push_pop - multiple output])
> +
> +OVS_VSWITCHD_START([add-port br0 p0 -- set Interface p0 type=dummy
> ofport_request=1 other-config:hwaddr=aa:55:aa:55:00:00])
> +AT_CHECK([ovs-vsctl add-port br0 m0 -- set Interface m0 type=dummy
> ofport_request=2])
> +AT_CHECK([ovs-vsctl add-br int-br -- set bridge int-br
> datapath_type=dummy], [0])
> +AT_CHECK([ovs-vsctl add-port int-br t1 -- set Interface t1 type=gre \
> +                       options:remote_ip=1.1.2.92 options:key=456
> options:seq=true ofport_request=4], [0])
> +
> +AT_CHECK([ovs-appctl dpif/show], [0], [dnl
> +dummy@ovs-dummy: hit:0 missed:0
> +  br0:
> +    br0 65534/100: (dummy-internal)
> +    m0 2/2: (dummy)
> +    p0 1/1: (dummy)
> +  int-br:
> +    int-br 65534/3: (dummy-internal)
> +    t1 4/4: (gre: key=456, remote_ip=1.1.2.92, seq=true)
> +])
> +
> +AT_CHECK([ovs-appctl netdev-dummy/ip4addr br0 1.1.2.88/24], [0], [OK
> +])
> +AT_CHECK([ovs-appctl netdev-dummy/ip4addr m0 1.1.3.88/24], [0], [OK
> +])
> +AT_CHECK([ovs-appctl ovs/route/add 1.1.2.92/24 br0], [0], [OK
> +])
> +AT_CHECK([ovs-appctl ovs/route/add 1.1.3.92/24 m0], [0], [OK
> +])
> +AT_CHECK([ovs-ofctl add-flow br0 'arp,priority=1,action=normal'])
> +
> +dnl Use arp reply to achieve tunnel next hop mac binding
> +AT_CHECK([ovs-appctl netdev-dummy/receive p0
> 'recirc_id(0),in_port(1),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x0806),arp(sip=1.1.2.92,tip=1.1.2.88,op=2,sha=f8:bc:12:44:34:b6,tha=00:00:00:00:00:00)'])
> +
> +dnl Ouput gre pkt to both br0 port and m0 port
> +dnl m0 port to simulate mirror port, or flood case
> +AT_CHECK([ovs-ofctl add-flow br0
> 'in_port(1),ip,ip_proto=47,priority=99,action=br0,m0'])
> +
> +AT_CHECK([ovs-appctl netdev-dummy/receive p0
> 'recirc_id(0),in_port(1),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x0800),ipv4(src=1.1.2.92,dst=1.1.2.88,proto=47,tos=0,ttl=64,frag=no)'])
> +
> +dnl port br0: has the configured ip address of tnl port t1, so
> 'tnl_pop(4)'
> +dnl port m0: does not have a configured tnl ip address, so 'output:2'
> +AT_CHECK([ovs-appctl dpctl/dump-flows], [0], [dnl
> +flow-dump from the main thread:
> +recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x0806),arp(sip=1.1.2.92,tip=1.1.2.88,op=2,sha=f8:bc:12:44:34:b6),
> packets:0, bytes:0, used:never, actions:100,2
> +recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth(dst=aa:55:aa:55:00:00),eth_type(0x0800),ipv4(dst=1.1.2.88,proto=47,frag=no),
> packets:0, bytes:0, used:never, actions:tnl_pop(4),2
> +])
> +
> +OVS_VSWITCHD_STOP
> +AT_CLEANUP
> +
>  AT_SETUP([tunnel_push_pop - flow translation on unrelated bridges])
>
>  OVS_VSWITCHD_START(
> --
> 1.8.3.1
>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
Ilya Maximets May 3, 2022, 8:18 p.m. UTC | #2
On 5/3/22 15:31, Peng He wrote:
> Hi,
> this issue has been found and resolved by two independent patches:
> 
> http://patchwork.ozlabs.org/project/openvswitch/patch/20190726121839.92497-1-wang.li@139.com/ <http://patchwork.ozlabs.org/project/openvswitch/patch/20190726121839.92497-1-wang.li@139.com/>
> http://patchwork.ozlabs.org/project/openvswitch/patch/20201009121503.43893-1-hepeng.0320@bytedance.com/ <http://patchwork.ozlabs.org/project/openvswitch/patch/20201009121503.43893-1-hepeng.0320@bytedance.com/>
> 
> but I recently found my patch is still not fixing all the cases.
> 
> would like to have a discussion about this.

Hrm.  It seems to me that the problem is more genric then mirrors.
The reason is that tnl_port_map_lookup() logic doesn't work for
a bridge with more than one tunnel termination port.
Before commit dc0bd12f5b04 ("userspace: Enable non-bridge port
as tunnel endpoint."), only the bridge port was allowed to terminate
the tunnel.  MAC+IP lookup performed by the classifier inside the
tnl_port_map_lookup() was enough to determine if termination is
needed or if the packet has to be forwarded without tunnel_pop.
Now we have multiple ports that may or may not terminate the tunnel.

Assuming we have 2 different tunnel ports and two different normal
ports with different IP addresses and we expect the first tunnel
to be terminated on the first port and the second tunnel to be
terminated on the second port.  At the same time, the packet from
the first tunnel should be forwarded to the second port without
tunnel_pop and the packet from the second tunnel should be
forwarded to the first port without tunnel_pop.

Since the classifier in the tunnel-port.c will have rules for
both tunnels and it only matches on MAC+IP (+ some other fileds,
but we don't care much right now), we will have decapsulation
triggered in all cases.  The same as what happened with the
mirrored traffic.

Two patcehs mentioned above only focused on mirrors, so they
will not fix the problem in a common case.  The patch below
is closer to a solution, but it will skip the matches on dl_dst,
and, possibly, other tunnel header fileds, which we absolutely
need in a non-mirror case with 2 tunnels.

The correct solution would be to add an output port match to
the classifier, but clasifier is not designed to do that and
'struct match' doesn't have an appropriate field.  It shoudl
be a separate classifier per ipdev, or something like that.

Thoughts?

In any case, every part of the packet that was used to make a
decision must be unwildcarded.  IP on the port is not part of
the packet, but dst_ip is.

Best regards, Ilya Maximets.

> 
> 
> 
> lic121 <lic121@chinatelecom.cn <mailto:lic121@chinatelecom.cn>> 于2022年5月3日周二 20:55写道:
> 
>     In compose_output_action__(), terminate_native_tunnel() is called
>     to test if a pkt matches a tnl port. If yes, a TUNNEL_POP action
>     is appended. Instead of outputing the pkt to the origin port, pkt
>     is redirected into the tnl port.
> 
>     pkt metadata(i.e. nw_proto, dst_ip, dst_port) is considered when
>     testing a pkt regardless of the out port until Ilya's patch[1].
>     Patch[1] adds the condition that terminal ports must have IP
>     address. This patch is to make addtional port ip check that port
>     ip address must match the pkt dst ip. This check covers the mirror
>     port case:
>     To tcpdump underlay pkt, we may create mirror port of dpdk port.
>     When compose_output_action__() is called on the mirror port,
>     pkt is redirected into tnl port again. As a result, ping shows
>     DUP. fast path flow has two tnl_pop actions.
> 
>     ```
>     [root@ovs1 vagrant]# ping 172.16.100.2
>     PING 172.16.100.2 (172.16.100.2) 56(84) bytes of data.
>     64 bytes from 172.16.100.2 <http://172.16.100.2>: icmp_seq=1 ttl=64 time=0.606 ms
>     64 bytes from 172.16.100.2 <http://172.16.100.2>: icmp_seq=1 ttl=64 time=0.617 ms (DUP!)
>     ```
> 
>     ```
>     [root@ovs ovs]# ovs-appctl dpctl/dump-flows |grep tnl_pop
>     recirc_id(0),in_port(3),packet_type(ns=0,id=0),eth(src=52:54:00:a6:c6:de,dst=ea:b4:09:72:44:4c),eth_type(0x0800),ipv4(dst=192.168.124.33,proto=17,frag=no),udp(dst=8472), packets:1, bytes:148, used:0.637s, actions:tnl_pop(2),tnl_pop(2)
>     ```
> 
>     [1] (d584bb2b6a1 "ofproto-dpif-xlate: Terminate native tunnels only on ports with IP addresses.")
> 
>     Signed-off-by: lic121 <lic121@chinatelecom.cn <mailto:lic121@chinatelecom.cn>>
>     ---
>      ofproto/ofproto-dpif-xlate.c | 45 +++++++++++++++++++++++++++-------------
>      tests/tunnel-push-pop.at <http://tunnel-push-pop.at>     | 49 ++++++++++++++++++++++++++++++++++++++++++++
>      2 files changed, 80 insertions(+), 14 deletions(-)
> 
>     diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
>     index 17f7e28..30e636e 100644
>     --- a/ofproto/ofproto-dpif-xlate.c
>     +++ b/ofproto/ofproto-dpif-xlate.c
>     @@ -4107,18 +4107,26 @@ is_neighbor_reply_correct(const struct xlate_ctx *ctx, const struct flow *flow)
>      }
> 
>      static bool
>     -xport_has_ip(const struct xport *xport)
>     +xport_has_ip(const struct xport *xport, const struct in6_addr ip6)
>      {
>          struct in6_addr *ip_addr, *mask;
>          int n_in6 = 0;
>     +    int i;
>     +    bool hasip = false;
> 
>          if (netdev_get_addr_list(xport->netdev, &ip_addr, &mask, &n_in6)) {
>              n_in6 = 0;
>          } else {
>     +        for (i = 0; i < n_in6; i++) {
>     +            if (IN6_ARE_ADDR_EQUAL(ip_addr + i, &ip6)) {
>     +                hasip = true;
>     +                break;
>     +            }
>     +        }
>              free(ip_addr);
>              free(mask);
>          }
>     -    return n_in6 ? true : false;
>     +    return hasip;
>      }
> 
>      static bool
>     @@ -4127,26 +4135,35 @@ terminate_native_tunnel(struct xlate_ctx *ctx, const struct xport *xport,
>                              odp_port_t *tnl_port)
>      {
>          *tnl_port = ODPP_NONE;
>     +    bool ip_pkt = true;
>     +    struct in6_addr ip_dst;
>     +
>     +    if (flow->dl_type == htons(ETH_TYPE_IP)) {
>     +        ip_dst = in6_addr_mapped_ipv4(flow->nw_dst);
>     +    } else if (flow->dl_type == htons(ETH_TYPE_IPV6)) {
>     +        ip_dst = flow->ipv6_dst;
>     +    } else {
>     +        ip_pkt = false;
>     +    }
> 
>          /* XXX: Write better Filter for tunnel port. We can use in_port
>           * in tunnel-port flow to avoid these checks completely.
>           *
>     -     * Port without an IP address cannot be a tunnel termination point.
>     -     * Not performing a lookup in this case to avoid unwildcarding extra
>     -     * flow fields (dl_dst). */
>     -    if (ovs_native_tunneling_is_on(ctx->xbridge->ofproto)
>     -        && xport_has_ip(xport)) {
>     +     * Not performing a lookup if dst ip not match, to avoid unwildcarding
>     +     * extra flow fields (dl_dst). Also avoid duplicate tnl_pop. */
>     +    if ((flow->dl_type == htons(ETH_TYPE_ARP) ||
>     +                flow->nw_proto == IPPROTO_ICMPV6) &&
>     +                is_neighbor_reply_correct(ctx, flow)) {
>     +            tnl_neigh_snoop(flow, wc, ctx->xbridge->name,
>     +                            ctx->xin->allow_side_effects);
>     +
>     +    } else if (ovs_native_tunneling_is_on(ctx->xbridge->ofproto)
>     +        && ip_pkt && xport_has_ip(xport, ip_dst)) {
>              *tnl_port = tnl_port_map_lookup(flow, wc);
> 
>              /* If no tunnel port was found and it's about an ARP or ICMPv6 packet,
>               * do tunnel neighbor snooping. */
>     -        if (*tnl_port == ODPP_NONE &&
>     -            (flow->dl_type == htons(ETH_TYPE_ARP) ||
>     -             flow->nw_proto == IPPROTO_ICMPV6) &&
>     -             is_neighbor_reply_correct(ctx, flow)) {
>     -            tnl_neigh_snoop(flow, wc, ctx->xbridge->name,
>     -                            ctx->xin->allow_side_effects);
>     -        } else if (*tnl_port != ODPP_NONE &&
>     +        if (*tnl_port != ODPP_NONE &&
>                         ctx->xin->allow_side_effects &&
>                         dl_type_is_ip_any(flow->dl_type)) {
>                  struct eth_addr mac = flow->dl_src;
>     diff --git a/tests/tunnel-push-pop.at <http://tunnel-push-pop.at> b/tests/tunnel-push-pop.at <http://tunnel-push-pop.at>
>     index c633441..b3a3bf9 100644
>     --- a/tests/tunnel-push-pop.at <http://tunnel-push-pop.at>
>     +++ b/tests/tunnel-push-pop.at <http://tunnel-push-pop.at>
>     @@ -808,6 +808,55 @@ NXST_FLOW reply:
>      OVS_VSWITCHD_STOP
>      AT_CLEANUP
> 
>     +AT_SETUP([tunnel_push_pop - multiple output])
>     +
>     +OVS_VSWITCHD_START([add-port br0 p0 -- set Interface p0 type=dummy ofport_request=1 other-config:hwaddr=aa:55:aa:55:00:00])
>     +AT_CHECK([ovs-vsctl add-port br0 m0 -- set Interface m0 type=dummy ofport_request=2])
>     +AT_CHECK([ovs-vsctl add-br int-br -- set bridge int-br datapath_type=dummy], [0])
>     +AT_CHECK([ovs-vsctl add-port int-br t1 -- set Interface t1 type=gre \
>     +                       options:remote_ip=1.1.2.92 options:key=456 options:seq=true ofport_request=4], [0])
>     +
>     +AT_CHECK([ovs-appctl dpif/show], [0], [dnl
>     +dummy@ovs-dummy: hit:0 missed:0
>     +  br0:
>     +    br0 65534/100: (dummy-internal)
>     +    m0 2/2: (dummy)
>     +    p0 1/1: (dummy)
>     +  int-br:
>     +    int-br 65534/3: (dummy-internal)
>     +    t1 4/4: (gre: key=456, remote_ip=1.1.2.92, seq=true)
>     +])
>     +
>     +AT_CHECK([ovs-appctl netdev-dummy/ip4addr br0 1.1.2.88/24 <http://1.1.2.88/24>], [0], [OK
>     +])
>     +AT_CHECK([ovs-appctl netdev-dummy/ip4addr m0 1.1.3.88/24 <http://1.1.3.88/24>], [0], [OK
>     +])
>     +AT_CHECK([ovs-appctl ovs/route/add 1.1.2.92/24 <http://1.1.2.92/24> br0], [0], [OK
>     +])
>     +AT_CHECK([ovs-appctl ovs/route/add 1.1.3.92/24 <http://1.1.3.92/24> m0], [0], [OK
>     +])
>     +AT_CHECK([ovs-ofctl add-flow br0 'arp,priority=1,action=normal'])
>     +
>     +dnl Use arp reply to achieve tunnel next hop mac binding
>     +AT_CHECK([ovs-appctl netdev-dummy/receive p0 'recirc_id(0),in_port(1),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x0806),arp(sip=1.1.2.92,tip=1.1.2.88,op=2,sha=f8:bc:12:44:34:b6,tha=00:00:00:00:00:00)'])
>     +
>     +dnl Ouput gre pkt to both br0 port and m0 port
>     +dnl m0 port to simulate mirror port, or flood case
>     +AT_CHECK([ovs-ofctl add-flow br0 'in_port(1),ip,ip_proto=47,priority=99,action=br0,m0'])
>     +
>     +AT_CHECK([ovs-appctl netdev-dummy/receive p0 'recirc_id(0),in_port(1),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x0800),ipv4(src=1.1.2.92,dst=1.1.2.88,proto=47,tos=0,ttl=64,frag=no)'])
>     +
>     +dnl port br0: has the configured ip address of tnl port t1, so 'tnl_pop(4)'
>     +dnl port m0: does not have a configured tnl ip address, so 'output:2'
>     +AT_CHECK([ovs-appctl dpctl/dump-flows], [0], [dnl
>     +flow-dump from the main thread:
>     +recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x0806),arp(sip=1.1.2.92,tip=1.1.2.88,op=2,sha=f8:bc:12:44:34:b6), packets:0, bytes:0, used:never, actions:100,2
>     +recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth(dst=aa:55:aa:55:00:00),eth_type(0x0800),ipv4(dst=1.1.2.88,proto=47,frag=no), packets:0, bytes:0, used:never, actions:tnl_pop(4),2
>     +])
>     +
>     +OVS_VSWITCHD_STOP
>     +AT_CLEANUP
>     +
>      AT_SETUP([tunnel_push_pop - flow translation on unrelated bridges])
> 
>      OVS_VSWITCHD_START(
>     -- 
>     1.8.3.1
> 
>     _______________________________________________
>     dev mailing list
>     dev@openvswitch.org <mailto:dev@openvswitch.org>
>     https://mail.openvswitch.org/mailman/listinfo/ovs-dev <https://mail.openvswitch.org/mailman/listinfo/ovs-dev>
> 
> 
> 
> -- 
> hepeng
Cheng Li May 4, 2022, 3:23 a.m. UTC | #3
On Tue, May 03, 2022 at 10:18:18PM +0200, Ilya Maximets wrote:
> On 5/3/22 15:31, Peng He wrote:
> > Hi,
> > this issue has been found and resolved by two independent patches:
> > 
> > http://patchwork.ozlabs.org/project/openvswitch/patch/20190726121839.92497-1-wang.li@139.com/ <http://patchwork.ozlabs.org/project/openvswitch/patch/20190726121839.92497-1-wang.li@139.com/>
> > http://patchwork.ozlabs.org/project/openvswitch/patch/20201009121503.43893-1-hepeng.0320@bytedance.com/ <http://patchwork.ozlabs.org/project/openvswitch/patch/20201009121503.43893-1-hepeng.0320@bytedance.com/>
> > 
> > but I recently found my patch is still not fixing all the cases.
> > 
> > would like to have a discussion about this.
> 
> Hrm.  It seems to me that the problem is more genric then mirrors.
> The reason is that tnl_port_map_lookup() logic doesn't work for
> a bridge with more than one tunnel termination port.
> Before commit dc0bd12f5b04 ("userspace: Enable non-bridge port
> as tunnel endpoint."), only the bridge port was allowed to terminate
> the tunnel.  MAC+IP lookup performed by the classifier inside the
> tnl_port_map_lookup() was enough to determine if termination is
> needed or if the packet has to be forwarded without tunnel_pop.
> Now we have multiple ports that may or may not terminate the tunnel.
> 
> Assuming we have 2 different tunnel ports and two different normal
> ports with different IP addresses and we expect the first tunnel
> to be terminated on the first port and the second tunnel to be
> terminated on the second port.  At the same time, the packet from
> the first tunnel should be forwarded to the second port without
> tunnel_pop and the packet from the second tunnel should be
> forwarded to the first port without tunnel_pop.
> 
> Since the classifier in the tunnel-port.c will have rules for
> both tunnels and it only matches on MAC+IP (+ some other fileds,
> but we don't care much right now), we will have decapsulation
> triggered in all cases.  The same as what happened with the
> mirrored traffic.
> 
> Two patcehs mentioned above only focused on mirrors, so they
> will not fix the problem in a common case.  The patch below
> is closer to a solution, but it will skip the matches on dl_dst,
> and, possibly, other tunnel header fileds, which we absolutely
> need in a non-mirror case with 2 tunnels.

Hi Ilya, you mean we need to check dl_dst as well before tnl classifier
lookup? For the case that two ports have the same ip address?

> 
> The correct solution would be to add an output port match to
> the classifier, but clasifier is not designed to do that and
> 'struct match' doesn't have an appropriate field.  It shoudl
> be a separate classifier per ipdev, or something like that.
> 
> Thoughts?
> 
> In any case, every part of the packet that was used to make a
> decision must be unwildcarded.  IP on the port is not part of
> the packet, but dst_ip is.
> 
> Best regards, Ilya Maximets.
> 
> > 
> > 
> > 
> > lic121 <lic121@chinatelecom.cn <mailto:lic121@chinatelecom.cn>> 于2022年5月3日周二 20:55写道:
> > 
> >     In compose_output_action__(), terminate_native_tunnel() is called
> >     to test if a pkt matches a tnl port. If yes, a TUNNEL_POP action
> >     is appended. Instead of outputing the pkt to the origin port, pkt
> >     is redirected into the tnl port.
> > 
> >     pkt metadata(i.e. nw_proto, dst_ip, dst_port) is considered when
> >     testing a pkt regardless of the out port until Ilya's patch[1].
> >     Patch[1] adds the condition that terminal ports must have IP
> >     address. This patch is to make addtional port ip check that port
> >     ip address must match the pkt dst ip. This check covers the mirror
> >     port case:
> >     To tcpdump underlay pkt, we may create mirror port of dpdk port.
> >     When compose_output_action__() is called on the mirror port,
> >     pkt is redirected into tnl port again. As a result, ping shows
> >     DUP. fast path flow has two tnl_pop actions.
> > 
> >     ```
> >     [root@ovs1 vagrant]# ping 172.16.100.2
> >     PING 172.16.100.2 (172.16.100.2) 56(84) bytes of data.
> >     64 bytes from 172.16.100.2 <http://172.16.100.2>: icmp_seq=1 ttl=64 time=0.606 ms
> >     64 bytes from 172.16.100.2 <http://172.16.100.2>: icmp_seq=1 ttl=64 time=0.617 ms (DUP!)
> >     ```
> > 
> >     ```
> >     [root@ovs ovs]# ovs-appctl dpctl/dump-flows |grep tnl_pop
> >     recirc_id(0),in_port(3),packet_type(ns=0,id=0),eth(src=52:54:00:a6:c6:de,dst=ea:b4:09:72:44:4c),eth_type(0x0800),ipv4(dst=192.168.124.33,proto=17,frag=no),udp(dst=8472), packets:1, bytes:148, used:0.637s, actions:tnl_pop(2),tnl_pop(2)
> >     ```
> > 
> >     [1] (d584bb2b6a1 "ofproto-dpif-xlate: Terminate native tunnels only on ports with IP addresses.")
> > 
> >     Signed-off-by: lic121 <lic121@chinatelecom.cn <mailto:lic121@chinatelecom.cn>>
> >     ---
> >      ofproto/ofproto-dpif-xlate.c | 45 +++++++++++++++++++++++++++-------------
> >      tests/tunnel-push-pop.at <http://tunnel-push-pop.at>     | 49 ++++++++++++++++++++++++++++++++++++++++++++
> >      2 files changed, 80 insertions(+), 14 deletions(-)
> > 
> >     diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
> >     index 17f7e28..30e636e 100644
> >     --- a/ofproto/ofproto-dpif-xlate.c
> >     +++ b/ofproto/ofproto-dpif-xlate.c
> >     @@ -4107,18 +4107,26 @@ is_neighbor_reply_correct(const struct xlate_ctx *ctx, const struct flow *flow)
> >      }
> > 
> >      static bool
> >     -xport_has_ip(const struct xport *xport)
> >     +xport_has_ip(const struct xport *xport, const struct in6_addr ip6)
> >      {
> >          struct in6_addr *ip_addr, *mask;
> >          int n_in6 = 0;
> >     +    int i;
> >     +    bool hasip = false;
> > 
> >          if (netdev_get_addr_list(xport->netdev, &ip_addr, &mask, &n_in6)) {
> >              n_in6 = 0;
> >          } else {
> >     +        for (i = 0; i < n_in6; i++) {
> >     +            if (IN6_ARE_ADDR_EQUAL(ip_addr + i, &ip6)) {
> >     +                hasip = true;
> >     +                break;
> >     +            }
> >     +        }
> >              free(ip_addr);
> >              free(mask);
> >          }
> >     -    return n_in6 ? true : false;
> >     +    return hasip;
> >      }
> > 
> >      static bool
> >     @@ -4127,26 +4135,35 @@ terminate_native_tunnel(struct xlate_ctx *ctx, const struct xport *xport,
> >                              odp_port_t *tnl_port)
> >      {
> >          *tnl_port = ODPP_NONE;
> >     +    bool ip_pkt = true;
> >     +    struct in6_addr ip_dst;
> >     +
> >     +    if (flow->dl_type == htons(ETH_TYPE_IP)) {
> >     +        ip_dst = in6_addr_mapped_ipv4(flow->nw_dst);
> >     +    } else if (flow->dl_type == htons(ETH_TYPE_IPV6)) {
> >     +        ip_dst = flow->ipv6_dst;
> >     +    } else {
> >     +        ip_pkt = false;
> >     +    }
> > 
> >          /* XXX: Write better Filter for tunnel port. We can use in_port
> >           * in tunnel-port flow to avoid these checks completely.
> >           *
> >     -     * Port without an IP address cannot be a tunnel termination point.
> >     -     * Not performing a lookup in this case to avoid unwildcarding extra
> >     -     * flow fields (dl_dst). */
> >     -    if (ovs_native_tunneling_is_on(ctx->xbridge->ofproto)
> >     -        && xport_has_ip(xport)) {
> >     +     * Not performing a lookup if dst ip not match, to avoid unwildcarding
> >     +     * extra flow fields (dl_dst). Also avoid duplicate tnl_pop. */
> >     +    if ((flow->dl_type == htons(ETH_TYPE_ARP) ||
> >     +                flow->nw_proto == IPPROTO_ICMPV6) &&
> >     +                is_neighbor_reply_correct(ctx, flow)) {
> >     +            tnl_neigh_snoop(flow, wc, ctx->xbridge->name,
> >     +                            ctx->xin->allow_side_effects);
> >     +
> >     +    } else if (ovs_native_tunneling_is_on(ctx->xbridge->ofproto)
> >     +        && ip_pkt && xport_has_ip(xport, ip_dst)) {
> >              *tnl_port = tnl_port_map_lookup(flow, wc);
> > 
> >              /* If no tunnel port was found and it's about an ARP or ICMPv6 packet,
> >               * do tunnel neighbor snooping. */
> >     -        if (*tnl_port == ODPP_NONE &&
> >     -            (flow->dl_type == htons(ETH_TYPE_ARP) ||
> >     -             flow->nw_proto == IPPROTO_ICMPV6) &&
> >     -             is_neighbor_reply_correct(ctx, flow)) {
> >     -            tnl_neigh_snoop(flow, wc, ctx->xbridge->name,
> >     -                            ctx->xin->allow_side_effects);
> >     -        } else if (*tnl_port != ODPP_NONE &&
> >     +        if (*tnl_port != ODPP_NONE &&
> >                         ctx->xin->allow_side_effects &&
> >                         dl_type_is_ip_any(flow->dl_type)) {
> >                  struct eth_addr mac = flow->dl_src;
> >     diff --git a/tests/tunnel-push-pop.at <http://tunnel-push-pop.at> b/tests/tunnel-push-pop.at <http://tunnel-push-pop.at>
> >     index c633441..b3a3bf9 100644
> >     --- a/tests/tunnel-push-pop.at <http://tunnel-push-pop.at>
> >     +++ b/tests/tunnel-push-pop.at <http://tunnel-push-pop.at>
> >     @@ -808,6 +808,55 @@ NXST_FLOW reply:
> >      OVS_VSWITCHD_STOP
> >      AT_CLEANUP
> > 
> >     +AT_SETUP([tunnel_push_pop - multiple output])
> >     +
> >     +OVS_VSWITCHD_START([add-port br0 p0 -- set Interface p0 type=dummy ofport_request=1 other-config:hwaddr=aa:55:aa:55:00:00])
> >     +AT_CHECK([ovs-vsctl add-port br0 m0 -- set Interface m0 type=dummy ofport_request=2])
> >     +AT_CHECK([ovs-vsctl add-br int-br -- set bridge int-br datapath_type=dummy], [0])
> >     +AT_CHECK([ovs-vsctl add-port int-br t1 -- set Interface t1 type=gre \
> >     +                       options:remote_ip=1.1.2.92 options:key=456 options:seq=true ofport_request=4], [0])
> >     +
> >     +AT_CHECK([ovs-appctl dpif/show], [0], [dnl
> >     +dummy@ovs-dummy: hit:0 missed:0
> >     +  br0:
> >     +    br0 65534/100: (dummy-internal)
> >     +    m0 2/2: (dummy)
> >     +    p0 1/1: (dummy)
> >     +  int-br:
> >     +    int-br 65534/3: (dummy-internal)
> >     +    t1 4/4: (gre: key=456, remote_ip=1.1.2.92, seq=true)
> >     +])
> >     +
> >     +AT_CHECK([ovs-appctl netdev-dummy/ip4addr br0 1.1.2.88/24 <http://1.1.2.88/24>], [0], [OK
> >     +])
> >     +AT_CHECK([ovs-appctl netdev-dummy/ip4addr m0 1.1.3.88/24 <http://1.1.3.88/24>], [0], [OK
> >     +])
> >     +AT_CHECK([ovs-appctl ovs/route/add 1.1.2.92/24 <http://1.1.2.92/24> br0], [0], [OK
> >     +])
> >     +AT_CHECK([ovs-appctl ovs/route/add 1.1.3.92/24 <http://1.1.3.92/24> m0], [0], [OK
> >     +])
> >     +AT_CHECK([ovs-ofctl add-flow br0 'arp,priority=1,action=normal'])
> >     +
> >     +dnl Use arp reply to achieve tunnel next hop mac binding
> >     +AT_CHECK([ovs-appctl netdev-dummy/receive p0 'recirc_id(0),in_port(1),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x0806),arp(sip=1.1.2.92,tip=1.1.2.88,op=2,sha=f8:bc:12:44:34:b6,tha=00:00:00:00:00:00)'])
> >     +
> >     +dnl Ouput gre pkt to both br0 port and m0 port
> >     +dnl m0 port to simulate mirror port, or flood case
> >     +AT_CHECK([ovs-ofctl add-flow br0 'in_port(1),ip,ip_proto=47,priority=99,action=br0,m0'])
> >     +
> >     +AT_CHECK([ovs-appctl netdev-dummy/receive p0 'recirc_id(0),in_port(1),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x0800),ipv4(src=1.1.2.92,dst=1.1.2.88,proto=47,tos=0,ttl=64,frag=no)'])
> >     +
> >     +dnl port br0: has the configured ip address of tnl port t1, so 'tnl_pop(4)'
> >     +dnl port m0: does not have a configured tnl ip address, so 'output:2'
> >     +AT_CHECK([ovs-appctl dpctl/dump-flows], [0], [dnl
> >     +flow-dump from the main thread:
> >     +recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x0806),arp(sip=1.1.2.92,tip=1.1.2.88,op=2,sha=f8:bc:12:44:34:b6), packets:0, bytes:0, used:never, actions:100,2
> >     +recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth(dst=aa:55:aa:55:00:00),eth_type(0x0800),ipv4(dst=1.1.2.88,proto=47,frag=no), packets:0, bytes:0, used:never, actions:tnl_pop(4),2
> >     +])
> >     +
> >     +OVS_VSWITCHD_STOP
> >     +AT_CLEANUP
> >     +
> >      AT_SETUP([tunnel_push_pop - flow translation on unrelated bridges])
> > 
> >      OVS_VSWITCHD_START(
> >     -- 
> >     1.8.3.1
> > 
> >     _______________________________________________
> >     dev mailing list
> >     dev@openvswitch.org <mailto:dev@openvswitch.org>
> >     https://mail.openvswitch.org/mailman/listinfo/ovs-dev <https://mail.openvswitch.org/mailman/listinfo/ovs-dev>
> > 
> > 
> > 
> > -- 
> > hepeng
>
Ilya Maximets May 4, 2022, 10:53 a.m. UTC | #4
On 5/4/22 05:23, lic121 wrote:
> On Tue, May 03, 2022 at 10:18:18PM +0200, Ilya Maximets wrote:
>> On 5/3/22 15:31, Peng He wrote:
>>> Hi,
>>> this issue has been found and resolved by two independent patches:
>>>
>>> http://patchwork.ozlabs.org/project/openvswitch/patch/20190726121839.92497-1-wang.li@139.com/ <http://patchwork.ozlabs.org/project/openvswitch/patch/20190726121839.92497-1-wang.li@139.com/>
>>> http://patchwork.ozlabs.org/project/openvswitch/patch/20201009121503.43893-1-hepeng.0320@bytedance.com/ <http://patchwork.ozlabs.org/project/openvswitch/patch/20201009121503.43893-1-hepeng.0320@bytedance.com/>
>>>
>>> but I recently found my patch is still not fixing all the cases.
>>>
>>> would like to have a discussion about this.
>>
>> Hrm.  It seems to me that the problem is more genric then mirrors.
>> The reason is that tnl_port_map_lookup() logic doesn't work for
>> a bridge with more than one tunnel termination port.
>> Before commit dc0bd12f5b04 ("userspace: Enable non-bridge port
>> as tunnel endpoint."), only the bridge port was allowed to terminate
>> the tunnel.  MAC+IP lookup performed by the classifier inside the
>> tnl_port_map_lookup() was enough to determine if termination is
>> needed or if the packet has to be forwarded without tunnel_pop.
>> Now we have multiple ports that may or may not terminate the tunnel.
>>
>> Assuming we have 2 different tunnel ports and two different normal
>> ports with different IP addresses and we expect the first tunnel
>> to be terminated on the first port and the second tunnel to be
>> terminated on the second port.  At the same time, the packet from
>> the first tunnel should be forwarded to the second port without
>> tunnel_pop and the packet from the second tunnel should be
>> forwarded to the first port without tunnel_pop.
>>
>> Since the classifier in the tunnel-port.c will have rules for
>> both tunnels and it only matches on MAC+IP (+ some other fileds,
>> but we don't care much right now), we will have decapsulation
>> triggered in all cases.  The same as what happened with the
>> mirrored traffic.
>>
>> Two patcehs mentioned above only focused on mirrors, so they
>> will not fix the problem in a common case.  The patch below
>> is closer to a solution, but it will skip the matches on dl_dst,
>> and, possibly, other tunnel header fileds, which we absolutely
>> need in a non-mirror case with 2 tunnels.
> 
> Hi Ilya, you mean we need to check dl_dst as well before tnl classifier
> lookup? For the case that two ports have the same ip address?

If two ports on the same bridge has the same IP, that sounds
like a misconfiguration (I'd argue that having IP assigned
to a bridge member is a misconfiguration in general, but we
have what we have...), so I'm not sure if we need to check
for IP additionally.  But we must unwildcard (add to a flow
match) fields that we're using to make a decision.
tnl_port_map_lookup() does that in a generic way.  It has
information about all the MACs, IPs, VLANs and other fileds
that may or may not be significant for the lookup.
Generic classifier lookup inside that function unwildcards
all the fileds that was actually used during lookup.

So, I'm not convinced if just checking the IP is sufficient,
but we definitely have to unwildcard it in a flow match if
we're checking it.

> 
>>
>> The correct solution would be to add an output port match to
>> the classifier, but clasifier is not designed to do that and
>> 'struct match' doesn't have an appropriate field.  It shoudl
>> be a separate classifier per ipdev, or something like that.
>>
>> Thoughts?
>>
>> In any case, every part of the packet that was used to make a
>> decision must be unwildcarded.  IP on the port is not part of
>> the packet, but dst_ip is.
>>
>> Best regards, Ilya Maximets.
>>
>>>
>>>
>>>
>>> lic121 <lic121@chinatelecom.cn <mailto:lic121@chinatelecom.cn>> 于2022年5月3日周二 20:55写道:
>>>
>>>     In compose_output_action__(), terminate_native_tunnel() is called
>>>     to test if a pkt matches a tnl port. If yes, a TUNNEL_POP action
>>>     is appended. Instead of outputing the pkt to the origin port, pkt
>>>     is redirected into the tnl port.
>>>
>>>     pkt metadata(i.e. nw_proto, dst_ip, dst_port) is considered when
>>>     testing a pkt regardless of the out port until Ilya's patch[1].
>>>     Patch[1] adds the condition that terminal ports must have IP
>>>     address. This patch is to make addtional port ip check that port
>>>     ip address must match the pkt dst ip. This check covers the mirror
>>>     port case:
>>>     To tcpdump underlay pkt, we may create mirror port of dpdk port.
>>>     When compose_output_action__() is called on the mirror port,
>>>     pkt is redirected into tnl port again. As a result, ping shows
>>>     DUP. fast path flow has two tnl_pop actions.
>>>
>>>     ```
>>>     [root@ovs1 vagrant]# ping 172.16.100.2
>>>     PING 172.16.100.2 (172.16.100.2) 56(84) bytes of data.
>>>     64 bytes from 172.16.100.2 <http://172.16.100.2>: icmp_seq=1 ttl=64 time=0.606 ms
>>>     64 bytes from 172.16.100.2 <http://172.16.100.2>: icmp_seq=1 ttl=64 time=0.617 ms (DUP!)
>>>     ```
>>>
>>>     ```
>>>     [root@ovs ovs]# ovs-appctl dpctl/dump-flows |grep tnl_pop
>>>     recirc_id(0),in_port(3),packet_type(ns=0,id=0),eth(src=52:54:00:a6:c6:de,dst=ea:b4:09:72:44:4c),eth_type(0x0800),ipv4(dst=192.168.124.33,proto=17,frag=no),udp(dst=8472), packets:1, bytes:148, used:0.637s, actions:tnl_pop(2),tnl_pop(2)
>>>     ```
>>>
>>>     [1] (d584bb2b6a1 "ofproto-dpif-xlate: Terminate native tunnels only on ports with IP addresses.")
>>>
>>>     Signed-off-by: lic121 <lic121@chinatelecom.cn <mailto:lic121@chinatelecom.cn>>
>>>     ---
>>>      ofproto/ofproto-dpif-xlate.c | 45 +++++++++++++++++++++++++++-------------
>>>      tests/tunnel-push-pop.at <http://tunnel-push-pop.at>     | 49 ++++++++++++++++++++++++++++++++++++++++++++
>>>      2 files changed, 80 insertions(+), 14 deletions(-)
>>>
>>>     diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
>>>     index 17f7e28..30e636e 100644
>>>     --- a/ofproto/ofproto-dpif-xlate.c
>>>     +++ b/ofproto/ofproto-dpif-xlate.c
>>>     @@ -4107,18 +4107,26 @@ is_neighbor_reply_correct(const struct xlate_ctx *ctx, const struct flow *flow)
>>>      }
>>>
>>>      static bool
>>>     -xport_has_ip(const struct xport *xport)
>>>     +xport_has_ip(const struct xport *xport, const struct in6_addr ip6)
>>>      {
>>>          struct in6_addr *ip_addr, *mask;
>>>          int n_in6 = 0;
>>>     +    int i;
>>>     +    bool hasip = false;
>>>
>>>          if (netdev_get_addr_list(xport->netdev, &ip_addr, &mask, &n_in6)) {
>>>              n_in6 = 0;
>>>          } else {
>>>     +        for (i = 0; i < n_in6; i++) {
>>>     +            if (IN6_ARE_ADDR_EQUAL(ip_addr + i, &ip6)) {
>>>     +                hasip = true;
>>>     +                break;
>>>     +            }
>>>     +        }
>>>              free(ip_addr);
>>>              free(mask);
>>>          }
>>>     -    return n_in6 ? true : false;
>>>     +    return hasip;
>>>      }
>>>
>>>      static bool
>>>     @@ -4127,26 +4135,35 @@ terminate_native_tunnel(struct xlate_ctx *ctx, const struct xport *xport,
>>>                              odp_port_t *tnl_port)
>>>      {
>>>          *tnl_port = ODPP_NONE;
>>>     +    bool ip_pkt = true;
>>>     +    struct in6_addr ip_dst;
>>>     +
>>>     +    if (flow->dl_type == htons(ETH_TYPE_IP)) {
>>>     +        ip_dst = in6_addr_mapped_ipv4(flow->nw_dst);
>>>     +    } else if (flow->dl_type == htons(ETH_TYPE_IPV6)) {
>>>     +        ip_dst = flow->ipv6_dst;
>>>     +    } else {
>>>     +        ip_pkt = false;
>>>     +    }
>>>
>>>          /* XXX: Write better Filter for tunnel port. We can use in_port
>>>           * in tunnel-port flow to avoid these checks completely.
>>>           *
>>>     -     * Port without an IP address cannot be a tunnel termination point.
>>>     -     * Not performing a lookup in this case to avoid unwildcarding extra
>>>     -     * flow fields (dl_dst). */
>>>     -    if (ovs_native_tunneling_is_on(ctx->xbridge->ofproto)
>>>     -        && xport_has_ip(xport)) {
>>>     +     * Not performing a lookup if dst ip not match, to avoid unwildcarding
>>>     +     * extra flow fields (dl_dst). Also avoid duplicate tnl_pop. */
>>>     +    if ((flow->dl_type == htons(ETH_TYPE_ARP) ||
>>>     +                flow->nw_proto == IPPROTO_ICMPV6) &&
>>>     +                is_neighbor_reply_correct(ctx, flow)) {
>>>     +            tnl_neigh_snoop(flow, wc, ctx->xbridge->name,
>>>     +                            ctx->xin->allow_side_effects);
>>>     +
>>>     +    } else if (ovs_native_tunneling_is_on(ctx->xbridge->ofproto)
>>>     +        && ip_pkt && xport_has_ip(xport, ip_dst)) {
>>>              *tnl_port = tnl_port_map_lookup(flow, wc);
>>>
>>>              /* If no tunnel port was found and it's about an ARP or ICMPv6 packet,
>>>               * do tunnel neighbor snooping. */
>>>     -        if (*tnl_port == ODPP_NONE &&
>>>     -            (flow->dl_type == htons(ETH_TYPE_ARP) ||
>>>     -             flow->nw_proto == IPPROTO_ICMPV6) &&
>>>     -             is_neighbor_reply_correct(ctx, flow)) {
>>>     -            tnl_neigh_snoop(flow, wc, ctx->xbridge->name,
>>>     -                            ctx->xin->allow_side_effects);
>>>     -        } else if (*tnl_port != ODPP_NONE &&
>>>     +        if (*tnl_port != ODPP_NONE &&
>>>                         ctx->xin->allow_side_effects &&
>>>                         dl_type_is_ip_any(flow->dl_type)) {
>>>                  struct eth_addr mac = flow->dl_src;
>>>     diff --git a/tests/tunnel-push-pop.at <http://tunnel-push-pop.at> b/tests/tunnel-push-pop.at <http://tunnel-push-pop.at>
>>>     index c633441..b3a3bf9 100644
>>>     --- a/tests/tunnel-push-pop.at <http://tunnel-push-pop.at>
>>>     +++ b/tests/tunnel-push-pop.at <http://tunnel-push-pop.at>
>>>     @@ -808,6 +808,55 @@ NXST_FLOW reply:
>>>      OVS_VSWITCHD_STOP
>>>      AT_CLEANUP
>>>
>>>     +AT_SETUP([tunnel_push_pop - multiple output])
>>>     +
>>>     +OVS_VSWITCHD_START([add-port br0 p0 -- set Interface p0 type=dummy ofport_request=1 other-config:hwaddr=aa:55:aa:55:00:00])
>>>     +AT_CHECK([ovs-vsctl add-port br0 m0 -- set Interface m0 type=dummy ofport_request=2])
>>>     +AT_CHECK([ovs-vsctl add-br int-br -- set bridge int-br datapath_type=dummy], [0])
>>>     +AT_CHECK([ovs-vsctl add-port int-br t1 -- set Interface t1 type=gre \
>>>     +                       options:remote_ip=1.1.2.92 options:key=456 options:seq=true ofport_request=4], [0])
>>>     +
>>>     +AT_CHECK([ovs-appctl dpif/show], [0], [dnl
>>>     +dummy@ovs-dummy: hit:0 missed:0
>>>     +  br0:
>>>     +    br0 65534/100: (dummy-internal)
>>>     +    m0 2/2: (dummy)
>>>     +    p0 1/1: (dummy)
>>>     +  int-br:
>>>     +    int-br 65534/3: (dummy-internal)
>>>     +    t1 4/4: (gre: key=456, remote_ip=1.1.2.92, seq=true)
>>>     +])
>>>     +
>>>     +AT_CHECK([ovs-appctl netdev-dummy/ip4addr br0 1.1.2.88/24 <http://1.1.2.88/24>], [0], [OK
>>>     +])
>>>     +AT_CHECK([ovs-appctl netdev-dummy/ip4addr m0 1.1.3.88/24 <http://1.1.3.88/24>], [0], [OK
>>>     +])
>>>     +AT_CHECK([ovs-appctl ovs/route/add 1.1.2.92/24 <http://1.1.2.92/24> br0], [0], [OK
>>>     +])
>>>     +AT_CHECK([ovs-appctl ovs/route/add 1.1.3.92/24 <http://1.1.3.92/24> m0], [0], [OK
>>>     +])
>>>     +AT_CHECK([ovs-ofctl add-flow br0 'arp,priority=1,action=normal'])
>>>     +
>>>     +dnl Use arp reply to achieve tunnel next hop mac binding
>>>     +AT_CHECK([ovs-appctl netdev-dummy/receive p0 'recirc_id(0),in_port(1),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x0806),arp(sip=1.1.2.92,tip=1.1.2.88,op=2,sha=f8:bc:12:44:34:b6,tha=00:00:00:00:00:00)'])
>>>     +
>>>     +dnl Ouput gre pkt to both br0 port and m0 port
>>>     +dnl m0 port to simulate mirror port, or flood case
>>>     +AT_CHECK([ovs-ofctl add-flow br0 'in_port(1),ip,ip_proto=47,priority=99,action=br0,m0'])
>>>     +
>>>     +AT_CHECK([ovs-appctl netdev-dummy/receive p0 'recirc_id(0),in_port(1),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x0800),ipv4(src=1.1.2.92,dst=1.1.2.88,proto=47,tos=0,ttl=64,frag=no)'])
>>>     +
>>>     +dnl port br0: has the configured ip address of tnl port t1, so 'tnl_pop(4)'
>>>     +dnl port m0: does not have a configured tnl ip address, so 'output:2'
>>>     +AT_CHECK([ovs-appctl dpctl/dump-flows], [0], [dnl
>>>     +flow-dump from the main thread:
>>>     +recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x0806),arp(sip=1.1.2.92,tip=1.1.2.88,op=2,sha=f8:bc:12:44:34:b6), packets:0, bytes:0, used:never, actions:100,2
>>>     +recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth(dst=aa:55:aa:55:00:00),eth_type(0x0800),ipv4(dst=1.1.2.88,proto=47,frag=no), packets:0, bytes:0, used:never, actions:tnl_pop(4),2
>>>     +])
>>>     +
>>>     +OVS_VSWITCHD_STOP
>>>     +AT_CLEANUP
>>>     +
>>>      AT_SETUP([tunnel_push_pop - flow translation on unrelated bridges])
>>>
>>>      OVS_VSWITCHD_START(
>>>     -- 
>>>     1.8.3.1
>>>
>>>     _______________________________________________
>>>     dev mailing list
>>>     dev@openvswitch.org <mailto:dev@openvswitch.org>
>>>     https://mail.openvswitch.org/mailman/listinfo/ovs-dev <https://mail.openvswitch.org/mailman/listinfo/ovs-dev>
>>>
>>>
>>>
>>> -- 
>>> hepeng
>>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Cheng Li May 4, 2022, 2 p.m. UTC | #5
On Wed, May 04, 2022 at 12:53:34PM +0200, Ilya Maximets wrote:
> On 5/4/22 05:23, lic121 wrote:
> > On Tue, May 03, 2022 at 10:18:18PM +0200, Ilya Maximets wrote:
> >> On 5/3/22 15:31, Peng He wrote:
> >>> Hi,
> >>> this issue has been found and resolved by two independent patches:
> >>>
> >>> http://patchwork.ozlabs.org/project/openvswitch/patch/20190726121839.92497-1-wang.li@139.com/ <http://patchwork.ozlabs.org/project/openvswitch/patch/20190726121839.92497-1-wang.li@139.com/>
> >>> http://patchwork.ozlabs.org/project/openvswitch/patch/20201009121503.43893-1-hepeng.0320@bytedance.com/ <http://patchwork.ozlabs.org/project/openvswitch/patch/20201009121503.43893-1-hepeng.0320@bytedance.com/>
> >>>
> >>> but I recently found my patch is still not fixing all the cases.
> >>>
> >>> would like to have a discussion about this.
> >>
> >> Hrm.  It seems to me that the problem is more genric then mirrors.
> >> The reason is that tnl_port_map_lookup() logic doesn't work for
> >> a bridge with more than one tunnel termination port.
> >> Before commit dc0bd12f5b04 ("userspace: Enable non-bridge port
> >> as tunnel endpoint."), only the bridge port was allowed to terminate
> >> the tunnel.  MAC+IP lookup performed by the classifier inside the
> >> tnl_port_map_lookup() was enough to determine if termination is
> >> needed or if the packet has to be forwarded without tunnel_pop.
> >> Now we have multiple ports that may or may not terminate the tunnel.
> >>
> >> Assuming we have 2 different tunnel ports and two different normal
> >> ports with different IP addresses and we expect the first tunnel
> >> to be terminated on the first port and the second tunnel to be
> >> terminated on the second port.  At the same time, the packet from
> >> the first tunnel should be forwarded to the second port without
> >> tunnel_pop and the packet from the second tunnel should be
> >> forwarded to the first port without tunnel_pop.
> >>
> >> Since the classifier in the tunnel-port.c will have rules for
> >> both tunnels and it only matches on MAC+IP (+ some other fileds,
> >> but we don't care much right now), we will have decapsulation
> >> triggered in all cases.  The same as what happened with the
> >> mirrored traffic.
> >>
> >> Two patcehs mentioned above only focused on mirrors, so they
> >> will not fix the problem in a common case.  The patch below
> >> is closer to a solution, but it will skip the matches on dl_dst,
> >> and, possibly, other tunnel header fileds, which we absolutely
> >> need in a non-mirror case with 2 tunnels.
> > 
> > Hi Ilya, you mean we need to check dl_dst as well before tnl classifier
> > lookup? For the case that two ports have the same ip address?
> 
> If two ports on the same bridge has the same IP, that sounds
> like a misconfiguration (I'd argue that having IP assigned
> to a bridge member is a misconfiguration in general, but we
> have what we have...), so I'm not sure if we need to check
> for IP additionally.  But we must unwildcard (add to a flow
> match) fields that we're using to make a decision.

Thanks for the explanation, yes you are right. We do need to unwildcard.

> tnl_port_map_lookup() does that in a generic way.  It has
> information about all the MACs, IPs, VLANs and other fileds
> that may or may not be significant for the lookup.
> Generic classifier lookup inside that function unwildcards
> all the fileds that was actually used during lookup.
> 
> So, I'm not convinced if just checking the IP is sufficient,

An ovs port mainly has two propertyies: IP and MAC. MAC is not a property
for most tunnels. If two ports have the same IP but different MAC. We
can't tell which port is the right tunnel terminal port. So I checked
the IP only, but missed the unwildcard as you mentioned.

> but we definitely have to unwildcard it in a flow match if
> we're checking it.
> 
> > 
> >>
> >> The correct solution would be to add an output port match to
> >> the classifier, but clasifier is not designed to do that and
> >> 'struct match' doesn't have an appropriate field.  It shoudl
> >> be a separate classifier per ipdev, or something like that.
> >>
> >> Thoughts?
> >>
> >> In any case, every part of the packet that was used to make a
> >> decision must be unwildcarded.  IP on the port is not part of
> >> the packet, but dst_ip is.
> >>
> >> Best regards, Ilya Maximets.
> >>
> >>>
> >>>
> >>>
> >>> lic121 <lic121@chinatelecom.cn <mailto:lic121@chinatelecom.cn>> 于2022年5月3日周二 20:55写道:
> >>>
> >>>     In compose_output_action__(), terminate_native_tunnel() is called
> >>>     to test if a pkt matches a tnl port. If yes, a TUNNEL_POP action
> >>>     is appended. Instead of outputing the pkt to the origin port, pkt
> >>>     is redirected into the tnl port.
> >>>
> >>>     pkt metadata(i.e. nw_proto, dst_ip, dst_port) is considered when
> >>>     testing a pkt regardless of the out port until Ilya's patch[1].
> >>>     Patch[1] adds the condition that terminal ports must have IP
> >>>     address. This patch is to make addtional port ip check that port
> >>>     ip address must match the pkt dst ip. This check covers the mirror
> >>>     port case:
> >>>     To tcpdump underlay pkt, we may create mirror port of dpdk port.
> >>>     When compose_output_action__() is called on the mirror port,
> >>>     pkt is redirected into tnl port again. As a result, ping shows
> >>>     DUP. fast path flow has two tnl_pop actions.
> >>>
> >>>     ```
> >>>     [root@ovs1 vagrant]# ping 172.16.100.2
> >>>     PING 172.16.100.2 (172.16.100.2) 56(84) bytes of data.
> >>>     64 bytes from 172.16.100.2 <http://172.16.100.2>: icmp_seq=1 ttl=64 time=0.606 ms
> >>>     64 bytes from 172.16.100.2 <http://172.16.100.2>: icmp_seq=1 ttl=64 time=0.617 ms (DUP!)
> >>>     ```
> >>>
> >>>     ```
> >>>     [root@ovs ovs]# ovs-appctl dpctl/dump-flows |grep tnl_pop
> >>>     recirc_id(0),in_port(3),packet_type(ns=0,id=0),eth(src=52:54:00:a6:c6:de,dst=ea:b4:09:72:44:4c),eth_type(0x0800),ipv4(dst=192.168.124.33,proto=17,frag=no),udp(dst=8472), packets:1, bytes:148, used:0.637s, actions:tnl_pop(2),tnl_pop(2)
> >>>     ```
> >>>
> >>>     [1] (d584bb2b6a1 "ofproto-dpif-xlate: Terminate native tunnels only on ports with IP addresses.")
> >>>
> >>>     Signed-off-by: lic121 <lic121@chinatelecom.cn <mailto:lic121@chinatelecom.cn>>
> >>>     ---
> >>>      ofproto/ofproto-dpif-xlate.c | 45 +++++++++++++++++++++++++++-------------
> >>>      tests/tunnel-push-pop.at <http://tunnel-push-pop.at>     | 49 ++++++++++++++++++++++++++++++++++++++++++++
> >>>      2 files changed, 80 insertions(+), 14 deletions(-)
> >>>
> >>>     diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
> >>>     index 17f7e28..30e636e 100644
> >>>     --- a/ofproto/ofproto-dpif-xlate.c
> >>>     +++ b/ofproto/ofproto-dpif-xlate.c
> >>>     @@ -4107,18 +4107,26 @@ is_neighbor_reply_correct(const struct xlate_ctx *ctx, const struct flow *flow)
> >>>      }
> >>>
> >>>      static bool
> >>>     -xport_has_ip(const struct xport *xport)
> >>>     +xport_has_ip(const struct xport *xport, const struct in6_addr ip6)
> >>>      {
> >>>          struct in6_addr *ip_addr, *mask;
> >>>          int n_in6 = 0;
> >>>     +    int i;
> >>>     +    bool hasip = false;
> >>>
> >>>          if (netdev_get_addr_list(xport->netdev, &ip_addr, &mask, &n_in6)) {
> >>>              n_in6 = 0;
> >>>          } else {
> >>>     +        for (i = 0; i < n_in6; i++) {
> >>>     +            if (IN6_ARE_ADDR_EQUAL(ip_addr + i, &ip6)) {
> >>>     +                hasip = true;
> >>>     +                break;
> >>>     +            }
> >>>     +        }
> >>>              free(ip_addr);
> >>>              free(mask);
> >>>          }
> >>>     -    return n_in6 ? true : false;
> >>>     +    return hasip;
> >>>      }
> >>>
> >>>      static bool
> >>>     @@ -4127,26 +4135,35 @@ terminate_native_tunnel(struct xlate_ctx *ctx, const struct xport *xport,
> >>>                              odp_port_t *tnl_port)
> >>>      {
> >>>          *tnl_port = ODPP_NONE;
> >>>     +    bool ip_pkt = true;
> >>>     +    struct in6_addr ip_dst;
> >>>     +
> >>>     +    if (flow->dl_type == htons(ETH_TYPE_IP)) {
> >>>     +        ip_dst = in6_addr_mapped_ipv4(flow->nw_dst);
> >>>     +    } else if (flow->dl_type == htons(ETH_TYPE_IPV6)) {
> >>>     +        ip_dst = flow->ipv6_dst;
> >>>     +    } else {
> >>>     +        ip_pkt = false;
> >>>     +    }
> >>>
> >>>          /* XXX: Write better Filter for tunnel port. We can use in_port
> >>>           * in tunnel-port flow to avoid these checks completely.
> >>>           *
> >>>     -     * Port without an IP address cannot be a tunnel termination point.
> >>>     -     * Not performing a lookup in this case to avoid unwildcarding extra
> >>>     -     * flow fields (dl_dst). */
> >>>     -    if (ovs_native_tunneling_is_on(ctx->xbridge->ofproto)
> >>>     -        && xport_has_ip(xport)) {
> >>>     +     * Not performing a lookup if dst ip not match, to avoid unwildcarding
> >>>     +     * extra flow fields (dl_dst). Also avoid duplicate tnl_pop. */
> >>>     +    if ((flow->dl_type == htons(ETH_TYPE_ARP) ||
> >>>     +                flow->nw_proto == IPPROTO_ICMPV6) &&
> >>>     +                is_neighbor_reply_correct(ctx, flow)) {
> >>>     +            tnl_neigh_snoop(flow, wc, ctx->xbridge->name,
> >>>     +                            ctx->xin->allow_side_effects);
> >>>     +
> >>>     +    } else if (ovs_native_tunneling_is_on(ctx->xbridge->ofproto)
> >>>     +        && ip_pkt && xport_has_ip(xport, ip_dst)) {
> >>>              *tnl_port = tnl_port_map_lookup(flow, wc);
> >>>
> >>>              /* If no tunnel port was found and it's about an ARP or ICMPv6 packet,
> >>>               * do tunnel neighbor snooping. */
> >>>     -        if (*tnl_port == ODPP_NONE &&
> >>>     -            (flow->dl_type == htons(ETH_TYPE_ARP) ||
> >>>     -             flow->nw_proto == IPPROTO_ICMPV6) &&
> >>>     -             is_neighbor_reply_correct(ctx, flow)) {
> >>>     -            tnl_neigh_snoop(flow, wc, ctx->xbridge->name,
> >>>     -                            ctx->xin->allow_side_effects);
> >>>     -        } else if (*tnl_port != ODPP_NONE &&
> >>>     +        if (*tnl_port != ODPP_NONE &&
> >>>                         ctx->xin->allow_side_effects &&
> >>>                         dl_type_is_ip_any(flow->dl_type)) {
> >>>                  struct eth_addr mac = flow->dl_src;
> >>>     diff --git a/tests/tunnel-push-pop.at <http://tunnel-push-pop.at> b/tests/tunnel-push-pop.at <http://tunnel-push-pop.at>
> >>>     index c633441..b3a3bf9 100644
> >>>     --- a/tests/tunnel-push-pop.at <http://tunnel-push-pop.at>
> >>>     +++ b/tests/tunnel-push-pop.at <http://tunnel-push-pop.at>
> >>>     @@ -808,6 +808,55 @@ NXST_FLOW reply:
> >>>      OVS_VSWITCHD_STOP
> >>>      AT_CLEANUP
> >>>
> >>>     +AT_SETUP([tunnel_push_pop - multiple output])
> >>>     +
> >>>     +OVS_VSWITCHD_START([add-port br0 p0 -- set Interface p0 type=dummy ofport_request=1 other-config:hwaddr=aa:55:aa:55:00:00])
> >>>     +AT_CHECK([ovs-vsctl add-port br0 m0 -- set Interface m0 type=dummy ofport_request=2])
> >>>     +AT_CHECK([ovs-vsctl add-br int-br -- set bridge int-br datapath_type=dummy], [0])
> >>>     +AT_CHECK([ovs-vsctl add-port int-br t1 -- set Interface t1 type=gre \
> >>>     +                       options:remote_ip=1.1.2.92 options:key=456 options:seq=true ofport_request=4], [0])
> >>>     +
> >>>     +AT_CHECK([ovs-appctl dpif/show], [0], [dnl
> >>>     +dummy@ovs-dummy: hit:0 missed:0
> >>>     +  br0:
> >>>     +    br0 65534/100: (dummy-internal)
> >>>     +    m0 2/2: (dummy)
> >>>     +    p0 1/1: (dummy)
> >>>     +  int-br:
> >>>     +    int-br 65534/3: (dummy-internal)
> >>>     +    t1 4/4: (gre: key=456, remote_ip=1.1.2.92, seq=true)
> >>>     +])
> >>>     +
> >>>     +AT_CHECK([ovs-appctl netdev-dummy/ip4addr br0 1.1.2.88/24 <http://1.1.2.88/24>], [0], [OK
> >>>     +])
> >>>     +AT_CHECK([ovs-appctl netdev-dummy/ip4addr m0 1.1.3.88/24 <http://1.1.3.88/24>], [0], [OK
> >>>     +])
> >>>     +AT_CHECK([ovs-appctl ovs/route/add 1.1.2.92/24 <http://1.1.2.92/24> br0], [0], [OK
> >>>     +])
> >>>     +AT_CHECK([ovs-appctl ovs/route/add 1.1.3.92/24 <http://1.1.3.92/24> m0], [0], [OK
> >>>     +])
> >>>     +AT_CHECK([ovs-ofctl add-flow br0 'arp,priority=1,action=normal'])
> >>>     +
> >>>     +dnl Use arp reply to achieve tunnel next hop mac binding
> >>>     +AT_CHECK([ovs-appctl netdev-dummy/receive p0 'recirc_id(0),in_port(1),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x0806),arp(sip=1.1.2.92,tip=1.1.2.88,op=2,sha=f8:bc:12:44:34:b6,tha=00:00:00:00:00:00)'])
> >>>     +
> >>>     +dnl Ouput gre pkt to both br0 port and m0 port
> >>>     +dnl m0 port to simulate mirror port, or flood case
> >>>     +AT_CHECK([ovs-ofctl add-flow br0 'in_port(1),ip,ip_proto=47,priority=99,action=br0,m0'])
> >>>     +
> >>>     +AT_CHECK([ovs-appctl netdev-dummy/receive p0 'recirc_id(0),in_port(1),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x0800),ipv4(src=1.1.2.92,dst=1.1.2.88,proto=47,tos=0,ttl=64,frag=no)'])
> >>>     +
> >>>     +dnl port br0: has the configured ip address of tnl port t1, so 'tnl_pop(4)'
> >>>     +dnl port m0: does not have a configured tnl ip address, so 'output:2'
> >>>     +AT_CHECK([ovs-appctl dpctl/dump-flows], [0], [dnl
> >>>     +flow-dump from the main thread:
> >>>     +recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x0806),arp(sip=1.1.2.92,tip=1.1.2.88,op=2,sha=f8:bc:12:44:34:b6), packets:0, bytes:0, used:never, actions:100,2
> >>>     +recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth(dst=aa:55:aa:55:00:00),eth_type(0x0800),ipv4(dst=1.1.2.88,proto=47,frag=no), packets:0, bytes:0, used:never, actions:tnl_pop(4),2
> >>>     +])
> >>>     +
> >>>     +OVS_VSWITCHD_STOP
> >>>     +AT_CLEANUP
> >>>     +
> >>>      AT_SETUP([tunnel_push_pop - flow translation on unrelated bridges])
> >>>
> >>>      OVS_VSWITCHD_START(
> >>>     -- 
> >>>     1.8.3.1
> >>>
> >>>     _______________________________________________
> >>>     dev mailing list
> >>>     dev@openvswitch.org <mailto:dev@openvswitch.org>
> >>>     https://mail.openvswitch.org/mailman/listinfo/ovs-dev <https://mail.openvswitch.org/mailman/listinfo/ovs-dev>
> >>>
> >>>
> >>>
> >>> -- 
> >>> hepeng
> >>
> > _______________________________________________
> > dev mailing list
> > dev@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
Peng He May 5, 2022, 2:14 a.m. UTC | #6
Hi, lic121 and Ilya,

Ilya, it looks like I did not get the first reply from you.

But it's ok, I now understand the solution, which is to restrict the
termination condition rather than specifying the mirror context,
and yes, this is a more general fix.

Thanks for the explanation. I have some further questions.


lic121 <lic121@chinatelecom.cn> 于2022年5月4日周三 11:23写道:

> On Tue, May 03, 2022 at 10:18:18PM +0200, Ilya Maximets wrote:
> > On 5/3/22 15:31, Peng He wrote:
> > > Hi,
> > > this issue has been found and resolved by two independent patches:
> > >
> > >
> http://patchwork.ozlabs.org/project/openvswitch/patch/20190726121839.92497-1-wang.li@139.com/
> <
> http://patchwork.ozlabs.org/project/openvswitch/patch/20190726121839.92497-1-wang.li@139.com/
> >
> > >
> http://patchwork.ozlabs.org/project/openvswitch/patch/20201009121503.43893-1-hepeng.0320@bytedance.com/
> <
> http://patchwork.ozlabs.org/project/openvswitch/patch/20201009121503.43893-1-hepeng.0320@bytedance.com/
> >
> > >
> > > but I recently found my patch is still not fixing all the cases.
> > >
> > > would like to have a discussion about this.
> >
> > Hrm.  It seems to me that the problem is more genric then mirrors.
> > The reason is that tnl_port_map_lookup() logic doesn't work for
> > a bridge with more than one tunnel termination port.
> > Before commit dc0bd12f5b04 ("userspace: Enable non-bridge port
> > as tunnel endpoint."), only the bridge port was allowed to terminate
> > the tunnel.  MAC+IP lookup performed by the classifier inside the
> > tnl_port_map_lookup() was enough to determine if termination is
> > needed or if the packet has to be forwarded without tunnel_pop.
> > Now we have multiple ports that may or may not terminate the tunnel.
> >
> > Assuming we have 2 different tunnel ports and two different normal
> > ports with different IP addresses and we expect the first tunnel
> > to be terminated on the first port and the second tunnel to be
> > terminated on the second port.  At the same time, the packet from
> > the first tunnel should be forwarded to the second port without
> > tunnel_pop and the packet from the second tunnel should be
> > forwarded to the first port without tunnel_pop.
> >
> > Since the classifier in the tunnel-port.c will have rules for
> > both tunnels and it only matches on MAC+IP (+ some other fileds,
> > but we don't care much right now), we will have decapsulation
> > triggered in all cases.  The same as what happened with the
> > mirrored traffic.
> >
> > Two patcehs mentioned above only focused on mirrors, so they
> > will not fix the problem in a common case.  The patch below
> > is closer to a solution, but it will skip the matches on dl_dst,
> > and, possibly, other tunnel header fileds, which we absolutely
> > need in a non-mirror case with 2 tunnels.
>





>
> Hi Ilya, you mean we need to check dl_dst as well before tnl classifier
> lookup? For the case that two ports have the same ip address?
>
> >
> > The correct solution would be to add an output port match to
> > the classifier, but clasifier is not designed to do that and
> > 'struct match' doesn't have an appropriate field.  It shoudl
> > be a separate classifier per ipdev, or something like that.
> >
> > Thoughts?
>


 did you mean that if the output port is NORMAL or LOCAL, the tunnel filter
will work,
however, if it's a concrete port like a normal one, we should not decap it
but just forward it.
Event the IP addresses and port matches?



> >
> > In any case, every part of the packet that was used to make a
> > decision must be unwildcarded.  IP on the port is not part of
> > the packet, but dst_ip is.
> >
> > Best regards, Ilya Maximets.
> >
> > >
> > >
> > >
> > > lic121 <lic121@chinatelecom.cn <mailto:lic121@chinatelecom.cn>>
> 于2022年5月3日周二 20:55写道:
> > >
> > >     In compose_output_action__(), terminate_native_tunnel() is called
> > >     to test if a pkt matches a tnl port. If yes, a TUNNEL_POP action
> > >     is appended. Instead of outputing the pkt to the origin port, pkt
> > >     is redirected into the tnl port.
> > >
> > >     pkt metadata(i.e. nw_proto, dst_ip, dst_port) is considered when
> > >     testing a pkt regardless of the out port until Ilya's patch[1].
> > >     Patch[1] adds the condition that terminal ports must have IP
> > >     address. This patch is to make addtional port ip check that port
> > >     ip address must match the pkt dst ip. This check covers the mirror
> > >     port case:
> > >     To tcpdump underlay pkt, we may create mirror port of dpdk port.
> > >     When compose_output_action__() is called on the mirror port,
> > >     pkt is redirected into tnl port again. As a result, ping shows
> > >     DUP. fast path flow has two tnl_pop actions.
> > >
> > >     ```
> > >     [root@ovs1 vagrant]# ping 172.16.100.2
> > >     PING 172.16.100.2 (172.16.100.2) 56(84) bytes of data.
> > >     64 bytes from 172.16.100.2 <http://172.16.100.2>: icmp_seq=1
> ttl=64 time=0.606 ms
> > >     64 bytes from 172.16.100.2 <http://172.16.100.2>: icmp_seq=1
> ttl=64 time=0.617 ms (DUP!)
> > >     ```
> > >
> > >     ```
> > >     [root@ovs ovs]# ovs-appctl dpctl/dump-flows |grep tnl_pop
> > >
>  recirc_id(0),in_port(3),packet_type(ns=0,id=0),eth(src=52:54:00:a6:c6:de,dst=ea:b4:09:72:44:4c),eth_type(0x0800),ipv4(dst=192.168.124.33,proto=17,frag=no),udp(dst=8472),
> packets:1, bytes:148, used:0.637s, actions:tnl_pop(2),tnl_pop(2)
> > >     ```
> > >
> > >     [1] (d584bb2b6a1 "ofproto-dpif-xlate: Terminate native tunnels
> only on ports with IP addresses.")
> > >
> > >     Signed-off-by: lic121 <lic121@chinatelecom.cn <mailto:
> lic121@chinatelecom.cn>>
> > >     ---
> > >      ofproto/ofproto-dpif-xlate.c | 45
> +++++++++++++++++++++++++++-------------
> > >      tests/tunnel-push-pop.at <http://tunnel-push-pop.at>     | 49
> ++++++++++++++++++++++++++++++++++++++++++++
> > >      2 files changed, 80 insertions(+), 14 deletions(-)
> > >
> > >     diff --git a/ofproto/ofproto-dpif-xlate.c
> b/ofproto/ofproto-dpif-xlate.c
> > >     index 17f7e28..30e636e 100644
> > >     --- a/ofproto/ofproto-dpif-xlate.c
> > >     +++ b/ofproto/ofproto-dpif-xlate.c
> > >     @@ -4107,18 +4107,26 @@ is_neighbor_reply_correct(const struct
> xlate_ctx *ctx, const struct flow *flow)
> > >      }
> > >
> > >      static bool
> > >     -xport_has_ip(const struct xport *xport)
> > >     +xport_has_ip(const struct xport *xport, const struct in6_addr ip6)
> > >      {
> > >          struct in6_addr *ip_addr, *mask;
> > >          int n_in6 = 0;
> > >     +    int i;
> > >     +    bool hasip = false;
> > >
> > >          if (netdev_get_addr_list(xport->netdev, &ip_addr, &mask,
> &n_in6)) {
> > >              n_in6 = 0;
> > >          } else {
> > >     +        for (i = 0; i < n_in6; i++) {
> > >     +            if (IN6_ARE_ADDR_EQUAL(ip_addr + i, &ip6)) {
> > >     +                hasip = true;
> > >     +                break;
> > >     +            }
> > >     +        }
> > >              free(ip_addr);
> > >              free(mask);
> > >          }
> > >     -    return n_in6 ? true : false;
> > >     +    return hasip;
> > >      }
> > >
> > >      static bool
> > >     @@ -4127,26 +4135,35 @@ terminate_native_tunnel(struct xlate_ctx
> *ctx, const struct xport *xport,
> > >                              odp_port_t *tnl_port)
> > >      {
> > >          *tnl_port = ODPP_NONE;
> > >     +    bool ip_pkt = true;
> > >     +    struct in6_addr ip_dst;
> > >     +
> > >     +    if (flow->dl_type == htons(ETH_TYPE_IP)) {
> > >     +        ip_dst = in6_addr_mapped_ipv4(flow->nw_dst);
> > >     +    } else if (flow->dl_type == htons(ETH_TYPE_IPV6)) {
> > >     +        ip_dst = flow->ipv6_dst;
> > >     +    } else {
> > >     +        ip_pkt = false;
> > >     +    }
> > >
> > >          /* XXX: Write better Filter for tunnel port. We can use
> in_port
> > >           * in tunnel-port flow to avoid these checks completely.
> > >           *
> > >     -     * Port without an IP address cannot be a tunnel termination
> point.
> > >     -     * Not performing a lookup in this case to avoid
> unwildcarding extra
> > >     -     * flow fields (dl_dst). */
> > >     -    if (ovs_native_tunneling_is_on(ctx->xbridge->ofproto)
> > >     -        && xport_has_ip(xport)) {
> > >     +     * Not performing a lookup if dst ip not match, to avoid
> unwildcarding
> > >     +     * extra flow fields (dl_dst). Also avoid duplicate tnl_pop.
> */
> > >     +    if ((flow->dl_type == htons(ETH_TYPE_ARP) ||
> > >     +                flow->nw_proto == IPPROTO_ICMPV6) &&
> > >     +                is_neighbor_reply_correct(ctx, flow)) {
> > >     +            tnl_neigh_snoop(flow, wc, ctx->xbridge->name,
> > >     +                            ctx->xin->allow_side_effects);
> > >     +
> > >     +    } else if (ovs_native_tunneling_is_on(ctx->xbridge->ofproto)
> > >     +        && ip_pkt && xport_has_ip(xport, ip_dst)) {
> > >              *tnl_port = tnl_port_map_lookup(flow, wc);
> > >
> > >              /* If no tunnel port was found and it's about an ARP or
> ICMPv6 packet,
> > >               * do tunnel neighbor snooping. */
> > >     -        if (*tnl_port == ODPP_NONE &&
> > >     -            (flow->dl_type == htons(ETH_TYPE_ARP) ||
> > >     -             flow->nw_proto == IPPROTO_ICMPV6) &&
> > >     -             is_neighbor_reply_correct(ctx, flow)) {
> > >     -            tnl_neigh_snoop(flow, wc, ctx->xbridge->name,
> > >     -                            ctx->xin->allow_side_effects);
> > >     -        } else if (*tnl_port != ODPP_NONE &&
> > >     +        if (*tnl_port != ODPP_NONE &&
> > >                         ctx->xin->allow_side_effects &&
> > >                         dl_type_is_ip_any(flow->dl_type)) {
> > >                  struct eth_addr mac = flow->dl_src;
> > >     diff --git a/tests/tunnel-push-pop.at <http://tunnel-push-pop.at>
> b/tests/tunnel-push-pop.at <http://tunnel-push-pop.at>
> > >     index c633441..b3a3bf9 100644
> > >     --- a/tests/tunnel-push-pop.at <http://tunnel-push-pop.at>
> > >     +++ b/tests/tunnel-push-pop.at <http://tunnel-push-pop.at>
> > >     @@ -808,6 +808,55 @@ NXST_FLOW reply:
> > >      OVS_VSWITCHD_STOP
> > >      AT_CLEANUP
> > >
> > >     +AT_SETUP([tunnel_push_pop - multiple output])
> > >     +
> > >     +OVS_VSWITCHD_START([add-port br0 p0 -- set Interface p0
> type=dummy ofport_request=1 other-config:hwaddr=aa:55:aa:55:00:00])
> > >     +AT_CHECK([ovs-vsctl add-port br0 m0 -- set Interface m0
> type=dummy ofport_request=2])
> > >     +AT_CHECK([ovs-vsctl add-br int-br -- set bridge int-br
> datapath_type=dummy], [0])
> > >     +AT_CHECK([ovs-vsctl add-port int-br t1 -- set Interface t1
> type=gre \
> > >     +                       options:remote_ip=1.1.2.92 options:key=456
> options:seq=true ofport_request=4], [0])
> > >     +
> > >     +AT_CHECK([ovs-appctl dpif/show], [0], [dnl
> > >     +dummy@ovs-dummy: hit:0 missed:0
> > >     +  br0:
> > >     +    br0 65534/100: (dummy-internal)
> > >     +    m0 2/2: (dummy)
> > >     +    p0 1/1: (dummy)
> > >     +  int-br:
> > >     +    int-br 65534/3: (dummy-internal)
> > >     +    t1 4/4: (gre: key=456, remote_ip=1.1.2.92, seq=true)
> > >     +])
> > >     +
> > >     +AT_CHECK([ovs-appctl netdev-dummy/ip4addr br0 1.1.2.88/24 <
> http://1.1.2.88/24>], [0], [OK
> > >     +])
> > >     +AT_CHECK([ovs-appctl netdev-dummy/ip4addr m0 1.1.3.88/24 <
> http://1.1.3.88/24>], [0], [OK
> > >     +])
> > >     +AT_CHECK([ovs-appctl ovs/route/add 1.1.2.92/24 <
> http://1.1.2.92/24> br0], [0], [OK
> > >     +])
> > >     +AT_CHECK([ovs-appctl ovs/route/add 1.1.3.92/24 <
> http://1.1.3.92/24> m0], [0], [OK
> > >     +])
> > >     +AT_CHECK([ovs-ofctl add-flow br0 'arp,priority=1,action=normal'])
> > >     +
> > >     +dnl Use arp reply to achieve tunnel next hop mac binding
> > >     +AT_CHECK([ovs-appctl netdev-dummy/receive p0
> 'recirc_id(0),in_port(1),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x0806),arp(sip=1.1.2.92,tip=1.1.2.88,op=2,sha=f8:bc:12:44:34:b6,tha=00:00:00:00:00:00)'])
> > >     +
> > >     +dnl Ouput gre pkt to both br0 port and m0 port
> > >     +dnl m0 port to simulate mirror port, or flood case
> > >     +AT_CHECK([ovs-ofctl add-flow br0
> 'in_port(1),ip,ip_proto=47,priority=99,action=br0,m0'])
> > >     +
> > >     +AT_CHECK([ovs-appctl netdev-dummy/receive p0
> 'recirc_id(0),in_port(1),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x0800),ipv4(src=1.1.2.92,dst=1.1.2.88,proto=47,tos=0,ttl=64,frag=no)'])
> > >     +
> > >     +dnl port br0: has the configured ip address of tnl port t1, so
> 'tnl_pop(4)'
> > >     +dnl port m0: does not have a configured tnl ip address, so
> 'output:2'
> > >     +AT_CHECK([ovs-appctl dpctl/dump-flows], [0], [dnl
> > >     +flow-dump from the main thread:
> > >
>  +recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x0806),arp(sip=1.1.2.92,tip=1.1.2.88,op=2,sha=f8:bc:12:44:34:b6),
> packets:0, bytes:0, used:never, actions:100,2
> > >
>  +recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth(dst=aa:55:aa:55:00:00),eth_type(0x0800),ipv4(dst=1.1.2.88,proto=47,frag=no),
> packets:0, bytes:0, used:never, actions:tnl_pop(4),2
> > >     +])
> > >     +
> > >     +OVS_VSWITCHD_STOP
> > >     +AT_CLEANUP
> > >     +
> > >      AT_SETUP([tunnel_push_pop - flow translation on unrelated
> bridges])
> > >
> > >      OVS_VSWITCHD_START(
> > >     --
> > >     1.8.3.1
> > >
> > >     _______________________________________________
> > >     dev mailing list
> > >     dev@openvswitch.org <mailto:dev@openvswitch.org>
> > >     https://mail.openvswitch.org/mailman/listinfo/ovs-dev <
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev>
> > >
> > >
> > >
> > > --
> > > hepeng
> >
>
Ilya Maximets May 10, 2022, 10:27 p.m. UTC | #7
On 5/5/22 04:14, Peng He wrote:
> Hi, lic121 and Ilya,
> 
> Ilya, it looks like I did not get the first reply from you.

I'm not sure what is happening with this thread.  I'm also not receiving
some emails from it.

> 
> But it's ok, I now understand the solution, which is to restrict the
> termination condition rather than specifying the mirror context,
> and yes, this is a more general fix.

I'm getting second thoughts about all this though.
Having IP addresses on two bridge ports seems to be a misconfiguration.
Why do you need an IP address on the mirror port?  AFAICT, ovs-tcpdump
doesn't set IP address to the mirror port, or am I missing something?

Why I think it is a misconfiguration: In case of a kernel datapath and
a normal tunneling decapsulation is happening outside of the OVS, so
you will have 2 equal vxlan packets being sent to 2 bridge ports and
they will enter the normal kernel in the same network namespace.
So it's a DUP tunnel packet.  They will likely both be decapsulated
within that network namespace (because the destination interface is
there), and we'll have a DUP icmp packet.  I didn't try that though.

The correct mirroring configuration would be to have a mirror port
be one side of a veth pair with the other side in the different
network namespace, or just a separate physical port.  In both cases
the mirror port doesn't need to have an IP address.

What do you think?  Does that make sense?

Best regards, Ilya Maximets.
Peng He May 11, 2022, 1:45 a.m. UTC | #8
Hi, Ilya,

Ilya Maximets <i.maximets@ovn.org> 于2022年5月11日周三 06:28写道:

> On 5/5/22 04:14, Peng He wrote:
> > Hi, lic121 and Ilya,
> >
> > Ilya, it looks like I did not get the first reply from you.
>
> I'm not sure what is happening with this thread.  I'm also not receiving
> some emails from it.
>
> >
> > But it's ok, I now understand the solution, which is to restrict the
> > termination condition rather than specifying the mirror context,
> > and yes, this is a more general fix.
>
> I'm getting second thoughts about all this though.
> Having IP addresses on two bridge ports seems to be a misconfiguration.
> Why do you need an IP address on the mirror port?  AFAICT, ovs-tcpdump
> doesn't set IP address to the mirror port, or am I missing something?
>

I have also asked lic121, and he told me that,
yes, the mirror port is not assigned any IP addresses.
But an IPv6 link address has been assigned to the port automatically, and
that's
why it still translates into two tnl_pop actions. the DUP is because the
packet has
been cloned due to the two tunl_pop actions.

IMO, lic121's patch has also fixed the two tunnel issues you mentioned, by
matching
the dst IP address, when packets coming from one normal port are forwarded
to another normal port, the termination will not be performed due to the
dst IP address
does not match.

So maybe adding a unmask to his patch will solve the issue?



> Why I think it is a misconfiguration: In case of a kernel datapath and
> a normal tunneling decapsulation is happening outside of the OVS, so
> you will have 2 equal vxlan packets being sent to 2 bridge ports and
> they will enter the normal kernel in the same network namespace.
> So it's a DUP tunnel packet.  They will likely both be decapsulated
> within that network namespace (because the destination interface is
> there), and we'll have a DUP icmp packet.  I didn't try that though.
>
> The correct mirroring configuration would be to have a mirror port
> be one side of a veth pair with the other side in the different
> network namespace, or just a separate physical port.  In both cases
> the mirror port doesn't need to have an IP address.
>
> What do you think?  Does that make sense?
>
> Best regards, Ilya Maximets.
>
Ilya Maximets May 16, 2022, 6:33 p.m. UTC | #9
On 5/11/22 03:45, Peng He wrote:
> Hi, Ilya,
> 
> Ilya Maximets <i.maximets@ovn.org <mailto:i.maximets@ovn.org>> 于2022年5月11日周三 06:28写道:
> 
>     On 5/5/22 04:14, Peng He wrote:
>     > Hi, lic121 and Ilya,
>     >
>     > Ilya, it looks like I did not get the first reply from you.
> 
>     I'm not sure what is happening with this thread.  I'm also not receiving
>     some emails from it.
> 
>     >
>     > But it's ok, I now understand the solution, which is to restrict the
>     > termination condition rather than specifying the mirror context,
>     > and yes, this is a more general fix.
> 
>     I'm getting second thoughts about all this though.
>     Having IP addresses on two bridge ports seems to be a misconfiguration.
>     Why do you need an IP address on the mirror port?  AFAICT, ovs-tcpdump
>     doesn't set IP address to the mirror port, or am I missing something?
> 
> 
> I have also asked lic121, and he told me that, 
> yes, the mirror port is not assigned any IP addresses. 
> But an IPv6 link address has been assigned to the port automatically, and that's
> why it still translates into two tnl_pop actions. the DUP is because the packet has
> been cloned due to the two tunl_pop actions.

Hmm, I see.  Link-local IPv6 address is an issue indeed.
But can we solve that by just adding 'ip addr flush' to the
ovs-tcpdump script?

> 
> IMO, lic121's patch has also fixed the two tunnel issues you mentioned, by matching
> the dst IP address, when packets coming from one normal port are forwarded
> to another normal port, the termination will not be performed due to the dst IP address
> does not match.
> 
> So maybe adding a unmask to his patch will solve the issue?

I'm not convinced that these are actual issues anymore.
I'd say such configuration is incorrect, because it will
lead to duplicated packets even in the scenario with
kernel tunneling.  So, I'm not sure these cases needs a
fix.  in general, creating a match outside of the tunnel
classifier doesn't look like a great solution, because
the classifier will perform almost the same matching again
just a few lines of code below.

>  
> 
> 
>     Why I think it is a misconfiguration: In case of a kernel datapath and
>     a normal tunneling decapsulation is happening outside of the OVS, so
>     you will have 2 equal vxlan packets being sent to 2 bridge ports and
>     they will enter the normal kernel in the same network namespace.
>     So it's a DUP tunnel packet.  They will likely both be decapsulated
>     within that network namespace (because the destination interface is
>     there), and we'll have a DUP icmp packet.  I didn't try that though.
> 
>     The correct mirroring configuration would be to have a mirror port
>     be one side of a veth pair with the other side in the different
>     network namespace, or just a separate physical port.  In both cases
>     the mirror port doesn't need to have an IP address.
> 
>     What do you think?  Does that make sense?
> 
>     Best regards, Ilya Maximets.
> 
> 
> 
> -- 
> hepeng
diff mbox series

Patch

diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
index 17f7e28..30e636e 100644
--- a/ofproto/ofproto-dpif-xlate.c
+++ b/ofproto/ofproto-dpif-xlate.c
@@ -4107,18 +4107,26 @@  is_neighbor_reply_correct(const struct xlate_ctx *ctx, const struct flow *flow)
 }
 
 static bool
-xport_has_ip(const struct xport *xport)
+xport_has_ip(const struct xport *xport, const struct in6_addr ip6)
 {
     struct in6_addr *ip_addr, *mask;
     int n_in6 = 0;
+    int i;
+    bool hasip = false;
 
     if (netdev_get_addr_list(xport->netdev, &ip_addr, &mask, &n_in6)) {
         n_in6 = 0;
     } else {
+        for (i = 0; i < n_in6; i++) {
+            if (IN6_ARE_ADDR_EQUAL(ip_addr + i, &ip6)) {
+                hasip = true;
+                break;
+            }
+        }
         free(ip_addr);
         free(mask);
     }
-    return n_in6 ? true : false;
+    return hasip;
 }
 
 static bool
@@ -4127,26 +4135,35 @@  terminate_native_tunnel(struct xlate_ctx *ctx, const struct xport *xport,
                         odp_port_t *tnl_port)
 {
     *tnl_port = ODPP_NONE;
+    bool ip_pkt = true;
+    struct in6_addr ip_dst;
+
+    if (flow->dl_type == htons(ETH_TYPE_IP)) {
+        ip_dst = in6_addr_mapped_ipv4(flow->nw_dst);
+    } else if (flow->dl_type == htons(ETH_TYPE_IPV6)) {
+        ip_dst = flow->ipv6_dst;
+    } else {
+        ip_pkt = false;
+    }
 
     /* XXX: Write better Filter for tunnel port. We can use in_port
      * in tunnel-port flow to avoid these checks completely.
      *
-     * Port without an IP address cannot be a tunnel termination point.
-     * Not performing a lookup in this case to avoid unwildcarding extra
-     * flow fields (dl_dst). */
-    if (ovs_native_tunneling_is_on(ctx->xbridge->ofproto)
-        && xport_has_ip(xport)) {
+     * Not performing a lookup if dst ip not match, to avoid unwildcarding
+     * extra flow fields (dl_dst). Also avoid duplicate tnl_pop. */
+    if ((flow->dl_type == htons(ETH_TYPE_ARP) ||
+                flow->nw_proto == IPPROTO_ICMPV6) &&
+                is_neighbor_reply_correct(ctx, flow)) {
+            tnl_neigh_snoop(flow, wc, ctx->xbridge->name,
+                            ctx->xin->allow_side_effects);
+
+    } else if (ovs_native_tunneling_is_on(ctx->xbridge->ofproto)
+        && ip_pkt && xport_has_ip(xport, ip_dst)) {
         *tnl_port = tnl_port_map_lookup(flow, wc);
 
         /* If no tunnel port was found and it's about an ARP or ICMPv6 packet,
          * do tunnel neighbor snooping. */
-        if (*tnl_port == ODPP_NONE &&
-            (flow->dl_type == htons(ETH_TYPE_ARP) ||
-             flow->nw_proto == IPPROTO_ICMPV6) &&
-             is_neighbor_reply_correct(ctx, flow)) {
-            tnl_neigh_snoop(flow, wc, ctx->xbridge->name,
-                            ctx->xin->allow_side_effects);
-        } else if (*tnl_port != ODPP_NONE &&
+        if (*tnl_port != ODPP_NONE &&
                    ctx->xin->allow_side_effects &&
                    dl_type_is_ip_any(flow->dl_type)) {
             struct eth_addr mac = flow->dl_src;
diff --git a/tests/tunnel-push-pop.at b/tests/tunnel-push-pop.at
index c633441..b3a3bf9 100644
--- a/tests/tunnel-push-pop.at
+++ b/tests/tunnel-push-pop.at
@@ -808,6 +808,55 @@  NXST_FLOW reply:
 OVS_VSWITCHD_STOP
 AT_CLEANUP
 
+AT_SETUP([tunnel_push_pop - multiple output])
+
+OVS_VSWITCHD_START([add-port br0 p0 -- set Interface p0 type=dummy ofport_request=1 other-config:hwaddr=aa:55:aa:55:00:00])
+AT_CHECK([ovs-vsctl add-port br0 m0 -- set Interface m0 type=dummy ofport_request=2])
+AT_CHECK([ovs-vsctl add-br int-br -- set bridge int-br datapath_type=dummy], [0])
+AT_CHECK([ovs-vsctl add-port int-br t1 -- set Interface t1 type=gre \
+                       options:remote_ip=1.1.2.92 options:key=456 options:seq=true ofport_request=4], [0])
+
+AT_CHECK([ovs-appctl dpif/show], [0], [dnl
+dummy@ovs-dummy: hit:0 missed:0
+  br0:
+    br0 65534/100: (dummy-internal)
+    m0 2/2: (dummy)
+    p0 1/1: (dummy)
+  int-br:
+    int-br 65534/3: (dummy-internal)
+    t1 4/4: (gre: key=456, remote_ip=1.1.2.92, seq=true)
+])
+
+AT_CHECK([ovs-appctl netdev-dummy/ip4addr br0 1.1.2.88/24], [0], [OK
+])
+AT_CHECK([ovs-appctl netdev-dummy/ip4addr m0 1.1.3.88/24], [0], [OK
+])
+AT_CHECK([ovs-appctl ovs/route/add 1.1.2.92/24 br0], [0], [OK
+])
+AT_CHECK([ovs-appctl ovs/route/add 1.1.3.92/24 m0], [0], [OK
+])
+AT_CHECK([ovs-ofctl add-flow br0 'arp,priority=1,action=normal'])
+
+dnl Use arp reply to achieve tunnel next hop mac binding
+AT_CHECK([ovs-appctl netdev-dummy/receive p0 'recirc_id(0),in_port(1),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x0806),arp(sip=1.1.2.92,tip=1.1.2.88,op=2,sha=f8:bc:12:44:34:b6,tha=00:00:00:00:00:00)'])
+
+dnl Ouput gre pkt to both br0 port and m0 port
+dnl m0 port to simulate mirror port, or flood case
+AT_CHECK([ovs-ofctl add-flow br0 'in_port(1),ip,ip_proto=47,priority=99,action=br0,m0'])
+
+AT_CHECK([ovs-appctl netdev-dummy/receive p0 'recirc_id(0),in_port(1),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x0800),ipv4(src=1.1.2.92,dst=1.1.2.88,proto=47,tos=0,ttl=64,frag=no)'])
+
+dnl port br0: has the configured ip address of tnl port t1, so 'tnl_pop(4)'
+dnl port m0: does not have a configured tnl ip address, so 'output:2'
+AT_CHECK([ovs-appctl dpctl/dump-flows], [0], [dnl
+flow-dump from the main thread:
+recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth(src=f8:bc:12:44:34:b6,dst=aa:55:aa:55:00:00),eth_type(0x0806),arp(sip=1.1.2.92,tip=1.1.2.88,op=2,sha=f8:bc:12:44:34:b6), packets:0, bytes:0, used:never, actions:100,2
+recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth(dst=aa:55:aa:55:00:00),eth_type(0x0800),ipv4(dst=1.1.2.88,proto=47,frag=no), packets:0, bytes:0, used:never, actions:tnl_pop(4),2
+])
+
+OVS_VSWITCHD_STOP
+AT_CLEANUP
+
 AT_SETUP([tunnel_push_pop - flow translation on unrelated bridges])
 
 OVS_VSWITCHD_START(