diff mbox series

[ovs-dev] northd: add router broadcast option to logical switch

Message ID DU0PR10MB5244FC4E6A785694FD84B6F9EAA89@DU0PR10MB5244.EURPRD10.PROD.OUTLOOK.COM
State Changes Requested
Headers show
Series [ovs-dev] northd: add router broadcast option to logical switch | expand

Checks

Context Check Description
ovsrobot/apply-robot warning apply and check: warning
ovsrobot/github-robot-_Build_and_Test fail github build: failed
ovsrobot/github-robot-_ovn-kubernetes success github build: passed

Commit Message

Felix Huettner Feb. 24, 2023, 11:04 a.m. UTC
Assume the following setup:

+----------------+
| Logical Router |
| lr001          +-+
+----------------+ |
                   |
+----------------+ |
| Logical Router | | +----------------+ +------------------+
| lr002          +-+-+ Logical Switch +-+ Phyiscal Network |
+----------------+ | | ls-ext         | |                  |
                   | +----------------+ +------------------+
      ...          |
                   |
+----------------+ |
| Logical Router | |
| lr300          +-+
+----------------+

If a arp request for the ip of lr001 on ls-ext is now received it is
only forwarded to that individual logical router.

If we however now receive a arp request for an ip not used by any of
lr001-lr300 we try to flood the arp request to all logical ports on ls-ext.
With around 300 routers this causes the arp request to be dropped after
some routers as we hit the 4096 resubmit limit.

In the most cases forwarding the arp requests to the logical routers is
pointless as we already know all of their ip addresses and they will
therefor not be able to answer the arp requests anyway.
Only if someone sends garps this is not the case. Then the request would
need to be flooded to all logical routers.

We can therefor not generally send these arp requests to MC_FLOOD_L2 as
this would break garps. As we can also not detect garps we need to leave
the solution to our users.

To do this we introduce the other_config `broadcast-arps-to-all-routers`
on logical switches (which is per default true). If set to false we add
a logical flow that forwards arp requests where we do not know a
specific target logical switch port to MC_FLOOD_L2, thereby bypassing
all logical routers.

Note that the testcase is quite flaky in the ci (as it takes verry long)
but runs well locally. I'm unsure how to best proceed there.

Signed-off-by: Felix Huettner <felix.huettner@mail.schwarz>
---
 northd/northd.c         |   7 +++
 northd/ovn-northd.8.xml |   7 +++
 ovn-nb.xml              |  12 +++++
 tests/ovn.at            | 114 ++++++++++++++++++++++++++++++++++++++++
 4 files changed, 140 insertions(+)

--
2.39.1
Diese E Mail enthält möglicherweise vertrauliche Inhalte und ist nur für die Verwertung durch den vorgesehenen Empfänger bestimmt. Sollten Sie nicht der vorgesehene Empfänger sein, setzen Sie den Absender bitte unverzüglich in Kenntnis und löschen diese E Mail. Hinweise zum Datenschutz finden Sie hier<https://www.datenschutz.schwarz>.

Comments

Numan Siddique March 17, 2023, 7:54 p.m. UTC | #1
On Fri, Feb 24, 2023 at 6:04 AM Felix Hüttner via dev
<ovs-dev@openvswitch.org> wrote:
>
> Assume the following setup:
>
> +----------------+
> | Logical Router |
> | lr001          +-+
> +----------------+ |
>                    |
> +----------------+ |
> | Logical Router | | +----------------+ +------------------+
> | lr002          +-+-+ Logical Switch +-+ Phyiscal Network |
> +----------------+ | | ls-ext         | |                  |
>                    | +----------------+ +------------------+
>       ...          |
>                    |
> +----------------+ |
> | Logical Router | |
> | lr300          +-+
> +----------------+
>
> If a arp request for the ip of lr001 on ls-ext is now received it is
> only forwarded to that individual logical router.
>
> If we however now receive a arp request for an ip not used by any of
> lr001-lr300 we try to flood the arp request to all logical ports on ls-ext.
> With around 300 routers this causes the arp request to be dropped after
> some routers as we hit the 4096 resubmit limit.
>
> In the most cases forwarding the arp requests to the logical routers is
> pointless as we already know all of their ip addresses and they will
> therefor not be able to answer the arp requests anyway.
> Only if someone sends garps this is not the case. Then the request would
> need to be flooded to all logical routers.
>
> We can therefor not generally send these arp requests to MC_FLOOD_L2 as
> this would break garps. As we can also not detect garps we need to leave
> the solution to our users.
>
> To do this we introduce the other_config `broadcast-arps-to-all-routers`
> on logical switches (which is per default true). If set to false we add
> a logical flow that forwards arp requests where we do not know a
> specific target logical switch port to MC_FLOOD_L2, thereby bypassing
> all logical routers.
>
> Note that the testcase is quite flaky in the ci (as it takes verry long)
> but runs well locally. I'm unsure how to best proceed there.

Thanks for the patch and sorry for the delay in reviews.

I don't see any harm in supporting this option and the patch overall
looks ok to me.

Regarding the test case flakiness,  I don't think we want to introduce
a flaky test.
I'd suggest the following
    - Either find a solution to fix the flakiness in the CI or
   -  Drop this test case and add a test case in ovn-northd.at which
makes sure that when this
     option is set, ovn-northd generates the required logical flow and
when the option is cleared
     it clears the logical flow.  This should be sufficient IMO to
test this patch.


Also please add a NEWS entry for this new option.

Thanks
Numan



>
> Signed-off-by: Felix Huettner <felix.huettner@mail.schwarz>
> ---
>  northd/northd.c         |   7 +++
>  northd/ovn-northd.8.xml |   7 +++
>  ovn-nb.xml              |  12 +++++
>  tests/ovn.at            | 114 ++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 140 insertions(+)
>
> diff --git a/northd/northd.c b/northd/northd.c
> index c366b545e..6aff04cc5 100644
> --- a/northd/northd.c
> +++ b/northd/northd.c
> @@ -8964,6 +8964,13 @@ build_lswitch_destination_lookup_bmcast(struct ovn_datapath *od,
>              }
>          }
>
> +
> +        if (!smap_get_bool(&od->nbs->other_config, "broadcast-arps-to-all-routers", true)) {
> +            ovn_lflow_add(lflows, od, S_SWITCH_IN_L2_LKUP, 72,
> +                        "eth.mcast && (arp.op == 1 || nd_ns)",
> +                        "outport = \""MC_FLOOD_L2"\"; output;");
> +        }
> +
>          ovn_lflow_add(lflows, od, S_SWITCH_IN_L2_LKUP, 70, "eth.mcast",
>                        "outport = \""MC_FLOOD"\"; output;");
>      }
> diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml
> index 2eab2c4ae..033841383 100644
> --- a/northd/ovn-northd.8.xml
> +++ b/northd/ovn-northd.8.xml
> @@ -1873,6 +1873,13 @@ output;
>          non-router logical ports.
>        </li>
>
> +      <li>
> +        A priority-72 flow that outputs all ARP requests and ND packets with
> +        an Ethernet broadcast or multicast <code>eth.dst</code> to the
> +        <code>MC_FLOOD_L2</code> multicast group if
> +        <code>other_config:broadcast-arps-to-all-routers=true</code>.
> +      </li>
> +
>        <li>
>          A priority-70 flow that outputs all packets with an Ethernet broadcast
>          or multicast <code>eth.dst</code> to the <code>MC_FLOOD</code>
> diff --git a/ovn-nb.xml b/ovn-nb.xml
> index 8d56d0c6e..a41d5b11f 100644
> --- a/ovn-nb.xml
> +++ b/ovn-nb.xml
> @@ -729,6 +729,18 @@
>          localnet ports, fabric traffic that belongs to other tagged networks may
>          be passed through such a port.
>        </column>
> +
> +      <column name="other_config" key="broadcast-arps-to-all-routers"
> +          type='{"type": "boolean"}'>
> +        Determines whether arp requests and ipv6 neighbor solicitations should
> +        be send to all routers and other switchports (default) or if it should
> +        only be send to switchports where the ip/mac address is unknown.
> +        Setting this to false can significantly reduce the load if the logical
> +        switch can receive arp requests for ips it does not know about.
> +        However setting this to false also means that garps are no longer
> +        forwarded to all routers and therefor the mac bindings of the routers
> +        are no longer updated.
> +      </column>
>      </group>
>
>      <group title="Common Columns">
> diff --git a/tests/ovn.at b/tests/ovn.at
> index dc5c5df3f..dfef5dacc 100644
> --- a/tests/ovn.at
> +++ b/tests/ovn.at
> @@ -5048,6 +5048,120 @@ OVN_CLEANUP([hv1], [hv2], [hv3])
>  AT_CLEANUP
>  ])
>
> +# 1 hypervisor, 1 logical switch with 1 logical port, 300 logical router
> +OVN_FOR_EACH_NORTHD([
> +AT_SETUP([1 HVs, 1 LS, 2 lports/LS, 300 LR])
> +AT_KEYWORDS([slowtest])
> +ovn_start
> +
> +# Logical network:
> +#
> +# One logical switch ls1.
> +# 300 logical routers lr001 - lr299. All connected to ls1
> +# with one subnet:
> +#    lrp001 on ls1 for subnet 192.168.0.1/20
> +#    lrp002 on ls1 for subnet 192.168.0.2/20
> +#    ...
> +#    lrp101 on ls1 for subnet 192.168.1.1/20
> +#    ...
> +#    lrp299 on ls1 for subnet 192.168.2.99/20
> +#
> +# also 2 VIF on the first hypervisor.
> +#    lp-vif1 on ls1 for subnet 192.168.3.1/20
> +#    lp-vif2 on ls1 without ip
> +ovn-nbctl \
> +    -- ls-add ls1 \
> +    -- lsp-add ls1 lp-vif1 \
> +    -- lsp-set-addresses lp-vif1 \
> +        "f0:00:00:00:00:01 192.168.3.1" \
> +    -- lsp-add ls1 lp-vif2 \
> +    -- lsp-set-addresses lp-vif2 \
> +        "f0:00:00:00:00:02 unknown"
> +
> +for i in 0 1 2; do
> +    for j in `seq 1 99`; do
> +        hex=$(printf '%02x' $j)
> +        ovn-nbctl \
> +            -- lr-add lr$i$j \
> +            -- lrp-add lr$i$j lrp$i$j 00:00:00:00:f$i:$hex 192.168.$i.$j/20 \
> +            -- lsp-add ls1 lrp$i$j-attachment \
> +            -- set Logical_Switch_Port lrp$i$j-attachment type=router \
> +                             options:router-port=lrp$i$j \
> +                             addresses='"00:00:00:00:f'$i':'$hex'"'
> +    done
> +done
> +
> +# Physical network:
> +#
> +# One hypervisors hv1.
> +# lp-vif1 on hv1.
> +# lp-vif2 on hv1.
> +
> +net_add n1
> +sim_add hv1
> +as hv1
> +ovs-vsctl add-br br-phys
> +ovn_attach n1 br-phys 192.168.255.1
> +ovs-vsctl \
> +    -- add-port br-int vif1 \
> +    -- set Interface vif1 \
> +        external-ids:iface-id=lp-vif1 \
> +        options:tx_pcap=hv1/vif1-tx.pcap \
> +        options:rxq_pcap=hv1/vif1-rx.pcap \
> +        ofport-request=1 \
> +    -- add-port br-int vif2 \
> +    -- set Interface vif2 \
> +        external-ids:iface-id=lp-vif2 \
> +        options:tx_pcap=hv1/vif2-tx.pcap \
> +        options:rxq_pcap=hv1/vif2-rx.pcap \
> +        ofport-request=2
> +
> +ovs-vsctl show
> +ovs-vsctl list Interface
> +
> +# This might take longer than normally
> +export OVS_CTL_TIMEOUT=120
> +wait_for_ports_up
> +check ovn-nbctl --wait=hv --timeout=$OVS_CTL_TIMEOUT sync
> +
> +# now back to normal timeouts
> +export OVS_CTL_TIMEOUT=30
> +
> +# Pre-populate the hypervisors' ARP tables so that we don't lose any
> +# packets for ARP resolution (native tunneling doesn't queue packets
> +# for ARP resolution).
> +OVN_POPULATE_ARP
> +
> +send_arp() {
> +    ovs-appctl ofproto/trace br-int \
> +        arp,in_port=1,vlan_tci=0x0000,dl_src=f0:00:00:00:00:01,dl_dst=ff:ff:ff:ff:ff:ff,arp_spa=192.168.3.1,arp_tpa=1.2.3.4,arp_op=1,arp_sha=f0:00:00:00:00:01,arp_tha=ff:ff:ff:ff:ff:ff
> +}
> +
> +is_arp_dropped_on_recirc() {
> +    send_arp | grep "Too many resubmits"
> +}
> +
> +is_arp_received_by_vif2() {
> +    send_arp | grep "output:2"
> +}
> +
> +# we now send a test arp request that the routers can not answer.
> +# without the later setting this will die because of the recirc limit
> +
> +OVS_WAIT_UNTIL(is_arp_dropped_on_recirc)
> +
> +# now we reconfigure the logical switch to not forward these arp packets to
> +# the routers
> +ovn-nbctl set Logical_Switch ls1 other_config:broadcast-arps-to-all-routers=false
> +
> +OVS_WAIT_UNTIL(is_arp_received_by_vif2)
> +
> +# Gracefully terminate daemons
> +OVN_CLEANUP([hv1])
> +
> +AT_CLEANUP
> +])
> +
>  OVN_FOR_EACH_NORTHD([
>  AT_SETUP([IP relocation using GARP request])
>  ovn_start
> --
> 2.39.1
> Diese E Mail enthält möglicherweise vertrauliche Inhalte und ist nur für die Verwertung durch den vorgesehenen Empfänger bestimmt. Sollten Sie nicht der vorgesehene Empfänger sein, setzen Sie den Absender bitte unverzüglich in Kenntnis und löschen diese E Mail. Hinweise zum Datenschutz finden Sie hier<https://www.datenschutz.schwarz>.
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
diff mbox series

Patch

diff --git a/northd/northd.c b/northd/northd.c
index c366b545e..6aff04cc5 100644
--- a/northd/northd.c
+++ b/northd/northd.c
@@ -8964,6 +8964,13 @@  build_lswitch_destination_lookup_bmcast(struct ovn_datapath *od,
             }
         }

+
+        if (!smap_get_bool(&od->nbs->other_config, "broadcast-arps-to-all-routers", true)) {
+            ovn_lflow_add(lflows, od, S_SWITCH_IN_L2_LKUP, 72,
+                        "eth.mcast && (arp.op == 1 || nd_ns)",
+                        "outport = \""MC_FLOOD_L2"\"; output;");
+        }
+
         ovn_lflow_add(lflows, od, S_SWITCH_IN_L2_LKUP, 70, "eth.mcast",
                       "outport = \""MC_FLOOD"\"; output;");
     }
diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml
index 2eab2c4ae..033841383 100644
--- a/northd/ovn-northd.8.xml
+++ b/northd/ovn-northd.8.xml
@@ -1873,6 +1873,13 @@  output;
         non-router logical ports.
       </li>

+      <li>
+        A priority-72 flow that outputs all ARP requests and ND packets with
+        an Ethernet broadcast or multicast <code>eth.dst</code> to the
+        <code>MC_FLOOD_L2</code> multicast group if
+        <code>other_config:broadcast-arps-to-all-routers=true</code>.
+      </li>
+
       <li>
         A priority-70 flow that outputs all packets with an Ethernet broadcast
         or multicast <code>eth.dst</code> to the <code>MC_FLOOD</code>
diff --git a/ovn-nb.xml b/ovn-nb.xml
index 8d56d0c6e..a41d5b11f 100644
--- a/ovn-nb.xml
+++ b/ovn-nb.xml
@@ -729,6 +729,18 @@ 
         localnet ports, fabric traffic that belongs to other tagged networks may
         be passed through such a port.
       </column>
+
+      <column name="other_config" key="broadcast-arps-to-all-routers"
+          type='{"type": "boolean"}'>
+        Determines whether arp requests and ipv6 neighbor solicitations should
+        be send to all routers and other switchports (default) or if it should
+        only be send to switchports where the ip/mac address is unknown.
+        Setting this to false can significantly reduce the load if the logical
+        switch can receive arp requests for ips it does not know about.
+        However setting this to false also means that garps are no longer
+        forwarded to all routers and therefor the mac bindings of the routers
+        are no longer updated.
+      </column>
     </group>

     <group title="Common Columns">
diff --git a/tests/ovn.at b/tests/ovn.at
index dc5c5df3f..dfef5dacc 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -5048,6 +5048,120 @@  OVN_CLEANUP([hv1], [hv2], [hv3])
 AT_CLEANUP
 ])

+# 1 hypervisor, 1 logical switch with 1 logical port, 300 logical router
+OVN_FOR_EACH_NORTHD([
+AT_SETUP([1 HVs, 1 LS, 2 lports/LS, 300 LR])
+AT_KEYWORDS([slowtest])
+ovn_start
+
+# Logical network:
+#
+# One logical switch ls1.
+# 300 logical routers lr001 - lr299. All connected to ls1
+# with one subnet:
+#    lrp001 on ls1 for subnet 192.168.0.1/20
+#    lrp002 on ls1 for subnet 192.168.0.2/20
+#    ...
+#    lrp101 on ls1 for subnet 192.168.1.1/20
+#    ...
+#    lrp299 on ls1 for subnet 192.168.2.99/20
+#
+# also 2 VIF on the first hypervisor.
+#    lp-vif1 on ls1 for subnet 192.168.3.1/20
+#    lp-vif2 on ls1 without ip
+ovn-nbctl \
+    -- ls-add ls1 \
+    -- lsp-add ls1 lp-vif1 \
+    -- lsp-set-addresses lp-vif1 \
+        "f0:00:00:00:00:01 192.168.3.1" \
+    -- lsp-add ls1 lp-vif2 \
+    -- lsp-set-addresses lp-vif2 \
+        "f0:00:00:00:00:02 unknown"
+
+for i in 0 1 2; do
+    for j in `seq 1 99`; do
+        hex=$(printf '%02x' $j)
+        ovn-nbctl \
+            -- lr-add lr$i$j \
+            -- lrp-add lr$i$j lrp$i$j 00:00:00:00:f$i:$hex 192.168.$i.$j/20 \
+            -- lsp-add ls1 lrp$i$j-attachment \
+            -- set Logical_Switch_Port lrp$i$j-attachment type=router \
+                             options:router-port=lrp$i$j \
+                             addresses='"00:00:00:00:f'$i':'$hex'"'
+    done
+done
+
+# Physical network:
+#
+# One hypervisors hv1.
+# lp-vif1 on hv1.
+# lp-vif2 on hv1.
+
+net_add n1
+sim_add hv1
+as hv1
+ovs-vsctl add-br br-phys
+ovn_attach n1 br-phys 192.168.255.1
+ovs-vsctl \
+    -- add-port br-int vif1 \
+    -- set Interface vif1 \
+        external-ids:iface-id=lp-vif1 \
+        options:tx_pcap=hv1/vif1-tx.pcap \
+        options:rxq_pcap=hv1/vif1-rx.pcap \
+        ofport-request=1 \
+    -- add-port br-int vif2 \
+    -- set Interface vif2 \
+        external-ids:iface-id=lp-vif2 \
+        options:tx_pcap=hv1/vif2-tx.pcap \
+        options:rxq_pcap=hv1/vif2-rx.pcap \
+        ofport-request=2
+
+ovs-vsctl show
+ovs-vsctl list Interface
+
+# This might take longer than normally
+export OVS_CTL_TIMEOUT=120
+wait_for_ports_up
+check ovn-nbctl --wait=hv --timeout=$OVS_CTL_TIMEOUT sync
+
+# now back to normal timeouts
+export OVS_CTL_TIMEOUT=30
+
+# Pre-populate the hypervisors' ARP tables so that we don't lose any
+# packets for ARP resolution (native tunneling doesn't queue packets
+# for ARP resolution).
+OVN_POPULATE_ARP
+
+send_arp() {
+    ovs-appctl ofproto/trace br-int \
+        arp,in_port=1,vlan_tci=0x0000,dl_src=f0:00:00:00:00:01,dl_dst=ff:ff:ff:ff:ff:ff,arp_spa=192.168.3.1,arp_tpa=1.2.3.4,arp_op=1,arp_sha=f0:00:00:00:00:01,arp_tha=ff:ff:ff:ff:ff:ff
+}
+
+is_arp_dropped_on_recirc() {
+    send_arp | grep "Too many resubmits"
+}
+
+is_arp_received_by_vif2() {
+    send_arp | grep "output:2"
+}
+
+# we now send a test arp request that the routers can not answer.
+# without the later setting this will die because of the recirc limit
+
+OVS_WAIT_UNTIL(is_arp_dropped_on_recirc)
+
+# now we reconfigure the logical switch to not forward these arp packets to
+# the routers
+ovn-nbctl set Logical_Switch ls1 other_config:broadcast-arps-to-all-routers=false
+
+OVS_WAIT_UNTIL(is_arp_received_by_vif2)
+
+# Gracefully terminate daemons
+OVN_CLEANUP([hv1])
+
+AT_CLEANUP
+])
+
 OVN_FOR_EACH_NORTHD([
 AT_SETUP([IP relocation using GARP request])
 ovn_start