Message ID: 1469224466-24733-1-git-send-email-rmoats@us.ibm.com
State: Accepted
Ben Pfaff <blp@ovn.org> wrote on 07/22/2016 06:57:59 PM:

> From: Ben Pfaff <blp@ovn.org>
> To: Ryan Moats/Omaha/IBM@IBMUS
> Cc: dev@openvswitch.org
> Date: 07/22/2016 06:58 PM
> Subject: Re: [ovs-dev] [PATCH] ovn-controller: Handle physical changes correctly
>
> On Fri, Jul 22, 2016 at 09:54:26PM +0000, Ryan Moats wrote:
> > [1] reported increased failure rates in certain tests
> > with incremental processing (the numbers are the number of failures
> > seen in 100 tests):
> >
> >  2 ovn -- vtep: 3 HVs, 1 VIFs/HV, 1 GW, 1 LS
> > 10 ovn -- 2 HVs, 2 LS, 1 lport/LS, 2 peer LRs
> > 52 ovn -- 1 HV, 1 LS, 2 lport/LS, 1 LR
> > 45 ovn -- 1 HV, 2 LSs, 1 lport/LS, 1 LR
> > 23 ovn -- 2 HVs, 3 LS, 1 lport/LS, 2 peer LRs, static routes
> > 53 ovn -- 2 HVs, 3 LRs connected via LS, static routes
> > 32 ovn -- 2 HVs, 2 LRs connected via LS, gateway router
> > 50 ovn -- icmp_reply: 1 HVs, 2 LSs, 1 lport/LS, 1 LR
> >
> > These failures were caused by a combination of problems in
> > handling physical changes:
> >
> > 1. When a vif was removed, the localvif_to_ofport entry was not
> >    removed.
> > 2. When a physical change was detected, ovn-controller would wait
> >    a poll cycle before processing the logical flow table.
> >
> > This patch set addresses both of these issues while simultaneously
> > cleaning up the code in physical.c. A side effect is a modification
> > of where OF flows are dumped in the gateway router case that allowed
> > the root causes of this issue to be found.
> >
> > With these changes, all of the above tests had a 100/100 success rate.
> >
> > [1] http://openvswitch.org/pipermail/dev/2016-July/075803.html
> >
> > Signed-off-by: Ryan Moats <rmoats@us.ibm.com>
>
> Thank you!
>
> This made a major improvement for me, too. I'm not seeing 100/100 but I
> also run my tests with TESTSUITEFLAGS=-j10, which is really stressful.
>
> I folded in the following changes (most importantly, to make it obvious
> why we're calling poll_immediate_wake(), which the commit message
> explained but the code didn't) and applied this to master.
>
> diff --git a/ovn/controller/physical.c b/ovn/controller/physical.c
> index dc4ef77..11b2c2b 100644
> --- a/ovn/controller/physical.c
> +++ b/ovn/controller/physical.c
> @@ -55,8 +55,6 @@ physical_register_ovs_idl(struct ovsdb_idl *ovs_idl)
>  
>  static struct simap localvif_to_ofport =
>      SIMAP_INITIALIZER(&localvif_to_ofport);
> -static struct simap new_localvif_to_ofport =
> -    SIMAP_INITIALIZER(&new_localvif_to_ofport);
>  static struct hmap tunnels = HMAP_INITIALIZER(&tunnels);
>  
>  /* UUID to identify OF flows not associated with ovsdb rows. */
> @@ -606,6 +604,8 @@ physical_run(struct controller_ctx *ctx, enum mf_field_id mff_ovn_geneve,
>          uuid_generate(hc_uuid);
>      }
>  
> +    struct simap new_localvif_to_ofport =
> +        SIMAP_INITIALIZER(&new_localvif_to_ofport);
>      for (int i = 0; i < br_int->n_ports; i++) {
>          const struct ovsrec_port *port_rec = br_int->ports[i];
>          if (!strcmp(port_rec->name, br_int->name)) {
> @@ -674,8 +674,10 @@ physical_run(struct controller_ctx *ctx, enum mf_field_id mff_ovn_geneve,
>                  tun->ofport = u16_to_ofp(ofport);
>                  tun->type = tunnel_type;
>                  full_binding_processing = true;
> -                lflow_reset_processing();
>                  binding_reset_processing();
> +
> +                /* Reprocess logical flow table immediately. */
> +                lflow_reset_processing();
>                  poll_immediate_wake();
>                  break;
>              } else {
> @@ -712,6 +714,8 @@ physical_run(struct controller_ctx *ctx, enum mf_field_id mff_ovn_geneve,
>      }
>      if (localvif_map_changed) {
>          full_binding_processing = true;
> +
> +        /* Reprocess logical flow table immediately. */
>          lflow_reset_processing();
>          poll_immediate_wake();
>      }
> @@ -853,9 +857,10 @@ physical_run(struct controller_ctx *ctx, enum mf_field_id mff_ovn_geneve,
>      ofctrl_add_flow(OFTABLE_DROP_LOOPBACK, 0, &match, &ofpacts, hc_uuid);
>  
>      ofpbuf_uninit(&ofpacts);
> -    simap_clear(&new_localvif_to_ofport);
>      HMAP_FOR_EACH_POP (tun, hmap_node, &tunnels) {
>          free(tun);
>      }
>      hmap_clear(&tunnels);
> +
> +    simap_destroy(&new_localvif_to_ofport);
>  }

I should have said 100/100 with -j1, and the above changes make sense.

Thanks for applying!

Ryan
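The discussion above turns on why poll_immediate_wake() is needed. The sketch below is not OVS's real poll-loop implementation (the function and variable names here are illustrative only); it just shows the idea: ovn-controller's main loop normally blocks until an event arrives, so a "reprocess flows" flag set during physical_run() would otherwise sit unserviced for a full poll cycle. Requesting an immediate wake amounts to forcing a zero timeout on the next poll:

```c
#include <assert.h>
#include <stdbool.h>

/* One-shot flag recording that some subsystem asked the main loop to
 * run another iteration without blocking. */
static bool immediate_wake_requested = false;

/* Analogous in spirit to OVS's poll_immediate_wake(): note the request. */
static void
request_immediate_wake(void)
{
    immediate_wake_requested = true;
}

/* Timeout to hand to the next poll():
 *   0  -> do not block; run another iteration right away
 *  -1  -> block until an external event arrives */
static int
next_poll_timeout(void)
{
    if (immediate_wake_requested) {
        immediate_wake_requested = false;  /* request is consumed */
        return 0;
    }
    return -1;
}
```

Without the wake request, lflow_reset_processing() would only mark state for the *next* iteration, reproducing the one-cycle delay the commit message identifies as root cause #2.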
----- Original Message -----
> From: "Ryan Moats" <rmoats@us.ibm.com>
> To: dev@openvswitch.org
> Sent: Friday, July 22, 2016 5:54:26 PM
> Subject: [ovs-dev] [PATCH] ovn-controller: Handle physical changes correctly
>
> [...]

Hi Ryan,

Belated ACK since this is already applied, tests are looking *much* better.

Thanks!
Lance
> From: Ryan Moats <rmoats@us.ibm.com>
> To: dev@openvswitch.org
> Date: 23/07/2016 12:57 AM
> Subject: [ovs-dev] [PATCH] ovn-controller: Handle physical changes correctly
> Sent by: "dev" <dev-bounces@openvswitch.org>
>
> [...]

This patch really improves the tests; however, on my performance evaluation
environment it results in a 100% CPU load on the ovn-controllers.
Liran Schour/Haifa/IBM wrote on 07/25/2016 02:44:42 AM:

> From: Liran Schour/Haifa/IBM
> To: Ryan Moats <rmoats@us.ibm.com>
> Cc: dev@openvswitch.org
> Date: 07/25/2016 02:44 AM
> Subject: Re: [ovs-dev] [PATCH] ovn-controller: Handle physical changes correctly
>
> > From: Ryan Moats <rmoats@us.ibm.com>
> > To: dev@openvswitch.org
> > Date: 23/07/2016 12:57 AM
> > Subject: [ovs-dev] [PATCH] ovn-controller: Handle physical changes correctly
> > Sent by: "dev" <dev-bounces@openvswitch.org>
> >
> > [...]
>
> This patch really improves the tests; however, on my performance
> evaluation environment it results in a 100% CPU load on the
> ovn-controllers.

I'm pretty sure I know what's causing that - let's see if I can revert
that without impacting test results...
diff --git a/ovn/controller/physical.c b/ovn/controller/physical.c
index b8f3105..dc4ef77 100644
--- a/ovn/controller/physical.c
+++ b/ovn/controller/physical.c
@@ -14,9 +14,11 @@
  */
 
 #include <config.h>
+#include "binding.h"
 #include "byte-order.h"
 #include "flow.h"
 #include "lflow.h"
+#include "lib/poll-loop.h"
 #include "ofctrl.h"
 #include "openvswitch/match.h"
 #include "openvswitch/ofp-actions.h"
@@ -53,6 +55,8 @@ physical_register_ovs_idl(struct ovsdb_idl *ovs_idl)
 
 static struct simap localvif_to_ofport =
     SIMAP_INITIALIZER(&localvif_to_ofport);
+static struct simap new_localvif_to_ofport =
+    SIMAP_INITIALIZER(&new_localvif_to_ofport);
 static struct hmap tunnels = HMAP_INITIALIZER(&tunnels);
 
 /* UUID to identify OF flows not associated with ovsdb rows. */
@@ -638,51 +642,15 @@ physical_run(struct controller_ctx *ctx, enum mf_field_id mff_ovn_geneve,
             bool is_patch = !strcmp(iface_rec->type, "patch");
             if (is_patch && localnet) {
                 /* localnet patch ports can be handled just like VIFs. */
-                if (simap_find(&localvif_to_ofport, localnet)) {
-                    unsigned int old_port = simap_get(&localvif_to_ofport,
-                                                      localnet);
-                    if (old_port != ofport) {
-                        simap_put(&localvif_to_ofport, localnet, ofport);
-                        full_binding_processing = true;
-                        lflow_reset_processing();
-                    }
-                } else {
-                    simap_put(&localvif_to_ofport, localnet, ofport);
-                    full_binding_processing = true;
-                    lflow_reset_processing();
-                }
+                simap_put(&new_localvif_to_ofport, localnet, ofport);
                 break;
             } else if (is_patch && l2gateway) {
                 /* L2 gateway patch ports can be handled just like VIFs. */
-                if (simap_find(&localvif_to_ofport, l2gateway)) {
-                    unsigned int old_port = simap_get(&localvif_to_ofport,
-                                                      l2gateway);
-                    if (old_port != ofport) {
-                        simap_put(&localvif_to_ofport, l2gateway, ofport);
-                        full_binding_processing = true;
-                        lflow_reset_processing();
-                    }
-                } else {
-                    simap_put(&localvif_to_ofport, l2gateway, ofport);
-                    full_binding_processing = true;
-                    lflow_reset_processing();
-                }
+                simap_put(&new_localvif_to_ofport, l2gateway, ofport);
                 break;
             } else if (is_patch && logpatch) {
                 /* Logical patch ports can be handled just like VIFs. */
-                if (simap_find(&localvif_to_ofport, logpatch)) {
-                    unsigned int old_port = simap_get(&localvif_to_ofport,
-                                                      logpatch);
-                    if (old_port != ofport) {
-                        simap_put(&localvif_to_ofport, logpatch, ofport);
-                        full_binding_processing = true;
-                        lflow_reset_processing();
-                    }
-                } else {
-                    simap_put(&localvif_to_ofport, logpatch, ofport);
-                    full_binding_processing = true;
-                    lflow_reset_processing();
-                }
+                simap_put(&new_localvif_to_ofport, logpatch, ofport);
                 break;
             } else if (chassis_id) {
                 enum chassis_tunnel_type tunnel_type;
@@ -706,29 +674,48 @@ physical_run(struct controller_ctx *ctx, enum mf_field_id mff_ovn_geneve,
                 tun->ofport = u16_to_ofp(ofport);
                 tun->type = tunnel_type;
                 full_binding_processing = true;
+                lflow_reset_processing();
+                binding_reset_processing();
+                poll_immediate_wake();
                 break;
             } else {
                 const char *iface_id = smap_get(&iface_rec->external_ids,
                                                 "iface-id");
                 if (iface_id) {
-                    if (simap_find(&localvif_to_ofport, iface_id)) {
-                        unsigned int old_port = simap_get(&localvif_to_ofport,
-                                                          iface_id);
-                        if (old_port != ofport) {
-                            simap_put(&localvif_to_ofport, iface_id, ofport);
-                            full_binding_processing = true;
-                            lflow_reset_processing();
-                        }
-                    } else {
-                        simap_put(&localvif_to_ofport, iface_id, ofport);
-                        full_binding_processing = true;
-                        lflow_reset_processing();
-                    }
+                    simap_put(&new_localvif_to_ofport, iface_id, ofport);
                 }
             }
         }
     }
 
+    /* Capture changed or removed openflow ports. */
+    bool localvif_map_changed = false;
+    struct simap_node *vif_name, *vif_name_next;
+    SIMAP_FOR_EACH_SAFE (vif_name, vif_name_next, &localvif_to_ofport) {
+        int newport;
+        if ((newport = simap_get(&new_localvif_to_ofport, vif_name->name))) {
+            if (newport != simap_get(&localvif_to_ofport, vif_name->name)) {
+                simap_put(&localvif_to_ofport, vif_name->name, newport);
+                localvif_map_changed = true;
+            }
+        } else {
+            simap_find_and_delete(&localvif_to_ofport, vif_name->name);
+            localvif_map_changed = true;
+        }
+    }
+    SIMAP_FOR_EACH (vif_name, &new_localvif_to_ofport) {
+        if (!simap_get(&localvif_to_ofport, vif_name->name)) {
+            simap_put(&localvif_to_ofport, vif_name->name,
+                      simap_get(&new_localvif_to_ofport, vif_name->name));
+            localvif_map_changed = true;
+        }
+    }
+    if (localvif_map_changed) {
+        full_binding_processing = true;
+        lflow_reset_processing();
+        poll_immediate_wake();
+    }
+
     struct ofpbuf ofpacts;
     ofpbuf_init(&ofpacts, 0);
 
@@ -866,6 +853,7 @@ physical_run(struct controller_ctx *ctx, enum mf_field_id mff_ovn_geneve,
     ofctrl_add_flow(OFTABLE_DROP_LOOPBACK, 0, &match, &ofpacts, hc_uuid);
 
     ofpbuf_uninit(&ofpacts);
+    simap_clear(&new_localvif_to_ofport);
     HMAP_FOR_EACH_POP (tun, hmap_node, &tunnels) {
         free(tun);
     }
diff --git a/tests/ovn.at b/tests/ovn.at
index 95aa822..614a4bb 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -3192,14 +3192,6 @@ ovn-sbctl list chassis
 ovn-sbctl list encap
 echo "---------------------"
 
-echo "------ hv1 dump ----------"
-as hv1 ovs-ofctl show br-int
-as hv1 ovs-ofctl dump-flows br-int
-echo "------ hv2 dump ----------"
-as hv2 ovs-ofctl show br-int
-as hv2 ovs-ofctl dump-flows br-int
-echo "----------------------------"
-
 # Packet to Expect at alice1
 src_mac="000002010203"
 dst_mac="f00000010204"
@@ -3211,6 +3203,14 @@ expected=${dst_mac}${src_mac}08004500001c000000003e110200${src_ip}${dst_ip}00351
 as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet
 as hv1 ovs-appctl ofproto/trace br-int in_port=1 $packet
 
+echo "------ hv1 dump after packet 1 ----------"
+as hv1 ovs-ofctl show br-int
+as hv1 ovs-ofctl dump-flows br-int
+echo "------ hv2 dump after packet 1 ----------"
+as hv2 ovs-ofctl show br-int
+as hv2 ovs-ofctl dump-flows br-int
+echo "----------------------------"
+
 $PYTHON "$top_srcdir/utilities/ovs-pcap.in" hv2/vif1-tx.pcap | trim_zeros > received1.packets
 echo $expected | trim_zeros > expout
 AT_CHECK([cat received1.packets], [0], [expout])
@@ -3232,6 +3232,15 @@ sleep 1
 # Send the packet again.
 as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet
+
+echo "------ hv1 dump after packet 2 ----------"
+as hv1 ovs-ofctl show br-int
+as hv1 ovs-ofctl dump-flows br-int
+echo "------ hv2 dump after packet 2 ----------"
+as hv2 ovs-ofctl show br-int
+as hv2 ovs-ofctl dump-flows br-int
+echo "----------------------------"
+
 $PYTHON "$top_srcdir/utilities/ovs-pcap.in" hv2/vif1-tx.pcap | trim_zeros > received1.packets
 echo $expected | trim_zeros >> expout
 AT_CHECK([cat received1.packets], [0], [expout])
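The core of the patch is the two-pass comparison between the persistent localvif_to_ofport map and the freshly built new_localvif_to_ofport map: the first pass over the old map catches changed and removed ports (fixing the stale-entry bug), the second pass over the new map catches additions. The following is a minimal self-contained sketch of that technique, using a toy array-backed string-to-int map in place of OVS's simap (all names here are illustrative, not OVS APIs); as in simap, a value of 0 means "absent":

```c
#include <stdbool.h>
#include <string.h>

#define MAX_PORTS 16

/* Toy stand-in for OVS's simap: maps interface names to OF port numbers. */
struct port_map {
    const char *names[MAX_PORTS];
    unsigned int ports[MAX_PORTS];
    int n;
};

static unsigned int
map_get(const struct port_map *m, const char *name)
{
    for (int i = 0; i < m->n; i++) {
        if (!strcmp(m->names[i], name)) {
            return m->ports[i];
        }
    }
    return 0;  /* 0 means "not present", like simap_get(). */
}

static void
map_put(struct port_map *m, const char *name, unsigned int port)
{
    for (int i = 0; i < m->n; i++) {
        if (!strcmp(m->names[i], name)) {
            m->ports[i] = port;
            return;
        }
    }
    m->names[m->n] = name;
    m->ports[m->n] = port;
    m->n++;
}

static void
map_delete(struct port_map *m, const char *name)
{
    for (int i = 0; i < m->n; i++) {
        if (!strcmp(m->names[i], name)) {
            /* Swap-remove: move the last entry into the freed slot. */
            m->names[i] = m->names[m->n - 1];
            m->ports[i] = m->ports[m->n - 1];
            m->n--;
            return;
        }
    }
}

/* Two-pass sync mirroring the patch: pass 1 walks the persistent map to
 * detect changed and removed ports; pass 2 walks the fresh map to detect
 * added ports.  Returns true if anything changed, i.e. the caller should
 * reset flow processing and wake the poll loop immediately. */
static bool
sync_port_maps(struct port_map *cur, const struct port_map *fresh)
{
    bool changed = false;

    /* Pass 1: changed or removed ports.  Iterate backward so that
     * swap-remove never skips an unvisited entry. */
    for (int i = cur->n - 1; i >= 0; i--) {
        unsigned int newport = map_get(fresh, cur->names[i]);
        if (newport) {
            if (newport != cur->ports[i]) {
                cur->ports[i] = newport;
                changed = true;
            }
        } else {
            map_delete(cur, cur->names[i]);  /* fixes root cause #1 */
            changed = true;
        }
    }

    /* Pass 2: newly added ports. */
    for (int i = 0; i < fresh->n; i++) {
        if (!map_get(cur, fresh->names[i])) {
            map_put(cur, fresh->names[i], fresh->ports[i]);
            changed = true;
        }
    }
    return changed;
}
```

The design point matching the patch: rather than updating the persistent map in-place while scanning interfaces (which can never observe a deletion), every run rebuilds a complete snapshot and then diffs it, so removed VIFs are finally noticed.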
[1] reported increased failure rates in certain tests
with incremental processing (the numbers are the number of failures
seen in 100 tests):

 2 ovn -- vtep: 3 HVs, 1 VIFs/HV, 1 GW, 1 LS
10 ovn -- 2 HVs, 2 LS, 1 lport/LS, 2 peer LRs
52 ovn -- 1 HV, 1 LS, 2 lport/LS, 1 LR
45 ovn -- 1 HV, 2 LSs, 1 lport/LS, 1 LR
23 ovn -- 2 HVs, 3 LS, 1 lport/LS, 2 peer LRs, static routes
53 ovn -- 2 HVs, 3 LRs connected via LS, static routes
32 ovn -- 2 HVs, 2 LRs connected via LS, gateway router
50 ovn -- icmp_reply: 1 HVs, 2 LSs, 1 lport/LS, 1 LR

These failures were caused by a combination of problems in
handling physical changes:

1. When a vif was removed, the localvif_to_ofport entry was not
   removed.
2. When a physical change was detected, ovn-controller would wait
   a poll cycle before processing the logical flow table.

This patch set addresses both of these issues while simultaneously
cleaning up the code in physical.c. A side effect is a modification
of where OF flows are dumped in the gateway router case that allowed
the root causes of this issue to be found.

With these changes, all of the above tests had a 100/100 success rate.

[1] http://openvswitch.org/pipermail/dev/2016-July/075803.html

Signed-off-by: Ryan Moats <rmoats@us.ibm.com>
---
 ovn/controller/physical.c | 92 +++++++++++++++++++++--------------------------
 tests/ovn.at              | 25 ++++++++-----
 2 files changed, 57 insertions(+), 60 deletions(-)