diff mbox

[ovs-dev,ovs-dev,v23,2/2] ovn-controller: Add incremental processing to lflow_run and physical_run

Message ID OFF5548A24.F1724497-ON86257FF5.00765648-86257FF5.0078779F@notes.na.collabserv.com
State Not Applicable
Headers show

Commit Message

Ryan Moats July 19, 2016, 9:55 p.m. UTC
> On Tue, Jul 19, 2016 at 02:57:43PM -0500, Ryan Moats wrote:
> > Ben Pfaff <blp at ovn.org> wrote on 07/19/2016 02:54:36 PM:
> >
> > > From: Ben Pfaff <blp at ovn.org>
> > To: Ryan Moats/Omaha/IBM at IBMUS>
> > Cc: dev at openvswitch.org
> > Date: 07/19/2016 02:54 PM
> > > Subject: Re: [ovs-dev, v23, 2/2] ovn-controller: Add incremental
> > > processing to lflow_run and physical_run
> > >
> > > On Tue, Jul 19, 2016 at 02:18:11PM -0500, Ryan Moats wrote:
> > > >
> > > >
> > > > Ben Pfaff <blp at ovn.org> wrote on 07/19/2016 01:39:43 PM:
> > > >
> > > > > From: Ben Pfaff <blp at ovn.org>
> > > > > To: Ryan Moats/Omaha/IBM at IBMUS
> > > > > Cc: dev at openvswitch.org
> > > > > Date: 07/19/2016 01:41 PM
> > > > > Subject: Re: [ovs-dev, v23, 2/2] ovn-controller: Add incremental
> > > > > processing to lflow_run and physical_run
> > > > >
> > > > > On Tue, Jul 19, 2016 at 08:10:05AM -0500, Ryan Moats wrote:
> > > > > >
> > > > > >
> > > > > > Ben Pfaff <blp at ovn.org> wrote on 07/19/2016 12:57:23 AM:
> > > > > >
> > > > > > > From: Ben Pfaff <blp at ovn.org>
> > > > > > > To: Ryan Moats/Omaha/IBM at IBMUS
> > > > > > > Cc: dev at openvswitch.org
> > > > > > > Date: 07/19/2016 12:57 AM
> > > > > > > Subject: Re: [ovs-dev, v23, 2/2] ovn-controller: Add
incremental
> > > > > > > processing to lflow_run and physical_run
> > > > > > >
> > > > > > > On Mon, Jul 18, 2016 at 04:21:17PM -0500, Ryan Moats wrote:
> > > > > > > > This code changes to allow incremental processing of the
> > > > > > > > logical flow and physical binding tables whenver possible.
> > > > > > > >
> > > > > > > > Note: flows created by physical_run for multicast_groups
are
> > > > > > > > *NOT* handled incrementally due to to be solved issues
> > > > > > > > with GWs and local routers.
> > > > > > > >
> > > > > > > > Signed-off-by: Ryan Moats <rmoats at us.ibm.com>
> > > > > > >
> > > > > > > With this applied, I get a number of apparently consistent
test
> > > > failures
> > > > > > > (literally no passes in 5+ runs) that I don't see, at least
not
> > > > > > > consistently, if I revert it:
> > > > > > >
> > > > > > >     OVN end-to-end tests
> > > > > > >
> > > > > > >     2106: ovn -- vtep: 3 HVs, 1 VIFs/HV, 1 GW, 1 LS
FAILED
> > > > > > (ovn.at:1307)
> > > > > > >     2107: ovn -- 3 HVs, 1 VIFs/HV, 1 software GW, 1 LS
FAILED
> > > > > > (ovn.at:1473)
> > > > > > >     2108: ovn -- 3 HVs, 3 LS, 3 lports/LS, 1 LR
FAILED
> > > > > > (ovn.at:1887)
> > > > > > >     2110: ovn -- 2 HVs, 2 LS, 1 lport/LS, 2 peer LRs
FAILED
> > > > > > (ovn.at:2416)
> > > > > > >     2115: ovn -- 2 HVs, 3 LRs connected via LS, static routes
> > FAILED
> > > > > > > (ovn.at:3053)
> > > > > > >
> > > > > > > Do these pass reasonably often for you?  (Maybe my changes to
the
> > > > > > > previous patch screwed something up?)
> > > > > > >
> > > > > >
> > > > > > First, not all of the above cases failed for me even with a
rebase
> > of
> > > > the
> > > > > > last patch onto what is currently in master - the last two
passed
> > for
> > > > me
> > > > > > on either the first and second run, so I'm looking at the first
> > three
> > > > for
> > > > > > the rest of this message.
> > > > > >
> > > > > > > Double checking by going back to my original first patch
verified
> > that
> > > > all
> > > > > > of the tests in question passed.  Going through and checking
each
> > of
> > > > the
> > > > > > changes that you made to the first patch, the following is the
diff
> > I
> > > > > > needed
> > > > > > to make things work again:
> > > > > >
> > > > > > diff --git a/ovn/controller/ofctrl.c b/ovn/controller/ofctrl.c
> > > > > > index d10dcc6..e113213 100644
> > > > > > --- a/ovn/controller/ofctrl.c
> > > > > > +++ b/ovn/controller/ofctrl.c
> > > > > > @@ -553,7 +553,10 @@ ofctrl_add_flow(uint8_t table_id, uint16_t
> > > > priority,
> > > > > >      ovn_flow_lookup(&match_flow_table, f, &existing);
> > > > > >      struct ovn_flow *d;
> > > > > >      LIST_FOR_EACH (d, list_node, &existing) {
> > > > > > -        if (uuid_equals(&f->uuid, &d->uuid)) {
> > > > > > +        if (f->table_id == d->table_id && f->priority == d->
> > priority
> > > > > > +            && ofpacts_equal(f->ofpacts, f->ofpacts_len,
> > > > > > +                             d->ofpacts, d->ofpacts_len)
> > > > > > +            && uuid_equals(&f->uuid, &d->uuid)) {
> > > > > >              static struct vlog_rate_limit rl =
> > VLOG_RATE_LIMIT_INIT(5,
> > > > 1);
> > > > > >              char *s = ovn_flow_to_string(f);
> > > > > >              VLOG_WARN_RL(&rl, "found duplicate flow %s for
parent
> > > > > > "UUID_FMT,
> > > > > >
> > > > > > Hopefully that will get you past the test failures and you can
push
> > > > > > both it and the rest of the last incremental processing patch.
> > > > >
> > > > > OK...
> > > > >
> > > > > Help me understand why that change is correct.
> > > > >
> > > > > I deleted the table_id and priority checks because
ovn_flow_lookup()
> > > > > only returns flows where those are equal anyhow, so they seemed
> > > > > completely redundant.  That leaves the actions comparison.  Why
would
> > we
> > > > > report a duplicate only if the actions are the same?  It's
actually
> > > > > worse, isn't it, if the actions are different, because then the
flows
> > > > > aren't just duplicates, they're inconsistent with each other.
> > > >
> > > > Actually, you are correct in that I'm doing something I shouldn't,
but
> > > > your solution is also doing something it shouldn't.  Yes, my
keeping
> > > > rules that are inconsistent is wrong, but discarding rules where
> > > > the action changes (which is what you proposed) doesn't work
either.
> > > >
> > > > How about if we split the difference?  First, verify that the uuid
> > > > is the same, then check the actions.  If they are the same, spawn a
> > > > warning, and if they aren't replace the old actions with the new
ones.
> > > > This appears to pass unit tests as well (and I think it's more
> > correct),
> > > > in which case the difference becomes:
> > > >
> > > > diff --git a/ovn/controller/ofctrl.c b/ovn/controller/ofctrl.c
> > > > index d10dcc6..0b8326e 100644
> > > > --- a/ovn/controller/ofctrl.c
> > > > +++ b/ovn/controller/ofctrl.c
> > > > @@ -554,13 +554,21 @@ ofctrl_add_flow(uint8_t table_id, uint16_t
> > priority,
> > > >      struct ovn_flow *d;
> > > >      LIST_FOR_EACH (d, list_node, &existing) {
> > > >          if (uuid_equals(&f->uuid, &d->uuid)) {
> > > > -            static struct vlog_rate_limit rl =
VLOG_RATE_LIMIT_INIT(5,
> > 1);
> > > > -            char *s = ovn_flow_to_string(f);
> > > > -            VLOG_WARN_RL(&rl, "found duplicate flow %s for parent
> > > > "UUID_FMT,
> > > > -                         s, UUID_ARGS(&f->uuid));
> > > > -            ovn_flow_destroy(f);
> > > > -            free(s);
> > > > -            return;
> > > > +            if (ofpacts_equal(f->ofpacts, f->ofpacts_len,
> > > > +                              d->ofpacts, d->ofpacts_len)) {
> > > > +                static struct vlog_rate_limit rl =
> > VLOG_RATE_LIMIT_INIT(5,
> > > > 1);
> > > > +                char *s = ovn_flow_to_string(f);
> > > > +                VLOG_WARN_RL(&rl, "found duplicate flow %s for
parent
> > > > "UUID_FMT
> > > > +                             s, UUID_ARGS(&f->uuid));
> > > > +                ovn_flow_destroy(f);
> > > > +                free(s);
> > > > +                return;
> > > > +            } else {
> > > > +                /* Update the action. */
> > > > +                free(d->ofpacts);
> > > > +                d->ofpacts = xmemdup(f->ofpacts, f->ofpacts_len);
> > > > +                d->ofpacts_len = f->ofpacts_len;
> > > > +            }
> > > >          }
> > > >      }
> > > >
> > >
> > > I'm OK with updating the actions, but I'd prefer to warn about it
anyway
> > > because I still think it's a bug; if the "new" actions happen to work
> > > better than the "old" actions, I think that's just a coincidence.
> >
> > While I think that's going to be a *lot* of warnings, I'm ok with
warning
> > in that case as well for now and we can argue the point later on after
we
> > have more operational experience with the patch set.
>
> So do we have a lot of code that's using ofctrl_add_flow() to actually
> replace an existing flow instead of adding one?  Maybe that code should
> be using ofctrl_set_flow() instead?
>
> I'm trying to understand the intent here instead of just the code.
>
> > You want to add the additional warning code or should I?
>
> I'm happy to write the code once I understand what's going on.

Sorry for breaking the thread but while I can see your reply in the
archives,
something in the server chain isn't it dropping it into my mailbox,
and I don't want to look like I'm ignoring it.

Based on looking through the logs when running the unit tests, I believe
(but I don't know for sure) that the vast majority of warnings
will come from creating a completely identical flow (and I admit that I
don't know why *yet* - that was a topic for another day) - the reason I
can't say for sure is that since I've not had a separate message handling
the action change case, I've had nothing to compare to.

Since we are going to warn on both branches, I'd like to have
the warning message be different so that there is a canary to work
backwards
from to figure out what's driving what. Something like the following:

+                return;
+            }
         }

Does that make sense?

Ryan

Comments

Ben Pfaff July 20, 2016, 12:08 a.m. UTC | #1
On Tue, Jul 19, 2016 at 04:55:49PM -0500, Ryan Moats wrote:
> 
> 
> > On Tue, Jul 19, 2016 at 02:57:43PM -0500, Ryan Moats wrote:
> > > Ben Pfaff <blp at ovn.org> wrote on 07/19/2016 02:54:36 PM:
> > >
> > > > From: Ben Pfaff <blp at ovn.org>
> > > To: Ryan Moats/Omaha/IBM at IBMUS>
> > > Cc: dev at openvswitch.org
> > > Date: 07/19/2016 02:54 PM
> > > > Subject: Re: [ovs-dev, v23, 2/2] ovn-controller: Add incremental
> > > > processing to lflow_run and physical_run
> > > >
> > > > On Tue, Jul 19, 2016 at 02:18:11PM -0500, Ryan Moats wrote:
> > > > >
> > > > >
> > > > > Ben Pfaff <blp at ovn.org> wrote on 07/19/2016 01:39:43 PM:
> > > > >
> > > > > > From: Ben Pfaff <blp at ovn.org>
> > > > > > To: Ryan Moats/Omaha/IBM at IBMUS
> > > > > > Cc: dev at openvswitch.org
> > > > > > Date: 07/19/2016 01:41 PM
> > > > > > Subject: Re: [ovs-dev, v23, 2/2] ovn-controller: Add incremental
> > > > > > processing to lflow_run and physical_run
> > > > > >
> > > > > > On Tue, Jul 19, 2016 at 08:10:05AM -0500, Ryan Moats wrote:
> > > > > > >
> > > > > > >
> > > > > > > Ben Pfaff <blp at ovn.org> wrote on 07/19/2016 12:57:23 AM:
> > > > > > >
> > > > > > > > From: Ben Pfaff <blp at ovn.org>
> > > > > > > > To: Ryan Moats/Omaha/IBM at IBMUS
> > > > > > > > Cc: dev at openvswitch.org
> > > > > > > > Date: 07/19/2016 12:57 AM
> > > > > > > > Subject: Re: [ovs-dev, v23, 2/2] ovn-controller: Add
> incremental
> > > > > > > > processing to lflow_run and physical_run
> > > > > > > >
> > > > > > > > On Mon, Jul 18, 2016 at 04:21:17PM -0500, Ryan Moats wrote:
> > > > > > > > > This code changes to allow incremental processing of the
> > > > > > > > > logical flow and physical binding tables whenver possible.
> > > > > > > > >
> > > > > > > > > Note: flows created by physical_run for multicast_groups
> are
> > > > > > > > > *NOT* handled incrementally due to to be solved issues
> > > > > > > > > with GWs and local routers.
> > > > > > > > >
> > > > > > > > > Signed-off-by: Ryan Moats <rmoats at us.ibm.com>
> > > > > > > >
> > > > > > > > With this applied, I get a number of apparently consistent
> test
> > > > > failures
> > > > > > > > (literally no passes in 5+ runs) that I don't see, at least
> not
> > > > > > > > consistently, if I revert it:
> > > > > > > >
> > > > > > > >     OVN end-to-end tests
> > > > > > > >
> > > > > > > >     2106: ovn -- vtep: 3 HVs, 1 VIFs/HV, 1 GW, 1 LS
> FAILED
> > > > > > > (ovn.at:1307)
> > > > > > > >     2107: ovn -- 3 HVs, 1 VIFs/HV, 1 software GW, 1 LS
> FAILED
> > > > > > > (ovn.at:1473)
> > > > > > > >     2108: ovn -- 3 HVs, 3 LS, 3 lports/LS, 1 LR
> FAILED
> > > > > > > (ovn.at:1887)
> > > > > > > >     2110: ovn -- 2 HVs, 2 LS, 1 lport/LS, 2 peer LRs
> FAILED
> > > > > > > (ovn.at:2416)
> > > > > > > >     2115: ovn -- 2 HVs, 3 LRs connected via LS, static routes
> > > FAILED
> > > > > > > > (ovn.at:3053)
> > > > > > > >
> > > > > > > > Do these pass reasonably often for you?  (Maybe my changes to
> the
> > > > > > > > previous patch screwed something up?)
> > > > > > > >
> > > > > > >
> > > > > > > First, not all of the above cases failed for me even with a
> rebase
> > > of
> > > > > the
> > > > > > > last patch onto what is currently in master - the last two
> passed
> > > for
> > > > > me
> > > > > > > on either the first and second run, so I'm looking at the first
> > > three
> > > > > for
> > > > > > > the rest of this message.
> > > > > > >
> > > > > > > > Double checking by going back to my original first patch
> verified
> > > that
> > > > > all
> > > > > > > of the tests in question passed.  Going through and checking
> each
> > > of
> > > > > the
> > > > > > > changes that you made to the first patch, the following is the
> diff
> > > I
> > > > > > > needed
> > > > > > > to make things work again:
> > > > > > >
> > > > > > > diff --git a/ovn/controller/ofctrl.c b/ovn/controller/ofctrl.c
> > > > > > > index d10dcc6..e113213 100644
> > > > > > > --- a/ovn/controller/ofctrl.c
> > > > > > > +++ b/ovn/controller/ofctrl.c
> > > > > > > @@ -553,7 +553,10 @@ ofctrl_add_flow(uint8_t table_id, uint16_t
> > > > > priority,
> > > > > > >      ovn_flow_lookup(&match_flow_table, f, &existing);
> > > > > > >      struct ovn_flow *d;
> > > > > > >      LIST_FOR_EACH (d, list_node, &existing) {
> > > > > > > -        if (uuid_equals(&f->uuid, &d->uuid)) {
> > > > > > > +        if (f->table_id == d->table_id && f->priority == d->
> > > priority
> > > > > > > +            && ofpacts_equal(f->ofpacts, f->ofpacts_len,
> > > > > > > +                             d->ofpacts, d->ofpacts_len)
> > > > > > > +            && uuid_equals(&f->uuid, &d->uuid)) {
> > > > > > >              static struct vlog_rate_limit rl =
> > > VLOG_RATE_LIMIT_INIT(5,
> > > > > 1);
> > > > > > >              char *s = ovn_flow_to_string(f);
> > > > > > >              VLOG_WARN_RL(&rl, "found duplicate flow %s for
> parent
> > > > > > > "UUID_FMT,
> > > > > > >
> > > > > > > Hopefully that will get you past the test failures and you can
> push
> > > > > > > both it and the rest of the last incremental processing patch.
> > > > > >
> > > > > > OK...
> > > > > >
> > > > > > Help me understand why that change is correct.
> > > > > >
> > > > > > I deleted the table_id and priority checks because
> ovn_flow_lookup()
> > > > > > only returns flows where those are equal anyhow, so they seemed
> > > > > > completely redundant.  That leaves the actions comparison.  Why
> would
> > > we
> > > > > > report a duplicate only if the actions are the same?  It's
> actually
> > > > > > worse, isn't it, if the actions are different, because then the
> flows
> > > > > > aren't just duplicates, they're inconsistent with each other.
> > > > >
> > > > > Actually, you are correct in that I'm doing something I shouldn't,
> but
> > > > > your solution is also doing something it shouldn't.  Yes, my
> keeping
> > > > > rules that are inconsistent is wrong, but discarding rules where
> > > > > the action changes (which is what you proposed) doesn't work
> either.
> > > > >
> > > > > How about if we split the difference?  First, verify that the uuid
> > > > > is the same, then check the actions.  If they are the same, spawn a
> > > > > warning, and if they aren't replace the old actions with the new
> ones.
> > > > > This appears to pass unit tests as well (and I think it's more
> > > correct),
> > > > > in which case the difference becomes:
> > > > >
> > > > > diff --git a/ovn/controller/ofctrl.c b/ovn/controller/ofctrl.c
> > > > > index d10dcc6..0b8326e 100644
> > > > > --- a/ovn/controller/ofctrl.c
> > > > > +++ b/ovn/controller/ofctrl.c
> > > > > @@ -554,13 +554,21 @@ ofctrl_add_flow(uint8_t table_id, uint16_t
> > > priority,
> > > > >      struct ovn_flow *d;
> > > > >      LIST_FOR_EACH (d, list_node, &existing) {
> > > > >          if (uuid_equals(&f->uuid, &d->uuid)) {
> > > > > -            static struct vlog_rate_limit rl =
> VLOG_RATE_LIMIT_INIT(5,
> > > 1);
> > > > > -            char *s = ovn_flow_to_string(f);
> > > > > -            VLOG_WARN_RL(&rl, "found duplicate flow %s for parent
> > > > > "UUID_FMT,
> > > > > -                         s, UUID_ARGS(&f->uuid));
> > > > > -            ovn_flow_destroy(f);
> > > > > -            free(s);
> > > > > -            return;
> > > > > +            if (ofpacts_equal(f->ofpacts, f->ofpacts_len,
> > > > > +                              d->ofpacts, d->ofpacts_len)) {
> > > > > +                static struct vlog_rate_limit rl =
> > > VLOG_RATE_LIMIT_INIT(5,
> > > > > 1);
> > > > > +                char *s = ovn_flow_to_string(f);
> > > > > +                VLOG_WARN_RL(&rl, "found duplicate flow %s for
> parent
> > > > > "UUID_FMT
> > > > > +                             s, UUID_ARGS(&f->uuid));
> > > > > +                ovn_flow_destroy(f);
> > > > > +                free(s);
> > > > > +                return;
> > > > > +            } else {
> > > > > +                /* Update the action. */
> > > > > +                free(d->ofpacts);
> > > > > +                d->ofpacts = xmemdup(f->ofpacts, f->ofpacts_len);
> > > > > +                d->ofpacts_len = f->ofpacts_len;
> > > > > +            }
> > > > >          }
> > > > >      }
> > > > >
> > > >
> > > > I'm OK with updating the actions, but I'd prefer to warn about it
> anyway
> > > > because I still think it's a bug; if the "new" actions happen to work
> > > > better than the "old" actions, I think that's just a coincidence.
> > >
> > > While I think that's going to be a *lot* of warnings, I'm ok with
> warning
> > > in that case as well for now and we can argue the point later on after
> we
> > > have more operational experience with the patch set.
> >
> > So do we have a lot of code that's using ofctrl_add_flow() to actually
> > replace an existing flow instead of adding one?  Maybe that code should
> > be using ofctrl_set_flow() instead?
> >
> > I'm trying to understand the intent here instead of just the code.
> >
> > > You want to add the additional warning code or should I?
> >
> > I'm happy to write the code once I understand what's going on.
> 
> Sorry for breaking the thread but while I can see your reply in the
> archives,
> something in the server chain isn't it dropping it into my mailbox,
> and I don't want to look like I'm ignoring it.
> 
> Based on looking through the logs when running the unit tests, I believe
> (but I don't know for sure) that the vast majority of warnings
> will come from creating a completely identical flow (and I admit that I
> don't know why *yet* - that was a topic for another day) - the reason I
> can't say for sure is that since I've not had a separate message handling
> the action change case, I've had nothing to compare to.
> 
> Since we are going to warn on both branches, I'd like to have
> the warning message be different so that there is a canary to work
> backwards
> from to figure out what's driving what. Something like the following:
> 
> diff --git a/ovn/controller/ofctrl.c b/ovn/controller/ofctrl.c
> index d10dcc6..7fd6ff8 100644
> --- a/ovn/controller/ofctrl.c
> +++ b/ovn/controller/ofctrl.c
> @@ -554,13 +554,28 @@ ofctrl_add_flow(uint8_t table_id, uint16_t priority,
>      struct ovn_flow *d;
>      LIST_FOR_EACH (d, list_node, &existing) {
>          if (uuid_equals(&f->uuid, &d->uuid)) {
> -            static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1);
> -            char *s = ovn_flow_to_string(f);
> -            VLOG_WARN_RL(&rl, "found duplicate flow %s for parent
> "UUID_FMT,
> -                         s, UUID_ARGS(&f->uuid));
> -            ovn_flow_destroy(f);
> -            free(s);
> -            return;
> +            if (ofpacts_equal(f->ofpacts, f->ofpacts_len,
> +                              d->ofpacts, d->ofpacts_len)) {
> +                static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5,
> 1);
> +                char *s = ovn_flow_to_string(f);
> +                VLOG_WARN_RL(&rl, "found duplicate flow %s for parent
> "UUID_FMT
> +                             s, UUID_ARGS(&f->uuid));
> +                ovn_flow_destroy(f);
> +                free(s);
> +                return;
> +            } else {
> +                /* Update the action. */
> +                free(d->ofpacts);
> +                d->ofpacts = xmemdup(f->ofpacts, f->ofpacts_len);
> +                d->ofpacts_len = f->ofpacts_len;
> +                static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5,
> 1);
> +                char *s = ovn_flow_to_string(f);
> +                VLOG_WARN_RL(&rl, "found flow %s with modified action for
> paren
> +                             s, UUID_ARGS(&f->uuid));
> +                ovn_flow_destroy(f);
> +                free(s);
> +                return;
> +            }
>          }
> 
> Does that make sense?

Here's a version for your review:
        http://openvswitch.org/pipermail/dev/2016-July/075709.html
        http://openvswitch.org/pipermail/dev/2016-July/075710.html
Ryan Moats July 20, 2016, 12:19 a.m. UTC | #2
Ben Pfaff <blp@ovn.org> wrote on 07/19/2016 07:08:04 PM:

> From: Ben Pfaff <blp@ovn.org>
> To: Ryan Moats/Omaha/IBM@IBMUS
> Cc: ovs-dev <dev@openvswitch.org>
> Date: 07/19/2016 07:08 PM
> Subject: Re: [ovs-dev] [ovs-dev, v23, 2/2] ovn-controller: Add
> incremental processing to lflow_run and physical_run
>
> On Tue, Jul 19, 2016 at 04:55:49PM -0500, Ryan Moats wrote:
> >
> >
> > > On Tue, Jul 19, 2016 at 02:57:43PM -0500, Ryan Moats wrote:
> > > > Ben Pfaff <blp at ovn.org> wrote on 07/19/2016 02:54:36 PM:
> > > >
> > > > > From: Ben Pfaff <blp at ovn.org>
> > > > To: Ryan Moats/Omaha/IBM at IBMUS>
> > > > Cc: dev at openvswitch.org
> > > > Date: 07/19/2016 02:54 PM
> > > > > Subject: Re: [ovs-dev, v23, 2/2] ovn-controller: Add incremental
> > > > > processing to lflow_run and physical_run
> > > > >
> > > > > On Tue, Jul 19, 2016 at 02:18:11PM -0500, Ryan Moats wrote:
> > > > > >
> > > > > >
> > > > > > Ben Pfaff <blp at ovn.org> wrote on 07/19/2016 01:39:43 PM:
> > > > > >
> > > > > > > From: Ben Pfaff <blp at ovn.org>
> > > > > > > To: Ryan Moats/Omaha/IBM at IBMUS
> > > > > > > Cc: dev at openvswitch.org
> > > > > > > Date: 07/19/2016 01:41 PM
> > > > > > > Subject: Re: [ovs-dev, v23, 2/2] ovn-controller: Add
incremental
> > > > > > > processing to lflow_run and physical_run
> > > > > > >
> > > > > > > On Tue, Jul 19, 2016 at 08:10:05AM -0500, Ryan Moats wrote:
> > > > > > > >
> > > > > > > >
> > > > > > > > Ben Pfaff <blp at ovn.org> wrote on 07/19/2016 12:57:23 AM:
> > > > > > > >
> > > > > > > > > From: Ben Pfaff <blp at ovn.org>
> > > > > > > > > To: Ryan Moats/Omaha/IBM at IBMUS
> > > > > > > > > Cc: dev at openvswitch.org
> > > > > > > > > Date: 07/19/2016 12:57 AM
> > > > > > > > > Subject: Re: [ovs-dev, v23, 2/2] ovn-controller: Add
> > incremental
> > > > > > > > > processing to lflow_run and physical_run
> > > > > > > > >
> > > > > > > > > On Mon, Jul 18, 2016 at 04:21:17PM -0500, Ryan Moats
wrote:
> > > > > > > > > > This code changes to allow incremental processing of
the
> > > > > > > > > > logical flow and physical binding tables whenver
possible.
> > > > > > > > > >
> > > > > > > > > > Note: flows created by physical_run for
multicast_groups
> > are
> > > > > > > > > > *NOT* handled incrementally due to to be solved issues
> > > > > > > > > > with GWs and local routers.
> > > > > > > > > >
> > > > > > > > > > Signed-off-by: Ryan Moats <rmoats at us.ibm.com>
> > > > > > > > >
> > > > > > > > > With this applied, I get a number of apparently
consistent
> > test
> > > > > > failures
> > > > > > > > > (literally no passes in 5+ runs) that I don't see, at
least
> > not
> > > > > > > > > consistently, if I revert it:
> > > > > > > > >
> > > > > > > > >     OVN end-to-end tests
> > > > > > > > >
> > > > > > > > >     2106: ovn -- vtep: 3 HVs, 1 VIFs/HV, 1 GW, 1 LS
> > FAILED
> > > > > > > > (ovn.at:1307)
> > > > > > > > >     2107: ovn -- 3 HVs, 1 VIFs/HV, 1 software GW, 1 LS
> > FAILED
> > > > > > > > (ovn.at:1473)
> > > > > > > > >     2108: ovn -- 3 HVs, 3 LS, 3 lports/LS, 1 LR
> > FAILED
> > > > > > > > (ovn.at:1887)
> > > > > > > > >     2110: ovn -- 2 HVs, 2 LS, 1 lport/LS, 2 peer LRs
> > FAILED
> > > > > > > > (ovn.at:2416)
> > > > > > > > >     2115: ovn -- 2 HVs, 3 LRs connected via LS, static
routes
> > > > FAILED
> > > > > > > > > (ovn.at:3053)
> > > > > > > > >
> > > > > > > > > Do these pass reasonably often for you?  (Maybe my
changes to
> > the
> > > > > > > > > previous patch screwed something up?)
> > > > > > > > >
> > > > > > > >
> > > > > > > > First, not all of the above cases failed for me even with a
> > rebase
> > > > of
> > > > > > the
> > > > > > > > last patch onto what is currently in master - the last two
> > passed
> > > > for
> > > > > > me
> > > > > > > > on either the first and second run, so I'm looking at the
first
> > > > three
> > > > > > for
> > > > > > > > the rest of this message.
> > > > > > > >
> > > > > > > > > Double checking by going back to my original first patch
> > verified
> > > > that
> > > > > > all
> > > > > > > > of the tests in question passed.  Going through and
checking
> > each
> > > > of
> > > > > > the
> > > > > > > > changes that you made to the first patch, the following is
the
> > diff
> > > > I
> > > > > > > > needed
> > > > > > > > to make things work again:
> > > > > > > >
> > > > > > > > diff --git a/ovn/controller/ofctrl.c
b/ovn/controller/ofctrl.c
> > > > > > > > index d10dcc6..e113213 100644
> > > > > > > > --- a/ovn/controller/ofctrl.c
> > > > > > > > +++ b/ovn/controller/ofctrl.c
> > > > > > > > @@ -553,7 +553,10 @@ ofctrl_add_flow(uint8_t table_id,
uint16_t
> > > > > > priority,
> > > > > > > >      ovn_flow_lookup(&match_flow_table, f, &existing);
> > > > > > > >      struct ovn_flow *d;
> > > > > > > >      LIST_FOR_EACH (d, list_node, &existing) {
> > > > > > > > -        if (uuid_equals(&f->uuid, &d->uuid)) {
> > > > > > > > +        if (f->table_id == d->table_id && f->priority ==
d->
> > > > priority
> > > > > > > > +            && ofpacts_equal(f->ofpacts, f->ofpacts_len,
> > > > > > > > +                             d->ofpacts, d->ofpacts_len)
> > > > > > > > +            && uuid_equals(&f->uuid, &d->uuid)) {
> > > > > > > >              static struct vlog_rate_limit rl =
> > > > VLOG_RATE_LIMIT_INIT(5,
> > > > > > 1);
> > > > > > > >              char *s = ovn_flow_to_string(f);
> > > > > > > >              VLOG_WARN_RL(&rl, "found duplicate flow %s for
> > parent
> > > > > > > > "UUID_FMT,
> > > > > > > >
> > > > > > > > Hopefully that will get you past the test failures and you
can
> > push
> > > > > > > > both it and the rest of the last incremental processing
patch.
> > > > > > >
> > > > > > > OK...
> > > > > > >
> > > > > > > Help me understand why that change is correct.
> > > > > > >
> > > > > > > I deleted the table_id and priority checks because
> > ovn_flow_lookup()
> > > > > > > only returns flows where those are equal anyhow, so they
seemed
> > > > > > > completely redundant.  That leaves the actions comparison.
Why
> > would
> > > > we
> > > > > > > report a duplicate only if the actions are the same?  It's
> > actually
> > > > > > > worse, isn't it, if the actions are different, because then
the
> > flows
> > > > > > > aren't just duplicates, they're inconsistent with each other.
> > > > > >
> > > > > > Actually, you are correct in that I'm doing something I
shouldn't,
> > but
> > > > > > your solution is also doing something it shouldn't.  Yes, my
> > keeping
> > > > > > rules that are inconsistent is wrong, but discarding rules
where
> > > > > > the action changes (which is what you proposed) doesn't work
> > either.
> > > > > >
> > > > > > How about if we split the difference?  First, verify that the
uuid
> > > > > > is the same, then check the actions.  If they are the same,
spawn a
> > > > > > warning, and if they aren't replace the old actions with the
new
> > ones.
> > > > > > This appears to pass unit tests as well (and I think it's more
> > > > correct),
> > > > > > in which case the difference becomes:
> > > > > >
> > > > > > diff --git a/ovn/controller/ofctrl.c b/ovn/controller/ofctrl.c
> > > > > > index d10dcc6..0b8326e 100644
> > > > > > --- a/ovn/controller/ofctrl.c
> > > > > > +++ b/ovn/controller/ofctrl.c
> > > > > > @@ -554,13 +554,21 @@ ofctrl_add_flow(uint8_t table_id,
uint16_t
> > > > priority,
> > > > > >      struct ovn_flow *d;
> > > > > >      LIST_FOR_EACH (d, list_node, &existing) {
> > > > > >          if (uuid_equals(&f->uuid, &d->uuid)) {
> > > > > > -            static struct vlog_rate_limit rl =
> > VLOG_RATE_LIMIT_INIT(5,
> > > > 1);
> > > > > > -            char *s = ovn_flow_to_string(f);
> > > > > > -            VLOG_WARN_RL(&rl, "found duplicate flow %s for
parent
> > > > > > "UUID_FMT,
> > > > > > -                         s, UUID_ARGS(&f->uuid));
> > > > > > -            ovn_flow_destroy(f);
> > > > > > -            free(s);
> > > > > > -            return;
> > > > > > +            if (ofpacts_equal(f->ofpacts, f->ofpacts_len,
> > > > > > +                              d->ofpacts, d->ofpacts_len)) {
> > > > > > +                static struct vlog_rate_limit rl =
> > > > VLOG_RATE_LIMIT_INIT(5,
> > > > > > 1);
> > > > > > +                char *s = ovn_flow_to_string(f);
> > > > > > +                VLOG_WARN_RL(&rl, "found duplicate flow %s for
> > parent
> > > > > > "UUID_FMT
> > > > > > +                             s, UUID_ARGS(&f->uuid));
> > > > > > +                ovn_flow_destroy(f);
> > > > > > +                free(s);
> > > > > > +                return;
> > > > > > +            } else {
> > > > > > +                /* Update the action. */
> > > > > > +                free(d->ofpacts);
> > > > > > +                d->ofpacts = xmemdup(f->ofpacts, f->
ofpacts_len);
> > > > > > +                d->ofpacts_len = f->ofpacts_len;
> > > > > > +            }
> > > > > >          }
> > > > > >      }
> > > > > >
> > > > >
> > > > > I'm OK with updating the actions, but I'd prefer to warn about it
> > anyway
> > > > > because I still think it's a bug; if the "new" actions happen to
work
> > > > > better than the "old" actions, I think that's just a coincidence.
> > > >
> > > > While I think that's going to be a *lot* of warnings, I'm ok with
> > warning
> > > > in that case as well for now and we can argue the point later on
after
> > we
> > > > have more operational experience with the patch set.
> > >
> > > So do we have a lot of code that's using ofctrl_add_flow() to
actually
> > > replace an existing flow instead of adding one?  Maybe that code
should
> > > be using ofctrl_set_flow() instead?
> > >
> > > I'm trying to understand the intent here instead of just the code.
> > >
> > > > You want to add the additional warning code or should I?
> > >
> > > I'm happy to write the code once I understand what's going on.
> >
> > Sorry for breaking the thread but while I can see your reply in the
> > archives,
> > something in the server chain isn't it dropping it into my mailbox,
> > and I don't want to look like I'm ignoring it.
> >
> > Based on looking through the logs when running the unit tests, I
believe
> > (but I don't know for sure) that the vast majority of warnings
> > will come from creating a completely identical flow (and I admit that I
> > don't know why *yet* - that was a topic for another day) - the reason I
> > can't say for sure is that since I've not had a separate message
handling
> > the action change case, I've had nothing to compare to.
> >
> > Since we are going to warn on both branches, I'd like to have
> > the warning message be different so that there is a canary to work
> > backwards
> > from to figure out what's driving what. Something like the following:
> >
> > diff --git a/ovn/controller/ofctrl.c b/ovn/controller/ofctrl.c
> > index d10dcc6..7fd6ff8 100644
> > --- a/ovn/controller/ofctrl.c
> > +++ b/ovn/controller/ofctrl.c
> > @@ -554,13 +554,28 @@ ofctrl_add_flow(uint8_t table_id, uint16_t
priority,
> >      struct ovn_flow *d;
> >      LIST_FOR_EACH (d, list_node, &existing) {
> >          if (uuid_equals(&f->uuid, &d->uuid)) {
> > -            static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5,
1);
> > -            char *s = ovn_flow_to_string(f);
> > -            VLOG_WARN_RL(&rl, "found duplicate flow %s for parent
> > "UUID_FMT,
> > -                         s, UUID_ARGS(&f->uuid));
> > -            ovn_flow_destroy(f);
> > -            free(s);
> > -            return;
> > +            if (ofpacts_equal(f->ofpacts, f->ofpacts_len,
> > +                              d->ofpacts, d->ofpacts_len)) {
> > +                static struct vlog_rate_limit rl =
VLOG_RATE_LIMIT_INIT(5,
> > 1);
> > +                char *s = ovn_flow_to_string(f);
> > +                VLOG_WARN_RL(&rl, "found duplicate flow %s for parent
> > "UUID_FMT
> > +                             s, UUID_ARGS(&f->uuid));
> > +                ovn_flow_destroy(f);
> > +                free(s);
> > +                return;
> > +            } else {
> > +                /* Update the action. */
> > +                free(d->ofpacts);
> > +                d->ofpacts = xmemdup(f->ofpacts, f->ofpacts_len);
> > +                d->ofpacts_len = f->ofpacts_len;
> > +                static struct vlog_rate_limit rl =
VLOG_RATE_LIMIT_INIT(5,
> > 1);
> > +                char *s = ovn_flow_to_string(f);
> > +                VLOG_WARN_RL(&rl, "found flow %s with modified action
for
> > paren
> > +                             s, UUID_ARGS(&f->uuid));
> > +                ovn_flow_destroy(f);
> > +                free(s);
> > +                return;
> > +            }
> >          }
> >
> > Does that make sense?
>
> Here's a version for your review:
>         http://openvswitch.org/pipermail/dev/2016-July/075709.html
>         http://openvswitch.org/pipermail/dev/2016-July/075710.html
>

Looking at them now

Ryan
diff mbox

Patch

diff --git a/ovn/controller/ofctrl.c b/ovn/controller/ofctrl.c
index d10dcc6..7fd6ff8 100644
--- a/ovn/controller/ofctrl.c
+++ b/ovn/controller/ofctrl.c
@@ -554,13 +554,28 @@  ofctrl_add_flow(uint8_t table_id, uint16_t priority,
     struct ovn_flow *d;
     LIST_FOR_EACH (d, list_node, &existing) {
         if (uuid_equals(&f->uuid, &d->uuid)) {
-            static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1);
-            char *s = ovn_flow_to_string(f);
-            VLOG_WARN_RL(&rl, "found duplicate flow %s for parent
"UUID_FMT,
-                         s, UUID_ARGS(&f->uuid));
-            ovn_flow_destroy(f);
-            free(s);
-            return;
+            if (ofpacts_equal(f->ofpacts, f->ofpacts_len,
+                              d->ofpacts, d->ofpacts_len)) {
+                static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5,
1);
+                char *s = ovn_flow_to_string(f);
+                VLOG_WARN_RL(&rl, "found duplicate flow %s for parent
"UUID_FMT
+                             s, UUID_ARGS(&f->uuid));
+                ovn_flow_destroy(f);
+                free(s);
+                return;
+            } else {
+                /* Update the action. */
+                free(d->ofpacts);
+                d->ofpacts = xmemdup(f->ofpacts, f->ofpacts_len);
+                d->ofpacts_len = f->ofpacts_len;
+                static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5,
1);
+                char *s = ovn_flow_to_string(f);
+                VLOG_WARN_RL(&rl, "found flow %s with modified action for
paren
+                             s, UUID_ARGS(&f->uuid));
+                ovn_flow_destroy(f);
+                free(s);