diff mbox

[ovs-dev,ovs-dev,v23,2/2] ovn-controller: Add incremental processing to lflow_run and physical_run

Message ID OF37D618BB.8FA4C4F2-ON86257FF5.00672CDA-86257FF5.006A0922@notes.na.collabserv.com
State Not Applicable
Headers show

Commit Message

Ryan Moats July 19, 2016, 7:18 p.m. UTC
Ben Pfaff <blp@ovn.org> wrote on 07/19/2016 01:39:43 PM:

> From: Ben Pfaff <blp@ovn.org>
> To: Ryan Moats/Omaha/IBM@IBMUS
> Cc: dev@openvswitch.org
> Date: 07/19/2016 01:41 PM
> Subject: Re: [ovs-dev, v23, 2/2] ovn-controller: Add incremental
> processing to lflow_run and physical_run
>
> On Tue, Jul 19, 2016 at 08:10:05AM -0500, Ryan Moats wrote:
> >
> >
> > Ben Pfaff <blp@ovn.org> wrote on 07/19/2016 12:57:23 AM:
> >
> > > From: Ben Pfaff <blp@ovn.org>
> > > To: Ryan Moats/Omaha/IBM@IBMUS
> > > Cc: dev@openvswitch.org
> > > Date: 07/19/2016 12:57 AM
> > > Subject: Re: [ovs-dev, v23, 2/2] ovn-controller: Add incremental
> > > processing to lflow_run and physical_run
> > >
> > > On Mon, Jul 18, 2016 at 04:21:17PM -0500, Ryan Moats wrote:
> > > > This code changes to allow incremental processing of the
> > > > logical flow and physical binding tables whenver possible.
> > > >
> > > > Note: flows created by physical_run for multicast_groups are
> > > > *NOT* handled incrementally due to to be solved issues
> > > > with GWs and local routers.
> > > >
> > > > Signed-off-by: Ryan Moats <rmoats@us.ibm.com>
> > >
> > > With this applied, I get a number of apparently consistent test
failures
> > > (literally no passes in 5+ runs) that I don't see, at least not
> > > consistently, if I revert it:
> > >
> > >     OVN end-to-end tests
> > >
> > >     2106: ovn -- vtep: 3 HVs, 1 VIFs/HV, 1 GW, 1 LS       FAILED
> > (ovn.at:1307)
> > >     2107: ovn -- 3 HVs, 1 VIFs/HV, 1 software GW, 1 LS    FAILED
> > (ovn.at:1473)
> > >     2108: ovn -- 3 HVs, 3 LS, 3 lports/LS, 1 LR           FAILED
> > (ovn.at:1887)
> > >     2110: ovn -- 2 HVs, 2 LS, 1 lport/LS, 2 peer LRs      FAILED
> > (ovn.at:2416)
> > >     2115: ovn -- 2 HVs, 3 LRs connected via LS, static routes FAILED
> > > (ovn.at:3053)
> > >
> > > Do these pass reasonably often for you?  (Maybe my changes to the
> > > previous patch screwed something up?)
> > >
> >
> > First, not all of the above cases failed for me even with a rebase of
the
> > last patch onto what is currently in master - the last two passed for
me
> > on either the first and second run, so I'm looking at the first three
for
> > the rest of this message.
> >
> > Double checking by going back to my original first patch verified that
all
> > of the tests in question passed.  Going through and checking each of
the
> > changes that you made to the first patch, the following is the diff I
> > needed
> > to make things work again:
> >
> > diff --git a/ovn/controller/ofctrl.c b/ovn/controller/ofctrl.c
> > index d10dcc6..e113213 100644
> > --- a/ovn/controller/ofctrl.c
> > +++ b/ovn/controller/ofctrl.c
> > @@ -553,7 +553,10 @@ ofctrl_add_flow(uint8_t table_id, uint16_t
priority,
> >      ovn_flow_lookup(&match_flow_table, f, &existing);
> >      struct ovn_flow *d;
> >      LIST_FOR_EACH (d, list_node, &existing) {
> > -        if (uuid_equals(&f->uuid, &d->uuid)) {
> > +        if (f->table_id == d->table_id && f->priority == d->priority
> > +            && ofpacts_equal(f->ofpacts, f->ofpacts_len,
> > +                             d->ofpacts, d->ofpacts_len)
> > +            && uuid_equals(&f->uuid, &d->uuid)) {
> >              static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5,
1);
> >              char *s = ovn_flow_to_string(f);
> >              VLOG_WARN_RL(&rl, "found duplicate flow %s for parent
> > "UUID_FMT,
> >
> > Hopefully that will get you past the test failures and you can push
> > both it and the rest of the last incremental processing patch.
>
> OK...
>
> Help me understand why that change is correct.
>
> I deleted the table_id and priority checks because ovn_flow_lookup()
> only returns flows where those are equal anyhow, so they seemed
> completely redundant.  That leaves the actions comparison.  Why would we
> report a duplicate only if the actions are the same?  It's actually
> worse, isn't it, if the actions are different, because then the flows
> aren't just duplicates, they're inconsistent with each other.

Actually, you are correct in that I'm doing something I shouldn't, but
your solution is also doing something it shouldn't.  Yes, my keeping
rules that are inconsistent is wrong, but discarding rules where
the action changes (which is what you proposed) doesn't work either.

How about if we split the difference?  First, verify that the uuid
is the same, then check the actions.  If they are the same, spawn a
warning, and if they aren't replace the old actions with the new ones.
This appears to pass unit tests as well (and I think it's more correct),
in which case the difference becomes:

         }
     }

Comments

Ben Pfaff July 19, 2016, 7:54 p.m. UTC | #1
On Tue, Jul 19, 2016 at 02:18:11PM -0500, Ryan Moats wrote:
> 
> 
> Ben Pfaff <blp@ovn.org> wrote on 07/19/2016 01:39:43 PM:
> 
> > From: Ben Pfaff <blp@ovn.org>
> > To: Ryan Moats/Omaha/IBM@IBMUS
> > Cc: dev@openvswitch.org
> > Date: 07/19/2016 01:41 PM
> > Subject: Re: [ovs-dev, v23, 2/2] ovn-controller: Add incremental
> > processing to lflow_run and physical_run
> >
> > On Tue, Jul 19, 2016 at 08:10:05AM -0500, Ryan Moats wrote:
> > >
> > >
> > > Ben Pfaff <blp@ovn.org> wrote on 07/19/2016 12:57:23 AM:
> > >
> > > > From: Ben Pfaff <blp@ovn.org>
> > > > To: Ryan Moats/Omaha/IBM@IBMUS
> > > > Cc: dev@openvswitch.org
> > > > Date: 07/19/2016 12:57 AM
> > > > Subject: Re: [ovs-dev, v23, 2/2] ovn-controller: Add incremental
> > > > processing to lflow_run and physical_run
> > > >
> > > > On Mon, Jul 18, 2016 at 04:21:17PM -0500, Ryan Moats wrote:
> > > > > This code changes to allow incremental processing of the
> > > > > logical flow and physical binding tables whenver possible.
> > > > >
> > > > > Note: flows created by physical_run for multicast_groups are
> > > > > *NOT* handled incrementally due to to be solved issues
> > > > > with GWs and local routers.
> > > > >
> > > > > Signed-off-by: Ryan Moats <rmoats@us.ibm.com>
> > > >
> > > > With this applied, I get a number of apparently consistent test
> failures
> > > > (literally no passes in 5+ runs) that I don't see, at least not
> > > > consistently, if I revert it:
> > > >
> > > >     OVN end-to-end tests
> > > >
> > > >     2106: ovn -- vtep: 3 HVs, 1 VIFs/HV, 1 GW, 1 LS       FAILED
> > > (ovn.at:1307)
> > > >     2107: ovn -- 3 HVs, 1 VIFs/HV, 1 software GW, 1 LS    FAILED
> > > (ovn.at:1473)
> > > >     2108: ovn -- 3 HVs, 3 LS, 3 lports/LS, 1 LR           FAILED
> > > (ovn.at:1887)
> > > >     2110: ovn -- 2 HVs, 2 LS, 1 lport/LS, 2 peer LRs      FAILED
> > > (ovn.at:2416)
> > > >     2115: ovn -- 2 HVs, 3 LRs connected via LS, static routes FAILED
> > > > (ovn.at:3053)
> > > >
> > > > Do these pass reasonably often for you?  (Maybe my changes to the
> > > > previous patch screwed something up?)
> > > >
> > >
> > > First, not all of the above cases failed for me even with a rebase of
> the
> > > last patch onto what is currently in master - the last two passed for
> me
> > > on either the first and second run, so I'm looking at the first three
> for
> > > the rest of this message.
> > >
> > > Double checking by going back to my original first patch verified that
> all
> > > of the tests in question passed.  Going through and checking each of
> the
> > > changes that you made to the first patch, the following is the diff I
> > > needed
> > > to make things work again:
> > >
> > > diff --git a/ovn/controller/ofctrl.c b/ovn/controller/ofctrl.c
> > > index d10dcc6..e113213 100644
> > > --- a/ovn/controller/ofctrl.c
> > > +++ b/ovn/controller/ofctrl.c
> > > @@ -553,7 +553,10 @@ ofctrl_add_flow(uint8_t table_id, uint16_t
> priority,
> > >      ovn_flow_lookup(&match_flow_table, f, &existing);
> > >      struct ovn_flow *d;
> > >      LIST_FOR_EACH (d, list_node, &existing) {
> > > -        if (uuid_equals(&f->uuid, &d->uuid)) {
> > > +        if (f->table_id == d->table_id && f->priority == d->priority
> > > +            && ofpacts_equal(f->ofpacts, f->ofpacts_len,
> > > +                             d->ofpacts, d->ofpacts_len)
> > > +            && uuid_equals(&f->uuid, &d->uuid)) {
> > >              static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5,
> 1);
> > >              char *s = ovn_flow_to_string(f);
> > >              VLOG_WARN_RL(&rl, "found duplicate flow %s for parent
> > > "UUID_FMT,
> > >
> > > Hopefully that will get you past the test failures and you can push
> > > both it and the rest of the last incremental processing patch.
> >
> > OK...
> >
> > Help me understand why that change is correct.
> >
> > I deleted the table_id and priority checks because ovn_flow_lookup()
> > only returns flows where those are equal anyhow, so they seemed
> > completely redundant.  That leaves the actions comparison.  Why would we
> > report a duplicate only if the actions are the same?  It's actually
> > worse, isn't it, if the actions are different, because then the flows
> > aren't just duplicates, they're inconsistent with each other.
> 
> Actually, you are correct in that I'm doing something I shouldn't, but
> your solution is also doing something it shouldn't.  Yes, my keeping
> rules that are inconsistent is wrong, but discarding rules where
> the action changes (which is what you proposed) doesn't work either.
> 
> How about if we split the difference?  First, verify that the uuid
> is the same, then check the actions.  If they are the same, spawn a
> warning, and if they aren't replace the old actions with the new ones.
> This appears to pass unit tests as well (and I think it's more correct),
> in which case the difference becomes:
> 
> diff --git a/ovn/controller/ofctrl.c b/ovn/controller/ofctrl.c
> index d10dcc6..0b8326e 100644
> --- a/ovn/controller/ofctrl.c
> +++ b/ovn/controller/ofctrl.c
> @@ -554,13 +554,21 @@ ofctrl_add_flow(uint8_t table_id, uint16_t priority,
>      struct ovn_flow *d;
>      LIST_FOR_EACH (d, list_node, &existing) {
>          if (uuid_equals(&f->uuid, &d->uuid)) {
> -            static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1);
> -            char *s = ovn_flow_to_string(f);
> -            VLOG_WARN_RL(&rl, "found duplicate flow %s for parent
> "UUID_FMT,
> -                         s, UUID_ARGS(&f->uuid));
> -            ovn_flow_destroy(f);
> -            free(s);
> -            return;
> +            if (ofpacts_equal(f->ofpacts, f->ofpacts_len,
> +                              d->ofpacts, d->ofpacts_len)) {
> +                static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5,
> 1);
> +                char *s = ovn_flow_to_string(f);
> +                VLOG_WARN_RL(&rl, "found duplicate flow %s for parent
> "UUID_FMT
> +                             s, UUID_ARGS(&f->uuid));
> +                ovn_flow_destroy(f);
> +                free(s);
> +                return;
> +            } else {
> +                /* Update the action. */
> +                free(d->ofpacts);
> +                d->ofpacts = xmemdup(f->ofpacts, f->ofpacts_len);
> +                d->ofpacts_len = f->ofpacts_len;
> +            }
>          }
>      }
> 

I'm OK with updating the actions, but I'd prefer to warn about it anyway
because I still think it's a bug; if the "new" actions happen to work
better than the "old" actions, I think that's just a coincidence.
Ryan Moats July 19, 2016, 7:57 p.m. UTC | #2
Ben Pfaff <blp@ovn.org> wrote on 07/19/2016 02:54:36 PM:

> From: Ben Pfaff <blp@ovn.org>
> To: Ryan Moats/Omaha/IBM@IBMUS
> Cc: dev@openvswitch.org
> Date: 07/19/2016 02:54 PM
> Subject: Re: [ovs-dev, v23, 2/2] ovn-controller: Add incremental
> processing to lflow_run and physical_run
>
> On Tue, Jul 19, 2016 at 02:18:11PM -0500, Ryan Moats wrote:
> >
> >
> > Ben Pfaff <blp@ovn.org> wrote on 07/19/2016 01:39:43 PM:
> >
> > > From: Ben Pfaff <blp@ovn.org>
> > > To: Ryan Moats/Omaha/IBM@IBMUS
> > > Cc: dev@openvswitch.org
> > > Date: 07/19/2016 01:41 PM
> > > Subject: Re: [ovs-dev, v23, 2/2] ovn-controller: Add incremental
> > > processing to lflow_run and physical_run
> > >
> > > On Tue, Jul 19, 2016 at 08:10:05AM -0500, Ryan Moats wrote:
> > > >
> > > >
> > > > Ben Pfaff <blp@ovn.org> wrote on 07/19/2016 12:57:23 AM:
> > > >
> > > > > From: Ben Pfaff <blp@ovn.org>
> > > > > To: Ryan Moats/Omaha/IBM@IBMUS
> > > > > Cc: dev@openvswitch.org
> > > > > Date: 07/19/2016 12:57 AM
> > > > > Subject: Re: [ovs-dev, v23, 2/2] ovn-controller: Add incremental
> > > > > processing to lflow_run and physical_run
> > > > >
> > > > > On Mon, Jul 18, 2016 at 04:21:17PM -0500, Ryan Moats wrote:
> > > > > > This code changes to allow incremental processing of the
> > > > > > logical flow and physical binding tables whenver possible.
> > > > > >
> > > > > > Note: flows created by physical_run for multicast_groups are
> > > > > > *NOT* handled incrementally due to to be solved issues
> > > > > > with GWs and local routers.
> > > > > >
> > > > > > Signed-off-by: Ryan Moats <rmoats@us.ibm.com>
> > > > >
> > > > > With this applied, I get a number of apparently consistent test
> > failures
> > > > > (literally no passes in 5+ runs) that I don't see, at least not
> > > > > consistently, if I revert it:
> > > > >
> > > > >     OVN end-to-end tests
> > > > >
> > > > >     2106: ovn -- vtep: 3 HVs, 1 VIFs/HV, 1 GW, 1 LS       FAILED
> > > > (ovn.at:1307)
> > > > >     2107: ovn -- 3 HVs, 1 VIFs/HV, 1 software GW, 1 LS    FAILED
> > > > (ovn.at:1473)
> > > > >     2108: ovn -- 3 HVs, 3 LS, 3 lports/LS, 1 LR           FAILED
> > > > (ovn.at:1887)
> > > > >     2110: ovn -- 2 HVs, 2 LS, 1 lport/LS, 2 peer LRs      FAILED
> > > > (ovn.at:2416)
> > > > >     2115: ovn -- 2 HVs, 3 LRs connected via LS, static routes
FAILED
> > > > > (ovn.at:3053)
> > > > >
> > > > > Do these pass reasonably often for you?  (Maybe my changes to the
> > > > > previous patch screwed something up?)
> > > > >
> > > >
> > > > First, not all of the above cases failed for me even with a rebase
of
> > the
> > > > last patch onto what is currently in master - the last two passed
for
> > me
> > > > on either the first and second run, so I'm looking at the first
three
> > for
> > > > the rest of this message.
> > > >
> > > > Double checking by going back to my original first patch verified
that
> > all
> > > > of the tests in question passed.  Going through and checking each
of
> > the
> > > > changes that you made to the first patch, the following is the diff
I
> > > > needed
> > > > to make things work again:
> > > >
> > > > diff --git a/ovn/controller/ofctrl.c b/ovn/controller/ofctrl.c
> > > > index d10dcc6..e113213 100644
> > > > --- a/ovn/controller/ofctrl.c
> > > > +++ b/ovn/controller/ofctrl.c
> > > > @@ -553,7 +553,10 @@ ofctrl_add_flow(uint8_t table_id, uint16_t
> > priority,
> > > >      ovn_flow_lookup(&match_flow_table, f, &existing);
> > > >      struct ovn_flow *d;
> > > >      LIST_FOR_EACH (d, list_node, &existing) {
> > > > -        if (uuid_equals(&f->uuid, &d->uuid)) {
> > > > +        if (f->table_id == d->table_id && f->priority == d->
priority
> > > > +            && ofpacts_equal(f->ofpacts, f->ofpacts_len,
> > > > +                             d->ofpacts, d->ofpacts_len)
> > > > +            && uuid_equals(&f->uuid, &d->uuid)) {
> > > >              static struct vlog_rate_limit rl =
VLOG_RATE_LIMIT_INIT(5,
> > 1);
> > > >              char *s = ovn_flow_to_string(f);
> > > >              VLOG_WARN_RL(&rl, "found duplicate flow %s for parent
> > > > "UUID_FMT,
> > > >
> > > > Hopefully that will get you past the test failures and you can push
> > > > both it and the rest of the last incremental processing patch.
> > >
> > > OK...
> > >
> > > Help me understand why that change is correct.
> > >
> > > I deleted the table_id and priority checks because ovn_flow_lookup()
> > > only returns flows where those are equal anyhow, so they seemed
> > > completely redundant.  That leaves the actions comparison.  Why would
we
> > > report a duplicate only if the actions are the same?  It's actually
> > > worse, isn't it, if the actions are different, because then the flows
> > > aren't just duplicates, they're inconsistent with each other.
> >
> > Actually, you are correct in that I'm doing something I shouldn't, but
> > your solution is also doing something it shouldn't.  Yes, my keeping
> > rules that are inconsistent is wrong, but discarding rules where
> > the action changes (which is what you proposed) doesn't work either.
> >
> > How about if we split the difference?  First, verify that the uuid
> > is the same, then check the actions.  If they are the same, spawn a
> > warning, and if they aren't replace the old actions with the new ones.
> > This appears to pass unit tests as well (and I think it's more
correct),
> > in which case the difference becomes:
> >
> > diff --git a/ovn/controller/ofctrl.c b/ovn/controller/ofctrl.c
> > index d10dcc6..0b8326e 100644
> > --- a/ovn/controller/ofctrl.c
> > +++ b/ovn/controller/ofctrl.c
> > @@ -554,13 +554,21 @@ ofctrl_add_flow(uint8_t table_id, uint16_t
priority,
> >      struct ovn_flow *d;
> >      LIST_FOR_EACH (d, list_node, &existing) {
> >          if (uuid_equals(&f->uuid, &d->uuid)) {
> > -            static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5,
1);
> > -            char *s = ovn_flow_to_string(f);
> > -            VLOG_WARN_RL(&rl, "found duplicate flow %s for parent
> > "UUID_FMT,
> > -                         s, UUID_ARGS(&f->uuid));
> > -            ovn_flow_destroy(f);
> > -            free(s);
> > -            return;
> > +            if (ofpacts_equal(f->ofpacts, f->ofpacts_len,
> > +                              d->ofpacts, d->ofpacts_len)) {
> > +                static struct vlog_rate_limit rl =
VLOG_RATE_LIMIT_INIT(5,
> > 1);
> > +                char *s = ovn_flow_to_string(f);
> > +                VLOG_WARN_RL(&rl, "found duplicate flow %s for parent
> > "UUID_FMT
> > +                             s, UUID_ARGS(&f->uuid));
> > +                ovn_flow_destroy(f);
> > +                free(s);
> > +                return;
> > +            } else {
> > +                /* Update the action. */
> > +                free(d->ofpacts);
> > +                d->ofpacts = xmemdup(f->ofpacts, f->ofpacts_len);
> > +                d->ofpacts_len = f->ofpacts_len;
> > +            }
> >          }
> >      }
> >
>
> I'm OK with updating the actions, but I'd prefer to warn about it anyway
> because I still think it's a bug; if the "new" actions happen to work
> better than the "old" actions, I think that's just a coincidence.

While I think that's going to be a *lot* of warnings, I'm ok with warning
in that case as well for now and we can argue the point later on after we
have more operational experience with the patch set.

You want to add the additional warning code or should I?

Ryan
Ben Pfaff July 19, 2016, 9:14 p.m. UTC | #3
On Tue, Jul 19, 2016 at 02:57:43PM -0500, Ryan Moats wrote:
> Ben Pfaff <blp@ovn.org> wrote on 07/19/2016 02:54:36 PM:
> 
> > From: Ben Pfaff <blp@ovn.org>
> > To: Ryan Moats/Omaha/IBM@IBMUS
> > Cc: dev@openvswitch.org
> > Date: 07/19/2016 02:54 PM
> > Subject: Re: [ovs-dev, v23, 2/2] ovn-controller: Add incremental
> > processing to lflow_run and physical_run
> >
> > On Tue, Jul 19, 2016 at 02:18:11PM -0500, Ryan Moats wrote:
> > >
> > >
> > > Ben Pfaff <blp@ovn.org> wrote on 07/19/2016 01:39:43 PM:
> > >
> > > > From: Ben Pfaff <blp@ovn.org>
> > > > To: Ryan Moats/Omaha/IBM@IBMUS
> > > > Cc: dev@openvswitch.org
> > > > Date: 07/19/2016 01:41 PM
> > > > Subject: Re: [ovs-dev, v23, 2/2] ovn-controller: Add incremental
> > > > processing to lflow_run and physical_run
> > > >
> > > > On Tue, Jul 19, 2016 at 08:10:05AM -0500, Ryan Moats wrote:
> > > > >
> > > > >
> > > > > Ben Pfaff <blp@ovn.org> wrote on 07/19/2016 12:57:23 AM:
> > > > >
> > > > > > From: Ben Pfaff <blp@ovn.org>
> > > > > > To: Ryan Moats/Omaha/IBM@IBMUS
> > > > > > Cc: dev@openvswitch.org
> > > > > > Date: 07/19/2016 12:57 AM
> > > > > > Subject: Re: [ovs-dev, v23, 2/2] ovn-controller: Add incremental
> > > > > > processing to lflow_run and physical_run
> > > > > >
> > > > > > On Mon, Jul 18, 2016 at 04:21:17PM -0500, Ryan Moats wrote:
> > > > > > > This code changes to allow incremental processing of the
> > > > > > > logical flow and physical binding tables whenver possible.
> > > > > > >
> > > > > > > Note: flows created by physical_run for multicast_groups are
> > > > > > > *NOT* handled incrementally due to to be solved issues
> > > > > > > with GWs and local routers.
> > > > > > >
> > > > > > > Signed-off-by: Ryan Moats <rmoats@us.ibm.com>
> > > > > >
> > > > > > With this applied, I get a number of apparently consistent test
> > > failures
> > > > > > (literally no passes in 5+ runs) that I don't see, at least not
> > > > > > consistently, if I revert it:
> > > > > >
> > > > > >     OVN end-to-end tests
> > > > > >
> > > > > >     2106: ovn -- vtep: 3 HVs, 1 VIFs/HV, 1 GW, 1 LS       FAILED
> > > > > (ovn.at:1307)
> > > > > >     2107: ovn -- 3 HVs, 1 VIFs/HV, 1 software GW, 1 LS    FAILED
> > > > > (ovn.at:1473)
> > > > > >     2108: ovn -- 3 HVs, 3 LS, 3 lports/LS, 1 LR           FAILED
> > > > > (ovn.at:1887)
> > > > > >     2110: ovn -- 2 HVs, 2 LS, 1 lport/LS, 2 peer LRs      FAILED
> > > > > (ovn.at:2416)
> > > > > >     2115: ovn -- 2 HVs, 3 LRs connected via LS, static routes
> FAILED
> > > > > > (ovn.at:3053)
> > > > > >
> > > > > > Do these pass reasonably often for you?  (Maybe my changes to the
> > > > > > previous patch screwed something up?)
> > > > > >
> > > > >
> > > > > First, not all of the above cases failed for me even with a rebase
> of
> > > the
> > > > > last patch onto what is currently in master - the last two passed
> for
> > > me
> > > > > on either the first and second run, so I'm looking at the first
> three
> > > for
> > > > > the rest of this message.
> > > > >
> > > > > Double checking by going back to my original first patch verified
> that
> > > all
> > > > > of the tests in question passed.  Going through and checking each
> of
> > > the
> > > > > changes that you made to the first patch, the following is the diff
> I
> > > > > needed
> > > > > to make things work again:
> > > > >
> > > > > diff --git a/ovn/controller/ofctrl.c b/ovn/controller/ofctrl.c
> > > > > index d10dcc6..e113213 100644
> > > > > --- a/ovn/controller/ofctrl.c
> > > > > +++ b/ovn/controller/ofctrl.c
> > > > > @@ -553,7 +553,10 @@ ofctrl_add_flow(uint8_t table_id, uint16_t
> > > priority,
> > > > >      ovn_flow_lookup(&match_flow_table, f, &existing);
> > > > >      struct ovn_flow *d;
> > > > >      LIST_FOR_EACH (d, list_node, &existing) {
> > > > > -        if (uuid_equals(&f->uuid, &d->uuid)) {
> > > > > +        if (f->table_id == d->table_id && f->priority == d->
> priority
> > > > > +            && ofpacts_equal(f->ofpacts, f->ofpacts_len,
> > > > > +                             d->ofpacts, d->ofpacts_len)
> > > > > +            && uuid_equals(&f->uuid, &d->uuid)) {
> > > > >              static struct vlog_rate_limit rl =
> VLOG_RATE_LIMIT_INIT(5,
> > > 1);
> > > > >              char *s = ovn_flow_to_string(f);
> > > > >              VLOG_WARN_RL(&rl, "found duplicate flow %s for parent
> > > > > "UUID_FMT,
> > > > >
> > > > > Hopefully that will get you past the test failures and you can push
> > > > > both it and the rest of the last incremental processing patch.
> > > >
> > > > OK...
> > > >
> > > > Help me understand why that change is correct.
> > > >
> > > > I deleted the table_id and priority checks because ovn_flow_lookup()
> > > > only returns flows where those are equal anyhow, so they seemed
> > > > completely redundant.  That leaves the actions comparison.  Why would
> we
> > > > report a duplicate only if the actions are the same?  It's actually
> > > > worse, isn't it, if the actions are different, because then the flows
> > > > aren't just duplicates, they're inconsistent with each other.
> > >
> > > Actually, you are correct in that I'm doing something I shouldn't, but
> > > your solution is also doing something it shouldn't.  Yes, my keeping
> > > rules that are inconsistent is wrong, but discarding rules where
> > > the action changes (which is what you proposed) doesn't work either.
> > >
> > > How about if we split the difference?  First, verify that the uuid
> > > is the same, then check the actions.  If they are the same, spawn a
> > > warning, and if they aren't replace the old actions with the new ones.
> > > This appears to pass unit tests as well (and I think it's more
> correct),
> > > in which case the difference becomes:
> > >
> > > diff --git a/ovn/controller/ofctrl.c b/ovn/controller/ofctrl.c
> > > index d10dcc6..0b8326e 100644
> > > --- a/ovn/controller/ofctrl.c
> > > +++ b/ovn/controller/ofctrl.c
> > > @@ -554,13 +554,21 @@ ofctrl_add_flow(uint8_t table_id, uint16_t
> priority,
> > >      struct ovn_flow *d;
> > >      LIST_FOR_EACH (d, list_node, &existing) {
> > >          if (uuid_equals(&f->uuid, &d->uuid)) {
> > > -            static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5,
> 1);
> > > -            char *s = ovn_flow_to_string(f);
> > > -            VLOG_WARN_RL(&rl, "found duplicate flow %s for parent
> > > "UUID_FMT,
> > > -                         s, UUID_ARGS(&f->uuid));
> > > -            ovn_flow_destroy(f);
> > > -            free(s);
> > > -            return;
> > > +            if (ofpacts_equal(f->ofpacts, f->ofpacts_len,
> > > +                              d->ofpacts, d->ofpacts_len)) {
> > > +                static struct vlog_rate_limit rl =
> VLOG_RATE_LIMIT_INIT(5,
> > > 1);
> > > +                char *s = ovn_flow_to_string(f);
> > > +                VLOG_WARN_RL(&rl, "found duplicate flow %s for parent
> > > "UUID_FMT
> > > +                             s, UUID_ARGS(&f->uuid));
> > > +                ovn_flow_destroy(f);
> > > +                free(s);
> > > +                return;
> > > +            } else {
> > > +                /* Update the action. */
> > > +                free(d->ofpacts);
> > > +                d->ofpacts = xmemdup(f->ofpacts, f->ofpacts_len);
> > > +                d->ofpacts_len = f->ofpacts_len;
> > > +            }
> > >          }
> > >      }
> > >
> >
> > I'm OK with updating the actions, but I'd prefer to warn about it anyway
> > because I still think it's a bug; if the "new" actions happen to work
> > better than the "old" actions, I think that's just a coincidence.
> 
> While I think that's going to be a *lot* of warnings, I'm ok with warning
> in that case as well for now and we can argue the point later on after we
> have more operational experience with the patch set.

So do we have a lot of code that's using ofctrl_add_flow() to actually
replace an existing flow instead of adding one?  Maybe that code should
be using ofctrl_set_flow() instead?

I'm trying to understand the intent here instead of just the code.

> You want to add the additional warning code or should I?

I'm happy to write the code once I understand what's going on.
diff mbox

Patch

diff --git a/ovn/controller/ofctrl.c b/ovn/controller/ofctrl.c
index d10dcc6..0b8326e 100644
--- a/ovn/controller/ofctrl.c
+++ b/ovn/controller/ofctrl.c
@@ -554,13 +554,21 @@  ofctrl_add_flow(uint8_t table_id, uint16_t priority,
     struct ovn_flow *d;
     LIST_FOR_EACH (d, list_node, &existing) {
         if (uuid_equals(&f->uuid, &d->uuid)) {
-            static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1);
-            char *s = ovn_flow_to_string(f);
-            VLOG_WARN_RL(&rl, "found duplicate flow %s for parent
"UUID_FMT,
-                         s, UUID_ARGS(&f->uuid));
-            ovn_flow_destroy(f);
-            free(s);
-            return;
+            if (ofpacts_equal(f->ofpacts, f->ofpacts_len,
+                              d->ofpacts, d->ofpacts_len)) {
+                static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5,
1);
+                char *s = ovn_flow_to_string(f);
+                VLOG_WARN_RL(&rl, "found duplicate flow %s for parent
"UUID_FMT
+                             s, UUID_ARGS(&f->uuid));
+                ovn_flow_destroy(f);
+                free(s);
+                return;
+            } else {
+                /* Update the action. */
+                free(d->ofpacts);
+                d->ofpacts = xmemdup(f->ofpacts, f->ofpacts_len);
+                d->ofpacts_len = f->ofpacts_len;
+            }