diff mbox series

[ovs-dev,v3] bond: Always revalidate unbalanced bonds when active member changes

Message ID 20231008052632.613791-1-mkp@redhat.com
State Changes Requested
Headers show
Series [ovs-dev,v3] bond: Always revalidate unbalanced bonds when active member changes | expand

Checks

Context Check Description
ovsrobot/apply-robot warning apply and check: warning
ovsrobot/github-robot-_Build_and_Test success github build: passed
ovsrobot/intel-ovs-compilation success test: success

Commit Message

Mike Pattrick Oct. 8, 2023, 5:26 a.m. UTC
Currently a bond will not always revalidate when an active member
changes. This can result in counter-intuitive behaviors like the fact
that using ovs-appctl bond/set-active-member will cause the bond to
revalidate but changing other_config:bond-primary will not trigger a
revalidate in the bond.

When revalidation is not set but the active member changes in an
unbalanced bond, OVS may send traffic out of previously active member
instead of the new active member.

This change will always mark unbalanced bonds for revalidation if the
active member changes.

Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2214979
Signed-off-by: Mike Pattrick <mkp@redhat.com>
---
v2: Added a test
v3: Made the test more reliable
---
 ofproto/bond.c          |  8 +++++--
 tests/system-traffic.at | 50 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 56 insertions(+), 2 deletions(-)

Comments

0-day Robot Oct. 8, 2023, 5:39 a.m. UTC | #1
Bleep bloop.  Greetings Mike Pattrick, I am a robot and I have tried out your patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


checkpatch:
WARNING: The subject summary should end with a dot.
Subject: bond: Always revalidate unbalanced bonds when active member changes
Lines checked: 135, Warnings: 1, Errors: 0


Please check this out.  If you feel there has been an error, please email aconole@redhat.com

Thanks,
0-day Robot
Eelco Chaudron Oct. 13, 2023, 8:28 a.m. UTC | #2
On 8 Oct 2023, at 7:26, Mike Pattrick wrote:

> Currently a bond will not always revalidate when an active member
> changes. This can result in counter-intuitive behaviors like the fact
> that using ovs-appctl bond/set-active-member will cause the bond to
> revalidate but changing other_config:bond-primary will not trigger a
> revalidate in the bond.
>
> When revalidation is not set but the active member changes in an
> unbalanced bond, OVS may send traffic out of previously active member
> instead of the new active member.
>
> This change will always mark unbalanced bonds for revalidation if the
> active member changes.
>
> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2214979
> Signed-off-by: Mike Pattrick <mkp@redhat.com>

Hi Mike,

Thanks for the patch. Some comments below, and the subject needs to end with a dot.

//Eelco


> ---
> v2: Added a test
> v3: Made the test more reliable
> ---
>  ofproto/bond.c          |  8 +++++--
>  tests/system-traffic.at | 50 +++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 56 insertions(+), 2 deletions(-)
>
> diff --git a/ofproto/bond.c b/ofproto/bond.c
> index cfdf44f85..fb108d30a 100644
> --- a/ofproto/bond.c
> +++ b/ofproto/bond.c
> @@ -193,6 +193,7 @@ static void bond_update_post_recirc_rules__(struct bond *, bool force)
>  static bool bond_is_falling_back_to_ab(const struct bond *);
>  static void bond_add_lb_output_buckets(const struct bond *);
>  static void bond_del_lb_output_buckets(const struct bond *);
> +static bool bond_is_balanced(const struct bond *bond) OVS_REQ_RDLOCK(rwlock);
>
>
>  /* Attempts to parse 's' as the name of a bond balancing mode.  If successful,
> @@ -552,11 +553,15 @@ bond_find_member_by_mac(const struct bond *bond, const struct eth_addr mac)
>
>  static void
>  bond_active_member_changed(struct bond *bond)
> +    OVS_REQ_WRLOCK(rwlock)
>  {
>      if (bond->active_member) {
>          struct eth_addr mac;
>          netdev_get_etheraddr(bond->active_member->netdev, &mac);
>          bond->active_member_mac = mac;
> +        if (!bond_is_balanced(bond)) {
> +            bond->bond_revalidate = true;
> +        }
>      } else {
>          bond->active_member_mac = eth_addr_zero;
>      }
> @@ -1124,7 +1129,7 @@ bond_get_recirc_id_and_hash_basis(struct bond *bond, uint32_t *recirc_id,
>  /* Rebalancing. */
>
>  static bool
> -bond_is_balanced(const struct bond *bond) OVS_REQ_RDLOCK(rwlock)

Any reason why the annotation was removed? clang would be smart enough to see that a write lock is fine for a readlock. It seems to compile fine here. Or were you trying to clean this up and move it to the next line :)

> +bond_is_balanced(const struct bond *bond)
>  {
>      return bond->rebalance_interval
>          && (bond->balance == BM_SLB || bond->balance == BM_TCP)
> @@ -1728,7 +1733,6 @@ bond_unixctl_set_active_member(struct unixctl_conn *conn,
>      }
>
>      if (bond->active_member != member) {
> -        bond->bond_revalidate = true;
>          bond->active_member = member;
>          VLOG_INFO("bond %s: active member is now %s",
>                    bond->name, member->name);
> diff --git a/tests/system-traffic.at b/tests/system-traffic.at
> index 945037ec0..7075c35ea 100644
> --- a/tests/system-traffic.at
> +++ b/tests/system-traffic.at
> @@ -291,6 +291,56 @@ NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -w 2 10.1.1.2 | FORMAT_PING
>  OVS_TRAFFIC_VSWITCHD_STOP
>  AT_CLEANUP
>
> +AT_SETUP([datapath - ping over active-backup bond])

This does not really represent the test case. What about ‘datapath - bond active-backup failover’

> +OVS_TRAFFIC_VSWITCHD_START([_ADD_BR([br1])])
> +
> +AT_CHECK([ovs-ofctl add-flow br0 "priority=1 actions=drop"])
> +AT_CHECK([ovs-ofctl add-flow br1 "priority=1 actions=drop"])
> +AT_CHECK([ovs-ofctl add-flow br0 "ip,icmp actions=normal"])
> +AT_CHECK([ovs-ofctl add-flow br1 "ip,icmp actions=normal"])
> +AT_CHECK([ovs-ofctl add-flow br0 "arp actions=normal"])
> +AT_CHECK([ovs-ofctl add-flow br1 "arp actions=normal"])

Do we need this ‘complex’ set of rules, will the following not work?

AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
AT_CHECK([ovs-ofctl add-flow br1 "actions=normal"])

> +ADD_NAMESPACES(at_ns0, at_ns1)
> +
> +ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24")
> +ADD_VETH(p1, at_ns1, br1, "10.1.1.2/24")
> +on_exit 'ip link del link0a'
> +on_exit 'ip link del link0b'
> +AT_CHECK([ip link add link0a type veth peer name link1a])
> +AT_CHECK([ip link add link0b type veth peer name link1b])
> +
> +AT_CHECK([ip link set dev link0a up])
> +AT_CHECK([ip link set dev link1a up])
> +AT_CHECK([ip link set dev link0b up])
> +AT_CHECK([ip link set dev link1b up])
> +
> +AT_CHECK([ovs-vsctl add-bond br0 bond0 link0a link0b bond_mode=active-backup])
> +AT_CHECK([ovs-vsctl add-bond br1 bond1 link1a link1b bond_mode=active-backup])
> +
> +dnl Set primary

dnl Set the primary bond member.


> +AT_CHECK([ovs-vsctl set port bond0 other_config:bond-primary=link0a -- \
> +                    set port bond1 other_config:bond-primary=link1a])
> +
> +OVS_WAIT_UNTIL([ip netns exec at_ns0 ping -c 1 10.1.1.2])
> +
> +NS_CHECK_EXEC([at_ns0], [ping -q -c 6 -i 0.3 -w 6 10.1.1.2 | FORMAT_PING], [0], [dnl
> +6 packets transmitted, 6 received, 0% packet loss, time 0ms
> +])
> +
> +dnl Change primary

dnl Change the primary bond member.

> +AT_CHECK([ovs-vsctl set port bond0 other_config:bond-primary=link0b -- \
> +                    set port bond1 other_config:bond-primary=link1b])
> +
> +sleep 0

This short sleep makes it work, however, it could result in a flaky test. Is there something else we could wait for to be 100% sure we are ready?

> +
> +NS_CHECK_EXEC([at_ns0], [ping -q -c 6 -i 0.5 -w 6 10.1.1.2 | FORMAT_PING], [0], [dnl
> +6 packets transmitted, 6 received, 0% packet loss, time 0ms
> +])

Should we not add some code to verify traffic goes over the correct link?

> +
> +OVS_TRAFFIC_VSWITCHD_STOP
> +AT_CLEANUP
> +
>  AT_SETUP([datapath - ping over vxlan tunnel])
>  OVS_CHECK_TUNNEL_TSO()
>  OVS_CHECK_VXLAN()


Can you also check this test on all datapaths? It’s failing for me once out of ten on userspace;


@@ -1,2 +1,2 @@
-6 packets transmitted, 6 received, 0% packet loss, time 0ms
+7 packets transmitted, 6 received, 14.2857% packet loss, time 0ms

sudo make check-kernel TESTSUITEFLAGS="-k 'ping over active-backup bond'" &&  sudo make check-offloads TESTSUITEFLAGS="-k 'ping over active-backup bond'" &&  sudo make check-system-userspace TESTSUITEFLAGS="-k 'ping over active-backup bond'"
Mike Pattrick Oct. 19, 2023, 2:35 a.m. UTC | #3
On Fri, Oct 13, 2023 at 4:28 AM Eelco Chaudron <echaudro@redhat.com> wrote:
>
>
>
> On 8 Oct 2023, at 7:26, Mike Pattrick wrote:
>
> > Currently a bond will not always revalidate when an active member
> > changes. This can result in counter-intuitive behaviors like the fact
> > that using ovs-appctl bond/set-active-member will cause the bond to
> > revalidate but changing other_config:bond-primary will not trigger a
> > revalidate in the bond.
> >
> > When revalidation is not set but the active member changes in an
> > unbalanced bond, OVS may send traffic out of previously active member
> > instead of the new active member.
> >
> > This change will always mark unbalanced bonds for revalidation if the
> > active member changes.
> >
> > Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2214979
> > Signed-off-by: Mike Pattrick <mkp@redhat.com>
>
> Hi Mike,
>
> Thanks for the patch. Some comments below, and the subject needs to end with a dot.
>
> //Eelco
>
>
> > ---
> > v2: Added a test
> > v3: Made the test more reliable
> > ---
> >  ofproto/bond.c          |  8 +++++--
> >  tests/system-traffic.at | 50 +++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 56 insertions(+), 2 deletions(-)
> >
> > diff --git a/ofproto/bond.c b/ofproto/bond.c
> > index cfdf44f85..fb108d30a 100644
> > --- a/ofproto/bond.c
> > +++ b/ofproto/bond.c
> > @@ -193,6 +193,7 @@ static void bond_update_post_recirc_rules__(struct bond *, bool force)
> >  static bool bond_is_falling_back_to_ab(const struct bond *);
> >  static void bond_add_lb_output_buckets(const struct bond *);
> >  static void bond_del_lb_output_buckets(const struct bond *);
> > +static bool bond_is_balanced(const struct bond *bond) OVS_REQ_RDLOCK(rwlock);
> >
> >
> >  /* Attempts to parse 's' as the name of a bond balancing mode.  If successful,
> > @@ -552,11 +553,15 @@ bond_find_member_by_mac(const struct bond *bond, const struct eth_addr mac)
> >
> >  static void
> >  bond_active_member_changed(struct bond *bond)
> > +    OVS_REQ_WRLOCK(rwlock)
> >  {
> >      if (bond->active_member) {
> >          struct eth_addr mac;
> >          netdev_get_etheraddr(bond->active_member->netdev, &mac);
> >          bond->active_member_mac = mac;
> > +        if (!bond_is_balanced(bond)) {
> > +            bond->bond_revalidate = true;
> > +        }
> >      } else {
> >          bond->active_member_mac = eth_addr_zero;
> >      }
> > @@ -1124,7 +1129,7 @@ bond_get_recirc_id_and_hash_basis(struct bond *bond, uint32_t *recirc_id,
> >  /* Rebalancing. */
> >
> >  static bool
> > -bond_is_balanced(const struct bond *bond) OVS_REQ_RDLOCK(rwlock)
>
> Any reason why the annotation was removed? clang would be smart enough to see that a write lock is fine for a readlock. It seems to compile fine here. Or were you trying to clean this up and move it to the next line :)

The annotation moved up to the prototype above.

>
> > +bond_is_balanced(const struct bond *bond)
> >  {
> >      return bond->rebalance_interval
> >          && (bond->balance == BM_SLB || bond->balance == BM_TCP)
> > @@ -1728,7 +1733,6 @@ bond_unixctl_set_active_member(struct unixctl_conn *conn,
> >      }
> >
> >      if (bond->active_member != member) {
> > -        bond->bond_revalidate = true;
> >          bond->active_member = member;
> >          VLOG_INFO("bond %s: active member is now %s",
> >                    bond->name, member->name);
> > diff --git a/tests/system-traffic.at b/tests/system-traffic.at
> > index 945037ec0..7075c35ea 100644
> > --- a/tests/system-traffic.at
> > +++ b/tests/system-traffic.at
> > @@ -291,6 +291,56 @@ NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -w 2 10.1.1.2 | FORMAT_PING
> >  OVS_TRAFFIC_VSWITCHD_STOP
> >  AT_CLEANUP
> >
> > +AT_SETUP([datapath - ping over active-backup bond])
>
> This does not really represent the test case. What about ‘datapath - bond active-backup failover’
>
> > +OVS_TRAFFIC_VSWITCHD_START([_ADD_BR([br1])])
> > +
> > +AT_CHECK([ovs-ofctl add-flow br0 "priority=1 actions=drop"])
> > +AT_CHECK([ovs-ofctl add-flow br1 "priority=1 actions=drop"])
> > +AT_CHECK([ovs-ofctl add-flow br0 "ip,icmp actions=normal"])
> > +AT_CHECK([ovs-ofctl add-flow br1 "ip,icmp actions=normal"])
> > +AT_CHECK([ovs-ofctl add-flow br0 "arp actions=normal"])
> > +AT_CHECK([ovs-ofctl add-flow br1 "arp actions=normal"])
>
> Do we need this ‘complex’ set of rules, will the following not work?
>
> AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
> AT_CHECK([ovs-ofctl add-flow br1 "actions=normal"])
>
> > +ADD_NAMESPACES(at_ns0, at_ns1)
> > +
> > +ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24")
> > +ADD_VETH(p1, at_ns1, br1, "10.1.1.2/24")
> > +on_exit 'ip link del link0a'
> > +on_exit 'ip link del link0b'
> > +AT_CHECK([ip link add link0a type veth peer name link1a])
> > +AT_CHECK([ip link add link0b type veth peer name link1b])
> > +
> > +AT_CHECK([ip link set dev link0a up])
> > +AT_CHECK([ip link set dev link1a up])
> > +AT_CHECK([ip link set dev link0b up])
> > +AT_CHECK([ip link set dev link1b up])
> > +
> > +AT_CHECK([ovs-vsctl add-bond br0 bond0 link0a link0b bond_mode=active-backup])
> > +AT_CHECK([ovs-vsctl add-bond br1 bond1 link1a link1b bond_mode=active-backup])
> > +
> > +dnl Set primary
>
> dnl Set the primary bond member.
>
>
> > +AT_CHECK([ovs-vsctl set port bond0 other_config:bond-primary=link0a -- \
> > +                    set port bond1 other_config:bond-primary=link1a])
> > +
> > +OVS_WAIT_UNTIL([ip netns exec at_ns0 ping -c 1 10.1.1.2])
> > +
> > +NS_CHECK_EXEC([at_ns0], [ping -q -c 6 -i 0.3 -w 6 10.1.1.2 | FORMAT_PING], [0], [dnl
> > +6 packets transmitted, 6 received, 0% packet loss, time 0ms
> > +])
> > +
> > +dnl Change primary
>
> dnl Change the primary bond member.
>
> > +AT_CHECK([ovs-vsctl set port bond0 other_config:bond-primary=link0b -- \
> > +                    set port bond1 other_config:bond-primary=link1b])
> > +
> > +sleep 0
>
> This short sleep makes it work, however, it could result in a flaky test. Is there something else we could wait for to be 100% sure we are ready?
>
> > +
> > +NS_CHECK_EXEC([at_ns0], [ping -q -c 6 -i 0.5 -w 6 10.1.1.2 | FORMAT_PING], [0], [dnl
> > +6 packets transmitted, 6 received, 0% packet loss, time 0ms
> > +])
>
> Should we not add some code to verify traffic goes over the correct link?
>
> > +
> > +OVS_TRAFFIC_VSWITCHD_STOP
> > +AT_CLEANUP
> > +
> >  AT_SETUP([datapath - ping over vxlan tunnel])
> >  OVS_CHECK_TUNNEL_TSO()
> >  OVS_CHECK_VXLAN()
>
>
> Can you also check this test on all datapaths? It’s failing for me once out of ten on userspace;
>
>
> @@ -1,2 +1,2 @@
> -6 packets transmitted, 6 received, 0% packet loss, time 0ms
> +7 packets transmitted, 6 received, 14.2857% packet loss, time 0ms
>
> sudo make check-kernel TESTSUITEFLAGS="-k 'ping over active-backup bond'" &&  sudo make check-offloads TESTSUITEFLAGS="-k 'ping over active-backup bond'" &&  sudo make check-system-userspace TESTSUITEFLAGS="-k 'ping over active-backup bond'"

I've changed the test and ran it 100 times with and without the patch
with and without false positives or false negatives, so I'm hoping the
latest revision is a lot more reliable for you.

The other changes make sense so you should see them in the next
version of the patch.


Thanks for the review,
Mike
Eelco Chaudron Oct. 19, 2023, 7:48 a.m. UTC | #4
On 19 Oct 2023, at 4:35, Mike Pattrick wrote:

> On Fri, Oct 13, 2023 at 4:28 AM Eelco Chaudron <echaudro@redhat.com> wrote:
>>
>>
>>
>> On 8 Oct 2023, at 7:26, Mike Pattrick wrote:
>>
>>> Currently a bond will not always revalidate when an active member
>>> changes. This can result in counter-intuitive behaviors like the fact
>>> that using ovs-appctl bond/set-active-member will cause the bond to
>>> revalidate but changing other_config:bond-primary will not trigger a
>>> revalidate in the bond.
>>>
>>> When revalidation is not set but the active member changes in an
>>> unbalanced bond, OVS may send traffic out of previously active member
>>> instead of the new active member.
>>>
>>> This change will always mark unbalanced bonds for revalidation if the
>>> active member changes.
>>>
>>> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2214979
>>> Signed-off-by: Mike Pattrick <mkp@redhat.com>
>>
>> Hi Mike,
>>
>> Thanks for the patch. Some comments below, and the subject needs to end with a dot.
>>
>> //Eelco
>>
>>
>>> ---
>>> v2: Added a test
>>> v3: Made the test more reliable
>>> ---
>>>  ofproto/bond.c          |  8 +++++--
>>>  tests/system-traffic.at | 50 +++++++++++++++++++++++++++++++++++++++++
>>>  2 files changed, 56 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/ofproto/bond.c b/ofproto/bond.c
>>> index cfdf44f85..fb108d30a 100644
>>> --- a/ofproto/bond.c
>>> +++ b/ofproto/bond.c
>>> @@ -193,6 +193,7 @@ static void bond_update_post_recirc_rules__(struct bond *, bool force)
>>>  static bool bond_is_falling_back_to_ab(const struct bond *);
>>>  static void bond_add_lb_output_buckets(const struct bond *);
>>>  static void bond_del_lb_output_buckets(const struct bond *);
>>> +static bool bond_is_balanced(const struct bond *bond) OVS_REQ_RDLOCK(rwlock);
>>>
>>>
>>>  /* Attempts to parse 's' as the name of a bond balancing mode.  If successful,
>>> @@ -552,11 +553,15 @@ bond_find_member_by_mac(const struct bond *bond, const struct eth_addr mac)
>>>
>>>  static void
>>>  bond_active_member_changed(struct bond *bond)
>>> +    OVS_REQ_WRLOCK(rwlock)
>>>  {
>>>      if (bond->active_member) {
>>>          struct eth_addr mac;
>>>          netdev_get_etheraddr(bond->active_member->netdev, &mac);
>>>          bond->active_member_mac = mac;
>>> +        if (!bond_is_balanced(bond)) {
>>> +            bond->bond_revalidate = true;
>>> +        }
>>>      } else {
>>>          bond->active_member_mac = eth_addr_zero;
>>>      }
>>> @@ -1124,7 +1129,7 @@ bond_get_recirc_id_and_hash_basis(struct bond *bond, uint32_t *recirc_id,
>>>  /* Rebalancing. */
>>>
>>>  static bool
>>> -bond_is_balanced(const struct bond *bond) OVS_REQ_RDLOCK(rwlock)
>>
>> Any reason why the annotation was removed? clang would be smart enough to see that a write lock is fine for a readlock. It seems to compile fine here. Or were you trying to clean this up and move it to the next line :)
>
> The annotation moved up to the prototype above.

Yes, but this function in general needs the read lock, so it should be annotated. What if someone starts using this function without holding the readlock? We would no longer get an error.

>>
>>> +bond_is_balanced(const struct bond *bond)
>>>  {
>>>      return bond->rebalance_interval
>>>          && (bond->balance == BM_SLB || bond->balance == BM_TCP)
>>> @@ -1728,7 +1733,6 @@ bond_unixctl_set_active_member(struct unixctl_conn *conn,
>>>      }
>>>
>>>      if (bond->active_member != member) {
>>> -        bond->bond_revalidate = true;
>>>          bond->active_member = member;
>>>          VLOG_INFO("bond %s: active member is now %s",
>>>                    bond->name, member->name);
>>> diff --git a/tests/system-traffic.at b/tests/system-traffic.at
>>> index 945037ec0..7075c35ea 100644
>>> --- a/tests/system-traffic.at
>>> +++ b/tests/system-traffic.at
>>> @@ -291,6 +291,56 @@ NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -w 2 10.1.1.2 | FORMAT_PING
>>>  OVS_TRAFFIC_VSWITCHD_STOP
>>>  AT_CLEANUP
>>>
>>> +AT_SETUP([datapath - ping over active-backup bond])
>>
>> This does not really represent the test case. What about ‘datapath - bond active-backup failover’
>>
>>> +OVS_TRAFFIC_VSWITCHD_START([_ADD_BR([br1])])
>>> +
>>> +AT_CHECK([ovs-ofctl add-flow br0 "priority=1 actions=drop"])
>>> +AT_CHECK([ovs-ofctl add-flow br1 "priority=1 actions=drop"])
>>> +AT_CHECK([ovs-ofctl add-flow br0 "ip,icmp actions=normal"])
>>> +AT_CHECK([ovs-ofctl add-flow br1 "ip,icmp actions=normal"])
>>> +AT_CHECK([ovs-ofctl add-flow br0 "arp actions=normal"])
>>> +AT_CHECK([ovs-ofctl add-flow br1 "arp actions=normal"])
>>
>> Do we need this ‘complex’ set of rules, will the following not work?
>>
>> AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
>> AT_CHECK([ovs-ofctl add-flow br1 "actions=normal"])
>>
>>> +ADD_NAMESPACES(at_ns0, at_ns1)
>>> +
>>> +ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24")
>>> +ADD_VETH(p1, at_ns1, br1, "10.1.1.2/24")
>>> +on_exit 'ip link del link0a'
>>> +on_exit 'ip link del link0b'
>>> +AT_CHECK([ip link add link0a type veth peer name link1a])
>>> +AT_CHECK([ip link add link0b type veth peer name link1b])
>>> +
>>> +AT_CHECK([ip link set dev link0a up])
>>> +AT_CHECK([ip link set dev link1a up])
>>> +AT_CHECK([ip link set dev link0b up])
>>> +AT_CHECK([ip link set dev link1b up])
>>> +
>>> +AT_CHECK([ovs-vsctl add-bond br0 bond0 link0a link0b bond_mode=active-backup])
>>> +AT_CHECK([ovs-vsctl add-bond br1 bond1 link1a link1b bond_mode=active-backup])
>>> +
>>> +dnl Set primary
>>
>> dnl Set the primary bond member.
>>
>>
>>> +AT_CHECK([ovs-vsctl set port bond0 other_config:bond-primary=link0a -- \
>>> +                    set port bond1 other_config:bond-primary=link1a])
>>> +
>>> +OVS_WAIT_UNTIL([ip netns exec at_ns0 ping -c 1 10.1.1.2])
>>> +
>>> +NS_CHECK_EXEC([at_ns0], [ping -q -c 6 -i 0.3 -w 6 10.1.1.2 | FORMAT_PING], [0], [dnl
>>> +6 packets transmitted, 6 received, 0% packet loss, time 0ms
>>> +])
>>> +
>>> +dnl Change primary
>>
>> dnl Change the primary bond member.
>>
>>> +AT_CHECK([ovs-vsctl set port bond0 other_config:bond-primary=link0b -- \
>>> +                    set port bond1 other_config:bond-primary=link1b])
>>> +
>>> +sleep 0
>>
>> This short sleep makes it work, however, it could result in a flaky test. Is there something else we could wait for to be 100% sure we are ready?
>>
>>> +
>>> +NS_CHECK_EXEC([at_ns0], [ping -q -c 6 -i 0.5 -w 6 10.1.1.2 | FORMAT_PING], [0], [dnl
>>> +6 packets transmitted, 6 received, 0% packet loss, time 0ms
>>> +])
>>
>> Should we not add some code to verify traffic goes over the correct link?
>>
>>> +
>>> +OVS_TRAFFIC_VSWITCHD_STOP
>>> +AT_CLEANUP
>>> +
>>>  AT_SETUP([datapath - ping over vxlan tunnel])
>>>  OVS_CHECK_TUNNEL_TSO()
>>>  OVS_CHECK_VXLAN()
>>
>>
>> Can you also check this test on all datapaths? It’s failing for me once out of ten on userspace;
>>
>>
>> @@ -1,2 +1,2 @@
>> -6 packets transmitted, 6 received, 0% packet loss, time 0ms
>> +7 packets transmitted, 6 received, 14.2857% packet loss, time 0ms
>>
>> sudo make check-kernel TESTSUITEFLAGS="-k 'ping over active-backup bond'" &&  sudo make check-offloads TESTSUITEFLAGS="-k 'ping over active-backup bond'" &&  sudo make check-system-userspace TESTSUITEFLAGS="-k 'ping over active-backup bond'"
>
> I've changed the test and ran it 100 times with and without the patch
> with and without false positives or false negatives, so I'm hoping the
> latest revision is a lot more reliable for you.
>
> The other changes make sense so you should see them in the next
> version of the patch.

Thanks, will take a look at your new revision later.

//Eelco
diff mbox series

Patch

diff --git a/ofproto/bond.c b/ofproto/bond.c
index cfdf44f85..fb108d30a 100644
--- a/ofproto/bond.c
+++ b/ofproto/bond.c
@@ -193,6 +193,7 @@  static void bond_update_post_recirc_rules__(struct bond *, bool force)
 static bool bond_is_falling_back_to_ab(const struct bond *);
 static void bond_add_lb_output_buckets(const struct bond *);
 static void bond_del_lb_output_buckets(const struct bond *);
+static bool bond_is_balanced(const struct bond *bond) OVS_REQ_RDLOCK(rwlock);
 
 
 /* Attempts to parse 's' as the name of a bond balancing mode.  If successful,
@@ -552,11 +553,15 @@  bond_find_member_by_mac(const struct bond *bond, const struct eth_addr mac)
 
 static void
 bond_active_member_changed(struct bond *bond)
+    OVS_REQ_WRLOCK(rwlock)
 {
     if (bond->active_member) {
         struct eth_addr mac;
         netdev_get_etheraddr(bond->active_member->netdev, &mac);
         bond->active_member_mac = mac;
+        if (!bond_is_balanced(bond)) {
+            bond->bond_revalidate = true;
+        }
     } else {
         bond->active_member_mac = eth_addr_zero;
     }
@@ -1124,7 +1129,7 @@  bond_get_recirc_id_and_hash_basis(struct bond *bond, uint32_t *recirc_id,
 /* Rebalancing. */
 
 static bool
-bond_is_balanced(const struct bond *bond) OVS_REQ_RDLOCK(rwlock)
+bond_is_balanced(const struct bond *bond)
 {
     return bond->rebalance_interval
         && (bond->balance == BM_SLB || bond->balance == BM_TCP)
@@ -1728,7 +1733,6 @@  bond_unixctl_set_active_member(struct unixctl_conn *conn,
     }
 
     if (bond->active_member != member) {
-        bond->bond_revalidate = true;
         bond->active_member = member;
         VLOG_INFO("bond %s: active member is now %s",
                   bond->name, member->name);
diff --git a/tests/system-traffic.at b/tests/system-traffic.at
index 945037ec0..7075c35ea 100644
--- a/tests/system-traffic.at
+++ b/tests/system-traffic.at
@@ -291,6 +291,56 @@  NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -w 2 10.1.1.2 | FORMAT_PING
 OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
+AT_SETUP([datapath - ping over active-backup bond])
+OVS_TRAFFIC_VSWITCHD_START([_ADD_BR([br1])])
+
+AT_CHECK([ovs-ofctl add-flow br0 "priority=1 actions=drop"])
+AT_CHECK([ovs-ofctl add-flow br1 "priority=1 actions=drop"])
+AT_CHECK([ovs-ofctl add-flow br0 "ip,icmp actions=normal"])
+AT_CHECK([ovs-ofctl add-flow br1 "ip,icmp actions=normal"])
+AT_CHECK([ovs-ofctl add-flow br0 "arp actions=normal"])
+AT_CHECK([ovs-ofctl add-flow br1 "arp actions=normal"])
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+
+ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24")
+ADD_VETH(p1, at_ns1, br1, "10.1.1.2/24")
+on_exit 'ip link del link0a'
+on_exit 'ip link del link0b'
+AT_CHECK([ip link add link0a type veth peer name link1a])
+AT_CHECK([ip link add link0b type veth peer name link1b])
+
+AT_CHECK([ip link set dev link0a up])
+AT_CHECK([ip link set dev link1a up])
+AT_CHECK([ip link set dev link0b up])
+AT_CHECK([ip link set dev link1b up])
+
+AT_CHECK([ovs-vsctl add-bond br0 bond0 link0a link0b bond_mode=active-backup])
+AT_CHECK([ovs-vsctl add-bond br1 bond1 link1a link1b bond_mode=active-backup])
+
+dnl Set primary
+AT_CHECK([ovs-vsctl set port bond0 other_config:bond-primary=link0a -- \
+                    set port bond1 other_config:bond-primary=link1a])
+
+OVS_WAIT_UNTIL([ip netns exec at_ns0 ping -c 1 10.1.1.2])
+
+NS_CHECK_EXEC([at_ns0], [ping -q -c 6 -i 0.3 -w 6 10.1.1.2 | FORMAT_PING], [0], [dnl
+6 packets transmitted, 6 received, 0% packet loss, time 0ms
+])
+
+dnl Change primary
+AT_CHECK([ovs-vsctl set port bond0 other_config:bond-primary=link0b -- \
+                    set port bond1 other_config:bond-primary=link1b])
+
+sleep 0
+
+NS_CHECK_EXEC([at_ns0], [ping -q -c 6 -i 0.5 -w 6 10.1.1.2 | FORMAT_PING], [0], [dnl
+6 packets transmitted, 6 received, 0% packet loss, time 0ms
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
 AT_SETUP([datapath - ping over vxlan tunnel])
 OVS_CHECK_TUNNEL_TSO()
 OVS_CHECK_VXLAN()