[ovs-dev] ovn: Document limitation in the L3HA plan

Message ID	20170424082118.37546-1-majopela@redhat.com
State	Accepted
Headers
Return-Path: <ovs-dev-bounces@openvswitch.org>
DMARC-Filter: OpenDMARC Filter v1.3.2 mx1.redhat.com 66CFDC04B94B
From: majopela@redhat.com
To: dev@openvswitch.org
Date: Mon, 24 Apr 2017 10:21:18 +0200
Message-Id: <20170424082118.37546-1-majopela@redhat.com>
Cc: Miguel Angel Ajo <majopela@redhat.com>
Subject: [ovs-dev] [PATCH] ovn: Document limitation in the L3HA plan
Precedence: list
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Sender: ovs-dev-bounces@openvswitch.org
Errors-To: ovs-dev-bounces@openvswitch.org

Commit Message

Miguel Angel Ajo April 24, 2017, 8:21 a.m. UTC

From: Miguel Angel Ajo <majopela@redhat.com>

The inter-gateway monitoring covers host failure well, but
it doesn't cover path failure, which is a more complicated
problem.

With this change I don't mean that we should implement something
to cover path failure right away, but that we should
keep the limitation in mind.

Signed-off-by: Miguel Angel Ajo <majopela@redhat.com>
---
 Documentation/topics/high-availability.rst | 8 ++++++++
 1 file changed, 8 insertions(+)

Comments

Miguel Angel Ajo April 24, 2017, 8:24 a.m. UTC | #1

Anil Venkata and I were talking about this last week and we
realised we had this limitation. Other mechanisms such as VRRP
or CARP share it, but we thought it was good to make sure
everyone was on the same page, and that having gateways across
multiple L2 domains with routing in the middle could be
problematic. At a minimum it would require less aggressive
BFD probe intervals.
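
The cost of relaxing BFD intervals can be made concrete with the asynchronous-mode detection time from RFC 5880 (section 6.8.4): the Detect Mult received from the peer multiplied by the agreed interval, which is the greater of the local Required Min RX Interval and the peer's Desired Min TX Interval. The minimal Python sketch below assumes a detect multiplier of 3 and uses illustrative interval values, not OVN or OVS defaults; the function name is hypothetical. It shows that loosening the intervals to tolerate a routed path between gateways lengthens failure detection, and therefore failover, proportionally.

```python
def bfd_detection_time_ms(local_min_rx_ms: int, remote_min_tx_ms: int,
                          remote_detect_mult: int = 3) -> int:
    """Asynchronous-mode BFD detection time (RFC 5880, section 6.8.4).

    The agreed interval is the slower of what we are willing to receive and
    what the peer wants to send; the session is declared down after
    `remote_detect_mult` such intervals pass without a packet.
    The default multiplier of 3 is an assumption for illustration.
    """
    agreed_interval_ms = max(local_min_rx_ms, remote_min_tx_ms)
    return remote_detect_mult * agreed_interval_ms


# Illustrative values only: a tight single-L2 setting vs. a relaxed routed one.
for rx, tx in [(100, 100), (1000, 1000)]:
    print(f"min_rx={rx} ms, min_tx={tx} ms -> down after "
          f"{bfd_detection_time_ms(rx, tx)} ms")
```

A tenfold looser interval gives a tenfold longer window in which a dead gateway is still treated as alive.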




Ben Pfaff May 1, 2017, 9:49 p.m. UTC | #2

Thank you for thinking about this!  I applied this to master.

Patch

diff --git a/Documentation/topics/high-availability.rst b/Documentation/topics/high-availability.rst
index 5b21b6469..7ee9357c0 100644
--- a/Documentation/topics/high-availability.rst
+++ b/Documentation/topics/high-availability.rst
@@ -288,6 +288,14 @@  which are alive, and therefore whether or not that gateway happens to be the
 leader.  If leading, the gateway forwards traffic normally, otherwise it drops
 all traffic.
 
+We should note that this method works well under the assumption that there
+are no inter-gateway connectivity failures; if such a failure occurs, this
+method would fail to elect a single master. The simplest example is two
+gateways which stop seeing each other but can still reach the hypervisors.
+Protocols like VRRP or CARP have the same issue. A mitigation for this type of
+failure mode could be achieved by having all network elements (hypervisors and
+gateways) periodically share their link status with other endpoints.
+
 Gateway Leadership Resignation
 ++++++++++++++++++++++++++++++
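
The split-brain case in the patch can be illustrated with a minimal Python sketch of the per-gateway leadership decision, assuming leadership follows a fixed priority order: each gateway leads only if it sees no higher-priority gateway alive. All names here (`is_leader`, `leaders`, `is_leader_with_reports`, `gw1`, `gw2`) are hypothetical and this is not OVN code. The last function models only the idea of the mitigation from the patch text, where gateways also trust liveness relayed by other endpoints such as hypervisors; it is a speculative sketch, not a proposed design.

```python
from typing import Dict, Iterable, List, Set


def is_leader(me: str, priority_order: List[str], alive: Set[str]) -> bool:
    """Return True if `me` outranks every gateway it currently sees as alive.

    `priority_order` lists gateways from highest to lowest priority and
    `alive` is the set of peers `me` sees as up (e.g. via BFD).
    """
    for gw in priority_order:
        if gw == me:
            return True  # no higher-priority peer is visible
        if gw in alive:
            return False  # defer to a higher-priority peer that is up
    raise ValueError(f"{me!r} is not in the priority list")


def leaders(priority_order: List[str], views: Dict[str, Set[str]]) -> List[str]:
    """Gateways that consider themselves leader, given each one's local view."""
    return [gw for gw in priority_order if is_leader(gw, priority_order, views[gw])]


def is_leader_with_reports(me: str, priority_order: List[str], alive: Set[str],
                           reports: Iterable[Set[str]]) -> bool:
    """Hypothetical mitigation: merge liveness relayed by other endpoints."""
    seen = set(alive).union(*reports)
    return is_leader(me, priority_order, seen)


order = ["gw1", "gw2"]  # gw1 has the higher priority
healthy = {"gw1": {"gw2"}, "gw2": {"gw1"}}
# The gw1 <-> gw2 path is down, but both gateways still reach the hypervisors.
partitioned = {"gw1": set(), "gw2": set()}

print(leaders(order, healthy))      # ['gw1']
print(leaders(order, partitioned))  # ['gw1', 'gw2']  (two leaders)

# A hypervisor that still sees both gateways relays its view to gw2.
hv_report = {"gw1", "gw2"}
print(is_leader_with_reports("gw2", order, partitioned["gw2"], [hv_report]))  # False
```

Relayed reports bring their own trade-off: a stale report could keep a lower-priority gateway from taking over after the leader really fails, which is part of why path failure is the harder problem.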
