Message ID | 20170424082118.37546-1-majopela@redhat.com |
---|---|
State | Accepted |
Anil Venkata and I were talking about this last week and we realised we had this limitation. It's not uncommon to other mechanisms like VRRP or CARP, but we thought that it was good to make sure everyone was on the same page, and that having gateways across multiple L2 domains with routing in the middle could be problematic. At a minimum it would require less strict tuning of the BFD pinging intervals.

On Mon, Apr 24, 2017 at 10:21 AM, <majopela@redhat.com> wrote:

> From: Miguel Angel Ajo <majopela@redhat.com>
>
> The intergateway monitoring covers host failure well, but
> it doesn't cover path failure which is a more complicated
> problem.
>
> By this change I don't mean we should implement something
> to cover path failure right away, but that we should
> keep the limitation in mind.
>
> Signed-off-by: Miguel Angel Ajo <majopela@redhat.com>
> ---
>  Documentation/topics/high-availability.rst | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/Documentation/topics/high-availability.rst b/Documentation/topics/high-availability.rst
> index 5b21b6469..7ee9357c0 100644
> --- a/Documentation/topics/high-availability.rst
> +++ b/Documentation/topics/high-availability.rst
> @@ -288,6 +288,14 @@
>  which are alive, and therefore whether or not that gateway happens to be the
>  leader. If leading, the gateway forwards traffic normally, otherwise it drops
>  all traffic.
>
> +We should note that this method works well under the assumption that there
> +are no inter-gateway connectivity failures, in such case this method would fail
> +to elect a single master. The simplest example is two gateways which stop seeing
> +each other but can still reach the hypervisors. Protocols like VRRP or CARP
> +have the same issue. A mitigation for this type of failure mode could be
> +achieved by having all network elements (hypervisors and gateways) periodically
> +share their link status to other endpoints.
> +
>  Gateway Leadership Resignation
>  ++++++++++++++++++++++++++++++
>
> --
> 2.11.0 (Apple Git-81)
On Mon, Apr 24, 2017 at 10:21:18AM +0200, majopela@redhat.com wrote:
> From: Miguel Angel Ajo <majopela@redhat.com>
>
> The intergateway monitoring covers host failure well, but
> it doesn't cover path failure which is a more complicated
> problem.
>
> By this change I don't mean we should implement something
> to cover path failure right away, but that we should
> keep the limitation in mind.
>
> Signed-off-by: Miguel Angel Ajo <majopela@redhat.com>

Thank you for thinking about this! I applied this to master.
diff --git a/Documentation/topics/high-availability.rst b/Documentation/topics/high-availability.rst
index 5b21b6469..7ee9357c0 100644
--- a/Documentation/topics/high-availability.rst
+++ b/Documentation/topics/high-availability.rst
@@ -288,6 +288,14 @@
 which are alive, and therefore whether or not that gateway happens to be the
 leader. If leading, the gateway forwards traffic normally, otherwise it drops
 all traffic.
 
+We should note that this method works well under the assumption that there
+are no inter-gateway connectivity failures, in such case this method would fail
+to elect a single master. The simplest example is two gateways which stop seeing
+each other but can still reach the hypervisors. Protocols like VRRP or CARP
+have the same issue. A mitigation for this type of failure mode could be
+achieved by having all network elements (hypervisors and gateways) periodically
+share their link status to other endpoints.
+
 Gateway Leadership Resignation
 ++++++++++++++++++++++++++++++
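
The failure mode this patch documents can be illustrated with a small Python sketch. It is not OVN code: the helpers `elect_leader` and `self_declared_leaders`, the gateway names, and the priority values are all hypothetical, and the election rule (each gateway picks the highest-priority gateway among itself and the peers it currently sees alive) is a simplifying assumption made only for illustration. Under that assumption, losing the gateway-to-gateway path while both hosts stay up leaves two gateways that each believe they lead.

```python
from typing import Dict, FrozenSet, Set


def elect_leader(me: str, priorities: Dict[str, int], alive_peers: Set[str]) -> str:
    """Return the gateway that `me` believes is the leader.

    Assumed rule: the highest-priority gateway among `me` and the peers
    that `me` can currently see (e.g. via a liveness probe such as BFD).
    """
    candidates = {me} | alive_peers
    # Tie-break on the name so the result is deterministic.
    return max(candidates, key=lambda gw: (priorities[gw], gw))


def self_declared_leaders(
    priorities: Dict[str, int], links: Set[FrozenSet[str]]
) -> Set[str]:
    """Return every gateway that elects itself, given the working links."""
    leaders = set()
    for gw in priorities:
        alive_peers = {
            peer
            for peer in priorities
            if peer != gw and frozenset((gw, peer)) in links
        }
        if elect_leader(gw, priorities, alive_peers) == gw:
            leaders.add(gw)
    return leaders


# Hypothetical two-gateway deployment.
priorities = {"gw1": 20, "gw2": 10}

# Healthy case: the gateways see each other, so only one leads.
healthy_links = {frozenset(("gw1", "gw2"))}
print(sorted(self_declared_leaders(priorities, healthy_links)))

# Path failure: both hosts are up but cannot see each other.
partitioned_links: Set[FrozenSet[str]] = set()
print(sorted(self_declared_leaders(priorities, partitioned_links)))
```

In the partitioned case each gateway sees no live peer and elects itself, which is the split that the mitigation mentioned in the patch (network elements periodically sharing link status) would aim to detect.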