Message ID | 20200911091013.694297-1-numans@ovn.org
---|---
State | Accepted
Series | [ovs-dev,v2] northd: Add lflows to send all pkts to conntrack if LB is configured on a lswitch.
On 9/11/20 11:10 AM, numans@ovn.org wrote:
> From: Numan Siddique <numans@ovn.org>
>
> Prior to this patch, if a load balancer is configured on a logical switch
> but with no ACLs with allow-related configured, then in the ingress
> pipeline only the packets with ip.dst = VIP will be sent to conntrack
> using the zone id of the source logical port.
>
> If the backend of the load balancer sends an invalid packet (for example
> invalid tcp sequence number), then such packets will be delivered to the
> source logical port VIF without unDNATting. This causes the source to
> reset the connection.
>
> This patch fixes this issue by sending all the packets to conntrack if a
> load balancer is configured on the logical switch. Because of this, any
> invalid (ct.inv) packets will be dropped in the ingress pipeline itself.
>
> Unfortunately this impacts the performance as now there will be extra
> recirculations because of ct and ct_commit (for new connections) actions.
>
> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1870359
> Reported-by: Tim Rozet (trozet@redhat.com)
> Signed-off-by: Numan Siddique <numans@ovn.org>
> ---
> v1 -> v2
> ----
>   * Addressed the review comments from Dumitru and Mark.
>
>  northd/ovn-northd.8.xml | 41 +++++++++++++++++-----
>  northd/ovn-northd.c     | 77 +++++++++++++++++++++++++----------------
>  2 files changed, 80 insertions(+), 38 deletions(-)
>
[...]
> @@ -430,8 +453,8 @@
>      <p>
>        This table also contains a priority 0 flow with action
>        <code>next;</code>, so that ACLs allow packets by default. If the
> -      logical datapath has a statetful ACL, the following flows will
> -      also be added:
> +      logical datapath has a statetful ACL or a load balancer with VIP
> +      configured, the following flows will also be added:

Tiny nit: s/statetful/stateful

With this addressed:

Acked-by: Dumitru Ceara <dceara@redhat.com>

Thanks,
Dumitru
Bleep bloop.  Greetings Numan Siddique, I am a robot and I have tried out your patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.

git-am:
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch' to see the failed patch
Patch failed at 0001 northd: Add lflows to send all pkts to conntrack if LB is configured on a lswitch.
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

Please check this out.  If you feel there has been an error, please email aconole@redhat.com

Thanks,
0-day Robot
Acked-by: Mark Michelson <mmichels@redhat.com>

Thanks Numan!

On 9/11/20 5:10 AM, numans@ovn.org wrote:
> From: Numan Siddique <numans@ovn.org>
[...]
On Fri, Sep 11, 2020 at 8:14 PM Mark Michelson <mmichels@redhat.com> wrote:

> Acked-by: Mark Michelson <mmichels@redhat.com>
[...]

Thanks Dumitru and Mark. I applied this patch to master.

Numan

_______________________________________________
dev mailing list
dev@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml
index a275442a82..a9826e8e9e 100644
--- a/northd/ovn-northd.8.xml
+++ b/northd/ovn-northd.8.xml
@@ -340,15 +340,12 @@
       it contains a priority-110 flow to move IPv6 Neighbor Discovery traffic
       to the next table. If load balancing rules with virtual IP addresses
       (and ports) are configured in <code>OVN_Northbound</code> database for a
-      logical switch datapath, a priority-100 flow is added for each configured
-      virtual IP address <var>VIP</var>. For IPv4 <var>VIPs</var>, the match is
-      <code>ip && ip4.dst == <var>VIP</var></code>. For IPv6
-      <var>VIPs</var>, the match is <code>ip &&
-      ip6.dst == <var>VIP</var></code>. The flow sets an action
+      logical switch datapath, a priority-100 flow is added with the match
+      <code>ip</code> to match on IP packets and sets the action
       <code>reg0[0] = 1; next;</code> to act as a hint for table
       <code>Pre-stateful</code> to send IP packets to the connection tracker
-      for packet de-fragmentation before eventually advancing to ingress table
-      <code>LB</code>.
+      for packet de-fragmentation before eventually advancing to ingress
+      table <code>LB</code>.
       If controller_event has been enabled and load balancing rules with
       empty backends have been added in <code>OVN_Northbound</code>, a 130 flow
       is added to trigger ovn-controller events whenever the chassis receives a
@@ -356,6 +353,32 @@
       previously created, it will be associated to the empty_lb logical flow
     </p>
 
+    <p>
+      Prior to <code>OVN 20.09</code> we were setting the
+      <code>reg0[0] = 1</code> only if the IP destination matches the load
+      balancer VIP. However this had few issues cases where a logical switch
+      doesn't have any ACLs with <code>allow-related</code> action.
+      To understand the issue lets a take a TCP load balancer -
+      <code>10.0.0.10:80=10.0.0.3:80</code>. If a logical port - p1 with
+      IP - 10.0.0.5 opens a TCP connection with the VIP - 10.0.0.10, then the
+      packet in the ingress pipeline of 'p1' is sent to the p1's conntrack zone
+      id and the packet is load balanced to the backend - 10.0.0.3. For the
+      reply packet from the backend lport, it is not sent to the conntrack of
+      backend lport's zone id. This is fine as long as the packet is valid.
+      Suppose the backend lport sends an invalid TCP packet (like incorrect
+      sequence number), the packet gets delivered to the lport 'p1' without
+      unDNATing the packet to the VIP - 10.0.0.10. And this causes the
+      connection to be reset by the lport p1's VIF.
+    </p>
+
+    <p>
+      We can't fix this issue by adding a logical flow to drop ct.inv packets
+      in the egress pipeline since it will drop all other connections not
+      destined to the load balancers. To fix this issue, we send all the
+      packets to the conntrack in the ingress pipeline if a load balancer is
+      configured. We can now add a lflow to drop ct.inv packets.
+    </p>
+
     <p>
       This table also has a priority-110 flow with the match
       <code>eth.dst == <var>E</var></code> for all logical switch
@@ -430,8 +453,8 @@
     <p>
       This table also contains a priority 0 flow with action
       <code>next;</code>, so that ACLs allow packets by default. If the
-      logical datapath has a statetful ACL, the following flows will
-      also be added:
+      logical datapath has a statetful ACL or a load balancer with VIP
+      configured, the following flows will also be added:
     </p>
 
     <ul>
diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
index b95d6cd8a1..3967bae569 100644
--- a/northd/ovn-northd.c
+++ b/northd/ovn-northd.c
@@ -5034,6 +5034,19 @@ build_empty_lb_event_flow(struct ovn_datapath *od, struct hmap *lflows,
     free(action);
 }
 
+static bool
+has_lb_vip(struct ovn_datapath *od)
+{
+    for (int i = 0; i < od->nbs->n_load_balancer; i++) {
+        struct nbrec_load_balancer *nb_lb = od->nbs->load_balancer[i];
+        if (!smap_is_empty(&nb_lb->vips)) {
+            return true;
+        }
+    }
+
+    return false;
+}
+
 static void
 build_pre_lb(struct ovn_datapath *od, struct hmap *lflows,
              struct shash *meter_groups, struct hmap *lbs)
@@ -5072,8 +5085,6 @@ build_pre_lb(struct ovn_datapath *od, struct hmap *lflows,
                                  110, lflows);
     }
 
-    struct sset all_ips_v4 = SSET_INITIALIZER(&all_ips_v4);
-    struct sset all_ips_v6 = SSET_INITIALIZER(&all_ips_v6);
     bool vip_configured = false;
     for (int i = 0; i < od->nbs->n_load_balancer; i++) {
         struct nbrec_load_balancer *nb_lb = od->nbs->load_balancer[i];
@@ -5083,12 +5094,6 @@ build_pre_lb(struct ovn_datapath *od, struct hmap *lflows,
 
         for (size_t j = 0; j < lb->n_vips; j++) {
             struct lb_vip *lb_vip = &lb->vips[j];
-            if (lb_vip->addr_family == AF_INET) {
-                sset_add(&all_ips_v4, lb_vip->vip);
-            } else {
-                sset_add(&all_ips_v6, lb_vip->vip);
-            }
-
             build_empty_lb_event_flow(od, lflows, lb_vip, nb_lb,
                                       S_SWITCH_IN_PRE_LB, meter_groups);
 
@@ -5102,26 +5107,37 @@ build_pre_lb(struct ovn_datapath *od, struct hmap *lflows,
     }
 
     /* 'REGBIT_CONNTRACK_DEFRAG' is set to let the pre-stateful table send
-     * packet to conntrack for defragmentation. */
-    const char *ip_address;
-    SSET_FOR_EACH (ip_address, &all_ips_v4) {
-        char *match = xasprintf("ip && ip4.dst == %s", ip_address);
-        ovn_lflow_add(lflows, od, S_SWITCH_IN_PRE_LB,
-                      100, match, REGBIT_CONNTRACK_DEFRAG" = 1; next;");
-        free(match);
-    }
-
-    SSET_FOR_EACH (ip_address, &all_ips_v6) {
-        char *match = xasprintf("ip && ip6.dst == %s", ip_address);
-        ovn_lflow_add(lflows, od, S_SWITCH_IN_PRE_LB,
-                      100, match, REGBIT_CONNTRACK_DEFRAG" = 1; next;");
-        free(match);
-    }
-
-    sset_destroy(&all_ips_v4);
-    sset_destroy(&all_ips_v6);
-
+     * packet to conntrack for defragmentation.
+     *
+     * Send all the packets to conntrack in the ingress pipeline if the
+     * logical switch has a load balancer with VIP configured. Earlier
+     * we used to set the REGBIT_CONNTRACK_DEFRAG flag in the ingress pipeline
+     * if the IP destination matches the VIP. But this causes few issues when
+     * a logical switch has no ACLs configured with allow-related.
+     * To understand the issue, lets a take a TCP load balancer -
+     * 10.0.0.10:80=10.0.0.3:80.
+     * If a logical port - p1 with IP - 10.0.0.5 opens a TCP connection with
+     * the VIP - 10.0.0.10, then the packet in the ingress pipeline of 'p1'
+     * is sent to the p1's conntrack zone id and the packet is load balanced
+     * to the backend - 10.0.0.3. For the reply packet from the backend lport,
+     * it is not sent to the conntrack of backend lport's zone id. This is fine
+     * as long as the packet is valid. Suppose the backend lport sends an
+     * invalid TCP packet (like incorrect sequence number), the packet gets
+     * delivered to the lport 'p1' without unDNATing the packet to the
+     * VIP - 10.0.0.10. And this causes the connection to be reset by the
+     * lport p1's VIF.
+     *
+     * We can't fix this issue by adding a logical flow to drop ct.inv packets
+     * in the egress pipeline since it will drop all other connections not
+     * destined to the load balancers.
+     *
+     * To fix this issue, we send all the packets to the conntrack in the
+     * ingress pipeline if a load balancer is configured. We can now
+     * add a lflow to drop ct.inv packets.
+     */
     if (vip_configured) {
+        ovn_lflow_add(lflows, od, S_SWITCH_IN_PRE_LB,
+                      100, "ip", REGBIT_CONNTRACK_DEFRAG" = 1; next;");
         ovn_lflow_add(lflows, od, S_SWITCH_OUT_PRE_LB,
                       100, "ip", REGBIT_CONNTRACK_DEFRAG" = 1; next;");
     }
@@ -5482,7 +5498,7 @@ static void
 build_acls(struct ovn_datapath *od, struct hmap *lflows,
            struct hmap *port_groups)
 {
-    bool has_stateful = has_stateful_acl(od);
+    bool has_stateful = (has_stateful_acl(od) || has_lb_vip(od));
 
     /* Ingress and Egress ACL Table (Priority 0): Packets are allowed by
      * default. A related rule at priority 1 is added below if there
@@ -5746,12 +5762,15 @@ build_lb(struct ovn_datapath *od, struct hmap *lflows)
     ovn_lflow_add(lflows, od, S_SWITCH_IN_LB, 0, "1", "next;");
     ovn_lflow_add(lflows, od, S_SWITCH_OUT_LB, 0, "1", "next;");
 
-    if (od->nbs->load_balancer) {
+    if (od->nbs->n_load_balancer) {
         for (size_t i = 0; i < od->n_router_ports; i++) {
             skip_port_from_conntrack(od, od->router_ports[i],
                                      S_SWITCH_IN_LB, S_SWITCH_OUT_LB,
                                      UINT16_MAX, lflows);
         }
+    }
+
+    if (has_lb_vip(od)) {
         /* Ingress and Egress LB Table (Priority 65534).
          *
          * Send established traffic through conntrack for just NAT. */