Message ID | 7f4dec6839784c9b5605451a80abaaffb74d8e66.1710154918.git.felix.huettner@mail.schwarz |
---|---|
State | Accepted |
Commit | 9ec849e8aa869b646c372fac552ae2609a4b5f66 |
Series | [ovs-dev,v7,1/2] util: Support checking for kernel versions. |
Context | Check | Description |
---|---|---|
ovsrobot/apply-robot | success | apply and check: success |
ovsrobot/github-robot-_Build_and_Test | success | github build: passed |
ovsrobot/intel-ovs-compilation | success | test: success |
Felix Huettner via dev <ovs-dev@openvswitch.org> writes:

> Previously the kernel did not provide a netlink interface to flush/list
> only conntrack entries matching a specific zone. With [1] and [2] it is now
> possible to flush and list conntrack entries filtered by zone. Older
> kernels not yet supporting this feature will ignore the filter.
> For the list request that means just returning all entries (which we can
> then filter in userspace as before).
> For the flush request that means deleting all conntrack entries.
>
> The implementation is now identical to the Windows one, so we combine
> them.
>
> This significantly improves the performance of flushing conntrack zones
> when the conntrack table is large. Since flushing a conntrack zone is
> normally triggered via an OpenFlow command it blocks the main ovs thread
> and thereby also blocks new flows from being applied. Using this new
> feature we can reduce the flushing time for zones by around 93%.
>
> In combination with OVN the creation of a Logical_Router (which causes
> the flushing of a ct zone) could block other operations, e.g. the
> failover of Logical_Routers (as they cause new flows to be created).
> This is visible from a user perspective as an ovn-controller that is idle
> (as it waits for vswitchd) and vswitchd reporting:
> "blocked 1000 ms waiting for main to quiesce" (potentially with ever
> increasing times).
>
> The following performance tests were run in a qemu vm with 500,000
> conntrack entries distributed evenly over 500 ct zones using `ovstest
> test-netlink-conntrack flush zone=<zoneid>`.
>
>           |  flush zone with 1000 entries  |    flush zone with no entry    |
>           +---------------------+----------+---------------------+----------|
>           |   with the patch    | without  |   with the patch    | without  |
>           +----------+----------+----------+----------+----------+----------|
>           | v6.8-rc4 |  v6.7.1  | v6.8-rc4 | v6.8-rc4 |  v6.7.1  | v6.8-rc4 |
> +---------+----------+----------+----------+----------+----------+----------|
> | Min     |  0.260   |  3.946   |  3.497   |  0.228   |  3.462   |  3.212   |
> | Median  |  0.319   |  4.237   |  4.349   |  0.298   |  4.460   |  4.010   |
> | 90%ile  |  0.335   |  4.367   |  4.522   |  0.325   |  4.662   |  4.572   |
> | 99%ile  |  0.348   |  4.495   |  4.773   |  0.340   |  4.931   |  6.003   |
> | Max     |  0.362   |  4.543   |  5.054   |  0.348   |  5.390   |  6.396   |
> | Mean    |  0.320   |  4.236   |  4.331   |  0.296   |  4.430   |  4.071   |
> | Total   |  80.02   |   1058   |   1082   |  73.93   |   1107   |   1017   |
>
> [1]: https://github.com/torvalds/linux/commit/eff3c558bb7e61c41b53e4c8130e514a5a4df9ba
> [2]: https://github.com/torvalds/linux/commit/fa173a1b4e3fd1ab5451cbc57de6fc624c824b0a
>
> Acked-by: Mike Pattrick <mkp@redhat.com>
> Co-Authored-By: Luca Czesla <luca.czesla@mail.schwarz>
> Signed-off-by: Luca Czesla <luca.czesla@mail.schwarz>
> Co-Authored-By: Max Lamprecht <max.lamprecht@mail.schwarz>
> Signed-off-by: Max Lamprecht <max.lamprecht@mail.schwarz>
> Signed-off-by: Felix Huettner <felix.huettner@mail.schwarz>
> ---

Acked-by: Aaron Conole <aconole@redhat.com>

Thanks!
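
The "around 93%" figure in the commit message can be checked against the table by comparing the patched and unpatched runs on the same kernel (v6.8-rc4). The helper below is only an illustrative sketch; `reduction_percent` is a hypothetical name and not part of the patch.

```c
#include <assert.h>

/* Hypothetical helper, not part of the patch: percentage by which `after`
 * is lower than `before`. */
double
reduction_percent(double before, double after)
{
    assert(before > 0.0);
    return (1.0 - after / before) * 100.0;
}
```

With the median values from the table, `reduction_percent(4.349, 0.319)` (zone with 1000 entries) and `reduction_percent(4.010, 0.298)` (empty zone) both come out a little under 93. The "with the patch" column for v6.7.1 shows no such gain because that kernel does not support the zone filter, so the code takes the older dump-and-delete path.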
On 4/4/24 15:27, Aaron Conole wrote:
> Acked-by: Aaron Conole <aconole@redhat.com>
>
> Thanks!

Thanks, everyone! Applied.

Best regards, Ilya Maximets.
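
The commit message explains why the flush path has to be gated on kernel support: an older kernel ignores the zone filter, which is harmless for a dump (userspace filters the result) but destructive for a flush (every entry is deleted). The following is a simplified in-memory model of that behaviour, not OVS or kernel code; `struct ct_entry` and all `model_*` functions are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one conntrack entry. */
struct ct_entry {
    uint16_t zone;
    bool live;
};

/* Models a kernel-side flush request that carries a zone filter.  A kernel
 * that understands the filter removes only that zone; an older kernel
 * ignores the filter and removes every entry. */
size_t
model_kernel_flush(struct ct_entry *e, size_t n, uint16_t zone,
                   bool honors_zone)
{
    size_t removed = 0;

    assert(e != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (e[i].live && (!honors_zone || e[i].zone == zone)) {
            e[i].live = false;
            removed++;
        }
    }
    return removed;
}

/* Models the userspace fallback: walk everything the dump returned, skip
 * entries from other zones, and delete the rest one by one. */
size_t
model_dump_and_delete(struct ct_entry *e, size_t n, uint16_t zone)
{
    size_t removed = 0;

    assert(e != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!e[i].live || e[i].zone != zone) {
            continue;
        }
        e[i].live = false;
        removed++;
    }
    return removed;
}

/* Sends the filtered flush only when the kernel is known to honor it. */
size_t
model_flush_zone(struct ct_entry *e, size_t n, uint16_t zone,
                 bool kernel_supports_zone)
{
    return kernel_supports_zone
           ? model_kernel_flush(e, n, zone, true)
           : model_dump_and_delete(e, n, zone);
}
```

Calling `model_kernel_flush()` with `honors_zone` set to false empties the whole table, which is the failure mode the version check is meant to rule out.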
diff --git a/lib/netlink-conntrack.c b/lib/netlink-conntrack.c
index 492bfcffb..263496b17 100644
--- a/lib/netlink-conntrack.c
+++ b/lib/netlink-conntrack.c
@@ -141,6 +141,9 @@ nl_ct_dump_start(struct nl_ct_dump_state **statep, const uint16_t *zone,
 
     nl_msg_put_nfgenmsg(&state->buf, 0, AF_UNSPEC, NFNL_SUBSYS_CTNETLINK,
                         IPCTNL_MSG_CT_GET, NLM_F_REQUEST);
+    if (zone) {
+        nl_msg_put_be16(&state->buf, CTA_ZONE, htons(*zone));
+    }
     nl_dump_start(&state->dump, NETLINK_NETFILTER, &state->buf);
     ofpbuf_clear(&state->buf);
 
@@ -263,11 +266,9 @@ out:
     return err;
 }
 
-#ifdef _WIN32
-int
-nl_ct_flush_zone(uint16_t flush_zone)
+static int
+nl_ct_flush_zone_with_cta_zone(uint16_t flush_zone)
 {
-    /* Windows can flush a specific zone */
     struct ofpbuf buf;
     int err;
 
@@ -282,24 +283,63 @@ nl_ct_flush_zone(uint16_t flush_zone)
 
     return err;
 }
+
+#ifdef _WIN32
+int
+nl_ct_flush_zone(uint16_t flush_zone)
+{
+    return nl_ct_flush_zone_with_cta_zone(flush_zone);
+}
 #else
+
+static bool
+netlink_flush_supports_zone(void)
+{
+    static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER;
+    static bool supported = false;
+
+    if (ovsthread_once_start(&once)) {
+        if (ovs_kernel_is_version_or_newer(6, 8)) {
+            supported = true;
+        } else {
+            VLOG_INFO("disabling conntrack flush by zone. "
+                      "Not supported in Linux kernel");
+        }
+        ovsthread_once_done(&once);
+    }
+    return supported;
+}
+
 int
 nl_ct_flush_zone(uint16_t flush_zone)
 {
-    /* Apparently, there's no netlink interface to flush a specific zone.
+    /* In older kernels, there was no netlink interface to flush a specific
+     * conntrack zone.
      * This code dumps every connection, checks the zone and eventually
      * delete the entry.
+     * In newer kernels there is the option to specify a zone for filtering
+     * during dumps. Older kernels ignore this option. We set it here in the
+     * hope we only get relevant entries back, but fall back to filtering here
+     * to keep compatibility.
      *
-     * This is race-prone, but it is better than using shell scripts. */
+     * This is race-prone, but it is better than using shell scripts.
+     *
+     * Additionally newer kernels also support flushing a zone without listing
+     * it first. */
 
     struct nl_dump dump;
     struct ofpbuf buf, reply, delete;
 
+    if (netlink_flush_supports_zone()) {
+        return nl_ct_flush_zone_with_cta_zone(flush_zone);
+    }
+
     ofpbuf_init(&buf, NL_DUMP_BUFSIZE);
     ofpbuf_init(&delete, NL_DUMP_BUFSIZE);
 
     nl_msg_put_nfgenmsg(&buf, 0, AF_UNSPEC, NFNL_SUBSYS_CTNETLINK,
                         IPCTNL_MSG_CT_GET, NLM_F_REQUEST);
+    nl_msg_put_be16(&buf, CTA_ZONE, htons(flush_zone));
     nl_dump_start(&dump, NETLINK_NETFILTER, &buf);
     ofpbuf_clear(&buf);
 
diff --git a/tests/system-traffic.at b/tests/system-traffic.at
index 2d12d558e..a4600eb54 100644
--- a/tests/system-traffic.at
+++ b/tests/system-traffic.at
@@ -3069,6 +3069,34 @@ AT_CHECK([grep -q "failed to parse mark" stderr])
 
 AT_CHECK([FLUSH_CMD labels=invalid], [ignore], [ignore], [stderr])
 AT_CHECK([grep -q "failed to parse labels" stderr])
+
+dnl Test UDP from port 1 and 2, partial flush by zone.
+AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=1 packet=50540000000a50540000000908004500001c000000000011a4cd0a0101010a0101020001000200080000 actions=resubmit(,0)"])
+AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=2 packet=50540000000a50540000000908004500001c000000000011a4cd0a0101020a0101010002000100080000 actions=resubmit(,0)"])
+
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "10\.1\.1\.1," | sort], [0], [dnl
+udp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),reply=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1),mark=170
+udp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1),reply=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),zone=5,labels=0xaa00000000
+])
+
+AT_CHECK([FLUSH_CMD zone=5])
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "10\.1\.1\.1,"], [0], [dnl
+udp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),reply=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1),mark=170
+])
+
+AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=2 packet=50540000000a50540000000908004500001c000000000011a4cd0a0101020a0101010002000100080000 actions=resubmit(,0)"])
+
+AT_CHECK([FLUSH_CMD zone=0])
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "10\.1\.1\.1,"], [0], [dnl
+udp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1),reply=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),zone=5,labels=0xaa00000000
+])
+
+AT_CHECK([FLUSH_CMD])
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "10\.1\.1\.1,"], [1])
 ])
 
 OVS_TRAFFIC_VSWITCHD_STOP
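
The diff relies on `ovs_kernel_is_version_or_newer(6, 8)`, which comes from patch 1/2 of this series and is not shown on this page. As a rough idea of what such a check can look like, here is a minimal sketch that parses the release string reported by uname(2). It is an assumption about the behaviour, not the OVS implementation; `release_is_at_least` and `running_kernel_is_at_least` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/utsname.h>

/* Hypothetical helper: returns true if a release string such as "6.8.0-rc4"
 * denotes kernel `major.minor` or newer.  Strings that cannot be parsed are
 * treated as "too old" so that callers stay on the compatible path. */
bool
release_is_at_least(const char *release, int major, int minor)
{
    int k_major, k_minor;

    assert(release != NULL);
    if (sscanf(release, "%d.%d", &k_major, &k_minor) != 2) {
        return false;
    }
    return k_major > major || (k_major == major && k_minor >= minor);
}

/* Hypothetical helper: applies the check to the running kernel. */
bool
running_kernel_is_at_least(int major, int minor)
{
    struct utsname un;

    if (uname(&un) != 0) {
        return false;
    }
    return release_is_at_least(un.release, major, minor);
}
```

Treating an unparseable release as unsupported matches the conservative direction of the patch: when in doubt, the code should dump and delete entries one by one rather than risk a flush request that an old kernel would apply to every zone.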