Message ID | cover.1574428269.git.sbrivio@redhat.com |
---|---|
Series | nftables: Set implementation for arbitrary concatenation of ranges |
On Fri, Nov 22, 2019 at 02:39:59PM +0100, Stefano Brivio wrote:
[...]
> Patch 1/8 implements the needed UAPI bits: additions to the existing
> interface are kept to a minimum by recycling existing concepts for
> both ranging and concatenation, as suggested by Florian.
>
> Patch 2/8 adds a new bitmap operation that copies the source bitmap
> onto the destination while removing a given region, and is needed to
> delete regions of arrays mapping between lookup tables.
>
> Patch 3/8 is the actual set implementation.
>
> Patch 4/8 introduces selftests for the new implementation.
[...]

After talking to Florian, I'm inclined to merge upstream up to patch
4/8 in this merge window, once the UAPI discussion is sorted out.

Thanks.
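The bitmap operation described for patch 2/8 (copy the source bitmap onto the destination while removing a given region) can be illustrated with a small userspace sketch. This is not the code from the patch: the name `cut_region`, its parameter order, and the bit-by-bit approach are assumptions made for illustration only, and a real implementation would presumably work on whole words.

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Read bit i of a bitmap stored as an array of unsigned long. */
static int get_bit(const unsigned long *map, unsigned int i)
{
	return (map[i / BITS_PER_WORD] >> (i % BITS_PER_WORD)) & 1UL;
}

/* Set or clear bit i of a bitmap stored as an array of unsigned long. */
static void put_bit(unsigned long *map, unsigned int i, int v)
{
	unsigned long mask = 1UL << (i % BITS_PER_WORD);

	if (v)
		map[i / BITS_PER_WORD] |= mask;
	else
		map[i / BITS_PER_WORD] &= ~mask;
}

/*
 * Hypothetical helper: copy the first nbits of src to dst, dropping
 * 'cut' bits starting at position 'first'. Bits above the removed
 * region move down by 'cut' positions; the vacated top bits are cleared.
 * Only bits [0, nbits) of dst are written.
 */
void cut_region(unsigned long *dst, const unsigned long *src,
		unsigned int first, unsigned int cut, unsigned int nbits)
{
	unsigned int i;

	/* The region to remove must lie inside the bitmap. */
	assert(first <= nbits && cut <= nbits - first);

	/* Bits below the region are copied unchanged. */
	for (i = 0; i < first; i++)
		put_bit(dst, i, get_bit(src, i));

	/* Bits above the region shift down to close the gap. */
	for (i = first; i + cut < nbits; i++)
		put_bit(dst, i, get_bit(src, i + cut));

	/* Clear what is left at the top. */
	for (; i < nbits; i++)
		put_bit(dst, i, 0);
}
```

For instance, cutting 3 bits starting at bit 2 from the 8-bit map 0xD6 leaves 0x1A: bits 0 and 1 stay in place, bits 5 to 7 move down to positions 2 to 4, and the top three bits are cleared.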
On Sat, 23 Nov 2019 21:05:18 +0100
Pablo Neira Ayuso <pablo@netfilter.org> wrote:

> On Fri, Nov 22, 2019 at 02:39:59PM +0100, Stefano Brivio wrote:
> [...]
>
> After talking to Florian, I'm inclined to merge upstream up to patch
> 4/8 in this merge window, once the UAPI discussion is sorted out.

Thanks for the update. Let me know if there's some specific topic or
concern I can start addressing for patches 5/8 to 8/8.
On Mon, Nov 25, 2019 at 10:31:06AM +0100, Stefano Brivio wrote:
> On Sat, 23 Nov 2019 21:05:18 +0100
> Pablo Neira Ayuso <pablo@netfilter.org> wrote:
>
> [...]
> > After talking to Florian, I'm inclined to merge upstream up to patch
> > 4/8 in this merge window, once the UAPI discussion is sorted out.
>
> Thanks for the update. Let me know if there's some specific topic or
> concern I can start addressing for patches 5/8 to 8/8.

Merge window is now closed, I was trying to get the bare minimum in
this round. Now we have a bit more time to merge this upstream.

BTW, do you have numbers comparing the AVX2 version with the C code? I
quickly had a look at your numbers, but it's not clear to me if this is
compared there.

Thanks.
On Mon, 25 Nov 2019 11:02:14 +0100
Pablo Neira Ayuso <pablo@netfilter.org> wrote:

> BTW, do you have numbers comparing the AVX2 version with the C code? I
> quickly had a look at your numbers, but not clear to me if this is
> compared there.

No, sorry, I didn't report that anywhere, I probably should have in the
commit messages for 4/8 and 5/8.

This was from v1 at 4/8, single thread on AMD Epyc 7351, C
implementation without unrolled loops:

    TEST: performance
      net,port                                                [ OK ]
        baseline (drop from netdev hook):              9971887pps
        baseline hash (non-ranged entries):            5991032pps
        baseline rbtree (match on first field only):   2666255pps
        set with 1000 full, ranged entries:            2220404pps
      port,net                                                [ OK ]
        baseline (drop from netdev hook):             10004499pps
        baseline hash (non-ranged entries):            6011221pps
        baseline rbtree (match on first field only):   4035566pps
        set with 100 full, ranged entries:             4018240pps
      net6,port                                               [ OK ]
        baseline (drop from netdev hook):              9497500pps
        baseline hash (non-ranged entries):            4685436pps
        baseline rbtree (match on first field only):   1354978pps
        set with 1000 full, ranged entries:            1052188pps
      port,proto                                              [ OK ]
        baseline (drop from netdev hook):             10749256pps
        baseline hash (non-ranged entries):            6774103pps
        baseline rbtree (match on first field only):   2819211pps
        set with 30000 full, ranged entries:            283492pps
      net6,port,mac                                           [ OK ]
        baseline (drop from netdev hook):              9463935pps
        baseline hash (non-ranged entries):            3777039pps
        baseline rbtree (match on first field only):   2943527pps
        set with 10 full, ranged entries:              1927899pps
      net6,port,mac,proto                                     [ OK ]
        baseline (drop from netdev hook):              9502200pps
        baseline hash (non-ranged entries):            3637739pps
        baseline rbtree (match on first field only):   1342323pps
        set with 1000 full, ranged entries:             753960pps
      net,mac                                                 [ OK ]
        baseline (drop from netdev hook):             10065715pps
        baseline hash (non-ranged entries):            5082895pps
        baseline rbtree (match on first field only):   2677391pps
        set with 1000 full, ranged entries:            1215104pps

I would re-run tests on v3 patches and include the comparisons in
commit messages.

By the way, as you can see, even though the comparison with rbtree is
unfair (comparing > 1 fields adds substantial complexity), without AVX2
it doesn't scale as nicely. I plan to propose some optimisations that
should substantially improve the non-vectorised case, but what I have
in mind right now is a bit convoluted and I would skip it in this
initial submission.
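To make the scaling remark above easier to check, the sketch below computes, for each test, the throughput of the new set relative to the rbtree baseline, using only the figures reported in this message. The struct and function names are hypothetical and not part of the selftests. As noted above, the rbtree baseline matches on the first field only, so the ratio is a rough indicator rather than a like-for-like comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* One line of the v1 results: rbtree baseline vs. the new set, in pps. */
struct perf_result {
	const char *fields;	/* concatenated field types under test */
	unsigned int entries;	/* number of full, ranged entries in the set */
	double rbtree_pps;	/* baseline rbtree, match on first field only */
	double set_pps;		/* new set implementation, C code without AVX2 */
};

/* Figures copied from the selftest output quoted in this message. */
const struct perf_result results[] = {
	{ "net,port",             1000, 2666255, 2220404 },
	{ "port,net",              100, 4035566, 4018240 },
	{ "net6,port",            1000, 1354978, 1052188 },
	{ "port,proto",          30000, 2819211,  283492 },
	{ "net6,port,mac",          10, 2943527, 1927899 },
	{ "net6,port,mac,proto",  1000, 1342323,  753960 },
	{ "net,mac",              1000, 2677391, 1215104 },
};

#define N_RESULTS (sizeof(results) / sizeof(results[0]))

/* Throughput of the new set as a fraction of the rbtree baseline. */
double set_vs_rbtree(const struct perf_result *r)
{
	assert(r->rbtree_pps > 0);
	return r->set_pps / r->rbtree_pps;
}

/* Return the test where the new set is slowest relative to rbtree. */
const struct perf_result *slowest_relative(void)
{
	const struct perf_result *worst = &results[0];
	size_t i;

	for (i = 1; i < N_RESULTS; i++)
		if (set_vs_rbtree(&results[i]) < set_vs_rbtree(worst))
			worst = &results[i];

	return worst;
}

/* Print one line per test: field types, entry count, relative throughput. */
void print_ratios(void)
{
	size_t i;

	for (i = 0; i < N_RESULTS; i++)
		printf("%-20s %6u entries: %5.1f%% of rbtree\n",
		       results[i].fields, results[i].entries,
		       100.0 * set_vs_rbtree(&results[i]));
}
```

Calling `print_ratios()` from a `main()` lists the ratios; the lowest one belongs to the port,proto test, which is also the one with the largest set (30000 entries).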