Message ID: 20191224010103.56407-1-mcroce@redhat.com
Series: mvpp2: page_pool support
On Tue, Dec 24, 2019 at 02:01:01AM +0100, Matteo Croce wrote:
> These patches change the memory allocator of mvpp2 from the frag allocator to
> the page_pool API. This change is needed to later add XDP support to mvpp2.
>
> The reason I send it as RFC is that with this changeset, mvpp2 performs much
> slower. This is the tc drop rate measured with a single flow:
>
> stock net-next with frag allocator:
> rx: 900.7 Mbps 1877 Kpps
>
> this patchset with page_pool:
> rx: 423.5 Mbps 882.3 Kpps
>
> This is the perf top when receiving traffic:
>
>   27.68% [kernel]   [k] __page_pool_clean_page

This seems extremely high on the list.

>    9.79% [kernel]   [k] get_page_from_freelist
>    7.18% [kernel]   [k] free_unref_page
>    4.64% [kernel]   [k] build_skb
>    4.63% [kernel]   [k] __netif_receive_skb_core
>    3.83% [mvpp2]    [k] mvpp2_poll
>    3.64% [kernel]   [k] eth_type_trans
>    3.61% [kernel]   [k] kmem_cache_free
>    3.03% [kernel]   [k] kmem_cache_alloc
>    2.76% [kernel]   [k] dev_gro_receive
>    2.69% [mvpp2]    [k] mvpp2_bm_pool_put
>    2.68% [kernel]   [k] page_frag_free
>    1.83% [kernel]   [k] inet_gro_receive
>    1.74% [kernel]   [k] page_pool_alloc_pages
>    1.70% [kernel]   [k] __build_skb
>    1.47% [kernel]   [k] __alloc_pages_nodemask
>    1.36% [mvpp2]    [k] mvpp2_buf_alloc.isra.0
>    1.29% [kernel]   [k] tcf_action_exec
>
> I tried Ilias' patches for page_pool recycling, I get an improvement
> to ~1100 Kpps, but I'm still far from the original allocator.

Can you post the recycling perf for comparison?

> Any idea on why I get such bad numbers?

Nope, but it's indeed strange.

> Another reason to send it as RFC is that I'm not fully convinced on how to
> use the page_pool given the HW limitation of the BM.

I'll have a look right after the holidays.

> The driver currently uses, for every CPU, a page_pool for short packets and
> another for long ones. The driver also has 4 rx queues per port, so every
> RXQ #1 will share the short and long page pools of CPU #1.

I am not sure I am following the hardware config here.

> This means that for every RX queue I call xdp_rxq_info_reg_mem_model() twice,
> on two different page_pools; can this be a problem?
>
> As usual, ideas are welcome.
>
> Matteo Croce (2):
>   mvpp2: use page_pool allocator
>   mvpp2: memory accounting
>
>  drivers/net/ethernet/marvell/Kconfig        |   1 +
>  drivers/net/ethernet/marvell/mvpp2/mvpp2.h  |   7 +
>  .../net/ethernet/marvell/mvpp2/mvpp2_main.c | 142 +++++++++++++++---
>  3 files changed, 125 insertions(+), 25 deletions(-)
>
> --
> 2.24.1

Cheers
/Ilias
On Tue, Dec 24, 2019 at 10:52 AM Ilias Apalodimas
<ilias.apalodimas@linaro.org> wrote:
>
> On Tue, Dec 24, 2019 at 02:01:01AM +0100, Matteo Croce wrote:
> [...]
> > This is the perf top when receiving traffic:
> >
> >   27.68% [kernel]   [k] __page_pool_clean_page
>
> This seems extremely high on the list.
>
> [...]
> > I tried Ilias' patches for page_pool recycling, I get an improvement
> > to ~1100 Kpps, but I'm still far from the original allocator.
>
> Can you post the recycling perf for comparison?
>

  12.00% [kernel]        [k] get_page_from_freelist
   9.25% [kernel]        [k] free_unref_page
   6.83% [kernel]        [k] eth_type_trans
   5.33% [kernel]        [k] __netif_receive_skb_core
   4.96% [mvpp2]         [k] mvpp2_poll
   4.64% [kernel]        [k] kmem_cache_free
   4.06% [kernel]        [k] __xdp_return
   3.60% [kernel]        [k] kmem_cache_alloc
   3.31% [kernel]        [k] dev_gro_receive
   3.29% [kernel]        [k] __page_pool_clean_page
   3.25% [mvpp2]         [k] mvpp2_bm_pool_put
   2.73% [kernel]        [k] __page_pool_put_page
   2.33% [kernel]        [k] __alloc_pages_nodemask
   2.33% [kernel]        [k] inet_gro_receive
   2.05% [kernel]        [k] __build_skb
   1.95% [kernel]        [k] build_skb
   1.89% [cls_matchall]  [k] mall_classify
   1.83% [kernel]        [k] page_pool_alloc_pages
   1.80% [kernel]        [k] tcf_action_exec
   1.70% [mvpp2]         [k] mvpp2_buf_alloc.isra.0
   1.63% [kernel]        [k] free_unref_page_prepare.part.0
   1.45% [kernel]        [k] page_pool_return_skb_page
   1.42% [act_gact]      [k] tcf_gact_act
   1.16% [kernel]        [k] netif_receive_skb_list_internal
   1.08% [kernel]        [k] kfree_skb
   1.07% [kernel]        [k] skb_release_data

> > Any idea on why I get such bad numbers?
>
> Nope, but it's indeed strange.
>
> > Another reason to send it as RFC is that I'm not fully convinced on how to
> > use the page_pool given the HW limitation of the BM.
>
> I'll have a look right after the holidays.
>

Thanks

> > The driver currently uses, for every CPU, a page_pool for short packets and
> > another for long ones. The driver also has 4 rx queues per port, so every
> > RXQ #1 will share the short and long page pools of CPU #1.
>
> I am not sure I am following the hardware config here.
>

Never mind, it's quite a mess, I needed a lot of time to get it :)

The HW puts the packets in different buffer pools depending on their size:

short: 64..128
long:  128..1664
jumbo: 1664..9856

Let's skip the jumbo pool for now and assume we have 4 CPUs: the driver
allocates 4 short and 4 long buffer pools. Each port has 4 RX queues, and
each one uses a short and a long pool.

With the page_pool API, we have 8 struct page_pool, 4 for the short and 4 for
the long buffers.

> > This means that for every RX queue I call xdp_rxq_info_reg_mem_model() twice,
> > on two different page_pools; can this be a problem?
>
> [...]
>
> Cheers
> /Ilias

Bye,
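To picture the layout described above, here is a rough sketch of how one short
and one long page_pool per CPU could be created and attached to an RX queue's
xdp_rxq_info. This is not the actual patchset code: the helper names, the pool
size, and the error handling are made up for illustration; only
page_pool_create(), PP_FLAG_DMA_MAP and xdp_rxq_info_reg_mem_model() with
MEM_TYPE_PAGE_POOL are real kernel APIs.

#include <net/page_pool.h>
#include <net/xdp.h>
#include <linux/topology.h>

/* Illustrative: one pool per CPU and per size class (short/long). */
static struct page_pool *mvpp2_pool_create_sketch(struct device *dev, int cpu)
{
	struct page_pool_params pp_params = {
		.order		= 0,			/* one page per buffer */
		.flags		= PP_FLAG_DMA_MAP,	/* pool owns the DMA mapping */
		.pool_size	= 2048,			/* illustrative size */
		.nid		= cpu_to_node(cpu),
		.dev		= dev,
		.dma_dir	= DMA_FROM_DEVICE,
	};

	return page_pool_create(&pp_params);
}

/* Each RXQ uses both the short and the long pool of "its" CPU, so the mem
 * model would be registered twice on the same xdp_rxq_info -- which is
 * exactly the open question raised in the cover letter. */
static int mvpp2_rxq_register_pools_sketch(struct xdp_rxq_info *xdp_rxq,
					   struct page_pool *pp_short,
					   struct page_pool *pp_long)
{
	int err;

	err = xdp_rxq_info_reg_mem_model(xdp_rxq, MEM_TYPE_PAGE_POOL, pp_short);
	if (err)
		return err;

	return xdp_rxq_info_reg_mem_model(xdp_rxq, MEM_TYPE_PAGE_POOL, pp_long);
}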
On Tue, 24 Dec 2019 11:52:29 +0200
Ilias Apalodimas <ilias.apalodimas@linaro.org> wrote:

> On Tue, Dec 24, 2019 at 02:01:01AM +0100, Matteo Croce wrote:
> > These patches change the memory allocator of mvpp2 from the frag allocator to
> > the page_pool API. This change is needed to later add XDP support to mvpp2.
> [...]
> > This is the perf top when receiving traffic:
> >
> >   27.68% [kernel]   [k] __page_pool_clean_page
>
> This seems extremely high on the list.

This looks related to the cost of the dma unmap, as the page_pool is created
with PP_FLAG_DMA_MAP. (It is a little strange, as page_pool uses
DMA_ATTR_SKIP_CPU_SYNC, which should make it less expensive.)

> >    9.79% [kernel]   [k] get_page_from_freelist

You are clearly hitting the page allocator every time, because you are not
using the page_pool recycle facility.

> >    7.18% [kernel]   [k] free_unref_page
> [...]
> > I tried Ilias' patches for page_pool recycling, I get an improvement
> > to ~1100 Kpps, but I'm still far from the original allocator.
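To make this point a bit more concrete, below is a hedged sketch, not driver
or kernel source, of the two possible fates of a buffer when the pool is
created with PP_FLAG_DMA_MAP. The function name and the "consumed" flag are
made up; page_pool_recycle_direct() and page_pool_release_page() are the real
calls involved.

/* Illustrative only. Recycling keeps the DMA mapping and avoids the page
 * allocator; releasing the page from the pool is what unmaps it
 * (__page_pool_clean_page) and later sends it through free_unref_page /
 * get_page_from_freelist on the next refill. */
static void rx_buffer_done_sketch(struct page_pool *pp, struct page *page,
				  bool consumed_by_stack)
{
	if (!consumed_by_stack) {
		/* packet dropped in the driver (e.g. XDP_DROP): put the page
		 * straight back into the pool, DMA mapping stays valid */
		page_pool_recycle_direct(pp, page);
		return;
	}

	/* page handed to the skb path without recycling support: the pool
	 * unmaps it now and the page later goes back to the buddy allocator */
	page_pool_release_page(pp, page);
}

In this RFC every packet that becomes an skb takes the second path, which
matches the profile above.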
On Tue, 24 Dec 2019 14:34:07 +0100
Matteo Croce <mcroce@redhat.com> wrote:

> On Tue, Dec 24, 2019 at 10:52 AM Ilias Apalodimas
> <ilias.apalodimas@linaro.org> wrote:
> >
> > On Tue, Dec 24, 2019 at 02:01:01AM +0100, Matteo Croce wrote:
> > [...]
> > > I tried Ilias' patches for page_pool recycling, I get an improvement
> > > to ~1100 Kpps, but I'm still far from the original allocator.
> >
> > Can you post the recycling perf for comparison?
> >
>
>   12.00% [kernel]        [k] get_page_from_freelist
>    9.25% [kernel]        [k] free_unref_page

Hmm, this indicates pages are not getting recycled.

>    6.83% [kernel]        [k] eth_type_trans
>    5.33% [kernel]        [k] __netif_receive_skb_core
>    4.96% [mvpp2]         [k] mvpp2_poll
>    4.64% [kernel]        [k] kmem_cache_free
>    4.06% [kernel]        [k] __xdp_return

You do invoke the __xdp_return() code, but it might find that the page cannot
be recycled...

>    3.60% [kernel]        [k] kmem_cache_alloc
>    3.31% [kernel]        [k] dev_gro_receive
>    3.29% [kernel]        [k] __page_pool_clean_page
>    3.25% [mvpp2]         [k] mvpp2_bm_pool_put
>    2.73% [kernel]        [k] __page_pool_put_page
> [...]
On Tue, Dec 24, 2019 at 3:01 PM Jesper Dangaard Brouer
<brouer@redhat.com> wrote:
>
> On Tue, 24 Dec 2019 11:52:29 +0200
> Ilias Apalodimas <ilias.apalodimas@linaro.org> wrote:
> [...]
>
> You are clearly hitting the page allocator every time, because you are not
> using the page_pool recycle facility.
>
> [...]
> --
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   LinkedIn: http://www.linkedin.com/in/brouer
>

The change I did to use the recycling is the following:

--- a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
+++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
@@ -3071,7 +3071,7 @@ static int mvpp2_rx(struct mvpp2_port *port, struct napi_struct *napi,
 		if (pp)
-			page_pool_release_page(pp, virt_to_page(data));
+			skb_mark_for_recycle(skb, virt_to_page(data), &rxq->xdp_rxq.mem);
 		else
 			dma_unmap_single_attrs(dev->dev.parent, dma_addr,

--
Matteo Croce
per aspera ad upstream
On Tue, Dec 24, 2019 at 03:37:49PM +0100, Matteo Croce wrote:
> On Tue, Dec 24, 2019 at 3:01 PM Jesper Dangaard Brouer
> <brouer@redhat.com> wrote:
> [...]
>
> The change I did to use the recycling is the following:
>
> --- a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
> +++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
> @@ -3071,7 +3071,7 @@ static int mvpp2_rx(struct mvpp2_port *port, struct napi_struct *napi,
>  		if (pp)
> -			page_pool_release_page(pp, virt_to_page(data));
> +			skb_mark_for_recycle(skb, virt_to_page(data), &rxq->xdp_rxq.mem);
>  		else
>  			dma_unmap_single_attrs(dev->dev.parent, dma_addr,
>

Jesper is right, you aren't recycling anything. The skb_mark_for_recycle()
usage seems correct, but there are a few more cases where we refuse to
recycle (for example, coalescing page_pool and slab-allocated pages is
forbidden). I wonder if you hit one of those cases and recycling doesn't
take place.

We'll hopefully release updated code shortly. I'll ping you and we can test
with that.

Thanks
/Ilias