Message ID | 20200907150217.30888-5-bjorn.topel@gmail.com |
---|---|
State | Changes Requested |
Delegated to: | BPF Maintainers |
Headers | show |
Series | xsk: increase NAPI budget for AF_XDP zero-copy path | expand |
On Mon, 7 Sep 2020 17:02:17 +0200 Björn Töpel wrote: > From: Björn Töpel <bjorn.topel@intel.com> > > Start using XSK_NAPI_WEIGHT as NAPI poll budget for the AF_XDP Rx > zero-copy path. > > Signed-off-by: Björn Töpel <bjorn.topel@intel.com> > --- > drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c > index 3771857cf887..f32c1ba0d237 100644 > --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c > +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c > @@ -239,7 +239,7 @@ int ixgbe_clean_rx_irq_zc(struct ixgbe_q_vector *q_vector, > bool failure = false; > struct sk_buff *skb; > > - while (likely(total_rx_packets < budget)) { > + while (likely(total_rx_packets < XSK_NAPI_WEIGHT)) { I was thinking that we'd multiply 'budget' here, not replace it with a constant. Looks like ixgbe dutifully passes 'per_ring_budget' into the clean_rx functions, not a complete NAPI budget. > union ixgbe_adv_rx_desc *rx_desc; > struct ixgbe_rx_buffer *bi; > unsigned int size;
On 2020-09-07 21:32, Jakub Kicinski wrote: > On Mon, 7 Sep 2020 17:02:17 +0200 Björn Töpel wrote: >> From: Björn Töpel <bjorn.topel@intel.com> >> >> Start using XSK_NAPI_WEIGHT as NAPI poll budget for the AF_XDP Rx >> zero-copy path. >> >> Signed-off-by: Björn Töpel <bjorn.topel@intel.com> >> --- >> drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c >> index 3771857cf887..f32c1ba0d237 100644 >> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c >> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c >> @@ -239,7 +239,7 @@ int ixgbe_clean_rx_irq_zc(struct ixgbe_q_vector *q_vector, >> bool failure = false; >> struct sk_buff *skb; >> >> - while (likely(total_rx_packets < budget)) { >> + while (likely(total_rx_packets < XSK_NAPI_WEIGHT)) { > > I was thinking that we'd multiply 'budget' here, not replace it with a > constant. Looks like ixgbe dutifully passes 'per_ring_budget' into the > clean_rx functions, not a complete NAPI budget. > Correct, and i40e/ice does the same ("per_ring_budget"). As for budget << XSK_NAPI_MULT vs replacing; Replacing the budget is more in line with what the drivers do for the Tx cleanup (xxx_clean_tx_irq), where the napi budget is discarded completely; Again, with the idea that "this is much cheaper than a "per-packet through the stack". Do you prefer the multiplier way that you describe? Cheers, Björn >> union ixgbe_adv_rx_desc *rx_desc; >> struct ixgbe_rx_buffer *bi; >> unsigned int size; >
On 9/7/20 5:02 PM, Björn Töpel wrote: > From: Björn Töpel <bjorn.topel@intel.com> > > Start using XSK_NAPI_WEIGHT as NAPI poll budget for the AF_XDP Rx > zero-copy path. > > Signed-off-by: Björn Töpel <bjorn.topel@intel.com> > --- > drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c > index 3771857cf887..f32c1ba0d237 100644 > --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c > +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c > @@ -239,7 +239,7 @@ int ixgbe_clean_rx_irq_zc(struct ixgbe_q_vector *q_vector, > bool failure = false; > struct sk_buff *skb; > > - while (likely(total_rx_packets < budget)) { > + while (likely(total_rx_packets < XSK_NAPI_WEIGHT)) { > union ixgbe_adv_rx_desc *rx_desc; > struct ixgbe_rx_buffer *bi; > unsigned int size This is a violation of NAPI API. IXGBE is already diverging a bit from best practices. There are reasons we want to control the budget from callers, if you want bigger budget just increase it instead of using your own ? I would rather use a generic patch. diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 7bd4fcdd0738a718d8b0f7134523cd87e4dcdb7b..33bcbdb6fef488983438c6584e3cbb0a44febb1a 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -2311,11 +2311,14 @@ static inline void *netdev_priv(const struct net_device *dev) */ #define SET_NETDEV_DEVTYPE(net, devtype) ((net)->dev.type = (devtype)) -/* Default NAPI poll() weight - * Device drivers are strongly advised to not use bigger value - */ +/* Default NAPI poll() weight. Highly recommended. */ #define NAPI_POLL_WEIGHT 64 +/* Device drivers are strongly advised to not use bigger value, + * as this might cause latencies in stress conditions. + */ +#define NAPI_POLL_WEIGHT_MAX 256 + /** * netif_napi_add - initialize a NAPI context * @dev: network device diff --git a/net/core/dev.c b/net/core/dev.c index 4086d335978c1bf62bd3965bd2ea96a4ac06b13d..496713fb6075bd8e5e22725e7c817172858e1dd7 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -6608,7 +6608,7 @@ void netif_napi_add(struct net_device *dev, struct napi_struct *napi, INIT_LIST_HEAD(&napi->rx_list); napi->rx_count = 0; napi->poll = poll; - if (weight > NAPI_POLL_WEIGHT) + if (weight > NAPI_POLL_WEIGHT_MAX) netdev_err_once(dev, "%s() called with weight %d\n", __func__, weight); napi->weight = weight;
Dear Björn, Am 07.09.20 um 17:02 schrieb Björn Töpel: > From: Björn Töpel <bjorn.topel@intel.com> > > Start using XSK_NAPI_WEIGHT as NAPI poll budget for the AF_XDP Rx > zero-copy path. Could you please add the description from the patch series cover letter to this commit too? To my knowledge, the message in the cover letter won’t be stored in the git repository. > Signed-off-by: Björn Töpel <bjorn.topel@intel.com> > --- > drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) […] Kind regards, Paul
On 2020-09-08 12:12, Paul Menzel wrote: > Dear Björn, > > > Am 07.09.20 um 17:02 schrieb Björn Töpel: >> From: Björn Töpel <bjorn.topel@intel.com> >> >> Start using XSK_NAPI_WEIGHT as NAPI poll budget for the AF_XDP Rx >> zero-copy path. > > Could you please add the description from the patch series cover letter > to this commit too? To my knowledge, the message in the cover letter > won’t be stored in the git repository. > Paul, thanks for the input! The netdev/bpf trees always include the cover letter in the merge commit. Cheers, Björn >> Signed-off-by: Björn Töpel <bjorn.topel@intel.com> >> --- >> drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) > > […] > > > Kind regards, > > Paul
Dear Björn, Am 08.09.20 um 13:12 schrieb Björn Töpel: > On 2020-09-08 12:12, Paul Menzel wrote: >> Am 07.09.20 um 17:02 schrieb Björn Töpel: >>> From: Björn Töpel <bjorn.topel@intel.com> >>> >>> Start using XSK_NAPI_WEIGHT as NAPI poll budget for the AF_XDP Rx >>> zero-copy path. >> >> Could you please add the description from the patch series cover >> letter to this commit too? To my knowledge, the message in the cover >> letter won’t be stored in the git repository. > > Paul, thanks for the input! The netdev/bpf trees always include the > cover letter in the merge commit. Yes, for pull/merge requests. But you posted them to the list, so I’d assume they will be applied with `git am` and not merged, or am I missing something. Could you please point me to a merge commit where the patches were posted to the list? Kind regards, Paul
On 2020-09-08 13:20, Paul Menzel wrote: [...]>> Paul, thanks for the input! The netdev/bpf trees always include the >> cover letter in the merge commit. > > Yes, for pull/merge requests. But you posted them to the list, so I’d > assume they will be applied with `git am` and not merged, or am I > missing something. Could you please point me to a merge commit where the > patches were posted to the list? > An example: A series is posted to the list [1], and when merged the merge commit look like [2]. Thanks, Björn [1] https://lore.kernel.org/bpf/20200520192103.355233-1-bjorn.topel@gmail.com/ [2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=79917b242c3fe0d89e4752bc25ffef4574c2194b
On 2020-09-08 11:45, Eric Dumazet wrote: > > > On 9/7/20 5:02 PM, Björn Töpel wrote: >> From: Björn Töpel <bjorn.topel@intel.com> >> >> Start using XSK_NAPI_WEIGHT as NAPI poll budget for the AF_XDP Rx >> zero-copy path. >> >> Signed-off-by: Björn Töpel <bjorn.topel@intel.com> >> --- >> drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c >> index 3771857cf887..f32c1ba0d237 100644 >> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c >> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c >> @@ -239,7 +239,7 @@ int ixgbe_clean_rx_irq_zc(struct ixgbe_q_vector *q_vector, >> bool failure = false; >> struct sk_buff *skb; >> >> - while (likely(total_rx_packets < budget)) { >> + while (likely(total_rx_packets < XSK_NAPI_WEIGHT)) { >> union ixgbe_adv_rx_desc *rx_desc; >> struct ixgbe_rx_buffer *bi; >> unsigned int size > > This is a violation of NAPI API. IXGBE is already diverging a bit from best practices. > Thanks for having a look, Eric! By diverging from best practices, do you mean that multiple queues share one NAPI context, and the budget is split over the queues (say, 4 queues, 64/4 per queue), or that Tx simply ignores the budget? Or both? > There are reasons we want to control the budget from callers, > if you want bigger budget just increase it instead of using your own ? > > I would rather use a generic patch. > Hmm, so a configurable NAPI budget for, say, the AF_XDP enabled queues/NAPIs? Am I reading that correct? (Note that this is *only* for the AF_XDP enabled queues.) I'll try to rework this to something more palatable. Thanks, Björn > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h > index 7bd4fcdd0738a718d8b0f7134523cd87e4dcdb7b..33bcbdb6fef488983438c6584e3cbb0a44febb1a 100644 > --- a/include/linux/netdevice.h > +++ b/include/linux/netdevice.h > @@ -2311,11 +2311,14 @@ static inline void *netdev_priv(const struct net_device *dev) > */ > #define SET_NETDEV_DEVTYPE(net, devtype) ((net)->dev.type = (devtype)) > > -/* Default NAPI poll() weight > - * Device drivers are strongly advised to not use bigger value > - */ > +/* Default NAPI poll() weight. Highly recommended. */ > #define NAPI_POLL_WEIGHT 64 > > +/* Device drivers are strongly advised to not use bigger value, > + * as this might cause latencies in stress conditions. > + */ > +#define NAPI_POLL_WEIGHT_MAX 256 > + > /** > * netif_napi_add - initialize a NAPI context > * @dev: network device > diff --git a/net/core/dev.c b/net/core/dev.c > index 4086d335978c1bf62bd3965bd2ea96a4ac06b13d..496713fb6075bd8e5e22725e7c817172858e1dd7 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -6608,7 +6608,7 @@ void netif_napi_add(struct net_device *dev, struct napi_struct *napi, > INIT_LIST_HEAD(&napi->rx_list); > napi->rx_count = 0; > napi->poll = poll; > - if (weight > NAPI_POLL_WEIGHT) > + if (weight > NAPI_POLL_WEIGHT_MAX) > netdev_err_once(dev, "%s() called with weight %d\n", __func__, > weight); > napi->weight = weight; >
On 9/8/20 1:49 PM, Björn Töpel wrote: > On 2020-09-08 11:45, Eric Dumazet wrote: >> >> >> On 9/7/20 5:02 PM, Björn Töpel wrote: >>> From: Björn Töpel <bjorn.topel@intel.com> >>> >>> Start using XSK_NAPI_WEIGHT as NAPI poll budget for the AF_XDP Rx >>> zero-copy path. >>> >>> Signed-off-by: Björn Töpel <bjorn.topel@intel.com> >>> --- >>> drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 2 +- >>> 1 file changed, 1 insertion(+), 1 deletion(-) >>> >>> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c >>> index 3771857cf887..f32c1ba0d237 100644 >>> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c >>> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c >>> @@ -239,7 +239,7 @@ int ixgbe_clean_rx_irq_zc(struct ixgbe_q_vector *q_vector, >>> bool failure = false; >>> struct sk_buff *skb; >>> - while (likely(total_rx_packets < budget)) { >>> + while (likely(total_rx_packets < XSK_NAPI_WEIGHT)) { >>> union ixgbe_adv_rx_desc *rx_desc; >>> struct ixgbe_rx_buffer *bi; >>> unsigned int size >> >> This is a violation of NAPI API. IXGBE is already diverging a bit from best practices. >> > > Thanks for having a look, Eric! By diverging from best practices, do > you mean that multiple queues share one NAPI context, and the budget > is split over the queues (say, 4 queues, 64/4 per queue), or that Tx > simply ignores the budget? Or both? For instance, having ixgbe_poll() doing : return min(work_done, budget - 1); is quite interesting. It could hide bugs like : I got a budget of 4, and processed 9999 packets because my maths have a bug, but make sure to pretend we processed 3 packets... > >> There are reasons we want to control the budget from callers, >> if you want bigger budget just increase it instead of using your own ? >> >> I would rather use a generic patch. >> > > Hmm, so a configurable NAPI budget for, say, the AF_XDP enabled > queues/NAPIs? Am I reading that correct? (Note that this is *only* for > the AF_XDP enabled queues.) > > I'll try to rework this to something more palatable. > > > Thanks, > Björn > >
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c index 3771857cf887..f32c1ba0d237 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c @@ -239,7 +239,7 @@ int ixgbe_clean_rx_irq_zc(struct ixgbe_q_vector *q_vector, bool failure = false; struct sk_buff *skb; - while (likely(total_rx_packets < budget)) { + while (likely(total_rx_packets < XSK_NAPI_WEIGHT)) { union ixgbe_adv_rx_desc *rx_desc; struct ixgbe_rx_buffer *bi; unsigned int size;