| Message ID | 20200904135332.60259-4-bjorn.topel@gmail.com |
|---|---|
| State | Changes Requested |
| Delegated to | BPF Maintainers |
| Series | xsk: exit NAPI loop when AF_XDP Rx ring is full |
On Fri, 4 Sep 2020 15:53:28 +0200 Björn Töpel <bjorn.topel@gmail.com> wrote:

> From: Björn Töpel <bjorn.topel@intel.com>
>
> The xsk_do_redirect_rx_full() helper can be used to check if a failure
> of xdp_do_redirect() was due to the AF_XDP socket having a full Rx ring.

This is very AF_XDP specific. I think that the cpumap could likely benefit from a similar approach, e.g. if the cpumap kthread is scheduled on the same CPU. But for cpumap we only want this behavior if scheduled on the same CPU as the RX-NAPI. This could be "seen" by the cpumap code itself: in the case bq_flush_to_queue() drops packets, check if rcpu->cpu equals smp_processor_id(). Maybe I'm taking this too far?
On 2020-09-04 17:11, Jesper Dangaard Brouer wrote:
> On Fri, 4 Sep 2020 15:53:28 +0200 Björn Töpel
> <bjorn.topel@gmail.com> wrote:
>
>> From: Björn Töpel <bjorn.topel@intel.com>
>>
>> The xsk_do_redirect_rx_full() helper can be used to check if a
>> failure of xdp_do_redirect() was due to the AF_XDP socket having a
>> full Rx ring.
>
> This is very AF_XDP specific. I think that the cpumap could likely
> benefit from a similar approach? e.g. if the cpumap kthread is
> scheduled on the same CPU.
>

At least I thought this was *very* AF_XDP specific, since the kernel is dependent on what userland runs: allocation (source) and Rx ring (sink). Maybe I was wrong! :-)

The thing with AF_XDP zero-copy is that we sort of assume that if a user enabled it, most packets will be XDP_REDIRECTed to an AF_XDP socket.

> But for cpumap we only want this behavior if scheduled on the same CPU
> as the RX-NAPI. This could be "seen" by the cpumap code itself: in the
> case bq_flush_to_queue() drops packets, check if rcpu->cpu equals
> smp_processor_id(). Maybe I'm taking this too far?
>

Interesting. So, if you're running on the same core and redirect fails for CPUMAP, you'd like to yield the NAPI loop? Is that really OK from a fairness perspective? I mean, with AF_XDP zero-copy we pretty much know that all actions will be redirects to a socket. For CPUMAP type of applications, can that assumption be made?


Björn
On Fri, 4 Sep 2020 17:39:17 +0200 Björn Töpel <bjorn.topel@intel.com> wrote:

> On 2020-09-04 17:11, Jesper Dangaard Brouer wrote:
>> On Fri, 4 Sep 2020 15:53:28 +0200 Björn Töpel
>> <bjorn.topel@gmail.com> wrote:
>>
>>> From: Björn Töpel <bjorn.topel@intel.com>
>>>
>>> The xsk_do_redirect_rx_full() helper can be used to check if a
>>> failure of xdp_do_redirect() was due to the AF_XDP socket having a
>>> full Rx ring.
>>
>> This is very AF_XDP specific. I think that the cpumap could likely
>> benefit from a similar approach? e.g. if the cpumap kthread is
>> scheduled on the same CPU.
>>
> At least I thought this was *very* AF_XDP specific, since the kernel is
> dependent on what userland runs: allocation (source) and Rx ring (sink).
> Maybe I was wrong! :-)
>
> The thing with AF_XDP zero-copy is that we sort of assume that if a
> user enabled it, most packets will be XDP_REDIRECTed to an AF_XDP socket.
>
>> But for cpumap we only want this behavior if scheduled on the same CPU
>> as the RX-NAPI. This could be "seen" by the cpumap code itself: in the
>> case bq_flush_to_queue() drops packets, check if rcpu->cpu equals
>> smp_processor_id(). Maybe I'm taking this too far?
>>
> Interesting. So, if you're running on the same core and redirect fails
> for CPUMAP, you'd like to yield the NAPI loop? Is that really OK from a
> fairness perspective? I mean, with AF_XDP zero-copy we pretty much know
> that all actions will be redirects to a socket. For CPUMAP type of
> applications, can that assumption be made?

Yes, you are right. The RX NAPI loop could be doing something else, and yielding the NAPI loop on detecting that same-CPU cpumap delivery is stalling might not be the correct action.

I just tested the same-CPU processing case for cpumap (result below signature), and it doesn't exhibit the bad 'dropping-off-edge' performance slowdown. The cpumap code also already tries to mitigate this, by calling wake_up_process() for every 8 packets (CPU_MAP_BULK_SIZE).
I find your patchset very interesting, as I believe we do need some kind of general push-back "flow-control" mechanism for XDP. Maybe I should solve this differently in our XDP-TX-QoS pipe dream ;-)
diff --git a/include/net/xdp_sock_drv.h b/include/net/xdp_sock_drv.h
index 5b1ee8a9976d..34c58b5fbc28 100644
--- a/include/net/xdp_sock_drv.h
+++ b/include/net/xdp_sock_drv.h
@@ -116,6 +116,11 @@ static inline void xsk_buff_raw_dma_sync_for_device(struct xsk_buff_pool *pool,
 	xp_dma_sync_for_device(pool, dma, size);
 }
 
+static inline bool xsk_do_redirect_rx_full(int err, enum bpf_map_type map_type)
+{
+	return err == -ENOBUFS && map_type == BPF_MAP_TYPE_XSKMAP;
+}
+
 #else
 
 static inline void xsk_tx_completed(struct xsk_buff_pool *pool, u32 nb_entries)
@@ -235,6 +240,10 @@ static inline void xsk_buff_raw_dma_sync_for_device(struct xsk_buff_pool *pool,
 {
 }
 
+static inline bool xsk_do_redirect_rx_full(int err, enum bpf_map_type map_type)
+{
+	return false;
+}
 #endif /* CONFIG_XDP_SOCKETS */
 
 #endif /* _LINUX_XDP_SOCK_DRV_H */