Message ID | 158634663936.707275.3156718045905620430.stgit@firesoul |
---|---|
State | RFC |
Delegated to: | BPF Maintainers |
Headers | show |
Series | [RFC,v2,01/33] xdp: add frame size to xdp_buff | expand |
On Wed, 08 Apr 2020 13:50:39 +0200 Jesper Dangaard Brouer wrote: > XDP have evolved to support several frame sizes, but xdp_buff was not > updated with this information. The frame size (frame_sz) member of > xdp_buff is introduced to know the real size of the memory the frame is > delivered in. > > When introducing this also make it clear that some tailroom is > reserved/required when creating SKBs using build_skb(). > > It would also have been an option to introduce a pointer to > data_hard_end (with reserved offset). The advantage with frame_sz is > that (like rxq) drivers only need to setup/assign this value once per > NAPI cycle. Due to XDP-generic (and some drivers) it's not possible to > store frame_sz inside xdp_rxq_info, because it's varies per packet as it > can be based/depend on packet length. > > Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> > --- > include/net/xdp.h | 17 +++++++++++++++++ > 1 file changed, 17 insertions(+) > > diff --git a/include/net/xdp.h b/include/net/xdp.h > index 40c6d3398458..99f4374f6214 100644 > --- a/include/net/xdp.h > +++ b/include/net/xdp.h > @@ -6,6 +6,8 @@ > #ifndef __LINUX_NET_XDP_H__ > #define __LINUX_NET_XDP_H__ > > +#include <linux/skbuff.h> /* skb_shared_info */ > + > /** > * DOC: XDP RX-queue information > * > @@ -70,8 +72,23 @@ struct xdp_buff { > void *data_hard_start; > unsigned long handle; > struct xdp_rxq_info *rxq; > + u32 frame_sz; /* frame size to deduct data_hard_end/reserved tailroom*/ Perhaps /* length of packet buffer, starting at data_hard_start */ ? > }; > > +/* Reserve memory area at end-of data area. I wouldn't say this reserves anything. It just computes the end pointer, no? > + * > + * This macro reserves tailroom in the XDP buffer by limiting the > + * XDP/BPF data access to data_hard_end. Notice same area (and size) > + * is used for XDP_PASS, when constructing the SKB via build_skb(). > + */ > +#define xdp_data_hard_end(xdp) \ > + ((xdp)->data_hard_start + (xdp)->frame_sz - \ > + SKB_DATA_ALIGN(sizeof(struct skb_shared_info))) I think it should be said somewhere that the drivers are expected to DMA map memory up to xdp_data_hard_end(xdp). > + > +/* Like skb_shinfo */ > +#define xdp_shinfo(xdp) ((struct skb_shared_info *)(xdp_data_hard_end(xdp))) > +// XXX: Above likely belongs in later patch > + > struct xdp_frame { > void *data; > u16 len; > >
On Wed, 2020-04-08 at 10:53 -0700, Jakub Kicinski wrote: > On Wed, 08 Apr 2020 13:50:39 +0200 Jesper Dangaard Brouer wrote: > > XDP have evolved to support several frame sizes, but xdp_buff was > > not > > updated with this information. The frame size (frame_sz) member of > > xdp_buff is introduced to know the real size of the memory the > > frame is > > delivered in. > > > > When introducing this also make it clear that some tailroom is > > reserved/required when creating SKBs using build_skb(). > > > > It would also have been an option to introduce a pointer to > > data_hard_end (with reserved offset). The advantage with frame_sz > > is > > that (like rxq) drivers only need to setup/assign this value once > > per > > NAPI cycle. Due to XDP-generic (and some drivers) it's not possible > > to > > store frame_sz inside xdp_rxq_info, because it's varies per packet > > as it > > can be based/depend on packet length. > > > > Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> > > --- > > include/net/xdp.h | 17 +++++++++++++++++ > > 1 file changed, 17 insertions(+) > > > > diff --git a/include/net/xdp.h b/include/net/xdp.h > > index 40c6d3398458..99f4374f6214 100644 > > --- a/include/net/xdp.h > > +++ b/include/net/xdp.h > > @@ -6,6 +6,8 @@ > > #ifndef __LINUX_NET_XDP_H__ > > #define __LINUX_NET_XDP_H__ > > > > +#include <linux/skbuff.h> /* skb_shared_info */ > > + > > /** > > * DOC: XDP RX-queue information > > * > > @@ -70,8 +72,23 @@ struct xdp_buff { > > void *data_hard_start; > > unsigned long handle; > > struct xdp_rxq_info *rxq; > > + u32 frame_sz; /* frame size to deduct data_hard_end/reserved > > tailroom*/ > > Perhaps > > /* length of packet buffer, starting at data_hard_start */ > > ? > > > }; > > > > +/* Reserve memory area at end-of data area. > > I wouldn't say this reserves anything. It just computes the end > pointer, no? > > > + * > > + * This macro reserves tailroom in the XDP buffer by limiting the > > + * XDP/BPF data access to data_hard_end. Notice same area (and > > size) > > + * is used for XDP_PASS, when constructing the SKB via > > build_skb(). > > + */ > > +#define xdp_data_hard_end(xdp) \ > > + ((xdp)->data_hard_start + (xdp)->frame_sz - \ > > + SKB_DATA_ALIGN(sizeof(struct skb_shared_info))) > > I think it should be said somewhere that the drivers are expected to > DMA map memory up to xdp_data_hard_end(xdp). > but this works on a specific xdp buff, drivers work with mtu and what if the driver want to have this as an option per packet .. i.e.: if there is enough tail room, then build_skb, otherwise alloc new skb, copy headers, setup data frags.. etc having such limitations on driver can be very strict, i think the decision must remain dynamic per frame.. of-course drivers should optimize to preserve enough tail room for all rx packets.. > > + > > +/* Like skb_shinfo */ > > +#define xdp_shinfo(xdp) ((struct skb_shared_info > > *)(xdp_data_hard_end(xdp))) > > +// XXX: Above likely belongs in later patch > > + > > struct xdp_frame { > > void *data; > > u16 len; > > > >
On Wed, 2020-04-08 at 13:50 +0200, Jesper Dangaard Brouer wrote: > XDP have evolved to support several frame sizes, but xdp_buff was not > updated with this information. The frame size (frame_sz) member of > xdp_buff is introduced to know the real size of the memory the frame > is > delivered in. > > When introducing this also make it clear that some tailroom is > reserved/required when creating SKBs using build_skb(). > > It would also have been an option to introduce a pointer to > data_hard_end (with reserved offset). The advantage with frame_sz is > that (like rxq) drivers only need to setup/assign this value once per > NAPI cycle. Due to XDP-generic (and some drivers) it's not possible > to > store frame_sz inside xdp_rxq_info, because it's varies per packet as > it > can be based/depend on packet length. > > Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> > --- > include/net/xdp.h | 17 +++++++++++++++++ > 1 file changed, 17 insertions(+) > > diff --git a/include/net/xdp.h b/include/net/xdp.h > index 40c6d3398458..99f4374f6214 100644 > --- a/include/net/xdp.h > +++ b/include/net/xdp.h > @@ -6,6 +6,8 @@ > #ifndef __LINUX_NET_XDP_H__ > #define __LINUX_NET_XDP_H__ > > +#include <linux/skbuff.h> /* skb_shared_info */ > + I think it is wrong to make xdp.h depend on skbuff.h we must keep xdp.h minimal and independent, the new macros should be defined in skbuff.h > /** > * DOC: XDP RX-queue information > * > @@ -70,8 +72,23 @@ struct xdp_buff { > void *data_hard_start; > unsigned long handle; > struct xdp_rxq_info *rxq; > + u32 frame_sz; /* frame size to deduct data_hard_end/reserved > tailroom*/ why u32 ? u16 should be more than enough.. > }; > > +/* Reserve memory area at end-of data area. > + * > + * This macro reserves tailroom in the XDP buffer by limiting the > + * XDP/BPF data access to data_hard_end. Notice same area (and > size) > + * is used for XDP_PASS, when constructing the SKB via build_skb(). > + */ > +#define xdp_data_hard_end(xdp) \ > + ((xdp)->data_hard_start + (xdp)->frame_sz - \ > + SKB_DATA_ALIGN(sizeof(struct skb_shared_info))) > + this macro is not safe when unary operators are being used > +/* Like skb_shinfo */ > +#define xdp_shinfo(xdp) ((struct skb_shared_info > *)(xdp_data_hard_end(xdp))) > +// XXX: Above likely belongs in later patch > + > struct xdp_frame { > void *data; > u16 len; > >
On Thu, 9 Apr 2020 00:48:30 +0000 Saeed Mahameed wrote: > > > + * This macro reserves tailroom in the XDP buffer by limiting the > > > + * XDP/BPF data access to data_hard_end. Notice same area (and > > > size) > > > + * is used for XDP_PASS, when constructing the SKB via > > > build_skb(). > > > + */ > > > +#define xdp_data_hard_end(xdp) \ > > > + ((xdp)->data_hard_start + (xdp)->frame_sz - \ > > > + SKB_DATA_ALIGN(sizeof(struct skb_shared_info))) > > > > I think it should be said somewhere that the drivers are expected to > > DMA map memory up to xdp_data_hard_end(xdp). > > > > but this works on a specific xdp buff, drivers work with mtu > > and what if the driver want to have this as an option per packet .. > i.e.: if there is enough tail room, then build_skb, otherwise > alloc new skb, copy headers, setup data frags.. etc > > having such limitations on driver can be very strict, i think the > decision must remain dynamic per frame.. > > of-course drivers should optimize to preserve enough tail room for all > rx packets.. My concern is that driver may allocate a full page for each frame but only DMA map the amount that can reasonably contain data given the MTU. To save on DMA syncs. Today that wouldn't be a problem, because XDP_REDIRECT will re-map the page, and XDP_TX has the same MTU. In this set xdp_data_hard_end is used both to find the end of memory buffer, and end of DMA buffer. Implementation of bpf_xdp_adjust_tail() assumes anything < SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) from the end is fair game. So I was trying to say that we should warn driver authors that the DMA buffer can now grow / move beyond what the driver may expect in XDP_TX. Drivers can either DMA map enough memory, or handle the corner case in a special way. IDK if that makes sense, we may be talking past each other :)
On Wed, 2020-04-08 at 18:13 -0700, Jakub Kicinski wrote: > On Thu, 9 Apr 2020 00:48:30 +0000 Saeed Mahameed wrote: > > > > + * This macro reserves tailroom in the XDP buffer by limiting > > > > the > > > > + * XDP/BPF data access to data_hard_end. Notice same area > > > > (and > > > > size) > > > > + * is used for XDP_PASS, when constructing the SKB via > > > > build_skb(). > > > > + */ > > > > +#define xdp_data_hard_end(xdp) \ > > > > + ((xdp)->data_hard_start + (xdp)->frame_sz - \ > > > > + SKB_DATA_ALIGN(sizeof(struct skb_shared_info))) > > > > > > I think it should be said somewhere that the drivers are expected > > > to > > > DMA map memory up to xdp_data_hard_end(xdp). > > > > > > > but this works on a specific xdp buff, drivers work with mtu > > > > and what if the driver want to have this as an option per packet > > .. > > i.e.: if there is enough tail room, then build_skb, otherwise > > alloc new skb, copy headers, setup data frags.. etc > > > > having such limitations on driver can be very strict, i think the > > decision must remain dynamic per frame.. > > > > of-course drivers should optimize to preserve enough tail room for > > all > > rx packets.. > > My concern is that driver may allocate a full page for each frame but > only DMA map the amount that can reasonably contain data given the > MTU. > To save on DMA syncs. > > Today that wouldn't be a problem, because XDP_REDIRECT will re-map > the > page, and XDP_TX has the same MTU. > I am not worried about dma at all, i am worried about the xdp progs which are now allowed to extend packets beyond the mtu and do XDP_TX. but as i am thinking about this i just realized that this can already happen with xdp_adjust_head().. but as you stated above this puts alot of assumptions on how driver should dma rx buffs > In this set xdp_data_hard_end is used both to find the end of memory > buffer, and end of DMA buffer. Implementation of > bpf_xdp_adjust_tail() > assumes anything < SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) > from > the end is fair game. > but why skb_shared_info in particular though ? this assumes someone needs this tail for building skbs .. looks weird to me. > So I was trying to say that we should warn driver authors that the > DMA > buffer can now grow / move beyond what the driver may expect in > XDP_TX. Ack, but can we do it by desing ? i.e instead of having hardcoded limits (e.g. SKB_DATA_ALIGN(shinfo)) in bpf_xdp_adjust_tail(), let the driver provide this, or any other restrictions, e.g mtu for tx, or driver specific memory model restrictions .. > Drivers can either DMA map enough memory, or handle the corner case > in > a special way. > > IDK if that makes sense, we may be talking past each other :)
On Thu, 9 Apr 2020 23:07:42 +0000 Saeed Mahameed wrote: > > My concern is that driver may allocate a full page for each frame but > > only DMA map the amount that can reasonably contain data given the > > MTU. > > To save on DMA syncs. > > > > Today that wouldn't be a problem, because XDP_REDIRECT will re-map > > the > > page, and XDP_TX has the same MTU. > > I am not worried about dma at all, i am worried about the xdp progs > which are now allowed to extend packets beyond the mtu and do XDP_TX. > but as i am thinking about this i just realized that this can already > happen with xdp_adjust_head().. > > but as you stated above this puts alot of assumptions on how driver > should dma rx buffs > > > In this set xdp_data_hard_end is used both to find the end of memory > > buffer, and end of DMA buffer. Implementation of > > bpf_xdp_adjust_tail() > > assumes anything < SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) > > from > > the end is fair game. > > but why skb_shared_info in particular though ? this assumes someone > needs this tail for building skbs .. looks weird to me. Fair, simplifies the internals, I guess. > > So I was trying to say that we should warn driver authors that the > > DMA > > buffer can now grow / move beyond what the driver may expect in > > XDP_TX. > > Ack, but can we do it by desing ? i.e instead of having hardcoded > limits (e.g. SKB_DATA_ALIGN(shinfo)) in bpf_xdp_adjust_tail(), let the > driver provide this, or any other restrictions, e.g mtu for tx, or > driver specific memory model restrictions .. Right, actually for NFP we need to add the check already - looking at the code - the DMA mapping does not cover anything beyond the headroom + MTU. > > Drivers can either DMA map enough memory, or handle the corner case > > in > > a special way. > > > > IDK if that makes sense, we may be talking past each other :)
On Wed, 8 Apr 2020 10:53:39 -0700 Jakub Kicinski <kuba@kernel.org> wrote: > > + * This macro reserves tailroom in the XDP buffer by limiting the > > + * XDP/BPF data access to data_hard_end. Notice same area (and size) > > + * is used for XDP_PASS, when constructing the SKB via build_skb(). > > + */ > > +#define xdp_data_hard_end(xdp) \ > > + ((xdp)->data_hard_start + (xdp)->frame_sz - \ > > + SKB_DATA_ALIGN(sizeof(struct skb_shared_info))) > > I think it should be said somewhere that the drivers are expected to > DMA map memory up to xdp_data_hard_end(xdp). No, I don't want driver to DMA map memory up to xdp_data_hard_end(xdp). The driver will/should map up-to the configured MTU. Reading ahead, I can see that you worry about XDP_TX, that doesn't do the MTU check in xdp_ok_fwd_dev() like we do for XDP_REDIRECT. Guess, we need to check that before doing XDP_TX, such that we don't DMA sync for_device, for an area that does not included the original DMA-map area. I wonder if that is a violation, if so, it is also problematic for adjust *head*.
On Thu, 9 Apr 2020 00:50:02 +0000 Saeed Mahameed <saeedm@mellanox.com> wrote: > On Wed, 2020-04-08 at 13:50 +0200, Jesper Dangaard Brouer wrote: > > XDP have evolved to support several frame sizes, but xdp_buff was not > > updated with this information. The frame size (frame_sz) member of > > xdp_buff is introduced to know the real size of the memory the frame > > is > > delivered in. > > > > When introducing this also make it clear that some tailroom is > > reserved/required when creating SKBs using build_skb(). > > > > It would also have been an option to introduce a pointer to > > data_hard_end (with reserved offset). The advantage with frame_sz is > > that (like rxq) drivers only need to setup/assign this value once per > > NAPI cycle. Due to XDP-generic (and some drivers) it's not possible > > to > > store frame_sz inside xdp_rxq_info, because it's varies per packet as > > it > > can be based/depend on packet length. > > > > Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> > > --- > > include/net/xdp.h | 17 +++++++++++++++++ > > 1 file changed, 17 insertions(+) > > > > diff --git a/include/net/xdp.h b/include/net/xdp.h > > index 40c6d3398458..99f4374f6214 100644 > > --- a/include/net/xdp.h > > +++ b/include/net/xdp.h > > @@ -6,6 +6,8 @@ > > #ifndef __LINUX_NET_XDP_H__ > > #define __LINUX_NET_XDP_H__ > > > > +#include <linux/skbuff.h> /* skb_shared_info */ > > + > > I think it is wrong to make xdp.h depend on skbuff.h > we must keep xdp.h minimal and independent, I agree, that it seems strange to have xdp.h include skbuff.h, and I'm not happy with that approach myself, but the alternatives all looked kind of ugly. > the new macros should be defined in skbuff.h Moving #define xdp_data_hard_end(xdp) into skbuff.h also seems strange. > > /** > > * DOC: XDP RX-queue information > > * > > @@ -70,8 +72,23 @@ struct xdp_buff { > > void *data_hard_start; > > unsigned long handle; > > struct xdp_rxq_info *rxq; > > + u32 frame_sz; /* frame size to deduct data_hard_end/reserved > > tailroom*/ > > why u32 ? u16 should be more than enough.. Nope. It need to be able to store PAGE_SIZE == 65536. $ echo $((1<<12)) 4096 $ echo $((1<<16)) 65536 $ printf "0x%X\n" 65536 0x10000 > > }; > > > > +/* Reserve memory area at end-of data area. > > + * > > + * This macro reserves tailroom in the XDP buffer by limiting the > > + * XDP/BPF data access to data_hard_end. Notice same area (and size) > > + * is used for XDP_PASS, when constructing the SKB via build_skb(). > > + */ > > +#define xdp_data_hard_end(xdp) \ > > + ((xdp)->data_hard_start + (xdp)->frame_sz - \ > > + SKB_DATA_ALIGN(sizeof(struct skb_shared_info))) > > + > > this macro is not safe when unary operators are being used The parentheses round (xdp) does make xdp_data_hard_end(&xdp) work correctly. What other cases are you worried about?
On Thu, 2020-04-16 at 15:02 +0200, Jesper Dangaard Brouer wrote: > On Thu, 9 Apr 2020 00:50:02 +0000 > Saeed Mahameed <saeedm@mellanox.com> wrote: > > > On Wed, 2020-04-08 at 13:50 +0200, Jesper Dangaard Brouer wrote: > > > XDP have evolved to support several frame sizes, but xdp_buff was > > > not > > > updated with this information. The frame size (frame_sz) member > > > of > > > xdp_buff is introduced to know the real size of the memory the > > > frame > > > is > > > delivered in. > > > > > > When introducing this also make it clear that some tailroom is > > > reserved/required when creating SKBs using build_skb(). > > > > > > It would also have been an option to introduce a pointer to > > > data_hard_end (with reserved offset). The advantage with frame_sz > > > is > > > that (like rxq) drivers only need to setup/assign this value once > > > per > > > NAPI cycle. Due to XDP-generic (and some drivers) it's not > > > possible > > > to > > > store frame_sz inside xdp_rxq_info, because it's varies per > > > packet as > > > it > > > can be based/depend on packet length. > > > > > > Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> > > > --- > > > include/net/xdp.h | 17 +++++++++++++++++ > > > 1 file changed, 17 insertions(+) > > > > > > diff --git a/include/net/xdp.h b/include/net/xdp.h > > > index 40c6d3398458..99f4374f6214 100644 > > > --- a/include/net/xdp.h > > > +++ b/include/net/xdp.h > > > @@ -6,6 +6,8 @@ > > > #ifndef __LINUX_NET_XDP_H__ > > > #define __LINUX_NET_XDP_H__ > > > > > > +#include <linux/skbuff.h> /* skb_shared_info */ > > > + > > > > I think it is wrong to make xdp.h depend on skbuff.h > > we must keep xdp.h minimal and independent, > > I agree, that it seems strange to have xdp.h include skbuff.h, and > I'm > not happy with that approach myself, but the alternatives all looked > kind of ugly. > > > the new macros should be defined in skbuff.h > > Moving #define xdp_data_hard_end(xdp) into skbuff.h also seems > strange. > So maybe we shouldn't have any dependencies by design, and let the drivers decide how much tailroom they want to preserve, and remove the hardcoded sizeof(skb_shinfo).. maybe per rxq ? on memory model registration ? > > > > /** > > > * DOC: XDP RX-queue information > > > * > > > @@ -70,8 +72,23 @@ struct xdp_buff { > > > void *data_hard_start; > > > unsigned long handle; > > > struct xdp_rxq_info *rxq; > > > + u32 frame_sz; /* frame size to deduct data_hard_end/reserved > > > tailroom*/ > > > > why u32 ? u16 should be more than enough.. > > Nope. It need to be able to store PAGE_SIZE == 65536. > > $ echo $((1<<12)) > 4096 > $ echo $((1<<16)) > 65536 > > $ printf "0x%X\n" 65536 > 0x10000 > :( > > > > }; > > > > > > +/* Reserve memory area at end-of data area. > > > + * > > > + * This macro reserves tailroom in the XDP buffer by limiting > > > the > > > + * XDP/BPF data access to data_hard_end. Notice same area (and > > > size) > > > + * is used for XDP_PASS, when constructing the SKB via > > > build_skb(). > > > + */ > > > +#define xdp_data_hard_end(xdp) \ > > > + ((xdp)->data_hard_start + (xdp)->frame_sz - \ > > > + SKB_DATA_ALIGN(sizeof(struct skb_shared_info))) > > > + > > > > this macro is not safe when unary operators are being used > > The parentheses round (xdp) does make xdp_data_hard_end(&xdp) work > correctly. What other cases are you worried about? > > consider: xdp_data_hard_end(xdp_ptr++)
diff --git a/include/net/xdp.h b/include/net/xdp.h index 40c6d3398458..99f4374f6214 100644 --- a/include/net/xdp.h +++ b/include/net/xdp.h @@ -6,6 +6,8 @@ #ifndef __LINUX_NET_XDP_H__ #define __LINUX_NET_XDP_H__ +#include <linux/skbuff.h> /* skb_shared_info */ + /** * DOC: XDP RX-queue information * @@ -70,8 +72,23 @@ struct xdp_buff { void *data_hard_start; unsigned long handle; struct xdp_rxq_info *rxq; + u32 frame_sz; /* frame size to deduct data_hard_end/reserved tailroom*/ }; +/* Reserve memory area at end-of data area. + * + * This macro reserves tailroom in the XDP buffer by limiting the + * XDP/BPF data access to data_hard_end. Notice same area (and size) + * is used for XDP_PASS, when constructing the SKB via build_skb(). + */ +#define xdp_data_hard_end(xdp) \ + ((xdp)->data_hard_start + (xdp)->frame_sz - \ + SKB_DATA_ALIGN(sizeof(struct skb_shared_info))) + +/* Like skb_shinfo */ +#define xdp_shinfo(xdp) ((struct skb_shared_info *)(xdp_data_hard_end(xdp))) +// XXX: Above likely belongs in later patch + struct xdp_frame { void *data; u16 len;
XDP have evolved to support several frame sizes, but xdp_buff was not updated with this information. The frame size (frame_sz) member of xdp_buff is introduced to know the real size of the memory the frame is delivered in. When introducing this also make it clear that some tailroom is reserved/required when creating SKBs using build_skb(). It would also have been an option to introduce a pointer to data_hard_end (with reserved offset). The advantage with frame_sz is that (like rxq) drivers only need to setup/assign this value once per NAPI cycle. Due to XDP-generic (and some drivers) it's not possible to store frame_sz inside xdp_rxq_info, because it's varies per packet as it can be based/depend on packet length. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> --- include/net/xdp.h | 17 +++++++++++++++++ 1 file changed, 17 insertions(+)