Message ID | 1324550017.7877.77.camel@zakaz.uk.xensource.com |
---|---|
State | Changes Requested, archived |
Delegated to: | David Miller |
Headers | show |
From: Ian Campbell <Ian.Campbell@citrix.com> Date: Thu, 22 Dec 2011 10:33:36 +0000 > On Wed, 2011-12-21 at 19:28 +0000, David Miller wrote: >> From: Eric Dumazet <eric.dumazet@gmail.com> >> Date: Wed, 21 Dec 2011 15:02:18 +0100 >> >> > No idea on this +2 point. >> >> I think I know, and I believe I instructed Alexey Kuznetsov to do >> this. >> >> When sendfile() is performed, we might start the SKB with the last few >> bytes of one page, and end the SKB with the first few bytes of another >> page. >> >> In order to fit a full 64K frame into an SKB in this situation we have >> to accomodate this case. > > Thanks David, that makes sense. > > However I think you only actually need 1 extra page for that. If the > data in frag[0] starts at $offset then frag[16] will need to have > $offset bytes in it. e.g. > 4096-$offset + 4096*15 + $offset = 65536 > which == 17 pages rather than 18. > > The following documents the status quo but I could update to switch to + > 1 instead if there are no flaws in the above logic... Indeed, you're right. Please change this to 1 and document it, and we can put that change into net-next, thanks a lot! -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
2011/12/22 Ian Campbell <Ian.Campbell@citrix.com>: > On Wed, 2011-12-21 at 19:28 +0000, David Miller wrote: >> From: Eric Dumazet <eric.dumazet@gmail.com> >> Date: Wed, 21 Dec 2011 15:02:18 +0100 >> > No idea on this +2 point. >> I think I know, and I believe I instructed Alexey Kuznetsov to do >> this. >> >> When sendfile() is performed, we might start the SKB with the last few >> bytes of one page, and end the SKB with the first few bytes of another >> page. >> >> In order to fit a full 64K frame into an SKB in this situation we have >> to accomodate this case. > Thanks David, that makes sense. > > However I think you only actually need 1 extra page for that. If the > data in frag[0] starts at $offset then frag[16] will need to have > $offset bytes in it. e.g. > 4096-$offset + 4096*15 + $offset = 65536 > which == 17 pages rather than 18. > > The following documents the status quo but I could update to switch to + > 1 instead if there are no flaws in the above logic... Since max IP datagram is 64K-1, adding ethernet and possibly VLAN headers makes the max size slightly above 64K and then you have 64K/PAGE_SIZE+2 pages appear in worst case. Best Regards, Michał Mirosław -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Michał Mirosław <mirqus@gmail.com> Date: Thu, 22 Dec 2011 19:34:21 +0100 > 2011/12/22 Ian Campbell <Ian.Campbell@citrix.com>: >> On Wed, 2011-12-21 at 19:28 +0000, David Miller wrote: >>> From: Eric Dumazet <eric.dumazet@gmail.com> >>> Date: Wed, 21 Dec 2011 15:02:18 +0100 >>> > No idea on this +2 point. >>> I think I know, and I believe I instructed Alexey Kuznetsov to do >>> this. >>> >>> When sendfile() is performed, we might start the SKB with the last few >>> bytes of one page, and end the SKB with the first few bytes of another >>> page. >>> >>> In order to fit a full 64K frame into an SKB in this situation we have >>> to accomodate this case. >> Thanks David, that makes sense. >> >> However I think you only actually need 1 extra page for that. If the >> data in frag[0] starts at $offset then frag[16] will need to have >> $offset bytes in it. e.g. >> 4096-$offset + 4096*15 + $offset = 65536 >> which == 17 pages rather than 18. >> >> The following documents the status quo but I could update to switch to + >> 1 instead if there are no flaws in the above logic... > > Since max IP datagram is 64K-1, adding ethernet and possibly VLAN > headers makes the max size slightly above 64K and then you have > 64K/PAGE_SIZE+2 pages appear in worst case. Headers go into the linear area, so they are not relevant for these calculations.
2011/12/22 David Miller <davem@davemloft.net>: > From: Michał Mirosław <mirqus@gmail.com> > Date: Thu, 22 Dec 2011 19:34:21 +0100 > >> 2011/12/22 Ian Campbell <Ian.Campbell@citrix.com>: >>> On Wed, 2011-12-21 at 19:28 +0000, David Miller wrote: >>>> From: Eric Dumazet <eric.dumazet@gmail.com> >>>> Date: Wed, 21 Dec 2011 15:02:18 +0100 >>>> > No idea on this +2 point. >>>> I think I know, and I believe I instructed Alexey Kuznetsov to do >>>> this. >>>> >>>> When sendfile() is performed, we might start the SKB with the last few >>>> bytes of one page, and end the SKB with the first few bytes of another >>>> page. >>>> >>>> In order to fit a full 64K frame into an SKB in this situation we have >>>> to accomodate this case. >>> Thanks David, that makes sense. >>> >>> However I think you only actually need 1 extra page for that. If the >>> data in frag[0] starts at $offset then frag[16] will need to have >>> $offset bytes in it. e.g. >>> 4096-$offset + 4096*15 + $offset = 65536 >>> which == 17 pages rather than 18. >>> >>> The following documents the status quo but I could update to switch to + >>> 1 instead if there are no flaws in the above logic... >> >> Since max IP datagram is 64K-1, adding ethernet and possibly VLAN >> headers makes the max size slightly above 64K and then you have >> 64K/PAGE_SIZE+2 pages appear in worst case. > > Headers go into the linear area, so they are not relevant for these > calculations. Does this hold for LRO'ed packets and packets sent via PF_PACKET socket? Best Regards, Michał Mirosław -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
2011/12/22 David Miller <davem@davemloft.net>: > From: Michał Mirosław <mirqus@gmail.com> > Date: Thu, 22 Dec 2011 20:29:30 +0100 > >> 2011/12/22 David Miller <davem@davemloft.net>: >>> From: Michał Mirosław <mirqus@gmail.com> >>> Date: Thu, 22 Dec 2011 19:34:21 +0100 >>> >>>> 2011/12/22 Ian Campbell <Ian.Campbell@citrix.com>: >>>>> On Wed, 2011-12-21 at 19:28 +0000, David Miller wrote: >>>>>> From: Eric Dumazet <eric.dumazet@gmail.com> >>>>>> Date: Wed, 21 Dec 2011 15:02:18 +0100 >>>>>> > No idea on this +2 point. >>>>>> I think I know, and I believe I instructed Alexey Kuznetsov to do >>>>>> this. >>>>>> >>>>>> When sendfile() is performed, we might start the SKB with the last few >>>>>> bytes of one page, and end the SKB with the first few bytes of another >>>>>> page. >>>>>> >>>>>> In order to fit a full 64K frame into an SKB in this situation we have >>>>>> to accomodate this case. >>>>> Thanks David, that makes sense. >>>>> >>>>> However I think you only actually need 1 extra page for that. If the >>>>> data in frag[0] starts at $offset then frag[16] will need to have >>>>> $offset bytes in it. e.g. >>>>> 4096-$offset + 4096*15 + $offset = 65536 >>>>> which == 17 pages rather than 18. >>>>> >>>>> The following documents the status quo but I could update to switch to + >>>>> 1 instead if there are no flaws in the above logic... >>>> >>>> Since max IP datagram is 64K-1, adding ethernet and possibly VLAN >>>> headers makes the max size slightly above 64K and then you have >>>> 64K/PAGE_SIZE+2 pages appear in worst case. >>> >>> Headers go into the linear area, so they are not relevant for these >>> calculations. >> >> Does this hold for LRO'ed packets and packets sent via PF_PACKET socket? > > LRO is for receive, not packets we build on transmit, and in any event > have their headers also pulled into the linear area before the stack > sees it. Yes, but the drivers seem to first fill the data to frags[] and only then move the packets header (I browsed myri10ge). > For PF_PACKET, in the packet_snd() case it uses a linear SKB and in > the tpacket_snd() case it uses a linear SKB as well. I saw skb_fill_page_desc() in af_packet.c (called by tpacket_fill_skb() which is called by tpacket_snd()), that's why I'm asking. Best Regards, Michał Mirosław -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index fe86488..1c894a8 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -128,8 +128,12 @@ struct sk_buff_head { struct sk_buff; -/* To allow 64K frame to be packed as single skb without frag_list. Since - * GRO uses frags we allocate at least 16 regardless of page size. +/* To allow 64K frame to be packed as single skb without frag_list we + * require 64K/PAGE_SIZE pages plus 2 additional pages to allow for + * buffers which do not start on a page boundary. + * + * Since GRO uses frags we allocate at least 16 regardless of page + * size. */ #if (65536/PAGE_SIZE + 2) < 16 #define MAX_SKB_FRAGS 16UL