Message ID | 20150504231448.1538.84164.stgit@ahduyck-vm-fedora22 |
---|---|
State | Superseded, archived |
Delegated to: | David Miller |
Headers | show |
On Mon, 2015-05-04 at 16:14 -0700, Alexander Duyck wrote: > This change adds a function called skb_free_frag which is meant to > compliment the function __alloc_page_frag. The general idea is to enable a > more lightweight version of page freeing since we don't actually need all > the overhead of a put_page, and we don't quite fit the model of __free_pages. Could you describe what are the things that put_page() handle that we don't need for skb frags ? It looks the change could benefit to other users (outside of networking) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 05/04/2015 05:16 PM, Eric Dumazet wrote: > On Mon, 2015-05-04 at 16:14 -0700, Alexander Duyck wrote: >> This change adds a function called skb_free_frag which is meant to >> compliment the function __alloc_page_frag. The general idea is to enable a >> more lightweight version of page freeing since we don't actually need all >> the overhead of a put_page, and we don't quite fit the model of __free_pages. > > Could you describe what are the things that put_page() handle that we > don't need for skb frags ? A large part of it is just all the extra code flow with each level having to retest flags. So if you follow the calls for put_page w/ a page frag you end up with something like: skb_free_head - virt_to_head_page put_page() - (Head | Tail) put_compound_page() - Tail, Head, _count __put_compound_page() - compound_dtor __page_cache_release() - LRU, mem_cgroup free_compound_page() - Head, compound_order __free_pages_ok() If I free the same page frag in skb_frag_free the path ends up looking like: skb_free_head - inlined by compiler skb_free_frag - virt_to_head_page, count, head, order __free_pages_ok() > It looks the change could benefit to other users (outside of networking) It could, but there are also other mechanisms already in place for freeing pages. For example in the Intel drivers I had gotten into the habit of using __free_pages for Rx pages since it does exactly the same thing but you need to know the order of the pages you are freeing before you can use it. In the case of head frags we don't really know that since it can be a page fragment given to us by any of the drivers using alloc_pages. The motivation behind creating a centralized function was to take care of the virt_to_head_page and compound_order portions of this in one centralized spot. It avoids bloating things as I'm able to get away with little tricks like combining the compound_order Head flag check with the one to determine if I call the compound freeing function or the order 0 one. - Alex -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, 04 May 2015 16:14:48 -0700 Alexander Duyck <alexander.h.duyck@redhat.com> wrote: > +/** > + * skb_free_frag - free a page fragment > + * @head: virtual address of page fragment > + * > + * Frees a page fragment allocated out of either a compound or order 0 page. > + * The function itself is a hybrid between free_pages and free_compound_page > + * which can be found in mm/page_alloc.c > + */ > +void skb_free_frag(void *head) > +{ > + struct page *page = virt_to_head_page(head); > + > + if (unlikely(put_page_testzero(page))) { > + if (likely(PageHead(page))) > + __free_pages_ok(page, compound_order(page)); > + else > + free_hot_cold_page(page, false); > + } > +} Why are we testing for PageHead in here? If the code were to simply do if (unlikely(put_page_testzero(page))) __free_pages_ok(page, compound_order(page)); that would still work? There's nothing networking-specific in here. I suggest the function be renamed and moved to page_alloc.c. Add an inlined skb_free_frag() in a net header which calls it. This way the mm developers know about it and will hopefully maintain it. It would need a comment explaining when and why people should and shouldn't use it. The term "page fragment" is a net thing and isn't something we know about. What is it? From context I'm thinking a definition would look something like An arbitrary-length arbitrary-offset area of memory which resides within a 0 or higher order page. Multiple fragments within that page are individually refcounted, in the page's reference counter. Is that correct and complete? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 05/06/2015 12:38 PM, Andrew Morton wrote: > On Mon, 04 May 2015 16:14:48 -0700 Alexander Duyck <alexander.h.duyck@redhat.com> wrote: > >> +/** >> + * skb_free_frag - free a page fragment >> + * @head: virtual address of page fragment >> + * >> + * Frees a page fragment allocated out of either a compound or order 0 page. >> + * The function itself is a hybrid between free_pages and free_compound_page >> + * which can be found in mm/page_alloc.c >> + */ >> +void skb_free_frag(void *head) >> +{ >> + struct page *page = virt_to_head_page(head); >> + >> + if (unlikely(put_page_testzero(page))) { >> + if (likely(PageHead(page))) >> + __free_pages_ok(page, compound_order(page)); >> + else >> + free_hot_cold_page(page, false); >> + } >> +} > Why are we testing for PageHead in here? If the code were to simply do > > if (unlikely(put_page_testzero(page))) > __free_pages_ok(page, compound_order(page)); > > that would still work? My assumption was that there was a performance difference between __free_pages_ok and free_hot_cold_page for order 0 pages. From what I can tell free_hot_cold_page will do bulk cleanup via free_pcppages_bulk while __free_pages_ok just calls free_one_page. > There's nothing networking-specific in here. I suggest the function be > renamed and moved to page_alloc.c. Add an inlined skb_free_frag() in a > net header which calls it. This way the mm developers know about it > and will hopefully maintain it. It would need a comment explaining > when and why people should and shouldn't use it. That's true. While I am at it I should probably pull the allocation out as well just so it is all in one location. > The term "page fragment" is a net thing and isn't something we know > about. What is it? From context I'm thinking a definition would look > something like > > An arbitrary-length arbitrary-offset area of memory which resides > within a 0 or higher order page. Multiple fragments within that page > are individually refcounted, in the page's reference counter. > > Is that correct and complete? Yeah that is pretty complete. I've added Eric who originally authored this to the Cc in case there is something he wants to add. I'll see about updating this and will likely have a v2 ready in the next couple of hours. - Alex -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 06 May 2015 13:27:43 -0700 Alexander Duyck <alexander.h.duyck@redhat.com> wrote: > >> +void skb_free_frag(void *head) > >> +{ > >> + struct page *page = virt_to_head_page(head); > >> + > >> + if (unlikely(put_page_testzero(page))) { > >> + if (likely(PageHead(page))) > >> + __free_pages_ok(page, compound_order(page)); > >> + else > >> + free_hot_cold_page(page, false); > >> + } > >> +} > > Why are we testing for PageHead in here? If the code were to simply do > > > > if (unlikely(put_page_testzero(page))) > > __free_pages_ok(page, compound_order(page)); > > > > that would still work? > > My assumption was that there was a performance difference between > __free_pages_ok and free_hot_cold_page for order 0 pages. From what I > can tell free_hot_cold_page will do bulk cleanup via free_pcppages_bulk > while __free_pages_ok just calls free_one_page. Could be. Plus there's hopefully some performance advantage if the page is genuinely cache-hot. I don't think that anyone has verified the benefits of the hot/cold optimisation in the last decade or two, and it was always pretty marginal.. Is the PageHead thing really "likely"? We're usually dealing with order>0 pages here? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 05/06/2015 01:41 PM, Andrew Morton wrote: > On Wed, 06 May 2015 13:27:43 -0700 Alexander Duyck <alexander.h.duyck@redhat.com> wrote: > >>>> +void skb_free_frag(void *head) >>>> +{ >>>> + struct page *page = virt_to_head_page(head); >>>> + >>>> + if (unlikely(put_page_testzero(page))) { >>>> + if (likely(PageHead(page))) >>>> + __free_pages_ok(page, compound_order(page)); >>>> + else >>>> + free_hot_cold_page(page, false); >>>> + } >>>> +} >>> Why are we testing for PageHead in here? If the code were to simply do >>> >>> if (unlikely(put_page_testzero(page))) >>> __free_pages_ok(page, compound_order(page)); >>> >>> that would still work? >> My assumption was that there was a performance difference between >> __free_pages_ok and free_hot_cold_page for order 0 pages. From what I >> can tell free_hot_cold_page will do bulk cleanup via free_pcppages_bulk >> while __free_pages_ok just calls free_one_page. > Could be. Plus there's hopefully some performance advantage if the > page is genuinely cache-hot. I don't think that anyone has verified > the benefits of the hot/cold optimisation in the last decade or two, > and it was always pretty marginal.. Either way it doesn't make much difference. If you would prefer I can probably just call __free_pages_ok for all cases. > Is the PageHead thing really "likely"? We're usually dealing with > order>0 pages here? On any system that only supports 4K pages the default is to allocate an order 3 page (32K) and then pull the fragments out of that. So if __free_pages_ok works for an order 0 page I'll just call it since it shouldn't be a very common occurrence anyway unless we are under memory pressure. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/include/linux/gfp.h b/include/linux/gfp.h index 97a9373e61e8..edd19a06e2ac 100644 --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@ -365,6 +365,7 @@ extern void __free_pages(struct page *page, unsigned int order); extern void free_pages(unsigned long addr, unsigned int order); extern void free_hot_cold_page(struct page *page, bool cold); extern void free_hot_cold_page_list(struct list_head *list, bool cold); +extern void __free_pages_ok(struct page *page, unsigned int order); extern void __free_kmem_pages(struct page *page, unsigned int order); extern void free_kmem_pages(unsigned long addr, unsigned int order); diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 9c2f793573fa..3bfe3340929e 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -2186,6 +2186,7 @@ static inline struct sk_buff *netdev_alloc_skb_ip_align(struct net_device *dev, return __netdev_alloc_skb_ip_align(dev, length, GFP_ATOMIC); } +void skb_free_frag(void *head); void *napi_alloc_frag(unsigned int fragsz); struct sk_buff *__napi_alloc_skb(struct napi_struct *napi, unsigned int length, gfp_t gfp_mask); diff --git a/mm/page_alloc.c b/mm/page_alloc.c index ebffa0e4a9c0..ab9ba9360730 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -165,8 +165,6 @@ bool pm_suspended_storage(void) int pageblock_order __read_mostly; #endif -static void __free_pages_ok(struct page *page, unsigned int order); - /* * results with 256, 32 in the lowmem_reserve sysctl: * 1G machine -> (16M dma, 800M-16M normal, 1G-800M high) @@ -815,7 +813,7 @@ static bool free_pages_prepare(struct page *page, unsigned int order) return true; } -static void __free_pages_ok(struct page *page, unsigned int order) +void __free_pages_ok(struct page *page, unsigned int order) { unsigned long flags; int migratetype; diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 1e4278a4dd7e..754842557fb0 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -428,6 +428,27 @@ refill: return page_address(page) + offset; } +/** + * skb_free_frag - free a page fragment + * @head: virtual address of page fragment + * + * Frees a page fragment allocated out of either a compound or order 0 page. + * The function itself is a hybrid between free_pages and free_compound_page + * which can be found in mm/page_alloc.c + */ +void skb_free_frag(void *head) +{ + struct page *page = virt_to_head_page(head); + + if (unlikely(put_page_testzero(page))) { + if (likely(PageHead(page))) + __free_pages_ok(page, compound_order(page)); + else + free_hot_cold_page(page, false); + } +} +EXPORT_SYMBOL(skb_free_frag); + static void *__netdev_alloc_frag(unsigned int fragsz, gfp_t gfp_mask) { unsigned long flags; @@ -499,7 +520,7 @@ static struct sk_buff *__alloc_rx_skb(unsigned int length, gfp_t gfp_mask, if (likely(data)) { skb = build_skb(data, fragsz); if (unlikely(!skb)) - put_page(virt_to_head_page(data)); + skb_free_frag(data); } } else { skb = __alloc_skb(length, gfp_mask, @@ -611,10 +632,12 @@ static void skb_clone_fraglist(struct sk_buff *skb) static void skb_free_head(struct sk_buff *skb) { + unsigned char *head = skb->head; + if (skb->head_frag) - put_page(virt_to_head_page(skb->head)); + skb_free_frag(head); else - kfree(skb->head); + kfree(head); } static void skb_release_data(struct sk_buff *skb)
This change adds a function called skb_free_frag which is meant to compliment the function __alloc_page_frag. The general idea is to enable a more lightweight version of page freeing since we don't actually need all the overhead of a put_page, and we don't quite fit the model of __free_pages. The model for this is based off of __free_pages since we don't actually need to deal with all of the cases that put_page handles. I incorporated the virt_to_head_page call and compound_order into the function as it actually allows for a signficant size reduction by reducing code duplication. In order to enable the function it was necessary to move __free_pages_ok from being a statically defined function so that I could use it in skbuff.c. Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> --- include/linux/gfp.h | 1 + include/linux/skbuff.h | 1 + mm/page_alloc.c | 4 +--- net/core/skbuff.c | 29 ++++++++++++++++++++++++++--- 4 files changed, 29 insertions(+), 6 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html