diff mbox

[net-next,1/6] net: Add skb_free_frag to replace use of put_page in freeing skb->head

Message ID 20150504231448.1538.84164.stgit@ahduyck-vm-fedora22
State Superseded, archived
Delegated to: David Miller
Headers show

Commit Message

Alexander Duyck May 4, 2015, 11:14 p.m. UTC
This change adds a function called skb_free_frag which is meant to
compliment the function __alloc_page_frag.  The general idea is to enable a
more lightweight version of page freeing since we don't actually need all
the overhead of a put_page, and we don't quite fit the model of __free_pages.

The model for this is based off of __free_pages since we don't actually
need to deal with all of the cases that put_page handles.  I incorporated
the virt_to_head_page call and compound_order into the function as it
actually allows for a signficant size reduction by reducing code
duplication.

In order to enable the function it was necessary to move __free_pages_ok
from being a statically defined function so that I could use it in
skbuff.c.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
---
 include/linux/gfp.h    |    1 +
 include/linux/skbuff.h |    1 +
 mm/page_alloc.c        |    4 +---
 net/core/skbuff.c      |   29 ++++++++++++++++++++++++++---
 4 files changed, 29 insertions(+), 6 deletions(-)


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Eric Dumazet May 5, 2015, 12:16 a.m. UTC | #1
On Mon, 2015-05-04 at 16:14 -0700, Alexander Duyck wrote:
> This change adds a function called skb_free_frag which is meant to
> compliment the function __alloc_page_frag.  The general idea is to enable a
> more lightweight version of page freeing since we don't actually need all
> the overhead of a put_page, and we don't quite fit the model of __free_pages.

Could you describe what are the things that put_page() handle that we
don't need for skb frags ?

It looks the change could benefit to other users (outside of networking)


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexander Duyck May 5, 2015, 2:49 a.m. UTC | #2
On 05/04/2015 05:16 PM, Eric Dumazet wrote:
> On Mon, 2015-05-04 at 16:14 -0700, Alexander Duyck wrote:
>> This change adds a function called skb_free_frag which is meant to
>> compliment the function __alloc_page_frag.  The general idea is to enable a
>> more lightweight version of page freeing since we don't actually need all
>> the overhead of a put_page, and we don't quite fit the model of __free_pages.
>
> Could you describe what are the things that put_page() handle that we
> don't need for skb frags ?

A large part of it is just all the extra code flow with each level 
having to retest flags.  So if you follow the calls for put_page w/ a 
page frag you end up with something like:
skb_free_head - virt_to_head_page
   put_page() - (Head | Tail)
     put_compound_page() - Tail, Head, _count
       __put_compound_page() - compound_dtor
         __page_cache_release() - LRU, mem_cgroup
           free_compound_page() - Head, compound_order
             __free_pages_ok()

If I free the same page frag in skb_frag_free the path ends up looking like:
skb_free_head - inlined by compiler
   skb_free_frag - virt_to_head_page, count, head, order
     __free_pages_ok()

> It looks the change could benefit to other users (outside of networking)

It could, but there are also other mechanisms already in place for 
freeing pages.  For example in the Intel drivers I had gotten into the 
habit of using __free_pages for Rx pages since it does exactly the same 
thing but you need to know the order of the pages you are freeing before 
you can use it.  In the case of head frags we don't really know that 
since it can be a page fragment given to us by any of the drivers using 
alloc_pages.

The motivation behind creating a centralized function was to take care 
of the virt_to_head_page and compound_order portions of this in one 
centralized spot.  It avoids bloating things as I'm able to get away 
with little tricks like combining the compound_order Head flag check 
with the one to determine if I call the compound freeing function or the 
order 0 one.

- Alex

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andrew Morton May 6, 2015, 7:38 p.m. UTC | #3
On Mon, 04 May 2015 16:14:48 -0700 Alexander Duyck <alexander.h.duyck@redhat.com> wrote:

> +/**
> + * skb_free_frag - free a page fragment
> + * @head: virtual address of page fragment
> + *
> + * Frees a page fragment allocated out of either a compound or order 0 page.
> + * The function itself is a hybrid between free_pages and free_compound_page
> + * which can be found in mm/page_alloc.c
> + */
> +void skb_free_frag(void *head)
> +{
> +	struct page *page = virt_to_head_page(head);
> +
> +	if (unlikely(put_page_testzero(page))) {
> +		if (likely(PageHead(page)))
> +			__free_pages_ok(page, compound_order(page));
> +		else
> +			free_hot_cold_page(page, false);
> +	}
> +}

Why are we testing for PageHead in here?  If the code were to simply do

	if (unlikely(put_page_testzero(page)))
		__free_pages_ok(page, compound_order(page));

that would still work?


There's nothing networking-specific in here.  I suggest the function be
renamed and moved to page_alloc.c.  Add an inlined skb_free_frag() in a
net header which calls it.  This way the mm developers know about it
and will hopefully maintain it.  It would need a comment explaining
when and why people should and shouldn't use it.

The term "page fragment" is a net thing and isn't something we know
about.  What is it?  From context I'm thinking a definition would look
something like

  An arbitrary-length arbitrary-offset area of memory which resides
  within a 0 or higher order page.  Multiple fragments within that page
  are individually refcounted, in the page's reference counter.

Is that correct and complete?


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexander Duyck May 6, 2015, 8:27 p.m. UTC | #4
On 05/06/2015 12:38 PM, Andrew Morton wrote:
> On Mon, 04 May 2015 16:14:48 -0700 Alexander Duyck <alexander.h.duyck@redhat.com> wrote:
>
>> +/**
>> + * skb_free_frag - free a page fragment
>> + * @head: virtual address of page fragment
>> + *
>> + * Frees a page fragment allocated out of either a compound or order 0 page.
>> + * The function itself is a hybrid between free_pages and free_compound_page
>> + * which can be found in mm/page_alloc.c
>> + */
>> +void skb_free_frag(void *head)
>> +{
>> +	struct page *page = virt_to_head_page(head);
>> +
>> +	if (unlikely(put_page_testzero(page))) {
>> +		if (likely(PageHead(page)))
>> +			__free_pages_ok(page, compound_order(page));
>> +		else
>> +			free_hot_cold_page(page, false);
>> +	}
>> +}
> Why are we testing for PageHead in here?  If the code were to simply do
>
> 	if (unlikely(put_page_testzero(page)))
> 		__free_pages_ok(page, compound_order(page));
>
> that would still work?

My assumption was that there was a performance difference between 
__free_pages_ok and free_hot_cold_page for order 0 pages.  From what I 
can tell free_hot_cold_page will do bulk cleanup via free_pcppages_bulk 
while __free_pages_ok just calls free_one_page.

> There's nothing networking-specific in here.  I suggest the function be
> renamed and moved to page_alloc.c.  Add an inlined skb_free_frag() in a
> net header which calls it.  This way the mm developers know about it
> and will hopefully maintain it.  It would need a comment explaining
> when and why people should and shouldn't use it.

That's true.  While I am at it I should probably pull the allocation out 
as well just so it is all in one location.

> The term "page fragment" is a net thing and isn't something we know
> about.  What is it?  From context I'm thinking a definition would look
> something like
>
>    An arbitrary-length arbitrary-offset area of memory which resides
>    within a 0 or higher order page.  Multiple fragments within that page
>    are individually refcounted, in the page's reference counter.
>
> Is that correct and complete?

Yeah that is pretty complete.  I've added Eric who originally authored 
this to the Cc in case there is something he wants to add. I'll see 
about updating this and will likely have a v2 ready in the next couple 
of hours.

- Alex
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andrew Morton May 6, 2015, 8:41 p.m. UTC | #5
On Wed, 06 May 2015 13:27:43 -0700 Alexander Duyck <alexander.h.duyck@redhat.com> wrote:

> >> +void skb_free_frag(void *head)
> >> +{
> >> +	struct page *page = virt_to_head_page(head);
> >> +
> >> +	if (unlikely(put_page_testzero(page))) {
> >> +		if (likely(PageHead(page)))
> >> +			__free_pages_ok(page, compound_order(page));
> >> +		else
> >> +			free_hot_cold_page(page, false);
> >> +	}
> >> +}
> > Why are we testing for PageHead in here?  If the code were to simply do
> >
> > 	if (unlikely(put_page_testzero(page)))
> > 		__free_pages_ok(page, compound_order(page));
> >
> > that would still work?
> 
> My assumption was that there was a performance difference between 
> __free_pages_ok and free_hot_cold_page for order 0 pages.  From what I 
> can tell free_hot_cold_page will do bulk cleanup via free_pcppages_bulk 
> while __free_pages_ok just calls free_one_page.

Could be.  Plus there's hopefully some performance advantage if the
page is genuinely cache-hot.  I don't think that anyone has verified
the benefits of the hot/cold optimisation in the last decade or two,
and it was always pretty marginal..

Is the PageHead thing really "likely"?  We're usually dealing with
order>0 pages here?

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexander Duyck May 6, 2015, 8:55 p.m. UTC | #6
On 05/06/2015 01:41 PM, Andrew Morton wrote:
> On Wed, 06 May 2015 13:27:43 -0700 Alexander Duyck <alexander.h.duyck@redhat.com> wrote:
>
>>>> +void skb_free_frag(void *head)
>>>> +{
>>>> +	struct page *page = virt_to_head_page(head);
>>>> +
>>>> +	if (unlikely(put_page_testzero(page))) {
>>>> +		if (likely(PageHead(page)))
>>>> +			__free_pages_ok(page, compound_order(page));
>>>> +		else
>>>> +			free_hot_cold_page(page, false);
>>>> +	}
>>>> +}
>>> Why are we testing for PageHead in here?  If the code were to simply do
>>>
>>> 	if (unlikely(put_page_testzero(page)))
>>> 		__free_pages_ok(page, compound_order(page));
>>>
>>> that would still work?
>> My assumption was that there was a performance difference between
>> __free_pages_ok and free_hot_cold_page for order 0 pages.  From what I
>> can tell free_hot_cold_page will do bulk cleanup via free_pcppages_bulk
>> while __free_pages_ok just calls free_one_page.
> Could be.  Plus there's hopefully some performance advantage if the
> page is genuinely cache-hot.  I don't think that anyone has verified
> the benefits of the hot/cold optimisation in the last decade or two,
> and it was always pretty marginal..

Either way it doesn't make much difference.  If you would prefer I can 
probably just call __free_pages_ok for all cases.

> Is the PageHead thing really "likely"?  We're usually dealing with
> order>0 pages here?

On any system that only supports 4K pages the default is to allocate an 
order 3 page (32K) and then pull the fragments out of that.  So if 
__free_pages_ok works for an order 0 page I'll just call it since it 
shouldn't be a very common occurrence anyway unless we are under memory 
pressure.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 97a9373e61e8..edd19a06e2ac 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -365,6 +365,7 @@  extern void __free_pages(struct page *page, unsigned int order);
 extern void free_pages(unsigned long addr, unsigned int order);
 extern void free_hot_cold_page(struct page *page, bool cold);
 extern void free_hot_cold_page_list(struct list_head *list, bool cold);
+extern void __free_pages_ok(struct page *page, unsigned int order);
 
 extern void __free_kmem_pages(struct page *page, unsigned int order);
 extern void free_kmem_pages(unsigned long addr, unsigned int order);
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 9c2f793573fa..3bfe3340929e 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2186,6 +2186,7 @@  static inline struct sk_buff *netdev_alloc_skb_ip_align(struct net_device *dev,
 	return __netdev_alloc_skb_ip_align(dev, length, GFP_ATOMIC);
 }
 
+void skb_free_frag(void *head);
 void *napi_alloc_frag(unsigned int fragsz);
 struct sk_buff *__napi_alloc_skb(struct napi_struct *napi,
 				 unsigned int length, gfp_t gfp_mask);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ebffa0e4a9c0..ab9ba9360730 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -165,8 +165,6 @@  bool pm_suspended_storage(void)
 int pageblock_order __read_mostly;
 #endif
 
-static void __free_pages_ok(struct page *page, unsigned int order);
-
 /*
  * results with 256, 32 in the lowmem_reserve sysctl:
  *	1G machine -> (16M dma, 800M-16M normal, 1G-800M high)
@@ -815,7 +813,7 @@  static bool free_pages_prepare(struct page *page, unsigned int order)
 	return true;
 }
 
-static void __free_pages_ok(struct page *page, unsigned int order)
+void __free_pages_ok(struct page *page, unsigned int order)
 {
 	unsigned long flags;
 	int migratetype;
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 1e4278a4dd7e..754842557fb0 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -428,6 +428,27 @@  refill:
 	return page_address(page) + offset;
 }
 
+/**
+ * skb_free_frag - free a page fragment
+ * @head: virtual address of page fragment
+ *
+ * Frees a page fragment allocated out of either a compound or order 0 page.
+ * The function itself is a hybrid between free_pages and free_compound_page
+ * which can be found in mm/page_alloc.c
+ */
+void skb_free_frag(void *head)
+{
+	struct page *page = virt_to_head_page(head);
+
+	if (unlikely(put_page_testzero(page))) {
+		if (likely(PageHead(page)))
+			__free_pages_ok(page, compound_order(page));
+		else
+			free_hot_cold_page(page, false);
+	}
+}
+EXPORT_SYMBOL(skb_free_frag);
+
 static void *__netdev_alloc_frag(unsigned int fragsz, gfp_t gfp_mask)
 {
 	unsigned long flags;
@@ -499,7 +520,7 @@  static struct sk_buff *__alloc_rx_skb(unsigned int length, gfp_t gfp_mask,
 		if (likely(data)) {
 			skb = build_skb(data, fragsz);
 			if (unlikely(!skb))
-				put_page(virt_to_head_page(data));
+				skb_free_frag(data);
 		}
 	} else {
 		skb = __alloc_skb(length, gfp_mask,
@@ -611,10 +632,12 @@  static void skb_clone_fraglist(struct sk_buff *skb)
 
 static void skb_free_head(struct sk_buff *skb)
 {
+	unsigned char *head = skb->head;
+
 	if (skb->head_frag)
-		put_page(virt_to_head_page(skb->head));
+		skb_free_frag(head);
 	else
-		kfree(skb->head);
+		kfree(head);
 }
 
 static void skb_release_data(struct sk_buff *skb)