Message ID | 1340375443-22455-5-git-send-email-mgorman@suse.de |
---|---|
State | Not Applicable, archived |
Delegated to: | David Miller |
Headers | show |
On Fri, Jun 22, 2012 at 03:30:31PM +0100, Mel Gorman wrote: > This is needed to allow network softirq packet processing to make > use of PF_MEMALLOC. > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index b6c0727..5c6d9c6 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -2265,7 +2265,11 @@ gfp_to_alloc_flags(gfp_t gfp_mask) > if (likely(!(gfp_mask & __GFP_NOMEMALLOC))) { > if (gfp_mask & __GFP_MEMALLOC) > alloc_flags |= ALLOC_NO_WATERMARKS; > - else if (likely(!(gfp_mask & __GFP_NOMEMALLOC)) && !in_interrupt()) > + else if (in_serving_softirq() && (current->flags & PF_MEMALLOC)) > + alloc_flags |= ALLOC_NO_WATERMARKS; > + else if (!in_interrupt() && > + ((current->flags & PF_MEMALLOC) || > + unlikely(test_thread_flag(TIF_MEMDIE)))) > alloc_flags |= ALLOC_NO_WATERMARKS; > } You allocate in RX path with __GFP_MEMALLOC and your sk->sk_allocation has also __GFP_MEMALLOC set. That means you should get ALLOC_NO_WATERMARKS in alloc_flags. Is this to done to avoid GFP annotations in skb_share_check() and friends on your __netif_receive_skb() path? Sebastian -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Jun 26, 2012 at 06:55:13PM +0200, Sebastian Andrzej Siewior wrote: > On Fri, Jun 22, 2012 at 03:30:31PM +0100, Mel Gorman wrote: > > This is needed to allow network softirq packet processing to make > > use of PF_MEMALLOC. > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > index b6c0727..5c6d9c6 100644 > > --- a/mm/page_alloc.c > > +++ b/mm/page_alloc.c > > @@ -2265,7 +2265,11 @@ gfp_to_alloc_flags(gfp_t gfp_mask) > > if (likely(!(gfp_mask & __GFP_NOMEMALLOC))) { > > if (gfp_mask & __GFP_MEMALLOC) > > alloc_flags |= ALLOC_NO_WATERMARKS; > > - else if (likely(!(gfp_mask & __GFP_NOMEMALLOC)) && !in_interrupt()) > > + else if (in_serving_softirq() && (current->flags & PF_MEMALLOC)) > > + alloc_flags |= ALLOC_NO_WATERMARKS; > > + else if (!in_interrupt() && > > + ((current->flags & PF_MEMALLOC) || > > + unlikely(test_thread_flag(TIF_MEMDIE)))) > > alloc_flags |= ALLOC_NO_WATERMARKS; > > } > > You allocate in RX path with __GFP_MEMALLOC and your sk->sk_allocation has > also __GFP_MEMALLOC set. That means you should get ALLOC_NO_WATERMARKS in > alloc_flags. In the cases where they are annotated correctly, yes. It is recordeed if the page gets allocated from the PFMEMALLOC reserves. If the received packet is not SOCK_MEMALLOC and the page was allocated from PFMEMALLOC reserves it is then discarded and the packet must be retransmitted. > Is this to done to avoid GFP annotations in skb_share_check() and > friends on your __netif_receive_skb() path? > I don't get your question as the annotations are not being avoided. If they are set, they are used. In the __netif_receive_skb path, PF_MEMALLOC is set for PFMEMALLOC skbs to avoid having to annotate every single allocation call site.
On Wed, Jun 27, 2012 at 09:26:14AM +0100, Mel Gorman wrote: > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > > index b6c0727..5c6d9c6 100644 > > > --- a/mm/page_alloc.c > > > +++ b/mm/page_alloc.c > > > @@ -2265,7 +2265,11 @@ gfp_to_alloc_flags(gfp_t gfp_mask) > > > if (likely(!(gfp_mask & __GFP_NOMEMALLOC))) { > > > if (gfp_mask & __GFP_MEMALLOC) > > > alloc_flags |= ALLOC_NO_WATERMARKS; > > > - else if (likely(!(gfp_mask & __GFP_NOMEMALLOC)) && !in_interrupt()) > > > + else if (in_serving_softirq() && (current->flags & PF_MEMALLOC)) > > > + alloc_flags |= ALLOC_NO_WATERMARKS; > > > + else if (!in_interrupt() && > > > + ((current->flags & PF_MEMALLOC) || > > > + unlikely(test_thread_flag(TIF_MEMDIE)))) > > > alloc_flags |= ALLOC_NO_WATERMARKS; > > > } > > > > You allocate in RX path with __GFP_MEMALLOC and your sk->sk_allocation has > > also __GFP_MEMALLOC set. That means you should get ALLOC_NO_WATERMARKS in > > alloc_flags. > > In the cases where they are annotated correctly, yes. It is recordeed if > the page gets allocated from the PFMEMALLOC reserves. If the received > packet is not SOCK_MEMALLOC and the page was allocated from PFMEMALLOC > reserves it is then discarded and the packet must be retransmitted. Let me try again: - lets assume your allocation happens with alloc_page(), without __GFP_MEMALLOC in GFP_FLAGS and with PF_MEMALLOC in current->flags. Now you may get memory which you wouldn't receive otherwise (without PF_MEMALLOC). Okay, understood. So you don't have to annotate each page allocation in your receive path for instance as long as the process has the flag set. - lets assume your allocation happens with kmalloc() without __GFP_MEMALLOC and current->flags has PF_MEMALLOC ORed and your SLAB pool is empty. This forces SLAB to allocate more pages from the buddy allocator with it will receive more likely (due to ->current->flags + PF_MEMALLOC) but SLAB will drop this extra memory because the page has ->pf_memory (or something like that) set and the GFP_FLAGS do not have __GFP_MEMALLOC set. Is there something I missed? Sebastian -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sun, Jul 08, 2012 at 08:12:11PM +0200, Sebastian Andrzej Siewior wrote: > On Wed, Jun 27, 2012 at 09:26:14AM +0100, Mel Gorman wrote: > > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > > > index b6c0727..5c6d9c6 100644 > > > > --- a/mm/page_alloc.c > > > > +++ b/mm/page_alloc.c > > > > @@ -2265,7 +2265,11 @@ gfp_to_alloc_flags(gfp_t gfp_mask) > > > > if (likely(!(gfp_mask & __GFP_NOMEMALLOC))) { > > > > if (gfp_mask & __GFP_MEMALLOC) > > > > alloc_flags |= ALLOC_NO_WATERMARKS; > > > > - else if (likely(!(gfp_mask & __GFP_NOMEMALLOC)) && !in_interrupt()) > > > > + else if (in_serving_softirq() && (current->flags & PF_MEMALLOC)) > > > > + alloc_flags |= ALLOC_NO_WATERMARKS; > > > > + else if (!in_interrupt() && > > > > + ((current->flags & PF_MEMALLOC) || > > > > + unlikely(test_thread_flag(TIF_MEMDIE)))) > > > > alloc_flags |= ALLOC_NO_WATERMARKS; > > > > } > > > > > > You allocate in RX path with __GFP_MEMALLOC and your sk->sk_allocation has > > > also __GFP_MEMALLOC set. That means you should get ALLOC_NO_WATERMARKS in > > > alloc_flags. > > > > In the cases where they are annotated correctly, yes. It is recordeed if > > the page gets allocated from the PFMEMALLOC reserves. If the received > > packet is not SOCK_MEMALLOC and the page was allocated from PFMEMALLOC > > reserves it is then discarded and the packet must be retransmitted. > > Let me try again: > - lets assume your allocation happens with alloc_page(), without > __GFP_MEMALLOC in GFP_FLAGS and with PF_MEMALLOC in current->flags. Now > you may get memory which you wouldn't receive otherwise (without > PF_MEMALLOC). Okay, understood. So you don't have to annotate each page > allocation in your receive path for instance as long as the process has the > flag set. Yes. > - lets assume your allocation happens with kmalloc() without __GFP_MEMALLOC > and current->flags has PF_MEMALLOC ORed and your SLAB pool is empty. This > forces SLAB to allocate more pages from the buddy allocator with it will > receive more likely (due to ->current->flags + PF_MEMALLOC) but SLAB will > drop this extra memory because the page has ->pf_memory (or something like > that) set and the GFP_FLAGS do not have __GFP_MEMALLOC set. > It's recorded if the slab page was allocated from PFMEMALLOC reserves (see patch 2 from the swap over NBD series). slab will use this page for objects but only allocate them to callers that pass a gfp_pfmemalloc_allowed() check. kmalloc() users with either __GFP_MEMALLOC or PF_MEMALLOC will get the pages they need but they will not "leak" to !_GFP_MEMALLOC users as that would potentially deadlock.
On Mon, Jul 09, 2012 at 11:04:42AM +0100, Mel Gorman wrote: > > - lets assume your allocation happens with kmalloc() without __GFP_MEMALLOC > > and current->flags has PF_MEMALLOC ORed and your SLAB pool is empty. This > > forces SLAB to allocate more pages from the buddy allocator with it will > > receive more likely (due to ->current->flags + PF_MEMALLOC) but SLAB will > > drop this extra memory because the page has ->pf_memory (or something like > > that) set and the GFP_FLAGS do not have __GFP_MEMALLOC set. > > > > It's recorded if the slab page was allocated from PFMEMALLOC reserves (see > patch 2 from the swap over NBD series). slab will use this page for objects > but only allocate them to callers that pass a gfp_pfmemalloc_allowed() check. > kmalloc() users with either __GFP_MEMALLOC or PF_MEMALLOC will get > the pages they need but they will not "leak" to !_GFP_MEMALLOC users as > that would potentially deadlock. Argh, I missed that gfp_to_alloc_flags() is not only called from within the buddy allocater but also from slab. So this is fine then :) One thing: You only get current->flags |= PF_MEMALLOC in softirq _if_ the skb, which is passed to netif_receive_skb(), was allocated with __GFP_MEMALLOC. That means if the NIC's RX allocation did not require an allocation from the emergency pool (without ->pfmemalloc set) then you never use this extra pool, even if this skb would end up in your swap socket. Also, the other way around, where you allocate it from the emergency pool but it is a user socket and you could drop it. What about extending sk_set_memalloc() to record socket's ips + ports in a separate list so that skb_pfmemalloc_protocol() might use that information and decide on per-protocol basis if the skb is worth to spend more ressource to deliver it. That means you would enable the extra pool if the currently received skb is part of your swap socket and not if the skb was allocated from the emergency pool. That said, there is nothing wrong with the code as of now and this optimization could be added later (if at all). Sebastian -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Jul 09, 2012 at 06:57:10PM +0200, Sebastian Andrzej Siewior wrote: > On Mon, Jul 09, 2012 at 11:04:42AM +0100, Mel Gorman wrote: > > > - lets assume your allocation happens with kmalloc() without __GFP_MEMALLOC > > > and current->flags has PF_MEMALLOC ORed and your SLAB pool is empty. This > > > forces SLAB to allocate more pages from the buddy allocator with it will > > > receive more likely (due to ->current->flags + PF_MEMALLOC) but SLAB will > > > drop this extra memory because the page has ->pf_memory (or something like > > > that) set and the GFP_FLAGS do not have __GFP_MEMALLOC set. > > > > > > > It's recorded if the slab page was allocated from PFMEMALLOC reserves (see > > patch 2 from the swap over NBD series). slab will use this page for objects > > but only allocate them to callers that pass a gfp_pfmemalloc_allowed() check. > > kmalloc() users with either __GFP_MEMALLOC or PF_MEMALLOC will get > > the pages they need but they will not "leak" to !_GFP_MEMALLOC users as > > that would potentially deadlock. > > Argh, I missed that gfp_to_alloc_flags() is not only called from > within the buddy allocater but also from slab. So this is fine then :) > Good to hear. I appreciate you taking the time to give it a solid review like this looking for holes. > One thing: > You only get current->flags |= PF_MEMALLOC in softirq _if_ the skb, which is > passed to netif_receive_skb(), was allocated with __GFP_MEMALLOC. That > means if the NIC's RX allocation did not require an allocation from the > emergency pool (without ->pfmemalloc set) then you never use this extra > pool, even if this skb would end up in your swap socket. Also, the other way > around, where you allocate it from the emergency pool but it is a user > socket and you could drop it. > While there is a possibility that packets may get dropped later like this, they still get retransmitted and eventually it'll get through. This is not optimal but optimised swap-over-network was not the primary goal of the series, deadlock avoidance was. > What about extending sk_set_memalloc() to record socket's ips + ports > in a separate list so that skb_pfmemalloc_protocol() might use that > information and decide on per-protocol basis if the skb is worth to > spend more ressource to deliver it. That means you would enable the > extra pool if the currently received skb is part of your swap socket and > not if the skb was allocated from the emergency pool. > > That said, there is nothing wrong with the code as of now and this > optimization could be added later (if at all). > I think it is a good idea but it could also be done later iff a user had a serious problem with the performance and that this made a measurable difference. The series is already quite complex and I'd rather not add to that complexity without strong motivation.
diff --git a/include/linux/sched.h b/include/linux/sched.h index 08384db..706e405 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1913,6 +1913,13 @@ static inline void rcu_switch_from(struct task_struct *prev) #endif +static inline void tsk_restore_flags(struct task_struct *task, + unsigned long orig_flags, unsigned long flags) +{ + task->flags &= ~flags; + task->flags |= orig_flags & flags; +} + #ifdef CONFIG_SMP extern void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask); diff --git a/kernel/softirq.c b/kernel/softirq.c index 671f959..b73e681 100644 --- a/kernel/softirq.c +++ b/kernel/softirq.c @@ -210,6 +210,14 @@ asmlinkage void __do_softirq(void) __u32 pending; int max_restart = MAX_SOFTIRQ_RESTART; int cpu; + unsigned long old_flags = current->flags; + + /* + * Mask out PF_MEMALLOC s current task context is borrowed for the + * softirq. A softirq handled such as network RX might set PF_MEMALLOC + * again if the socket is related to swap + */ + current->flags &= ~PF_MEMALLOC; pending = local_softirq_pending(); account_system_vtime(current); @@ -265,6 +273,7 @@ restart: account_system_vtime(current); __local_bh_enable(SOFTIRQ_OFFSET); + tsk_restore_flags(current, old_flags, PF_MEMALLOC); } #ifndef __ARCH_HAS_DO_SOFTIRQ diff --git a/mm/page_alloc.c b/mm/page_alloc.c index b6c0727..5c6d9c6 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2265,7 +2265,11 @@ gfp_to_alloc_flags(gfp_t gfp_mask) if (likely(!(gfp_mask & __GFP_NOMEMALLOC))) { if (gfp_mask & __GFP_MEMALLOC) alloc_flags |= ALLOC_NO_WATERMARKS; - else if (likely(!(gfp_mask & __GFP_NOMEMALLOC)) && !in_interrupt()) + else if (in_serving_softirq() && (current->flags & PF_MEMALLOC)) + alloc_flags |= ALLOC_NO_WATERMARKS; + else if (!in_interrupt() && + ((current->flags & PF_MEMALLOC) || + unlikely(test_thread_flag(TIF_MEMDIE)))) alloc_flags |= ALLOC_NO_WATERMARKS; }