Message ID | 20160525154419.GE3354@mtj.duckdns.org
---|---
State | Not Applicable, archived
Delegated to: | David Miller
On 05/25/2016 05:44 PM, Tejun Heo wrote:
> Atomic allocations can trigger async map extensions which are serviced
> by chunk->map_extend_work. pcpu_balance_work, which is responsible for
> destroying idle chunks, wasn't synchronizing properly against
> chunk->map_extend_work and may end up freeing the chunk while the work
> item is still in flight.
>
> This patch fixes the bug by rolling async map extension operations
> into pcpu_balance_work.
>
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Reported-and-tested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> Reported-by: Vlastimil Babka <vbabka@suse.cz>
> Reported-by: Sasha Levin <sasha.levin@oracle.com>
> Cc: stable@vger.kernel.org # v3.18+
> Fixes: 9c824b6a172c ("percpu: make sure chunk->map array has available space")

I didn't spot issues, but I'm not that familiar with the code, so it
doesn't mean much. Just one question below:

> ---
>  mm/percpu.c | 57 ++++++++++++++++++++++++++++++++++++---------------------
>  1 file changed, 36 insertions(+), 21 deletions(-)
>
> --- a/mm/percpu.c
> +++ b/mm/percpu.c
> @@ -112,7 +112,7 @@ struct pcpu_chunk {
>  	int map_used;		/* # of map entries used before the sentry */
>  	int map_alloc;		/* # of map entries allocated */
>  	int *map;		/* allocation map */
> -	struct work_struct map_extend_work;/* async ->map[] extension */
> +	struct list_head map_extend_list;/* on pcpu_map_extend_chunks */
>
>  	void *data;		/* chunk data */
>  	int first_free;		/* no free below this */
> @@ -166,6 +166,9 @@ static DEFINE_MUTEX(pcpu_alloc_mutex); /
>
>  static struct list_head *pcpu_slot __read_mostly; /* chunk list slots */
>
> +/* chunks which need their map areas extended, protected by pcpu_lock */
> +static LIST_HEAD(pcpu_map_extend_chunks);
> +
>  /*
>   * The number of empty populated pages, protected by pcpu_lock.  The
>   * reserved chunk doesn't contribute to the count.
> @@ -395,13 +398,19 @@ static int pcpu_need_to_extend(struct pc
>  {
>  	int margin, new_alloc;
>
> +	lockdep_assert_held(&pcpu_lock);
> +
>  	if (is_atomic) {
>  		margin = 3;
>
>  		if (chunk->map_alloc <
> -		    chunk->map_used + PCPU_ATOMIC_MAP_MARGIN_LOW &&
> -		    pcpu_async_enabled)
> -			schedule_work(&chunk->map_extend_work);
> +		    chunk->map_used + PCPU_ATOMIC_MAP_MARGIN_LOW) {
> +			if (list_empty(&chunk->map_extend_list)) {

So why this list_empty condition? Doesn't it deserve a comment then?
And isn't using a list overkill in that case?

Thanks.

> +				list_add_tail(&chunk->map_extend_list,
> +					      &pcpu_map_extend_chunks);
> +				pcpu_schedule_balance_work();
> +			}
> +		}
>  	} else {
>  		margin = PCPU_ATOMIC_MAP_MARGIN_HIGH;
>  	}

[...]
Hello,

On Thu, May 26, 2016 at 11:19:06AM +0200, Vlastimil Babka wrote:
> >  	if (is_atomic) {
> >  		margin = 3;
> >
> >  		if (chunk->map_alloc <
> > -		    chunk->map_used + PCPU_ATOMIC_MAP_MARGIN_LOW &&
> > -		    pcpu_async_enabled)
> > -			schedule_work(&chunk->map_extend_work);
> > +		    chunk->map_used + PCPU_ATOMIC_MAP_MARGIN_LOW) {
> > +			if (list_empty(&chunk->map_extend_list)) {
>
> So why this list_empty condition? Doesn't it deserve a comment then?

Because doing list_add() twice corrupts the list. I'm not sure that
deserves a comment. We can do list_move() instead, but that isn't
necessarily better.

> And isn't using a list overkill in that case?

That would require the rebalance work to scan all chunks whenever it's
scheduled, and if a lot of atomic allocations are taking place, that
could become expensive with a lot of chunks.

Thanks.
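The corruption Tejun describes is a property of circular doubly linked
lists: list_add()-ing a node that is already on a list rewrites its
neighbours' pointers and breaks the chain, which is why the enqueue has
to be guarded by list_empty(). Below is a minimal user-space sketch of
that guarded-enqueue idiom; the list helpers are simplified
reimplementations of the <linux/list.h> ones and struct chunk is a
stand-in, not the kernel code.

#include <stdbool.h>
#include <stdio.h>

/* Simplified circular doubly linked list, modeled on <linux/list.h>. */
struct list_head {
	struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static void list_add_tail(struct list_head *n, struct list_head *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* A node is "empty" (off-list) when it points at itself. */
static bool list_empty(const struct list_head *h)
{
	return h->next == h;
}

struct chunk {
	int id;
	struct list_head map_extend_list;  /* on extend_chunks when queued */
};

static struct list_head extend_chunks;

/*
 * Idempotent enqueue: a second request while the chunk is already
 * queued is a no-op.  Without the list_empty() guard, the second
 * list_add_tail() would relink the node over itself and corrupt the
 * list.
 */
static void request_extend(struct chunk *c)
{
	if (list_empty(&c->map_extend_list))
		list_add_tail(&c->map_extend_list, &extend_chunks);
}

int main(void)
{
	struct chunk c = { .id = 1 };

	INIT_LIST_HEAD(&extend_chunks);
	INIT_LIST_HEAD(&c.map_extend_list);

	request_extend(&c);
	request_extend(&c);	/* duplicate request, safely ignored */

	printf("queued exactly once: %s\n",
	       extend_chunks.next == &c.map_extend_list &&
	       extend_chunks.prev == &c.map_extend_list ? "yes" : "no");
	return 0;
}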
On 26.5.2016 21:21, Tejun Heo wrote:
> Hello,
>
> On Thu, May 26, 2016 at 11:19:06AM +0200, Vlastimil Babka wrote:
>>>  	if (is_atomic) {
>>>  		margin = 3;
>>>
>>>  		if (chunk->map_alloc <
>>> -		    chunk->map_used + PCPU_ATOMIC_MAP_MARGIN_LOW &&
>>> -		    pcpu_async_enabled)
>>> -			schedule_work(&chunk->map_extend_work);
>>> +		    chunk->map_used + PCPU_ATOMIC_MAP_MARGIN_LOW) {
>>> +			if (list_empty(&chunk->map_extend_list)) {
>
>> So why this list_empty condition? Doesn't it deserve a comment then?
>
> Because doing list_add() twice corrupts the list. I'm not sure that
> deserves a comment. We can do list_move() instead, but that isn't
> necessarily better.

Ugh, right, somehow I thought it was testing &pcpu_map_extend_chunks.
My second question was based on the assumption that the list can have
only one item. Sorry about the noise.

>> And isn't using a list overkill in that case?
>
> That would require the rebalance work to scan all chunks whenever it's
> scheduled, and if a lot of atomic allocations are taking place, that
> could become expensive with a lot of chunks.
>
> Thanks.
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -112,7 +112,7 @@ struct pcpu_chunk {
 	int map_used;		/* # of map entries used before the sentry */
 	int map_alloc;		/* # of map entries allocated */
 	int *map;		/* allocation map */
-	struct work_struct map_extend_work;/* async ->map[] extension */
+	struct list_head map_extend_list;/* on pcpu_map_extend_chunks */
 
 	void *data;		/* chunk data */
 	int first_free;		/* no free below this */
@@ -166,6 +166,9 @@ static DEFINE_MUTEX(pcpu_alloc_mutex); /
 
 static struct list_head *pcpu_slot __read_mostly; /* chunk list slots */
 
+/* chunks which need their map areas extended, protected by pcpu_lock */
+static LIST_HEAD(pcpu_map_extend_chunks);
+
 /*
  * The number of empty populated pages, protected by pcpu_lock.  The
  * reserved chunk doesn't contribute to the count.
@@ -395,13 +398,19 @@ static int pcpu_need_to_extend(struct pc
 {
 	int margin, new_alloc;
 
+	lockdep_assert_held(&pcpu_lock);
+
 	if (is_atomic) {
 		margin = 3;
 
 		if (chunk->map_alloc <
-		    chunk->map_used + PCPU_ATOMIC_MAP_MARGIN_LOW &&
-		    pcpu_async_enabled)
-			schedule_work(&chunk->map_extend_work);
+		    chunk->map_used + PCPU_ATOMIC_MAP_MARGIN_LOW) {
+			if (list_empty(&chunk->map_extend_list)) {
+				list_add_tail(&chunk->map_extend_list,
+					      &pcpu_map_extend_chunks);
+				pcpu_schedule_balance_work();
+			}
+		}
 	} else {
 		margin = PCPU_ATOMIC_MAP_MARGIN_HIGH;
 	}
@@ -467,20 +476,6 @@ out_unlock:
 	return 0;
 }
 
-static void pcpu_map_extend_workfn(struct work_struct *work)
-{
-	struct pcpu_chunk *chunk = container_of(work, struct pcpu_chunk,
-						map_extend_work);
-	int new_alloc;
-
-	spin_lock_irq(&pcpu_lock);
-	new_alloc = pcpu_need_to_extend(chunk, false);
-	spin_unlock_irq(&pcpu_lock);
-
-	if (new_alloc)
-		pcpu_extend_area_map(chunk, new_alloc);
-}
-
 /**
  * pcpu_fit_in_area - try to fit the requested allocation in a candidate area
  * @chunk: chunk the candidate area belongs to
@@ -740,7 +735,7 @@ static struct pcpu_chunk *pcpu_alloc_chu
 	chunk->map_used = 1;
 
 	INIT_LIST_HEAD(&chunk->list);
-	INIT_WORK(&chunk->map_extend_work, pcpu_map_extend_workfn);
+	INIT_LIST_HEAD(&chunk->map_extend_list);
 	chunk->free_size = pcpu_unit_size;
 	chunk->contig_hint = pcpu_unit_size;
 
@@ -1129,6 +1124,7 @@ static void pcpu_balance_workfn(struct w
 		if (chunk == list_first_entry(free_head, struct pcpu_chunk, list))
 			continue;
 
+		list_del_init(&chunk->map_extend_list);
 		list_move(&chunk->list, &to_free);
 	}
 
@@ -1146,6 +1142,25 @@ static void pcpu_balance_workfn(struct w
 		pcpu_destroy_chunk(chunk);
 	}
 
+	/* service chunks which requested async area map extension */
+	do {
+		int new_alloc = 0;
+
+		spin_lock_irq(&pcpu_lock);
+
+		chunk = list_first_entry_or_null(&pcpu_map_extend_chunks,
+					struct pcpu_chunk, map_extend_list);
+		if (chunk) {
+			list_del_init(&chunk->map_extend_list);
+			new_alloc = pcpu_need_to_extend(chunk, false);
+		}
+
+		spin_unlock_irq(&pcpu_lock);
+
+		if (new_alloc)
+			pcpu_extend_area_map(chunk, new_alloc);
+	} while (chunk);
+
 	/*
 	 * Ensure there are certain number of free populated pages for
 	 * atomic allocs.  Fill up from the most packed so that atomic
@@ -1644,7 +1659,7 @@ int __init pcpu_setup_first_chunk(const
 	 */
 	schunk = memblock_virt_alloc(pcpu_chunk_struct_size, 0);
 	INIT_LIST_HEAD(&schunk->list);
-	INIT_WORK(&schunk->map_extend_work, pcpu_map_extend_workfn);
+	INIT_LIST_HEAD(&schunk->map_extend_list);
 	schunk->base_addr = base_addr;
 	schunk->map = smap;
 	schunk->map_alloc = ARRAY_SIZE(smap);
@@ -1673,7 +1688,7 @@ int __init pcpu_setup_first_chunk(const
 	if (dyn_size) {
 		dchunk = memblock_virt_alloc(pcpu_chunk_struct_size, 0);
 		INIT_LIST_HEAD(&dchunk->list);
-		INIT_WORK(&dchunk->map_extend_work, pcpu_map_extend_workfn);
+		INIT_LIST_HEAD(&dchunk->map_extend_list);
 		dchunk->base_addr = base_addr;
 		dchunk->map = dmap;
 		dchunk->map_alloc = ARRAY_SIZE(dmap);
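The drain loop added to pcpu_balance_workfn() above follows a common
kernel shape: detach one entry while holding the spinlock, drop the
lock, do the heavyweight work (pcpu_extend_area_map() can allocate and
sleep), and repeat until the queue is empty. Below is a rough
user-space sketch of that shape, with a pthread mutex standing in for
pcpu_lock and a singly linked queue standing in for the list_head
machinery; the names mirror the patch but none of this is the kernel
implementation.

#include <pthread.h>
#include <stdio.h>

struct chunk {
	int id;
	struct chunk *next;	/* singly linked queue, for brevity */
};

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER; /* ~pcpu_lock */
static struct chunk *extend_head;	/* ~pcpu_map_extend_chunks */

static void extend_area_map(struct chunk *c)
{
	/* Expensive and may sleep, so it must run with the lock dropped. */
	printf("extending map of chunk %d\n", c->id);
}

static void balance_work(void)
{
	struct chunk *c;

	do {
		pthread_mutex_lock(&lock);
		c = extend_head;		/* detach one queued chunk */
		if (c)
			extend_head = c->next;
		pthread_mutex_unlock(&lock);

		if (c)
			extend_area_map(c);	/* heavy work, unlocked */
	} while (c);
}

int main(void)
{
	struct chunk a = { .id = 1 }, b = { .id = 2 };

	/* queue two extension requests */
	pthread_mutex_lock(&lock);
	a.next = &b;
	b.next = NULL;
	extend_head = &a;
	pthread_mutex_unlock(&lock);

	balance_work();
	return 0;
}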
Atomic allocations can trigger async map extensions which are serviced
by chunk->map_extend_work. pcpu_balance_work, which is responsible for
destroying idle chunks, wasn't synchronizing properly against
chunk->map_extend_work and may end up freeing the chunk while the work
item is still in flight.

This patch fixes the bug by rolling async map extension operations
into pcpu_balance_work.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-and-tested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Reported-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Cc: stable@vger.kernel.org # v3.18+
Fixes: 9c824b6a172c ("percpu: make sure chunk->map array has available space")
---
 mm/percpu.c | 57 ++++++++++++++++++++++++++++++++++++---------------------
 1 file changed, 36 insertions(+), 21 deletions(-)
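Why rolling the extension into pcpu_balance_work closes the race:
destruction and extension now run sequentially inside a single work
function, and the destruction pass unlinks a doomed chunk from the
extend queue (the added list_del_init()) before freeing it, so the
drain pass that follows can never dereference freed memory. Below is a
compressed, hypothetical model of that ordering; flags replace the real
lists and nothing here is kernel code.

#include <stdio.h>
#include <stdlib.h>

#define MAX_CHUNKS 4

struct chunk {
	int id;
	int idle;		/* candidate for destruction */
	int wants_extend;	/* ~being on pcpu_map_extend_chunks */
};

static struct chunk *chunks[MAX_CHUNKS];

static void balance_work(void)
{
	/*
	 * Pass 1: destroy idle chunks, cancelling any pending extension
	 * request first (the patch's list_del_init(&chunk->map_extend_list)).
	 */
	for (int i = 0; i < MAX_CHUNKS; i++) {
		struct chunk *c = chunks[i];

		if (c && c->idle) {
			c->wants_extend = 0;	/* drop queued extension */
			printf("freeing idle chunk %d\n", c->id);
			free(c);
			chunks[i] = NULL;
		}
	}

	/*
	 * Pass 2: service surviving extension requests.  Pass 1 already
	 * ran, so no freed chunk can still be queued here.
	 */
	for (int i = 0; i < MAX_CHUNKS; i++) {
		struct chunk *c = chunks[i];

		if (c && c->wants_extend) {
			c->wants_extend = 0;
			printf("extending map of chunk %d\n", c->id);
		}
	}
}

int main(void)
{
	for (int i = 0; i < 2; i++) {
		chunks[i] = calloc(1, sizeof(*chunks[i]));
		chunks[i]->id = i;
		chunks[i]->wants_extend = 1;
	}
	chunks[0]->idle = 1;	/* queued for extension AND idle */

	balance_work();		/* safely frees chunk 0, extends chunk 1 */
	free(chunks[1]);
	return 0;
}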