Message ID | 20200311230815.1432367-1-cheloha@linux.ibm.com (mailing list archive) |
---|---|
State | Superseded |
Series | [RFC,v1] pseries/drmem: don't cache node id in drmem_lmb struct |
Context | Check | Description |
---|---|---|
snowpatch_ozlabs/apply_patch | success | Successfully applied on branch powerpc/merge (ab326587bb5fb91cc97df9b9f48e9e1469f04621) |
snowpatch_ozlabs/build-ppc64le | success | Build succeeded |
snowpatch_ozlabs/build-ppc64be | success | Build succeeded |
snowpatch_ozlabs/build-ppc64e | success | Build succeeded |
snowpatch_ozlabs/build-pmac32 | success | Build succeeded |
snowpatch_ozlabs/checkpatch | success | total: 0 errors, 0 warnings, 0 checks, 114 lines checked |
snowpatch_ozlabs/needsstable | success | Patch has no Fixes tags |
On Wed, Mar 11, 2020 at 06:08:15PM -0500, Scott Cheloha wrote:
> At memory hot-remove time we can retrieve an LMB's nid from its
> corresponding memory_block. There is no need to store the nid
> in multiple locations.
>
> Signed-off-by: Scott Cheloha <cheloha@linux.ibm.com>
> ---
> The linear search in powerpc's memory_add_physaddr_to_nid() has become a
> bottleneck at boot on systems with many LMBs.
>
> As described in this patch here:
>
> https://lore.kernel.org/linuxppc-dev/20200221172901.1596249-2-cheloha@linux.ibm.com/
>
> the linear search seriously cripples drmem_init().
>
> The obvious solution (shown in that patch) is to just make the search
> in memory_add_physaddr_to_nid() faster. An XArray seems well-suited
> to the task of mapping an address range to an LMB object.
>
> The less obvious approach is to just call memory_add_physaddr_to_nid()
> in fewer places.
>
> I'm not sure which approach is correct, hence the RFC.

You basically revert the below which will likely cause the very error
that was fixed there:

commit b2d3b5ee66f2a04a918cc043cec0c9ed3de58f40
Author: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Date:   Tue Oct 2 10:35:59 2018 -0500

    powerpc/pseries: Track LMB nid instead of using device tree

    When removing memory we need to remove the memory from the node
    it was added to instead of looking up the node it should be in
    in the device tree.

    During testing we have seen scenarios where the affinity for a
    LMB changes due to a partition migration or PRRN event. In these
    cases the node the LMB exists in may not match the node the device
    tree indicates it belongs in. This can lead to a system crash
    when trying to DLPAR remove the LMB after a migration or PRRN
    event. The current code looks up the node in the device tree to
    remove the LMB from, the crash occurs when we try to offline this
    node and it does not have any data, i.e. node_data[nid] == NULL.

Thanks

Michal
Hi Michal,

On Thu, Mar 12, 2020 at 06:02:37AM +0100, Michal Suchánek wrote:
> On Wed, Mar 11, 2020 at 06:08:15PM -0500, Scott Cheloha wrote:
> > At memory hot-remove time we can retrieve an LMB's nid from its
> > corresponding memory_block. There is no need to store the nid
> > in multiple locations.
> >
> > Signed-off-by: Scott Cheloha <cheloha@linux.ibm.com>
> > ---
> > The linear search in powerpc's memory_add_physaddr_to_nid() has become a
> > bottleneck at boot on systems with many LMBs.
> >
> > As described in this patch here:
> >
> > https://lore.kernel.org/linuxppc-dev/20200221172901.1596249-2-cheloha@linux.ibm.com/
> >
> > the linear search seriously cripples drmem_init().
> >
> > The obvious solution (shown in that patch) is to just make the search
> > in memory_add_physaddr_to_nid() faster. An XArray seems well-suited
> > to the task of mapping an address range to an LMB object.
> >
> > The less obvious approach is to just call memory_add_physaddr_to_nid()
> > in fewer places.
> >
> > I'm not sure which approach is correct, hence the RFC.
>
> You basically revert the below which will likely cause the very error
> that was fixed there:
>
> commit b2d3b5ee66f2a04a918cc043cec0c9ed3de58f40
> Author: Nathan Fontenot <nfont@linux.vnet.ibm.com>
> Date:   Tue Oct 2 10:35:59 2018 -0500
>
>     powerpc/pseries: Track LMB nid instead of using device tree
>
>     When removing memory we need to remove the memory from the node
>     it was added to instead of looking up the node it should be in
>     in the device tree.
>
>     During testing we have seen scenarios where the affinity for a
>     LMB changes due to a partition migration or PRRN event. In these
>     cases the node the LMB exists in may not match the node the device
>     tree indicates it belongs in. This can lead to a system crash
>     when trying to DLPAR remove the LMB after a migration or PRRN
>     event. The current code looks up the node in the device tree to
>     remove the LMB from, the crash occurs when we try to offline this
>     node and it does not have any data, i.e. node_data[nid] == NULL.

I'm aware of this patch and that this is a near-total revert.

I'm not reintroducing the original behavior, though. Instead of going
to the device tree to recompute the expected node id I'm retrieving it
from the LMB's corresponding memory_block.

The crucial difference is this chunk:

--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -376,25 +376,29 @@ static int dlpar_add_lmb(struct drmem_lmb *);
 
 static int dlpar_remove_lmb(struct drmem_lmb *lmb)
 {
+	struct memory_block *mem_block;
 	unsigned long block_sz;
 	int rc;
 
 	if (!lmb_is_removable(lmb))
 		return -EINVAL;
 
+	mem_block = lmb_to_memblock(lmb);
+	if (mem_block == NULL)
+		return -EINVAL;
+
 	rc = dlpar_offline_lmb(lmb);
 	if (rc)
 		return rc;
 
 	block_sz = pseries_memory_block_size();
 
-	__remove_memory(lmb->nid, lmb->base_addr, block_sz);
+	__remove_memory(mem_block->nid, lmb->base_addr, block_sz);
 
 	/* Update memory regions for memory remove */
 	memblock_remove(lmb->base_addr, block_sz);
 
 	invalidate_lmb_associativity_index(lmb);
-	lmb_clear_nid(lmb);
 	lmb->flags &= ~DRCONF_MEM_ASSIGNED;
 
 	return 0;
---

Unless it's possible that the call:

	__add_memory(my_nid, my_addr, my_size);

does not yield the following:

	memory_block.nid == my_nid

on success, then I think the solution in my patch is equivalent to and
simpler than the one introduced in the patch you quote.

Can you see a way the above would not hold?

Then again, maybe there's a good reason not to retrieve the node id in
this way. I'm thinking David Hildenbrand and/or Nathan Fontenot
may have some insight on that.
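The argument above rests on one invariant: the node id passed at add time is what the memory block reports at remove time, even if the device tree affinity changes in between. The following is a minimal userspace sketch of that invariant, not kernel code. Every name in it (`toy_memory_block`, `toy_add_memory()`, `toy_nid_for_remove()`, `toy_remove_memory()`, `toy_dt_nid`) is hypothetical, and the array-backed bookkeeping is an assumption made purely for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_BLOCKS 8
#define TOY_NO_NODE    (-1)

/* Hypothetical stand-in for struct memory_block: it remembers the node
 * id that was passed in when the block was added. */
struct toy_memory_block {
	unsigned long base_addr;
	int nid;
	int present;
};

static struct toy_memory_block toy_blocks[TOY_MAX_BLOCKS];

/* Toy "device tree" affinity. In the scenario from the quoted commit it
 * can change after a migration or PRRN event. */
int toy_dt_nid[TOY_MAX_BLOCKS];

/* Toy analogue of adding memory: record the nid on the block itself. */
int toy_add_memory(int nid, size_t idx, unsigned long base_addr)
{
	if (idx >= TOY_MAX_BLOCKS || toy_blocks[idx].present)
		return -1;
	toy_blocks[idx].base_addr = base_addr;
	toy_blocks[idx].nid = nid;
	toy_blocks[idx].present = 1;
	return 0;
}

/* Node to remove from: read it back from the block, never from the
 * (possibly stale) device tree affinity. */
int toy_nid_for_remove(size_t idx)
{
	if (idx >= TOY_MAX_BLOCKS || !toy_blocks[idx].present)
		return TOY_NO_NODE;
	return toy_blocks[idx].nid;
}

/* Remove the block and report which node it was removed from. */
int toy_remove_memory(size_t idx)
{
	int nid = toy_nid_for_remove(idx);

	if (nid != TOY_NO_NODE)
		toy_blocks[idx].present = 0;
	return nid;
}
```

In this model, adding block 0 with `toy_dt_nid[0]`, then changing `toy_dt_nid[0]`, still makes `toy_remove_memory(0)` return the original node. Whether the real `__add_memory()` and `memory_block` give the same guarantee is exactly the open question in this thread.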
Scott Cheloha <cheloha@linux.ibm.com> writes:
> Hi Michal,
>
> On Thu, Mar 12, 2020 at 06:02:37AM +0100, Michal Suchánek wrote:
>>
>> You basically revert the below which will likely cause the very error
>> that was fixed there:
>>
>> commit b2d3b5ee66f2a04a918cc043cec0c9ed3de58f40
>> Author: Nathan Fontenot <nfont@linux.vnet.ibm.com>
>> Date:   Tue Oct 2 10:35:59 2018 -0500
>>
>>     powerpc/pseries: Track LMB nid instead of using device tree
>>
>>     When removing memory we need to remove the memory from the node
>>     it was added to instead of looking up the node it should be in
>>     in the device tree.
>>
>>     During testing we have seen scenarios where the affinity for a
>>     LMB changes due to a partition migration or PRRN event. In these
>>     cases the node the LMB exists in may not match the node the device
>>     tree indicates it belongs in. This can lead to a system crash
>>     when trying to DLPAR remove the LMB after a migration or PRRN
>>     event. The current code looks up the node in the device tree to
>>     remove the LMB from, the crash occurs when we try to offline this
>>     node and it does not have any data, i.e. node_data[nid] == NULL.
>
> I'm aware of this patch and that this is a near-total revert.
>
> I'm not reintroducing the original behavior, though. Instead of going
> to the device tree to recompute the expected node id I'm retrieving it
> from the LMB's corresponding memory_block.
>
> That crucial difference is this chunk:
>
> --- a/arch/powerpc/platforms/pseries/hotplug-memory.c
> +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
> @@ -376,25 +376,29 @@ static int dlpar_add_lmb(struct drmem_lmb *);
>  
>  static int dlpar_remove_lmb(struct drmem_lmb *lmb)
>  {
> +	struct memory_block *mem_block;
>  	unsigned long block_sz;
>  	int rc;
>  
>  	if (!lmb_is_removable(lmb))
>  		return -EINVAL;
>  
> +	mem_block = lmb_to_memblock(lmb);
> +	if (mem_block == NULL)
> +		return -EINVAL;
> +

Assuming lmb_to_memblock() -> find_memory_block() isn't engaging in O(n)
behavior or worse (which is the case in linux-next), then I think
Scott's change makes sense and is a net win.
diff --git a/arch/powerpc/include/asm/drmem.h b/arch/powerpc/include/asm/drmem.h
index 3d76e1c388c2..4e7b6b70e366 100644
--- a/arch/powerpc/include/asm/drmem.h
+++ b/arch/powerpc/include/asm/drmem.h
@@ -13,9 +13,6 @@ struct drmem_lmb {
 	u32     drc_index;
 	u32     aa_index;
 	u32     flags;
-#ifdef CONFIG_MEMORY_HOTPLUG
-	int	nid;
-#endif
 };
 
 struct drmem_lmb_info {
@@ -103,22 +100,4 @@ static inline void invalidate_lmb_associativity_index(struct drmem_lmb *lmb)
 	lmb->aa_index = 0xffffffff;
 }
 
-#ifdef CONFIG_MEMORY_HOTPLUG
-static inline void lmb_set_nid(struct drmem_lmb *lmb)
-{
-	lmb->nid = memory_add_physaddr_to_nid(lmb->base_addr);
-}
-static inline void lmb_clear_nid(struct drmem_lmb *lmb)
-{
-	lmb->nid = -1;
-}
-#else
-static inline void lmb_set_nid(struct drmem_lmb *lmb)
-{
-}
-static inline void lmb_clear_nid(struct drmem_lmb *lmb)
-{
-}
-#endif
-
 #endif /* _ASM_POWERPC_LMB_H */
diff --git a/arch/powerpc/mm/drmem.c b/arch/powerpc/mm/drmem.c
index 59327cefbc6a..873fcfc7b875 100644
--- a/arch/powerpc/mm/drmem.c
+++ b/arch/powerpc/mm/drmem.c
@@ -362,10 +362,8 @@ static void __init init_drmem_v1_lmbs(const __be32 *prop)
 	if (!drmem_info->lmbs)
 		return;
 
-	for_each_drmem_lmb(lmb) {
+	for_each_drmem_lmb(lmb)
 		read_drconf_v1_cell(lmb, &prop);
-		lmb_set_nid(lmb);
-	}
 }
 
 static void __init init_drmem_v2_lmbs(const __be32 *prop)
@@ -410,8 +408,6 @@ static void __init init_drmem_v2_lmbs(const __be32 *prop)
 
 			lmb->aa_index = dr_cell.aa_index;
 			lmb->flags = dr_cell.flags;
-
-			lmb_set_nid(lmb);
 		}
 	}
 }
diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index a4d40a3ceea3..f6d4236286cf 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -376,25 +376,29 @@ static int dlpar_add_lmb(struct drmem_lmb *);
 
 static int dlpar_remove_lmb(struct drmem_lmb *lmb)
 {
+	struct memory_block *mem_block;
 	unsigned long block_sz;
 	int rc;
 
 	if (!lmb_is_removable(lmb))
 		return -EINVAL;
 
+	mem_block = lmb_to_memblock(lmb);
+	if (mem_block == NULL)
+		return -EINVAL;
+
 	rc = dlpar_offline_lmb(lmb);
 	if (rc)
 		return rc;
 
 	block_sz = pseries_memory_block_size();
 
-	__remove_memory(lmb->nid, lmb->base_addr, block_sz);
+	__remove_memory(mem_block->nid, lmb->base_addr, block_sz);
 
 	/* Update memory regions for memory remove */
 	memblock_remove(lmb->base_addr, block_sz);
 
 	invalidate_lmb_associativity_index(lmb);
-	lmb_clear_nid(lmb);
 	lmb->flags &= ~DRCONF_MEM_ASSIGNED;
 
 	return 0;
@@ -651,7 +655,7 @@ static int dlpar_memory_remove_by_ic(u32 lmbs_to_remove, u32 drc_index)
 static int dlpar_add_lmb(struct drmem_lmb *lmb)
 {
 	unsigned long block_sz;
-	int rc;
+	int nid, rc;
 
 	if (lmb->flags & DRCONF_MEM_ASSIGNED)
 		return -EINVAL;
@@ -662,11 +666,13 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
 		return rc;
 	}
 
-	lmb_set_nid(lmb);
 	block_sz = memory_block_size_bytes();
 
+	/* Find the node id for this address. */
+	nid = memory_add_physaddr_to_nid(lmb->base_addr);
+
 	/* Add the memory */
-	rc = __add_memory(lmb->nid, lmb->base_addr, block_sz);
+	rc = __add_memory(nid, lmb->base_addr, block_sz);
 	if (rc) {
 		invalidate_lmb_associativity_index(lmb);
 		return rc;
@@ -674,9 +680,8 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
 
 	rc = dlpar_online_lmb(lmb);
 	if (rc) {
-		__remove_memory(lmb->nid, lmb->base_addr, block_sz);
+		__remove_memory(nid, lmb->base_addr, block_sz);
 		invalidate_lmb_associativity_index(lmb);
-		lmb_clear_nid(lmb);
 	} else {
 		lmb->flags |= DRCONF_MEM_ASSIGNED;
 	}
At memory hot-remove time we can retrieve an LMB's nid from its
corresponding memory_block. There is no need to store the nid
in multiple locations.

Signed-off-by: Scott Cheloha <cheloha@linux.ibm.com>
---
The linear search in powerpc's memory_add_physaddr_to_nid() has become a
bottleneck at boot on systems with many LMBs.

As described in this patch here:

https://lore.kernel.org/linuxppc-dev/20200221172901.1596249-2-cheloha@linux.ibm.com/

the linear search seriously cripples drmem_init().

The obvious solution (shown in that patch) is to just make the search
in memory_add_physaddr_to_nid() faster. An XArray seems well-suited
to the task of mapping an address range to an LMB object.

The less obvious approach is to just call memory_add_physaddr_to_nid()
in fewer places.

I'm not sure which approach is correct, hence the RFC.

The attached patch is an example of how we could just eliminate the
memory_add_physaddr_to_nid() calls during drmem_init(). Unless this
is somehow an abuse of the memory_block I think retrieving the nid in
this way deduplicates memoization.

There is another spot where we don't *need* to search by address to
find the node id. In dlpar_memory_add() we're calling lmb_set_nid(),
which is just memory_add_physaddr_to_nid(). But we could easily bypass
the search and call of_drconf_to_nid_single() directly with the LMB
we already have a pointer to.

Then again, I see callers like the xen balloon driver and probe_store()
that seemingly need to do an address-to-nid lookup.

So, does memory_add_physaddr_to_nid() need to be fast? Should I bite
the bullet and make it fast so we can leverage it without slowdown?
Or is calling it in fewer places an acceptable approach?

 arch/powerpc/include/asm/drmem.h              | 21 -------------------
 arch/powerpc/mm/drmem.c                       |  6 +-----
 .../platforms/pseries/hotplug-memory.c        | 19 ++++++++++-------
 3 files changed, 13 insertions(+), 33 deletions(-)
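To make the "make the search faster" option concrete, here is a small userspace sketch comparing a linear address-to-LMB scan with a binary search. It is not the XArray approach from the linked patch. Binary search over a sorted array is swapped in only because it is easy to show outside the kernel. The names `toy_lmb`, `toy_find_linear()` and `toy_find_sorted()` are hypothetical, and the sketch assumes the LMB array is sorted by base address with equally sized, non-overlapping entries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for struct drmem_lmb. */
struct toy_lmb {
	uint64_t base_addr;
	int nid;
};

/* O(n): walk every LMB until one covers addr. */
const struct toy_lmb *toy_find_linear(const struct toy_lmb *lmbs, size_t n,
				      uint64_t lmb_size, uint64_t addr)
{
	for (size_t i = 0; i < n; i++) {
		if (addr >= lmbs[i].base_addr &&
		    addr - lmbs[i].base_addr < lmb_size)
			return &lmbs[i];
	}
	return NULL;
}

/* O(log n): binary search. Assumes lmbs[] is sorted by base_addr and
 * that entries do not overlap (an assumption of this sketch). */
const struct toy_lmb *toy_find_sorted(const struct toy_lmb *lmbs, size_t n,
				      uint64_t lmb_size, uint64_t addr)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (addr < lmbs[mid].base_addr)
			hi = mid;		/* addr lies below this LMB */
		else if (addr - lmbs[mid].base_addr >= lmb_size)
			lo = mid + 1;		/* addr lies above this LMB */
		else
			return &lmbs[mid];	/* addr falls inside this LMB */
	}
	return NULL;				/* hole: no LMB covers addr */
}
```

Both functions return the same entry for any address. Only the cost per lookup differs, and that cost is what matters if the lookup runs once per LMB during drmem_init(). The alternative discussed in the RFC avoids the question by calling the lookup in fewer places.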