Message ID | 170784021983.6249.10039296655906636112.stgit@linux.ibm.com (mailing list archive) |
---|---|
State | Accepted |
Commit | 0846dd77c8349ec92ca0079c9c71d130f34cb192 |
Series | powerpc/iommu: Fix the missing iommu_group_put() during platform domain attach |
Context | Check | Description |
---|---|---|
snowpatch_ozlabs/github-powerpc_selftests | success | Successfully ran 8 jobs. |
snowpatch_ozlabs/github-powerpc_ppctests | success | Successfully ran 8 jobs. |
snowpatch_ozlabs/github-powerpc_sparse | success | Successfully ran 4 jobs. |
snowpatch_ozlabs/github-powerpc_clang | success | Successfully ran 6 jobs. |
snowpatch_ozlabs/github-powerpc_kernel_qemu | success | Successfully ran 23 jobs. |
On Tue, Feb 13, 2024 at 10:05:22AM -0600, Shivaprasad G Bhat wrote:
> The function spapr_tce_platform_iommu_attach_dev() does not call
> iommu_group_put() when the domain is already set. This refcount leak
> shows up as a BUG_ON() during a DLPAR remove operation:
>
> KernelBug: Kernel bug in state 'None': kernel BUG at arch/powerpc/platforms/pseries/iommu.c:100!
> Oops: Exception in kernel mode, sig: 5 [#1]
> LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=8192 NUMA pSeries
> <snip>
> Hardware name: IBM,9080-HEX POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1060.00 (NH1060_016) hv:phyp pSeries
> NIP: c0000000000ff4d4 LR: c0000000000ff4cc CTR: 0000000000000000
> REGS: c0000013aed5f840 TRAP: 0700 Tainted: G I (6.8.0-rc3-autotest-g99bd3cb0d12e)
> MSR: 8000000000029033 <SF,EE,ME,IR,DR,RI,LE> CR: 44002402 XER: 20040000
> CFAR: c000000000a0d170 IRQMASK: 0
> GPR00: c0000000000ff4cc c0000013aed5fae0 c000000001512700 c0000013aa362138
> GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000119c8afd0
> GPR08: 0000000000000000 c000001284442b00 0000000000000001 0000000000001003
> GPR12: 0000000300000000 c0000018ffff2f00 0000000000000000 0000000000000000
> GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> GPR24: c0000013aed5fc40 0000000000000002 0000000000000000 c000000002757d90
> GPR28: c0000000000ff440 c000000002757cb8 c00000183799c1a0 c0000013aa362b00
> NIP [c0000000000ff4d4] iommu_reconfig_notifier+0x94/0x200
> LR [c0000000000ff4cc] iommu_reconfig_notifier+0x8c/0x200
> Call Trace:
> [c0000013aed5fae0] [c0000000000ff4cc] iommu_reconfig_notifier+0x8c/0x200 (unreliable)
> [c0000013aed5fb10] [c0000000001a27b0] notifier_call_chain+0xb8/0x19c
> [c0000013aed5fb70] [c0000000001a2a78] blocking_notifier_call_chain+0x64/0x98
> [c0000013aed5fbb0] [c000000000c4a898] of_reconfig_notify+0x44/0xdc
> [c0000013aed5fc20] [c000000000c4add4] of_detach_node+0x78/0xb0
> [c0000013aed5fc70] [c0000000000f96a8] ofdt_write.part.0+0x86c/0xbb8
> [c0000013aed5fce0] [c00000000069b4bc] proc_reg_write+0xf4/0x150
> [c0000013aed5fd10] [c0000000005bfeb4] vfs_write+0xf8/0x488
> [c0000013aed5fdc0] [c0000000005c0570] ksys_write+0x84/0x140
> [c0000013aed5fe10] [c000000000033358] system_call_exception+0x138/0x330
> [c0000013aed5fe50] [c00000000000d05c] system_call_vectored_common+0x15c/0x2ec
> --- interrupt: 3000 at 0x20000433acb4
> <snip>
> ---[ end trace 0000000000000000 ]---
>
> The patch adds the missing iommu_group_put() call.
>
> Fixes: a8ca9fc9134c ("powerpc/iommu: Do not do platform domain attach atctions after probe")
> Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
> ---
> arch/powerpc/kernel/iommu.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)

Doh, that is a weird splat for this but thanks for finding it

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason
Thanks for the patch. Applied this patch and verified that the issue is fixed.

This issue was originally reported in the mail below.

https://marc.info/?l=linux-kernel&m=170737160630106&w=2

Tested-by: Venkat Rao Bagalkote <venkat88@linux.vnet.ibm.com>

On 13/02/24 10:51 pm, Jason Gunthorpe wrote:
> On Tue, Feb 13, 2024 at 10:05:22AM -0600, Shivaprasad G Bhat wrote:
>> [...]
> Doh, that is a weird splat for this but thanks for finding it
>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
>
> Jason
Venkat Rao Bagalkote <venkat88@linux.vnet.ibm.com> writes:
> Thanks for the patch. Applied this patch and verified that the issue is fixed.
>
> This issue was originally reported in the mail below.
>
> https://marc.info/?l=linux-kernel&m=170737160630106&w=2

Please use lore for links, in this case:

https://lore.kernel.org/all/274e0d2b-b5cc-475e-94e6-8427e88e271d@linux.vnet.ibm.com/

cheers

> On 13/02/24 10:51 pm, Jason Gunthorpe wrote:
>> On Tue, Feb 13, 2024 at 10:05:22AM -0600, Shivaprasad G Bhat wrote:
>>> [...]
>> Doh, that is a weird splat for this but thanks for finding it
>>
>> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
>>
>> Jason
On Wed, Feb 14, 2024 at 11:53:20PM +1100, Michael Ellerman wrote:
> Venkat Rao Bagalkote <venkat88@linux.vnet.ibm.com> writes:
> > Thanks for the patch. Applied this patch and verified that the issue is fixed.
> >
> > This issue was originally reported in the mail below.
> >
> > https://marc.info/?l=linux-kernel&m=170737160630106&w=2
>
> Please use lore for links, in this case:
>
> https://lore.kernel.org/all/274e0d2b-b5cc-475e-94e6-8427e88e271d@linux.vnet.ibm.com/

Also if you are respinning you may prefer this

@@ -1285,14 +1285,15 @@ spapr_tce_platform_iommu_attach_dev(struct iommu_domain *platform_domain,
 				    struct device *dev)
 {
 	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
-	struct iommu_group *grp = iommu_group_get(dev);
 	struct iommu_table_group *table_group;
+	struct iommu_group *grp;
 	int ret = -EINVAL;
 
 	/* At first attach the ownership is already set */
 	if (!domain)
 		return 0;
 
+	grp = iommu_group_get(dev);
 	if (!grp)
 		return -ENODEV;
 
Which is sort of why this happened in the first place :)

Jason
On 2/14/24 18:28, Jason Gunthorpe wrote:
> On Wed, Feb 14, 2024 at 11:53:20PM +1100, Michael Ellerman wrote:
>> Venkat Rao Bagalkote <venkat88@linux.vnet.ibm.com> writes:
>>> Thanks for the patch. Applied this patch and verified that the issue is fixed.
>>>
>>> This issue was originally reported in the mail below.
>>>
>>> https://marc.info/?l=linux-kernel&m=170737160630106&w=2
>> Please use lore for links, in this case:
>>
>> https://lore.kernel.org/all/274e0d2b-b5cc-475e-94e6-8427e88e271d@linux.vnet.ibm.com/
> Also if you are respinning you may prefer this
>
> @@ -1285,14 +1285,15 @@ spapr_tce_platform_iommu_attach_dev(struct iommu_domain *platform_domain,
>  				    struct device *dev)
>  {
>  	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
> -	struct iommu_group *grp = iommu_group_get(dev);
>  	struct iommu_table_group *table_group;
> +	struct iommu_group *grp;
>  	int ret = -EINVAL;
>
>  	/* At first attach the ownership is already set */
>  	if (!domain)
>  		return 0;
>
> +	grp = iommu_group_get(dev);
>  	if (!grp)
>  		return -ENODEV;
>
> Which is sort of why this happened in the first place :)

Right! Posted the v2 here:
https://lore.kernel.org/linux-iommu/170793401503.7491.9431631474642074097.stgit@linux.ibm.com/

Thanks,
Shivaprasad

> Jason
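
The reordering suggested in the quoted hunk can be sketched in user space as follows. This is only an illustration of the pattern (take the reference after the early return, so that path has nothing to release), not the kernel code itself: struct demo_group, demo_group_get(), demo_group_put() and demo_attach_deferred() are hypothetical stand-ins for struct iommu_group, iommu_group_get(), iommu_group_put() and the attach function.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct iommu_group: just a reference counter. */
struct demo_group {
	int refs;
};

/* Plays the role of iommu_group_get(): take a reference, NULL if no group. */
static struct demo_group *demo_group_get(struct demo_group *g)
{
	if (!g)
		return NULL;
	g->refs++;
	return g;
}

/* Plays the role of iommu_group_put(): drop a reference. */
static void demo_group_put(struct demo_group *g)
{
	g->refs--;
}

/*
 * Shape of the suggested variant: the early return for "no domain yet"
 * comes first, and the reference is acquired only afterwards.
 */
static int demo_attach_deferred(struct demo_group *g, const void *domain)
{
	struct demo_group *grp;

	if (!domain)
		return 0;	/* nothing acquired yet, nothing to release */

	grp = demo_group_get(g);
	if (!grp)
		return -ENODEV;

	/* ... work that needs the group would go here ... */

	demo_group_put(grp);
	return 0;
}
```

With this ordering the early-return path never holds a reference, so it cannot forget to drop one; the remaining exits still have to pair each get with a put.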
On Tue, 13 Feb 2024 10:05:22 -0600, Shivaprasad G Bhat wrote:
> The function spapr_tce_platform_iommu_attach_dev() does not call
> iommu_group_put() when the domain is already set. This refcount leak
> shows up as a BUG_ON() during a DLPAR remove operation:
>
> [...]

Applied to powerpc/fixes.

[1/1] powerpc/iommu: Fix the missing iommu_group_put() during platform domain attach
      https://git.kernel.org/powerpc/c/0846dd77c8349ec92ca0079c9c71d130f34cb192

cheers
diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index d71eac3b2887..a9bebfd56b3b 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -1289,8 +1289,10 @@ spapr_tce_platform_iommu_attach_dev(struct iommu_domain *platform_domain,
 	struct iommu_table_group *table_group;
 
 	/* At first attach the ownership is already set */
-	if (!domain)
+	if (!domain) {
+		iommu_group_put(grp);
 		return 0;
+	}
 
 	table_group = iommu_group_get_iommudata(grp);
 	/*
The function spapr_tce_platform_iommu_attach_dev() does not call
iommu_group_put() when the domain is already set. This refcount leak
shows up as a BUG_ON() during a DLPAR remove operation:

KernelBug: Kernel bug in state 'None': kernel BUG at arch/powerpc/platforms/pseries/iommu.c:100!
Oops: Exception in kernel mode, sig: 5 [#1]
LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=8192 NUMA pSeries
<snip>
Hardware name: IBM,9080-HEX POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1060.00 (NH1060_016) hv:phyp pSeries
NIP: c0000000000ff4d4 LR: c0000000000ff4cc CTR: 0000000000000000
REGS: c0000013aed5f840 TRAP: 0700 Tainted: G I (6.8.0-rc3-autotest-g99bd3cb0d12e)
MSR: 8000000000029033 <SF,EE,ME,IR,DR,RI,LE> CR: 44002402 XER: 20040000
CFAR: c000000000a0d170 IRQMASK: 0
GPR00: c0000000000ff4cc c0000013aed5fae0 c000000001512700 c0000013aa362138
GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000119c8afd0
GPR08: 0000000000000000 c000001284442b00 0000000000000001 0000000000001003
GPR12: 0000000300000000 c0000018ffff2f00 0000000000000000 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR24: c0000013aed5fc40 0000000000000002 0000000000000000 c000000002757d90
GPR28: c0000000000ff440 c000000002757cb8 c00000183799c1a0 c0000013aa362b00
NIP [c0000000000ff4d4] iommu_reconfig_notifier+0x94/0x200
LR [c0000000000ff4cc] iommu_reconfig_notifier+0x8c/0x200
Call Trace:
[c0000013aed5fae0] [c0000000000ff4cc] iommu_reconfig_notifier+0x8c/0x200 (unreliable)
[c0000013aed5fb10] [c0000000001a27b0] notifier_call_chain+0xb8/0x19c
[c0000013aed5fb70] [c0000000001a2a78] blocking_notifier_call_chain+0x64/0x98
[c0000013aed5fbb0] [c000000000c4a898] of_reconfig_notify+0x44/0xdc
[c0000013aed5fc20] [c000000000c4add4] of_detach_node+0x78/0xb0
[c0000013aed5fc70] [c0000000000f96a8] ofdt_write.part.0+0x86c/0xbb8
[c0000013aed5fce0] [c00000000069b4bc] proc_reg_write+0xf4/0x150
[c0000013aed5fd10] [c0000000005bfeb4] vfs_write+0xf8/0x488
[c0000013aed5fdc0] [c0000000005c0570] ksys_write+0x84/0x140
[c0000013aed5fe10] [c000000000033358] system_call_exception+0x138/0x330
[c0000013aed5fe50] [c00000000000d05c] system_call_vectored_common+0x15c/0x2ec
--- interrupt: 3000 at 0x20000433acb4
<snip>
---[ end trace 0000000000000000 ]---

The patch adds the missing iommu_group_put() call.

Fixes: a8ca9fc9134c ("powerpc/iommu: Do not do platform domain attach atctions after probe")
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
---
 arch/powerpc/kernel/iommu.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
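
The refcount leak described above can be modelled in user space. The sketch below is illustrative only and makes no claim about the kernel internals: struct toy_group, toy_group_get(), toy_group_put() and both attach variants are hypothetical stand-ins, and the counter starts at 1 to represent one long-lived reference held elsewhere. It contrasts the pre-patch shape of the function, where the `if (!domain)` branch shown in the diff returns while still holding the reference, with the patched shape.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct iommu_group: just a reference counter. */
struct toy_group {
	int refs;
};

/* Plays the role of iommu_group_get(): take a reference. */
static struct toy_group *toy_group_get(struct toy_group *g)
{
	g->refs++;
	return g;
}

/* Plays the role of iommu_group_put(): drop a reference. */
static void toy_group_put(struct toy_group *g)
{
	g->refs--;
}

/* Pre-patch shape: the early return keeps the reference taken on entry. */
static int toy_attach_leaky(struct toy_group *g, const void *domain)
{
	struct toy_group *grp = toy_group_get(g);

	if (!domain)
		return 0;	/* reference taken above is never dropped */

	toy_group_put(grp);
	return 0;
}

/* Patched shape: every exit path drops the reference. */
static int toy_attach_fixed(struct toy_group *g, const void *domain)
{
	struct toy_group *grp = toy_group_get(g);

	if (!domain) {
		toy_group_put(grp);
		return 0;
	}

	toy_group_put(grp);
	return 0;
}

/* Reference count left after one attach with no domain (NULL) set. */
static int toy_refs_after_first_attach(int (*attach)(struct toy_group *,
						     const void *))
{
	struct toy_group g = { .refs = 1 };	/* one long-lived reference */

	attach(&g, NULL);
	return g.refs;
}
```

In this model, toy_refs_after_first_attach(toy_attach_leaky) returns 2 and toy_refs_after_first_attach(toy_attach_fixed) returns 1. A count that stays above what the remaining owners expect means the final put never releases the object, which is the general kind of imbalance the commit message attributes the DLPAR-remove BUG_ON() to.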