
sparc/pci: Make pci_poke_lock a raw_spinlock_t.

Message ID 20241125085314.1iSDFulg@linutronix.de
State New
Series sparc/pci: Make pci_poke_lock a raw_spinlock_t.

Commit Message

Sebastian Andrzej Siewior Nov. 25, 2024, 8:53 a.m. UTC
The pci_poke_lock is used underneath the pci_lock. The pci_lock is a
low-level lock used by the core code in sections with disabled
interrupts. Therefore the pci_poke_lock must be a raw_spinlock_t.

Make pci_poke_lock a raw_spinlock_t.

Reported-by: Guenter Roeck <linux@roeck-us.net>
Closes: https://lore.kernel.org/7656395b-58fc-4874-a9f3-6d934e2ef7ee@roeck-us.net
Fixes: 560af5dc839ee ("lockdep: Enable PROVE_RAW_LOCK_NESTING with PROVE_LOCKING.")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
On 2024-11-23 08:27:08 [-0800], Guenter Roeck wrote:
> 
> Is this a problem with the test or with the platform ?

The platform. The patch below should fix it. It makes no difference
unless used on PREEMPT_RT; since sparc does not support PREEMPT_RT, I
made the fix where the default option changed.

Could you test it, please? I don't even have a compiler for sparc
right now.

 arch/sparc/kernel/pci.c | 26 +++++++++++++-------------
 1 file changed, 13 insertions(+), 13 deletions(-)
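
The patch body is not reproduced in this excerpt. Going by the diffstat
(13 insertions, 13 deletions in arch/sparc/kernel/pci.c), the conversion
presumably follows the usual pattern; this is a sketch, not the actual
patch:

```diff
--- a/arch/sparc/kernel/pci.c
+++ b/arch/sparc/kernel/pci.c
@@
-static DEFINE_SPINLOCK(pci_poke_lock);
+static DEFINE_RAW_SPINLOCK(pci_poke_lock);
@@
-	spin_lock_irqsave(&pci_poke_lock, flags);
+	raw_spin_lock_irqsave(&pci_poke_lock, flags);
@@
-	spin_unlock_irqrestore(&pci_poke_lock, flags);
+	raw_spin_unlock_irqrestore(&pci_poke_lock, flags);
```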

Comments

Guenter Roeck Nov. 25, 2024, 5:01 p.m. UTC | #1
On 11/25/24 00:53, Sebastian Andrzej Siewior wrote:
> The pci_poke_lock is used underneath the pci_lock. The pci_lock is a
> low-level lock used by the core code in sections with disabled
> interrupts. Therefore the pci_poke_lock must be a raw_spinlock_t.
> 
> Make pci_poke_lock a raw_spinlock_t.
> 
> Reported-by: Guenter Roeck <linux@roeck-us.net>
> Closes: https://lore.kernel.org/7656395b-58fc-4874-a9f3-6d934e2ef7ee@roeck-us.net
> Fixes: 560af5dc839ee ("lockdep: Enable PROVE_RAW_LOCK_NESTING with PROVE_LOCKING.")
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
> On 2024-11-23 08:27:08 [-0800], Guenter Roeck wrote:
>>
>> Is this a problem with the test or with the platform ?
> 
> platform. The patch below should fix it. It makes no difference unless
> used on PREEMPT_RT. Since sparc does not support it I made the fixes
> where the default option changed.
> 
> Could you test it, please? I don't even have a compiler for sparc
> right now.
> 

Unfortunately it doesn't make a difference.

[    1.050499] =============================
[    1.050801] [ BUG: Invalid wait context ]
[    1.051200] 6.12.0+ #1 Not tainted
[    1.051571] -----------------------------
[    1.051875] swapper/0/1 is trying to lock:
[    1.052201] 0000000001b694c8 (pci_poke_lock){....}-{3:3}, at: pci_config_read16+0x8/0x80
[    1.052994] other info that might help us debug this:
[    1.053331] context-{5:5}
[    1.053641] 2 locks held by swapper/0/1:
[    1.053959]  #0: fffff800042b50f8 (&dev->mutex){....}-{4:4}, at: __driver_attach+0x80/0x160
[    1.054388]  #1: 0000000001d29078 (pci_lock){....}-{2:2}, at: pci_bus_read_config_word+0x18/0x80
[    1.054793] stack backtrace:
[    1.055171] CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.12.0+ #1
[    1.055632] Call Trace:
[    1.055985] [<00000000004e31d0>] __lock_acquire+0xa50/0x3160
[    1.056329] [<00000000004e63e8>] lock_acquire+0xe8/0x340
[    1.056645] [<00000000010f0dfc>] _raw_spin_lock_irqsave+0x3c/0x80
[    1.056966] [<0000000000443828>] pci_config_read16+0x8/0x80
[    1.057278] [<000000000044442c>] sun4u_read_pci_cfg+0x12c/0x1a0
[    1.057593] [<0000000000b7657c>] pci_bus_read_config_word+0x3c/0x80
[    1.057913] [<0000000000b7fa78>] pci_find_capability+0x18/0xa0
[    1.058228] [<0000000000b794b0>] set_pcie_port_type+0x10/0x160
[    1.058543] [<0000000000442a98>] pci_of_scan_bus+0x158/0xb00
[    1.058854] [<00000000010c74a0>] pci_scan_one_pbm+0xd0/0xf8
[    1.059167] [<0000000000446174>] sabre_probe+0x1f4/0x5c0
[    1.059476] [<0000000000c13a48>] platform_probe+0x28/0x80
[    1.059785] [<0000000000c11158>] really_probe+0xb8/0x340
[    1.060098] [<0000000000c11584>] driver_probe_device+0x24/0xe0
[    1.060413] [<0000000000c117ac>] __driver_attach+0x8c/0x160
[    1.060728] [<0000000000c0ef54>] bus_for_each_dev+0x54/0xc0

The original call trace also included _raw_spin_lock_irqsave(), and
I don't have CONFIG_PREEMPT_RT enabled in my sparc64 builds to start with.

FWIW, I don't understand the value of
	pr_warn("context-{%d:%d}\n", curr_inner, curr_inner);
Why print curr_inner twice ?

Thanks,
Guenter
Sebastian Andrzej Siewior Nov. 25, 2024, 5:43 p.m. UTC | #2
On 2024-11-25 09:01:33 [-0800], Guenter Roeck wrote:
> Unfortunately it doesn't make a difference.

stunning. It looks like the exact same error message.

> [    1.050499] =============================
> [    1.050801] [ BUG: Invalid wait context ]
> [    1.051200] 6.12.0+ #1 Not tainted
> [    1.051571] -----------------------------
> [    1.051875] swapper/0/1 is trying to lock:
> [    1.052201] 0000000001b694c8 (pci_poke_lock){....}-{3:3}, at: pci_config_read16+0x8/0x80
> [    1.052994] other info that might help us debug this:
> [    1.053331] context-{5:5}
> [    1.053641] 2 locks held by swapper/0/1:
> [    1.053959]  #0: fffff800042b50f8 (&dev->mutex){....}-{4:4}, at: __driver_attach+0x80/0x160
> [    1.054388]  #1: 0000000001d29078 (pci_lock){....}-{2:2}, at: pci_bus_read_config_word+0x18/0x80
> [    1.054793] stack backtrace:
> [    1.055171] CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.12.0+ #1
> [    1.055632] Call Trace:
> [    1.055985] [<00000000004e31d0>] __lock_acquire+0xa50/0x3160
> [    1.056329] [<00000000004e63e8>] lock_acquire+0xe8/0x340
> [    1.056645] [<00000000010f0dfc>] _raw_spin_lock_irqsave+0x3c/0x80
> [    1.056966] [<0000000000443828>] pci_config_read16+0x8/0x80
> [    1.057278] [<000000000044442c>] sun4u_read_pci_cfg+0x12c/0x1a0
> [    1.057593] [<0000000000b7657c>] pci_bus_read_config_word+0x3c/0x80
> [    1.057913] [<0000000000b7fa78>] pci_find_capability+0x18/0xa0
> [    1.058228] [<0000000000b794b0>] set_pcie_port_type+0x10/0x160
> [    1.058543] [<0000000000442a98>] pci_of_scan_bus+0x158/0xb00
> [    1.058854] [<00000000010c74a0>] pci_scan_one_pbm+0xd0/0xf8
> [    1.059167] [<0000000000446174>] sabre_probe+0x1f4/0x5c0
> [    1.059476] [<0000000000c13a48>] platform_probe+0x28/0x80
> [    1.059785] [<0000000000c11158>] really_probe+0xb8/0x340
> [    1.060098] [<0000000000c11584>] driver_probe_device+0x24/0xe0
> [    1.060413] [<0000000000c117ac>] __driver_attach+0x8c/0x160
> [    1.060728] [<0000000000c0ef54>] bus_for_each_dev+0x54/0xc0
> 
> The original call trace also included _raw_spin_lock_irqsave(), and
> I don't have CONFIG_PREEMPT_RT enabled in my sparc64 builds to start with.

You don't have to. CONFIG_PROVE_RAW_LOCK_NESTING checks whether you try
to acquire a spinlock_t while holding a raw_spinlock_t, which is what
happened before my patch.
The pci_lock is from drivers/pci/access.c and is defined as a
raw_spinlock_t, and I made pci_poke_lock the same type. But the debug
output says 3:3, which suggests LD_WAIT_CONFIG, i.e. the patch was not
applied.

> FWIW, I don't understand the value of
> 	pr_warn("context-{%d:%d}\n", curr_inner, curr_inner);
> Why print curr_inner twice ?

The syntax was once (or still is) inner:outer. If you look from the
top, you have the mutex (4) followed by pci_lock (the raw_spinlock_t,
2). You are at level 2 now and try to acquire a spinlock_t (3).

> Thanks,
> Guenter

Sebastian
Guenter Roeck Nov. 25, 2024, 5:59 p.m. UTC | #3
On 11/25/24 09:43, Sebastian Andrzej Siewior wrote:
> On 2024-11-25 09:01:33 [-0800], Guenter Roeck wrote:
>> Unfortunately it doesn't make a difference.
> 
> stunning. It looks like the exact same error message.
> 

I think it uses

#define spin_lock_irqsave(lock, flags)                          \
do {                                                            \
         raw_spin_lock_irqsave(spinlock_check(lock), flags);     \
} while (0)

from include/linux/spinlock.h, meaning your patch doesn't really make a difference.

>> [    1.050499] =============================
>> [    1.050801] [ BUG: Invalid wait context ]
>> [    1.051200] 6.12.0+ #1 Not tainted
>> [    1.051571] -----------------------------
>> [    1.051875] swapper/0/1 is trying to lock:
>> [    1.052201] 0000000001b694c8 (pci_poke_lock){....}-{3:3}, at: pci_config_read16+0x8/0x80
>> [    1.052994] other info that might help us debug this:
>> [    1.053331] context-{5:5}
>> [    1.053641] 2 locks held by swapper/0/1:
>> [    1.053959]  #0: fffff800042b50f8 (&dev->mutex){....}-{4:4}, at: __driver_attach+0x80/0x160
>> [    1.054388]  #1: 0000000001d29078 (pci_lock){....}-{2:2}, at: pci_bus_read_config_word+0x18/0x80
>> [    1.054793] stack backtrace:
>> [    1.055171] CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.12.0+ #1
>> [    1.055632] Call Trace:
>> [    1.055985] [<00000000004e31d0>] __lock_acquire+0xa50/0x3160
>> [    1.056329] [<00000000004e63e8>] lock_acquire+0xe8/0x340
>> [    1.056645] [<00000000010f0dfc>] _raw_spin_lock_irqsave+0x3c/0x80
>> [    1.056966] [<0000000000443828>] pci_config_read16+0x8/0x80
>> [    1.057278] [<000000000044442c>] sun4u_read_pci_cfg+0x12c/0x1a0
>> [    1.057593] [<0000000000b7657c>] pci_bus_read_config_word+0x3c/0x80
>> [    1.057913] [<0000000000b7fa78>] pci_find_capability+0x18/0xa0
>> [    1.058228] [<0000000000b794b0>] set_pcie_port_type+0x10/0x160
>> [    1.058543] [<0000000000442a98>] pci_of_scan_bus+0x158/0xb00
>> [    1.058854] [<00000000010c74a0>] pci_scan_one_pbm+0xd0/0xf8
>> [    1.059167] [<0000000000446174>] sabre_probe+0x1f4/0x5c0
>> [    1.059476] [<0000000000c13a48>] platform_probe+0x28/0x80
>> [    1.059785] [<0000000000c11158>] really_probe+0xb8/0x340
>> [    1.060098] [<0000000000c11584>] driver_probe_device+0x24/0xe0
>> [    1.060413] [<0000000000c117ac>] __driver_attach+0x8c/0x160
>> [    1.060728] [<0000000000c0ef54>] bus_for_each_dev+0x54/0xc0
>>
>> The original call trace also included _raw_spin_lock_irqsave(), and
>> I don't have CONFIG_PREEMPT_RT enabled in my sparc64 builds to start with.
> 
> You don't have to. "CONFIG_PROVE_RAW_LOCK_NESTING" looks if you try to
> acquire raw_spinlock_t -> spinlock_t. Which it did before I made the
> patch.
> The pci_lock is from drivers/pci/access.c and is defined as
> raw_spinlock_t. And I made pci_poke_lock of the same time. But debug
> says 3:3 which suggests LD_WAIT_CONFIG. (No patch applied).
> 
>> FWIW, I don't understand the value of
>> 	pr_warn("context-{%d:%d}\n", curr_inner, curr_inner);
>> Why print curr_inner twice ?
> 
> The syntax was once (or is) inner:outer. If you look from the top, you
> have 4 (mutex_t) followed pci_lock (the raw_spinlock_t) 2. You are at
> level 2 now and try to acquire spin_lock_t (3).
> 

How does that explain the
	context-{5:5}
which is created from the following ?
	pr_warn("context-{%d:%d}\n", curr_inner, curr_inner);

Again, why print curr_inner twice ?

Thanks,
Guenter
Sebastian Andrzej Siewior Nov. 25, 2024, 6:12 p.m. UTC | #4
On 2024-11-25 09:59:09 [-0800], Guenter Roeck wrote:
> On 11/25/24 09:43, Sebastian Andrzej Siewior wrote:
> > On 2024-11-25 09:01:33 [-0800], Guenter Roeck wrote:
> > > Unfortunately it doesn't make a difference.
> > 
> > stunning. It looks like the exact same error message.
> > 
> 
> I think it uses
> 
> #define spin_lock_irqsave(lock, flags)                          \
> do {                                                            \
>         raw_spin_lock_irqsave(spinlock_check(lock), flags);     \
> } while (0)
> 
> from include/linux/spinlock.h, meaning your patch doesn't really make a difference.

The difference comes from DEFINE_SPINLOCK vs DEFINE_RAW_SPINLOCK. There
is the .lock_type init, which goes from LD_WAIT_CONFIG to LD_WAIT_SPIN.
And that is all that matters.
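
For reference, the two initializers differ only in the lockdep wait
type recorded in the lock's dep_map. This is a sketch from memory of
the spinlock_types headers; exact macro names and fields may vary
between kernel versions:

```c
/* Sketch, not verbatim kernel source: what DEFINE_RAW_SPINLOCK vs.
 * DEFINE_SPINLOCK feed into the lockdep map. */
#define RAW_SPIN_DEP_MAP_INIT(lockname)				\
	.dep_map = {						\
		.name = #lockname,				\
		.wait_type_inner = LD_WAIT_SPIN,	/* always spins */	\
	}

#define SPIN_DEP_MAP_INIT(lockname)				\
	.dep_map = {						\
		.name = #lockname,				\
		.wait_type_inner = LD_WAIT_CONFIG,	/* spins on !RT, sleeps on RT */ \
	}
```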

> > > [    1.050499] =============================
> > > [    1.050801] [ BUG: Invalid wait context ]
> > > [    1.051200] 6.12.0+ #1 Not tainted
> > > [    1.051571] -----------------------------
> > > [    1.051875] swapper/0/1 is trying to lock:
> > > [    1.052201] 0000000001b694c8 (pci_poke_lock){....}-{3:3}, at: pci_config_read16+0x8/0x80
> > > [    1.052994] other info that might help us debug this:
> > > [    1.053331] context-{5:5}
> > > [    1.053641] 2 locks held by swapper/0/1:
> > > [    1.053959]  #0: fffff800042b50f8 (&dev->mutex){....}-{4:4}, at: __driver_attach+0x80/0x160
> > > [    1.054388]  #1: 0000000001d29078 (pci_lock){....}-{2:2}, at: pci_bus_read_config_word+0x18/0x80
> > > [    1.054793] stack backtrace:
> > > [    1.055171] CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.12.0+ #1
> > > [    1.055632] Call Trace:
> > > [    1.055985] [<00000000004e31d0>] __lock_acquire+0xa50/0x3160
> > > [    1.056329] [<00000000004e63e8>] lock_acquire+0xe8/0x340
> > > [    1.056645] [<00000000010f0dfc>] _raw_spin_lock_irqsave+0x3c/0x80
> > > [    1.056966] [<0000000000443828>] pci_config_read16+0x8/0x80
> > > [    1.057278] [<000000000044442c>] sun4u_read_pci_cfg+0x12c/0x1a0
> > > [    1.057593] [<0000000000b7657c>] pci_bus_read_config_word+0x3c/0x80
> > > [    1.057913] [<0000000000b7fa78>] pci_find_capability+0x18/0xa0
> > > [    1.058228] [<0000000000b794b0>] set_pcie_port_type+0x10/0x160
> > > [    1.058543] [<0000000000442a98>] pci_of_scan_bus+0x158/0xb00
> > > [    1.058854] [<00000000010c74a0>] pci_scan_one_pbm+0xd0/0xf8
> > > [    1.059167] [<0000000000446174>] sabre_probe+0x1f4/0x5c0
> > > [    1.059476] [<0000000000c13a48>] platform_probe+0x28/0x80
> > > [    1.059785] [<0000000000c11158>] really_probe+0xb8/0x340
> > > [    1.060098] [<0000000000c11584>] driver_probe_device+0x24/0xe0
> > > [    1.060413] [<0000000000c117ac>] __driver_attach+0x8c/0x160
> > > [    1.060728] [<0000000000c0ef54>] bus_for_each_dev+0x54/0xc0
> > > 
> > > The original call trace also included _raw_spin_lock_irqsave(), and
> > > I don't have CONFIG_PREEMPT_RT enabled in my sparc64 builds to start with.
> > 
> > You don't have to. "CONFIG_PROVE_RAW_LOCK_NESTING" looks if you try to
> > acquire raw_spinlock_t -> spinlock_t. Which it did before I made the
> > patch.
> > The pci_lock is from drivers/pci/access.c and is defined as
> > raw_spinlock_t. And I made pci_poke_lock of the same time. But debug
> > says 3:3 which suggests LD_WAIT_CONFIG. (No patch applied).
> > 
> > > FWIW, I don't understand the value of
> > > 	pr_warn("context-{%d:%d}\n", curr_inner, curr_inner);
> > > Why print curr_inner twice ?
> > 
> > The syntax was once (or is) inner:outer. If you look from the top, you
> > have 4 (mutex_t) followed pci_lock (the raw_spinlock_t) 2. You are at
> > level 2 now and try to acquire spin_lock_t (3).
> > 
> 
> How does that explain the
> 	context-{5:5}

This is the maximum value allowed by the context. Your context is a
plain process context, not an interrupt handler or anything of that
kind.
The culprit is
|  swapper/0/1 is trying to lock:
|  0000000001b694c8 (pci_poke_lock){....}-{3:3}, at: pci_config_read16+0x8/0x80

where pci_poke_lock is classified as a 3. The context allows up to 5,
so based on the context alone the 3 would fly. But since pci_lock is
held as a 2, we get the splat here.

> which is created from the following ?
> 	pr_warn("context-{%d:%d}\n", curr_inner, curr_inner);
> 
> Again, why print curr_inner twice ?

It is the same inner:outer syntax as in print_lock_name(), except that
here we don't have an outer type. The place where inner and outer
differ is RCU: it has a lower wait type than a spinlock_t, yet you can
acquire a spinlock_t within an RCU section and lockdep is fine with it.
It starts yelling once you try the same with a mutex.

> Thanks,
> Guenter

Sebastian
Guenter Roeck Nov. 25, 2024, 7:23 p.m. UTC | #5
On 11/25/24 10:12, Sebastian Andrzej Siewior wrote:
> On 2024-11-25 09:59:09 [-0800], Guenter Roeck wrote:
>> On 11/25/24 09:43, Sebastian Andrzej Siewior wrote:
>>> On 2024-11-25 09:01:33 [-0800], Guenter Roeck wrote:
>>>> Unfortunately it doesn't make a difference.
>>>
>>> stunning. It looks like the exact same error message.
>>>
>>
>> I think it uses
>>
>> #define spin_lock_irqsave(lock, flags)                          \
>> do {                                                            \
>>          raw_spin_lock_irqsave(spinlock_check(lock), flags);     \
>> } while (0)
>>
>> from include/linux/spinlock.h, meaning your patch doesn't really make a difference.
> 
> The difference comes from DEFINE_SPINLOCK vs DEFINE_RAW_SPINLOCK. There
> is the .lock_type init which goes from LD_WAIT_CONFIG to LD_WAIT_SPIN.
> And this is all it matters.
> 

Ah, now I get it. Thanks for the explanation. And it turns out my log was wrong.
I must have taken it from the old image. Sorry for that.

That specific backtrace isn't seen anymore. But there is another one.

[    1.779653] =============================
[    1.779860] [ BUG: Invalid wait context ]
[    1.780139] 6.12.0+ #1 Not tainted
[    1.780394] -----------------------------
[    1.780600] swapper/0/1 is trying to lock:
[    1.780824] 0000000001b68888 (cpu_map_lock){....}-{3:3}, at: map_to_cpu+0x10/0x80
[    1.781393] other info that might help us debug this:
[    1.781624] context-{5:5}
[    1.781838] 3 locks held by swapper/0/1:
[    1.782055]  #0: fffff800042b90f8 (&dev->mutex){....}-{4:4}, at: __driver_attach+0x80/0x160
[    1.782345]  #1: fffff800040f2c18 (&desc->request_mutex){+.+.}-{4:4}, at: __setup_irq+0xa0/0x6e0
[    1.782632]  #2: fffff800040f2ab0 (&irq_desc_lock_class){....}-{2:2}, at: __setup_irq+0xc8/0x6e0
[    1.782912] stack backtrace:
[    1.783172] CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.12.0+ #1
[    1.783498] Call Trace:
[    1.783734] [<00000000004e31d0>] __lock_acquire+0xa50/0x3160
[    1.783971] [<00000000004e63e8>] lock_acquire+0xe8/0x340
[    1.784191] [<00000000010f0dbc>] _raw_spin_lock_irqsave+0x3c/0x80
[    1.784417] [<000000000043ed90>] map_to_cpu+0x10/0x80
[    1.784633] [<000000000042b2b8>] sun4u_irq_enable+0x18/0x80
[    1.784854] [<00000000004fb6b4>] irq_enable+0x34/0xc0
[    1.785069] [<00000000004fb7b8>] __irq_startup+0x78/0xe0
[    1.785287] [<00000000004fb8f0>] irq_startup+0xd0/0x1a0
[    1.785503] [<00000000004f85b4>] __setup_irq+0x5f4/0x6e0
[    1.785726] [<00000000004f8754>] request_threaded_irq+0xb4/0x1a0
[    1.785950] [<0000000000439930>] power_probe+0x70/0xe0
[    1.786165] [<0000000000c13a68>] platform_probe+0x28/0x80
[    1.786382] [<0000000000c11178>] really_probe+0xb8/0x340
[    1.786599] [<0000000000c115a4>] driver_probe_device+0x24/0xe0
[    1.786820] [<0000000000c117cc>] __driver_attach+0x8c/0x160
[    1.787039] [<0000000000c0ef74>] bus_for_each_dev+0x54/0xc0

After replacing cpu_map_lock with a raw spinlock, I get:

[    2.015140] =============================
[    2.015247] [ BUG: Invalid wait context ]
[    2.015419] 6.12.0+ #1 Not tainted
[    2.015564] -----------------------------
[    2.015668] swapper/0/1 is trying to lock:
[    2.015791] fffff80004870610 (&mm->context.lock){....}-{3:3}, at: __schedule+0x410/0x5b0
[    2.016306] other info that might help us debug this:
[    2.016451] context-{5:5}
[    2.016539] 3 locks held by swapper/0/1:
[    2.016652]  #0: 0000000001d11f38 (key_types_sem){++++}-{4:4}, at: __key_create_or_update+0x5c/0x4c0
[    2.016934]  #1: 0000000001d1b850 (asymmetric_key_parsers_sem){++++}-{4:4}, at: asymmetric_key_preparse+0x18/0xa0
[    2.017197]  #2: fffff8001f811a98 (&rq->__lock){-.-.}-{2:2}, at: __schedule+0xdc/0x5b0
[    2.017412] stack backtrace:
[    2.017551] CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.12.0+ #1
[    2.017800] Call Trace:
[    2.017910] [<00000000004e31d0>] __lock_acquire+0xa50/0x3160
[    2.018062] [<00000000004e63e8>] lock_acquire+0xe8/0x340
[    2.018192] [<00000000010f0dbc>] _raw_spin_lock_irqsave+0x3c/0x80
[    2.018341] [<00000000010e5050>] __schedule+0x410/0x5b0
[    2.018469] [<00000000010e5ae4>] schedule+0x44/0x1c0
[    2.018591] [<00000000010f0684>] schedule_timeout+0xa4/0x100
[    2.018730] [<00000000010e668c>] __wait_for_common+0xac/0x1a0
[    2.018869] [<00000000010e6878>] wait_for_completion_state+0x18/0x40
[    2.019022] [<000000000048ad18>] call_usermodehelper_exec+0x138/0x1c0
[    2.019177] [<000000000052eb40>] __request_module+0x160/0x2e0
[    2.019316] [<00000000009ba6dc>] crypto_alg_mod_lookup+0x17c/0x280
[    2.019466] [<00000000009ba990>] crypto_alloc_tfm_node+0x30/0x100
[    2.019614] [<00000000009dcc5c>] public_key_verify_signature+0xbc/0x260
[    2.019772] [<00000000009ded8c>] x509_check_for_self_signed+0xac/0x280
[    2.019928] [<00000000009dddec>] x509_cert_parse+0x14c/0x220
[    2.020065] [<00000000009dea08>] x509_key_preparse+0x8/0x1e0

The problem here is

typedef struct {
         spinlock_t              lock;		<--
         unsigned long           sparc64_ctx_val;
         unsigned long           hugetlb_pte_count;
         unsigned long           thp_pte_count;
         struct tsb_config       tsb_block[MM_NUM_TSBS];
         struct hv_tsb_descr     tsb_descr[MM_NUM_TSBS];
         void                    *vdso;
         bool                    adi;
         tag_storage_desc_t      *tag_store;
         spinlock_t              tag_lock;
} mm_context_t;

Replacing that with a raw spinlock just triggers the next one.

[    2.035384] =============================
[    2.035490] [ BUG: Invalid wait context ]
[    2.035660] 6.12.0+ #3 Not tainted
[    2.035802] -----------------------------
[    2.035906] kworker/u4:3/48 is trying to lock:
[    2.036036] 0000000001b6a790 (ctx_alloc_lock){....}-{3:3}, at: get_new_mmu_context+0x14/0x280
[    2.036558] other info that might help us debug this:
[    2.036697] context-{5:5}
[    2.036784] 4 locks held by kworker/u4:3/48:
[    2.036906]  #0: fffff80004838a70 (&sig->cred_guard_mutex){+.+.}-{4:4}, at: bprm_execve+0xc/0x8e0
[    2.037169]  #1: fffff80004838b08 (&sig->exec_update_lock){+.+.}-{4:4}, at: begin_new_exec+0x344/0xbe0
[    2.037411]  #2: fffff800047fc940 (&p->alloc_lock){+.+.}-{3:3}, at: begin_new_exec+0x3a0/0xbe0
[    2.037639]  #3: fffff80004848610 (&mm->context.lock){....}-{2:2}, at: begin_new_exec+0x41c/0xbe0

Fixing that finally gives me a clean run. Nevertheless, that makes me wonder:
Should I just disable CONFIG_PROVE_RAW_LOCK_NESTING for sparc runtime tests ?

Thanks,
Guenter
Waiman Long Nov. 25, 2024, 7:33 p.m. UTC | #6
On 11/25/24 2:23 PM, Guenter Roeck wrote:
> On 11/25/24 10:12, Sebastian Andrzej Siewior wrote:
>> On 2024-11-25 09:59:09 [-0800], Guenter Roeck wrote:
>>> On 11/25/24 09:43, Sebastian Andrzej Siewior wrote:
>>>> On 2024-11-25 09:01:33 [-0800], Guenter Roeck wrote:
>>>>> Unfortunately it doesn't make a difference.
>>>>
>>>> stunning. It looks like the exact same error message.
>>>>
>>>
>>> I think it uses
>>>
>>> #define spin_lock_irqsave(lock, flags)                          \
>>> do {                                                            \
>>>          raw_spin_lock_irqsave(spinlock_check(lock), flags);     \
>>> } while (0)
>>>
>>> from include/linux/spinlock.h, meaning your patch doesn't really 
>>> make a difference.
>>
>> The difference comes from DEFINE_SPINLOCK vs DEFINE_RAW_SPINLOCK. There
>> is the .lock_type init which goes from LD_WAIT_CONFIG to LD_WAIT_SPIN.
>> And this is all it matters.
>>
>
> Ah, now I get it. Thanks for the explanation. And it turns out my log 
> was wrong.
> I must have taken it from the old image. Sorry for that.
>
> That specific backtrace isn't seen anymore. But there is another one.
>
> [    1.779653] =============================
> [    1.779860] [ BUG: Invalid wait context ]
> [    1.780139] 6.12.0+ #1 Not tainted
> [    1.780394] -----------------------------
> [    1.780600] swapper/0/1 is trying to lock:
> [    1.780824] 0000000001b68888 (cpu_map_lock){....}-{3:3}, at: 
> map_to_cpu+0x10/0x80
> [    1.781393] other info that might help us debug this:
> [    1.781624] context-{5:5}
> [    1.781838] 3 locks held by swapper/0/1:
> [    1.782055]  #0: fffff800042b90f8 (&dev->mutex){....}-{4:4}, at: 
> __driver_attach+0x80/0x160
> [    1.782345]  #1: fffff800040f2c18 
> (&desc->request_mutex){+.+.}-{4:4}, at: __setup_irq+0xa0/0x6e0
> [    1.782632]  #2: fffff800040f2ab0 
> (&irq_desc_lock_class){....}-{2:2}, at: __setup_irq+0xc8/0x6e0
> [    1.782912] stack backtrace:
> [    1.783172] CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 
> 6.12.0+ #1
> [    1.783498] Call Trace:
> [    1.783734] [<00000000004e31d0>] __lock_acquire+0xa50/0x3160
> [    1.783971] [<00000000004e63e8>] lock_acquire+0xe8/0x340
> [    1.784191] [<00000000010f0dbc>] _raw_spin_lock_irqsave+0x3c/0x80
> [    1.784417] [<000000000043ed90>] map_to_cpu+0x10/0x80
> [    1.784633] [<000000000042b2b8>] sun4u_irq_enable+0x18/0x80
> [    1.784854] [<00000000004fb6b4>] irq_enable+0x34/0xc0
> [    1.785069] [<00000000004fb7b8>] __irq_startup+0x78/0xe0
> [    1.785287] [<00000000004fb8f0>] irq_startup+0xd0/0x1a0
> [    1.785503] [<00000000004f85b4>] __setup_irq+0x5f4/0x6e0
> [    1.785726] [<00000000004f8754>] request_threaded_irq+0xb4/0x1a0
> [    1.785950] [<0000000000439930>] power_probe+0x70/0xe0
> [    1.786165] [<0000000000c13a68>] platform_probe+0x28/0x80
> [    1.786382] [<0000000000c11178>] really_probe+0xb8/0x340
> [    1.786599] [<0000000000c115a4>] driver_probe_device+0x24/0xe0
> [    1.786820] [<0000000000c117cc>] __driver_attach+0x8c/0x160
> [    1.787039] [<0000000000c0ef74>] bus_for_each_dev+0x54/0xc0
>
> After replacing cpu_map_lock with a raw spinlock, I get:
>
> [    2.015140] =============================
> [    2.015247] [ BUG: Invalid wait context ]
> [    2.015419] 6.12.0+ #1 Not tainted
> [    2.015564] -----------------------------
> [    2.015668] swapper/0/1 is trying to lock:
> [    2.015791] fffff80004870610 (&mm->context.lock){....}-{3:3}, at: 
> __schedule+0x410/0x5b0
> [    2.016306] other info that might help us debug this:
> [    2.016451] context-{5:5}
> [    2.016539] 3 locks held by swapper/0/1:
> [    2.016652]  #0: 0000000001d11f38 (key_types_sem){++++}-{4:4}, at: 
> __key_create_or_update+0x5c/0x4c0
> [    2.016934]  #1: 0000000001d1b850 
> (asymmetric_key_parsers_sem){++++}-{4:4}, at: 
> asymmetric_key_preparse+0x18/0xa0
> [    2.017197]  #2: fffff8001f811a98 (&rq->__lock){-.-.}-{2:2}, at: 
> __schedule+0xdc/0x5b0
> [    2.017412] stack backtrace:
> [    2.017551] CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 
> 6.12.0+ #1
> [    2.017800] Call Trace:
> [    2.017910] [<00000000004e31d0>] __lock_acquire+0xa50/0x3160
> [    2.018062] [<00000000004e63e8>] lock_acquire+0xe8/0x340
> [    2.018192] [<00000000010f0dbc>] _raw_spin_lock_irqsave+0x3c/0x80
> [    2.018341] [<00000000010e5050>] __schedule+0x410/0x5b0
> [    2.018469] [<00000000010e5ae4>] schedule+0x44/0x1c0
> [    2.018591] [<00000000010f0684>] schedule_timeout+0xa4/0x100
> [    2.018730] [<00000000010e668c>] __wait_for_common+0xac/0x1a0
> [    2.018869] [<00000000010e6878>] wait_for_completion_state+0x18/0x40
> [    2.019022] [<000000000048ad18>] call_usermodehelper_exec+0x138/0x1c0
> [    2.019177] [<000000000052eb40>] __request_module+0x160/0x2e0
> [    2.019316] [<00000000009ba6dc>] crypto_alg_mod_lookup+0x17c/0x280
> [    2.019466] [<00000000009ba990>] crypto_alloc_tfm_node+0x30/0x100
> [    2.019614] [<00000000009dcc5c>] 
> public_key_verify_signature+0xbc/0x260
> [    2.019772] [<00000000009ded8c>] x509_check_for_self_signed+0xac/0x280
> [    2.019928] [<00000000009dddec>] x509_cert_parse+0x14c/0x220
> [    2.020065] [<00000000009dea08>] x509_key_preparse+0x8/0x1e0
>
> The problem here is
>
> typedef struct {
>         spinlock_t              lock;        <--
>         unsigned long           sparc64_ctx_val;
>         unsigned long           hugetlb_pte_count;
>         unsigned long           thp_pte_count;
>         struct tsb_config       tsb_block[MM_NUM_TSBS];
>         struct hv_tsb_descr     tsb_descr[MM_NUM_TSBS];
>         void                    *vdso;
>         bool                    adi;
>         tag_storage_desc_t      *tag_store;
>         spinlock_t              tag_lock;
> } mm_context_t;
>
> Replacing that with a raw spinlock just triggers the next one.
>
> [    2.035384] =============================
> [    2.035490] [ BUG: Invalid wait context ]
> [    2.035660] 6.12.0+ #3 Not tainted
> [    2.035802] -----------------------------
> [    2.035906] kworker/u4:3/48 is trying to lock:
> [    2.036036] 0000000001b6a790 (ctx_alloc_lock){....}-{3:3}, at: 
> get_new_mmu_context+0x14/0x280
> [    2.036558] other info that might help us debug this:
> [    2.036697] context-{5:5}
> [    2.036784] 4 locks held by kworker/u4:3/48:
> [    2.036906]  #0: fffff80004838a70 
> (&sig->cred_guard_mutex){+.+.}-{4:4}, at: bprm_execve+0xc/0x8e0
> [    2.037169]  #1: fffff80004838b08 
> (&sig->exec_update_lock){+.+.}-{4:4}, at: begin_new_exec+0x344/0xbe0
> [    2.037411]  #2: fffff800047fc940 (&p->alloc_lock){+.+.}-{3:3}, at: 
> begin_new_exec+0x3a0/0xbe0
> [    2.037639]  #3: fffff80004848610 (&mm->context.lock){....}-{2:2}, 
> at: begin_new_exec+0x41c/0xbe0
>
> Fixing that finally gives me a clean run. Nevertheless, that makes me 
> wonder:
> Should I just disable CONFIG_PROVE_RAW_LOCK_NESTING for sparc runtime 
> tests ?

If no one is ever trying to enable PREEMPT_RT on SPARC, I suppose you 
could disable CONFIG_PROVE_RAW_LOCK_NESTING to avoid the trouble.

Cheers,
Longman
Guenter Roeck Nov. 25, 2024, 8:06 p.m. UTC | #7
On 11/25/24 11:33, Waiman Long wrote:
[ ... ]
>> Fixing that finally gives me a clean run. Nevertheless, that makes me wonder:
>> Should I just disable CONFIG_PROVE_RAW_LOCK_NESTING for sparc runtime tests ?
> 
> If no one is tryng to ever enable PREEMPT_RT on SPARC, I suppose you could disable CONFIG_PROVE_RAW_LOCK_NESTING to avoid the trouble.
> 

SGTM. I'll do that unless someone gives me a good reason to keep it enabled.

Thanks,
Guenter
Guenter Roeck Nov. 25, 2024, 8:23 p.m. UTC | #8
On 11/25/24 12:06, Guenter Roeck wrote:
> On 11/25/24 11:33, Waiman Long wrote:
> [ ... ]
>>> Fixing that finally gives me a clean run. Nevertheless, that makes me wonder:
>>> Should I just disable CONFIG_PROVE_RAW_LOCK_NESTING for sparc runtime tests ?
>>
>> If no one is tryng to ever enable PREEMPT_RT on SPARC, I suppose you could disable CONFIG_PROVE_RAW_LOCK_NESTING to avoid the trouble.
>>
> 
> SGTM. I'll do that unless someone gives me a good reason to keep it enabled.
> 

Actually it cannot be disabled with a configuration flag; it is
automatically enabled. I'll have to disable PROVE_LOCKING to disable it.

config PROVE_RAW_LOCK_NESTING
         bool					<---- no longer user configurable
         depends on PROVE_LOCKING
         default y
         help
          Enable the raw_spinlock vs. spinlock nesting checks which ensure
          that the lock nesting rules for PREEMPT_RT enabled kernels are
          not violated.

I don't really like that, and I don't understand the logic behind it,
but it is what it is.

FWIW, the description of commit 560af5dc839 is misleading. It says "Enable
PROVE_RAW_LOCK_NESTING _by default_" (emphasis mine). That is not what the
commit does. It force-enables PROVE_RAW_LOCK_NESTING if PROVE_LOCKING is
enabled. It is all or nothing.

Guenter
Waiman Long Nov. 25, 2024, 8:54 p.m. UTC | #9
On 11/25/24 3:23 PM, Guenter Roeck wrote:
> On 11/25/24 12:06, Guenter Roeck wrote:
>> On 11/25/24 11:33, Waiman Long wrote:
>> [ ... ]
>>>> Fixing that finally gives me a clean run. Nevertheless, that makes 
>>>> me wonder:
>>>> Should I just disable CONFIG_PROVE_RAW_LOCK_NESTING for sparc 
>>>> runtime tests ?
>>>
>>> If no one is tryng to ever enable PREEMPT_RT on SPARC, I suppose you 
>>> could disable CONFIG_PROVE_RAW_LOCK_NESTING to avoid the trouble.
>>>
>>
>> SGTM. I'll do that unless someone gives me a good reason to keep it 
>> enabled.
>>
>
> Actually it can not be disabled with a configuration flag. It is
> automatically enabled. I'll have to disable PROVE_LOCKING to disable it.
>
> config PROVE_RAW_LOCK_NESTING
>         bool                    <---- no longer user configurable
>         depends on PROVE_LOCKING
>         default y
>         help
>          Enable the raw_spinlock vs. spinlock nesting checks which ensure
>          that the lock nesting rules for PREEMPT_RT enabled kernels are
>          not violated.
>
> I don't really like that, and I don't understand the logic behind it,
> but it is what it is.
>
> FWIW, the description of commit 560af5dc839 is misleading. It says 
> "Enable
> PROVE_RAW_LOCK_NESTING _by default_" (emphasis mine). That is not what 
> the
> commit does. It force-enables PROVE_RAW_LOCK_NESTING if PROVE_LOCKING is
> enabled. It is all or nothing.
>
I think we can relax it by

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 5d9eca035d47..bfdbd3fa2d29 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1399,7 +1399,7 @@ config PROVE_LOCKING
  config PROVE_RAW_LOCK_NESTING
         bool
         depends on PROVE_LOCKING
-       default y
+       default y if ARCH_SUPPORTS_RT
         help
          Enable the raw_spinlock vs. spinlock nesting checks which ensure
          that the lock nesting rules for PREEMPT_RT enabled kernels are

Sebastian, what do you think?

Cheers,
Longman

> Guenter
>
Guenter Roeck Nov. 25, 2024, 9:25 p.m. UTC | #10
On 11/25/24 12:54, Waiman Long wrote:
> 
> On 11/25/24 3:23 PM, Guenter Roeck wrote:
>> On 11/25/24 12:06, Guenter Roeck wrote:
>>> On 11/25/24 11:33, Waiman Long wrote:
>>> [ ... ]
>>>>> Fixing that finally gives me a clean run. Nevertheless, that makes me wonder:
>>>>> Should I just disable CONFIG_PROVE_RAW_LOCK_NESTING for sparc runtime tests ?
>>>>
>>>> If no one is trying to ever enable PREEMPT_RT on SPARC, I suppose you could disable CONFIG_PROVE_RAW_LOCK_NESTING to avoid the trouble.
>>>>
>>>
>>> SGTM. I'll do that unless someone gives me a good reason to keep it enabled.
>>>
>>
>> Actually it can not be disabled with a configuration flag. It is
>> automatically enabled. I'll have to disable PROVE_LOCKING to disable it.
>>
>> config PROVE_RAW_LOCK_NESTING
>>         bool                    <---- no longer user configurable
>>         depends on PROVE_LOCKING
>>         default y
>>         help
>>          Enable the raw_spinlock vs. spinlock nesting checks which ensure
>>          that the lock nesting rules for PREEMPT_RT enabled kernels are
>>          not violated.
>>
>> I don't really like that, and I don't understand the logic behind it,
>> but it is what it is.
>>
>> FWIW, the description of commit 560af5dc839 is misleading. It says "Enable
>> PROVE_RAW_LOCK_NESTING _by default_" (emphasis mine). That is not what the
>> commit does. It force-enables PROVE_RAW_LOCK_NESTING if PROVE_LOCKING is
>> enabled. It is all or nothing.
>>
> I think we can relax it by
> 
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index 5d9eca035d47..bfdbd3fa2d29 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -1399,7 +1399,7 @@ config PROVE_LOCKING
>   config PROVE_RAW_LOCK_NESTING
>          bool
>          depends on PROVE_LOCKING
> -       default y
> +       default y if ARCH_SUPPORTS_RT
>          help
>           Enable the raw_spinlock vs. spinlock nesting checks which ensure
>           that the lock nesting rules for PREEMPT_RT enabled kernels are
> 
> Sebastian, what do you think?
> 

	depends on PROVE_LOCKING && ARCH_SUPPORTS_RT

seems to make more sense to me.

Guenter
Waiman Long Nov. 25, 2024, 9:29 p.m. UTC | #11
On 11/25/24 4:25 PM, Guenter Roeck wrote:
> On 11/25/24 12:54, Waiman Long wrote:
>>
>> On 11/25/24 3:23 PM, Guenter Roeck wrote:
>>> On 11/25/24 12:06, Guenter Roeck wrote:
>>>> On 11/25/24 11:33, Waiman Long wrote:
>>>> [ ... ]
>>>>>> Fixing that finally gives me a clean run. Nevertheless, that 
>>>>>> makes me wonder:
>>>>>> Should I just disable CONFIG_PROVE_RAW_LOCK_NESTING for sparc 
>>>>>> runtime tests ?
>>>>>
>>>>> If no one is trying to ever enable PREEMPT_RT on SPARC, I suppose 
>>>>> you could disable CONFIG_PROVE_RAW_LOCK_NESTING to avoid the trouble.
>>>>>
>>>>
>>>> SGTM. I'll do that unless someone gives me a good reason to keep it 
>>>> enabled.
>>>>
>>>
>>> Actually it can not be disabled with a configuration flag. It is
>>> automatically enabled. I'll have to disable PROVE_LOCKING to disable 
>>> it.
>>>
>>> config PROVE_RAW_LOCK_NESTING
>>>         bool                    <---- no longer user configurable
>>>         depends on PROVE_LOCKING
>>>         default y
>>>         help
>>>          Enable the raw_spinlock vs. spinlock nesting checks which 
>>> ensure
>>>          that the lock nesting rules for PREEMPT_RT enabled kernels are
>>>          not violated.
>>>
>>> I don't really like that, and I don't understand the logic behind it,
>>> but it is what it is.
>>>
>>> FWIW, the description of commit 560af5dc839 is misleading. It says 
>>> "Enable
>>> PROVE_RAW_LOCK_NESTING _by default_" (emphasis mine). That is not 
>>> what the
>>> commit does. It force-enables PROVE_RAW_LOCK_NESTING if 
>>> PROVE_LOCKING is
>>> enabled. It is all or nothing.
>>>
>> I think we can relax it by
>>
>> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
>> index 5d9eca035d47..bfdbd3fa2d29 100644
>> --- a/lib/Kconfig.debug
>> +++ b/lib/Kconfig.debug
>> @@ -1399,7 +1399,7 @@ config PROVE_LOCKING
>>   config PROVE_RAW_LOCK_NESTING
>>          bool
>>          depends on PROVE_LOCKING
>> -       default y
>> +       default y if ARCH_SUPPORTS_RT
>>          help
>>           Enable the raw_spinlock vs. spinlock nesting checks which 
>> ensure
>>           that the lock nesting rules for PREEMPT_RT enabled kernels are
>>
>> Sebastian, what do you think?
>>
>
>     depends on PROVE_LOCKING && ARCH_SUPPORTS_RT
>
> seems to make more sense to me.

That will work too, but it will mean that arches without 
ARCH_SUPPORTS_RT will not be able to enable PROVE_RAW_LOCK_NESTING even 
if people want to try it out.

Cheers,
Longman
Guenter Roeck Nov. 25, 2024, 9:54 p.m. UTC | #12
On 11/25/24 13:29, Waiman Long wrote:
> 
> On 11/25/24 4:25 PM, Guenter Roeck wrote:
>> On 11/25/24 12:54, Waiman Long wrote:
>>>
>>> On 11/25/24 3:23 PM, Guenter Roeck wrote:
>>>> On 11/25/24 12:06, Guenter Roeck wrote:
>>>>> On 11/25/24 11:33, Waiman Long wrote:
>>>>> [ ... ]
>>>>>>> Fixing that finally gives me a clean run. Nevertheless, that makes me wonder:
>>>>>>> Should I just disable CONFIG_PROVE_RAW_LOCK_NESTING for sparc runtime tests ?
>>>>>>
>>>>>> If no one is trying to ever enable PREEMPT_RT on SPARC, I suppose you could disable CONFIG_PROVE_RAW_LOCK_NESTING to avoid the trouble.
>>>>>>
>>>>>
>>>>> SGTM. I'll do that unless someone gives me a good reason to keep it enabled.
>>>>>
>>>>
>>>> Actually it can not be disabled with a configuration flag. It is
>>>> automatically enabled. I'll have to disable PROVE_LOCKING to disable it.
>>>>
>>>> config PROVE_RAW_LOCK_NESTING
>>>>         bool                    <---- no longer user configurable
>>>>         depends on PROVE_LOCKING
>>>>         default y
>>>>         help
>>>>          Enable the raw_spinlock vs. spinlock nesting checks which ensure
>>>>          that the lock nesting rules for PREEMPT_RT enabled kernels are
>>>>          not violated.
>>>>
>>>> I don't really like that, and I don't understand the logic behind it,
>>>> but it is what it is.
>>>>
>>>> FWIW, the description of commit 560af5dc839 is misleading. It says "Enable
>>>> PROVE_RAW_LOCK_NESTING _by default_" (emphasis mine). That is not what the
>>>> commit does. It force-enables PROVE_RAW_LOCK_NESTING if PROVE_LOCKING is
>>>> enabled. It is all or nothing.
>>>>
>>> I think we can relax it by
>>>
>>> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
>>> index 5d9eca035d47..bfdbd3fa2d29 100644
>>> --- a/lib/Kconfig.debug
>>> +++ b/lib/Kconfig.debug
>>> @@ -1399,7 +1399,7 @@ config PROVE_LOCKING
>>>   config PROVE_RAW_LOCK_NESTING
>>>          bool
>>>          depends on PROVE_LOCKING
>>> -       default y
>>> +       default y if ARCH_SUPPORTS_RT
>>>          help
>>>           Enable the raw_spinlock vs. spinlock nesting checks which ensure
>>>           that the lock nesting rules for PREEMPT_RT enabled kernels are
>>>
>>> Sebastian, what do you think?
>>>
>>
>>     depends on PROVE_LOCKING && ARCH_SUPPORTS_RT
>>
>> seems to make more sense to me.
> 
> That will work too, but that will enforce that arches with no ARCH_SUPPORTS_RT will not be able to enable PROVE_RAW_LOCK_NESTING even if people want to try it out.
> 

No architecture will be able to enable anything because "bool" has no
string associated with it. As mentioned before, it is all or nothing.
Otherwise I could just configure "CONFIG_PROVE_RAW_LOCK_NESTING=n"
for sparc and be done.

Guenter
Waiman Long Nov. 25, 2024, 10:33 p.m. UTC | #13
On 11/25/24 4:54 PM, Guenter Roeck wrote:
> On 11/25/24 13:29, Waiman Long wrote:
>>
>> On 11/25/24 4:25 PM, Guenter Roeck wrote:
>>> On 11/25/24 12:54, Waiman Long wrote:
>>>>
>>>> On 11/25/24 3:23 PM, Guenter Roeck wrote:
>>>>> On 11/25/24 12:06, Guenter Roeck wrote:
>>>>>> On 11/25/24 11:33, Waiman Long wrote:
>>>>>> [ ... ]
>>>>>>>> Fixing that finally gives me a clean run. Nevertheless, that 
>>>>>>>> makes me wonder:
>>>>>>>> Should I just disable CONFIG_PROVE_RAW_LOCK_NESTING for sparc 
>>>>>>>> runtime tests ?
>>>>>>>
>>>>>>> If no one is trying to ever enable PREEMPT_RT on SPARC, I suppose 
>>>>>>> you could disable CONFIG_PROVE_RAW_LOCK_NESTING to avoid the 
>>>>>>> trouble.
>>>>>>>
>>>>>>
>>>>>> SGTM. I'll do that unless someone gives me a good reason to keep 
>>>>>> it enabled.
>>>>>>
>>>>>
>>>>> Actually it can not be disabled with a configuration flag. It is
>>>>> automatically enabled. I'll have to disable PROVE_LOCKING to 
>>>>> disable it.
>>>>>
>>>>> config PROVE_RAW_LOCK_NESTING
>>>>>         bool                    <---- no longer user configurable
>>>>>         depends on PROVE_LOCKING
>>>>>         default y
>>>>>         help
>>>>>          Enable the raw_spinlock vs. spinlock nesting checks which 
>>>>> ensure
>>>>>          that the lock nesting rules for PREEMPT_RT enabled 
>>>>> kernels are
>>>>>          not violated.
>>>>>
>>>>> I don't really like that, and I don't understand the logic behind it,
>>>>> but it is what it is.
>>>>>
>>>>> FWIW, the description of commit 560af5dc839 is misleading. It says 
>>>>> "Enable
>>>>> PROVE_RAW_LOCK_NESTING _by default_" (emphasis mine). That is not 
>>>>> what the
>>>>> commit does. It force-enables PROVE_RAW_LOCK_NESTING if 
>>>>> PROVE_LOCKING is
>>>>> enabled. It is all or nothing.
>>>>>
>>>> I think we can relax it by
>>>>
>>>> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
>>>> index 5d9eca035d47..bfdbd3fa2d29 100644
>>>> --- a/lib/Kconfig.debug
>>>> +++ b/lib/Kconfig.debug
>>>> @@ -1399,7 +1399,7 @@ config PROVE_LOCKING
>>>>   config PROVE_RAW_LOCK_NESTING
>>>>          bool
>>>>          depends on PROVE_LOCKING
>>>> -       default y
>>>> +       default y if ARCH_SUPPORTS_RT
>>>>          help
>>>>           Enable the raw_spinlock vs. spinlock nesting checks which 
>>>> ensure
>>>>           that the lock nesting rules for PREEMPT_RT enabled 
>>>> kernels are
>>>>
>>>> Sebastian, what do you think?
>>>>
>>>
>>>     depends on PROVE_LOCKING && ARCH_SUPPORTS_RT
>>>
>>> seems to make more sense to me.
>>
>> That will work too, but that will enforce that arches with no 
>> ARCH_SUPPORTS_RT will not be able to enable PROVE_RAW_LOCK_NESTING 
>> even if people want to try it out.
>>
>
> No architecture will be able to enable anything because "bool" has no
> string associated with it. As mentioned before, it is all or nothing.
> Otherwise I could just configure "CONFIG_PROVE_RAW_LOCK_NESTING=n"
> for sparc and be done.

Yes, you are right.

Cheers,
Longman
Sebastian Andrzej Siewior Nov. 26, 2024, 11:20 a.m. UTC | #14
On 2024-11-25 15:54:48 [-0500], Waiman Long wrote:
> > FWIW, the description of commit 560af5dc839 is misleading. It says
> > "Enable
> > PROVE_RAW_LOCK_NESTING _by default_" (emphasis mine). That is not what
> > the
> > commit does. It force-enables PROVE_RAW_LOCK_NESTING if PROVE_LOCKING is
> > enabled. It is all or nothing.
> > 
> I think we can relax it by
> 
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index 5d9eca035d47..bfdbd3fa2d29 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -1399,7 +1399,7 @@ config PROVE_LOCKING
>  config PROVE_RAW_LOCK_NESTING
>         bool
>         depends on PROVE_LOCKING
> -       default y
> +       default y if ARCH_SUPPORTS_RT
>         help
>          Enable the raw_spinlock vs. spinlock nesting checks which ensure
>          that the lock nesting rules for PREEMPT_RT enabled kernels are
> 
> Sebastian, what do you think?

All the changes Guenter proposed make sense and were limited to sparc.
So we could apply that. Limiting the option to the RT architectures
would silence the warnings. If there is no interest in getting RT on
sparc there is probably no interest in getting the lock ordering
straight.
I remember PeterZ did not like the option in the beginning but there was
no way around it especially since printk triggered it on boot.
I'm fine with both solutions (fixing sparc or limiting
PROVE_RAW_LOCK_NESTING). I leave the final judgment to the locking
people.

> Cheers,
> Longman

Sebastian
Waiman Long Nov. 26, 2024, 4:59 p.m. UTC | #15
On 11/26/24 6:20 AM, Sebastian Andrzej Siewior wrote:
> On 2024-11-25 15:54:48 [-0500], Waiman Long wrote:
>>> FWIW, the description of commit 560af5dc839 is misleading. It says
>>> "Enable
>>> PROVE_RAW_LOCK_NESTING _by default_" (emphasis mine). That is not what
>>> the
>>> commit does. It force-enables PROVE_RAW_LOCK_NESTING if PROVE_LOCKING is
>>> enabled. It is all or nothing.
>>>
>> I think we can relax it by
>>
>> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
>> index 5d9eca035d47..bfdbd3fa2d29 100644
>> --- a/lib/Kconfig.debug
>> +++ b/lib/Kconfig.debug
>> @@ -1399,7 +1399,7 @@ config PROVE_LOCKING
>>   config PROVE_RAW_LOCK_NESTING
>>          bool
>>          depends on PROVE_LOCKING
>> -       default y
>> +       default y if ARCH_SUPPORTS_RT
>>          help
>>           Enable the raw_spinlock vs. spinlock nesting checks which ensure
>>           that the lock nesting rules for PREEMPT_RT enabled kernels are
>>
>> Sebastian, what do you think?
> All the changes Guenter proposed make sense and were limited to sparc.
> So we could apply that. Limiting the option to the RT architectures
> would silence the warnings. If there is no interest in getting RT on
> sparc there is probably no interest in getting the lock ordering
> straight.
> I remember PeterZ did not like the option in the beginning but there was
> no way around it especially since printk triggered it on boot.
> I'm fine with both solutions (fixing sparc or limiting
> PROVE_RAW_LOCK_NESTING). I leave the final judgment to the locking
> people.

Right now, ARCH_SUPPORTS_RT is defined for most of the major arches 
where most of the testing is being done. So even if we limit this to 
just those arches, we will not lose much testing anyway. This does have 
the advantage of not forcing other legacy arches to do extra work 
with no real gain from their point of view.

Cheers,
Longman
Andreas Larsson Nov. 27, 2024, 3:39 p.m. UTC | #16
On 2024-11-26 17:59, Waiman Long wrote:
> 
> On 11/26/24 6:20 AM, Sebastian Andrzej Siewior wrote:
>> On 2024-11-25 15:54:48 [-0500], Waiman Long wrote:
>>>> FWIW, the description of commit 560af5dc839 is misleading. It says
>>>> "Enable
>>>> PROVE_RAW_LOCK_NESTING _by default_" (emphasis mine). That is not what
>>>> the
>>>> commit does. It force-enables PROVE_RAW_LOCK_NESTING if PROVE_LOCKING is
>>>> enabled. It is all or nothing.
>>>>
>>> I think we can relax it by
>>>
>>> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
>>> index 5d9eca035d47..bfdbd3fa2d29 100644
>>> --- a/lib/Kconfig.debug
>>> +++ b/lib/Kconfig.debug
>>> @@ -1399,7 +1399,7 @@ config PROVE_LOCKING
>>>   config PROVE_RAW_LOCK_NESTING
>>>          bool
>>>          depends on PROVE_LOCKING
>>> -       default y
>>> +       default y if ARCH_SUPPORTS_RT
>>>          help
>>>           Enable the raw_spinlock vs. spinlock nesting checks which ensure
>>>           that the lock nesting rules for PREEMPT_RT enabled kernels are
>>>
>>> Sebastian, what do you think?
>> All the changes Guenter proposed make sense and were limited to sparc.
>> So we could apply that. Limiting the option to the RT architectures
>> would silence the warnings. If there is no interest in getting RT on
>> sparc there is probably no interest in getting the lock ordering
>> straight.
>> I remember PeterZ did not like the option in the beginning but there was
>> no way around it especially since printk triggered it on boot.
>> I'm fine with both solutions (fixing sparc or limiting
>> PROVE_RAW_LOCK_NESTING). I leave the final judgment to the locking
>> people.
> 
> Right now, ARCH_SUPPORTS_RT is defined for most of the major arches
> where most of the testings are being done. So even if we limit this to
> just those arches, we will not lose much testing anyway. This does have
> the advantage of not forcing other legacy arches from doing extra works
> with no real gain from their point of view.

Even though this is for sparc64, there is work being done looking into
enabling RT for sparc32. If the amount of fixes needed to keep
PROVE_RAW_LOCK_NESTING enabled is quite small at the moment I'd rather
see it enabled for sparc rather than risking it becoming worse in the
future.

I don't know what the situation is for other architectures that do not
support RT.

Cheers,
Andreas
Guenter Roeck Nov. 27, 2024, 4:02 p.m. UTC | #17
On 11/27/24 07:39, Andreas Larsson wrote:
> On 2024-11-26 17:59, Waiman Long wrote:
>>
>> On 11/26/24 6:20 AM, Sebastian Andrzej Siewior wrote:
>>> On 2024-11-25 15:54:48 [-0500], Waiman Long wrote:
>>>>> FWIW, the description of commit 560af5dc839 is misleading. It says
>>>>> "Enable
>>>>> PROVE_RAW_LOCK_NESTING _by default_" (emphasis mine). That is not what
>>>>> the
>>>>> commit does. It force-enables PROVE_RAW_LOCK_NESTING if PROVE_LOCKING is
>>>>> enabled. It is all or nothing.
>>>>>
>>>> I think we can relax it by
>>>>
>>>> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
>>>> index 5d9eca035d47..bfdbd3fa2d29 100644
>>>> --- a/lib/Kconfig.debug
>>>> +++ b/lib/Kconfig.debug
>>>> @@ -1399,7 +1399,7 @@ config PROVE_LOCKING
>>>>    config PROVE_RAW_LOCK_NESTING
>>>>           bool
>>>>           depends on PROVE_LOCKING
>>>> -       default y
>>>> +       default y if ARCH_SUPPORTS_RT
>>>>           help
>>>>            Enable the raw_spinlock vs. spinlock nesting checks which ensure
>>>>            that the lock nesting rules for PREEMPT_RT enabled kernels are
>>>>
>>>> Sebastian, what do you think?
>>> All the changes Guenter proposed make sense and were limited to sparc.
>>> So we could apply that. Limiting the option to the RT architectures
>>> would silence the warnings. If there is no interest in getting RT on
>>> sparc there is probably no interest in getting the lock ordering
>>> straight.
>>> I remember PeterZ did not like the option in the beginning but there was
>>> no way around it especially since printk triggered it on boot.
>>> I'm fine with both solutions (fixing sparc or limiting
>>> PROVE_RAW_LOCK_NESTING). I leave the final judgment to the locking
>>> people.
>>
>> Right now, ARCH_SUPPORTS_RT is defined for most of the major arches
>> where most of the testings are being done. So even if we limit this to
>> just those arches, we will not lose much testing anyway. This does have
>> the advantage of not forcing other legacy arches from doing extra works
>> with no real gain from their point of view.
> 
> Even though this is for sparc64, there is work being done looking into
> enabling RT for sparc32. If the amount of fixes needed to keep
> PROVE_RAW_LOCK_NESTING enabled is quite small at the moment I'd rather
> see it enabled for sparc rather than risking it becoming worse in the
> future.
> 
> I don't know what the situation is for other architectures that does not
> support RT.
> 

For my part I still don't understand why PROVE_RAW_LOCK_NESTING is no longer
a configurable option, or in other words why it is mandated even for architectures
not supporting RT. To me this means that I'll either have to disable PROVE_LOCKING
for sparc or live with endless warning backtraces. The latter obscures real
problems, so it is a no-go.

So, if people want to keep mandating PROVE_RAW_LOCK_NESTING together with
PROVE_LOCKING for all architectures, I'll disable PROVE_LOCKING for sparc
in my testing. NP, just let me know. I'll then do the same for other
architectures not supporting RT if I hit the same problem there.

Guenter
Sebastian Andrzej Siewior Nov. 27, 2024, 4:53 p.m. UTC | #18
On 2024-11-27 08:02:50 [-0800], Guenter Roeck wrote:
> On 11/27/24 07:39, Andreas Larsson wrote:
> > Even though this is for sparc64, there is work being done looking into
> > enabling RT for sparc32. If the amount of fixes needed to keep
> > PROVE_RAW_LOCK_NESTING enabled is quite small at the moment I'd rather
> > see it enabled for sparc rather than risking it becoming worse in the
> > future.

Okay. So you seem to be in favour of fixing the sparc64 splats Guenter
reported?

> > I don't know what the situation is for other architectures that does not
> > support RT.
> > 
> 
> For my part I still don't understand why PROVE_RAW_LOCK_NESTING is no longer
> a configurable option, or in other words why it is mandated even for architectures
> not supporting RT. To me this means that I'll either have to disable PROVE_LOCKING
> for sparc or live with endless warning backtraces. The latter obscures real
> problems, so it is a no-go.

It is documented in Documentation/locking/locktypes.rst how the locks
should nest. It is just that nobody enabled it on sparc64 and tested. The
option was meant to be temporary until the big roadblocks are cleared.

> So, if people want to keep mandating PROVE_RAW_LOCK_NESTING together with
> PROVE_LOCKING for all architectures, I'll disable PROVE_LOCKING for sparc
> in my testing. NP, just let me know. I'll then do the same for other
> architectures not supporting RT if I hit the same problem there.

Waiman posted a patch to disable it on architectures that don't support
PREEMPT_RT. You could also post the patches you discussed. Andreas does
not seem to be against it (but then I don't know if he is a 32 or 64bit
guy). I have not heard from other architectures so far.

> Guenter

Sebastian
Guenter Roeck Nov. 27, 2024, 5:44 p.m. UTC | #19
On 11/27/24 08:53, Sebastian Andrzej Siewior wrote:
> On 2024-11-27 08:02:50 [-0800], Guenter Roeck wrote:
>> On 11/27/24 07:39, Andreas Larsson wrote:
>>> Even though this is for sparc64, there is work being done looking into
>>> enabling RT for sparc32. If the amount of fixes needed to keep
>>> PROVE_RAW_LOCK_NESTING enabled is quite small at the moment I'd rather
>>> see it enabled for sparc rather than risking it becoming worse in the
>>> future.
> 
> Okay. So you seem to be in favour of fixing the sparc64 splats Guenter
> reported?
> 
>>> I don't know what the situation is for other architectures that does not
>>> support RT.
>>>
>>
>> For my part I still don't understand why PROVE_RAW_LOCK_NESTING is no longer
>> a configurable option, or in other words why it is mandated even for architectures
>> not supporting RT. To me this means that I'll either have to disable PROVE_LOCKING
>> for sparc or live with endless warning backtraces. The latter obscures real
>> problems, so it is a no-go.
> 
> It is documented in Documentation/locking/locktypes.rst how the locks
> should nest. It is just nobody enabled it on sparc64 and tested. The
> option was meant temporary until the big read blocks are cleared.
> 

That doesn't explain why PROVE_RAW_LOCK_NESTING is now mandatory if
PROVE_LOCKING is enabled, even on architectures where it was not tested.
I am all for testing, but that doesn't include making it mandatory
even where it is known to fail. Enabling it by default, sure, no problem.
Dropping the option entirely after it is proven to be no longer needed,
also no problem. But force-enabling it even where untested or, worse,
known to fail, is two steps too far.

>> So, if people want to keep mandating PROVE_RAW_LOCK_NESTING together with
>> PROVE_LOCKING for all architectures, I'll disable PROVE_LOCKING for sparc
>> in my testing. NP, just let me know. I'll then do the same for other
>> architectures not supporting RT if I hit the same problem there.
> 
> Waiman posted a patch to disable it on architectures that don't support
> PREEMPT_RT. You could also post the patches you discussed. Andreas does
> not seem to be against it (but then I don't know if he is a 32 or 64bit
> guy). I did not year from other architectures so far.
> 

I don't have time to work through all the issues, much less to argue with
affected maintainers to get the various patches accepted. I am open to fixing
regressions if I can, but it appears that this is not considered to be a
regression. I'll just disable PROVE_LOCKING for affected architectures
in my testing after -rc1 is out.

Thanks,
Guenter
Waiman Long Nov. 27, 2024, 11:47 p.m. UTC | #20
On 11/27/24 12:44 PM, Guenter Roeck wrote:
> On 11/27/24 08:53, Sebastian Andrzej Siewior wrote:
>> On 2024-11-27 08:02:50 [-0800], Guenter Roeck wrote:
>>> On 11/27/24 07:39, Andreas Larsson wrote:
>>>> Even though this is for sparc64, there is work being done looking into
>>>> enabling RT for sparc32. If the amount of fixes needed to keep
>>>> PROVE_RAW_LOCK_NESTING enabled is quite small at the moment I'd rather
>>>> see it enabled for sparc rather than risking it becoming worse in the
>>>> future.
>>
>> Okay. So you seem to be in favour of fixing the sparc64 splats Guenter
>> reported?
>>
>>>> I don't know what the situation is for other architectures that 
>>>> does not
>>>> support RT.
>>>>
>>>
>>> For my part I still don't understand why PROVE_RAW_LOCK_NESTING is 
>>> no longer
>>> a configurable option, or in other words why it is mandated even for 
>>> architectures
>>> not supporting RT. To me this means that I'll either have to disable 
>>> PROVE_LOCKING
>>> for sparc or live with endless warning backtraces. The latter 
>>> obscures real
>>> problems, so it is a no-go.
>>
>> It is documented in Documentation/locking/locktypes.rst how the locks
>> should nest. It is just nobody enabled it on sparc64 and tested. The
>> option was meant temporary until the big read blocks are cleared.
>>
>
> That doesn't explain why PROVE_RAW_LOCK_NESTING is now mandatory if
> PROVE_LOCKING is enabled, even on architectures where is was not tested.
> I am all for testing, but that doesn't include making it mandatory
> even where it is known to fail. Enabling it by default, sure, no problem.
> Dropping the option entirely after it is proven to no longer needed,
> also no problem. But force-enabling it even where untested or, worse,
> known to fail, is two steps too far.

The main reason for enforcing PROVE_RAW_LOCK_NESTING with PROVE_LOCKING 
is that the PREEMPT_RT kernel is much less tested than the 
non-RT kernel. I do agree that we shouldn't force this on arches that 
don't support PREEMPT_RT. However, once an arch decides to support 
PREEMPT_RT, it has to fix all these raw_spinlock nesting problems.

Cheers,
Longman
Guenter Roeck Nov. 28, 2024, 12:08 a.m. UTC | #21
On 11/27/24 15:47, Waiman Long wrote:
> On 11/27/24 12:44 PM, Guenter Roeck wrote:
>> On 11/27/24 08:53, Sebastian Andrzej Siewior wrote:
>>> On 2024-11-27 08:02:50 [-0800], Guenter Roeck wrote:
>>>> On 11/27/24 07:39, Andreas Larsson wrote:
>>>>> Even though this is for sparc64, there is work being done looking into
>>>>> enabling RT for sparc32. If the amount of fixes needed to keep
>>>>> PROVE_RAW_LOCK_NESTING enabled is quite small at the moment I'd rather
>>>>> see it enabled for sparc rather than risking it becoming worse in the
>>>>> future.
>>>
>>> Okay. So you seem to be in favour of fixing the sparc64 splats Guenter
>>> reported?
>>>
>>>>> I don't know what the situation is for other architectures that does not
>>>>> support RT.
>>>>>
>>>>
>>>> For my part I still don't understand why PROVE_RAW_LOCK_NESTING is no longer
>>>> a configurable option, or in other words why it is mandated even for architectures
>>>> not supporting RT. To me this means that I'll either have to disable PROVE_LOCKING
>>>> for sparc or live with endless warning backtraces. The latter obscures real
>>>> problems, so it is a no-go.
>>>
>>> It is documented in Documentation/locking/locktypes.rst how the locks
>>> should nest. It is just nobody enabled it on sparc64 and tested. The
>>> option was meant temporary until the big read blocks are cleared.
>>>
>>
>> That doesn't explain why PROVE_RAW_LOCK_NESTING is now mandatory if
>> PROVE_LOCKING is enabled, even on architectures where it was not tested.
>> I am all for testing, but that doesn't include making it mandatory
>> even where it is known to fail. Enabling it by default, sure, no problem.
>> Dropping the option entirely after it is proven to no longer needed,
>> also no problem. But force-enabling it even where untested or, worse,
>> known to fail, is two steps too far.
> 
> The main reason for enforcing PROVE_RAW_LOCK_NESTING with PROVE_LOCKING is due to the fact that PREEMPT_RT kernel is much less tested than the non-RT kernel. I do agree that we shouldn't force this on arches that don't support PREEMPT_RT. However, once an arch decides to support PREEMPT_RT, they have to fix all these raw_spinlock nesting problems.
> 

config PROVE_RAW_LOCK_NESTING
-       bool
+       bool "Enable raw_spinlock - spinlock nesting checks" if ARCH_SUPPORTS_RT=n
         depends on PROVE_LOCKING
-       default y
+       default y if ARCH_SUPPORTS_RT

would have accomplished that while at the same time making it optional
for non-RT architectures.
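
Applying that diff to the entry quoted earlier in the thread, the resulting Kconfig stanza would read (a sketch of the combined result, not a posted patch):

```kconfig
config PROVE_RAW_LOCK_NESTING
	bool "Enable raw_spinlock - spinlock nesting checks" if ARCH_SUPPORTS_RT=n
	depends on PROVE_LOCKING
	default y if ARCH_SUPPORTS_RT
	help
	  Enable the raw_spinlock vs. spinlock nesting checks which ensure
	  that the lock nesting rules for PREEMPT_RT enabled kernels are
	  not violated.
```

With the conditional prompt, arches that support RT get the check forced on (prompt hidden, default y), while non-RT arches see a normal user-selectable option.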

Guenter
Waiman Long Nov. 28, 2024, 12:31 a.m. UTC | #22
On 11/27/24 7:08 PM, Guenter Roeck wrote:
> On 11/27/24 15:47, Waiman Long wrote:
>> On 11/27/24 12:44 PM, Guenter Roeck wrote:
>>> On 11/27/24 08:53, Sebastian Andrzej Siewior wrote:
>>>> On 2024-11-27 08:02:50 [-0800], Guenter Roeck wrote:
>>>>> On 11/27/24 07:39, Andreas Larsson wrote:
>>>>>> Even though this is for sparc64, there is work being done looking 
>>>>>> into
>>>>>> enabling RT for sparc32. If the amount of fixes needed to keep
>>>>>> PROVE_RAW_LOCK_NESTING enabled is quite small at the moment I'd 
>>>>>> rather
>>>>>> see it enabled for sparc rather than risking it becoming worse in 
>>>>>> the
>>>>>> future.
>>>>
>>>> Okay. So you seem to be in favour of fixing the sparc64 splats Guenter
>>>> reported?
>>>>
>>>>>> I don't know what the situation is for other architectures that 
>>>>>> does not
>>>>>> support RT.
>>>>>>
>>>>>
>>>>> For my part I still don't understand why PROVE_RAW_LOCK_NESTING is 
>>>>> no longer
>>>>> a configurable option, or in other words why it is mandated even 
>>>>> for architectures
>>>>> not supporting RT. To me this means that I'll either have to 
>>>>> disable PROVE_LOCKING
>>>>> for sparc or live with endless warning backtraces. The latter 
>>>>> obscures real
>>>>> problems, so it is a no-go.
>>>>
>>>> It is documented in Documentation/locking/locktypes.rst how the locks
>>>> should nest. It is just nobody enabled it on sparc64 and tested. The
>>>> option was meant temporary until the big read blocks are cleared.
>>>>
>>>
>>> That doesn't explain why PROVE_RAW_LOCK_NESTING is now mandatory if
>>> PROVE_LOCKING is enabled, even on architectures where it was not 
>>> tested.
>>> I am all for testing, but that doesn't include making it mandatory
>>> even where it is known to fail. Enabling it by default, sure, no 
>>> problem.
>>> Dropping the option entirely after it is proven to no longer needed,
>>> also no problem. But force-enabling it even where untested or, worse,
>>> known to fail, is two steps too far.
>>
>> The main reason for enforcing PROVE_RAW_LOCK_NESTING with 
>> PROVE_LOCKING is due to the fact that PREEMPT_RT kernel is much less 
>> tested than the non-RT kernel. I do agree that we shouldn't force 
>> this on arches that don't support PREEMPT_RT. However, once an arch 
>> decides to support PREEMPT_RT, they have to fix all these 
>> raw_spinlock nesting problems.
>>
>
> config PROVE_RAW_LOCK_NESTING
> -       bool
> +       bool "Enable raw_spinlock - spinlock nesting checks" if 
> ARCH_SUPPORTS_RT=n
>         depends on PROVE_LOCKING
> -       default y
> +       default y if ARCH_SUPPORTS_RT
>
> would have accomplished that while at the same time making it optional
> for non-RT architectures.

I had actually thought about doing exactly that, but decided to keep the 
current mode of forcing PROVE_RAW_LOCK_NESTING for arches that support 
PREEMPT_RT. I won't mind doing this alternative if others agree.

Cheers,
Longman
Guenter Roeck Nov. 28, 2024, 1:17 a.m. UTC | #23
On 11/27/24 16:31, Waiman Long wrote:
> 
> On 11/27/24 7:08 PM, Guenter Roeck wrote:
>> On 11/27/24 15:47, Waiman Long wrote:
>>> On 11/27/24 12:44 PM, Guenter Roeck wrote:
>>>> On 11/27/24 08:53, Sebastian Andrzej Siewior wrote:
>>>>> On 2024-11-27 08:02:50 [-0800], Guenter Roeck wrote:
>>>>>> On 11/27/24 07:39, Andreas Larsson wrote:
>>>>>>> Even though this is for sparc64, there is work being done looking into
>>>>>>> enabling RT for sparc32. If the amount of fixes needed to keep
>>>>>>> PROVE_RAW_LOCK_NESTING enabled is quite small at the moment I'd rather
>>>>>>> see it enabled for sparc rather than risking it becoming worse in the
>>>>>>> future.
>>>>>
>>>>> Okay. So you seem to be in favour of fixing the sparc64 splats Guenter
>>>>> reported?
>>>>>
>>>>>>> I don't know what the situation is for other architectures that do not
>>>>>>> support RT.
>>>>>>>
>>>>>>
>>>>>> For my part I still don't understand why PROVE_RAW_LOCK_NESTING is no longer
>>>>>> a configurable option, or in other words why it is mandated even for architectures
>>>>>> not supporting RT. To me this means that I'll either have to disable PROVE_LOCKING
>>>>>> for sparc or live with endless warning backtraces. The latter obscures real
>>>>>> problems, so it is a no-go.
>>>>>
>>>>> It is documented in Documentation/locking/locktypes.rst how the locks
>>>>> should nest. It is just nobody enabled it on sparc64 and tested. The
>>>>> option was meant to be temporary until the big road blocks are cleared.
>>>>>
>>>>
>>>> That doesn't explain why PROVE_RAW_LOCK_NESTING is now mandatory if
>>>> PROVE_LOCKING is enabled, even on architectures where it was not tested.
>>>> I am all for testing, but that doesn't include making it mandatory
>>>> even where it is known to fail. Enabling it by default, sure, no problem.
>>>> Dropping the option entirely after it is proven to be no longer needed,
>>>> also no problem. But force-enabling it even where untested or, worse,
>>>> known to fail, is two steps too far.
>>>
>>> The main reason for enforcing PROVE_RAW_LOCK_NESTING with PROVE_LOCKING is due to the fact that the PREEMPT_RT kernel is much less tested than the non-RT kernel. I do agree that we shouldn't force this on arches that don't support PREEMPT_RT. However, once an arch decides to support PREEMPT_RT, they have to fix all these raw_spinlock nesting problems.
>>>
>>
>> config PROVE_RAW_LOCK_NESTING
>> -       bool
>> +       bool "Enable raw_spinlock - spinlock nesting checks" if ARCH_SUPPORTS_RT=n
>>         depends on PROVE_LOCKING
>> -       default y
>> +       default y if ARCH_SUPPORTS_RT
>>
>> would have accomplished that while at the same time making it optional
>> for non-RT architectures.
> 
> I had actually thought about doing exactly that, but decided to keep the current mode of forcing PROVE_RAW_LOCK_NESTING for arches that support PREEMPT_RT. I won't mind doing this alternative if others agree.
> 

Forcing PROVE_RAW_LOCK_NESTING for arches that support PREEMPT_RT is exactly
what the above does.

	bool "Enable raw_spinlock - spinlock nesting checks" if ARCH_SUPPORTS_RT=n

makes the flag visible (only) if ARCH_SUPPORTS_RT=n, and

	default y if ARCH_SUPPORTS_RT

(force-)enables it if ARCH_SUPPORTS_RT=y.
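
Taken together, the two hunks would turn the PROVE_RAW_LOCK_NESTING entry in
lib/Kconfig.debug into something like the following (a sketch of the suggested
change, not a tested patch):

```kconfig
config PROVE_RAW_LOCK_NESTING
	# The prompt is only visible on architectures without RT support;
	# everyone else gets the option forced on by the default below.
	bool "Enable raw_spinlock - spinlock nesting checks" if ARCH_SUPPORTS_RT=n
	depends on PROVE_LOCKING
	default y if ARCH_SUPPORTS_RT
```

With ARCH_SUPPORTS_RT=y the symbol has no prompt, so "default y" cannot be
overridden; with ARCH_SUPPORTS_RT=n it is a normal user-selectable option
that defaults to n.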
	
Guenter
Waiman Long Nov. 28, 2024, 1:55 a.m. UTC | #24
On 11/27/24 8:17 PM, Guenter Roeck wrote:
> On 11/27/24 16:31, Waiman Long wrote:
>>
>> On 11/27/24 7:08 PM, Guenter Roeck wrote:
>>> On 11/27/24 15:47, Waiman Long wrote:
>>>> On 11/27/24 12:44 PM, Guenter Roeck wrote:
>>>>> On 11/27/24 08:53, Sebastian Andrzej Siewior wrote:
>>>>>> On 2024-11-27 08:02:50 [-0800], Guenter Roeck wrote:
>>>>>>> On 11/27/24 07:39, Andreas Larsson wrote:
>>>>>>>> Even though this is for sparc64, there is work being done 
>>>>>>>> looking into
>>>>>>>> enabling RT for sparc32. If the amount of fixes needed to keep
>>>>>>>> PROVE_RAW_LOCK_NESTING enabled is quite small at the moment I'd 
>>>>>>>> rather
>>>>>>>> see it enabled for sparc rather than risking it becoming worse 
>>>>>>>> in the
>>>>>>>> future.
>>>>>>
>>>>>> Okay. So you seem to be in favour of fixing the sparc64 splats 
>>>>>> Guenter
>>>>>> reported?
>>>>>>
>>>>>>>> I don't know what the situation is for other architectures that 
>>>>>>>> do not
>>>>>>>> support RT.
>>>>>>>>
>>>>>>>
>>>>>>> For my part I still don't understand why PROVE_RAW_LOCK_NESTING 
>>>>>>> is no longer
>>>>>>> a configurable option, or in other words why it is mandated even 
>>>>>>> for architectures
>>>>>>> not supporting RT. To me this means that I'll either have to 
>>>>>>> disable PROVE_LOCKING
>>>>>>> for sparc or live with endless warning backtraces. The latter 
>>>>>>> obscures real
>>>>>>> problems, so it is a no-go.
>>>>>>
>>>>>> It is documented in Documentation/locking/locktypes.rst how the 
>>>>>> locks
>>>>>> should nest. It is just nobody enabled it on sparc64 and tested. The
>>>>>> option was meant to be temporary until the big road blocks are cleared.
>>>>>>
>>>>>
>>>>> That doesn't explain why PROVE_RAW_LOCK_NESTING is now mandatory if
>>>>> PROVE_LOCKING is enabled, even on architectures where it was not 
>>>>> tested.
>>>>> I am all for testing, but that doesn't include making it mandatory
>>>>> even where it is known to fail. Enabling it by default, sure, no 
>>>>> problem.
>>>>> Dropping the option entirely after it is proven to be no longer needed,
>>>>> also no problem. But force-enabling it even where untested or, worse,
>>>>> known to fail, is two steps too far.
>>>>
>>>> The main reason for enforcing PROVE_RAW_LOCK_NESTING with 
>>>> PROVE_LOCKING is due to the fact that the PREEMPT_RT kernel is much 
>>>> less tested than the non-RT kernel. I do agree that we shouldn't 
>>>> force this on arches that don't support PREEMPT_RT. However, once 
>>>> an arch decides to support PREEMPT_RT, they have to fix all these 
>>>> raw_spinlock nesting problems.
>>>>
>>>
>>> config PROVE_RAW_LOCK_NESTING
>>> -       bool
>>> +       bool "Enable raw_spinlock - spinlock nesting checks" if 
>>> ARCH_SUPPORTS_RT=n
>>>         depends on PROVE_LOCKING
>>> -       default y
>>> +       default y if ARCH_SUPPORTS_RT
>>>
>>> would have accomplished that while at the same time making it optional
>>> for non-RT architectures.
>>
>> I had actually thought about doing exactly that, but decided to keep 
>> the current mode of forcing PROVE_RAW_LOCK_NESTING for arches that 
>> support PREEMPT_RT. I won't mind doing this alternative if others agree.
>>
>
> Forcing PROVE_RAW_LOCK_NESTING for arches that support PREEMPT_RT is 
> exactly
> what the above does.
>
>     bool "Enable raw_spinlock - spinlock nesting checks" if 
> ARCH_SUPPORTS_RT=n
>
> makes the flag visible (only) if ARCH_SUPPORTS_RT=n, and
>
>     default y if ARCH_SUPPORTS_RT
>
> (force-)enables it if ARCH_SUPPORTS_RT=y.

OK, I missed the "if" part after the string. Yes, that does force 
PREEMPT_RT supporting arches to set PROVE_RAW_LOCK_NESTING while 
enabling arches that do not support PREEMPT_RT to optionally set it. I 
will post a v2 patch with that change.

Thanks,
Longman

Patch

diff --git a/arch/sparc/kernel/pci.c b/arch/sparc/kernel/pci.c
index 50a0927a84a6f..12e0b78cded55 100644
--- a/arch/sparc/kernel/pci.c
+++ b/arch/sparc/kernel/pci.c
@@ -42,14 +42,14 @@  volatile int pci_poke_in_progress;
 volatile int pci_poke_cpu = -1;
 volatile int pci_poke_faulted;
 
-static DEFINE_SPINLOCK(pci_poke_lock);
+static DEFINE_RAW_SPINLOCK(pci_poke_lock);
 
 void pci_config_read8(u8 *addr, u8 *ret)
 {
 	unsigned long flags;
 	u8 byte;
 
-	spin_lock_irqsave(&pci_poke_lock, flags);
+	raw_spin_lock_irqsave(&pci_poke_lock, flags);
 	pci_poke_cpu = smp_processor_id();
 	pci_poke_in_progress = 1;
 	pci_poke_faulted = 0;
@@ -63,7 +63,7 @@  void pci_config_read8(u8 *addr, u8 *ret)
 	pci_poke_cpu = -1;
 	if (!pci_poke_faulted)
 		*ret = byte;
-	spin_unlock_irqrestore(&pci_poke_lock, flags);
+	raw_spin_unlock_irqrestore(&pci_poke_lock, flags);
 }
 
 void pci_config_read16(u16 *addr, u16 *ret)
@@ -71,7 +71,7 @@  void pci_config_read16(u16 *addr, u16 *ret)
 	unsigned long flags;
 	u16 word;
 
-	spin_lock_irqsave(&pci_poke_lock, flags);
+	raw_spin_lock_irqsave(&pci_poke_lock, flags);
 	pci_poke_cpu = smp_processor_id();
 	pci_poke_in_progress = 1;
 	pci_poke_faulted = 0;
@@ -85,7 +85,7 @@  void pci_config_read16(u16 *addr, u16 *ret)
 	pci_poke_cpu = -1;
 	if (!pci_poke_faulted)
 		*ret = word;
-	spin_unlock_irqrestore(&pci_poke_lock, flags);
+	raw_spin_unlock_irqrestore(&pci_poke_lock, flags);
 }
 
 void pci_config_read32(u32 *addr, u32 *ret)
@@ -93,7 +93,7 @@  void pci_config_read32(u32 *addr, u32 *ret)
 	unsigned long flags;
 	u32 dword;
 
-	spin_lock_irqsave(&pci_poke_lock, flags);
+	raw_spin_lock_irqsave(&pci_poke_lock, flags);
 	pci_poke_cpu = smp_processor_id();
 	pci_poke_in_progress = 1;
 	pci_poke_faulted = 0;
@@ -107,14 +107,14 @@  void pci_config_read32(u32 *addr, u32 *ret)
 	pci_poke_cpu = -1;
 	if (!pci_poke_faulted)
 		*ret = dword;
-	spin_unlock_irqrestore(&pci_poke_lock, flags);
+	raw_spin_unlock_irqrestore(&pci_poke_lock, flags);
 }
 
 void pci_config_write8(u8 *addr, u8 val)
 {
 	unsigned long flags;
 
-	spin_lock_irqsave(&pci_poke_lock, flags);
+	raw_spin_lock_irqsave(&pci_poke_lock, flags);
 	pci_poke_cpu = smp_processor_id();
 	pci_poke_in_progress = 1;
 	pci_poke_faulted = 0;
@@ -126,14 +126,14 @@  void pci_config_write8(u8 *addr, u8 val)
 			     : "memory");
 	pci_poke_in_progress = 0;
 	pci_poke_cpu = -1;
-	spin_unlock_irqrestore(&pci_poke_lock, flags);
+	raw_spin_unlock_irqrestore(&pci_poke_lock, flags);
 }
 
 void pci_config_write16(u16 *addr, u16 val)
 {
 	unsigned long flags;
 
-	spin_lock_irqsave(&pci_poke_lock, flags);
+	raw_spin_lock_irqsave(&pci_poke_lock, flags);
 	pci_poke_cpu = smp_processor_id();
 	pci_poke_in_progress = 1;
 	pci_poke_faulted = 0;
@@ -145,14 +145,14 @@  void pci_config_write16(u16 *addr, u16 val)
 			     : "memory");
 	pci_poke_in_progress = 0;
 	pci_poke_cpu = -1;
-	spin_unlock_irqrestore(&pci_poke_lock, flags);
+	raw_spin_unlock_irqrestore(&pci_poke_lock, flags);
 }
 
 void pci_config_write32(u32 *addr, u32 val)
 {
 	unsigned long flags;
 
-	spin_lock_irqsave(&pci_poke_lock, flags);
+	raw_spin_lock_irqsave(&pci_poke_lock, flags);
 	pci_poke_cpu = smp_processor_id();
 	pci_poke_in_progress = 1;
 	pci_poke_faulted = 0;
@@ -164,7 +164,7 @@  void pci_config_write32(u32 *addr, u32 val)
 			     : "memory");
 	pci_poke_in_progress = 0;
 	pci_poke_cpu = -1;
-	spin_unlock_irqrestore(&pci_poke_lock, flags);
+	raw_spin_unlock_irqrestore(&pci_poke_lock, flags);
 }
 
 static int ofpci_verbose;