Message ID | CACRpkdZtVNZFWSUgb4=gUE2mQRb=aT_3=zRv1U71Vsq0Mm34eg@mail.gmail.com |
---|---|
State | New |
Headers | show |
Series | [GIT,PULL] Pin control fixes for v6.7 | expand |
On Wed, 29 Nov 2023 at 04:09, Linus Walleij <linus.walleij@linaro.org> wrote: > > The most interesting patch is the list iterator fix in the core by Maria > Yu, it took a while for me to realize what was going on there. That commit message still doesn't explain what the problem was. Why is p->state volatile there? It seems to be a serious locking bug if p->state can randomly change there, and the READ_ONCE() looks like a "this hides the problem" rather than an actual real fix. Linus
On Wed, Nov 29, 2023 at 3:56 PM Linus Torvalds <torvalds@linux-foundation.org> wrote: > On Wed, 29 Nov 2023 at 04:09, Linus Walleij <linus.walleij@linaro.org> wrote: > > > > The most interesting patch is the list iterator fix in the core by Maria > > Yu, it took a while for me to realize what was going on there. > > That commit message still doesn't explain what the problem was. > > Why is p->state volatile there? It seems to be a serious locking bug > if p->state can randomly change there, and the READ_ONCE() looks like > a "this hides the problem" rather than an actual real fix. Thanks for looking into it Linus, Maria can you look closer at this and try to pinpoint exactly what happens? Is the bug never manifesting with GCC for example? In the meantime I'll cook a fixes branch without this one commit. Yours, Linus Walleij
The pull request you sent on Wed, 29 Nov 2023 13:09:03 +0100:
> git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl.git tags/pinctrl-v6.7-2
has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/3b47bc037bd44f142ac09848e8d3ecccc726be99
Thank you!
On 11/29/2023 11:08 PM, Linus Walleij wrote: > On Wed, Nov 29, 2023 at 3:56 PM Linus Torvalds > <torvalds@linux-foundation.org> wrote: >> On Wed, 29 Nov 2023 at 04:09, Linus Walleij <linus.walleij@linaro.org> wrote: >>> >>> The most interesting patch is the list iterator fix in the core by Maria >>> Yu, it took a while for me to realize what was going on there. >> >> That commit message still doesn't explain what the problem was. >> >> Why is p->state volatile there? It seems to be a serious locking bug >> if p->state can randomly change there, and the READ_ONCE() looks like >> a "this hides the problem" rather than an actual real fix. This is indeed an interesting issue. Thx for the comment, Linus. **Let me explain how: "p->state becomes volatile in the list iterator", and "why READ_ONCE is suggested". The current critical code is: list_for_each_entry(setting, &p->state->settings, node) after elaborating the define list_for_each_entry, so above critical code will be: for (setting = list_head(&p->state->settings, typeof(*setting), node); \ &setting->node != (&p->state->settings); \ setting = list_next(setting , node)) The asm code(refer result from Clang version 10.0) can cleared explain the step of p->state reload actions: loop: ldr x22,[x22] ; x22=list_next(setting , node)) add x9,x8,#0x18 ; x9=&p->state->setting cmp x22,x9 ; setting,x9 b.eq 0xFFFFFF9B24483530 ldr w9,[x22,#0x10] ; w9,[setting,#16] cmp w9,#0x2 ; w9,#2 b.ne 0xFFFFFF9B24483504 mov x0,x22 ; x0,setting bl 0xFFFFFF9B24486048 ; pinmux_disable_setting ldr x8,[x19,#0x28] ; x19=p, x8=[p->state], *reload p->state* b loop The *reload p->state* inside the loop for checking is not needed and can cause possible infinite loop. So READ_ONCE is highly suggested even if p->state is not randomly changed. And then unnecessary "ldr x8,[x19,#0x28]" can be removed from above loop code. **Comments about the locking bug: currently pinctrl_select_state is an export symbol and don't have effective reentrance protect design. That's why current infinite loop issue was observed with customer's multi thread call with pinctrl_select_state without lock protection. pinctrl_select_state totally rely on driver module user side to ensure the reentrant state. Usually the suggested usage from driver side who are using pinctrl would be: LOCK; pinctrl_select_state(); gpio pulling; udelay(); check state; other hardware behaviors; UNLOCK; So the locking bug fix I have told customer side to fix from their own driver part. Since usually not only a simple pinctrl_select_state call can finish the hardware state transaction. I myself also is fine to have a small per pinctrl lock to only protect the current pinctrl_select_state->pinctrl_commit_state reentrance issues. Pls any pinctrl maintainer help to comment to suggest or not and I can prepare a change as well. > > Thanks for looking into it Linus, Maria can you look closer at this and > try to pinpoint exactly what happens? Sure, I am trying to explain from my end. If there is other thoughts pls feel free to chime in. > > Is the bug never manifesting with GCC for example? > > In the meantime I'll cook a fixes branch without this one commit. > > Yours, > Linus Walleij
Hi Nathan, Nick, (just picking some LLVM compiler people I know of... and trust) Context is this patch: https://lore.kernel.org/linux-gpio/20231115102824.23727-1-quic_aiquny@quicinc.com/ On Thu, Nov 30, 2023 at 6:37 AM Aiqun(Maria) Yu <quic_aiquny@quicinc.com> wrote: > On 11/29/2023 11:08 PM, Linus Walleij wrote: > > On Wed, Nov 29, 2023 at 3:56 PM Linus Torvalds > > <torvalds@linux-foundation.org> wrote: > >> On Wed, 29 Nov 2023 at 04:09, Linus Walleij <linus.walleij@linaro.org> wrote: > >>> > >>> The most interesting patch is the list iterator fix in the core by Maria > >>> Yu, it took a while for me to realize what was going on there. > >> > >> That commit message still doesn't explain what the problem was. > >> > >> Why is p->state volatile there? It seems to be a serious locking bug > >> if p->state can randomly change there, and the READ_ONCE() looks like > >> a "this hides the problem" rather than an actual real fix. > > This is indeed an interesting issue. Thx for the comment, Linus. > **Let me explain how: "p->state becomes volatile in the list iterator", > and "why READ_ONCE is suggested". > > The current critical code is: > list_for_each_entry(setting, &p->state->settings, node) > > after elaborating the define list_for_each_entry, so above critical code > will be: > for (setting = list_head(&p->state->settings, typeof(*setting), node); \ > &setting->node != (&p->state->settings); \ > setting = list_next(setting , node)) > > The asm code(refer result from Clang version 10.0) can cleared explain > the step of p->state reload actions: > loop: > ldr x22,[x22] ; x22=list_next(setting , node)) > add x9,x8,#0x18 ; x9=&p->state->setting > cmp x22,x9 ; setting,x9 > b.eq 0xFFFFFF9B24483530 > > ldr w9,[x22,#0x10] ; w9,[setting,#16] > cmp w9,#0x2 ; w9,#2 > b.ne 0xFFFFFF9B24483504 > > mov x0,x22 ; x0,setting > bl 0xFFFFFF9B24486048 ; pinmux_disable_setting > > ldr x8,[x19,#0x28] ; x19=p, x8=[p->state], *reload p->state* > b loop > > The *reload p->state* inside the loop for checking is not needed and can > cause possible infinite loop. So READ_ONCE is highly suggested even if > p->state is not randomly changed. And then unnecessary "ldr > x8,[x19,#0x28]" can be removed from above loop code. > > **Comments about the locking bug: > currently pinctrl_select_state is an export symbol and don't have > effective reentrance protect design. That's why current infinite loop > issue was observed with customer's multi thread call with > pinctrl_select_state without lock protection. pinctrl_select_state > totally rely on driver module user side to ensure the reentrant state. > > Usually the suggested usage from driver side who are using pinctrl would be: > LOCK; > pinctrl_select_state(); > gpio pulling; > udelay(); > check state; > other hardware behaviors; > UNLOCK; > > So the locking bug fix I have told customer side to fix from their own > driver part. Since usually not only a simple pinctrl_select_state call > can finish the hardware state transaction. > > I myself also is fine to have a small per pinctrl lock to only protect > the current pinctrl_select_state->pinctrl_commit_state reentrance > issues. Pls any pinctrl maintainer help to comment to suggest or not and > I can prepare a change as well. Luckily I am the pin control maintainer :D And I also ha my morning coffee and looked over the patch again. So tilting the compiler to generate code that is less prone to race conditions with READ_ONCE() isn't really the solution is it? We need to introduce a proper lock that stops this from happening if it is a problem people are facing. Can you try to make a patch that removes READ_ONCE() and introduce a lock instead? Racing is rarely an issue in pin control for reasons explained in another context here: https://lore.kernel.org/linux-gpio/CACRpkdZ0cnJpYuzU=47-oW-7N_YGMo2vXpKOeXeNi5PhPY7QMA@mail.gmail.com/ ...but if people still manage to run into it, we better have a lock there. Just make sure it is not just an issue with outoftree code, but a real problem? The change that changes the code to use the old_state variable should stay in, it makes the code more readable, it's just the READ_ONCE() macro which is dubious. Yours, Linus Walleij
On Fri, Dec 1, 2023 at 9:10 AM Linus Walleij <linus.walleij@linaro.org> wrote: > Hi Nathan, Nick, > > (just picking some LLVM compiler people I know of... and trust) Forget that part of the mail, written out-of-context when I was running different disassembly of GCC vs LLVM code. I figured it out and concluded that the compilers are fine and doing the right thing, it's a none-issue in that regard. Yours, Linus Walleij
On 12/1/2023 4:10 PM, Linus Walleij wrote: > Hi Nathan, Nick, > > (just picking some LLVM compiler people I know of... and trust) > > Context is this patch: > https://lore.kernel.org/linux-gpio/20231115102824.23727-1-quic_aiquny@quicinc.com/ > > On Thu, Nov 30, 2023 at 6:37 AM Aiqun(Maria) Yu <quic_aiquny@quicinc.com> wrote: >> On 11/29/2023 11:08 PM, Linus Walleij wrote: >>> On Wed, Nov 29, 2023 at 3:56 PM Linus Torvalds >>> <torvalds@linux-foundation.org> wrote: >>>> On Wed, 29 Nov 2023 at 04:09, Linus Walleij <linus.walleij@linaro.org> wrote: >>>>> >>>>> The most interesting patch is the list iterator fix in the core by Maria >>>>> Yu, it took a while for me to realize what was going on there. >>>> >>>> That commit message still doesn't explain what the problem was. >>>> >>>> Why is p->state volatile there? It seems to be a serious locking bug >>>> if p->state can randomly change there, and the READ_ONCE() looks like >>>> a "this hides the problem" rather than an actual real fix. >> >> This is indeed an interesting issue. Thx for the comment, Linus. >> **Let me explain how: "p->state becomes volatile in the list iterator", >> and "why READ_ONCE is suggested". >> >> The current critical code is: >> list_for_each_entry(setting, &p->state->settings, node) >> >> after elaborating the define list_for_each_entry, so above critical code >> will be: >> for (setting = list_head(&p->state->settings, typeof(*setting), node); \ >> &setting->node != (&p->state->settings); \ >> setting = list_next(setting , node)) >> >> The asm code(refer result from Clang version 10.0) can cleared explain >> the step of p->state reload actions: >> loop: >> ldr x22,[x22] ; x22=list_next(setting , node)) >> add x9,x8,#0x18 ; x9=&p->state->setting >> cmp x22,x9 ; setting,x9 >> b.eq 0xFFFFFF9B24483530 >> >> ldr w9,[x22,#0x10] ; w9,[setting,#16] >> cmp w9,#0x2 ; w9,#2 >> b.ne 0xFFFFFF9B24483504 >> >> mov x0,x22 ; x0,setting >> bl 0xFFFFFF9B24486048 ; pinmux_disable_setting >> >> ldr x8,[x19,#0x28] ; x19=p, x8=[p->state], *reload p->state* >> b loop >> >> The *reload p->state* inside the loop for checking is not needed and can >> cause possible infinite loop. So READ_ONCE is highly suggested even if >> p->state is not randomly changed. And then unnecessary "ldr >> x8,[x19,#0x28]" can be removed from above loop code. >> >> **Comments about the locking bug: >> currently pinctrl_select_state is an export symbol and don't have >> effective reentrance protect design. That's why current infinite loop >> issue was observed with customer's multi thread call with >> pinctrl_select_state without lock protection. pinctrl_select_state >> totally rely on driver module user side to ensure the reentrant state. >> >> Usually the suggested usage from driver side who are using pinctrl would be: >> LOCK; >> pinctrl_select_state(); >> gpio pulling; >> udelay(); >> check state; >> other hardware behaviors; >> UNLOCK; >> >> So the locking bug fix I have told customer side to fix from their own >> driver part. Since usually not only a simple pinctrl_select_state call >> can finish the hardware state transaction. >> >> I myself also is fine to have a small per pinctrl lock to only protect >> the current pinctrl_select_state->pinctrl_commit_state reentrance >> issues. Pls any pinctrl maintainer help to comment to suggest or not and >> I can prepare a change as well. > > Luckily I am the pin control maintainer :D > And I also ha my morning coffee and looked over the patch again. > > So tilting the compiler to generate code that is less prone to race > conditions with READ_ONCE() isn't really the solution is it? We need > to introduce a proper lock that stops this from happening if it is > a problem people are facing. > > Can you try to make a patch that removes READ_ONCE() > and introduce a lock instead? > > Racing is rarely an issue in pin control for reasons explained > in another context here: > https://lore.kernel.org/linux-gpio/CACRpkdZ0cnJpYuzU=47-oW-7N_YGMo2vXpKOeXeNi5PhPY7QMA@mail.gmail.com/ > > ...but if people still manage to run into it, we better have a lock > there. Just make sure it is not just an issue with outoftree code, > but a real problem? > > The change that changes the code to use the old_state variable > should stay in, it makes the code more readable, it's just the > READ_ONCE() macro which is dubious. Thx for confirm. I am preparing the new change now. :) READ_ONCE can only avoid the possible infinite loop and not crash the whole kernel, while the lock is needed to protect the multi parallel call of pinctrl_commit_state api have a consistent atomic hardware result as well. > > Yours, > Linus Walleij
On 12/1/2023 6:06 PM, Aiqun(Maria) Yu wrote: > On 12/1/2023 4:10 PM, Linus Walleij wrote: >> Hi Nathan, Nick, >> >> (just picking some LLVM compiler people I know of... and trust) >> >> Context is this patch: >> https://lore.kernel.org/linux-gpio/20231115102824.23727-1-quic_aiquny@quicinc.com/ >> >> On Thu, Nov 30, 2023 at 6:37 AM Aiqun(Maria) Yu >> <quic_aiquny@quicinc.com> wrote: >>> On 11/29/2023 11:08 PM, Linus Walleij wrote: >>>> On Wed, Nov 29, 2023 at 3:56 PM Linus Torvalds >>>> <torvalds@linux-foundation.org> wrote: >>>>> On Wed, 29 Nov 2023 at 04:09, Linus Walleij >>>>> <linus.walleij@linaro.org> wrote: >>>>>> >>>>>> The most interesting patch is the list iterator fix in the core by >>>>>> Maria >>>>>> Yu, it took a while for me to realize what was going on there. >>>>> >>>>> That commit message still doesn't explain what the problem was. >>>>> >>>>> Why is p->state volatile there? It seems to be a serious locking bug >>>>> if p->state can randomly change there, and the READ_ONCE() looks like >>>>> a "this hides the problem" rather than an actual real fix. >>> >>> This is indeed an interesting issue. Thx for the comment, Linus. >>> **Let me explain how: "p->state becomes volatile in the list iterator", >>> and "why READ_ONCE is suggested". >>> >>> The current critical code is: >>> list_for_each_entry(setting, &p->state->settings, node) >>> >>> after elaborating the define list_for_each_entry, so above critical code >>> will be: >>> for (setting = list_head(&p->state->settings, typeof(*setting), >>> node); \ >>> &setting->node != (&p->state->settings); \ >>> setting = list_next(setting , node)) >>> >>> The asm code(refer result from Clang version 10.0) can cleared explain >>> the step of p->state reload actions: >>> loop: >>> ldr x22,[x22] ; x22=list_next(setting , node)) >>> add x9,x8,#0x18 ; x9=&p->state->setting >>> cmp x22,x9 ; setting,x9 >>> b.eq 0xFFFFFF9B24483530 >>> >>> ldr w9,[x22,#0x10] ; w9,[setting,#16] >>> cmp w9,#0x2 ; w9,#2 >>> b.ne 0xFFFFFF9B24483504 >>> >>> mov x0,x22 ; x0,setting >>> bl 0xFFFFFF9B24486048 ; pinmux_disable_setting >>> >>> ldr x8,[x19,#0x28] ; x19=p, x8=[p->state], *reload p->state* >>> b loop >>> >>> The *reload p->state* inside the loop for checking is not needed and can >>> cause possible infinite loop. So READ_ONCE is highly suggested even if >>> p->state is not randomly changed. And then unnecessary "ldr >>> x8,[x19,#0x28]" can be removed from above loop code. >>> >>> **Comments about the locking bug: >>> currently pinctrl_select_state is an export symbol and don't have >>> effective reentrance protect design. That's why current infinite loop >>> issue was observed with customer's multi thread call with >>> pinctrl_select_state without lock protection. pinctrl_select_state >>> totally rely on driver module user side to ensure the reentrant state. >>> >>> Usually the suggested usage from driver side who are using pinctrl >>> would be: >>> LOCK; >>> pinctrl_select_state(); >>> gpio pulling; >>> udelay(); >>> check state; >>> other hardware behaviors; >>> UNLOCK; >>> >>> So the locking bug fix I have told customer side to fix from their own >>> driver part. Since usually not only a simple pinctrl_select_state call >>> can finish the hardware state transaction. >>> >>> I myself also is fine to have a small per pinctrl lock to only protect >>> the current pinctrl_select_state->pinctrl_commit_state reentrance >>> issues. Pls any pinctrl maintainer help to comment to suggest or not and >>> I can prepare a change as well. >> >> Luckily I am the pin control maintainer :D >> And I also ha my morning coffee and looked over the patch again. >> >> So tilting the compiler to generate code that is less prone to race >> conditions with READ_ONCE() isn't really the solution is it? We need >> to introduce a proper lock that stops this from happening if it is >> a problem people are facing. >> >> Can you try to make a patch that removes READ_ONCE() >> and introduce a lock instead? >> >> Racing is rarely an issue in pin control for reasons explained >> in another context here: >> https://lore.kernel.org/linux-gpio/CACRpkdZ0cnJpYuzU=47-oW-7N_YGMo2vXpKOeXeNi5PhPY7QMA@mail.gmail.com/ >> >> ...but if people still manage to run into it, we better have a lock >> there. Just make sure it is not just an issue with outoftree code, >> but a real problem? >> >> The change that changes the code to use the old_state variable >> should stay in, it makes the code more readable, it's just the >> READ_ONCE() macro which is dubious. > Thx for confirm. I am preparing the new change now. :) change uploaded link here: https://lore.kernel.org/linux-gpio/20231201152931.31161-1-quic_aiquny@quicinc.com/ > > READ_ONCE can only avoid the possible infinite loop and not crash the > whole kernel, while the lock is needed to protect the multi parallel > call of pinctrl_commit_state api have a consistent atomic hardware > result as well. >> >> Yours, >> Linus Walleij >