Message ID | 1441836918-24159-1-git-send-email-simon.guinot@sequanux.org |
---|---|
State | New |
Headers | show |
Hello, I finally got around to rebasing some patches, and realised that the patch from Simon Guinot below still gets rebased over torvalds' v4.4 . Any reason it was not applied ? Or was the issue fixed in another, non-git-conflicting way ? (I see nothing recent in git log kernel/resource.c) I do not find a trace of a mail confirming that I tested it and that it fixes the issue. So here goes: Tested-by: Vincent Pelletier <plr.vincent@gmail.com> Testing details: bug reproduced on 4.1, patch applied over 4.1 and bug disappeared. After rebasing this patch (along with others) over 4.4, bug does not reappear. I did not try to reproduce bug with 4.4, but if preferred I can give it a go. On Thu, 10 Sep 2015 00:15:18 +0200, Simon Guinot <simon.guinot@sequanux.org> wrote: > In __request_region, if a conflict with a BUSY and MUXED resource is > detected, then the caller goes to sleep and waits for the resource to > be released. A pointer on the conflicting resource is kept. At wake-up > this pointer is used as a parent to retry to request the region. A first > problem is that this pointer might well be invalid (if for example the > conflicting resource have already been freed). An another problem is > that the next call to __request_region() fails to detect a remaining > conflict. The previously conflicting resource is passed as a parameter > and __request_region() will look for a conflict among the children of > this resource and not at the resource itself. It is likely to succeed > anyway, even if there is still a conflict. Instead, the parent of the > conflicting resource should be passed to __request_region(). > > As a fix attempt, this patch don't update the parent resource pointer in > the case we have to wait for a muxed region right after. > > Reported-by: Vincent Pelletier <plr.vincent@gmail.com> > Signed-off-by: Simon Guinot <simon.guinot@sequanux.org> > Tested-by: Vincent Donnefort <vdonnefort@gmail.com> > --- > kernel/resource.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > diff --git a/kernel/resource.c b/kernel/resource.c > index fed052a1bc9f..b8c84804db6a 100644 > --- a/kernel/resource.c > +++ b/kernel/resource.c > @@ -1072,9 +1072,10 @@ struct resource * __request_region(struct resource *parent, > if (!conflict) > break; > if (conflict != parent) { > - parent = conflict; > - if (!(conflict->flags & IORESOURCE_BUSY)) > + if (!(conflict->flags & IORESOURCE_BUSY)) { > + parent = conflict; > continue; > + } > } > if (conflict->flags & flags & IORESOURCE_MUXED) { > add_wait_queue(&muxed_resource_wait, &wait); Regards,
+Linus (the de-facto resource guy). On 02/19/2016 01:10 PM, Vincent Pelletier wrote: > Hello, > > I finally got around to rebasing some patches, and realised that the > patch from Simon Guinot below still gets rebased over torvalds' v4.4 . > > Any reason it was not applied ? > Or was the issue fixed in another, non-git-conflicting way ? (I see > nothing recent in git log kernel/resource.c) > > I do not find a trace of a mail confirming that I tested it and that it > fixes the issue. So here goes: > Tested-by: Vincent Pelletier <plr.vincent@gmail.com> > > Testing details: bug reproduced on 4.1, patch applied over 4.1 and bug > disappeared. After rebasing this patch (along with others) over 4.4, > bug does not reappear. I did not try to reproduce bug with 4.4, but if > preferred I can give it a go. > > On Thu, 10 Sep 2015 00:15:18 +0200, Simon Guinot > <simon.guinot@sequanux.org> wrote: >> In __request_region, if a conflict with a BUSY and MUXED resource is >> detected, then the caller goes to sleep and waits for the resource to >> be released. A pointer on the conflicting resource is kept. At wake-up >> this pointer is used as a parent to retry to request the region. A first >> problem is that this pointer might well be invalid (if for example the >> conflicting resource have already been freed). An another problem is >> that the next call to __request_region() fails to detect a remaining >> conflict. The previously conflicting resource is passed as a parameter >> and __request_region() will look for a conflict among the children of >> this resource and not at the resource itself. It is likely to succeed >> anyway, even if there is still a conflict. Instead, the parent of the >> conflicting resource should be passed to __request_region(). >> >> As a fix attempt, this patch don't update the parent resource pointer in >> the case we have to wait for a muxed region right after. >> >> Reported-by: Vincent Pelletier <plr.vincent@gmail.com> >> Signed-off-by: Simon Guinot <simon.guinot@sequanux.org> >> Tested-by: Vincent Donnefort <vdonnefort@gmail.com> >> --- >> kernel/resource.c | 5 +++-- >> 1 file changed, 3 insertions(+), 2 deletions(-) >> >> diff --git a/kernel/resource.c b/kernel/resource.c >> index fed052a1bc9f..b8c84804db6a 100644 >> --- a/kernel/resource.c >> +++ b/kernel/resource.c >> @@ -1072,9 +1072,10 @@ struct resource * __request_region(struct resource *parent, >> if (!conflict) >> break; >> if (conflict != parent) { >> - parent = conflict; >> - if (!(conflict->flags & IORESOURCE_BUSY)) >> + if (!(conflict->flags & IORESOURCE_BUSY)) { >> + parent = conflict; >> continue; >> + } >> } >> if (conflict->flags & flags & IORESOURCE_MUXED) { >> add_wait_queue(&muxed_resource_wait, &wait); > > Regards, > -- To unsubscribe from this list: send the line "unsubscribe linux-gpio" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Feb 19, 2016 at 3:25 PM, Jesse Barnes <jbarnes@virtuousgeek.org> wrote: > +Linus (the de-facto resource guy). > > On 02/19/2016 01:10 PM, Vincent Pelletier wrote: >> >> I finally got around to rebasing some patches, and realised that the >> patch from Simon Guinot below still gets rebased over torvalds' v4.4 . >> >> Any reason it was not applied ? >> Or was the issue fixed in another, non-git-conflicting way ? (I see >> nothing recent in git log kernel/resource.c) >> >> I do not find a trace of a mail confirming that I tested it and that it >> fixes the issue. So here goes: >> Tested-by: Vincent Pelletier <plr.vincent@gmail.com> Hmm. So I'm not entirely happy with the patch, because I think the problem with using a possibly free'd parent resource at restart still exists. As far as I can tell, if we hit the IORESOURCE_MUXED case *after* we have successfully delved into a resource that wasn't busy, we will have updated "parent" in a previous iteration of the loop, and we'll not use the original parent when we then re-start after the sleep. So quite frankly, I suspect any user of MUXED memory regions is still fundamentally buggy, and IORESOURCE_MUXED has always been a hacky and broken thing. That said, I ended up applying the patch anyway, even if I despise it. For all I know, muxed users never end up having those non-busy sub-resources in practice, and maybe there is some serialization at the top level for the drivers that use it. So if testing has shown that it helps some actual case, I'll believe the testing. But the code still looks rather debatable, and the whole IORESOURCE_MUXED approach looks broken. Jesse, that came in through you and the drm tree, I think. Alan is marked as author, and there are other people who actually use and can test the code. Can you guys think about the code a bit more. I'm wondering if the *real* fix ends up being to reset the 'parent' pointer to the original top-level parent after the sleep? To recap: the patch is in my tree, but I'm not all that happy about it. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-gpio" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On February 20, 2016 9:12:01 AM Linus Torvalds <torvalds@linux-foundation.org> wrote: > On Fri, Feb 19, 2016 at 3:25 PM, Jesse Barnes <jbarnes@virtuousgeek.org> wrote: >> +Linus (the de-facto resource guy). >> >> On 02/19/2016 01:10 PM, Vincent Pelletier wrote: >>> Tested-by: Vincent Pelletier <plr.vincent@gmail.com> > > Hmm. > > So I'm not entirely happy with the patch, because I think the problem > with using a possibly free'd parent resource at restart still exists. > > As far as I can tell, if we hit the IORESOURCE_MUXED case *after* we > have successfully delved into a resource that wasn't busy, we will > have updated "parent" in a previous iteration of the loop, and we'll > not use the original parent when we then re-start after the sleep. So > quite frankly, I suspect any user of MUXED memory regions is still > fundamentally buggy, and IORESOURCE_MUXED has always been a hacky and > broken thing. > > That said, I ended up applying the patch anyway, even if I despise it. > For all I know, muxed users never end up having those non-busy > sub-resources in practice, and maybe there is some serialization at > the top level for the drivers that use it. So if testing has shown > that it helps some actual case, I'll believe the testing. But the code > still looks rather debatable, and the whole IORESOURCE_MUXED approach > looks broken. > > Jesse, that came in through you and the drm tree, I think. Alan is > marked as author, and there are other people who actually use and can > test the code. Can you guys think about the code a bit more. > > I'm wondering if the *real* fix ends up being to reset the 'parent' > pointer to the original top-level parent after the sleep? > > To recap: the patch is in my tree, but I'm not all that happy about it. Thanks, yeah i think testing wins in this case. I'll revisit the muxed stuff; I do remember being dubious at the time, but iirc Alan needed it for something, and others had been pushing for these sorts of usages for awhile even though we have some good alternatives in the form of bus and platform drivers that can manage the appropriate serialization and keep things from stomping on one another. (And sorry if this message comes across in some bullshit format, I'm trying out a new ChromeOS based mail client for fun here.) Thanks, Jesse -- To unsubscribe from this list: send the line "unsubscribe linux-gpio" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> we have some good alternatives in the form of bus and platform > drivers that > can manage the appropriate serialization and keep things from > stomping > on one another. It's not used much, especially nowdays. The use case is basically multi I/O chips on the ISA/LPC bus with magic shared config register ports. We have sufficiently few of those we could give muxed the boot and special case them if preferred. Alan -- To unsubscribe from this list: send the line "unsubscribe linux-gpio" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 02/22/2016 05:49 AM, Alan Cox wrote: >> we have some good alternatives in the form of bus and platform >> drivers that >> can manage the appropriate serialization and keep things from >> stomping >> on one another. > > It's not used much, especially nowdays. The use case is basically multi > I/O chips on the ISA/LPC bus with magic shared config register ports. > > We have sufficiently few of those we could give muxed the boot and > special case them if preferred. Ah that's right, now I remember the context. So where should we go from here then? Just leave the ugly fix in or hack on old stuff and hope not to break it? Jesse -- To unsubscribe from this list: send the line "unsubscribe linux-gpio" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, 22 Feb 2016 13:49:12 +0000, Alan Cox <alan@linux.intel.com> wrote: > It's not used much, especially nowdays. The use case is basically multi > I/O chips on the ISA/LPC bus with magic shared config register ports. This is precisely a super I/O driver (gpio-f7188x) which, when used with concurrent accesses on an SMP machine triggered the issue which prompted this patch. In case information on the original issue is desired: My original report (ignore attached patch, it was rejected as it breaks other chips supported by this driver): http://permalink.gmane.org/gmane.linux.kernel.gpio/10204 My test procedure (second half of the mail), which I used to validate the patch against 4.1: http://permalink.gmane.org/gmane.linux.kernel.gpio/10216 Simon Guinot & Vincent Donnefort debugging results: http://permalink.gmane.org/gmane.linux.kernel.gpio/10521 Regards,
On Mon, Feb 22, 2016 at 12:46:09PM -0800, Jesse Barnes wrote: > On 02/22/2016 05:49 AM, Alan Cox wrote: > >> we have some good alternatives in the form of bus and platform > >> drivers that > >> can manage the appropriate serialization and keep things from > >> stomping > >> on one another. > > > > It's not used much, especially nowdays. The use case is basically multi > > I/O chips on the ISA/LPC bus with magic shared config register ports. > > > > We have sufficiently few of those we could give muxed the boot and > > special case them if preferred. > > Ah that's right, now I remember the context. So where should we go from here then? Just leave the ugly fix in or hack on old stuff and hope not to break it? Hi Jesse, The fix is not ugly but only incomplete. And I have to say that the whole IORESOURCE_MUXED thing is not shiny either :) The main problem in __request_region() is that we are dropping the spinlock of the resource list while holding a reference on a resource, waiting for a muxed resource to become available. From here, I can see two bugs: 1 - At wake-up, the next __request_resource() iteration will not detect a remaining conflict. To work properly, __request_resource() needs to be called with a parent of the conflicting resource. Instead we are passing the conflicting resource itself... 2 - At wake-up the resource pointer we are holding could have been freed. Since we are just about to use this pointer to insert a new resource in the linked list, it is broken... My patch fixes 1 and makes things better for 2. But I think Linus is right. If at wake-up we use the original parent resource to check again for a conflict, then we are safe. If you want, I can propose a such patch. Note that IORESOURCE_MUXED is mostly used by Super-I/Os drivers. A Super-I/O is a legacy I/O controller embedded on x86 motherboards. It is used to connect the low-bandwidth devices such as parallel ports, serial ports, keyboard controllers, hardware monitoring controllers, GPIO controllers, etc. While each logical device have its own memory region, a shared memory region is used for some configuration purpose. IORESOURCE_MUXED is a convenient way to deal with that. For some code examples you can look at the superio_* functions in the IT87 drivers: gpio/gpio-it87.c, hwmon/it87.c and watchdog/it87_wdt.c. I am not aware of any other users for IORESOURCE_MUXED. Let me know how you want to go and if you need my help. Simon
On 02/23/2016 08:19 AM, Simon Guinot wrote: > On Mon, Feb 22, 2016 at 12:46:09PM -0800, Jesse Barnes wrote: >> On 02/22/2016 05:49 AM, Alan Cox wrote: >>>> we have some good alternatives in the form of bus and platform >>>> drivers that >>>> can manage the appropriate serialization and keep things from >>>> stomping >>>> on one another. >>> >>> It's not used much, especially nowdays. The use case is basically multi >>> I/O chips on the ISA/LPC bus with magic shared config register ports. >>> >>> We have sufficiently few of those we could give muxed the boot and >>> special case them if preferred. >> >> Ah that's right, now I remember the context. So where should we go from here then? Just leave the ugly fix in or hack on old stuff and hope not to break it? > > Hi Jesse, > > The fix is not ugly but only incomplete. And I have to say that the > whole IORESOURCE_MUXED thing is not shiny either :) Yeah definitely, I'm not casting stones at you, this whole problem is an ugly one. :) As Alan said, muxed is really intended for a pretty limited set of cases. The "right" solution is a lot more work than its worth, so we have this instead, which is fine. But obviously it's been a little trickier to put in place than we hoped. :) > The main problem in __request_region() is that we are dropping the > spinlock of the resource list while holding a reference on a resource, > waiting for a muxed resource to become available. > > From here, I can see two bugs: > > 1 - At wake-up, the next __request_resource() iteration will not detect > a remaining conflict. To work properly, __request_resource() needs > to be called with a parent of the conflicting resource. Instead we > are passing the conflicting resource itself... > 2 - At wake-up the resource pointer we are holding could have been > freed. Since we are just about to use this pointer to insert a new > resource in the linked list, it is broken... > > My patch fixes 1 and makes things better for 2. > > But I think Linus is right. If at wake-up we use the original parent > resource to check again for a conflict, then we are safe. > > If you want, I can propose a such patch. > > Note that IORESOURCE_MUXED is mostly used by Super-I/Os drivers. A > Super-I/O is a legacy I/O controller embedded on x86 motherboards. It is > used to connect the low-bandwidth devices such as parallel ports, serial > ports, keyboard controllers, hardware monitoring controllers, GPIO > controllers, etc. While each logical device have its own memory region, > a shared memory region is used for some configuration purpose. > IORESOURCE_MUXED is a convenient way to deal with that. For some code > examples you can look at the superio_* functions in the IT87 drivers: > gpio/gpio-it87.c, hwmon/it87.c and watchdog/it87_wdt.c. > > I am not aware of any other users for IORESOURCE_MUXED. > > Let me know how you want to go and if you need my help. I'd be happy if you proposed a patch. It would also be nice if we could somehow more formally limit the MUXED usage to these super I/O devices, just so other users don't get into trouble thinking it's more general than it is. Thanks, Jesse -- To unsubscribe from this list: send the line "unsubscribe linux-gpio" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> > IORESOURCE_MUXED is a convenient way to deal with that. For some code > > examples you can look at the superio_* functions in the IT87 drivers: > > gpio/gpio-it87.c, hwmon/it87.c and watchdog/it87_wdt.c. > > > > I am not aware of any other users for IORESOURCE_MUXED. > > > > Let me know how you want to go and if you need my help. > > I'd be happy if you proposed a patch. It would also be nice if we could > somehow more formally limit the MUXED usage to these super I/O devices, > just so other users don't get into trouble thinking it's more general > than it is. Today I think the problem wouldn't exist. We'd tell the authors of the drivers to create an mfd or similar to manage the resources and load the appropriate extra goodies. Alan -- To unsubscribe from this list: send the line "unsubscribe linux-gpio" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/kernel/resource.c b/kernel/resource.c index fed052a1bc9f..b8c84804db6a 100644 --- a/kernel/resource.c +++ b/kernel/resource.c @@ -1072,9 +1072,10 @@ struct resource * __request_region(struct resource *parent, if (!conflict) break; if (conflict != parent) { - parent = conflict; - if (!(conflict->flags & IORESOURCE_BUSY)) + if (!(conflict->flags & IORESOURCE_BUSY)) { + parent = conflict; continue; + } } if (conflict->flags & flags & IORESOURCE_MUXED) { add_wait_queue(&muxed_resource_wait, &wait);