Message ID | 1545033396-24485-3-git-send-email-frowand.list@gmail.com (mailing list archive) |
---|---|
State | Not Applicable |
Headers | show |
Series | of: phandle_cache, fix refcounts, remove stale entry | expand |
Context | Check | Description |
---|---|---|
snowpatch_ozlabs/apply_patch | success | next/apply_patch Successfully applied |
snowpatch_ozlabs/build-ppc64le | success | build succeeded & removed 0 sparse warning(s) |
snowpatch_ozlabs/build-ppc64be | success | build succeeded & removed 0 sparse warning(s) |
snowpatch_ozlabs/build-ppc64e | warning | build succeeded but added 1 new sparse warning(s) |
snowpatch_ozlabs/build-pmac32 | warning | build succeeded but added 1 new sparse warning(s) |
snowpatch_ozlabs/checkpatch | success | total: 0 errors, 0 warnings, 0 checks, 65 lines checked |
Hi Frank, frowand.list@gmail.com writes: > From: Frank Rowand <frank.rowand@sony.com> > > Non-overlay dynamic devicetree node removal may leave the node in > the phandle cache. Subsequent calls to of_find_node_by_phandle() > will incorrectly find the stale entry. Remove the node from the > cache. > > Add paranoia checks in of_find_node_by_phandle() as a second level > of defense (do not return cached node if detached, do not add node > to cache if detached). > > Reported-by: Michael Bringmann <mwb@linux.vnet.ibm.com> > Signed-off-by: Frank Rowand <frank.rowand@sony.com> > --- Similarly here can we add: Fixes: 0b3ce78e90fc ("of: cache phandle nodes to reduce cost of of_find_node_by_phandle()") Cc: stable@vger.kernel.org # v4.17+ Thanks for doing this series. Some minor comments below. > diff --git a/drivers/of/base.c b/drivers/of/base.c > index 6c33d63361b8..ad71864cecf5 100644 > --- a/drivers/of/base.c > +++ b/drivers/of/base.c > @@ -162,6 +162,27 @@ int of_free_phandle_cache(void) > late_initcall_sync(of_free_phandle_cache); > #endif > > +/* > + * Caller must hold devtree_lock. > + */ > +void __of_free_phandle_cache_entry(phandle handle) > +{ > + phandle masked_handle; > + > + if (!handle) > + return; We could fold the phandle_cache check into that if and return early for both cases couldn't we? > + masked_handle = handle & phandle_cache_mask; > + > + if (phandle_cache) { Meaning this wouldn't be necessary. > + if (phandle_cache[masked_handle] && > + handle == phandle_cache[masked_handle]->phandle) { > + of_node_put(phandle_cache[masked_handle]); > + phandle_cache[masked_handle] = NULL; > + } A temporary would help the readability here I think, eg: struct device_node *np; np = phandle_cache[masked_handle]; if (np && handle == np->phandle) { of_node_put(np); phandle_cache[masked_handle] = NULL; } > @@ -1209,11 +1230,18 @@ struct device_node *of_find_node_by_phandle(phandle handle) > if (phandle_cache[masked_handle] && > handle == phandle_cache[masked_handle]->phandle) > np = phandle_cache[masked_handle]; > + if (np && of_node_check_flag(np, OF_DETACHED)) { > + WARN_ON(1); > + of_node_put(np); Do we really want to do the put here? We're here because something has gone wrong, possibly even memory corruption such that np is not even pointing at a device node anymore. So it seems like it would be safer to just leave the ref count alone, possibly leak a small amount of memory, and NULL out the reference. cheers
On 12/17/18 2:52 AM, Michael Ellerman wrote: > Hi Frank, > > frowand.list@gmail.com writes: >> From: Frank Rowand <frank.rowand@sony.com> >> >> Non-overlay dynamic devicetree node removal may leave the node in >> the phandle cache. Subsequent calls to of_find_node_by_phandle() >> will incorrectly find the stale entry. Remove the node from the >> cache. >> >> Add paranoia checks in of_find_node_by_phandle() as a second level >> of defense (do not return cached node if detached, do not add node >> to cache if detached). >> >> Reported-by: Michael Bringmann <mwb@linux.vnet.ibm.com> >> Signed-off-by: Frank Rowand <frank.rowand@sony.com> >> --- > > Similarly here can we add: > > Fixes: 0b3ce78e90fc ("of: cache phandle nodes to reduce cost of of_find_node_by_phandle()") Yes, thanks. > Cc: stable@vger.kernel.org # v4.17+ Nope, 0b3ce78e90fc does not belong in stable (it is a feature, not a bug fix). So the bug will not be in stable. I've debated with myself over this, because there is a possibility that 0b3ce78e90fc could somehow be put into a stable despite not being a bug fix. We can always explicitly request this patch series be added to stable in that case. > Thanks for doing this series. > > Some minor comments below. > >> diff --git a/drivers/of/base.c b/drivers/of/base.c >> index 6c33d63361b8..ad71864cecf5 100644 >> --- a/drivers/of/base.c >> +++ b/drivers/of/base.c >> @@ -162,6 +162,27 @@ int of_free_phandle_cache(void) >> late_initcall_sync(of_free_phandle_cache); >> #endif >> >> +/* >> + * Caller must hold devtree_lock. >> + */ >> +void __of_free_phandle_cache_entry(phandle handle) >> +{ >> + phandle masked_handle; >> + >> + if (!handle) >> + return; > > We could fold the phandle_cache check into that if and return early for > both cases couldn't we? We could, but that would make the reason for checking phandle_cache less obvious. I would rather leave that check > >> + masked_handle = handle & phandle_cache_mask; >> + >> + if (phandle_cache) { > > Meaning this wouldn't be necessary. > >> + if (phandle_cache[masked_handle] && >> + handle == phandle_cache[masked_handle]->phandle) { >> + of_node_put(phandle_cache[masked_handle]); >> + phandle_cache[masked_handle] = NULL; >> + } > > A temporary would help the readability here I think, eg: > > struct device_node *np; > np = phandle_cache[masked_handle]; > > if (np && handle == np->phandle) { > of_node_put(np); > phandle_cache[masked_handle] = NULL; > } Yes, much cleaner. >> @@ -1209,11 +1230,18 @@ struct device_node *of_find_node_by_phandle(phandle handle) >> if (phandle_cache[masked_handle] && >> handle == phandle_cache[masked_handle]->phandle) >> np = phandle_cache[masked_handle]; >> + if (np && of_node_check_flag(np, OF_DETACHED)) { >> + WARN_ON(1); >> + of_node_put(np); > > Do we really want to do the put here? > > We're here because something has gone wrong, possibly even memory > corruption such that np is not even pointing at a device node anymore. > So it seems like it would be safer to just leave the ref count alone, > possibly leak a small amount of memory, and NULL out the reference. I like the concept of the code being a little bit paranoid. But the bug that this check is likely to cache is the bug that led to this series -- removing a devicetree node, but failing to remove it from the cache as part of the removal. So I think I'll leave it as is. > > > cheers > Thanks for the thoughts and suggestions! -Frank
On Tue, Dec 18, 2018 at 12:57 PM Frank Rowand <frowand.list@gmail.com> wrote: > > On 12/17/18 2:52 AM, Michael Ellerman wrote: > > Hi Frank, > > > > frowand.list@gmail.com writes: > >> From: Frank Rowand <frank.rowand@sony.com> > >> > >> Non-overlay dynamic devicetree node removal may leave the node in > >> the phandle cache. Subsequent calls to of_find_node_by_phandle() > >> will incorrectly find the stale entry. Remove the node from the > >> cache. > >> > >> Add paranoia checks in of_find_node_by_phandle() as a second level > >> of defense (do not return cached node if detached, do not add node > >> to cache if detached). > >> > >> Reported-by: Michael Bringmann <mwb@linux.vnet.ibm.com> > >> Signed-off-by: Frank Rowand <frank.rowand@sony.com> > >> --- > > > > Similarly here can we add: > > > > Fixes: 0b3ce78e90fc ("of: cache phandle nodes to reduce cost of of_find_node_by_phandle()") > > Yes, thanks. > > > > Cc: stable@vger.kernel.org # v4.17+ > > Nope, 0b3ce78e90fc does not belong in stable (it is a feature, not a bug > fix). So the bug will not be in stable. 0b3ce78e90fc landed in v4.17, so Michael's line above is correct. Annotating it with 4.17 only saves Greg from trying and then emailing us to backport this patch as it wouldn't apply. Rob
On 12/18/18 12:01 PM, Rob Herring wrote: > On Tue, Dec 18, 2018 at 12:57 PM Frank Rowand <frowand.list@gmail.com> wrote: >> >> On 12/17/18 2:52 AM, Michael Ellerman wrote: >>> Hi Frank, >>> >>> frowand.list@gmail.com writes: >>>> From: Frank Rowand <frank.rowand@sony.com> >>>> >>>> Non-overlay dynamic devicetree node removal may leave the node in >>>> the phandle cache. Subsequent calls to of_find_node_by_phandle() >>>> will incorrectly find the stale entry. Remove the node from the >>>> cache. >>>> >>>> Add paranoia checks in of_find_node_by_phandle() as a second level >>>> of defense (do not return cached node if detached, do not add node >>>> to cache if detached). >>>> >>>> Reported-by: Michael Bringmann <mwb@linux.vnet.ibm.com> >>>> Signed-off-by: Frank Rowand <frank.rowand@sony.com> >>>> --- >>> >>> Similarly here can we add: >>> >>> Fixes: 0b3ce78e90fc ("of: cache phandle nodes to reduce cost of of_find_node_by_phandle()") >> >> Yes, thanks. >> >> >>> Cc: stable@vger.kernel.org # v4.17+ >> >> Nope, 0b3ce78e90fc does not belong in stable (it is a feature, not a bug >> fix). So the bug will not be in stable. > > 0b3ce78e90fc landed in v4.17, so Michael's line above is correct. > Annotating it with 4.17 only saves Greg from trying and then emailing > us to backport this patch as it wouldn't apply. Thanks for the correction. I was both under-thinking and over-thinking, ending up with an incorrect answer. Can you add the Cc: to version 3 patch comments (both 1/2 and 2/2) or do you want me to re-spin? -Frank > > Rob >
On 12/18/18 12:09 PM, Frank Rowand wrote: > On 12/18/18 12:01 PM, Rob Herring wrote: >> On Tue, Dec 18, 2018 at 12:57 PM Frank Rowand <frowand.list@gmail.com> wrote: >>> >>> On 12/17/18 2:52 AM, Michael Ellerman wrote: >>>> Hi Frank, >>>> >>>> frowand.list@gmail.com writes: >>>>> From: Frank Rowand <frank.rowand@sony.com> >>>>> >>>>> Non-overlay dynamic devicetree node removal may leave the node in >>>>> the phandle cache. Subsequent calls to of_find_node_by_phandle() >>>>> will incorrectly find the stale entry. Remove the node from the >>>>> cache. >>>>> >>>>> Add paranoia checks in of_find_node_by_phandle() as a second level >>>>> of defense (do not return cached node if detached, do not add node >>>>> to cache if detached). >>>>> >>>>> Reported-by: Michael Bringmann <mwb@linux.vnet.ibm.com> >>>>> Signed-off-by: Frank Rowand <frank.rowand@sony.com> >>>>> --- >>>> >>>> Similarly here can we add: >>>> >>>> Fixes: 0b3ce78e90fc ("of: cache phandle nodes to reduce cost of of_find_node_by_phandle()") >>> >>> Yes, thanks. >>> >>> >>>> Cc: stable@vger.kernel.org # v4.17+ >>> >>> Nope, 0b3ce78e90fc does not belong in stable (it is a feature, not a bug >>> fix). So the bug will not be in stable. >> >> 0b3ce78e90fc landed in v4.17, so Michael's line above is correct. >> Annotating it with 4.17 only saves Greg from trying and then emailing >> us to backport this patch as it wouldn't apply. > > Thanks for the correction. I was both under-thinking and over-thinking, > ending up with an incorrect answer. > > Can you add the Cc: to version 3 patch comments (both 1/2 and 2/2) or do > you want me to re-spin? Now that my thinking has been straightened out, a little bit more checking for the other pre-requisite patches show: v4.18: commit b9952b5218ad ("of: overlay: update phandle cache on overlay apply and remove") v4.19: commit e54192b48da7 ("of: fix phandle cache creation for DTs with no phandles") These can be addressed by changing the "Cc:" to ... # v4.19+ because stable v4.17.* and v4.18.* are end of life. Or the pre-requisites can be listed: # v4.17: b9952b5218ad of: overlay: update phandle cache # v4.17: e54192b48da7 of: fix phandle cache creation # v4.17 # v4.18: e54192b48da7 of: fix phandle cache creation # v4.18 # v4.19+ Do you have a preference? -Frank > > -Frank > >> >> Rob >> > >
On Tue, Dec 18, 2018 at 2:33 PM Frank Rowand <frowand.list@gmail.com> wrote: > > On 12/18/18 12:09 PM, Frank Rowand wrote: > > On 12/18/18 12:01 PM, Rob Herring wrote: > >> On Tue, Dec 18, 2018 at 12:57 PM Frank Rowand <frowand.list@gmail.com> wrote: > >>> > >>> On 12/17/18 2:52 AM, Michael Ellerman wrote: > >>>> Hi Frank, > >>>> > >>>> frowand.list@gmail.com writes: > >>>>> From: Frank Rowand <frank.rowand@sony.com> > >>>>> > >>>>> Non-overlay dynamic devicetree node removal may leave the node in > >>>>> the phandle cache. Subsequent calls to of_find_node_by_phandle() > >>>>> will incorrectly find the stale entry. Remove the node from the > >>>>> cache. > >>>>> > >>>>> Add paranoia checks in of_find_node_by_phandle() as a second level > >>>>> of defense (do not return cached node if detached, do not add node > >>>>> to cache if detached). > >>>>> > >>>>> Reported-by: Michael Bringmann <mwb@linux.vnet.ibm.com> > >>>>> Signed-off-by: Frank Rowand <frank.rowand@sony.com> > >>>>> --- > >>>> > >>>> Similarly here can we add: > >>>> > >>>> Fixes: 0b3ce78e90fc ("of: cache phandle nodes to reduce cost of of_find_node_by_phandle()") > >>> > >>> Yes, thanks. > >>> > >>> > >>>> Cc: stable@vger.kernel.org # v4.17+ > >>> > >>> Nope, 0b3ce78e90fc does not belong in stable (it is a feature, not a bug > >>> fix). So the bug will not be in stable. > >> > >> 0b3ce78e90fc landed in v4.17, so Michael's line above is correct. > >> Annotating it with 4.17 only saves Greg from trying and then emailing > >> us to backport this patch as it wouldn't apply. > > > > Thanks for the correction. I was both under-thinking and over-thinking, > > ending up with an incorrect answer. > > > > Can you add the Cc: to version 3 patch comments (both 1/2 and 2/2) or do > > you want me to re-spin? > > Now that my thinking has been straightened out, a little bit more checking > for the other pre-requisite patches show: > > v4.18: commit b9952b5218ad ("of: overlay: update phandle cache on overlay apply and remove") > v4.19: commit e54192b48da7 ("of: fix phandle cache creation for DTs with no phandles") > > These can be addressed by changing the "Cc:" to ... # v4.19+ > because stable v4.17.* and v4.18.* are end of life. EOL shouldn't factor into it. There's always the possibility someone else picks any kernel version. > Or the pre-requisites can be listed: > > # v4.17: b9952b5218ad of: overlay: update phandle cache > # v4.17: e54192b48da7 of: fix phandle cache creation > # v4.17 > > # v4.18: e54192b48da7 of: fix phandle cache creation > # v4.18 > > # v4.19+ > > Do you have a preference? I think we just list v4.17 and be done with it. Rob
Rob Herring <robh+dt@kernel.org> writes: > On Tue, Dec 18, 2018 at 2:33 PM Frank Rowand <frowand.list@gmail.com> wrote: >> >> On 12/18/18 12:09 PM, Frank Rowand wrote: >> > On 12/18/18 12:01 PM, Rob Herring wrote: >> >> On Tue, Dec 18, 2018 at 12:57 PM Frank Rowand <frowand.list@gmail.com> wrote: >> >>> >> >>> On 12/17/18 2:52 AM, Michael Ellerman wrote: >> >>>> Hi Frank, >> >>>> >> >>>> frowand.list@gmail.com writes: >> >>>>> From: Frank Rowand <frank.rowand@sony.com> >> >>>>> >> >>>>> Non-overlay dynamic devicetree node removal may leave the node in >> >>>>> the phandle cache. Subsequent calls to of_find_node_by_phandle() >> >>>>> will incorrectly find the stale entry. Remove the node from the >> >>>>> cache. >> >>>>> >> >>>>> Add paranoia checks in of_find_node_by_phandle() as a second level >> >>>>> of defense (do not return cached node if detached, do not add node >> >>>>> to cache if detached). >> >>>>> >> >>>>> Reported-by: Michael Bringmann <mwb@linux.vnet.ibm.com> >> >>>>> Signed-off-by: Frank Rowand <frank.rowand@sony.com> >> >>>>> --- >> >>>> >> >>>> Similarly here can we add: >> >>>> >> >>>> Fixes: 0b3ce78e90fc ("of: cache phandle nodes to reduce cost of of_find_node_by_phandle()") >> >>> >> >>> Yes, thanks. >> >>> >> >>> >> >>>> Cc: stable@vger.kernel.org # v4.17+ >> >>> >> >>> Nope, 0b3ce78e90fc does not belong in stable (it is a feature, not a bug >> >>> fix). So the bug will not be in stable. >> >> >> >> 0b3ce78e90fc landed in v4.17, so Michael's line above is correct. >> >> Annotating it with 4.17 only saves Greg from trying and then emailing >> >> us to backport this patch as it wouldn't apply. >> > >> > Thanks for the correction. I was both under-thinking and over-thinking, >> > ending up with an incorrect answer. >> > >> > Can you add the Cc: to version 3 patch comments (both 1/2 and 2/2) or do >> > you want me to re-spin? >> >> Now that my thinking has been straightened out, a little bit more checking >> for the other pre-requisite patches show: >> >> v4.18: commit b9952b5218ad ("of: overlay: update phandle cache on overlay apply and remove") >> v4.19: commit e54192b48da7 ("of: fix phandle cache creation for DTs with no phandles") >> >> These can be addressed by changing the "Cc:" to ... # v4.19+ >> because stable v4.17.* and v4.18.* are end of life. > > EOL shouldn't factor into it. There's always the possibility someone > else picks any kernel version. Yeah, there are other stable branches out there, so the tag should point to the correct version regardless of whether it's currently EOL on kernel.org. >> Or the pre-requisites can be listed: >> >> # v4.17: b9952b5218ad of: overlay: update phandle cache >> # v4.17: e54192b48da7 of: fix phandle cache creation >> # v4.17 >> >> # v4.18: e54192b48da7 of: fix phandle cache creation >> # v4.18 >> >> # v4.19+ >> >> Do you have a preference? > > I think we just list v4.17 and be done with it. Yep, anyone who wants to backport it can work it out, or ask us. cheers
diff --git a/drivers/of/base.c b/drivers/of/base.c index 6c33d63361b8..ad71864cecf5 100644 --- a/drivers/of/base.c +++ b/drivers/of/base.c @@ -162,6 +162,27 @@ int of_free_phandle_cache(void) late_initcall_sync(of_free_phandle_cache); #endif +/* + * Caller must hold devtree_lock. + */ +void __of_free_phandle_cache_entry(phandle handle) +{ + phandle masked_handle; + + if (!handle) + return; + + masked_handle = handle & phandle_cache_mask; + + if (phandle_cache) { + if (phandle_cache[masked_handle] && + handle == phandle_cache[masked_handle]->phandle) { + of_node_put(phandle_cache[masked_handle]); + phandle_cache[masked_handle] = NULL; + } + } +} + void of_populate_phandle_cache(void) { unsigned long flags; @@ -1209,11 +1230,18 @@ struct device_node *of_find_node_by_phandle(phandle handle) if (phandle_cache[masked_handle] && handle == phandle_cache[masked_handle]->phandle) np = phandle_cache[masked_handle]; + if (np && of_node_check_flag(np, OF_DETACHED)) { + WARN_ON(1); + of_node_put(np); + phandle_cache[masked_handle] = NULL; + np = NULL; + } } if (!np) { for_each_of_allnodes(np) - if (np->phandle == handle) { + if (np->phandle == handle && + !of_node_check_flag(np, OF_DETACHED)) { if (phandle_cache) { /* will put when removed from cache */ of_node_get(np); diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c index f4f8ed9b5454..ecea92f68c87 100644 --- a/drivers/of/dynamic.c +++ b/drivers/of/dynamic.c @@ -268,6 +268,9 @@ void __of_detach_node(struct device_node *np) } of_node_set_flag(np, OF_DETACHED); + + /* race with of_find_node_by_phandle() prevented by devtree_lock */ + __of_free_phandle_cache_entry(np->phandle); } /** diff --git a/drivers/of/of_private.h b/drivers/of/of_private.h index 5d1567025358..24786818e32e 100644 --- a/drivers/of/of_private.h +++ b/drivers/of/of_private.h @@ -84,6 +84,10 @@ static inline void __of_detach_node_sysfs(struct device_node *np) {} int of_resolve_phandles(struct device_node *tree); #endif +#if defined(CONFIG_OF_DYNAMIC) +void __of_free_phandle_cache_entry(phandle handle); +#endif + #if defined(CONFIG_OF_OVERLAY) void of_overlay_mutex_lock(void); void of_overlay_mutex_unlock(void);