Message ID | 165603870776.551046.8709990108936497723.stgit@dwillia2-xfh |
---|---|
State | New |
Series | CXL PMEM Region Provisioning |
On Thu, 23 Jun 2022 19:45:07 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> This failing signature:
[..]
> Fixes: 92804edb11f0 ("cxl/pci: Drop @info argument to cxl_hdm_decode_init()")
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

LGTM

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
On Thu, Jun 23, 2022 at 07:45:07PM -0700, Dan Williams wrote:
> This failing signature:
[..]

Looks good.

Reviewed by: Adam Manzanares <a.manzanares@samsung.com>
Adam Manzanares wrote:
> On Thu, Jun 23, 2022 at 07:45:07PM -0700, Dan Williams wrote:
[..]
> Looks good.
>
> Reviewed by: Adam Manzanares <a.manzanares@samsung.com>

Just fyi, b4 did not auto-apply this tag due to the missing "-", caught
it manually.
On Sat, Jul 09, 2022 at 01:06:36PM -0700, Dan Williams wrote:
> Adam Manzanares wrote:
[..]
> > Reviewed by: Adam Manzanares <a.manzanares@samsung.com>
>
> Just fyi, b4 did not auto-apply this tag due to the missing "-", caught
> it manually.

Ouch, thanks for pointing this out. Updated my template.
diff --git a/tools/testing/cxl/test/mock.c b/tools/testing/cxl/test/mock.c
index f1f8c40948c5..bce6a21df0d5 100644
--- a/tools/testing/cxl/test/mock.c
+++ b/tools/testing/cxl/test/mock.c
@@ -208,13 +208,15 @@ int __wrap_cxl_await_media_ready(struct cxl_dev_state *cxlds)
 }
 EXPORT_SYMBOL_NS_GPL(__wrap_cxl_await_media_ready, CXL);
 
-bool __wrap_cxl_hdm_decode_init(struct cxl_dev_state *cxlds,
-				struct cxl_hdm *cxlhdm)
+int __wrap_cxl_hdm_decode_init(struct cxl_dev_state *cxlds,
+			       struct cxl_hdm *cxlhdm)
 {
 	int rc = 0, index;
 	struct cxl_mock_ops *ops = get_cxl_mock_ops(&index);
 
-	if (!ops || !ops->is_mock_dev(cxlds->dev))
+	if (ops && ops->is_mock_dev(cxlds->dev))
+		rc = 0;
+	else
 		rc = cxl_hdm_decode_init(cxlds, cxlhdm);
 	put_cxl_mock_ops(index);
 
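As a rough illustration of the control flow the diff ends up with, here is a user-space sketch. Everything in it is a hypothetical stand-in (`struct device`, `struct mock_ops`, `real_decode_init()`, `wrap_decode_init()`, and the `-6` return value are assumptions made for the example, not the kernel's definitions); it only models "return 0 for a mock device, otherwise pass through the real function's int result".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
struct device {
	bool is_mock;
};

struct mock_ops {
	bool (*is_mock_dev)(const struct device *dev);
};

static bool is_mock_dev(const struct device *dev)
{
	return dev->is_mock;
}

static struct mock_ops registered_ops = { .is_mock_dev = is_mock_dev };

/* Stand-in for the real function: returns an errno-style int.
 * The value -6 is arbitrary and chosen only for the example. */
static int real_decode_init(const struct device *dev)
{
	(void)dev;
	return -6;
}

/* Same shape as the fixed wrapper: int return type, mock case first,
 * otherwise the real function's return value is passed through as-is. */
static int wrap_decode_init(const struct mock_ops *ops,
			    const struct device *dev)
{
	int rc;

	if (ops && ops->is_mock_dev(dev))
		rc = 0;
	else
		rc = real_decode_init(dev);
	return rc;
}

/* Usage: exercise the three cases the wrapper distinguishes. */
static void self_check(void)
{
	struct device mock = { .is_mock = true };
	struct device real = { .is_mock = false };

	assert(wrap_decode_init(&registered_ops, &mock) == 0);
	assert(wrap_decode_init(&registered_ops, &real) == -6);
	assert(wrap_decode_init(NULL, &mock) == -6); /* no ops registered */
}
```

Because the wrapper and the wrapped function now share the same `int` return type, a negative error code from the real function reaches the caller unchanged.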
This failing signature:

 [    8.392669] cxl_bus_probe: cxl_port endpoint2: probe: 970997760
 [    8.392670] cxl_port: probe of endpoint2 failed with error 970997760
 [    8.392719] create_endpoint: cxl_mem mem0: add: endpoint2
 [    8.392721] cxl_mem mem0: endpoint2 failed probe
 [    8.392725] cxl_bus_probe: cxl_mem mem0: probe: -6

...shows cxl_hdm_decode_init() resulting in a return code ("970997760")
that looks like stack corruption. The problem goes away if
cxl_hdm_decode_init() is not mocked via __wrap_cxl_hdm_decode_init().

The corruption results from the mismatch that the calling convention for
cxl_hdm_decode_init() is:

int cxl_hdm_decode_init(struct cxl_dev_state *cxlds, struct cxl_hdm *cxlhdm)

...and __wrap_cxl_hdm_decode_init() is:

bool __wrap_cxl_hdm_decode_init(struct cxl_dev_state *cxlds, struct cxl_hdm *cxlhdm)

...i.e. an int is expected but __wrap_cxl_hdm_decode_init() returns bool.

Fix the convention and clean up the organization to match
__wrap_cxl_await_media_ready() as the difference was a red herring that
distracted from finding the bug.

Fixes: 92804edb11f0 ("cxl/pci: Drop @info argument to cxl_hdm_decode_init()")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 tools/testing/cxl/test/mock.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)
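One way to see why the logged value is consistent with a bool/int mismatch is to split it into bytes. The sketch below assumes an x86-64 build following the System V ABI, where a `bool` return value only defines the low 8 bits of the return register and the caller, expecting `int`, reads all 32. The architecture is an assumption (the patch does not state it), and the helper names `low_byte()` and `upper_bits()` are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* The part a bool-returning function is obliged to set
 * (assumption: x86-64 System V ABI, low 8 bits of the return register). */
static inline uint8_t low_byte(int32_t rc)
{
	return (uint8_t)((uint32_t)rc & 0xffu);
}

/* The remaining 24 bits: whatever the register happened to hold. */
static inline uint32_t upper_bits(int32_t rc)
{
	return (uint32_t)rc >> 8;
}

/* Usage: decompose the return code from the failing log. */
static void self_check(void)
{
	const int32_t logged = 970997760; /* 0x39E04000 */

	assert((uint32_t)logged == 0x39E04000u);
	assert(low_byte(logged) == 0);           /* bool "false", i.e. success */
	assert(upper_bits(logged) == 0x39E040u); /* leftover register contents */
}
```

The low byte of 970997760 is 0, which matches a wrapper that returned `false` (success) while the upper bits were never written. Read as an `int`, that is a large nonzero value, so the probe treats it as a failure.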