Message ID | 20220517054600.286384-1-frank.heimes@canonical.com |
---|---|
Series | Null Pointer issue in nfs code running Ubuntu on IBM Z (LP: 1968096) |
Acked-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Marcelo Henrique Cerri <marcelo.cerri@canonical.com>

Regards,
Marcelo
Acked-by: Bartlomiej Zolnierkiewicz <bartlomiej.zolnierkiewicz@canonical.com>
Applied to focal/impish/jammy:linux.

Thanks,
Kleber
BugLink: https://bugs.launchpad.net/bugs/1968096

SRU Justification:

[Impact]

 * The kernel crashed under load with a null pointer issue in nfs code:
   [556585.270959] Krnl Code:#0000000000000000: 0000 illegal
                             >0000000000000002: 0000 illegal
                              0000000000000004: 0000 illegal
                              0000000000000006: 0000 illegal
                              0000000000000008: 0000 illegal
                              000000000000000a: 0000 illegal
                              000000000000000c: 0000 illegal
                              000000000000000e: 0000 illegal
   [556585.270967] Call Trace:
   [556585.270982] ([<000003ff80d6fb1a>] rpcauth_lookup_credcache+0x5a/0x300 [sunrpc])
   [556585.270993]  [<000003ff80e1182c>] nfs_ctx_key_to_expire+0xec/0x130 [nfs]
   [556585.271004]  [<000003ff80e1189c>] nfs_key_timeout_notify+0x2c/0x70 [nfs]
   [556585.271014]  [<000003ff80dfdf7e>] nfs_file_write+0x3e/0x320 [nfs]
   [556585.271016]  [<00000028165944a8>] new_sync_write+0x118/0x1b0
   [556585.271017]  [<0000002816594ee0>] vfs_write+0xb0/0x1b0
   [556585.271019]  [<0000002816596a1e>] ksys_pwrite64+0x7e/0xc0
   [556585.271021]  [<0000002816bb26b2>] system_call+0x2a6/0x2c8

 * Several dumps were generated and shared with Canonical.

 * Analysis (done by kernel and SEG) points to refcount leaks
   that are already fixed in the following commit:

[Fix]

 * ca05cbae2a0468e5d78e9b4605936a8bf5da328b ca05cbae2a04 "NFS: Fix up nfs_ctx_key_to_expire()"

[Test Case]

 * There is unfortunately no reproducer or trigger available for this issue.

 * It just happens now and then under higher load.

 * Patched kernels (focal 5.4 and bionic 5.4-hwe) were created and
   ran for more than a week in a special staging environment (at IBM)
   without further crashes.

 * Hence the test and verification will be done by the IBM Z team.

[Where problems could occur]

 * The inode handling can become broken if the changes
   to the pointers are erroneous.

 * Problems with the authentication and/or the credentials could occur
   due to the modifications in put_rpccred, rpc_cred and rpc_auth.

 * The expiration of the cached credentials could be harmed as well,
   due to the changes in nfs_ctx_key_to_expire.

 * The different pointer arithmetic may cause further issues - wrong
   or null pointer references.

 * On the positive side, the original commit was brought upstream by NFS experts.

 * A patched test kernel sustained day-long runs under load in a staging
   and test environment.

 * The author of the upstream commit/patch is well known in the NFS area.

[Other]

 * The Salesforce Case Number 00334334 is associated with this bug.

 * Commit ca05cbae2a04 was accepted upstream with 5.16-rc1.

 * But commit ca05cbae2a04 was unfortunately not tagged as stable,
   hence it was not picked automatically.

 * Since kinetic's (22.10) target kernel is 5.18,
   it will have the patch included,
   hence no dedicated PATCH request for kinetic.

Trond Myklebust (1):
  NFS: Fix up nfs_ctx_key_to_expire()

 fs/nfs/inode.c         |  4 ++--
 fs/nfs/write.c         | 41 ++++++++++++++++++++++++++++-------------
 include/linux/nfs_fs.h |  2 +-
 3 files changed, 31 insertions(+), 16 deletions(-)
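
For readers unfamiliar with the class of problem named in [Impact], the following is a minimal userspace sketch of how a reference count can leak when a cached, reference-counted pointer is replaced. It is not the kernel code and not the content of commit ca05cbae2a04; every name in it (struct cred, struct ctx, cred_get, cred_put, ctx_replace_leaky, ctx_replace_balanced) is hypothetical and chosen only for illustration. It uses C11 atomics as a stand-in for the kernel's own primitives.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a reference-counted credential. */
struct cred {
	atomic_int refs;
};

/* Hypothetical stand-in for an open context that caches one credential. */
struct ctx {
	_Atomic(struct cred *) cached;
};

/* A freshly created credential is owned by its creator (one reference). */
static void cred_init(struct cred *c)
{
	atomic_init(&c->refs, 1);
}

/* Take an additional reference; NULL is passed through unchanged. */
static struct cred *cred_get(struct cred *c)
{
	if (c)
		atomic_fetch_add(&c->refs, 1);
	return c;
}

/* Drop one reference; NULL is ignored. */
static void cred_put(struct cred *c)
{
	if (!c)
		return;
	int before = atomic_fetch_sub(&c->refs, 1);
	assert(before > 0); /* dropping below zero would be a bug */
}

static int cred_refs(struct cred *c)
{
	return atomic_load(&c->refs);
}

/*
 * Leaky variant: the cache slot is overwritten, but the reference the
 * slot held on the previous credential is never dropped.
 */
static void ctx_replace_leaky(struct ctx *x, struct cred *new_cred)
{
	atomic_store(&x->cached, cred_get(new_cred));
}

/*
 * Balanced variant: swap the slot in a single atomic step, then drop
 * the reference that the slot held on whatever was there before.
 */
static void ctx_replace_balanced(struct ctx *x, struct cred *new_cred)
{
	struct cred *old = atomic_exchange(&x->cached, cred_get(new_cred));

	cred_put(old);
}
```

With the balanced variant, each credential's count returns to the creator's single reference once it is no longer cached. With the leaky variant, the previously cached credential keeps an extra reference that nothing will ever drop.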