[SRU,J,I,F,0/1] Null Pointer issue in nfs code running Ubuntu on IBM Z (LP: 1968096)

Message ID 20220517054600.286384-1-frank.heimes@canonical.com

Message

Frank Heimes May 17, 2022, 5:45 a.m. UTC
BugLink: https://bugs.launchpad.net/bugs/1968096

SRU Justification:

[Impact]

* The kernel crashed under load with a null pointer issue in nfs code:
    [556585.270959] Krnl Code:#0000000000000000: 0000 illegal
                              >0000000000000002: 0000 illegal
                               0000000000000004: 0000 illegal
                               0000000000000006: 0000 illegal
                               0000000000000008: 0000 illegal
                               000000000000000a: 0000 illegal
                               000000000000000c: 0000 illegal
                               000000000000000e: 0000 illegal
    [556585.270967] Call Trace:
    [556585.270982] ([<000003ff80d6fb1a>] rpcauth_lookup_credcache+0x5a/0x300 [sunrpc])
    [556585.270993] [<000003ff80e1182c>] nfs_ctx_key_to_expire+0xec/0x130 [nfs]
    [556585.271004] [<000003ff80e1189c>] nfs_key_timeout_notify+0x2c/0x70 [nfs]
    [556585.271014] [<000003ff80dfdf7e>] nfs_file_write+0x3e/0x320 [nfs]
    [556585.271016] [<00000028165944a8>] new_sync_write+0x118/0x1b0
    [556585.271017] [<0000002816594ee0>] vfs_write+0xb0/0x1b0
    [556585.271019] [<0000002816596a1e>] ksys_pwrite64+0x7e/0xc0
    [556585.271021] [<0000002816bb26b2>] system_call+0x2a6/0x2c8

* Several dumps were generated and shared with Canonical.

* Analysis (done by the kernel team and SEG) points to refcount leaks
  that are already fixed by the following commit:

[Fix]

* ca05cbae2a0468e5d78e9b4605936a8bf5da328b ca05cbae2a04 "NFS: Fix up nfs_ctx_key_to_expire()"
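
For readers unfamiliar with this bug class, the following is a minimal
user-space sketch of how a refcount leak can arise on a lookup path. It is
not the upstream patch and not kernel code: struct cred, cred_get(),
cred_put() and both key_expired_*() helpers are hypothetical names chosen
for illustration only.

```c
/* Hypothetical stand-in for a reference-counted credential object. */
struct cred {
    int refcount;
};

/* Take a reference; the caller now owes exactly one cred_put(). */
static struct cred *cred_get(struct cred *c)
{
    c->refcount++;
    return c;
}

/* Drop a reference previously taken with cred_get(). */
static void cred_put(struct cred *c)
{
    c->refcount--;
}

/* Leaky variant: the early return skips the matching cred_put(). */
static int key_expired_leaky(struct cred *cache, int early_exit)
{
    struct cred *c = cred_get(cache);

    if (early_exit)
        return 0;       /* reference is never released on this path */
    cred_put(c);
    return 1;
}

/* Balanced variant: a single exit releases the reference on every path. */
static int key_expired_balanced(struct cred *cache, int early_exit)
{
    struct cred *c = cred_get(cache);
    int ret = 1;

    if (early_exit)
        ret = 0;
    cred_put(c);
    return ret;
}
```

Calling key_expired_leaky() with early_exit set leaves the counter one
higher than before the call, while key_expired_balanced() leaves it
unchanged. Under load, such an imbalance can accumulate over many calls.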

[Test Case]

* There is unfortunately no reproducer or trigger available for this issue.

* It just happens now and then under higher load.

* Patched kernels (focal 5.4 and bionic 5.4-hwe) were created and
  ran for more than a week in a special staging environment (at IBM)
  without further crashes.

* Hence the test and verification will be done by the IBM Z team.

[Where problems could occur]

* The inode handling could break if the changes
  to the pointers are erroneous.

* Problems with the authentication and/or the credentials could occur
  due to the modifications in put_rpccred, rpc_cred and rpc_auth.

* The expiration of the cached credentials could be harmed as well,
  due to the changes in nfs_ctx_key_to_expire.

* The different pointer arithmetic may cause further issues - wrong
  or null pointer references.

* On the positive side, the original commit was brought upstream by NFS experts.

* A patched test kernel sustained day-long runs under load in a staging
  and test environment.

* The author of the upstream commit/patch is well known in the NFS area.

[Other]

* The Salesforce Case Number 00334334 is associated with this bug.

* Commit ca05cbae2a04 was accepted upstream with 5.16-rc1.

* But commit ca05cbae2a04 was unfortunately not tagged as stable,
  hence it was not picked automatically.

* Since kinetic's (22.10) target kernel is 5.18,
  it will have the patch included,
  hence no dedicated patch request is needed for kinetic.

Trond Myklebust (1):
  NFS: Fix up nfs_ctx_key_to_expire()

 fs/nfs/inode.c         |  4 ++--
 fs/nfs/write.c         | 41 ++++++++++++++++++++++++++++-------------
 include/linux/nfs_fs.h |  2 +-
 3 files changed, 31 insertions(+), 16 deletions(-)

Comments

Tim Gardner May 17, 2022, 12:18 p.m. UTC | #1
Acked-by: Tim Gardner <tim.gardner@canonical.com>

Marcelo Henrique Cerri May 18, 2022, 4:43 p.m. UTC | #2
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512


Acked-by: Marcelo Henrique Cerri <marcelo.cerri@canonical.com>


- --
Regards,
Marcelo
-----BEGIN PGP SIGNATURE-----

iQGzBAEBCgAdFiEExJjLjAfVL0XbfEr56e82LoessAkFAmKFIiEACgkQ6e82Loes
sAmDggwAmqhQrDu02RT/3yu3tSe1x9GMNKic9aT/DXjrSPwRqRzs8v3fprVkt7l6
PvSyg4Hu9A66X1EYCN3Bp2WT/AhdWSUm+kqDuSjaXeQb6lPbWaDcvziDXQD4iJn4
wV7KTEK0VtNXGEYTqrE6UxzP9XzWk07XAlSncyiYzmgkh8URavYeLLOqUjxVRhxG
7NJrXsPP7rh96yqouX2I0eG5SIzsdZ5jHE5QjkBL0C8vY1KwhC8FodNM3SEPP6eL
HPoweX5Lu3lcjjnvtHltWhsFZICPJSS36ZtQPfU/L4yL5rw1zTjjSwjhi2mylp+2
9+y6YF6jtj42mMhibpR0KATcCnSeEQ4qQWJ5beZdXkXBEdj5ePGipbMOCkniGN3T
6lcNY1e7Cs6yE4M3VVP0eP2WWSTqG/NwMETd1UXuooEH141zzudn+UOP0UyLrtb3
jnFBcM3go9RY/PfIQY0b+wEbV7Ys9yprIESlJZzE7MeWwNj7Y7spK/NbyskcVJvE
NCboyP12
=M8hQ
-----END PGP SIGNATURE-----
Bartlomiej Zolnierkiewicz May 18, 2022, 4:43 p.m. UTC | #3
Acked-by: Bartlomiej Zolnierkiewicz <bartlomiej.zolnierkiewicz@canonical.com>
Kleber Souza May 27, 2022, 8:36 a.m. UTC | #4
Applied to focal/impish/jammy:linux.

Thanks,
Kleber