Message ID | 20191030035006.31696-1-matthew.ruffell@canonical.com |
---|---|
Headers | show |
Series | SUNRPC: Use after free when GSSD credentials are invalid causes oops | expand |
On Wed, Oct 30, 2019 at 04:50:04PM +1300, Matthew Ruffell wrote: > BugLink: https://bugs.launchpad.net/bugs/1842037 > > [Impact] > > There is a use after free which normally causes a null pointer dereference when > NFS clients send invalid credentials via GSSD to a NFS server which has shares > protected by kerberos krb5* security. > > The call trace is below: > > BUG: unable to handle kernel NULL pointer dereference at 0000000000000098 > Call Trace: > rpc_check_timeout+0x22/0xe0 [sunrpc] > call_decode+0x12c/0x190 [sunrpc] > __rpc_execute+0x7a/0x340 [sunrpc] > rpc_execute+0xa3/0xb0 [sunrpc] > rpc_run_task+0xf7/0x140 [sunrpc] > nfs4_proc_get_lease_time+0xf3/0x130 [nfsv4] > nfs41_setup_state_renewal+0x3d/0x90 [nfsv4] > ? nfs4_realloc_slot_table+0x5b/0x130 [nfsv4] > ? nfs4_setup_session_slot_tables+0x77/0xc0 [nfsv4] > nfs41_finish_session_reset+0x26/0x30 [nfsv4] > nfs41_init_clientid+0x44/0x70 [nfsv4] > nfs4_establish_lease+0x61/0xa0 [nfsv4] > ? nfs4_handle_reclaim_lease_error+0x108/0x130 [nfsv4] > nfs4_state_manager+0x1b1/0x750 [nfsv4] > ? kernel_sigaction+0x43/0xe0 > nfs4_run_state_manager+0x24/0x40 [nfsv4] > kthread+0x120/0x140 > ? nfs4_state_manager+0x750/0x750 [nfsv4] > ? __kthread_parkme+0x70/0x70 > ret_from_fork+0x35/0x40 > > This then causes outages on a heavily trafficked NFS server and no one can > access their shares until the server is rebooted. > > [Fix] > > The fix comes in the following two commits: > > commit cea57789e4081870ac3498fbefabbbd0d0fd8434 > Author: Trond Myklebust <trond.myklebust@hammerspace.com> > Date: Sat Mar 9 16:06:47 2019 -0500 > Subject: SUNRPC: Clean up > > commit 7987b694ade8cc465ce10fb3dceaa614f13ceaf3 > Author: Trond Myklebust <trond.myklebust@hammerspace.com> > Date: Wed May 29 12:49:52 2019 -0400 > Subject: SUNRPC: Fix a use after free when a server rejects the RPCSEC_GSS > credential > > There is a small subtlety to be addressed here. > 7987b694ade8cc465ce10fb3dceaa614f13ceaf3 is marked as "Fixes" > cea57789e4081870ac3498fbefabbbd0d0fd8434, and > cea57789e4081870ac3498fbefabbbd0d0fd8434 is not present in the disco kernel. > The code path which triggers the use after free can still be reached without > cea57789e4081870ac3498fbefabbbd0d0fd8434 being present, and applying both > commits resolves the problem, while still maintaining as little backporting as > necessary. > > Please note, both commits have been backported. > > cea57789e4081870ac3498fbefabbbd0d0fd8434 required a small change to the final > comment, as well as some minor context adjustments in the final hunk. > > 7987b694ade8cc465ce10fb3dceaa614f13ceaf3 required medium to heavy backporting. > The upstream patch uses a switch statement based on error codes, while the disco > kernel uses a goto and label type architecture. The commits in the middle were > numerous and completely unrelated, so I backported the commit to use goto and > labels. Please review this backport a little more closely than you normally do, > but it has been tested, and I believe the code to be sound. > > cea57789e4081870ac3498fbefabbbd0d0fd8434 landed in 5.1 upstream. > 7987b694ade8cc465ce10fb3dceaa614f13ceaf3 landed in 5.2 upstream. > > 7987b694ade8cc465ce10fb3dceaa614f13ceaf3 was also selected for upstream -stable, > in 5.1.9, and was omitted from disco stable updates process, probably because of > the patch not cleanly applying and requiring the backport and infrastructure > provided in cea57789e4081870ac3498fbefabbbd0d0fd8434. > > [Testcase] > > This is difficult to reproduce on test systems, and has instead been verified on > a production NFS v4.1 system in a customer environment. This server is heavily > trafficked and has a large number of different NFS clients connected to it. > > I have built a test kernel that contains the above patch, and also patches for > Bug 1828978. It is available here: > > https://launchpad.net/~mruffell/+archive/ubuntu/sf241068-test > > Note that the above kernel is for bionic HWE, and not explicitly disco. > > Discussion about the patch validation can be found at the bottom of Bug 1842037. > > On unpatched kernels, expect to see the symptoms mentioned in Impact, and on > patched systems, everything working as intended. > > [Regression Potential] > > The changes are limited to users of sunrpc and the change itself is limited to > cases where the RPCSEC_GSS credential is rejected. Under blue skies scenarios, > the code should only be triggered when misbehaving clients do not keep their > authentication tickets up to date. > > In case of regression, misbehaving clients may cause outages on services which > use sunrpc. In which case, the server administrator would have to revert to an > earlier kernel in order to restore those services, as you cannot stop > misbehaving clients from connecting. > > Since the fix was selected for upstream stable, it has been vetted and widely > accepted by the community. The backport I performed should have no functional > difference at all. > > Trond Myklebust (2): > SUNRPC: Clean up > SUNRPC: Fix a use after free when a server rejects the RPCSEC_GSS > credential > > net/sunrpc/clnt.c | 76 +++++++++++++++++++---------------------------- > 1 file changed, 30 insertions(+), 46 deletions(-) Makes sense to me and backports look coorect. Acked-by: Andrea Righi <andrea.righi@canonical.com>
On 30.10.19 04:50, Matthew Ruffell wrote: > BugLink: https://bugs.launchpad.net/bugs/1842037 > > [Impact] > > There is a use after free which normally causes a null pointer dereference when > NFS clients send invalid credentials via GSSD to a NFS server which has shares > protected by kerberos krb5* security. > > The call trace is below: > > BUG: unable to handle kernel NULL pointer dereference at 0000000000000098 > Call Trace: > rpc_check_timeout+0x22/0xe0 [sunrpc] > call_decode+0x12c/0x190 [sunrpc] > __rpc_execute+0x7a/0x340 [sunrpc] > rpc_execute+0xa3/0xb0 [sunrpc] > rpc_run_task+0xf7/0x140 [sunrpc] > nfs4_proc_get_lease_time+0xf3/0x130 [nfsv4] > nfs41_setup_state_renewal+0x3d/0x90 [nfsv4] > ? nfs4_realloc_slot_table+0x5b/0x130 [nfsv4] > ? nfs4_setup_session_slot_tables+0x77/0xc0 [nfsv4] > nfs41_finish_session_reset+0x26/0x30 [nfsv4] > nfs41_init_clientid+0x44/0x70 [nfsv4] > nfs4_establish_lease+0x61/0xa0 [nfsv4] > ? nfs4_handle_reclaim_lease_error+0x108/0x130 [nfsv4] > nfs4_state_manager+0x1b1/0x750 [nfsv4] > ? kernel_sigaction+0x43/0xe0 > nfs4_run_state_manager+0x24/0x40 [nfsv4] > kthread+0x120/0x140 > ? nfs4_state_manager+0x750/0x750 [nfsv4] > ? __kthread_parkme+0x70/0x70 > ret_from_fork+0x35/0x40 > > This then causes outages on a heavily trafficked NFS server and no one can > access their shares until the server is rebooted. > > [Fix] > > The fix comes in the following two commits: > > commit cea57789e4081870ac3498fbefabbbd0d0fd8434 > Author: Trond Myklebust <trond.myklebust@hammerspace.com> > Date: Sat Mar 9 16:06:47 2019 -0500 > Subject: SUNRPC: Clean up > > commit 7987b694ade8cc465ce10fb3dceaa614f13ceaf3 > Author: Trond Myklebust <trond.myklebust@hammerspace.com> > Date: Wed May 29 12:49:52 2019 -0400 > Subject: SUNRPC: Fix a use after free when a server rejects the RPCSEC_GSS > credential > > There is a small subtlety to be addressed here. > 7987b694ade8cc465ce10fb3dceaa614f13ceaf3 is marked as "Fixes" > cea57789e4081870ac3498fbefabbbd0d0fd8434, and > cea57789e4081870ac3498fbefabbbd0d0fd8434 is not present in the disco kernel. > The code path which triggers the use after free can still be reached without > cea57789e4081870ac3498fbefabbbd0d0fd8434 being present, and applying both > commits resolves the problem, while still maintaining as little backporting as > necessary. > > Please note, both commits have been backported. > > cea57789e4081870ac3498fbefabbbd0d0fd8434 required a small change to the final > comment, as well as some minor context adjustments in the final hunk. > > 7987b694ade8cc465ce10fb3dceaa614f13ceaf3 required medium to heavy backporting. > The upstream patch uses a switch statement based on error codes, while the disco > kernel uses a goto and label type architecture. The commits in the middle were > numerous and completely unrelated, so I backported the commit to use goto and > labels. Please review this backport a little more closely than you normally do, > but it has been tested, and I believe the code to be sound. > > cea57789e4081870ac3498fbefabbbd0d0fd8434 landed in 5.1 upstream. > 7987b694ade8cc465ce10fb3dceaa614f13ceaf3 landed in 5.2 upstream. > > 7987b694ade8cc465ce10fb3dceaa614f13ceaf3 was also selected for upstream -stable, > in 5.1.9, and was omitted from disco stable updates process, probably because of > the patch not cleanly applying and requiring the backport and infrastructure > provided in cea57789e4081870ac3498fbefabbbd0d0fd8434. > > [Testcase] > > This is difficult to reproduce on test systems, and has instead been verified on > a production NFS v4.1 system in a customer environment. This server is heavily > trafficked and has a large number of different NFS clients connected to it. > > I have built a test kernel that contains the above patch, and also patches for > Bug 1828978. It is available here: > > https://launchpad.net/~mruffell/+archive/ubuntu/sf241068-test > > Note that the above kernel is for bionic HWE, and not explicitly disco. > > Discussion about the patch validation can be found at the bottom of Bug 1842037. > > On unpatched kernels, expect to see the symptoms mentioned in Impact, and on > patched systems, everything working as intended. > > [Regression Potential] > > The changes are limited to users of sunrpc and the change itself is limited to > cases where the RPCSEC_GSS credential is rejected. Under blue skies scenarios, > the code should only be triggered when misbehaving clients do not keep their > authentication tickets up to date. > > In case of regression, misbehaving clients may cause outages on services which > use sunrpc. In which case, the server administrator would have to revert to an > earlier kernel in order to restore those services, as you cannot stop > misbehaving clients from connecting. > > Since the fix was selected for upstream stable, it has been vetted and widely > accepted by the community. The backport I performed should have no functional > difference at all. > > Trond Myklebust (2): > SUNRPC: Clean up > SUNRPC: Fix a use after free when a server rejects the RPCSEC_GSS > credential > > net/sunrpc/clnt.c | 76 +++++++++++++++++++---------------------------- > 1 file changed, 30 insertions(+), 46 deletions(-) > Acked-by: Stefan Bader <stefan.bader@canonical.com>
On 2019-10-30 16:50:04 , Matthew Ruffell wrote: > BugLink: https://bugs.launchpad.net/bugs/1842037 > > [Impact] > > There is a use after free which normally causes a null pointer dereference when > NFS clients send invalid credentials via GSSD to a NFS server which has shares > protected by kerberos krb5* security. > > The call trace is below: > > BUG: unable to handle kernel NULL pointer dereference at 0000000000000098 > Call Trace: > rpc_check_timeout+0x22/0xe0 [sunrpc] > call_decode+0x12c/0x190 [sunrpc] > __rpc_execute+0x7a/0x340 [sunrpc] > rpc_execute+0xa3/0xb0 [sunrpc] > rpc_run_task+0xf7/0x140 [sunrpc] > nfs4_proc_get_lease_time+0xf3/0x130 [nfsv4] > nfs41_setup_state_renewal+0x3d/0x90 [nfsv4] > ? nfs4_realloc_slot_table+0x5b/0x130 [nfsv4] > ? nfs4_setup_session_slot_tables+0x77/0xc0 [nfsv4] > nfs41_finish_session_reset+0x26/0x30 [nfsv4] > nfs41_init_clientid+0x44/0x70 [nfsv4] > nfs4_establish_lease+0x61/0xa0 [nfsv4] > ? nfs4_handle_reclaim_lease_error+0x108/0x130 [nfsv4] > nfs4_state_manager+0x1b1/0x750 [nfsv4] > ? kernel_sigaction+0x43/0xe0 > nfs4_run_state_manager+0x24/0x40 [nfsv4] > kthread+0x120/0x140 > ? nfs4_state_manager+0x750/0x750 [nfsv4] > ? __kthread_parkme+0x70/0x70 > ret_from_fork+0x35/0x40 > > This then causes outages on a heavily trafficked NFS server and no one can > access their shares until the server is rebooted. > > [Fix] > > The fix comes in the following two commits: > > commit cea57789e4081870ac3498fbefabbbd0d0fd8434 > Author: Trond Myklebust <trond.myklebust@hammerspace.com> > Date: Sat Mar 9 16:06:47 2019 -0500 > Subject: SUNRPC: Clean up > > commit 7987b694ade8cc465ce10fb3dceaa614f13ceaf3 > Author: Trond Myklebust <trond.myklebust@hammerspace.com> > Date: Wed May 29 12:49:52 2019 -0400 > Subject: SUNRPC: Fix a use after free when a server rejects the RPCSEC_GSS > credential > > There is a small subtlety to be addressed here. > 7987b694ade8cc465ce10fb3dceaa614f13ceaf3 is marked as "Fixes" > cea57789e4081870ac3498fbefabbbd0d0fd8434, and > cea57789e4081870ac3498fbefabbbd0d0fd8434 is not present in the disco kernel. > The code path which triggers the use after free can still be reached without > cea57789e4081870ac3498fbefabbbd0d0fd8434 being present, and applying both > commits resolves the problem, while still maintaining as little backporting as > necessary. > > Please note, both commits have been backported. > > cea57789e4081870ac3498fbefabbbd0d0fd8434 required a small change to the final > comment, as well as some minor context adjustments in the final hunk. > > 7987b694ade8cc465ce10fb3dceaa614f13ceaf3 required medium to heavy backporting. > The upstream patch uses a switch statement based on error codes, while the disco > kernel uses a goto and label type architecture. The commits in the middle were > numerous and completely unrelated, so I backported the commit to use goto and > labels. Please review this backport a little more closely than you normally do, > but it has been tested, and I believe the code to be sound. > > cea57789e4081870ac3498fbefabbbd0d0fd8434 landed in 5.1 upstream. > 7987b694ade8cc465ce10fb3dceaa614f13ceaf3 landed in 5.2 upstream. > > 7987b694ade8cc465ce10fb3dceaa614f13ceaf3 was also selected for upstream -stable, > in 5.1.9, and was omitted from disco stable updates process, probably because of > the patch not cleanly applying and requiring the backport and infrastructure > provided in cea57789e4081870ac3498fbefabbbd0d0fd8434. > > [Testcase] > > This is difficult to reproduce on test systems, and has instead been verified on > a production NFS v4.1 system in a customer environment. This server is heavily > trafficked and has a large number of different NFS clients connected to it. > > I have built a test kernel that contains the above patch, and also patches for > Bug 1828978. It is available here: > > https://launchpad.net/~mruffell/+archive/ubuntu/sf241068-test > > Note that the above kernel is for bionic HWE, and not explicitly disco. > > Discussion about the patch validation can be found at the bottom of Bug 1842037. > > On unpatched kernels, expect to see the symptoms mentioned in Impact, and on > patched systems, everything working as intended. > > [Regression Potential] > > The changes are limited to users of sunrpc and the change itself is limited to > cases where the RPCSEC_GSS credential is rejected. Under blue skies scenarios, > the code should only be triggered when misbehaving clients do not keep their > authentication tickets up to date. > > In case of regression, misbehaving clients may cause outages on services which > use sunrpc. In which case, the server administrator would have to revert to an > earlier kernel in order to restore those services, as you cannot stop > misbehaving clients from connecting. > > Since the fix was selected for upstream stable, it has been vetted and widely > accepted by the community. The backport I performed should have no functional > difference at all. > > Trond Myklebust (2): > SUNRPC: Clean up > SUNRPC: Fix a use after free when a server rejects the RPCSEC_GSS > credential > > net/sunrpc/clnt.c | 76 +++++++++++++++++++---------------------------- > 1 file changed, 30 insertions(+), 46 deletions(-) > > -- > 2.20.1 > > > -- > kernel-team mailing list > kernel-team@lists.ubuntu.com > https://lists.ubuntu.com/mailman/listinfo/kernel-team