cifs: missing null pointer check in cifs_mount

Message ID	CAH2r5mvxp8OZthKPQGCv82xEkNW+z7SN_QhdRUMnHJ2Fm4pJqA@mail.gmail.com
State	New
Headers	From: Steve French <smfrench@gmail.com>; Date: Tue, 22 Jun 2021 20:17:50 -0500; Subject: [PATCH] cifs: missing null pointer check in cifs_mount; To: CIFS <linux-cifs@vger.kernel.org>; Cc: Paulo Alcantara <pc@cjr.nz>, ronnie sahlberg <ronniesahlberg@gmail.com>
Series	cifs: missing null pointer check in cifs_mount

Steve French June 23, 2021, 1:17 a.m. UTC

We weren't checking if tcon is null before setting dfs path,
although we check for null tcon in an earlier assignment statement.

Addresses-Coverity: 1476411 ("Dereference after null check")
Signed-off-by: Steve French <stfrench@microsoft.com>
---
 fs/cifs/connect.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Aurélien Aptel June 23, 2021, 11:48 a.m. UTC | #1

Steve French <smfrench@gmail.com> writes:
> We weren't checking if tcon is null before setting dfs path,
> although we check for null tcon in an earlier assignment statement.

If tcon is NULL there is no point in continuing in that function, we
should have exited earlier.

If tcon is NULL it means mount_get_conns() failed so presumably rc will
be != 0 and we would goto error.

I don't think this is needed. We could change the existing check after
the loop to this if you really want to be safe:

	if (rc || !tcon)
		goto error;

Cheers,

Paulo Alcantara June 23, 2021, 12:17 p.m. UTC | #2

Agreed.

Steve French June 24, 2021, 12:34 a.m. UTC | #3

updated patch attached with Aurelien's suggestion.

Jeff Layton Aug. 11, 2023, 1:16 p.m. UTC | #4

On Wed, 2021-06-23 at 19:34 -0500, Steve French wrote:
> updated patch attached with Aurelien's suggestion.

I know this patch is ancient and the mainline code has marched on, but
it seems really suspicious to me.

With this, we have cifs_mount returning 0, even though the superblock
hasn't been properly initialized. Is that expected? Shouldn't it return
an error in that case?

The mount handling has morphed considerably since this patch went in, so
I can't really tell whether this was later fixed or not.
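Jeff's concern can be illustrated with a small user-space model of the control flow. This is a sketch under stated assumptions: `struct tcon_model`, `get_conns_model()` and `mount_with_guard_only()` are hypothetical stand-ins, not the real fs/cifs code, and `get_conns_model()` is assumed to report success (rc == 0) while leaving the tcon pointer NULL, the state later found in the RHEL8 vmcore. It only shows that the `if (rc || !tcon) goto error;` guard avoids the NULL dereference while still handing rc == 0 back to the caller.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct cifs_tcon; not the kernel definition. */
struct tcon_model {
	int tid;
};

/*
 * Hypothetical stand-in for mount_get_conns(): it reports success
 * (rc == 0) but leaves *tcon NULL, the state found in the vmcore.
 */
static int get_conns_model(struct tcon_model **tcon)
{
	*tcon = NULL;
	return 0;
}

/* Models the guard Aurelien suggested: if (rc || !tcon) goto error; */
static int mount_with_guard_only(struct tcon_model **out)
{
	struct tcon_model *tcon = NULL;
	int rc = get_conns_model(&tcon);

	if (rc || !tcon)
		goto error;	/* tcon is never dereferenced when NULL... */

	assert(tcon != NULL);	/* the guard guarantees this on the success path */
	*out = tcon;
	return 0;

error:
	*out = NULL;
	return rc;	/* ...but rc is still 0, so the caller sees success */
}
```

In this model, `mount_with_guard_only()` returns 0 with `*out` set to NULL: no crash, but a "successful" mount with nothing usable behind it.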

Paulo Alcantara Aug. 11, 2023, 3:15 p.m. UTC | #5

Jeff Layton <jlayton@kernel.org> writes:

> On Wed, 2021-06-23 at 19:34 -0500, Steve French wrote:
>> updated patch attached with Aurelien's suggestion.
>
> I know this patch is ancient and the mainline code has marched on, but
> it seems really suspicious to me.

Yes, it is.

> With this, we have cifs_mount returning 0, even though the superblock
> hasn't been properly initialized. Is that expected? Shouldn't it return
> an error in that case?

No, that isn't expected.  And yes, if @tcon were ever NULL at that
point, we should be returning an error instead.  Otherwise we'd end up
dereferencing a NULL @tcon while trying to get an inode for the root
dentry later.

However, from a quick look at the old code -- on top of 162004a2f7ef --
I don't see how we'd end up having a NULL @tcon with rc == 0 as
mount_get_conns() would return -errno if it couldn't get a tcon.  Please
correct me if I'm missing something.  Whether it is possible or not,
the NULL @tcon check is certainly missing a 'rc = -ENOENT' or some other
error before bailing out, as you've pointed out.
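The missing error assignment can be sketched with the same kind of user-space model. All names here (`struct tcon_model`, `get_conns_model()`, `mount_checked()`) are hypothetical stand-ins rather than fs/cifs functions; `get_conns_model()` simply returns whatever rc and tcon the caller asks for, so each outcome can be exercised. -ENOENT follows the example above, and any other negative errno would serve the same purpose.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct cifs_tcon; not the kernel definition. */
struct tcon_model {
	int tid;
};

/*
 * Hypothetical stand-in for mount_get_conns(). The parameters let the
 * caller simulate each outcome: an error, a valid tcon, or the
 * "rc == 0 but tcon == NULL" state from the vmcore.
 */
static int get_conns_model(int rc_in, struct tcon_model *tcon_in,
			   struct tcon_model **tcon)
{
	*tcon = tcon_in;
	return rc_in;
}

static int mount_checked(int rc_in, struct tcon_model *tcon_in,
			 struct tcon_model **out)
{
	struct tcon_model *tcon = NULL;
	int rc = get_conns_model(rc_in, tcon_in, &tcon);

	/* A NULL tcon must never be reported as success. */
	if (!rc && !tcon)
		rc = -ENOENT;
	if (rc)
		goto error;

	assert(tcon != NULL);	/* success now implies a usable tcon */
	*out = tcon;
	return 0;

error:
	*out = NULL;
	return rc;
}
```

With this ordering, `mount_checked(0, NULL, &t)` returns -ENOENT instead of 0, a valid tcon still mounts normally, and an earlier error such as -EIO is passed through unchanged.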

> The mount handling has morphed considerably since this patch went in, so
> I can't really tell whether this was later fixed or not.

I don't think there was a follow-up patch for that.

Jeff Layton Aug. 11, 2023, 4:26 p.m. UTC | #6

On Fri, 2023-08-11 at 12:15 -0300, Paulo Alcantara wrote:
> Jeff Layton <jlayton@kernel.org> writes:
> 
> > On Wed, 2021-06-23 at 19:34 -0500, Steve French wrote:
> > > updated patch attached with Aurelien's suggestion.
> > 
> > I know this patch is ancient and the mainline code has marched on, but
> > it seems really suspicious to me.
> 
> Yes, it is.
> 
> > With this, we have cifs_mount returning 0, even though the superblock
> > hasn't been properly initialized. Is that expected? Shouldn't it return
> > an error in that case?
> 
> No, that isn't expected.  And yes, if @tcon were ever NULL at that
> point, we should be returning an error instead.  Otherwise we'd end up
> dereferencing a NULL @tcon while trying to get an inode for the root
> dentry later.
> 
> However, from a quick look at the old code -- on top of 162004a2f7ef --
> I don't see how we'd end up having a NULL @tcon with rc == 0 as
> mount_get_conns() would return -errno if it couldn't get a tcon.  Please
> correct me if I'm missing something.  Whether it is possible or not,
> the NULL @tcon check is certainly missing a 'rc = -ENOENT' or some other
> error before bailing out, as you've pointed out.

Thanks for the confirmation. There were some oopses on RHEL8 kernels
(which are based on 5.14). The stack looked something like this:

PID: 2415716  TASK: ffff937139090000  CPU: 3    COMMAND: "ls"
 #0 [ffff9ef946b23728] machine_kexec at ffffffffac867cfe
 #1 [ffff9ef946b23780] __crash_kexec at ffffffffac9ad94d
 #2 [ffff9ef946b23848] crash_kexec at ffffffffac9ae881
 #3 [ffff9ef946b23860] oops_end at ffffffffac8274f1
 #4 [ffff9ef946b23880] no_context at ffffffffac879a03
 #5 [ffff9ef946b238d8] __bad_area_nosemaphore at ffffffffac879d64
 #6 [ffff9ef946b23920] do_page_fault at ffffffffac87a617
 #7 [ffff9ef946b23950] page_fault at ffffffffad20111e
    [exception RIP: cifs_mount+1126]
    RIP: ffffffffc08e8826  RSP: ffff9ef946b23a00  RFLAGS: 00010246
    RAX: 0000000000000000  RBX: ffff936c8221ea00  RCX: ffff936f8018b320
    RDX: 0000000000000001  RSI: ffff936f8018b420  RDI: ffff936c8221ea00
    RBP: ffff9ef946b23a90   R8: 5346445756535c5c   R9: 6765642e50313031
    R10: 65622e666f6f7267  R11: 0063696c6275505c  R12: ffff9370b192fc00
    R13: ffff936f8018b420  R14: 00000000003097ad  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #8 [ffff9ef946b23a98] cifs_smb3_do_mount at ffffffffc08d11f2 [cifs]
 #9 [ffff9ef946b23aa0] cifs_smb3_do_mount at ffffffffc08d11f2 [cifs]
#10 [ffff9ef946b23b08] smb3_get_tree at ffffffffc0930ae0 [cifs]
#11 [ffff9ef946b23b30] vfs_get_tree at ffffffffacb52365
#12 [ffff9ef946b23b50] fc_mount at ffffffffacb7485e
#13 [ffff9ef946b23b60] vfs_kern_mount at ffffffffacb748ec
#14 [ffff9ef946b23b80] cifs_dfs_do_automount at ffffffffc093552e [cifs]
#15 [ffff9ef946b23bc0] cifs_dfs_d_automount at ffffffffc0935880 [cifs]
#16 [ffff9ef946b23bd0] follow_managed at ffffffffacb5bdaf
#17 [ffff9ef946b23c10] lookup_fast at ffffffffacb5c7e5
#18 [ffff9ef946b23c68] walk_component at ffffffffacb5d258
#19 [ffff9ef946b23cc8] path_lookupat at ffffffffacb5e215
#20 [ffff9ef946b23d28] filename_lookup at ffffffffacb62710
#21 [ffff9ef946b23e40] vfs_statx at ffffffffacb55874
#22 [ffff9ef946b23e98] __do_sys_statx at ffffffffacb5692b
#23 [ffff9ef946b23f38] do_syscall_64 at ffffffffac8043ab
#24 [ffff9ef946b23f50] entry_SYSCALL_64_after_hwframe at ffffffffad2000a9
    RIP: 00007ff6a2637edf  RSP: 00007ffe040017d0  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: 00007ffe04001910  RCX: 00007ff6a2637edf
    RDX: 0000000000000100  RSI: 00007ffe04001910  RDI: 00000000ffffff9c
    RBP: 0000000000000100   R8: 00007ffe040017f0   R9: 00000000ffffff9c
    R10: 0000000000000002  R11: 0000000000000246  R12: 00007ffe040017f0
    R13: 0000000000000000  R14: 0000000000000003  R15: 000055ce1a3ae1b8
    ORIG_RAX: 000000000000014c  CS: 0033  SS: 002b

Analysis of the vmcore by Roberto showed that we had ended up past that
point with tcon==NULL and rc==0.

Steve's patch would have prevented the panic there, but I think the host
would then have ended up with a mount that succeeded despite a broken
superblock. The current code seems a bit less fragile, and I didn't see
any similar brokenness there, but I didn't look too hard either.

In any case, we plan to fix this with a one-off patch in RHEL/CentOS.
Thanks again for the sanity check!

Paulo Alcantara Aug. 11, 2023, 4:49 p.m. UTC | #7

Jeff Layton <jlayton@kernel.org> writes:

> Analysis of the vmcore by Roberto showed that we had ended up past that
> point with tcon==NULL and rc==0.

Interesting.  Thanks for sharing the backtrace!

So it actually ended up with NULL @tcon and rc == 0 while mounting a DFS
link.  Nice catch!

> Steve's patch would have fixed the panic there, but I think the host
> would have ended up with a successful mount, but with a broken
> superblock. The current code seems a bit less fragile, and I didn't see
> any similar brokenness there, but I didn't look too hard either.

Yeah, makes sense.

> In any case, we'll plan to fix this up with a one-off in RHEL/Centos.
> Thanks again for the sanity check!

Would you mind proposing a patch that fixes the above and marking it for
v5.14..v5.15?

Jeff Layton Aug. 11, 2023, 4:58 p.m. UTC | #8

On Fri, 2023-08-11 at 13:49 -0300, Paulo Alcantara wrote:
> Would you mind proposing a patch that fixes the above and marking it for
> v5.14..v5.15?

Sounds good. One of us will make sure that happens too.

Thanks!

Paulo Alcantara Aug. 11, 2023, 5:06 p.m. UTC | #9

Jeff Layton <jlayton@kernel.org> writes:

> On Fri, 2023-08-11 at 13:49 -0300, Paulo Alcantara wrote:
>> > In any case, we'll plan to fix this up with a one-off in RHEL/Centos.
>> > Thanks again for the sanity check!
>> 
>> Would you mind proposing a patch that fixes the above and marking it for
>> v5.14..v5.15?
>
> Sounds good. One of us will make sure that happens too.

Thanks!

Jeff Layton Aug. 15, 2023, 2:25 p.m. UTC | #10

On Fri, 2023-08-11 at 12:58 -0400, Jeff Layton wrote:
> Sounds good. One of us will make sure that happens too.
> 

FWIW, I took a look at v5.15.125 and I don't see the same bug there. It
probably got fixed inadvertently with some other backporting. Looks like
this is only a problem for older, non-stable-series kernels.

The patch I created for RHEL8 is attached though, if you're interested.

Paulo Alcantara Aug. 15, 2023, 6:35 p.m. UTC | #11

Jeff Layton <jlayton@kernel.org> writes:

> FWIW, I took a look at v5.15.125 and I don't see the same bug there. It
> probably got fixed inadvertently with some other backporting. Looks like
> this is only a problem for older, non-stable-series kernels.

Thanks for looking into that!  Really appreciate it.

> The patch I created for RHEL8 is attached though, if you're
> interested.

LGTM.
