Message ID | CAH2r5muCMfv4HuPw6sEgKj3Ude3cz+-NRdxDXpJr3CNtUAnm7A@mail.gmail.com |
---|---|
State | New |
Headers | show |
Series | [SMB3,client] fix oplock breaks when using multichannel | expand |
On Fri, 28 Oct 2022 at 16:25, Steve French <smfrench@gmail.com> wrote: > > If a mount to a server is using multichannel, an oplock break arriving > on a secondary channel won't find the open file (since it won't find the > tcon for it), and this will cause each oplock break on secondary channels > to time out, slowing performance drastically. > > Fix smb2_is_valid_oplock_break so that if it is a secondary channel and > an oplock break was not found, check for tcons (and the files hanging > off the tcons) on the primary channel. Does this also happen against windows or is is only against ksmbd this triggers? What does MS-SMB2.pdf say about channels and oplocks? > > Fixes xfstest generic/013 to ksmbd > > Cc: <stable@vger.kernel.org> > > > -- > Thanks, > > Steve
On Fri, 28 Oct 2022 at 16:53, Steve French <smfrench@gmail.com> wrote: > > I haven't tried a scenario to windows where we turn off leases and force server to use oplocks but with ksmbd that is the default. > Worth also investigating how primary vs secondary works for finding leases for windows case Yes. Until we know what/how windows does things and what ms-smb2.pdf says we can not know if this is a cifs client bug or a ksmbd bug. > > On Fri, Oct 28, 2022, 01:48 ronnie sahlberg <ronniesahlberg@gmail.com> wrote: >> >> On Fri, 28 Oct 2022 at 16:25, Steve French <smfrench@gmail.com> wrote: >> > >> > If a mount to a server is using multichannel, an oplock break arriving >> > on a secondary channel won't find the open file (since it won't find the >> > tcon for it), and this will cause each oplock break on secondary channels >> > to time out, slowing performance drastically. >> > >> > Fix smb2_is_valid_oplock_break so that if it is a secondary channel and >> > an oplock break was not found, check for tcons (and the files hanging >> > off the tcons) on the primary channel. >> >> Does this also happen against windows or is is only against ksmbd this triggers? >> What does MS-SMB2.pdf say about channels and oplocks? >> >> > >> > Fixes xfstest generic/013 to ksmbd >> > >> > Cc: <stable@vger.kernel.org> >> > >> > >> > -- >> > Thanks, >> > >> > Steve
On Fri, 28 Oct 2022 at 16:55, ronnie sahlberg <ronniesahlberg@gmail.com> wrote: > > On Fri, 28 Oct 2022 at 16:53, Steve French <smfrench@gmail.com> wrote: > > > > I haven't tried a scenario to windows where we turn off leases and force server to use oplocks but with ksmbd that is the default. > > Worth also investigating how primary vs secondary works for finding leases for windows case > > Yes. Until we know what/how windows does things and what ms-smb2.pdf > says we can not know if this is a cifs client bug or a ksmbd bug. So lets wait with this patch until we know where the bug is. I will review it later for locking correctness, but lets make sure it is not a ksmbd bug first. > > > > > > On Fri, Oct 28, 2022, 01:48 ronnie sahlberg <ronniesahlberg@gmail.com> wrote: > >> > >> On Fri, 28 Oct 2022 at 16:25, Steve French <smfrench@gmail.com> wrote: > >> > > >> > If a mount to a server is using multichannel, an oplock break arriving > >> > on a secondary channel won't find the open file (since it won't find the > >> > tcon for it), and this will cause each oplock break on secondary channels > >> > to time out, slowing performance drastically. > >> > > >> > Fix smb2_is_valid_oplock_break so that if it is a secondary channel and > >> > an oplock break was not found, check for tcons (and the files hanging > >> > off the tcons) on the primary channel. > >> > >> Does this also happen against windows or is is only against ksmbd this triggers? > >> What does MS-SMB2.pdf say about channels and oplocks? > >> > >> > > >> > Fixes xfstest generic/013 to ksmbd > >> > > >> > Cc: <stable@vger.kernel.org> > >> > > >> > > >> > -- > >> > Thanks, > >> > > >> > Steve
On Fri, Oct 28, 2022 at 2:49 PM ronnie sahlberg <ronniesahlberg@gmail.com> wrote: > > On Fri, 28 Oct 2022 at 16:55, ronnie sahlberg <ronniesahlberg@gmail.com> wrote: > > > > On Fri, 28 Oct 2022 at 16:53, Steve French <smfrench@gmail.com> wrote: > > > > > > I haven't tried a scenario to windows where we turn off leases and force server to use oplocks but with ksmbd that is the default. > > > Worth also investigating how primary vs secondary works for finding leases for windows case > > > > Yes. Until we know what/how windows does things and what ms-smb2.pdf > > says we can not know if this is a cifs client bug or a ksmbd bug. > > So lets wait with this patch until we know where the bug is. > I will review it later for locking correctness, but lets make sure it > is not a ksmbd bug first. > > > > > > > > > > > > On Fri, Oct 28, 2022, 01:48 ronnie sahlberg <ronniesahlberg@gmail.com> wrote: > > >> > > >> On Fri, 28 Oct 2022 at 16:25, Steve French <smfrench@gmail.com> wrote: > > >> > > > >> > If a mount to a server is using multichannel, an oplock break arriving > > >> > on a secondary channel won't find the open file (since it won't find the > > >> > tcon for it), and this will cause each oplock break on secondary channels > > >> > to time out, slowing performance drastically. > > >> > > > >> > Fix smb2_is_valid_oplock_break so that if it is a secondary channel and > > >> > an oplock break was not found, check for tcons (and the files hanging > > >> > off the tcons) on the primary channel. > > >> > > >> Does this also happen against windows or is is only against ksmbd this triggers? > > >> What does MS-SMB2.pdf say about channels and oplocks? > > >> > > >> > > > >> > Fixes xfstest generic/013 to ksmbd > > >> > > > >> > Cc: <stable@vger.kernel.org> > > >> > > > >> > > > >> > -- > > >> > Thanks, > > >> > > > >> > Steve This is a valid bug, and needs to be fixed. Here's my version of the fix. Few more places needed corrections. Please review: https://github.com/sprasad-microsoft/smb3-kernel-client/commit/2b57cec4d03464f4875671d4146e75fb41476941.patch I think I know why this is an issue only for oplocks and not leases. Both is_valid_oplock_break and smb2_is_valid_oplock_break pass server (on which the notification was received) as an arg. However, for some strange reason, smb2_is_valid_lease_break does not. That is what saved lease break from this bug. I think passing the server as an arg even for lease break check will save some unnecessary iterations. I'm seeing similar behaviour in couple of other places. Fixed them here: https://github.com/sprasad-microsoft/smb3-kernel-client/commit/6bfba4f4939afb36364e6c4f4f8ca979cd526729.patch Thoughts?
2022-10-28 18:19 GMT+09:00, ronnie sahlberg <ronniesahlberg@gmail.com>: > On Fri, 28 Oct 2022 at 16:55, ronnie sahlberg <ronniesahlberg@gmail.com> > wrote: >> >> On Fri, 28 Oct 2022 at 16:53, Steve French <smfrench@gmail.com> wrote: >> > >> > I haven't tried a scenario to windows where we turn off leases and force >> > server to use oplocks but with ksmbd that is the default. >> > Worth also investigating how primary vs secondary works for finding >> > leases for windows case >> >> Yes. Until we know what/how windows does things and what ms-smb2.pdf >> says we can not know if this is a cifs client bug or a ksmbd bug. > > So lets wait with this patch until we know where the bug is. > I will review it later for locking correctness, but lets make sure it > is not a ksmbd bug first. We can reproduce it against samba with the following parameters. server multi channel support = yes oplock break wait time = 35000 smb2 leases = no > > >> >> >> > >> > On Fri, Oct 28, 2022, 01:48 ronnie sahlberg <ronniesahlberg@gmail.com> >> > wrote: >> >> >> >> On Fri, 28 Oct 2022 at 16:25, Steve French <smfrench@gmail.com> wrote: >> >> > >> >> > If a mount to a server is using multichannel, an oplock break >> >> > arriving >> >> > on a secondary channel won't find the open file (since it won't find >> >> > the >> >> > tcon for it), and this will cause each oplock break on secondary >> >> > channels >> >> > to time out, slowing performance drastically. >> >> > >> >> > Fix smb2_is_valid_oplock_break so that if it is a secondary channel >> >> > and >> >> > an oplock break was not found, check for tcons (and the files >> >> > hanging >> >> > off the tcons) on the primary channel. >> >> >> >> Does this also happen against windows or is is only against ksmbd this >> >> triggers? >> >> What does MS-SMB2.pdf say about channels and oplocks? >> >> >> >> > >> >> > Fixes xfstest generic/013 to ksmbd >> >> > >> >> > Cc: <stable@vger.kernel.org> >> >> > >> >> > >> >> > -- >> >> > Thanks, >> >> > >> >> > Steve >
thx for testing this - Shyam's fix for it looks promising (needs review though and some testing) On Fri, Oct 28, 2022 at 5:30 AM Namjae Jeon <linkinjeon@kernel.org> wrote: > > 2022-10-28 18:19 GMT+09:00, ronnie sahlberg <ronniesahlberg@gmail.com>: > > On Fri, 28 Oct 2022 at 16:55, ronnie sahlberg <ronniesahlberg@gmail.com> > > wrote: > >> > >> On Fri, 28 Oct 2022 at 16:53, Steve French <smfrench@gmail.com> wrote: > >> > > >> > I haven't tried a scenario to windows where we turn off leases and force > >> > server to use oplocks but with ksmbd that is the default. > >> > Worth also investigating how primary vs secondary works for finding > >> > leases for windows case > >> > >> Yes. Until we know what/how windows does things and what ms-smb2.pdf > >> says we can not know if this is a cifs client bug or a ksmbd bug. > > > > So lets wait with this patch until we know where the bug is. > > I will review it later for locking correctness, but lets make sure it > > is not a ksmbd bug first. > We can reproduce it against samba with the following parameters. > > server multi channel support = yes > oplock break wait time = 35000 > smb2 leases = no > > > > > > >> > >> > >> > > >> > On Fri, Oct 28, 2022, 01:48 ronnie sahlberg <ronniesahlberg@gmail.com> > >> > wrote: > >> >> > >> >> On Fri, 28 Oct 2022 at 16:25, Steve French <smfrench@gmail.com> wrote: > >> >> > > >> >> > If a mount to a server is using multichannel, an oplock break > >> >> > arriving > >> >> > on a secondary channel won't find the open file (since it won't find > >> >> > the > >> >> > tcon for it), and this will cause each oplock break on secondary > >> >> > channels > >> >> > to time out, slowing performance drastically. > >> >> > > >> >> > Fix smb2_is_valid_oplock_break so that if it is a secondary channel > >> >> > and > >> >> > an oplock break was not found, check for tcons (and the files > >> >> > hanging > >> >> > off the tcons) on the primary channel. > >> >> > >> >> Does this also happen against windows or is is only against ksmbd this > >> >> triggers? > >> >> What does MS-SMB2.pdf say about channels and oplocks? > >> >> > >> >> > > >> >> > Fixes xfstest generic/013 to ksmbd > >> >> > > >> >> > Cc: <stable@vger.kernel.org> > >> >> > > >> >> > > >> >> > -- > >> >> > Thanks, > >> >> > > >> >> > Steve > >
On 10/28/2022 10:29 PM, Steve French wrote: > thx for testing this - Shyam's fix for it looks promising (needs > review though and some testing) Good, because the original fix confuses me deeply. A tcon is not tied to a connection, or at least, it never should be. It's a property of the session, which is shared among multichannel connections. So if we have to poke at other connections to find a tcon, that's a fundamental issue. Shyam's fix always simply looks on the primary channel, which works, but it's weird relative to the protocol. The real fix would be to hang the tcon off the proper object. Ronnie asked: > What does MS-SMB2.pdf say about channels and oplocks? Oplocks are not tied to channels, they're tied to sessions, tree connects and files. An oplock can be granted on one channel and be valid client-wide. And broken on any channel too btw. Tom. > On Fri, Oct 28, 2022 at 5:30 AM Namjae Jeon <linkinjeon@kernel.org> wrote: >> >> 2022-10-28 18:19 GMT+09:00, ronnie sahlberg <ronniesahlberg@gmail.com>: >>> On Fri, 28 Oct 2022 at 16:55, ronnie sahlberg <ronniesahlberg@gmail.com> >>> wrote: >>>> >>>> On Fri, 28 Oct 2022 at 16:53, Steve French <smfrench@gmail.com> wrote: >>>>> >>>>> I haven't tried a scenario to windows where we turn off leases and force >>>>> server to use oplocks but with ksmbd that is the default. >>>>> Worth also investigating how primary vs secondary works for finding >>>>> leases for windows case >>>> >>>> Yes. Until we know what/how windows does things and what ms-smb2.pdf >>>> says we can not know if this is a cifs client bug or a ksmbd bug. >>> >>> So lets wait with this patch until we know where the bug is. >>> I will review it later for locking correctness, but lets make sure it >>> is not a ksmbd bug first. >> We can reproduce it against samba with the following parameters. >> >> server multi channel support = yes >> oplock break wait time = 35000 >> smb2 leases = no >> >>> >>> >>>> >>>> >>>>> >>>>> On Fri, Oct 28, 2022, 01:48 ronnie sahlberg <ronniesahlberg@gmail.com> >>>>> wrote: >>>>>> >>>>>> On Fri, 28 Oct 2022 at 16:25, Steve French <smfrench@gmail.com> wrote: >>>>>>> >>>>>>> If a mount to a server is using multichannel, an oplock break >>>>>>> arriving >>>>>>> on a secondary channel won't find the open file (since it won't find >>>>>>> the >>>>>>> tcon for it), and this will cause each oplock break on secondary >>>>>>> channels >>>>>>> to time out, slowing performance drastically. >>>>>>> >>>>>>> Fix smb2_is_valid_oplock_break so that if it is a secondary channel >>>>>>> and >>>>>>> an oplock break was not found, check for tcons (and the files >>>>>>> hanging >>>>>>> off the tcons) on the primary channel. >>>>>> >>>>>> Does this also happen against windows or is is only against ksmbd this >>>>>> triggers? >>>>>> What does MS-SMB2.pdf say about channels and oplocks? >>>>>> >>>>>>> >>>>>>> Fixes xfstest generic/013 to ksmbd >>>>>>> >>>>>>> Cc: <stable@vger.kernel.org> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Thanks, >>>>>>> >>>>>>> Steve >>> > > >
On Sat, Oct 29, 2022 at 6:18 PM Tom Talpey <tom@talpey.com> wrote: > > On 10/28/2022 10:29 PM, Steve French wrote: > > thx for testing this - Shyam's fix for it looks promising (needs > > review though and some testing) > > Good, because the original fix confuses me deeply. A tcon is not tied > to a connection, or at least, it never should be. It's a property of > the session, which is shared among multichannel connections. So if we > have to poke at other connections to find a tcon, that's a fundamental > issue. > > Shyam's fix always simply looks on the primary channel, which works, > but it's weird relative to the protocol. The real fix would be to hang > the tcon off the proper object. The reference to the primary channel there is to just get to the session (and from there, the tcon) lists, which only hang off the primary channel today. I'm not sure why this choice was made by Aurelien originally, but that is how it can be fixed with minimal changes today. Ideally, each channel should point to the session list (taking the necessary reference), and hence the tcons. Also, we should be able to share connections between sessions. Today, the secondary channels are used exclusively by one session. I plan to get to that next. > > Ronnie asked: > > > What does MS-SMB2.pdf say about channels and oplocks? > > Oplocks are not tied to channels, they're tied to sessions, tree > connects and files. An oplock can be granted on one channel and be > valid client-wide. And broken on any channel too btw. > > Tom. > > > On Fri, Oct 28, 2022 at 5:30 AM Namjae Jeon <linkinjeon@kernel.org> wrote: > >> > >> 2022-10-28 18:19 GMT+09:00, ronnie sahlberg <ronniesahlberg@gmail.com>: > >>> On Fri, 28 Oct 2022 at 16:55, ronnie sahlberg <ronniesahlberg@gmail.com> > >>> wrote: > >>>> > >>>> On Fri, 28 Oct 2022 at 16:53, Steve French <smfrench@gmail.com> wrote: > >>>>> > >>>>> I haven't tried a scenario to windows where we turn off leases and force > >>>>> server to use oplocks but with ksmbd that is the default. > >>>>> Worth also investigating how primary vs secondary works for finding > >>>>> leases for windows case > >>>> > >>>> Yes. Until we know what/how windows does things and what ms-smb2.pdf > >>>> says we can not know if this is a cifs client bug or a ksmbd bug. > >>> > >>> So lets wait with this patch until we know where the bug is. > >>> I will review it later for locking correctness, but lets make sure it > >>> is not a ksmbd bug first. > >> We can reproduce it against samba with the following parameters. > >> > >> server multi channel support = yes > >> oplock break wait time = 35000 > >> smb2 leases = no > >> > >>> > >>> > >>>> > >>>> > >>>>> > >>>>> On Fri, Oct 28, 2022, 01:48 ronnie sahlberg <ronniesahlberg@gmail.com> > >>>>> wrote: > >>>>>> > >>>>>> On Fri, 28 Oct 2022 at 16:25, Steve French <smfrench@gmail.com> wrote: > >>>>>>> > >>>>>>> If a mount to a server is using multichannel, an oplock break > >>>>>>> arriving > >>>>>>> on a secondary channel won't find the open file (since it won't find > >>>>>>> the > >>>>>>> tcon for it), and this will cause each oplock break on secondary > >>>>>>> channels > >>>>>>> to time out, slowing performance drastically. > >>>>>>> > >>>>>>> Fix smb2_is_valid_oplock_break so that if it is a secondary channel > >>>>>>> and > >>>>>>> an oplock break was not found, check for tcons (and the files > >>>>>>> hanging > >>>>>>> off the tcons) on the primary channel. > >>>>>> > >>>>>> Does this also happen against windows or is is only against ksmbd this > >>>>>> triggers? > >>>>>> What does MS-SMB2.pdf say about channels and oplocks? > >>>>>> > >>>>>>> > >>>>>>> Fixes xfstest generic/013 to ksmbd > >>>>>>> > >>>>>>> Cc: <stable@vger.kernel.org> > >>>>>>> > >>>>>>> > >>>>>>> -- > >>>>>>> Thanks, > >>>>>>> > >>>>>>> Steve > >>> > > > > > >
It's been a while since I worked on this and I'm not sure what exactly you are referring to but: There is a fundamental problem with intern linked lists (the way they are used in cifs and in the kernel in general). One element cannot be part of multiple lists. Within each element, you need to add a list_head field per list its part of. I didn't want to introduce cycles or redundant pointers (each new pointer is a pointer that has to be updated along with all its copies at the same time, can go stale, need to think about refcount, etc) In MS-SMB2, the channel list is an attribute of the session, so that's where I put it. Cheers,
On 11/9/2022 4:08 AM, Aurélien Aptel wrote: > It's been a while since I worked on this and I'm not sure what exactly > you are referring to but: > > There is a fundamental problem with intern linked lists (the way they > are used in cifs and in the kernel in general). > One element cannot be part of multiple lists. > Within each element, you need to add a list_head field per list its part of. > > I didn't want to introduce cycles or redundant pointers (each new > pointer is a pointer that has to be updated along with all its copies > at the same time, can go stale, need to think about refcount, etc) > > In MS-SMB2, the channel list is an attribute of the session, so that's > where I put it. The problem is, a single connection can be associated with many sessions. The "channel" represents only one such association. It's actually a many-to-many relationship. A connection may be used by many sessions, and session may use many connections. If the code is assuming a 1:1 relationship, it's broken. Tom.
From 0186c0d4c418251f066cc8c8d80f16783f9ded00 Mon Sep 17 00:00:00 2001 From: Steve French <stfrench@microsoft.com> Date: Fri, 28 Oct 2022 01:13:16 -0500 Subject: [PATCH] smb3: fix oplock break when using multichannel If a mount to a server is using multichannel, an oplock break arriving on a secondary channel won't find the open file (since it won't find the tcon for it). Fix smb2_is_valid_oplock_break so that if it is a secondary channel and an oplock break was not found, check for tcons (and the files hanging off the tcons) on the primary chanel. Fixes xfstest generic/013 to ksmbd Cc: <stable@vger.kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> --- fs/cifs/smb2misc.c | 47 +++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 46 insertions(+), 1 deletion(-) diff --git a/fs/cifs/smb2misc.c b/fs/cifs/smb2misc.c index a38720477966..ccd294e6c9d9 100644 --- a/fs/cifs/smb2misc.c +++ b/fs/cifs/smb2misc.c @@ -689,13 +689,56 @@ smb2_is_valid_oplock_break(char *buffer, struct TCP_Server_Info *server) return false; } - cifs_dbg(FYI, "oplock level 0x%x\n", rsp->OplockLevel); + cifs_dbg(FYI, "oplock level 0x%x fid 0x%llx\n", + rsp->OplockLevel, rsp->PersistentFid); /* look up tcon based on tid & uid */ spin_lock(&cifs_tcp_ses_lock); list_for_each_entry(ses, &server->smb_ses_list, smb_ses_list) { list_for_each_entry(tcon, &ses->tcon_list, tcon_list) { + spin_lock(&tcon->open_file_lock); + list_for_each_entry(cfile, &tcon->openFileList, tlist) { + if (rsp->PersistentFid != + cfile->fid.persistent_fid || + rsp->VolatileFid != + cfile->fid.volatile_fid) + continue; + + cifs_dbg(FYI, "file id match, oplock break\n"); + cifs_stats_inc( + &tcon->stats.cifs_stats.num_oplock_brks); + cinode = CIFS_I(d_inode(cfile->dentry)); + spin_lock(&cfile->file_info_lock); + if (!CIFS_CACHE_WRITE(cinode) && + rsp->OplockLevel == SMB2_OPLOCK_LEVEL_NONE) + cfile->oplock_break_cancelled = true; + else + cfile->oplock_break_cancelled = false; + set_bit(CIFS_INODE_PENDING_OPLOCK_BREAK, + &cinode->flags); + + cfile->oplock_epoch = 0; + cfile->oplock_level = rsp->OplockLevel; + + spin_unlock(&cfile->file_info_lock); + + cifs_queue_oplock_break(cfile); + + spin_unlock(&tcon->open_file_lock); + spin_unlock(&cifs_tcp_ses_lock); + return true; + } + spin_unlock(&tcon->open_file_lock); + } + } + + /* if file not found and secondary, check primary channel */ + if (server->primary_server == NULL) + goto valid_oplock_not_found; + + list_for_each_entry(ses, &server->primary_server->smb_ses_list, smb_ses_list) { + list_for_each_entry(tcon, &ses->tcon_list, tcon_list) { spin_lock(&tcon->open_file_lock); list_for_each_entry(cfile, &tcon->openFileList, tlist) { if (rsp->PersistentFid != @@ -732,6 +775,8 @@ smb2_is_valid_oplock_break(char *buffer, struct TCP_Server_Info *server) spin_unlock(&tcon->open_file_lock); } } + +valid_oplock_not_found: spin_unlock(&cifs_tcp_ses_lock); cifs_dbg(FYI, "No file id matched, oplock break ignored\n"); trace_smb3_oplock_not_found(0 /* no xid */, rsp->PersistentFid, -- 2.34.1