mbox series

[SRU,mantic,0/1] cifs: Copying file to same directory results in page fault

Message ID 20240422023027.21592-1-matthew.ruffell@canonical.com
Headers show
Series cifs: Copying file to same directory results in page fault | expand

Message

Matthew Ruffell April 22, 2024, 2:30 a.m. UTC
BugLink: https://bugs.launchpad.net/bugs/2060919

[Impact]

Copying or modifying a file to the same directory within a cifs mount results in
a page fault, and the process that initiated the copy being killed. This could 
be cp, nautilus, etc.

This results in the following oops:

BUG: unable to handle page fault for address: fffffffffffffffe
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD f45a3f067 P4D f45a3f067 PUD f45a41067 PMD 0
Oops: 0000 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 28103 Comm: Thread (pooled) Tainted: P OE 6.5.0-27-generic #28-Ubuntu
RIP: 0010:cifs_flush_folio+0x41/0xf0 [cifs]
Code: 49 89 cd 31 c9 41 54 49 89 f4 48 c1 ee 0c 53 48 83 ec 08 48 8b 7f 30 44 89 45 d4 e8 79 b3 23 f1 48 89 c3 31 c0 48 85 db 74 77 <48> 8b 13 b8 00 10 00 00 f7 c2 00 00 01 00 74 10 0f b6 4b 51 48 d3
RSP: 0018:ffffaab6865ffbf8 EFLAGS: 00010282
RAX: 0000000000000000 RBX: fffffffffffffffe RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffffaab6865ffc28 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000023854 R11: 0000000000000000 R12: 0000000000000000
R13: ffffaab6865ffc78 R14: ffff906675d8aed0 R15: ffffaab6865ffc70
FS: 00007bd4d594b6c0(0000) GS:ffff90753f800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: fffffffffffffffe CR3: 000000017022a000 CR4: 0000000000750ef0
PKRU: 55555554
Call Trace:
<TASK>
? show_regs+0x6d/0x80
? __die+0x24/0x80
? page_fault_oops+0x99/0x1b0
? kernelmode_fixup_or_oops+0xb2/0x140
? __bad_area_nosemaphore+0x1a5/0x2c0
? bad_area_nosemaphore+0x16/0x30
? do_kern_addr_fault+0x7b/0xa0
? exc_page_fault+0x1a4/0x1b0
? asm_exc_page_fault+0x27/0x30
? cifs_flush_folio+0x41/0xf0 [cifs]
? cifs_flush_folio+0x37/0xf0 [cifs]
cifs_remap_file_range+0x172/0x660 [cifs]
do_clone_file_range+0x101/0x2d0
vfs_clone_file_range+0x3f/0x150
ioctl_file_clone+0x52/0xc0
do_vfs_ioctl+0x68f/0x910
? __fget_light+0xa5/0x120
__x64_sys_ioctl+0x7d/0xf0
do_syscall_64+0x59/0x90
? kmem_cache_free+0x22/0x3e0
? putname+0x5b/0x80
? exit_to_user_mode_prepare+0x30/0xb0
? syscall_exit_to_user_mode+0x37/0x60
? do_syscall_64+0x68/0x90
? do_syscall_64+0x68/0x90
? do_syscall_64+0x68/0x90

There is no known workaround.

[Fix]

The stacktrace is very similar to a regression reported to upstream 6.1.y:

https://lore.kernel.org/linux-mm/a76b370f93cb928c049b94e1fde0d2da506dfcb2.camel@amazon.com/T/

The thread mentions that:

commit 7b2404a886f8b91250c31855d287e632123e1746
Author: David Howells <dhowells@redhat.com>
Date: Fri Dec 1 00:22:00 2023 +0000
Subject: cifs: Fix flushing, invalidation and file size with copy_file_range()
Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7b2404a886f8b91250c31855d287e632123e1746

introduced the issue to Debian's 6.1 kernel.

This got backported to Ubuntu in:

commit 3adbe2ccd8b9b8fde93e03958d6176945794d288
Author: David Howells <dhowells@redhat.com>
Date: Fri Dec 1 00:22:00 2023 +0000
Subject: cifs: Fix flushing, invalidation and file size with copy_file_range()

$ git describe --contains 3adbe2ccd8b9b8fde93e03958d6176945794d288
Ubuntu-6.5.0-20.20~107

Which we have been using for some time now, and is not the culprit.

Reading the regression mailing list thread, they mention that things work 
differently in 6.1:

> Yeah.  __filemap_get_folio() works differently in v6.1.y. There it returns a
> folio or NULL.  In 6.7 it returns a folio or a negative error code.  The error
> check in cifs_flush_folio() needs to change to something like:
>
>	folio = filemap_get_folio(inode->i_mapping, index);
>	if (!folio)
>		return -ENOMEM;
>
> David 

6.1.y then got a specific patch to fix the issue in 6.1, which is:

commit 21bb2ba4f1ac1e3a57594be62dd74e7b1401b2b1
Author: Steve French <stfrench@microsoft.com>
Date: Fri Jan 12 23:08:51 2024 -0600
Subject: cifs: fix flushing folio regression for 6.1 backport
Link: https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/mantic/commit/?id=21bb2ba4f1ac1e3a57594be62dd74e7b1401b2b1

$ git describe --contains 21bb2ba4f1ac1e3a57594be62dd74e7b1401b2b1
Ubuntu-6.5.0-27.28~162

Since the Ubuntu mantic kernel consumes both 6.1.y and 6.7.y / 6.8.y stable 
patches, this patch was applied to mantic's 6.5 kernel by mistake, and contains
the wrong logic for how __filemap_get_folio() works in 6.5.

The fix is to revert "cifs: fix flushing folio regression for 6.1 backport" as
a SAUCE patch.

[Testcase]

Start two VMs. One is recommended to be Debian 12, which is what some users
have had luck with in the past, as the server, and the client can be mantic.

Server
------

$ sudo apt update
$ sudo apt upgrade
$ sudo apt install samba
$ sudo vim /etc/samba/smb.conf
server min protocol = NT1
[sambashare]
    comment = Samba on Ubuntu
    path = /home/ubuntu/sambashare
    read only = no
    browsable = yes
$ mkdir ~/sambashare
$ sudo smbpasswd -a ubuntu

Client
------

$ sudo apt update
$ sudo apt install cifs-utils
$ mkdir ~/share
$ sudo mount -t cifs -o username=ubuntu //192.168.122.185/sambashare ~/share
Password for ubuntu@//192.168.122.185/sambashare:
$ mount -l
...
//192.168.122.185/sambashare on /home/ubuntu/share type cifs (rw,relatime,vers=3.1.1,cache=strict,username=ubuntu,uid=0,noforceuid,gid=0,noforcegid,addr=192.168.122.185,file_mode=0755,dir_mode=0755,soft,nounix,serverino,mapposix,rsize=4194304,wsize=4194304,bsize=1048576,echo_interval=60,actimeo=1,closetimeo=1)

$ ls
hallo.txt hello.txt sample.txt sample2.txt
$ sudo cp hello.txt hello.txt.1
Killed

If you install the test kernel available from the following ppa:

https://launchpad.net/~mruffell/+archive/ubuntu/lp2060919-test

The copy will work as expected.

[Where problems could occur]

Reverting the patch restores logic back to how it was between 6.5.0-20-generic
through to 6.5.0-26-generic, which functions, and is well tested by the
community.

If a regression were to occur, it would impact all writes to cifs mounts,
particularly to the same destination directory as the origin file. There is no
known workarounds.

Matthew Ruffell (1):
  UBUNTU: SAUCE: Revert "cifs: fix flushing folio regression for 6.1
    backport"

 fs/smb/client/cifsfs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Stefan Bader April 24, 2024, 3:36 p.m. UTC | #1
On 22.04.24 04:30, Matthew Ruffell wrote:
> BugLink: https://bugs.launchpad.net/bugs/2060919
> 
> [Impact]
> 
> Copying or modifying a file to the same directory within a cifs mount results in
> a page fault, and the process that initiated the copy being killed. This could
> be cp, nautilus, etc.
> 
> This results in the following oops:
> 
> BUG: unable to handle page fault for address: fffffffffffffffe
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> PGD f45a3f067 P4D f45a3f067 PUD f45a41067 PMD 0
> Oops: 0000 [#1] PREEMPT SMP NOPTI
> CPU: 0 PID: 28103 Comm: Thread (pooled) Tainted: P OE 6.5.0-27-generic #28-Ubuntu
> RIP: 0010:cifs_flush_folio+0x41/0xf0 [cifs]
> Code: 49 89 cd 31 c9 41 54 49 89 f4 48 c1 ee 0c 53 48 83 ec 08 48 8b 7f 30 44 89 45 d4 e8 79 b3 23 f1 48 89 c3 31 c0 48 85 db 74 77 <48> 8b 13 b8 00 10 00 00 f7 c2 00 00 01 00 74 10 0f b6 4b 51 48 d3
> RSP: 0018:ffffaab6865ffbf8 EFLAGS: 00010282
> RAX: 0000000000000000 RBX: fffffffffffffffe RCX: 0000000000000000
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> RBP: ffffaab6865ffc28 R08: 0000000000000001 R09: 0000000000000000
> R10: 0000000000023854 R11: 0000000000000000 R12: 0000000000000000
> R13: ffffaab6865ffc78 R14: ffff906675d8aed0 R15: ffffaab6865ffc70
> FS: 00007bd4d594b6c0(0000) GS:ffff90753f800000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: fffffffffffffffe CR3: 000000017022a000 CR4: 0000000000750ef0
> PKRU: 55555554
> Call Trace:
> <TASK>
> ? show_regs+0x6d/0x80
> ? __die+0x24/0x80
> ? page_fault_oops+0x99/0x1b0
> ? kernelmode_fixup_or_oops+0xb2/0x140
> ? __bad_area_nosemaphore+0x1a5/0x2c0
> ? bad_area_nosemaphore+0x16/0x30
> ? do_kern_addr_fault+0x7b/0xa0
> ? exc_page_fault+0x1a4/0x1b0
> ? asm_exc_page_fault+0x27/0x30
> ? cifs_flush_folio+0x41/0xf0 [cifs]
> ? cifs_flush_folio+0x37/0xf0 [cifs]
> cifs_remap_file_range+0x172/0x660 [cifs]
> do_clone_file_range+0x101/0x2d0
> vfs_clone_file_range+0x3f/0x150
> ioctl_file_clone+0x52/0xc0
> do_vfs_ioctl+0x68f/0x910
> ? __fget_light+0xa5/0x120
> __x64_sys_ioctl+0x7d/0xf0
> do_syscall_64+0x59/0x90
> ? kmem_cache_free+0x22/0x3e0
> ? putname+0x5b/0x80
> ? exit_to_user_mode_prepare+0x30/0xb0
> ? syscall_exit_to_user_mode+0x37/0x60
> ? do_syscall_64+0x68/0x90
> ? do_syscall_64+0x68/0x90
> ? do_syscall_64+0x68/0x90
> 
> There is no known workaround.
> 
> [Fix]
> 
> The stacktrace is very similar to a regression reported to upstream 6.1.y:
> 
> https://lore.kernel.org/linux-mm/a76b370f93cb928c049b94e1fde0d2da506dfcb2.camel@amazon.com/T/
> 
> The thread mentions that:
> 
> commit 7b2404a886f8b91250c31855d287e632123e1746
> Author: David Howells <dhowells@redhat.com>
> Date: Fri Dec 1 00:22:00 2023 +0000
> Subject: cifs: Fix flushing, invalidation and file size with copy_file_range()
> Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7b2404a886f8b91250c31855d287e632123e1746
> 
> introduced the issue to Debian's 6.1 kernel.
> 
> This got backported to Ubuntu in:
> 
> commit 3adbe2ccd8b9b8fde93e03958d6176945794d288
> Author: David Howells <dhowells@redhat.com>
> Date: Fri Dec 1 00:22:00 2023 +0000
> Subject: cifs: Fix flushing, invalidation and file size with copy_file_range()
> 
> $ git describe --contains 3adbe2ccd8b9b8fde93e03958d6176945794d288
> Ubuntu-6.5.0-20.20~107
> 
> Which we have been using for some time now, and is not the culprit.
> 
> Reading the regression mailing list thread, they mention that things work
> differently in 6.1:
> 
>> Yeah.  __filemap_get_folio() works differently in v6.1.y. There it returns a
>> folio or NULL.  In 6.7 it returns a folio or a negative error code.  The error
>> check in cifs_flush_folio() needs to change to something like:
>>
>> 	folio = filemap_get_folio(inode->i_mapping, index);
>> 	if (!folio)
>> 		return -ENOMEM;
>>
>> David
> 
> 6.1.y then got a specific patch to fix the issue in 6.1, which is:
> 
> commit 21bb2ba4f1ac1e3a57594be62dd74e7b1401b2b1
> Author: Steve French <stfrench@microsoft.com>
> Date: Fri Jan 12 23:08:51 2024 -0600
> Subject: cifs: fix flushing folio regression for 6.1 backport
> Link: https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/mantic/commit/?id=21bb2ba4f1ac1e3a57594be62dd74e7b1401b2b1
> 
> $ git describe --contains 21bb2ba4f1ac1e3a57594be62dd74e7b1401b2b1
> Ubuntu-6.5.0-27.28~162
> 
> Since the Ubuntu mantic kernel consumes both 6.1.y and 6.7.y / 6.8.y stable
> patches, this patch was applied to mantic's 6.5 kernel by mistake, and contains
> the wrong logic for how __filemap_get_folio() works in 6.5.
> 
> The fix is to revert "cifs: fix flushing folio regression for 6.1 backport" as
> a SAUCE patch.
> 
> [Testcase]
> 
> Start two VMs. One is recommended to be Debian 12, which is what some users
> have had luck with in the past, as the server, and the client can be mantic.
> 
> Server
> ------
> 
> $ sudo apt update
> $ sudo apt upgrade
> $ sudo apt install samba
> $ sudo vim /etc/samba/smb.conf
> server min protocol = NT1
> [sambashare]
>      comment = Samba on Ubuntu
>      path = /home/ubuntu/sambashare
>      read only = no
>      browsable = yes
> $ mkdir ~/sambashare
> $ sudo smbpasswd -a ubuntu
> 
> Client
> ------
> 
> $ sudo apt update
> $ sudo apt install cifs-utils
> $ mkdir ~/share
> $ sudo mount -t cifs -o username=ubuntu //192.168.122.185/sambashare ~/share
> Password for ubuntu@//192.168.122.185/sambashare:
> $ mount -l
> ...
> //192.168.122.185/sambashare on /home/ubuntu/share type cifs (rw,relatime,vers=3.1.1,cache=strict,username=ubuntu,uid=0,noforceuid,gid=0,noforcegid,addr=192.168.122.185,file_mode=0755,dir_mode=0755,soft,nounix,serverino,mapposix,rsize=4194304,wsize=4194304,bsize=1048576,echo_interval=60,actimeo=1,closetimeo=1)
> 
> $ ls
> hallo.txt hello.txt sample.txt sample2.txt
> $ sudo cp hello.txt hello.txt.1
> Killed
> 
> If you install the test kernel available from the following ppa:
> 
> https://launchpad.net/~mruffell/+archive/ubuntu/lp2060919-test
> 
> The copy will work as expected.
> 
> [Where problems could occur]
> 
> Reverting the patch restores logic back to how it was between 6.5.0-20-generic
> through to 6.5.0-26-generic, which functions, and is well tested by the
> community.
> 
> If a regression were to occur, it would impact all writes to cifs mounts,
> particularly to the same destination directory as the origin file. There is no
> known workarounds.
> 
> Matthew Ruffell (1):
>    UBUNTU: SAUCE: Revert "cifs: fix flushing folio regression for 6.1
>      backport"
> 
>   fs/smb/client/cifsfs.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 

Acked-by: Stefan Bader <stefan.bader@canonical.com>
Roxana Nicolescu April 25, 2024, 6:46 p.m. UTC | #2
On 22/04/2024 04:30, Matthew Ruffell wrote:
> BugLink: https://bugs.launchpad.net/bugs/2060919
>
> [Impact]
>
> Copying or modifying a file to the same directory within a cifs mount results in
> a page fault, and the process that initiated the copy being killed. This could
> be cp, nautilus, etc.
>
> This results in the following oops:
>
> BUG: unable to handle page fault for address: fffffffffffffffe
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> PGD f45a3f067 P4D f45a3f067 PUD f45a41067 PMD 0
> Oops: 0000 [#1] PREEMPT SMP NOPTI
> CPU: 0 PID: 28103 Comm: Thread (pooled) Tainted: P OE 6.5.0-27-generic #28-Ubuntu
> RIP: 0010:cifs_flush_folio+0x41/0xf0 [cifs]
> Code: 49 89 cd 31 c9 41 54 49 89 f4 48 c1 ee 0c 53 48 83 ec 08 48 8b 7f 30 44 89 45 d4 e8 79 b3 23 f1 48 89 c3 31 c0 48 85 db 74 77 <48> 8b 13 b8 00 10 00 00 f7 c2 00 00 01 00 74 10 0f b6 4b 51 48 d3
> RSP: 0018:ffffaab6865ffbf8 EFLAGS: 00010282
> RAX: 0000000000000000 RBX: fffffffffffffffe RCX: 0000000000000000
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> RBP: ffffaab6865ffc28 R08: 0000000000000001 R09: 0000000000000000
> R10: 0000000000023854 R11: 0000000000000000 R12: 0000000000000000
> R13: ffffaab6865ffc78 R14: ffff906675d8aed0 R15: ffffaab6865ffc70
> FS: 00007bd4d594b6c0(0000) GS:ffff90753f800000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: fffffffffffffffe CR3: 000000017022a000 CR4: 0000000000750ef0
> PKRU: 55555554
> Call Trace:
> <TASK>
> ? show_regs+0x6d/0x80
> ? __die+0x24/0x80
> ? page_fault_oops+0x99/0x1b0
> ? kernelmode_fixup_or_oops+0xb2/0x140
> ? __bad_area_nosemaphore+0x1a5/0x2c0
> ? bad_area_nosemaphore+0x16/0x30
> ? do_kern_addr_fault+0x7b/0xa0
> ? exc_page_fault+0x1a4/0x1b0
> ? asm_exc_page_fault+0x27/0x30
> ? cifs_flush_folio+0x41/0xf0 [cifs]
> ? cifs_flush_folio+0x37/0xf0 [cifs]
> cifs_remap_file_range+0x172/0x660 [cifs]
> do_clone_file_range+0x101/0x2d0
> vfs_clone_file_range+0x3f/0x150
> ioctl_file_clone+0x52/0xc0
> do_vfs_ioctl+0x68f/0x910
> ? __fget_light+0xa5/0x120
> __x64_sys_ioctl+0x7d/0xf0
> do_syscall_64+0x59/0x90
> ? kmem_cache_free+0x22/0x3e0
> ? putname+0x5b/0x80
> ? exit_to_user_mode_prepare+0x30/0xb0
> ? syscall_exit_to_user_mode+0x37/0x60
> ? do_syscall_64+0x68/0x90
> ? do_syscall_64+0x68/0x90
> ? do_syscall_64+0x68/0x90
>
> There is no known workaround.
>
> [Fix]
>
> The stacktrace is very similar to a regression reported to upstream 6.1.y:
>
> https://lore.kernel.org/linux-mm/a76b370f93cb928c049b94e1fde0d2da506dfcb2.camel@amazon.com/T/
>
> The thread mentions that:
>
> commit 7b2404a886f8b91250c31855d287e632123e1746
> Author: David Howells <dhowells@redhat.com>
> Date: Fri Dec 1 00:22:00 2023 +0000
> Subject: cifs: Fix flushing, invalidation and file size with copy_file_range()
> Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7b2404a886f8b91250c31855d287e632123e1746
>
> introduced the issue to Debian's 6.1 kernel.
>
> This got backported to Ubuntu in:
>
> commit 3adbe2ccd8b9b8fde93e03958d6176945794d288
> Author: David Howells <dhowells@redhat.com>
> Date: Fri Dec 1 00:22:00 2023 +0000
> Subject: cifs: Fix flushing, invalidation and file size with copy_file_range()
>
> $ git describe --contains 3adbe2ccd8b9b8fde93e03958d6176945794d288
> Ubuntu-6.5.0-20.20~107
>
> Which we have been using for some time now, and is not the culprit.
>
> Reading the regression mailing list thread, they mention that things work
> differently in 6.1:
>
>> Yeah.  __filemap_get_folio() works differently in v6.1.y. There it returns a
>> folio or NULL.  In 6.7 it returns a folio or a negative error code.  The error
>> check in cifs_flush_folio() needs to change to something like:
>>
>> 	folio = filemap_get_folio(inode->i_mapping, index);
>> 	if (!folio)
>> 		return -ENOMEM;
>>
>> David
> 6.1.y then got a specific patch to fix the issue in 6.1, which is:
>
> commit 21bb2ba4f1ac1e3a57594be62dd74e7b1401b2b1
> Author: Steve French <stfrench@microsoft.com>
> Date: Fri Jan 12 23:08:51 2024 -0600
> Subject: cifs: fix flushing folio regression for 6.1 backport
> Link: https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/mantic/commit/?id=21bb2ba4f1ac1e3a57594be62dd74e7b1401b2b1
>
> $ git describe --contains 21bb2ba4f1ac1e3a57594be62dd74e7b1401b2b1
> Ubuntu-6.5.0-27.28~162
>
> Since the Ubuntu mantic kernel consumes both 6.1.y and 6.7.y / 6.8.y stable
> patches, this patch was applied to mantic's 6.5 kernel by mistake, and contains
> the wrong logic for how __filemap_get_folio() works in 6.5.
>
> The fix is to revert "cifs: fix flushing folio regression for 6.1 backport" as
> a SAUCE patch.
>
> [Testcase]
>
> Start two VMs. One is recommended to be Debian 12, which is what some users
> have had luck with in the past, as the server, and the client can be mantic.
>
> Server
> ------
>
> $ sudo apt update
> $ sudo apt upgrade
> $ sudo apt install samba
> $ sudo vim /etc/samba/smb.conf
> server min protocol = NT1
> [sambashare]
>      comment = Samba on Ubuntu
>      path = /home/ubuntu/sambashare
>      read only = no
>      browsable = yes
> $ mkdir ~/sambashare
> $ sudo smbpasswd -a ubuntu
>
> Client
> ------
>
> $ sudo apt update
> $ sudo apt install cifs-utils
> $ mkdir ~/share
> $ sudo mount -t cifs -o username=ubuntu //192.168.122.185/sambashare ~/share
> Password for ubuntu@//192.168.122.185/sambashare:
> $ mount -l
> ...
> //192.168.122.185/sambashare on /home/ubuntu/share type cifs (rw,relatime,vers=3.1.1,cache=strict,username=ubuntu,uid=0,noforceuid,gid=0,noforcegid,addr=192.168.122.185,file_mode=0755,dir_mode=0755,soft,nounix,serverino,mapposix,rsize=4194304,wsize=4194304,bsize=1048576,echo_interval=60,actimeo=1,closetimeo=1)
>
> $ ls
> hallo.txt hello.txt sample.txt sample2.txt
> $ sudo cp hello.txt hello.txt.1
> Killed
>
> If you install the test kernel available from the following ppa:
>
> https://launchpad.net/~mruffell/+archive/ubuntu/lp2060919-test
>
> The copy will work as expected.
>
> [Where problems could occur]
>
> Reverting the patch restores logic back to how it was between 6.5.0-20-generic
> through to 6.5.0-26-generic, which functions, and is well tested by the
> community.
>
> If a regression were to occur, it would impact all writes to cifs mounts,
> particularly to the same destination directory as the origin file. There is no
> known workarounds.
>
> Matthew Ruffell (1):
>    UBUNTU: SAUCE: Revert "cifs: fix flushing folio regression for 6.1
>      backport"
>
>   fs/smb/client/cifsfs.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
Acked-by: Roxana Nicolescu <roxana.nicolescu@canonical.com>
Roxana Nicolescu April 25, 2024, 6:49 p.m. UTC | #3
On 22/04/2024 04:30, Matthew Ruffell wrote:
> BugLink: https://bugs.launchpad.net/bugs/2060919
>
> [Impact]
>
> Copying or modifying a file to the same directory within a cifs mount results in
> a page fault, and the process that initiated the copy being killed. This could
> be cp, nautilus, etc.
>
> This results in the following oops:
>
> BUG: unable to handle page fault for address: fffffffffffffffe
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> PGD f45a3f067 P4D f45a3f067 PUD f45a41067 PMD 0
> Oops: 0000 [#1] PREEMPT SMP NOPTI
> CPU: 0 PID: 28103 Comm: Thread (pooled) Tainted: P OE 6.5.0-27-generic #28-Ubuntu
> RIP: 0010:cifs_flush_folio+0x41/0xf0 [cifs]
> Code: 49 89 cd 31 c9 41 54 49 89 f4 48 c1 ee 0c 53 48 83 ec 08 48 8b 7f 30 44 89 45 d4 e8 79 b3 23 f1 48 89 c3 31 c0 48 85 db 74 77 <48> 8b 13 b8 00 10 00 00 f7 c2 00 00 01 00 74 10 0f b6 4b 51 48 d3
> RSP: 0018:ffffaab6865ffbf8 EFLAGS: 00010282
> RAX: 0000000000000000 RBX: fffffffffffffffe RCX: 0000000000000000
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> RBP: ffffaab6865ffc28 R08: 0000000000000001 R09: 0000000000000000
> R10: 0000000000023854 R11: 0000000000000000 R12: 0000000000000000
> R13: ffffaab6865ffc78 R14: ffff906675d8aed0 R15: ffffaab6865ffc70
> FS: 00007bd4d594b6c0(0000) GS:ffff90753f800000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: fffffffffffffffe CR3: 000000017022a000 CR4: 0000000000750ef0
> PKRU: 55555554
> Call Trace:
> <TASK>
> ? show_regs+0x6d/0x80
> ? __die+0x24/0x80
> ? page_fault_oops+0x99/0x1b0
> ? kernelmode_fixup_or_oops+0xb2/0x140
> ? __bad_area_nosemaphore+0x1a5/0x2c0
> ? bad_area_nosemaphore+0x16/0x30
> ? do_kern_addr_fault+0x7b/0xa0
> ? exc_page_fault+0x1a4/0x1b0
> ? asm_exc_page_fault+0x27/0x30
> ? cifs_flush_folio+0x41/0xf0 [cifs]
> ? cifs_flush_folio+0x37/0xf0 [cifs]
> cifs_remap_file_range+0x172/0x660 [cifs]
> do_clone_file_range+0x101/0x2d0
> vfs_clone_file_range+0x3f/0x150
> ioctl_file_clone+0x52/0xc0
> do_vfs_ioctl+0x68f/0x910
> ? __fget_light+0xa5/0x120
> __x64_sys_ioctl+0x7d/0xf0
> do_syscall_64+0x59/0x90
> ? kmem_cache_free+0x22/0x3e0
> ? putname+0x5b/0x80
> ? exit_to_user_mode_prepare+0x30/0xb0
> ? syscall_exit_to_user_mode+0x37/0x60
> ? do_syscall_64+0x68/0x90
> ? do_syscall_64+0x68/0x90
> ? do_syscall_64+0x68/0x90
>
> There is no known workaround.
>
> [Fix]
>
> The stacktrace is very similar to a regression reported to upstream 6.1.y:
>
> https://lore.kernel.org/linux-mm/a76b370f93cb928c049b94e1fde0d2da506dfcb2.camel@amazon.com/T/
>
> The thread mentions that:
>
> commit 7b2404a886f8b91250c31855d287e632123e1746
> Author: David Howells <dhowells@redhat.com>
> Date: Fri Dec 1 00:22:00 2023 +0000
> Subject: cifs: Fix flushing, invalidation and file size with copy_file_range()
> Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7b2404a886f8b91250c31855d287e632123e1746
>
> introduced the issue to Debian's 6.1 kernel.
>
> This got backported to Ubuntu in:
>
> commit 3adbe2ccd8b9b8fde93e03958d6176945794d288
> Author: David Howells <dhowells@redhat.com>
> Date: Fri Dec 1 00:22:00 2023 +0000
> Subject: cifs: Fix flushing, invalidation and file size with copy_file_range()
>
> $ git describe --contains 3adbe2ccd8b9b8fde93e03958d6176945794d288
> Ubuntu-6.5.0-20.20~107
>
> Which we have been using for some time now, and is not the culprit.
>
> Reading the regression mailing list thread, they mention that things work
> differently in 6.1:
>
>> Yeah.  __filemap_get_folio() works differently in v6.1.y. There it returns a
>> folio or NULL.  In 6.7 it returns a folio or a negative error code.  The error
>> check in cifs_flush_folio() needs to change to something like:
>>
>> 	folio = filemap_get_folio(inode->i_mapping, index);
>> 	if (!folio)
>> 		return -ENOMEM;
>>
>> David
> 6.1.y then got a specific patch to fix the issue in 6.1, which is:
>
> commit 21bb2ba4f1ac1e3a57594be62dd74e7b1401b2b1
> Author: Steve French <stfrench@microsoft.com>
> Date: Fri Jan 12 23:08:51 2024 -0600
> Subject: cifs: fix flushing folio regression for 6.1 backport
> Link: https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/mantic/commit/?id=21bb2ba4f1ac1e3a57594be62dd74e7b1401b2b1
>
> $ git describe --contains 21bb2ba4f1ac1e3a57594be62dd74e7b1401b2b1
> Ubuntu-6.5.0-27.28~162
>
> Since the Ubuntu mantic kernel consumes both 6.1.y and 6.7.y / 6.8.y stable
> patches, this patch was applied to mantic's 6.5 kernel by mistake, and contains
> the wrong logic for how __filemap_get_folio() works in 6.5.
>
> The fix is to revert "cifs: fix flushing folio regression for 6.1 backport" as
> a SAUCE patch.
>
> [Testcase]
>
> Start two VMs. One is recommended to be Debian 12, which is what some users
> have had luck with in the past, as the server, and the client can be mantic.
>
> Server
> ------
>
> $ sudo apt update
> $ sudo apt upgrade
> $ sudo apt install samba
> $ sudo vim /etc/samba/smb.conf
> server min protocol = NT1
> [sambashare]
>      comment = Samba on Ubuntu
>      path = /home/ubuntu/sambashare
>      read only = no
>      browsable = yes
> $ mkdir ~/sambashare
> $ sudo smbpasswd -a ubuntu
>
> Client
> ------
>
> $ sudo apt update
> $ sudo apt install cifs-utils
> $ mkdir ~/share
> $ sudo mount -t cifs -o username=ubuntu //192.168.122.185/sambashare ~/share
> Password for ubuntu@//192.168.122.185/sambashare:
> $ mount -l
> ...
> //192.168.122.185/sambashare on /home/ubuntu/share type cifs (rw,relatime,vers=3.1.1,cache=strict,username=ubuntu,uid=0,noforceuid,gid=0,noforcegid,addr=192.168.122.185,file_mode=0755,dir_mode=0755,soft,nounix,serverino,mapposix,rsize=4194304,wsize=4194304,bsize=1048576,echo_interval=60,actimeo=1,closetimeo=1)
>
> $ ls
> hallo.txt hello.txt sample.txt sample2.txt
> $ sudo cp hello.txt hello.txt.1
> Killed
>
> If you install the test kernel available from the following ppa:
>
> https://launchpad.net/~mruffell/+archive/ubuntu/lp2060919-test
>
> The copy will work as expected.
>
> [Where problems could occur]
>
> Reverting the patch restores logic back to how it was between 6.5.0-20-generic
> through to 6.5.0-26-generic, which functions, and is well tested by the
> community.
>
> If a regression were to occur, it would impact all writes to cifs mounts,
> particularly to the same destination directory as the origin file. There is no
> known workarounds.
>
> Matthew Ruffell (1):
>    UBUNTU: SAUCE: Revert "cifs: fix flushing folio regression for 6.1
>      backport"
>
>   fs/smb/client/cifsfs.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
Applied to mantic master-next branch. Thanks!