Message ID | 1385979146-13825-1-git-send-email-ydroneaud@opteya.com |
---|---|
State | Rejected, archived |
Delegated to: | David Miller |
Headers | show |
From: Yann Droneaud <ydroneaud@opteya.com> Date: Mon, 2 Dec 2013 11:12:26 +0100 > socketpair() error paths can be simplified to not call > heavy-weight sys_close(). > > This patch makes socketpair() use of error paths which do not > rely on heavy-weight call to sys_lose(): it's better to try > to push the file descriptor to userspace before installing > the socket file to the file descriptor, so that errors are > catched earlier and being easier to handle. > > Three distinct error paths are needed since calling fput() > on file structure returned by sock_alloc_file() will > implicitly call sock_release() on the associated socket > structure. > > Cc: David S. Miller <davem@davemloft.net> > Cc: Al Viro <viro@zeniv.linux.org.uk> > Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> > Link: http://marc.info/?i=1385977019-12282-1-git-send-email-ydroneaud@opteya.com I know that sys_close() is expensive, _but_ erroring out on the put_user() is extremely rare and logically it makes a ton more sense to fully install a file descriptor before writing it's numeric value into userspace. Sorry, I think the current code is fine and I'm not going to apply this, thanks. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi, Le jeudi 05 décembre 2013 à 16:23 -0500, David Miller a écrit : > From: Yann Droneaud <ydroneaud@opteya.com> > Date: Mon, 2 Dec 2013 11:12:26 +0100 > > > socketpair() error paths can be simplified to not call > > heavy-weight sys_close(). > > > > This patch makes socketpair() use of error paths which do not > > rely on heavy-weight call to sys_lose(): it's better to try > > to push the file descriptor to userspace before installing > > the socket file to the file descriptor, so that errors are > > catched earlier and being easier to handle. > > > > Three distinct error paths are needed since calling fput() > > on file structure returned by sock_alloc_file() will > > implicitly call sock_release() on the associated socket > > structure. > > > > Cc: David S. Miller <davem@davemloft.net> > > Cc: Al Viro <viro@zeniv.linux.org.uk> > > Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> > > Link: http://marc.info/?i=1385977019-12282-1-git-send-email-ydroneaud@opteya.com > > I know that sys_close() is expensive, _but_ erroring out on > the put_user() is extremely rare and logically it makes > a ton more sense to fully install a file descriptor before > writing it's numeric value into userspace. > > Sorry, I think the current code is fine and I'm not going > to apply this, thanks. Thanks for the review. AFAIK, using sys_close() seems to be the exception, and writing the file descriptor before installing it is the more or less the norm. I've made a review of each subsystem using get_unused_fd{,_flags} and anon_inode_get{fd,file}: most of the time, error handling involve fput() and put_unused_fd() and not sys_close(). Looking at the places where sys_close() is used make it pretty obvious: - autofs_dev_ioctl_closemount(), dev-ioctl.c:298 [1] [just a "fancy" way of doing close() through an ioctl] - load_misc_binary(), binfmt_misc.c:208 [2] [could probably made use fput(),put_unused_fd()] - change_floppy(), do_mounts.c:490,501 [3] - handle_initrd(), do_mounts_initrd.c:113 [4] - md_setup_drive(), do_mounts_md.c:193,245,249 [5] - autodetect_raid(), do_mounts_md.c:299 [6] - rd_load_image(), do_mounts_rd.c:266,292,294 [7] - do_copy(), initramfs.c:350 [8] - clean_rootfs(), initramfs.c:549,577 [9] - populate_rootfs(), initramfs.c:607 [10] - socketpair(), socket.c:1486,1487 [11] [our target] The majority of sys_close() users are in init code, which is code behaving like userspace code. This make socketpair() usage of sys_close() quite unusual. It deserve to be replaced by the more common pattern of fput() / put_unused_fd(). Regards. Links: [1] http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/fs/autofs4/dev-ioctl.c?id=v3.13-rc2#n298 [2] http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/fs/binfmt_misc.c?id=v3.13-rc2#n208 [3] http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/init/do_mounts.c?id=v3.13-rc2#n490 http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/init/do_mounts.c?id=v3.13-rc2#n501 [4] http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/init/do_mounts_initrd.c?id=v3.13-rc2#n113 [5] http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/init/do_mounts_md.c?id=v3.13-rc2#n193 http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/init/do_mounts_md.c?id=v3.13-rc2#n245 http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/init/do_mounts_md.c?id=v3.13-rc2#n249 [6] http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/init/do_mounts_md.c?id=v3.13-rc2#n299 [7] http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/init/do_mounts_rd.c?id=v3.13-rc2#n266 http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/init/do_mounts_rd.c?id=v3.13-rc2#n292 http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/init/do_mounts_rd.c?id=v3.13-rc2#n294 [8] http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/init/initramfs.c?id=v3.13-rc2#n350 [9] http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/init/initramfs.c?id=v3.13-rc2#n549 http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/init/initramfs.c?id=v3.13-rc2#n577 [10] http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/init/initramfs.c?id=v3.13-rc2#n607 [11] http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/net/socket.c?id=v3.13-rc2#n1486 http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/net/socket.c?id=v3.13-rc2#n1487 Aside: I will submit "soon" a patchset that add some sys_close() in error paths of code relying on anon_inode_getfd() or using some layer which call get_unused_fd{,flags}()/fd_install() deep in the call chain, mostly in kvm, drm, iio. Regards.
From: Yann Droneaud <ydroneaud@opteya.com> Date: Fri, 06 Dec 2013 00:15:31 +0100 > AFAIK, using sys_close() seems to be the exception, and writing the file > descriptor before installing it is the more or less the norm. What other system call in the kernel writes a file descriptor's value into the address space of a user process before the file descriptor is actually usable? That's really terrible semantically. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi, Le jeudi 05 décembre 2013 à 19:43 -0500, David Miller a écrit : > From: Yann Droneaud <ydroneaud@opteya.com> > Date: Fri, 06 Dec 2013 00:15:31 +0100 > > > AFAIK, using sys_close() seems to be the exception, and writing the file > > descriptor before installing it is the more or less the norm. > > What other system call in the kernel writes a file descriptor's value > into the address space of a user process before the file descriptor > is actually usable? > Some carefully chosen examples: - recvmsg() through - netlink_recvmsg() / unix_dgram_recvmsg() / unix_stream_recvmsg() - scm_recv() - scm_detach_fds(), scm.c:280 http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/net/core/scm.c?id=v3.13-rc2#n280 - scm_detach_fds_compat(), compat.c:296 http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/net/compat.c?id=v3.13-rc2#n296 - pipe() / pipe2(), pipe.c:969, through http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/fs/pipe.c?id=v3.13-rc2#n969 - __do_pipe_flags, pipe.c:919 http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/fs/pipe.c?id=v3.13-rc2#n919 > That's really terrible semantically. In fact, it's better: 1) get_unused_fd_flags() mark a file descriptor as 'reserved', so no other syscall would be able to allocate it. At this point, even if another userspace thread guess the correct file descriptor number, trying to use it will give it -EBADFD, since the file descriptor is not tied to a file (see __close_fd(), open.c:589 for example) 2) file descriptor is wrote in userspace memory, using put_user()/copy_to_user(). At this point, the userspace (another thread) could potentially read the memory location and use the file descriptor, but here again, it will give -EBADFD for the very same reason. 3) fd_install() link the file descriptor to the file. Now, the userspace (another thread) could read the memory location and use the file descriptor even if the syscall which create it haven't return to the userspace caller yet. If we arrange to put the call to fd_install() in the very last step of the syscall, the window where the file descriptor is usable by userspace but not yet validly returned to userspace is very tiny. In the other hand, when writing the file descriptor in the last step, eg. doing 1), 3) and 2), it makes possible for userspace to guess and manipulate the file descriptor while the syscall is not near completion, eg. kernel could have some things to do on the file, device, socket, ... before returning to userspace. In such case, bad things might happen if another userspace thread is trying to play a bad game with the file descriptor. BTW I'm not aware of any security implication of this issue, but I think it's better (safer) to have fd_install() being the very last step of a syscall. It should be the default good habit when dealing with file descriptor creation. Regards.
On Thu, Dec 05, 2013 at 07:43:55PM -0500, David Miller wrote: > From: Yann Droneaud <ydroneaud@opteya.com> > Date: Fri, 06 Dec 2013 00:15:31 +0100 > > > AFAIK, using sys_close() seems to be the exception, and writing the file > > descriptor before installing it is the more or less the norm. > > What other system call in the kernel writes a file descriptor's value > into the address space of a user process before the file descriptor > is actually usable? > > That's really terrible semantically. What's the problem with that? If nothing else, shared descriptor table is a lot more visible to other threads than two-element array, most likely in stack frame of whoever makes that syscall... As for your question, how about pipe(2)? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Al Viro <viro@ZenIV.linux.org.uk> Date: Fri, 6 Dec 2013 10:48:05 +0000 > On Thu, Dec 05, 2013 at 07:43:55PM -0500, David Miller wrote: >> From: Yann Droneaud <ydroneaud@opteya.com> >> Date: Fri, 06 Dec 2013 00:15:31 +0100 >> >> > AFAIK, using sys_close() seems to be the exception, and writing the file >> > descriptor before installing it is the more or less the norm. >> >> What other system call in the kernel writes a file descriptor's value >> into the address space of a user process before the file descriptor >> is actually usable? >> >> That's really terrible semantically. > > What's the problem with that? If nothing else, shared descriptor table is > a lot more visible to other threads than two-element array, most likely > in stack frame of whoever makes that syscall... > > As for your question, how about pipe(2)? Fair enough, Yann please resubmit your patch. Thanks. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/net/socket.c b/net/socket.c index 0b18693f2be6..72bf3c41f4f0 100644 --- a/net/socket.c +++ b/net/socket.c @@ -1445,48 +1445,61 @@ SYSCALL_DEFINE4(socketpair, int, family, int, type, int, protocol, err = fd1; goto out_release_both; } + fd2 = get_unused_fd_flags(flags); if (unlikely(fd2 < 0)) { err = fd2; - put_unused_fd(fd1); - goto out_release_both; + goto out_put_unused_1; } newfile1 = sock_alloc_file(sock1, flags, NULL); if (unlikely(IS_ERR(newfile1))) { err = PTR_ERR(newfile1); - put_unused_fd(fd1); - put_unused_fd(fd2); - goto out_release_both; + goto out_put_unused_both; } newfile2 = sock_alloc_file(sock2, flags, NULL); if (IS_ERR(newfile2)) { err = PTR_ERR(newfile2); - fput(newfile1); - put_unused_fd(fd1); - put_unused_fd(fd2); - sock_release(sock2); - goto out; + goto out_fput_1; } + err = put_user(fd1, &usockvec[0]); + if (err) + goto out_fput_both; + + err = put_user(fd2, &usockvec[1]); + if (err) + goto out_fput_both; + audit_fd_pair(fd1, fd2); + fd_install(fd1, newfile1); fd_install(fd2, newfile2); /* fd1 and fd2 may be already another descriptors. * Not kernel problem. */ - err = put_user(fd1, &usockvec[0]); - if (!err) - err = put_user(fd2, &usockvec[1]); - if (!err) - return 0; + return 0; - sys_close(fd2); - sys_close(fd1); - return err; +out_fput_both: + fput(newfile2); + fput(newfile1); + put_unused_fd(fd2); + put_unused_fd(fd1); + goto out; + +out_fput_1: + fput(newfile1); + put_unused_fd(fd2); + put_unused_fd(fd1); + sock_release(sock2); + goto out; +out_put_unused_both: + put_unused_fd(fd2); +out_put_unused_1: + put_unused_fd(fd1); out_release_both: sock_release(sock2); out_release_1:
socketpair() error paths can be simplified to not call heavy-weight sys_close(). This patch makes socketpair() use of error paths which do not rely on heavy-weight call to sys_lose(): it's better to try to push the file descriptor to userspace before installing the socket file to the file descriptor, so that errors are catched earlier and being easier to handle. Three distinct error paths are needed since calling fput() on file structure returned by sock_alloc_file() will implicitly call sock_release() on the associated socket structure. Cc: David S. Miller <davem@davemloft.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Link: http://marc.info/?i=1385977019-12282-1-git-send-email-ydroneaud@opteya.com --- Hi, Please discard my previous patch "[PATCH] net: handle error more gracefully in socketpair()" [1] as I haven't double checked the underlying reason for the not-so-complicated error paths. This one is perhaps less appealing. But I think it's still a good thing to write the file descriptor to userspace before calling fd_install(): it's making error recovery easier. Changes from v1 [1]: - use three different error path to handle file allocated through sock_alloc_file(). Regards. [1] http://mid.gmane.org/1385977019-12282-1-git-send-email-ydroneaud@opteya.com net/socket.c | 49 +++++++++++++++++++++++++++++++------------------ 1 file changed, 31 insertions(+), 18 deletions(-)