Message ID | 560AAF09.7010407@redhat.com |
---|---|
State | New |
Headers | show |
On 09/29/2015 11:32 AM, Carlos O'Donell wrote: > In the posix_fallocate description in the manual we list various > drawbacks with the emulation, including the fact that a file opened > with O_APPEND fails with EBADF. Similarly a file opened with O_WRONLY > fails with EBADF. We must be able to emulate a compare-and-swap via > pread/compare/pwrite in order to make the emulation as safe as possible. > It is not acceptable to ignore the read failure because it could result > in significant data loss across all of the blocks. There is no other way > to make this work without a true atomic CAS and SIGBUS handler (which > is looking more attractive as a way to remove the race condition). Just a note here, this still doesn't solve the O_WRONLY problem, because such an fd can't be mapped PROT_WRITE since it violates the permissions of file descriptor. We would need an fdreopen syscall, which doesn't exist, and that would open a race window where other threads might actually succeed at reading a file that should be write only (potential security risk). Cheers, Carlos.
On 09/29/2015 05:47 PM, Carlos O'Donell wrote: > On 09/29/2015 11:32 AM, Carlos O'Donell wrote: >> In the posix_fallocate description in the manual we list various >> drawbacks with the emulation, including the fact that a file opened >> with O_APPEND fails with EBADF. Similarly a file opened with O_WRONLY >> fails with EBADF. We must be able to emulate a compare-and-swap via >> pread/compare/pwrite in order to make the emulation as safe as possible. >> It is not acceptable to ignore the read failure because it could result >> in significant data loss across all of the blocks. There is no other way >> to make this work without a true atomic CAS and SIGBUS handler (which >> is looking more attractive as a way to remove the race condition). > > Just a note here, this still doesn't solve the O_WRONLY problem, because > such an fd can't be mapped PROT_WRITE since it violates the permissions > of file descriptor. We would need an fdreopen syscall, which doesn't exist, > and that would open a race window where other threads might actually > succeed at reading a file that should be write only (potential security > risk). fdreopen can be emulated through /proc on Linux, but it does not solve the problem with POSIX fcntl locking semantics (because *any* close operation by the process on the same file, not file descriptor, releases the locks). Florian
On Thu, Oct 22, 2015 at 09:42:35PM +0200, Florian Weimer wrote: > On 09/29/2015 05:47 PM, Carlos O'Donell wrote: > > On 09/29/2015 11:32 AM, Carlos O'Donell wrote: > >> In the posix_fallocate description in the manual we list various > >> drawbacks with the emulation, including the fact that a file opened > >> with O_APPEND fails with EBADF. Similarly a file opened with O_WRONLY > >> fails with EBADF. We must be able to emulate a compare-and-swap via > >> pread/compare/pwrite in order to make the emulation as safe as possible. > >> It is not acceptable to ignore the read failure because it could result > >> in significant data loss across all of the blocks. There is no other way > >> to make this work without a true atomic CAS and SIGBUS handler (which > >> is looking more attractive as a way to remove the race condition). > > > > Just a note here, this still doesn't solve the O_WRONLY problem, because > > such an fd can't be mapped PROT_WRITE since it violates the permissions > > of file descriptor. We would need an fdreopen syscall, which doesn't exist, > > and that would open a race window where other threads might actually > > succeed at reading a file that should be write only (potential security > > risk). > > fdreopen can be emulated through /proc on Linux, but it does not solve > the problem with POSIX fcntl locking semantics (because *any* close > operation by the process on the same file, not file descriptor, releases > the locks). This can presumably be avoided by using a child process (or possibly a thread that does unshare(CLONE_FILES) or something) to do the work. Rather hideous, but this whole emulation mess is hideous to begin with. Rich
diff --git a/manual/filesys.texi b/manual/filesys.texi index ed4f5fd..4064885 100644 --- a/manual/filesys.texi +++ b/manual/filesys.texi @@ -3300,6 +3300,10 @@ underlying file in the to-be-allocated area. Non-null bytes could be overwritten with null bytes. @item +If @var{fd} has been opened with the @code{O_WRONLY} flag, the function +will fail with an @code{errno} value of @code{EBADF}. + +@item If @var{fd} has been opened with the @code{O_APPEND} flag, the function will fail with an @code{errno} value of @code{EBADF}.