Message ID | 1412001812-9943-1-git-send-email-olivier.blin@softathome.com |
---|---|
State | New |
Headers | show |
On Mon, Sep 29, 2014 at 04:43:32PM +0200, Olivier Blin wrote: > EOPNOTSUPP is not a valid error code for posix_fallocate(), the > implementation should have a fallback when the underlying filesystem > does not support fallocate(). > > This is especially useful when using posix_fallocate() on tmpfs with > kernels older than 3.5, for which there was no fallocate support. > > This copies the implementation of the internal fallback from glibc, > with a few adaptations for internals symbols. This code is dangerous (has race conditions that cause file corruption) and should not be added. See the bug report for the corresponding code in glibc: https://sourceware.org/bugzilla/show_bug.cgi?id=15661 Rich
On 09/29/14 18:42, Rich Felker wrote: > On Mon, Sep 29, 2014 at 04:43:32PM +0200, Olivier Blin wrote: > > EOPNOTSUPP is not a valid error code for posix_fallocate(), the > > implementation should have a fallback when the underlying filesystem > > does not support fallocate(). > > > > This is especially useful when using posix_fallocate() on tmpfs with > > kernels older than 3.5, for which there was no fallocate support. > > > > This copies the implementation of the internal fallback from glibc, > > with a few adaptations for internals symbols. > > This code is dangerous (has race conditions that cause file > corruption) and should not be added. See the bug report for the > corresponding code in glibc: > > https://sourceware.org/bugzilla/show_bug.cgi?id=15661 > > Rich Indeed, this is dangerous. According to the specification of posix_fallocate(), there does not seem to be an appropriate error code to return when the filesystem driver does not natively support fallocate(). In this case, the libc should probably provide a fallback implementation. Users of posix_fallocate() do not expect it to fail if the FS does not support fallocate(). See for example the usage in wayland: http://cgit.freedesktop.org/wayland/wayland/tree/cursor/os-compatibility.c?h=1.6&id=1.6.0#n79 To have a better fallback, I don't see a way to avoid the extra child process either, since we can only do the atomic compare-and-swap through mmap. Does anyone have a better suggestion to implement the fallback? Thanks for bringing this up
On 09/29/14 13:39, Olivier Blin wrote: > On 09/29/14 18:42, Rich Felker wrote: >> On Mon, Sep 29, 2014 at 04:43:32PM +0200, Olivier Blin wrote: >> > EOPNOTSUPP is not a valid error code for posix_fallocate(), the >> > implementation should have a fallback when the underlying filesystem >> > does not support fallocate(). >> > >> > This is especially useful when using posix_fallocate() on tmpfs with >> > kernels older than 3.5, for which there was no fallocate support. >> > >> > This copies the implementation of the internal fallback from glibc, >> > with a few adaptations for internals symbols. >> >> This code is dangerous (has race conditions that cause file >> corruption) and should not be added. See the bug report for the >> corresponding code in glibc: >> >> https://sourceware.org/bugzilla/show_bug.cgi?id=15661 >> >> Rich > Indeed, this is dangerous. > > According to the specification of posix_fallocate(), there does not seem > to be an appropriate error code to return when the filesystem driver > does not natively support fallocate(). > In this case, the libc should probably provide a fallback implementation. Why can't there be an implementation specific error message here since the specs are silent on the issue? I see no problem with returning ENOTSUP even though its not specified in [1]. > > Users of posix_fallocate() do not expect it to fail if the FS does not > support fallocate(). The specs don't guarantee that. A better approach would be to check for the errors specified in [1], in say a switch-case, and then have a default which is some other "unspecified" error. > See for example the usage in wayland: > http://cgit.freedesktop.org/wayland/wayland/tree/cursor/os-compatibility.c?h=1.6&id=1.6.0#n79 > Sounds like a bug in wayland. glibc might be creating some unreasonable expectations. > > To have a better fallback, I don't see a way to avoid the extra child > process either, since we can only do the atomic compare-and-swap through > mmap. > Does anyone have a better suggestion to implement the fallback? I'm not sure there is any good fallback if the fallocate syscall is not available. Again, it seems reasonable to me to just let it fail rather than trying to implement it via some brute force. Ref. [1] http://pubs.opengroup.org/onlinepubs/009695399/functions/posix_fallocate.html
On Tue, Sep 30, 2014 at 05:21:43PM -0400, Anthony G. Basile wrote: > On 09/29/14 13:39, Olivier Blin wrote: > >On 09/29/14 18:42, Rich Felker wrote: > >>On Mon, Sep 29, 2014 at 04:43:32PM +0200, Olivier Blin wrote: > >>> EOPNOTSUPP is not a valid error code for posix_fallocate(), the > >>> implementation should have a fallback when the underlying filesystem > >>> does not support fallocate(). > >>> > >>> This is especially useful when using posix_fallocate() on tmpfs with > >>> kernels older than 3.5, for which there was no fallocate support. > >>> > >>> This copies the implementation of the internal fallback from glibc, > >>> with a few adaptations for internals symbols. > >> > >>This code is dangerous (has race conditions that cause file > >>corruption) and should not be added. See the bug report for the > >>corresponding code in glibc: > >> > >>https://sourceware.org/bugzilla/show_bug.cgi?id=15661 > >> > >>Rich > >Indeed, this is dangerous. > > > >According to the specification of posix_fallocate(), there does not seem > >to be an appropriate error code to return when the filesystem driver > >does not natively support fallocate(). > >In this case, the libc should probably provide a fallback implementation. > > Why can't there be an implementation specific error message here > since the specs are silent on the issue? I see no problem with > returning ENOTSUP even though its not specified in [1]. For details see XSH 2.3 Error Numbers: "Implementations may support additional errors not included in this list, may generate errors included in this list under circumstances other than those described here, or may contain extensions or limitations that prevent some errors from occurring. ... Implementations may generate error numbers listed here under circumstances other than those described, if and only if all those error conditions can always be treated identically to the error conditions as described in this volume of POSIX.1-2008. Implementations shall not generate a different error number from one required by this volume of POSIX.1-2008 for an error condition described in this volume of POSIX.1-2008, but may generate additional errors unless explicitly disallowed for a particular function." Source: http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_03 Rich
diff --git a/libc/sysdeps/linux/common/posix_fallocate.c b/libc/sysdeps/linux/common/posix_fallocate.c index 76771e3..cc4553c 100644 --- a/libc/sysdeps/linux/common/posix_fallocate.c +++ b/libc/sysdeps/linux/common/posix_fallocate.c @@ -12,12 +12,88 @@ #include <fcntl.h> #include <bits/kernel-features.h> #include <stdint.h> +#include <sys/statfs.h> #if defined __NR_fallocate +/* Reserve storage for the data of the file associated with FD. */ +/* Adapted from glibc */ +int +internal_fallocate (int fd, __off_t offset, __off_t len) +{ + struct stat64 st; + struct statfs f; + + /* `off_t' is a signed type. Therefore we can determine whether + OFFSET + LEN is too large if it is a negative value. */ + if (offset < 0 || len < 0) + return EINVAL; + if (offset + len < 0) + return EFBIG; + + /* First thing we have to make sure is that this is really a regular + file. */ + if (fstat64 (fd, &st) != 0) + return EBADF; + if (S_ISFIFO (st.st_mode)) + return ESPIPE; + if (! S_ISREG (st.st_mode)) + return ENODEV; + + if (len == 0) + { + if (st.st_size < offset) + { + int ret = ftruncate (fd, offset); + + if (ret != 0) + ret = errno; + return ret; + } + return 0; + } + + /* We have to know the block size of the filesystem to get at least some + sort of performance. */ + if (__libc_fstatfs (fd, &f) != 0) + return errno; + + /* Try to play safe. */ + if (f.f_bsize == 0) + f.f_bsize = 512; + + /* Write something to every block. */ + for (offset += (len - 1) % f.f_bsize; len > 0; offset += f.f_bsize) + { + len -= f.f_bsize; + + if (offset < st.st_size) + { + unsigned char c; + ssize_t rsize = __libc_pread (fd, &c, 1, offset); + + if (rsize < 0) + return errno; + /* If there is a non-zero byte, the block must have been + allocated already. */ + else if (rsize == 1 && c != 0) + continue; + } + + if (__libc_pwrite (fd, "", 1, offset) != 1) + return errno; + } + + return 0; +} + extern __typeof(fallocate) __libc_fallocate attribute_hidden; int posix_fallocate(int fd, __off_t offset, __off_t len) { - return __libc_fallocate(fd, 0, offset, len); + int result = __libc_fallocate(fd, 0, offset, len); + if (result != EOPNOTSUPP) + return result; + + return internal_fallocate(fd, offset, len); } # if defined __UCLIBC_HAS_LFS__ && __WORDSIZE == 64 strong_alias(posix_fallocate,posix_fallocate64) diff --git a/libc/sysdeps/linux/common/posix_fallocate64.c b/libc/sysdeps/linux/common/posix_fallocate64.c index 12ddbc2..2f02626 100644 --- a/libc/sysdeps/linux/common/posix_fallocate64.c +++ b/libc/sysdeps/linux/common/posix_fallocate64.c @@ -12,15 +12,91 @@ #include <fcntl.h> #include <bits/kernel-features.h> #include <stdint.h> +#include <sys/statfs.h> #if defined __NR_fallocate # if __WORDSIZE == 64 /* Can use normal posix_fallocate() */ # elif __WORDSIZE == 32 +/* Reserve storage for the data of the file associated with FD. */ +/* Adapted from glibc */ +static int +internal_fallocate64 (int fd, __off64_t offset, __off64_t len) +{ + struct stat64 st; + struct statfs64 f; + + /* `off64_t' is a signed type. Therefore we can determine whether + OFFSET + LEN is too large if it is a negative value. */ + if (offset < 0 || len < 0) + return EINVAL; + if (offset + len < 0) + return EFBIG; + + /* First thing we have to make sure is that this is really a regular + file. */ + if (fstat64 (fd, &st) != 0) + return EBADF; + if (S_ISFIFO (st.st_mode)) + return ESPIPE; + if (! S_ISREG (st.st_mode)) + return ENODEV; + + if (len == 0) + { + if (st.st_size < offset) + { + int ret = ftruncate64 (fd, offset); + + if (ret != 0) + ret = errno; + return ret; + } + return 0; + } + + /* We have to know the block size of the filesystem to get at least some + sort of performance. */ + if (fstatfs64 (fd, &f) != 0) + return errno; + + /* Try to play safe. */ + if (f.f_bsize == 0) + f.f_bsize = 512; + + /* Write something to every block. */ + for (offset += (len - 1) % f.f_bsize; len > 0; offset += f.f_bsize) + { + len -= f.f_bsize; + + if (offset < st.st_size) + { + unsigned char c; + ssize_t rsize = __libc_pread64 (fd, &c, 1, offset); + + if (rsize < 0) + return errno; + /* If there is a non-zero byte, the block must have been + allocated already. */ + else if (rsize == 1 && c != 0) + continue; + } + + if (__libc_pwrite64 (fd, "", 1, offset) != 1) + return errno; + } + + return 0; +} + extern __typeof(fallocate64) __libc_fallocate64 attribute_hidden; int posix_fallocate64(int fd, __off64_t offset, __off64_t len) { - return __libc_fallocate64(fd, 0, offset, len); + int result = __libc_fallocate64(fd, 0, offset, len); + if (result != EOPNOTSUPP) + return result; + + return internal_fallocate64(fd, offset, len); } # else # error your machine is neither 32 bit or 64 bit ... it must be magical