Message ID | 8b5ddc69aae83b332b205dbeddb3284ab00f827e.1529082112.git.jstancek@redhat.com |
---|---|
State | Accepted |
Delegated to: | Petr Vorel |
Headers | show |
Series | fcntl36: fix 32-bit sporadic failures | expand |
Jan, Running in a container, armhf userland, with arm64 kernel (real HW: hikey): $ uname -a Linux ltpfcntl36 4.16.0-2-arm64 #1 SMP Debian 4.16.12-1 (2018-05-27) armv8l GNU/Linux $ file ./fcntl36 ./fcntl36: ELF 32-bit LSB pie executable ARM, EABI5 version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-armhf.so.3, for GNU/Linux 3.2.0, BuildID[sha1]=074ba3622d50f62108260d2912ae5dc0e8d52d46, with debug_info, not stripped My un-patched binary shows me: [pid 3501] fcntl64(11, F_OFD_SETLK, {l_type=F_UNLCK, l_whence=SEEK_SET, l_start=16384, l_len=4096} <unfinished ...> no F_SETLK for this binary. Running in a container, arm64 userland, with arm64 kernel (real HW: hikey): $ file ./fcntl36 ./fcntl36: ELF 64-bit LSB pie executable ARM aarch64, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-aarch64.so.1, for GNU/Linux 3.7.0, BuildID[sha1]=8fb1d0e0e0d027fce5c8d11742ce5bdb7678749b, with debug_info, not stripped Same un-patched binary shows me: [pid 483] fcntl(10, F_OFD_SETLKW, {l_type=F_UNLCK, l_whence=SEEK_SET, l_start=13312, l_len=4096} <unfinished ...> Do you want me to test patched LTP in arm32/64 to make sure they work and you check if it fixes the issue for i686 on your side ? Or should I compile it some other way ? Let me know what you need. Rafael On 15 June 2018 at 14:22, Jan Stancek <jstancek@redhat.com> wrote: > No kernel bug, problem was found in my reproducer. > Can anyone confirm this fixes 32-bit arm too?
----- Original Message ----- > Jan, > > Running in a container, armhf userland, with arm64 kernel (real HW: hikey): > > $ uname -a > Linux ltpfcntl36 4.16.0-2-arm64 #1 SMP Debian 4.16.12-1 (2018-05-27) > armv8l GNU/Linux > > $ file ./fcntl36 > ./fcntl36: ELF 32-bit LSB pie executable ARM, EABI5 version 1 (SYSV), > dynamically linked, interpreter /lib/ld-linux-armhf.so.3, for > GNU/Linux 3.2.0, > BuildID[sha1]=074ba3622d50f62108260d2912ae5dc0e8d52d46, with > debug_info, not stripped > > My un-patched binary shows me: > > [pid 3501] fcntl64(11, F_OFD_SETLK, {l_type=F_UNLCK, > l_whence=SEEK_SET, l_start=16384, l_len=4096} <unfinished ...> > > no F_SETLK for this binary. > > Running in a container, arm64 userland, with arm64 kernel (real HW: hikey): > > $ file ./fcntl36 > ./fcntl36: ELF 64-bit LSB pie executable ARM aarch64, version 1 > (SYSV), dynamically linked, interpreter /lib/ld-linux-aarch64.so.1, > for GNU/Linux 3.7.0, > BuildID[sha1]=8fb1d0e0e0d027fce5c8d11742ce5bdb7678749b, with > debug_info, not stripped > > Same un-patched binary shows me: > > [pid 483] fcntl(10, F_OFD_SETLKW, {l_type=F_UNLCK, > l_whence=SEEK_SET, l_start=13312, l_len=4096} <unfinished ...> > > Do you want me to test patched LTP in arm32/64 to make sure they work Yes, if possible test patched LTP in environment which is known to hit the failure. > and you check if it fixes the issue for i686 on your side ? I already did, it does eliminate problem for me. > Or should > I compile it some other way ? Default build is OK. Apply patch, build and see if you can reproduce the failure. > > Let me know what you need. > > Rafael > > On 15 June 2018 at 14:22, Jan Stancek <jstancek@redhat.com> wrote: > > > No kernel bug, problem was found in my reproducer. > > Can anyone confirm this fixes 32-bit arm too? >
On Sat, Jun 16, 2018 at 1:22 AM, Jan Stancek <jstancek@redhat.com> wrote: > fcntl36 testcase has been observed to sporadically fail, when > running 32-bit user-space on 64-bit kernel. > > Strace shows that region length is 0 for some commands: > fcntl64(6, F_SETLK, {l_type=F_UNLCK, l_whence=SEEK_SET, l_start=5120, l_len=0} > > This is because testcase is passing struct flock64, but > command is F_SETLK, not F_SETLK64. > > So, kernel treats argument for POSIX command as 32-bit "struct flock": > > static long do_compat_fcntl64(unsigned int fd, unsigned int cmd, > ... > case F_SETLK: > case F_SETLKW: > err = get_compat_flock(&flock, compat_ptr(arg)); > ... > > in contrast to F_OFD_* commands, where argument is treated > as "struct flock64": > case F_SETLK64: > case F_SETLKW64: > case F_OFD_SETLK: > case F_OFD_SETLKW: > err = get_compat_flock64(&flock, compat_ptr(arg)); > ... > > Switch argument of POSIX commands to 'struct flock' and leave it to > glibc to pick correct syscall and command. > > I tested fcntl36 and fcntl36_64 on x86_64 kernel (as 64bit + 32bit binary) > and i386 kernel by running 100 iterations. Sounds reasonable. I can stably reproduce this issue on my raspbery Pi3, and with your patch applied, the problem was gone. [root@centos-rpi3 fcntl]# uname -a Linux centos-rpi3 4.9.13-v7.1.el7 #1 SMP Mon Feb 27 10:45:31 UTC 2017 armv7l armv7l armv7l GNU/Linux [root@centos-rpi3 fcntl]# file fcntl36 fcntl36: ELF 32-bit LSB executable, ARM, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.32, BuildID[sha1]=73d3a834d43361ecd4331d9b77781ea932a970cc, not stripped [root@centos-rpi3 fcntl]# ./fcntl36 tst_test.c:1015: INFO: Timeout per run is 0h 05m 00s fcntl36.c:303: INFO: OFD read lock vs OFD write lock fcntl36.c:381: PASS: Access between threads synchronized fcntl36.c:303: INFO: OFD write lock vs POSIX write lock fcntl36.c:381: PASS: Access between threads synchronized fcntl36.c:303: INFO: OFD read lock vs POSIX write lock fcntl36.c:381: PASS: Access between threads synchronized fcntl36.c:303: INFO: OFD write lock vs POSIX read lock fcntl36.c:381: PASS: Access between threads synchronized fcntl36.c:303: INFO: OFD write lock vs OFD write lock fcntl36.c:381: PASS: Access between threads synchronized fcntl36.c:303: INFO: OFD r/w lock vs POSIX write lock fcntl36.c:381: PASS: Access between threads synchronized fcntl36.c:303: INFO: OFD r/w lock vs POSIX read lock fcntl36.c:381: PASS: Access between threads synchronized Summary: passed 7 failed 0 skipped 0 warnings 0
On 15 June 2018 at 22:52, Jan Stancek <jstancek@redhat.com> wrote: > fcntl36 testcase has been observed to sporadically fail, when > running 32-bit user-space on 64-bit kernel. > > Strace shows that region length is 0 for some commands: > fcntl64(6, F_SETLK, {l_type=F_UNLCK, l_whence=SEEK_SET, l_start=5120, l_len=0} > > This is because testcase is passing struct flock64, but > command is F_SETLK, not F_SETLK64. > > So, kernel treats argument for POSIX command as 32-bit "struct flock": > > static long do_compat_fcntl64(unsigned int fd, unsigned int cmd, > ... > case F_SETLK: > case F_SETLKW: > err = get_compat_flock(&flock, compat_ptr(arg)); > ... > > in contrast to F_OFD_* commands, where argument is treated > as "struct flock64": > case F_SETLK64: > case F_SETLKW64: > case F_OFD_SETLK: > case F_OFD_SETLKW: > err = get_compat_flock64(&flock, compat_ptr(arg)); > ... > > Switch argument of POSIX commands to 'struct flock' and leave it to > glibc to pick correct syscall and command. > > I tested fcntl36 and fcntl36_64 on x86_64 kernel (as 64bit + 32bit binary) > and i386 kernel by running 100 iterations. > > Signed-off-by: Jan Stancek <jstancek@redhat.com> Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org> > --- > No kernel bug, problem was found in my reproducer. > Can anyone confirm this fixes 32-bit arm too? This patch tested on arm32 Beagleboard x15 for 25 iterations and PASS. Here you find test log before and after the patch. Full test log, https://lkft.validation.linaro.org/scheduler/job/290879#L3056 > > testcases/kernel/syscalls/fcntl/fcntl36.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/testcases/kernel/syscalls/fcntl/fcntl36.c b/testcases/kernel/syscalls/fcntl/fcntl36.c > index 3246d13892cd..81bd5a647e4c 100644 > --- a/testcases/kernel/syscalls/fcntl/fcntl36.c > +++ b/testcases/kernel/syscalls/fcntl/fcntl36.c > @@ -127,7 +127,7 @@ static void *fn_posix_w(void *arg) > int fd = SAFE_OPEN(fname, O_RDWR); > long wt = pa->cnt; > > - struct flock64 lck = { > + struct flock lck = { > .l_whence = SEEK_SET, > .l_start = pa->offset, > .l_len = pa->length, > @@ -227,7 +227,7 @@ static void *fn_posix_r(void *arg) > int i; > int fd = SAFE_OPEN(fname, O_RDWR); > > - struct flock64 lck = { > + struct flock lck = { > .l_whence = SEEK_SET, > .l_start = pa->offset, > .l_len = pa->length, > -- > 1.8.3.1 > Thanks for the patch ! Best regards Naresh Kamboju
----- Original Message ----- > On 15 June 2018 at 22:52, Jan Stancek <jstancek@redhat.com> wrote: > > fcntl36 testcase has been observed to sporadically fail, when > > running 32-bit user-space on 64-bit kernel. > > > > Strace shows that region length is 0 for some commands: > > fcntl64(6, F_SETLK, {l_type=F_UNLCK, l_whence=SEEK_SET, l_start=5120, > > l_len=0} > > > > This is because testcase is passing struct flock64, but > > command is F_SETLK, not F_SETLK64. > > > > So, kernel treats argument for POSIX command as 32-bit "struct flock": > > > > static long do_compat_fcntl64(unsigned int fd, unsigned int cmd, > > ... > > case F_SETLK: > > case F_SETLKW: > > err = get_compat_flock(&flock, compat_ptr(arg)); > > ... > > > > in contrast to F_OFD_* commands, where argument is treated > > as "struct flock64": > > case F_SETLK64: > > case F_SETLKW64: > > case F_OFD_SETLK: > > case F_OFD_SETLKW: > > err = get_compat_flock64(&flock, compat_ptr(arg)); > > ... > > > > Switch argument of POSIX commands to 'struct flock' and leave it to > > glibc to pick correct syscall and command. > > > > I tested fcntl36 and fcntl36_64 on x86_64 kernel (as 64bit + 32bit binary) > > and i386 kernel by running 100 iterations. > > > > Signed-off-by: Jan Stancek <jstancek@redhat.com> > > Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org> > Thanks Li & Naresh. Pushed. Regards, Jan
diff --git a/testcases/kernel/syscalls/fcntl/fcntl36.c b/testcases/kernel/syscalls/fcntl/fcntl36.c index 3246d13892cd..81bd5a647e4c 100644 --- a/testcases/kernel/syscalls/fcntl/fcntl36.c +++ b/testcases/kernel/syscalls/fcntl/fcntl36.c @@ -127,7 +127,7 @@ static void *fn_posix_w(void *arg) int fd = SAFE_OPEN(fname, O_RDWR); long wt = pa->cnt; - struct flock64 lck = { + struct flock lck = { .l_whence = SEEK_SET, .l_start = pa->offset, .l_len = pa->length, @@ -227,7 +227,7 @@ static void *fn_posix_r(void *arg) int i; int fd = SAFE_OPEN(fname, O_RDWR); - struct flock64 lck = { + struct flock lck = { .l_whence = SEEK_SET, .l_start = pa->offset, .l_len = pa->length,
fcntl36 testcase has been observed to sporadically fail, when running 32-bit user-space on 64-bit kernel. Strace shows that region length is 0 for some commands: fcntl64(6, F_SETLK, {l_type=F_UNLCK, l_whence=SEEK_SET, l_start=5120, l_len=0} This is because testcase is passing struct flock64, but command is F_SETLK, not F_SETLK64. So, kernel treats argument for POSIX command as 32-bit "struct flock": static long do_compat_fcntl64(unsigned int fd, unsigned int cmd, ... case F_SETLK: case F_SETLKW: err = get_compat_flock(&flock, compat_ptr(arg)); ... in contrast to F_OFD_* commands, where argument is treated as "struct flock64": case F_SETLK64: case F_SETLKW64: case F_OFD_SETLK: case F_OFD_SETLKW: err = get_compat_flock64(&flock, compat_ptr(arg)); ... Switch argument of POSIX commands to 'struct flock' and leave it to glibc to pick correct syscall and command. I tested fcntl36 and fcntl36_64 on x86_64 kernel (as 64bit + 32bit binary) and i386 kernel by running 100 iterations. Signed-off-by: Jan Stancek <jstancek@redhat.com> --- No kernel bug, problem was found in my reproducer. Can anyone confirm this fixes 32-bit arm too? testcases/kernel/syscalls/fcntl/fcntl36.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)