Message ID | 874izmtu4w.fsf@oldenburg.str.redhat.com |
---|---|
State | New |
Headers | show |
Series | Linux: Inhibit tail calls in cancellable system calls | expand |
On 3/21/25 9:01 AM, Florian Weimer wrote: > Without this change, the system call wrapper function is not visible > on the stack at the time of the system call, which causes problems > for interception tools such as valgrind. > > Enhances commit 89b53077d2a58f00e7debdfe58afabe953dac60d ("nptl: Fix > Race conditions in pthread cancellation [BZ#12683]"). > > Tested on i686-linux-gnu, powerpc64le-linux-gnu, x86_64-linux-gnu. > (We're still discussing if valgrind needs this, but if it does, here's a > patch.) We discussed this on the weekly patch queue review. Glad to see the Valgrind discussion is ongoing. This has cross-architecture implications, and can add extra function calls. I'm marking the patch in patchwork as changes requested since we should review and discuss Adhemerval's suggestion.
Hi Florian, On Fri, 2025-03-21 at 14:01 +0100, Florian Weimer wrote: > Without this change, the system call wrapper function is not visible > on the stack at the time of the system call, which causes problems > for interception tools such as valgrind. > > Enhances commit 89b53077d2a58f00e7debdfe58afabe953dac60d ("nptl: Fix > Race conditions in pthread cancellation [BZ#12683]"). > > Tested on i686-linux-gnu, powerpc64le-linux-gnu, x86_64-linux-gnu. > (We're still discussing if valgrind needs this, but if it does, here's a > patch.) I implemented the valgrind part of skipping the syscall_cancel frames here: https://bugs.kde.org/show_bug.cgi?id=502126#c2 And there is a valgrind package build for fedora rawhide: https://koji.fedoraproject.org/koji/buildinfo?buildID=2687393 For ppc64le, s390x and x86_64 that patch seems enough. For i686 and aarch64 there does seem to be an issue with missing the glibc calling function because of a tail call. Also on i686 there is another extra frame on top __libc_do_syscall. I haven't yet tried on a system with a glibc with this patch applied. Cheers, Mark > --- > sysdeps/unix/sysdep.h | 57 ++++++++++++++++++++++++++++++++------------------- > 1 file changed, 36 insertions(+), 21 deletions(-) > > diff --git a/sysdeps/unix/sysdep.h b/sysdeps/unix/sysdep.h > index 2cc98725c3..8f1e4fc4f9 100644 > --- a/sysdeps/unix/sysdep.h > +++ b/sysdeps/unix/sysdep.h > @@ -166,30 +166,45 @@ long int __syscall_cancel (__syscall_arg_t arg1, __syscall_arg_t arg2, > __SYSCALL_CANCEL7_ARG_DEF > __syscall_arg_t nr) attribute_hidden; > > +/* Inhibit tail call optimization, so that the stack frame of the > + system-call-implementing function is visible at the time of the > + system call. */ > +#define __syscall_cancel_barrier(...) \ > + ({ \ > + long int __syscall_cancel_result = __syscall_cancel (__VA_ARGS__); \ > + asm volatile ("" ::: "memory"); \ > + __syscall_cancel_result; \ > + }) > + > #define __SYSCALL_CANCEL0(name) \ > - __syscall_cancel (0, 0, 0, 0, 0, 0, __SYSCALL_CANCEL7_ARG __NR_##name) > + __syscall_cancel_barrier (0, 0, 0, 0, 0, 0, \ > + __SYSCALL_CANCEL7_ARG __NR_##name) > #define __SYSCALL_CANCEL1(name, a1) \ > - __syscall_cancel (__SSC (a1), 0, 0, 0, 0, 0, \ > - __SYSCALL_CANCEL7_ARG __NR_##name) > -#define __SYSCALL_CANCEL2(name, a1, a2) \ > - __syscall_cancel (__SSC (a1), __SSC (a2), 0, 0, 0, 0, \ > - __SYSCALL_CANCEL7_ARG __NR_##name) > -#define __SYSCALL_CANCEL3(name, a1, a2, a3) \ > - __syscall_cancel (__SSC (a1), __SSC (a2), __SSC (a3), 0, 0, 0, \ > - __SYSCALL_CANCEL7_ARG __NR_##name) > -#define __SYSCALL_CANCEL4(name, a1, a2, a3, a4) \ > - __syscall_cancel (__SSC (a1), __SSC (a2), __SSC (a3), \ > - __SSC(a4), 0, 0, __SYSCALL_CANCEL7_ARG __NR_##name) > -#define __SYSCALL_CANCEL5(name, a1, a2, a3, a4, a5) \ > - __syscall_cancel (__SSC (a1), __SSC (a2), __SSC (a3), __SSC(a4), \ > - __SSC (a5), 0, __SYSCALL_CANCEL7_ARG __NR_##name) > -#define __SYSCALL_CANCEL6(name, a1, a2, a3, a4, a5, a6) \ > - __syscall_cancel (__SSC (a1), __SSC (a2), __SSC (a3), __SSC (a4), \ > - __SSC (a5), __SSC (a6), __SYSCALL_CANCEL7_ARG \ > - __NR_##name) > + __syscall_cancel_barrier (__SSC (a1), 0, 0, 0, 0, 0, \ > + __SYSCALL_CANCEL7_ARG __NR_##name) > +#define __SYSCALL_CANCEL2(name, a1, a2) \ > + __syscall_cancel_barrier (__SSC (a1), __SSC (a2), 0, 0, 0, 0, \ > + __SYSCALL_CANCEL7_ARG __NR_##name) > +#define __SYSCALL_CANCEL3(name, a1, a2, a3) \ > + __syscall_cancel_barrier (__SSC (a1), __SSC (a2), __SSC (a3), \ > + 0, 0, 0, __SYSCALL_CANCEL7_ARG __NR_##name) > +#define __SYSCALL_CANCEL4(name, a1, a2, a3, a4) \ > + __syscall_cancel_barrier (__SSC (a1), __SSC (a2), __SSC (a3), \ > + __SSC(a4), 0, 0, \ > + __SYSCALL_CANCEL7_ARG __NR_##name) > +#define __SYSCALL_CANCEL5(name, a1, a2, a3, a4, a5) \ > + __syscall_cancel_barrier (__SSC (a1), __SSC (a2), __SSC (a3), \ > + __SSC(a4), __SSC (a5), 0, \ > + __SYSCALL_CANCEL7_ARG __NR_##name) > +#define __SYSCALL_CANCEL6(name, a1, a2, a3, a4, a5, a6) \ > + __syscall_cancel_barrier (__SSC (a1), __SSC (a2), __SSC (a3), \ > + __SSC (a4), __SSC (a5), __SSC (a6), \ > + __SYSCALL_CANCEL7_ARG \ > + __NR_##name) > #define __SYSCALL_CANCEL7(name, a1, a2, a3, a4, a5, a6, a7) \ > - __syscall_cancel (__SSC (a1), __SSC (a2), __SSC (a3), __SSC (a4), \ > - __SSC (a5), __SSC (a6), __SSC (a7), __NR_##name) > + __syscall_cancel_barrier (__SSC (a1), __SSC (a2), __SSC (a3), \ > + __SSC (a4), __SSC (a5), __SSC (a6), \ > + __SSC (a7), __NR_##name) > > #define __SYSCALL_CANCEL_NARGS_X(a,b,c,d,e,f,g,h,n,...) n > #define __SYSCALL_CANCEL_NARGS(...) \ > > base-commit: 3e2be87832781a29ed67f38f87c1ce3dd4c1b866 >
Hi, On Fri, Mar 28, 2025 at 07:02:28PM +0100, Mark Wielaard wrote: > On Fri, 2025-03-21 at 14:01 +0100, Florian Weimer wrote: > > Without this change, the system call wrapper function is not visible > > on the stack at the time of the system call, which causes problems > > for interception tools such as valgrind. > > > > Enhances commit 89b53077d2a58f00e7debdfe58afabe953dac60d ("nptl: Fix > > Race conditions in pthread cancellation [BZ#12683]"). > > > > Tested on i686-linux-gnu, powerpc64le-linux-gnu, x86_64-linux-gnu. > > (We're still discussing if valgrind needs this, but if it does, here's a > > patch.) > > I implemented the valgrind part of skipping the syscall_cancel frames > here: https://bugs.kde.org/show_bug.cgi?id=502126#c2 > And there is a valgrind package build for fedora rawhide: > https://koji.fedoraproject.org/koji/buildinfo?buildID=2687393 > > For ppc64le, s390x and x86_64 that patch seems enough. > > For i686 and aarch64 there does seem to be an issue with missing the > glibc calling function because of a tail call. > > Also on i686 there is another extra frame on top __libc_do_syscall. I extended the patch to cover some extra sycall wrapper function symbols on i386 and armhf and pushed it to valgrind trunk and VALGRIND_3_24_BRANCH. There are builds for fedora rawhide and f42. This does seem to show that only on arm64 the tail calls obscure observing the full call stack. Cheers, Mark
Hi, On Mon, Mar 31, 2025 at 11:29:41AM +0200, Mark Wielaard wrote: > On Fri, Mar 28, 2025 at 07:02:28PM +0100, Mark Wielaard wrote: > > On Fri, 2025-03-21 at 14:01 +0100, Florian Weimer wrote: > > > Without this change, the system call wrapper function is not visible > > > on the stack at the time of the system call, which causes problems > > > for interception tools such as valgrind. > > > > > > Enhances commit 89b53077d2a58f00e7debdfe58afabe953dac60d ("nptl: Fix > > > Race conditions in pthread cancellation [BZ#12683]"). > > > > > > Tested on i686-linux-gnu, powerpc64le-linux-gnu, x86_64-linux-gnu. > > > (We're still discussing if valgrind needs this, but if it does, here's a > > > patch.) > > > > I implemented the valgrind part of skipping the syscall_cancel frames > > here: https://bugs.kde.org/show_bug.cgi?id=502126#c2 > > And there is a valgrind package build for fedora rawhide: > > https://koji.fedoraproject.org/koji/buildinfo?buildID=2687393 > > > > For ppc64le, s390x and x86_64 that patch seems enough. > > > > For i686 and aarch64 there does seem to be an issue with missing the > > glibc calling function because of a tail call. > > > > Also on i686 there is another extra frame on top __libc_do_syscall. > > I extended the patch to cover some extra sycall wrapper function > symbols on i386 and armhf and pushed it to valgrind trunk and > VALGRIND_3_24_BRANCH. There are builds for fedora rawhide and > f42. This does seem to show that only on arm64 the tail calls > obscure observing the full call stack. This has now landed in fedora rawhide and f42. Test results look good, except for some if the arm64 tests where the tail calls obscure observing the full call stack. Please let me know if you need any more input from us to get this fix in glibc. Cheers, Mark
Hi, On Sun, Apr 06, 2025 at 03:23:54PM +0200, Mark Wielaard wrote: > On Mon, Mar 31, 2025 at 11:29:41AM +0200, Mark Wielaard wrote: > > On Fri, Mar 28, 2025 at 07:02:28PM +0100, Mark Wielaard wrote: > > > On Fri, 2025-03-21 at 14:01 +0100, Florian Weimer wrote: > > > > Without this change, the system call wrapper function is not visible > > > > on the stack at the time of the system call, which causes problems > > > > for interception tools such as valgrind. > > > > > > > > Enhances commit 89b53077d2a58f00e7debdfe58afabe953dac60d ("nptl: Fix > > > > Race conditions in pthread cancellation [BZ#12683]"). > > > > > > > > Tested on i686-linux-gnu, powerpc64le-linux-gnu, x86_64-linux-gnu. > > > > (We're still discussing if valgrind needs this, but if it does, here's a > > > > patch.) > > > > > > I implemented the valgrind part of skipping the syscall_cancel frames > > > here: https://bugs.kde.org/show_bug.cgi?id=502126#c2 > > > And there is a valgrind package build for fedora rawhide: > > > https://koji.fedoraproject.org/koji/buildinfo?buildID=2687393 > > > > > > For ppc64le, s390x and x86_64 that patch seems enough. > > > > > > For i686 and aarch64 there does seem to be an issue with missing the > > > glibc calling function because of a tail call. > > > > > > Also on i686 there is another extra frame on top __libc_do_syscall. > > > > I extended the patch to cover some extra sycall wrapper function > > symbols on i386 and armhf and pushed it to valgrind trunk and > > VALGRIND_3_24_BRANCH. There are builds for fedora rawhide and > > f42. This does seem to show that only on arm64 the tail calls > > obscure observing the full call stack. > > This has now landed in fedora rawhide and f42. Test results look good, > except for some if the arm64 tests where the tail calls obscure > observing the full call stack. Please let me know if you need any more > input from us to get this fix in glibc. Please let me know. Valgrind test results for syscall backtraces on anything except arm64 look good. We are working on valgrind 3.25.0 now, to be released around April 24. Thanks, Mark
Hi all, On Mon, 2025-04-14 at 14:06 +0200, Mark Wielaard wrote: > On Sun, Apr 06, 2025 at 03:23:54PM +0200, Mark Wielaard wrote: > > On Mon, Mar 31, 2025 at 11:29:41AM +0200, Mark Wielaard wrote: > > > On Fri, Mar 28, 2025 at 07:02:28PM +0100, Mark Wielaard wrote: > > > > On Fri, 2025-03-21 at 14:01 +0100, Florian Weimer wrote: > > > > > Without this change, the system call wrapper function is not visible > > > > > on the stack at the time of the system call, which causes problems > > > > > for interception tools such as valgrind. > > > > > > > > > > Enhances commit 89b53077d2a58f00e7debdfe58afabe953dac60d ("nptl: Fix > > > > > Race conditions in pthread cancellation [BZ#12683]"). > > > > > > > > > > Tested on i686-linux-gnu, powerpc64le-linux-gnu, x86_64-linux-gnu. > > > > > (We're still discussing if valgrind needs this, but if it does, here's a > > > > > patch.) > > > > > > > > I implemented the valgrind part of skipping the syscall_cancel frames > > > > here: https://bugs.kde.org/show_bug.cgi?id=502126#c2 > > > > And there is a valgrind package build for fedora rawhide: > > > > https://koji.fedoraproject.org/koji/buildinfo?buildID=2687393 > > > > > > > > For ppc64le, s390x and x86_64 that patch seems enough. > > > > > > > > For i686 and aarch64 there does seem to be an issue with missing the > > > > glibc calling function because of a tail call. > > > > > > > > Also on i686 there is another extra frame on top __libc_do_syscall. > > > > > > I extended the patch to cover some extra sycall wrapper function > > > symbols on i386 and armhf and pushed it to valgrind trunk and > > > VALGRIND_3_24_BRANCH. There are builds for fedora rawhide and > > > f42. This does seem to show that only on arm64 the tail calls > > > obscure observing the full call stack. > > > > This has now landed in fedora rawhide and f42. Test results look good, > > except for some if the arm64 tests where the tail calls obscure > > observing the full call stack. Please let me know if you need any more > > input from us to get this fix in glibc. > > Please let me know. Valgrind test results for syscall backtraces on > anything except arm64 look good. We are working on valgrind 3.25.0 > now, to be released around April 24. valgrind 3.25.0-RC1 has been released and test results look good on most arches. arm64 does show the issue described above where the tail calls obscure observing the full call stack when doing system calls. Let me know what would be needed to get the above patch reviewed. Thanks, Mark
Hi all, On Fri, Apr 18, 2025 at 05:46:54PM +0200, Mark Wielaard wrote: > On Mon, 2025-04-14 at 14:06 +0200, Mark Wielaard wrote: > > On Sun, Apr 06, 2025 at 03:23:54PM +0200, Mark Wielaard wrote: > > > On Mon, Mar 31, 2025 at 11:29:41AM +0200, Mark Wielaard wrote: > > > > On Fri, Mar 28, 2025 at 07:02:28PM +0100, Mark Wielaard wrote: > > > > > On Fri, 2025-03-21 at 14:01 +0100, Florian Weimer wrote: > > > > > > Without this change, the system call wrapper function is not visible > > > > > > on the stack at the time of the system call, which causes problems > > > > > > for interception tools such as valgrind. > > > > > > > > > > > > Enhances commit 89b53077d2a58f00e7debdfe58afabe953dac60d ("nptl: Fix > > > > > > Race conditions in pthread cancellation [BZ#12683]"). > > > > > > > > > > > > Tested on i686-linux-gnu, powerpc64le-linux-gnu, x86_64-linux-gnu. > > > > > > (We're still discussing if valgrind needs this, but if it does, here's a > > > > > > patch.) > > > > > > > > > > I implemented the valgrind part of skipping the syscall_cancel frames > > > > > here: https://bugs.kde.org/show_bug.cgi?id=502126#c2 > > > > > And there is a valgrind package build for fedora rawhide: > > > > > https://koji.fedoraproject.org/koji/buildinfo?buildID=2687393 > > > > > > > > > > For ppc64le, s390x and x86_64 that patch seems enough. > > > > > > > > > > For i686 and aarch64 there does seem to be an issue with missing the > > > > > glibc calling function because of a tail call. > > > > > > > > > > Also on i686 there is another extra frame on top __libc_do_syscall. > > > > > > > > I extended the patch to cover some extra sycall wrapper function > > > > symbols on i386 and armhf and pushed it to valgrind trunk and > > > > VALGRIND_3_24_BRANCH. There are builds for fedora rawhide and > > > > f42. This does seem to show that only on arm64 the tail calls > > > > obscure observing the full call stack. > > > > > > This has now landed in fedora rawhide and f42. Test results look good, > > > except for some if the arm64 tests where the tail calls obscure > > > observing the full call stack. Please let me know if you need any more > > > input from us to get this fix in glibc. > > > > Please let me know. Valgrind test results for syscall backtraces on > > anything except arm64 look good. We are working on valgrind 3.25.0 > > now, to be released around April 24. > > valgrind 3.25.0-RC1 has been released and test results look good on > most arches. arm64 does show the issue described above where the tail > calls obscure observing the full call stack when doing system calls. valgrind 3.25.0 have been released and is now in Fedora rawhide and Fedora 42 with the new glibc syscall_cancel frames. The tail calls on aarch64 still seem to be a problem for observability of the syscall call stack. > Let me know what would be needed to get the above patch reviewed. Thanks, Mark
On 25/04/25 19:06, Mark Wielaard wrote: > Hi all, > > On Fri, Apr 18, 2025 at 05:46:54PM +0200, Mark Wielaard wrote: >> On Mon, 2025-04-14 at 14:06 +0200, Mark Wielaard wrote: >>> On Sun, Apr 06, 2025 at 03:23:54PM +0200, Mark Wielaard wrote: >>>> On Mon, Mar 31, 2025 at 11:29:41AM +0200, Mark Wielaard wrote: >>>>> On Fri, Mar 28, 2025 at 07:02:28PM +0100, Mark Wielaard wrote: >>>>>> On Fri, 2025-03-21 at 14:01 +0100, Florian Weimer wrote: >>>>>>> Without this change, the system call wrapper function is not visible >>>>>>> on the stack at the time of the system call, which causes problems >>>>>>> for interception tools such as valgrind. >>>>>>> >>>>>>> Enhances commit 89b53077d2a58f00e7debdfe58afabe953dac60d ("nptl: Fix >>>>>>> Race conditions in pthread cancellation [BZ#12683]"). >>>>>>> >>>>>>> Tested on i686-linux-gnu, powerpc64le-linux-gnu, x86_64-linux-gnu. >>>>>>> (We're still discussing if valgrind needs this, but if it does, here's a >>>>>>> patch.) >>>>>> >>>>>> I implemented the valgrind part of skipping the syscall_cancel frames >>>>>> here: https://bugs.kde.org/show_bug.cgi?id=502126#c2 >>>>>> And there is a valgrind package build for fedora rawhide: >>>>>> https://koji.fedoraproject.org/koji/buildinfo?buildID=2687393 >>>>>> >>>>>> For ppc64le, s390x and x86_64 that patch seems enough. >>>>>> >>>>>> For i686 and aarch64 there does seem to be an issue with missing the >>>>>> glibc calling function because of a tail call. >>>>>> >>>>>> Also on i686 there is another extra frame on top __libc_do_syscall. >>>>> >>>>> I extended the patch to cover some extra sycall wrapper function >>>>> symbols on i386 and armhf and pushed it to valgrind trunk and >>>>> VALGRIND_3_24_BRANCH. There are builds for fedora rawhide and >>>>> f42. This does seem to show that only on arm64 the tail calls >>>>> obscure observing the full call stack. >>>> >>>> This has now landed in fedora rawhide and f42. Test results look good, >>>> except for some if the arm64 tests where the tail calls obscure >>>> observing the full call stack. Please let me know if you need any more >>>> input from us to get this fix in glibc. >>> >>> Please let me know. Valgrind test results for syscall backtraces on >>> anything except arm64 look good. We are working on valgrind 3.25.0 >>> now, to be released around April 24. >> >> valgrind 3.25.0-RC1 has been released and test results look good on >> most arches. arm64 does show the issue described above where the tail >> calls obscure observing the full call stack when doing system calls. > > valgrind 3.25.0 have been released and is now in Fedora rawhide and > Fedora 42 with the new glibc syscall_cancel frames. The tail calls on > aarch64 still seem to be a problem for observability of the syscall > call stack. > I am trying to check if patch to inline the cancellation wrappers [1] would help it, but I am not sure how exactly would handle stacktraces that should be artificial and only represented for debug information. With the patch applied, both x86_64 and aarch64 should inline the syscall_cancel and internal_syscall_cancel call, only required an extra __syscall_cancel_arch call for the case when the process it multithread. On x86_64 it does shows as expected: valgrind-git (x86_64)$ ./coregrind/valgrind memcheck/tests/sendmsg [...] ==2131145== Syscall param sendmsg(msg) points to uninitialised byte(s) ==2131145== at 0x4972EE0: syscall_cancel (sysdep-cancel.h:83) ==2131145== by 0x4972EE0: sendmsg (sendmsg.c:28) ==2131145== by 0x4001332: main (sendmsg.c:46) ==2131145== Address 0x1ffefff850 is on thread 1's stack ==2131145== in frame #1, created by main (sendmsg.c:13) [...] But on aarch64 it shows internal_syscall_cancel, which is indeed inlined: valgrind-git (aarch64)$ ./coregrind/valgrind memcheck/tests/sendmsg [...] ==483437== Syscall param sendmsg(msg) points to uninitialised byte(s) ==483437== at 0x49D250C: internal_syscall_cancel (sysdep-cancel.h:44) ==483437== by 0x49D250C: syscall_cancel (sysdep-cancel.h:79) ==483437== by 0x49D250C: sendmsg (sendmsg.c:28) ==483437== by 0x4000B4B: main (sendmsg.c:46) ==483437== Address 0x1ffefffaf8 is on thread 1's stack ==483437== in frame #1, created by main (sendmsg.c:13) I am not sure if valgrind consider this an error, nor if it should be valgrind or compiler to handle this correctly. I am not aware of any attribute if can properly used to 'hide' internal_syscall_cancel in this case, or even if it makes sense. [1] https://sourceware.org/git/?p=glibc.git;a=shortlog;h=refs/heads/azanella/cancel-wrappers-inline
Hi Adhemerval, On Fri, 2025-05-09 at 11:08 -0300, Adhemerval Zanella Netto wrote: > On 25/04/25 19:06, Mark Wielaard wrote: > > valgrind 3.25.0 have been released and is now in Fedora rawhide and > > Fedora 42 with the new glibc syscall_cancel frames. The tail calls on > > aarch64 still seem to be a problem for observability of the syscall > > call stack. > > I am trying to check if patch to inline the cancellation wrappers [1] > would help it, but I am not sure how exactly would handle stacktraces > that should be artificial and only represented for debug information. > > With the patch applied, both x86_64 and aarch64 should inline the > syscall_cancel and internal_syscall_cancel call, only required an > extra __syscall_cancel_arch call for the case when the process it > multithread. > > On x86_64 it does shows as expected: > > valgrind-git (x86_64)$ ./coregrind/valgrind memcheck/tests/sendmsg > [...] > ==2131145== Syscall param sendmsg(msg) points to uninitialised byte(s) > ==2131145== at 0x4972EE0: syscall_cancel (sysdep-cancel.h:83) > ==2131145== by 0x4972EE0: sendmsg (sendmsg.c:28) > ==2131145== by 0x4001332: main (sendmsg.c:46) > ==2131145== Address 0x1ffefff850 is on thread 1's stack > ==2131145== in frame #1, created by main (sendmsg.c:13) > [...] > > But on aarch64 it shows internal_syscall_cancel, which is indeed > inlined: > > valgrind-git (aarch64)$ ./coregrind/valgrind memcheck/tests/sendmsg > [...] > ==483437== Syscall param sendmsg(msg) points to uninitialised byte(s) > ==483437== at 0x49D250C: internal_syscall_cancel (sysdep-cancel.h:44) > ==483437== by 0x49D250C: syscall_cancel (sysdep-cancel.h:79) > ==483437== by 0x49D250C: sendmsg (sendmsg.c:28) > ==483437== by 0x4000B4B: main (sendmsg.c:46) > ==483437== Address 0x1ffefffaf8 is on thread 1's stack > ==483437== in frame #1, created by main (sendmsg.c:13) > > I am not sure if valgrind consider this an error, nor if it should be > valgrind or compiler to handle this correctly. I am not aware of any > attribute if can properly used to 'hide' internal_syscall_cancel in > this case, or even if it makes sense. Interesting. I would have assumed valgrind would have stripped out the syscall_cancel calls. But it looks like we assume they always have a __ prefix (__syscall_cancel_arch, __internal_syscall_cancel and __syscall_cancel). Apparently with glibc debuginfo installed it picks up the inlined DWARF name. But we can filter those out too. I think the above would work without glibc debuginfo installed because then valgrind won't even know/see the inlined names. Just to be sure, you are using the current git version of valgrind? And you haven't explicitly installed that version? If so, could you try the above with: ./vg-in-place memcheck/tests/sendmsg vg-in-place makes sure the just compiled tools are used. Thanks, Mark
On 09/05/25 11:24, Mark Wielaard wrote: > Hi Adhemerval, > > On Fri, 2025-05-09 at 11:08 -0300, Adhemerval Zanella Netto wrote: >> On 25/04/25 19:06, Mark Wielaard wrote: >>> valgrind 3.25.0 have been released and is now in Fedora rawhide and >>> Fedora 42 with the new glibc syscall_cancel frames. The tail calls on >>> aarch64 still seem to be a problem for observability of the syscall >>> call stack. >> >> I am trying to check if patch to inline the cancellation wrappers [1] >> would help it, but I am not sure how exactly would handle stacktraces >> that should be artificial and only represented for debug information. >> >> With the patch applied, both x86_64 and aarch64 should inline the >> syscall_cancel and internal_syscall_cancel call, only required an >> extra __syscall_cancel_arch call for the case when the process it >> multithread. >> >> On x86_64 it does shows as expected: >> >> valgrind-git (x86_64)$ ./coregrind/valgrind memcheck/tests/sendmsg >> [...] >> ==2131145== Syscall param sendmsg(msg) points to uninitialised byte(s) >> ==2131145== at 0x4972EE0: syscall_cancel (sysdep-cancel.h:83) >> ==2131145== by 0x4972EE0: sendmsg (sendmsg.c:28) >> ==2131145== by 0x4001332: main (sendmsg.c:46) >> ==2131145== Address 0x1ffefff850 is on thread 1's stack >> ==2131145== in frame #1, created by main (sendmsg.c:13) >> [...] >> >> But on aarch64 it shows internal_syscall_cancel, which is indeed >> inlined: >> >> valgrind-git (aarch64)$ ./coregrind/valgrind memcheck/tests/sendmsg >> [...] >> ==483437== Syscall param sendmsg(msg) points to uninitialised byte(s) >> ==483437== at 0x49D250C: internal_syscall_cancel (sysdep-cancel.h:44) >> ==483437== by 0x49D250C: syscall_cancel (sysdep-cancel.h:79) >> ==483437== by 0x49D250C: sendmsg (sendmsg.c:28) >> ==483437== by 0x4000B4B: main (sendmsg.c:46) >> ==483437== Address 0x1ffefffaf8 is on thread 1's stack >> ==483437== in frame #1, created by main (sendmsg.c:13) >> >> I am not sure if valgrind consider this an error, nor if it should be >> valgrind or compiler to handle this correctly. I am not aware of any >> attribute if can properly used to 'hide' internal_syscall_cancel in >> this case, or even if it makes sense. > > Interesting. I would have assumed valgrind would have stripped out the > syscall_cancel calls. But it looks like we assume they always have a __ > prefix (__syscall_cancel_arch, __internal_syscall_cancel and > __syscall_cancel). Apparently with glibc debuginfo installed it picks > up the inlined DWARF name. But we can filter those out too. > > I think the above would work without glibc debuginfo installed because > then valgrind won't even know/see the inlined names. I have tested only with a unstriped glibc build, so I am not sure how it would play with debuginfo. > > Just to be sure, you are using the current git version of valgrind? > And you haven't explicitly installed that version? If so, could you try > the above with: > > ./vg-in-place memcheck/tests/sendmsg > > vg-in-place makes sure the just compiled tools are used. I took care to export VALGRIND_LIB to the proper location, but the results with vg-in-place are the same. > > Thanks, > > Mark
Hi Adhemerval, On Fri, May 09, 2025 at 11:08:42AM -0300, Adhemerval Zanella Netto wrote: > On 25/04/25 19:06, Mark Wielaard wrote: > > valgrind 3.25.0 have been released and is now in Fedora rawhide and > > Fedora 42 with the new glibc syscall_cancel frames. The tail calls on > > aarch64 still seem to be a problem for observability of the syscall > > call stack. > > I am trying to check if patch to inline the cancellation wrappers [1] > would help it, but I am not sure how exactly would handle stacktraces > that should be artificial and only represented for debug information. I got that patch [1] working so I could test it myself (only on x86_64 for now). How do you properly test a glibc installed in a non-default location though? Currently I am doing something like: export LD_LIBRARY_PATH=/usr/local/glibc/lib /usr/local/glibc/lib/ld-linux-x86-64.so.2 memcheck/tests/sendmsg This can also work with valgrind in between because it doesn't use glibc dynamically itself. But it is not the easiest way to test things. > With the patch applied, both x86_64 and aarch64 should inline the > syscall_cancel and internal_syscall_cancel call, only required an > extra __syscall_cancel_arch call for the case when the process it > multithread. > > On x86_64 it does shows as expected: > > valgrind-git (x86_64)$ ./coregrind/valgrind memcheck/tests/sendmsg > [...] > ==2131145== Syscall param sendmsg(msg) points to uninitialised byte(s) > ==2131145== at 0x4972EE0: syscall_cancel (sysdep-cancel.h:83) > ==2131145== by 0x4972EE0: sendmsg (sendmsg.c:28) > ==2131145== by 0x4001332: main (sendmsg.c:46) > ==2131145== Address 0x1ffefff850 is on thread 1's stack > ==2131145== in frame #1, created by main (sendmsg.c:13) > [...] > > But on aarch64 it shows internal_syscall_cancel, which is indeed > inlined: > > valgrind-git (aarch64)$ ./coregrind/valgrind memcheck/tests/sendmsg > [...] > ==483437== Syscall param sendmsg(msg) points to uninitialised byte(s) > ==483437== at 0x49D250C: internal_syscall_cancel (sysdep-cancel.h:44) > ==483437== by 0x49D250C: syscall_cancel (sysdep-cancel.h:79) > ==483437== by 0x49D250C: sendmsg (sendmsg.c:28) > ==483437== by 0x4000B4B: main (sendmsg.c:46) > ==483437== Address 0x1ffefffaf8 is on thread 1's stack > ==483437== in frame #1, created by main (sendmsg.c:13) > > I am not sure if valgrind consider this an error, nor if it should be > valgrind or compiler to handle this correctly. I am not aware of any > attribute if can properly used to 'hide' internal_syscall_cancel in > this case, or even if it makes sense. It isn't an "error" but any extra frames (either "real" or inlined) on top of the function that does the actual system call causes existing suppressions and test wrappers to no longer work. So valgrind is responsible for filtering them out. For actual extra frames we now do (since 3.25.0). If the extra syscall frame wrappers are inlined they don't show up if no debuginfo/DWARF is available for glibc. But if it is available we'll have to filter them out and/or don't look them up for the top-level syscalls. Selectively removing these inlined calls is a little fragile at the moment in valgrind, since it is done at a later time than capturing the backtrace addresses. So either we have to pass through that we are handling a syscall at the moment and so don't want to "expand" the top-level frame with any inlines. And/Or we match on the "magic" inlined syscall cancel frames (everywhere). You could use __attribute__(__artificial__)) which should mark the function with DW_AT_artificial. Valgrind doesn't know about this attribute yet, but we could probably with some extra work. Thanks, Mark > [1] https://sourceware.org/git/?p=glibc.git;a=shortlog;h=refs/heads/azanella/cancel-wrappers-inline
On 12/05/25 12:12, Mark Wielaard wrote: > Hi Adhemerval, > > On Fri, May 09, 2025 at 11:08:42AM -0300, Adhemerval Zanella Netto wrote: >> On 25/04/25 19:06, Mark Wielaard wrote: >>> valgrind 3.25.0 have been released and is now in Fedora rawhide and >>> Fedora 42 with the new glibc syscall_cancel frames. The tail calls on >>> aarch64 still seem to be a problem for observability of the syscall >>> call stack. >> >> I am trying to check if patch to inline the cancellation wrappers [1] >> would help it, but I am not sure how exactly would handle stacktraces >> that should be artificial and only represented for debug information. > > I got that patch [1] working so I could test it myself (only on x86_64 > for now). How do you properly test a glibc installed in a non-default > location though? Currently I am doing something like: > > export LD_LIBRARY_PATH=/usr/local/glibc/lib > /usr/local/glibc/lib/ld-linux-x86-64.so.2 memcheck/tests/sendmsg > > This can also work with valgrind in between because it doesn't use > glibc dynamically itself. But it is not the easiest way to test > things. I used -Wl,-dynamic-linker and -Wl,-rpath instead at configure: --enable-only64bit \ LDFLAGS="-Wl,--dynamic-linker,/path/to/glibc-build/testroot.pristine/lib/ld-linux-aarch64.so.1 \ -Wl,--rpath,/path/to/glibc-build/testroot.pristine/lib64" It won't use glibc startup object (*ct*.o), neither the headers; but using a recent systems it should not be a problem (the difference should be minimal). > >> With the patch applied, both x86_64 and aarch64 should inline the >> syscall_cancel and internal_syscall_cancel call, only required an >> extra __syscall_cancel_arch call for the case when the process it >> multithread. >> >> On x86_64 it does shows as expected: >> >> valgrind-git (x86_64)$ ./coregrind/valgrind memcheck/tests/sendmsg >> [...] >> ==2131145== Syscall param sendmsg(msg) points to uninitialised byte(s) >> ==2131145== at 0x4972EE0: syscall_cancel (sysdep-cancel.h:83) >> ==2131145== by 0x4972EE0: sendmsg (sendmsg.c:28) >> ==2131145== by 0x4001332: main (sendmsg.c:46) >> ==2131145== Address 0x1ffefff850 is on thread 1's stack >> ==2131145== in frame #1, created by main (sendmsg.c:13) >> [...] >> >> But on aarch64 it shows internal_syscall_cancel, which is indeed >> inlined: >> >> valgrind-git (aarch64)$ ./coregrind/valgrind memcheck/tests/sendmsg >> [...] >> ==483437== Syscall param sendmsg(msg) points to uninitialised byte(s) >> ==483437== at 0x49D250C: internal_syscall_cancel (sysdep-cancel.h:44) >> ==483437== by 0x49D250C: syscall_cancel (sysdep-cancel.h:79) >> ==483437== by 0x49D250C: sendmsg (sendmsg.c:28) >> ==483437== by 0x4000B4B: main (sendmsg.c:46) >> ==483437== Address 0x1ffefffaf8 is on thread 1's stack >> ==483437== in frame #1, created by main (sendmsg.c:13) >> >> I am not sure if valgrind consider this an error, nor if it should be >> valgrind or compiler to handle this correctly. I am not aware of any >> attribute if can properly used to 'hide' internal_syscall_cancel in >> this case, or even if it makes sense. > > It isn't an "error" but any extra frames (either "real" or inlined) on > top of the function that does the actual system call causes existing > suppressions and test wrappers to no longer work. So valgrind is > responsible for filtering them out. For actual extra frames we now do > (since 3.25.0). If the extra syscall frame wrappers are inlined they > don't show up if no debuginfo/DWARF is available for glibc. But if it > is available we'll have to filter them out and/or don't look them up > for the top-level syscalls. > > Selectively removing these inlined calls is a little fragile at the > moment in valgrind, since it is done at a later time than capturing > the backtrace addresses. So either we have to pass through that we are > handling a syscall at the moment and so don't want to "expand" the > top-level frame with any inlines. And/Or we match on the "magic" > inlined syscall cancel frames (everywhere). If I understood correctly these seems to be an issue only for valgrind regression tests, right? If so, could this be handled solely on test validation? Or does valgrind aims to hide such inlined calls on stacktrace reports as well? > > You could use __attribute__(__artificial__)) which should mark the > function with DW_AT_artificial. Valgrind doesn't know about this > attribute yet, but we could probably with some extra work. Yeah, this is the first thing I tried to see if valgrind could handle it. Doing this on glibc should be easy enough if you think it would be way to handle it on valgrind. > > Thanks, > > Mark > >> [1] https://sourceware.org/git/?p=glibc.git;a=shortlog;h=refs/heads/azanella/cancel-wrappers-inline
Hi Adhemerval, On Tue, May 13, 2025 at 09:22:21AM +0100, Adhemerval Zanella Netto wrote: > On 12/05/25 12:12, Mark Wielaard wrote: > >> I am trying to check if patch to inline the cancellation wrappers [1] > >> would help it, but I am not sure how exactly would handle stacktraces > >> that should be artificial and only represented for debug information. > > > > I got that patch [1] working so I could test it myself (only on x86_64 > > for now). How do you properly test a glibc installed in a non-default > > location though? Currently I am doing something like: > > > > export LD_LIBRARY_PATH=/usr/local/glibc/lib > > /usr/local/glibc/lib/ld-linux-x86-64.so.2 memcheck/tests/sendmsg > > > > This can also work with valgrind in between because it doesn't use > > glibc dynamically itself. But it is not the easiest way to test > > things. > > I used -Wl,-dynamic-linker and -Wl,-rpath instead at configure: > > --enable-only64bit \ > LDFLAGS="-Wl,--dynamic-linker,/path/to/glibc-build/testroot.pristine/lib/ld-linux-aarch64.so.1 \ > -Wl,--rpath,/path/to/glibc-build/testroot.pristine/lib64" > > It won't use glibc startup object (*ct*.o), neither the headers; but > using a recent systems it should not be a problem (the difference > should be minimal). O, thanks, that makes things much more convenient to test. > > It isn't an "error" but any extra frames (either "real" or inlined) on > > top of the function that does the actual system call causes existing > > suppressions and test wrappers to no longer work. So valgrind is > > responsible for filtering them out. For actual extra frames we now do > > (since 3.25.0). If the extra syscall frame wrappers are inlined they > > don't show up if no debuginfo/DWARF is available for glibc. But if it > > is available we'll have to filter them out and/or don't look them up > > for the top-level syscalls. > > > > Selectively removing these inlined calls is a little fragile at the > > moment in valgrind, since it is done at a later time than capturing > > the backtrace addresses. So either we have to pass through that we are > > handling a syscall at the moment and so don't want to "expand" the > > top-level frame with any inlines. And/Or we match on the "magic" > > inlined syscall cancel frames (everywhere). > > If I understood correctly these seems to be an issue only for valgrind > regression tests, right? If so, could this be handled solely on test > validation? > > Or does valgrind aims to hide such inlined calls on stacktrace reports > as well? We do aim to hide these in general since it invalidates any existing suppressions people might be using. In theory we can of course not guarantee these work between glibc versions, but for most glibc syscall wrappers they are surprisingly stable. It also allows people to generate new suppressions that should work across versions. > > You could use __attribute__(__artificial__)) which should mark the > > function with DW_AT_artificial. Valgrind doesn't know about this > > attribute yet, but we could probably with some extra work. > > Yeah, this is the first thing I tried to see if valgrind could handle > it. Doing this on glibc should be easy enough if you think it would > be way to handle it on valgrind. I currently have something working based on matching the function names, but checking for DW_AT_artificial would be nicer. Also because I think in general users are not interested in inlined artificial functions in their stack traces. So it should be a general improvement if we filter those out. I post when I have a patch that does that (there is one tricky issue if the abstract origin ends up in a dwz partial units, but that is completely an internal valgrind DWARF reader issue, nothing to do with glibc). > >> [1] https://sourceware.org/git/?p=glibc.git;a=shortlog;h=refs/heads/azanella/cancel-wrappers-inline > Thanks, Mark
diff --git a/sysdeps/unix/sysdep.h b/sysdeps/unix/sysdep.h index 2cc98725c3..8f1e4fc4f9 100644 --- a/sysdeps/unix/sysdep.h +++ b/sysdeps/unix/sysdep.h @@ -166,30 +166,45 @@ long int __syscall_cancel (__syscall_arg_t arg1, __syscall_arg_t arg2, __SYSCALL_CANCEL7_ARG_DEF __syscall_arg_t nr) attribute_hidden; +/* Inhibit tail call optimization, so that the stack frame of the + system-call-implementing function is visible at the time of the + system call. */ +#define __syscall_cancel_barrier(...) \ + ({ \ + long int __syscall_cancel_result = __syscall_cancel (__VA_ARGS__); \ + asm volatile ("" ::: "memory"); \ + __syscall_cancel_result; \ + }) + #define __SYSCALL_CANCEL0(name) \ - __syscall_cancel (0, 0, 0, 0, 0, 0, __SYSCALL_CANCEL7_ARG __NR_##name) + __syscall_cancel_barrier (0, 0, 0, 0, 0, 0, \ + __SYSCALL_CANCEL7_ARG __NR_##name) #define __SYSCALL_CANCEL1(name, a1) \ - __syscall_cancel (__SSC (a1), 0, 0, 0, 0, 0, \ - __SYSCALL_CANCEL7_ARG __NR_##name) -#define __SYSCALL_CANCEL2(name, a1, a2) \ - __syscall_cancel (__SSC (a1), __SSC (a2), 0, 0, 0, 0, \ - __SYSCALL_CANCEL7_ARG __NR_##name) -#define __SYSCALL_CANCEL3(name, a1, a2, a3) \ - __syscall_cancel (__SSC (a1), __SSC (a2), __SSC (a3), 0, 0, 0, \ - __SYSCALL_CANCEL7_ARG __NR_##name) -#define __SYSCALL_CANCEL4(name, a1, a2, a3, a4) \ - __syscall_cancel (__SSC (a1), __SSC (a2), __SSC (a3), \ - __SSC(a4), 0, 0, __SYSCALL_CANCEL7_ARG __NR_##name) -#define __SYSCALL_CANCEL5(name, a1, a2, a3, a4, a5) \ - __syscall_cancel (__SSC (a1), __SSC (a2), __SSC (a3), __SSC(a4), \ - __SSC (a5), 0, __SYSCALL_CANCEL7_ARG __NR_##name) -#define __SYSCALL_CANCEL6(name, a1, a2, a3, a4, a5, a6) \ - __syscall_cancel (__SSC (a1), __SSC (a2), __SSC (a3), __SSC (a4), \ - __SSC (a5), __SSC (a6), __SYSCALL_CANCEL7_ARG \ - __NR_##name) + __syscall_cancel_barrier (__SSC (a1), 0, 0, 0, 0, 0, \ + __SYSCALL_CANCEL7_ARG __NR_##name) +#define __SYSCALL_CANCEL2(name, a1, a2) \ + __syscall_cancel_barrier (__SSC (a1), __SSC (a2), 0, 0, 0, 0, \ + __SYSCALL_CANCEL7_ARG __NR_##name) +#define __SYSCALL_CANCEL3(name, a1, a2, a3) \ + __syscall_cancel_barrier (__SSC (a1), __SSC (a2), __SSC (a3), \ + 0, 0, 0, __SYSCALL_CANCEL7_ARG __NR_##name) +#define __SYSCALL_CANCEL4(name, a1, a2, a3, a4) \ + __syscall_cancel_barrier (__SSC (a1), __SSC (a2), __SSC (a3), \ + __SSC(a4), 0, 0, \ + __SYSCALL_CANCEL7_ARG __NR_##name) +#define __SYSCALL_CANCEL5(name, a1, a2, a3, a4, a5) \ + __syscall_cancel_barrier (__SSC (a1), __SSC (a2), __SSC (a3), \ + __SSC(a4), __SSC (a5), 0, \ + __SYSCALL_CANCEL7_ARG __NR_##name) +#define __SYSCALL_CANCEL6(name, a1, a2, a3, a4, a5, a6) \ + __syscall_cancel_barrier (__SSC (a1), __SSC (a2), __SSC (a3), \ + __SSC (a4), __SSC (a5), __SSC (a6), \ + __SYSCALL_CANCEL7_ARG \ + __NR_##name) #define __SYSCALL_CANCEL7(name, a1, a2, a3, a4, a5, a6, a7) \ - __syscall_cancel (__SSC (a1), __SSC (a2), __SSC (a3), __SSC (a4), \ - __SSC (a5), __SSC (a6), __SSC (a7), __NR_##name) + __syscall_cancel_barrier (__SSC (a1), __SSC (a2), __SSC (a3), \ + __SSC (a4), __SSC (a5), __SSC (a6), \ + __SSC (a7), __NR_##name) #define __SYSCALL_CANCEL_NARGS_X(a,b,c,d,e,f,g,h,n,...) n #define __SYSCALL_CANCEL_NARGS(...) \