mbox series

[0/3] Add the sched_setattr, sched_getattr functions

Message ID cover.1725561027.git.fweimer@redhat.com
Headers show
Series Add the sched_setattr, sched_getattr functions | expand

Message

Florian Weimer Sept. 5, 2024, 6:33 p.m. UTC
These are straightforward system call wrappers.  Like the existing
sched_* scheduler policy functions, sched_setattr interacts poorly with
PTHREAD_PRIO_PROTECT mutexes.

Documentation mostly refers to the manual pages.

Tested on i686-linux-gnu, x86_64-linux-gnu.  Built with
build-many-glibcs.py.

Thanks,
Florian

Florian Weimer (3):
  manual: Extract the @manpageurl{func,sec} macro
  Linux: Add the sched_setattr and sched_getattr functions
  Linux: Add missing scheduler constants to <sched.h>

 NEWS                                          |   3 +
 manual/macros.texi                            |   7 +-
 manual/resource.texi                          | 103 +++++++++++++++++
 sysdeps/unix/sysv/linux/Makefile              |  13 +++
 sysdeps/unix/sysv/linux/Versions              |   4 +
 sysdeps/unix/sysv/linux/aarch64/libc.abilist  |   2 +
 sysdeps/unix/sysv/linux/alpha/libc.abilist    |   2 +
 sysdeps/unix/sysv/linux/arc/libc.abilist      |   2 +
 sysdeps/unix/sysv/linux/arm/be/libc.abilist   |   2 +
 sysdeps/unix/sysv/linux/arm/le/libc.abilist   |   2 +
 sysdeps/unix/sysv/linux/bits/sched.h          |  58 +++++++++-
 sysdeps/unix/sysv/linux/csky/libc.abilist     |   2 +
 sysdeps/unix/sysv/linux/hppa/libc.abilist     |   2 +
 sysdeps/unix/sysv/linux/i386/libc.abilist     |   2 +
 .../sysv/linux/loongarch/lp64/libc.abilist    |   2 +
 .../sysv/linux/m68k/coldfire/libc.abilist     |   2 +
 .../unix/sysv/linux/m68k/m680x0/libc.abilist  |   2 +
 .../sysv/linux/microblaze/be/libc.abilist     |   2 +
 .../sysv/linux/microblaze/le/libc.abilist     |   2 +
 .../sysv/linux/mips/mips32/fpu/libc.abilist   |   2 +
 .../sysv/linux/mips/mips32/nofpu/libc.abilist |   2 +
 .../sysv/linux/mips/mips64/n32/libc.abilist   |   2 +
 .../sysv/linux/mips/mips64/n64/libc.abilist   |   2 +
 sysdeps/unix/sysv/linux/nios2/libc.abilist    |   2 +
 sysdeps/unix/sysv/linux/or1k/libc.abilist     |   2 +
 .../linux/powerpc/powerpc32/fpu/libc.abilist  |   2 +
 .../powerpc/powerpc32/nofpu/libc.abilist      |   2 +
 .../linux/powerpc/powerpc64/be/libc.abilist   |   2 +
 .../linux/powerpc/powerpc64/le/libc.abilist   |   2 +
 .../unix/sysv/linux/riscv/rv32/libc.abilist   |   2 +
 .../unix/sysv/linux/riscv/rv64/libc.abilist   |   2 +
 .../unix/sysv/linux/s390/s390-32/libc.abilist |   2 +
 .../unix/sysv/linux/s390/s390-64/libc.abilist |   2 +
 sysdeps/unix/sysv/linux/sched_getattr.c       |  27 +++++
 sysdeps/unix/sysv/linux/sched_setattr.c       |  26 +++++
 sysdeps/unix/sysv/linux/sh/be/libc.abilist    |   2 +
 sysdeps/unix/sysv/linux/sh/le/libc.abilist    |   2 +
 .../sysv/linux/sparc/sparc32/libc.abilist     |   2 +
 .../sysv/linux/sparc/sparc64/libc.abilist     |   2 +
 sysdeps/unix/sysv/linux/tst-sched-consts.py   |  56 +++++++++
 sysdeps/unix/sysv/linux/tst-sched_setattr.c   | 106 ++++++++++++++++++
 .../unix/sysv/linux/x86_64/64/libc.abilist    |   2 +
 .../unix/sysv/linux/x86_64/x32/libc.abilist   |   2 +
 43 files changed, 465 insertions(+), 4 deletions(-)
 create mode 100644 sysdeps/unix/sysv/linux/sched_getattr.c
 create mode 100644 sysdeps/unix/sysv/linux/sched_setattr.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-sched-consts.py
 create mode 100644 sysdeps/unix/sysv/linux/tst-sched_setattr.c


base-commit: 3e4a01870ef9605ccf6475215a4b32aa86d5d206

Comments

Carlos O'Donell Sept. 5, 2024, 7:50 p.m. UTC | #1
On 9/5/24 2:33 PM, Florian Weimer wrote:
> These are straightforward system call wrappers.  Like the existing
> sched_* scheduler policy functions, sched_setattr interacts poorly with
> PTHREAD_PRIO_PROTECT mutexes.

As noted by Jopseh Myers in 2015, these interfaces should have had wrappers
a long time ago following:
https://sourceware.org/glibc/wiki/Consensus#WIP:_Kernel_syscalls_wrappers

I think these APIs meet the goals outlined there, modulo their poor interaction
with the pthread_* APIs, which is something that can be improved over time.

Without theses APIs the SCHED_DEADLINE support in the kernel has be manually
added to things chatr.

> Documentation mostly refers to the manual pages.
> 
> Tested on i686-linux-gnu, x86_64-linux-gnu.  Built with
> build-many-glibcs.py.
> 
> Thanks,
> Florian
> 
> Florian Weimer (3):
>   manual: Extract the @manpageurl{func,sec} macro
>   Linux: Add the sched_setattr and sched_getattr functions
>   Linux: Add missing scheduler constants to <sched.h>
> 
>  NEWS                                          |   3 +
>  manual/macros.texi                            |   7 +-
>  manual/resource.texi                          | 103 +++++++++++++++++
>  sysdeps/unix/sysv/linux/Makefile              |  13 +++
>  sysdeps/unix/sysv/linux/Versions              |   4 +
>  sysdeps/unix/sysv/linux/aarch64/libc.abilist  |   2 +
>  sysdeps/unix/sysv/linux/alpha/libc.abilist    |   2 +
>  sysdeps/unix/sysv/linux/arc/libc.abilist      |   2 +
>  sysdeps/unix/sysv/linux/arm/be/libc.abilist   |   2 +
>  sysdeps/unix/sysv/linux/arm/le/libc.abilist   |   2 +
>  sysdeps/unix/sysv/linux/bits/sched.h          |  58 +++++++++-
>  sysdeps/unix/sysv/linux/csky/libc.abilist     |   2 +
>  sysdeps/unix/sysv/linux/hppa/libc.abilist     |   2 +
>  sysdeps/unix/sysv/linux/i386/libc.abilist     |   2 +
>  .../sysv/linux/loongarch/lp64/libc.abilist    |   2 +
>  .../sysv/linux/m68k/coldfire/libc.abilist     |   2 +
>  .../unix/sysv/linux/m68k/m680x0/libc.abilist  |   2 +
>  .../sysv/linux/microblaze/be/libc.abilist     |   2 +
>  .../sysv/linux/microblaze/le/libc.abilist     |   2 +
>  .../sysv/linux/mips/mips32/fpu/libc.abilist   |   2 +
>  .../sysv/linux/mips/mips32/nofpu/libc.abilist |   2 +
>  .../sysv/linux/mips/mips64/n32/libc.abilist   |   2 +
>  .../sysv/linux/mips/mips64/n64/libc.abilist   |   2 +
>  sysdeps/unix/sysv/linux/nios2/libc.abilist    |   2 +
>  sysdeps/unix/sysv/linux/or1k/libc.abilist     |   2 +
>  .../linux/powerpc/powerpc32/fpu/libc.abilist  |   2 +
>  .../powerpc/powerpc32/nofpu/libc.abilist      |   2 +
>  .../linux/powerpc/powerpc64/be/libc.abilist   |   2 +
>  .../linux/powerpc/powerpc64/le/libc.abilist   |   2 +
>  .../unix/sysv/linux/riscv/rv32/libc.abilist   |   2 +
>  .../unix/sysv/linux/riscv/rv64/libc.abilist   |   2 +
>  .../unix/sysv/linux/s390/s390-32/libc.abilist |   2 +
>  .../unix/sysv/linux/s390/s390-64/libc.abilist |   2 +
>  sysdeps/unix/sysv/linux/sched_getattr.c       |  27 +++++
>  sysdeps/unix/sysv/linux/sched_setattr.c       |  26 +++++
>  sysdeps/unix/sysv/linux/sh/be/libc.abilist    |   2 +
>  sysdeps/unix/sysv/linux/sh/le/libc.abilist    |   2 +
>  .../sysv/linux/sparc/sparc32/libc.abilist     |   2 +
>  .../sysv/linux/sparc/sparc64/libc.abilist     |   2 +
>  sysdeps/unix/sysv/linux/tst-sched-consts.py   |  56 +++++++++
>  sysdeps/unix/sysv/linux/tst-sched_setattr.c   | 106 ++++++++++++++++++
>  .../unix/sysv/linux/x86_64/64/libc.abilist    |   2 +
>  .../unix/sysv/linux/x86_64/x32/libc.abilist   |   2 +
>  43 files changed, 465 insertions(+), 4 deletions(-)
>  create mode 100644 sysdeps/unix/sysv/linux/sched_getattr.c
>  create mode 100644 sysdeps/unix/sysv/linux/sched_setattr.c
>  create mode 100644 sysdeps/unix/sysv/linux/tst-sched-consts.py
>  create mode 100644 sysdeps/unix/sysv/linux/tst-sched_setattr.c
> 
> 
> base-commit: 3e4a01870ef9605ccf6475215a4b32aa86d5d206
enh Sept. 5, 2024, 8:29 p.m. UTC | #2
On Thu, Sep 5, 2024 at 3:51 PM Carlos O'Donell <carlos@redhat.com> wrote:
>
> On 9/5/24 2:33 PM, Florian Weimer wrote:
> > These are straightforward system call wrappers.  Like the existing
> > sched_* scheduler policy functions, sched_setattr interacts poorly with
> > PTHREAD_PRIO_PROTECT mutexes.
>
> As noted by Jopseh Myers in 2015, these interfaces should have had wrappers
> a long time ago following:
> https://sourceware.org/glibc/wiki/Consensus#WIP:_Kernel_syscalls_wrappers
>
> I think these APIs meet the goals outlined there, modulo their poor interaction
> with the pthread_* APIs, which is something that can be improved over time.

do you trust upstream not to break source compatibility again? there
have already been two different versions of this struct with the same
name. or are you just assuming that next time they do it, userspace
has a struct sched_attr_2?

(this -- and the fact that glibc hadn't taken a position -- is why i'd
been avoiding doing anything in bionic.)

btw, from my [machine-readable seccomp] notes, here's a couple of
other similarly old syscalls glibc's missing:

# Since Linux 4.3, not in glibc. Probed for and conditionally used by ART.
int membarrier(int, int) all
int userfaultfd(int) all

oddly musl does have a <sys/membarrier.h> with the former, but i don't
think anyone has the latter yet.

> Without theses APIs the SCHED_DEADLINE support in the kernel has be manually
> added to things chatr.
>
> > Documentation mostly refers to the manual pages.
> >
> > Tested on i686-linux-gnu, x86_64-linux-gnu.  Built with
> > build-many-glibcs.py.
> >
> > Thanks,
> > Florian
> >
> > Florian Weimer (3):
> >   manual: Extract the @manpageurl{func,sec} macro
> >   Linux: Add the sched_setattr and sched_getattr functions
> >   Linux: Add missing scheduler constants to <sched.h>
> >
> >  NEWS                                          |   3 +
> >  manual/macros.texi                            |   7 +-
> >  manual/resource.texi                          | 103 +++++++++++++++++
> >  sysdeps/unix/sysv/linux/Makefile              |  13 +++
> >  sysdeps/unix/sysv/linux/Versions              |   4 +
> >  sysdeps/unix/sysv/linux/aarch64/libc.abilist  |   2 +
> >  sysdeps/unix/sysv/linux/alpha/libc.abilist    |   2 +
> >  sysdeps/unix/sysv/linux/arc/libc.abilist      |   2 +
> >  sysdeps/unix/sysv/linux/arm/be/libc.abilist   |   2 +
> >  sysdeps/unix/sysv/linux/arm/le/libc.abilist   |   2 +
> >  sysdeps/unix/sysv/linux/bits/sched.h          |  58 +++++++++-
> >  sysdeps/unix/sysv/linux/csky/libc.abilist     |   2 +
> >  sysdeps/unix/sysv/linux/hppa/libc.abilist     |   2 +
> >  sysdeps/unix/sysv/linux/i386/libc.abilist     |   2 +
> >  .../sysv/linux/loongarch/lp64/libc.abilist    |   2 +
> >  .../sysv/linux/m68k/coldfire/libc.abilist     |   2 +
> >  .../unix/sysv/linux/m68k/m680x0/libc.abilist  |   2 +
> >  .../sysv/linux/microblaze/be/libc.abilist     |   2 +
> >  .../sysv/linux/microblaze/le/libc.abilist     |   2 +
> >  .../sysv/linux/mips/mips32/fpu/libc.abilist   |   2 +
> >  .../sysv/linux/mips/mips32/nofpu/libc.abilist |   2 +
> >  .../sysv/linux/mips/mips64/n32/libc.abilist   |   2 +
> >  .../sysv/linux/mips/mips64/n64/libc.abilist   |   2 +
> >  sysdeps/unix/sysv/linux/nios2/libc.abilist    |   2 +
> >  sysdeps/unix/sysv/linux/or1k/libc.abilist     |   2 +
> >  .../linux/powerpc/powerpc32/fpu/libc.abilist  |   2 +
> >  .../powerpc/powerpc32/nofpu/libc.abilist      |   2 +
> >  .../linux/powerpc/powerpc64/be/libc.abilist   |   2 +
> >  .../linux/powerpc/powerpc64/le/libc.abilist   |   2 +
> >  .../unix/sysv/linux/riscv/rv32/libc.abilist   |   2 +
> >  .../unix/sysv/linux/riscv/rv64/libc.abilist   |   2 +
> >  .../unix/sysv/linux/s390/s390-32/libc.abilist |   2 +
> >  .../unix/sysv/linux/s390/s390-64/libc.abilist |   2 +
> >  sysdeps/unix/sysv/linux/sched_getattr.c       |  27 +++++
> >  sysdeps/unix/sysv/linux/sched_setattr.c       |  26 +++++
> >  sysdeps/unix/sysv/linux/sh/be/libc.abilist    |   2 +
> >  sysdeps/unix/sysv/linux/sh/le/libc.abilist    |   2 +
> >  .../sysv/linux/sparc/sparc32/libc.abilist     |   2 +
> >  .../sysv/linux/sparc/sparc64/libc.abilist     |   2 +
> >  sysdeps/unix/sysv/linux/tst-sched-consts.py   |  56 +++++++++
> >  sysdeps/unix/sysv/linux/tst-sched_setattr.c   | 106 ++++++++++++++++++
> >  .../unix/sysv/linux/x86_64/64/libc.abilist    |   2 +
> >  .../unix/sysv/linux/x86_64/x32/libc.abilist   |   2 +
> >  43 files changed, 465 insertions(+), 4 deletions(-)
> >  create mode 100644 sysdeps/unix/sysv/linux/sched_getattr.c
> >  create mode 100644 sysdeps/unix/sysv/linux/sched_setattr.c
> >  create mode 100644 sysdeps/unix/sysv/linux/tst-sched-consts.py
> >  create mode 100644 sysdeps/unix/sysv/linux/tst-sched_setattr.c
> >
> >
> > base-commit: 3e4a01870ef9605ccf6475215a4b32aa86d5d206
>
> --
> Cheers,
> Carlos.
>
Florian Weimer Sept. 5, 2024, 8:39 p.m. UTC | #3
> On Thu, Sep 5, 2024 at 3:51 PM Carlos O'Donell <carlos@redhat.com> wrote:
>>
>> On 9/5/24 2:33 PM, Florian Weimer wrote:
>> > These are straightforward system call wrappers.  Like the existing
>> > sched_* scheduler policy functions, sched_setattr interacts poorly with
>> > PTHREAD_PRIO_PROTECT mutexes.
>>
>> As noted by Jopseh Myers in 2015, these interfaces should have had wrappers
>> a long time ago following:
>> https://sourceware.org/glibc/wiki/Consensus#WIP:_Kernel_syscalls_wrappers
>>
>> I think these APIs meet the goals outlined there, modulo their poor interaction
>> with the pthread_* APIs, which is something that can be improved over time.
>
> do you trust upstream not to break source compatibility again? there
> have already been two different versions of this struct with the same
> name. or are you just assuming that next time they do it, userspace
> has a struct sched_attr_2?

The v2 patch has some additional explanation how the extensibility is
expected to work:

  [PATCH v2 2/3] Linux: Add the sched_setattr and sched_getattr functions
  <https://inbox.sourceware.org/libc-alpha/1f7f9c80017e91f0af99e1bfd9ec1f59c6cc0da7.1725567796.git.fweimer@redhat.com/>

It's fairly standard as far as such interfaces go, I think.

> btw, from my [machine-readable seccomp] notes, here's a couple of
> other similarly old syscalls glibc's missing:
>
> # Since Linux 4.3, not in glibc. Probed for and conditionally used by ART.
> int membarrier(int, int) all
> int userfaultfd(int) all

Well, membarrier started out with just two arguments, and now it's using
three.  I suspect it's going to cause more trouble for us than
sched_setattr/sched_getattr.

Thanks,
Florian
enh Sept. 5, 2024, 9:14 p.m. UTC | #4
On Thu, Sep 5, 2024 at 4:39 PM Florian Weimer <fweimer@redhat.com> wrote:
>
> > On Thu, Sep 5, 2024 at 3:51 PM Carlos O'Donell <carlos@redhat.com> wrote:
> >>
> >> On 9/5/24 2:33 PM, Florian Weimer wrote:
> >> > These are straightforward system call wrappers.  Like the existing
> >> > sched_* scheduler policy functions, sched_setattr interacts poorly with
> >> > PTHREAD_PRIO_PROTECT mutexes.
> >>
> >> As noted by Jopseh Myers in 2015, these interfaces should have had wrappers
> >> a long time ago following:
> >> https://sourceware.org/glibc/wiki/Consensus#WIP:_Kernel_syscalls_wrappers
> >>
> >> I think these APIs meet the goals outlined there, modulo their poor interaction
> >> with the pthread_* APIs, which is something that can be improved over time.
> >
> > do you trust upstream not to break source compatibility again? there
> > have already been two different versions of this struct with the same
> > name. or are you just assuming that next time they do it, userspace
> > has a struct sched_attr_2?
>
> The v2 patch has some additional explanation how the extensibility is
> expected to work:
>
>   [PATCH v2 2/3] Linux: Add the sched_setattr and sched_getattr functions
>   <https://inbox.sourceware.org/libc-alpha/1f7f9c80017e91f0af99e1bfd9ec1f59c6cc0da7.1725567796.git.fweimer@redhat.com/>
>
> It's fairly standard as far as such interfaces go, I think.

oh, interesting ... do you have example precedent there? (in
particular i'm wondering whether i'm just blanking on other places
where this has happened in bionic, or whether the most recent examples
pre-date Android/only affected architectures Android doesn't support?)

> > btw, from my [machine-readable seccomp] notes, here's a couple of
> > other similarly old syscalls glibc's missing:
> >
> > # Since Linux 4.3, not in glibc. Probed for and conditionally used by ART.
> > int membarrier(int, int) all
> > int userfaultfd(int) all
>
> Well, membarrier started out with just two arguments, and now it's using
> three.

even more weird then that musl has the wrapper!

> I suspect it's going to cause more trouble for us than
> sched_setattr/sched_getattr.

maybe. i should perhaps have said that i _started_ doing the
sched_setattr()/sched_getattr() work in bionic but found that it broke
a bunch of stuff. iirc it was specifically the combination of
"everyone has had to polyfill this" and "...but there have been two
versions" that made it a pain.

> Thanks,
> Florian
>
Florian Weimer Sept. 6, 2024, 12:23 p.m. UTC | #5
> On Thu, Sep 5, 2024 at 4:39 PM Florian Weimer <fweimer@redhat.com> wrote:
>>
>> > On Thu, Sep 5, 2024 at 3:51 PM Carlos O'Donell <carlos@redhat.com> wrote:
>> >>
>> >> On 9/5/24 2:33 PM, Florian Weimer wrote:
>> >> > These are straightforward system call wrappers.  Like the existing
>> >> > sched_* scheduler policy functions, sched_setattr interacts poorly with
>> >> > PTHREAD_PRIO_PROTECT mutexes.
>> >>
>> >> As noted by Jopseh Myers in 2015, these interfaces should have had wrappers
>> >> a long time ago following:
>> >> https://sourceware.org/glibc/wiki/Consensus#WIP:_Kernel_syscalls_wrappers
>> >>
>> >> I think these APIs meet the goals outlined there, modulo their poor interaction
>> >> with the pthread_* APIs, which is something that can be improved over time.
>> >
>> > do you trust upstream not to break source compatibility again? there
>> > have already been two different versions of this struct with the same
>> > name. or are you just assuming that next time they do it, userspace
>> > has a struct sched_attr_2?
>>
>> The v2 patch has some additional explanation how the extensibility is
>> expected to work:
>>
>>   [PATCH v2 2/3] Linux: Add the sched_setattr and sched_getattr functions
>>   <https://inbox.sourceware.org/libc-alpha/1f7f9c80017e91f0af99e1bfd9ec1f59c6cc0da7.1725567796.git.fweimer@redhat.com/>
>>
>> It's fairly standard as far as such interfaces go, I think.
>
> oh, interesting ... do you have example precedent there?

Isn't it widely used in Windows?  See the dwSize member here for a
random example:

<https://learn.microsoft.com/en-us/windows/win32/api/tlhelp32/ns-tlhelp32-processentry32>

Aleksa Sarai's recent series provides a convenient reference of similar
exensible interfaces used in the Linux UAPI:

  [PATCH RFC v2 00/10] extensible syscalls: CHECK_FIELDS to allow for
  easier feature detection
  <https://lore.kernel.org/linux-api/20240906-extensible-structs-check_fields-v2-0-0f46d2de9bad@cyphar.com/>

It's also similar to the tag-length-value approach used by inotify, FUSE
and others.

> (in particular i'm wondering whether i'm just blanking on other places
> where this has happened in bionic, or whether the most recent examples
> pre-date Android/only affected architectures Android doesn't support?)

The obvious one is struct statx (missing from Aleksa's series for some
reason), although currently the padding fields are being used up.  If
you have your own definition of struct statx, it still needs regular
maintenance.

With the arrival of __has_include, in glibc, we generally try to get
these type definitions from kernel headers.

> maybe. i should perhaps have said that i _started_ doing the
> sched_setattr()/sched_getattr() work in bionic but found that it broke
> a bunch of stuff. iirc it was specifically the combination of
> "everyone has had to polyfill this" and "...but there have been two
> versions" that made it a pain.

Of course things will break because instead of calling their
syscall-invoking functions my_sched_getaddr or something like that,
everyone uses the name that libcs will end up using once they add
support …

Thanks,
Florian
enh Sept. 6, 2024, 1:54 p.m. UTC | #6
On Fri, Sep 6, 2024 at 8:23 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> > On Thu, Sep 5, 2024 at 4:39 PM Florian Weimer <fweimer@redhat.com> wrote:
> >>
> >> > On Thu, Sep 5, 2024 at 3:51 PM Carlos O'Donell <carlos@redhat.com> wrote:
> >> >>
> >> >> On 9/5/24 2:33 PM, Florian Weimer wrote:
> >> >> > These are straightforward system call wrappers.  Like the existing
> >> >> > sched_* scheduler policy functions, sched_setattr interacts poorly with
> >> >> > PTHREAD_PRIO_PROTECT mutexes.
> >> >>
> >> >> As noted by Jopseh Myers in 2015, these interfaces should have had wrappers
> >> >> a long time ago following:
> >> >> https://sourceware.org/glibc/wiki/Consensus#WIP:_Kernel_syscalls_wrappers
> >> >>
> >> >> I think these APIs meet the goals outlined there, modulo their poor interaction
> >> >> with the pthread_* APIs, which is something that can be improved over time.
> >> >
> >> > do you trust upstream not to break source compatibility again? there
> >> > have already been two different versions of this struct with the same
> >> > name. or are you just assuming that next time they do it, userspace
> >> > has a struct sched_attr_2?
> >>
> >> The v2 patch has some additional explanation how the extensibility is
> >> expected to work:
> >>
> >>   [PATCH v2 2/3] Linux: Add the sched_setattr and sched_getattr functions
> >>   <https://inbox.sourceware.org/libc-alpha/1f7f9c80017e91f0af99e1bfd9ec1f59c6cc0da7.1725567796.git.fweimer@redhat.com/>
> >>
> >> It's fairly standard as far as such interfaces go, I think.
> >
> > oh, interesting ... do you have example precedent there?
>
> Isn't it widely used in Windows?  See the dwSize member here for a
> random example:
>
> <https://learn.microsoft.com/en-us/windows/win32/api/tlhelp32/ns-tlhelp32-processentry32>

neither binary nor source compatibility with Windows comes up much in
my line of work :-)

(though to be fair to them, the few Windows APIs i am aware of _do_
have suffixes like "2" or "Ex" to evolve in a backwards-compatible
way.)

> Aleksa Sarai's recent series provides a convenient reference of similar
> exensible interfaces used in the Linux UAPI:
>
>   [PATCH RFC v2 00/10] extensible syscalls: CHECK_FIELDS to allow for
>   easier feature detection
>   <https://lore.kernel.org/linux-api/20240906-extensible-structs-check_fields-v2-0-0f46d2de9bad@cyphar.com/>
>
> It's also similar to the tag-length-value approach used by inotify, FUSE
> and others.
>
> > (in particular i'm wondering whether i'm just blanking on other places
> > where this has happened in bionic, or whether the most recent examples
> > pre-date Android/only affected architectures Android doesn't support?)
>
> The obvious one is struct statx (missing from Aleksa's series for some
> reason), although currently the padding fields are being used up.  If
> you have your own definition of struct statx, it still needs regular
> maintenance.

statx(2) is too new to have caused me any trouble yet (i've yet to see
a single user), but i think that one's fine because it uses the

  __u64 __spare3[11];

trick instead, so neither binary nor source compatibility issues there.

the struct sched_attr alternative of

  __u32 size;

is fine from a kernel-userspace binary compatibility perspective, but
doesn't help with userspace-userspace compatibility (like the
32-/64-bit time_t discussion), nor source compatibility.

> With the arrival of __has_include, in glibc, we generally try to get
> these type definitions from kernel headers.
>
> > maybe. i should perhaps have said that i _started_ doing the
> > sched_setattr()/sched_getattr() work in bionic but found that it broke
> > a bunch of stuff. iirc it was specifically the combination of
> > "everyone has had to polyfill this" and "...but there have been two
> > versions" that made it a pain.
>
> Of course things will break because instead of calling their
> syscall-invoking functions my_sched_getaddr or something like that,
> everyone uses the name that libcs will end up using once they add
> support …
>
> Thanks,
> Florian
>
Florian Weimer Sept. 6, 2024, 2:04 p.m. UTC | #7
> the struct sched_attr alternative of
>
>   __u32 size;
>
> is fine from a kernel-userspace binary compatibility perspective, but
> doesn't help with userspace-userspace compatibility (like the
> 32-/64-bit time_t discussion), nor source compatibility.

That's fair, it's not something that you are expected to use in a public
header, and my manual update try this to explain that.  And it's similar
to struct dirent that copying values requires a bit of fiddling (the
kernel may produce names that overflow the embedded string space in
struct dirent).

Thanks,
Florian
Adhemerval Zanella Netto Sept. 6, 2024, 2:14 p.m. UTC | #8
On 06/09/24 11:04, Florian Weimer wrote:
>> the struct sched_attr alternative of
>>
>>   __u32 size;
>>
>> is fine from a kernel-userspace binary compatibility perspective, but
>> doesn't help with userspace-userspace compatibility (like the
>> 32-/64-bit time_t discussion), nor source compatibility.
> 
> That's fair, it's not something that you are expected to use in a public
> header, and my manual update try this to explain that.  And it's similar
> to struct dirent that copying values requires a bit of fiddling (the
> kernel may produce names that overflow the embedded string space in
> struct dirent).
> 

> There isn't any thing special to the initial addition, as long as we
> don't do emulation.  Strucurally, sched_getattr has the same behavior as
> recv, for example.  It makes sense to put something into the manual
> about extending the struct, and I did that.

That's is the same issues that I discussed with Elliot on the openat2
and that we did not come up with defined strategy because each direction
would require some tradeoffs:

  1. Follow kABI and tie the struct with a size:
     PRO: it simplifies implementation and allows any possible kernel
          extension.
     CON: it adds possible subtle bugs in userspace binary compatibility,
          and also does not have fully source compatibility (the source
          relies on the installed kernel headers).

  2. Add a libc defined struct, possible with slack space for extension:
     PRO: Binary and source compatibility across glibc and kernel releases.
     CON: It would require to sync any extra flags for current supported
          fieds, and any extra field that do not fit in the slack space
          would require a new symbol version.

I have a slight inclination to *not* follow kernel way, since it was not
designed for userland possible issues and the subtle problems are really
difficult to debug.
Carlos O'Donell Sept. 6, 2024, 5:26 p.m. UTC | #9
On 9/6/24 10:14 AM, Adhemerval Zanella Netto wrote:
>> There isn't any thing special to the initial addition, as long as we
>> don't do emulation.  Strucurally, sched_getattr has the same behavior as
>> recv, for example.  It makes sense to put something into the manual
>> about extending the struct, and I did that.
> 
> That's is the same issues that I discussed with Elliot on the openat2
> and that we did not come up with defined strategy because each direction
> would require some tradeoffs:
> 
>   1. Follow kABI and tie the struct with a size:
>      PRO: it simplifies implementation and allows any possible kernel
>           extension.
>      CON: it adds possible subtle bugs in userspace binary compatibility,
>           and also does not have fully source compatibility (the source
>           relies on the installed kernel headers).

For existing interfaces in the kernel I think #1 is the best option.

We are going to face significant resistance to now following the kABI for the syscall
wrapper and rightly so I think.

> 
>   2. Add a libc defined struct, possible with slack space for extension:
>      PRO: Binary and source compatibility across glibc and kernel releases.
>      CON: It would require to sync any extra flags for current supported
>           fieds, and any extra field that do not fit in the slack space
>           would require a new symbol version.

I think we can do #2 for newer generic pthread_*_np interfaces that allow us to expose
these types in a more generic way.
 
> I have a slight inclination to *not* follow kernel way, since it was not
> designed for userland possible issues and the subtle problems are really
> difficult to debug.

My position is #1 for existing syscalls.

My position is #2 for new pthread_*_np interfaces that we use to extend scheduling.

Thoughts?

We need to also review new pthread_*_np filesystem interfaces e.g.
https://inbox.sourceware.org/libc-alpha/704d7df1ae00370c8c52ca20eeac77d64dd9da3b.1665752531.git.fweimer@redhat.com/T/
Florian Weimer Sept. 6, 2024, 5:33 p.m. UTC | #10
* Adhemerval Zanella Netto:

> On 06/09/24 11:04, Florian Weimer wrote:
>>> the struct sched_attr alternative of
>>>
>>>   __u32 size;
>>>
>>> is fine from a kernel-userspace binary compatibility perspective, but
>>> doesn't help with userspace-userspace compatibility (like the
>>> 32-/64-bit time_t discussion), nor source compatibility.
>> 
>> That's fair, it's not something that you are expected to use in a public
>> header, and my manual update try this to explain that.  And it's similar
>> to struct dirent that copying values requires a bit of fiddling (the
>> kernel may produce names that overflow the embedded string space in
>> struct dirent).
>> 
>
>> There isn't any thing special to the initial addition, as long as we
>> don't do emulation.  Strucurally, sched_getattr has the same behavior as
>> recv, for example.  It makes sense to put something into the manual
>> about extending the struct, and I did that.
>
> That's is the same issues that I discussed with Elliot on the openat2
> and that we did not come up with defined strategy because each direction
> would require some tradeoffs:
>
>   1. Follow kABI and tie the struct with a size:
>      PRO: it simplifies implementation and allows any possible kernel
>           extension.
>      CON: it adds possible subtle bugs in userspace binary compatibility,
>           and also does not have fully source compatibility (the source
>           relies on the installed kernel headers).
>
>   2. Add a libc defined struct, possible with slack space for extension:
>      PRO: Binary and source compatibility across glibc and kernel releases.
>      CON: It would require to sync any extra flags for current supported
>           fieds, and any extra field that do not fit in the slack space
>           would require a new symbol version.

CON for 2 is also that it's pretty close to emulation, which we know
doesn't work.  We have to check all flags and fail all unknown things
proactively because we can't be sure how unknown flags interact with the
official kernel-supported extension mechanism.  This means that any new
functionality will require kernel and glibc updates.  I expect
applications not to adopt this, sticking to direct system calls and the
UAPI headers instead.

I also expect that the run-time checks for 1 are easier to write than
the compile-time checks for 2, even if we provide suitable preprocessor
macros to enable conditional compilation based on the availability of
different struct versions.

> I have a slight inclination to *not* follow kernel way, since it was not
> designed for userland possible issues and the subtle problems are really
> difficult to debug.

I disagree, it works just fine for userspace.  You just need follow the
same extension protocol for the relevant interface.

Anyway, I'm not going to push the sched_attr changes for now, and will
work on a new section for the manual on types which change their layout.

Thanks,
Florian
Carlos O'Donell Sept. 6, 2024, 6:49 p.m. UTC | #11
On 9/6/24 1:33 PM, Florian Weimer wrote:
> * Adhemerval Zanella Netto:
> 
>> On 06/09/24 11:04, Florian Weimer wrote:
>>>> the struct sched_attr alternative of
>>>>
>>>>   __u32 size;
>>>>
>>>> is fine from a kernel-userspace binary compatibility perspective, but
>>>> doesn't help with userspace-userspace compatibility (like the
>>>> 32-/64-bit time_t discussion), nor source compatibility.
>>>
>>> That's fair, it's not something that you are expected to use in a public
>>> header, and my manual update try this to explain that.  And it's similar
>>> to struct dirent that copying values requires a bit of fiddling (the
>>> kernel may produce names that overflow the embedded string space in
>>> struct dirent).
>>>
>>
>>> There isn't any thing special to the initial addition, as long as we
>>> don't do emulation.  Strucurally, sched_getattr has the same behavior as
>>> recv, for example.  It makes sense to put something into the manual
>>> about extending the struct, and I did that.
>>
>> That's is the same issues that I discussed with Elliot on the openat2
>> and that we did not come up with defined strategy because each direction
>> would require some tradeoffs:
>>
>>   1. Follow kABI and tie the struct with a size:
>>      PRO: it simplifies implementation and allows any possible kernel
>>           extension.
>>      CON: it adds possible subtle bugs in userspace binary compatibility,
>>           and also does not have fully source compatibility (the source
>>           relies on the installed kernel headers).
>>
>>   2. Add a libc defined struct, possible with slack space for extension:
>>      PRO: Binary and source compatibility across glibc and kernel releases.
>>      CON: It would require to sync any extra flags for current supported
>>           fieds, and any extra field that do not fit in the slack space
>>           would require a new symbol version.
> 
> CON for 2 is also that it's pretty close to emulation, which we know
> doesn't work.  We have to check all flags and fail all unknown things
> proactively because we can't be sure how unknown flags interact with the
> official kernel-supported extension mechanism.  This means that any new
> functionality will require kernel and glibc updates.  I expect
> applications not to adopt this, sticking to direct system calls and the
> UAPI headers instead.

Agreed.

The required double update is very high friction for downstream distributions.

Impossible really if a new symbol set is required because you need more bytes.

I also think that if we want to add an *additional* symbol for #2, then we can do
that, but collaboration with musl and bionic on libc-coord would be a good idea.

> I also expect that the run-time checks for 1 are easier to write than
> the compile-time checks for 2, even if we provide suitable preprocessor
> macros to enable conditional compilation based on the availability of
> different struct versions.
> 
>> I have a slight inclination to *not* follow kernel way, since it was not
>> designed for userland possible issues and the subtle problems are really
>> difficult to debug.
> 
> I disagree, it works just fine for userspace.  You just need follow the
> same extension protocol for the relevant interface.
> 
> Anyway, I'm not going to push the sched_attr changes for now, and will
> work on a new section for the manual on types which change their layout.

Thanks.
Adhemerval Zanella Netto Sept. 6, 2024, 7:08 p.m. UTC | #12
On 06/09/24 15:49, Carlos O'Donell wrote:
> On 9/6/24 1:33 PM, Florian Weimer wrote:
>> * Adhemerval Zanella Netto:
>>
>>> On 06/09/24 11:04, Florian Weimer wrote:
>>>>> the struct sched_attr alternative of
>>>>>
>>>>>   __u32 size;
>>>>>
>>>>> is fine from a kernel-userspace binary compatibility perspective, but
>>>>> doesn't help with userspace-userspace compatibility (like the
>>>>> 32-/64-bit time_t discussion), nor source compatibility.
>>>>
>>>> That's fair, it's not something that you are expected to use in a public
>>>> header, and my manual update try this to explain that.  And it's similar
>>>> to struct dirent that copying values requires a bit of fiddling (the
>>>> kernel may produce names that overflow the embedded string space in
>>>> struct dirent).
>>>>
>>>
>>>> There isn't any thing special to the initial addition, as long as we
>>>> don't do emulation.  Strucurally, sched_getattr has the same behavior as
>>>> recv, for example.  It makes sense to put something into the manual
>>>> about extending the struct, and I did that.
>>>
>>> That's is the same issues that I discussed with Elliot on the openat2
>>> and that we did not come up with defined strategy because each direction
>>> would require some tradeoffs:
>>>
>>>   1. Follow kABI and tie the struct with a size:
>>>      PRO: it simplifies implementation and allows any possible kernel
>>>           extension.
>>>      CON: it adds possible subtle bugs in userspace binary compatibility,
>>>           and also does not have fully source compatibility (the source
>>>           relies on the installed kernel headers).
>>>
>>>   2. Add a libc defined struct, possible with slack space for extension:
>>>      PRO: Binary and source compatibility across glibc and kernel releases.
>>>      CON: It would require to sync any extra flags for current supported
>>>           fieds, and any extra field that do not fit in the slack space
>>>           would require a new symbol version.
>>
>> CON for 2 is also that it's pretty close to emulation, which we know
>> doesn't work.  We have to check all flags and fail all unknown things
>> proactively because we can't be sure how unknown flags interact with the
>> official kernel-supported extension mechanism.  This means that any new
>> functionality will require kernel and glibc updates.  I expect
>> applications not to adopt this, sticking to direct system calls and the
>> UAPI headers instead.
> 
> Agreed.
> 
> The required double update is very high friction for downstream distributions.
> 
> Impossible really if a new symbol set is required because you need more bytes.
> 
> I also think that if we want to add an *additional* symbol for #2, then we can do
> that, but collaboration with musl and bionic on libc-coord would be a good idea.

The emulation does work in some very specific cases, where some syscall 
is a superset of old one.  For instance, if the new syscall adds a flags
(like preadv2), it is still well defined to call the old one as long
the new provide the old syscall functionality (like preadv2 with flag 0
being essentially preadv).

But I give you that this not straightforward and does additional work
from glibc to handle correctly multiple kernel version. Also, with 1. the
idea is provide forward compatibility which is something that we never 
supported in glibc version.

That's why I think we still need to chose here which tradeoff is less
troublesome.  I only *slight* prefer 2. because I think it makes a glibc
version release more *consistent*, since the interface is suppose to work
regardless of the installed kernel header.  It does not fully solve forward
support I agree.

> 
>> I also expect that the run-time checks for 1 are easier to write than
>> the compile-time checks for 2, even if we provide suitable preprocessor
>> macros to enable conditional compilation based on the availability of
>> different struct versions.
>>
>>> I have a slight inclination to *not* follow kernel way, since it was not
>>> designed for userland possible issues and the subtle problems are really
>>> difficult to debug.
>>
>> I disagree, it works just fine for userspace.  You just need follow the
>> same extension protocol for the relevant interface.
>>
>> Anyway, I'm not going to push the sched_attr changes for now, and will
>> work on a new section for the manual on types which change their layout.

I really don't mean to suppress this implementation, I just wanted to raise
the tradeoff it would incur and define how exactly glibc will handle such
kernel interfaces because I did have much reply on openat2 addition and
other libcs seems not keen on such strategy.

But even though I slight prefer 2., I sure can work with 1 (and that seems
to what you and Carlos seems to be leaning to).

> My position is #1 for existing syscalls.
> 
> My position is #2 for new pthread_*_np interfaces that we use to extend scheduling.
> 
> Thoughts?

I can work with this approach.