diff mbox series

sockaddr.3type: BUGS: Document that libc should be fixed using a union

Message ID 20230205152835.17413-1-alx@kernel.org
State New
Headers show
Series sockaddr.3type: BUGS: Document that libc should be fixed using a union | expand

Commit Message

Alejandro Colomar Feb. 5, 2023, 3:28 p.m. UTC
As discussed before, and Bastien and I seem to agree, ideally we should
define the following types:

    struct sockaddr_storage {
        union {
            struct {
                sa_family_t      ss_family;
            };
            struct sockaddr_in   sin;
            struct sockaddr_in6  sin6;
            struct sockaddr_un   sun;
            // ...
        };
    };

    struct [[deprecated]] sockaddr {
        sa_family_t              sa_family;
    };

    union [[gnu::transparent_union]] sockaddr_ptr {
        struct sockaddr_storage  *ss;
        struct sockaddr          *sa;
    };

And then we could define APIs like:

    int bind(int sockfd, const union sockaddr_ptr *addr, socklen_t len);

Link: <https://lore.kernel.org/linux-man/ab492040-2058-bcbe-c920-a9088a20f071@gmail.com/T/#u>
Link: <https://inbox.sourceware.org/libc-alpha/20230120134043.10247-1-alx@kernel.org/T/#u>
Cc: GCC <gcc@gcc.gnu.org>
Cc: glibc <libc-alpha@sourceware.org>
Cc: Bastien Roucariès <rouca@debian.org>
Cc: Stefan Puiu <stefan.puiu@gmail.com>
Cc: Igor Sysoev <igor@sysoev.ru>
Cc: Rich Felker <dalias@libc.org>
Cc: Andrew Clayton <a.clayton@nginx.com>
Cc: Richard Biener <richard.guenther@gmail.com>
Cc: Zack Weinberg <zack@owlfolio.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Joseph Myers <joseph@codesourcery.com>
Cc: Jakub Jelinek <jakub@redhat.com>
Cc: Eric Blake <eblake@redhat.com>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
---
 man3type/sockaddr.3type | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

Comments

Alejandro Colomar Feb. 5, 2023, 3:31 p.m. UTC | #1
Formatted version:

BUGS
        sockaddr_storage was designed back when strict aliasing wasn’t a  prob‐
        lem.  Back then, one would define a variable of that type, and then ac‐
        cess it as any of the other sockaddr_* types, depending on the value of
        the  first  member.   This is Undefined Behavior.  However, there is no
        way to use these APIs without invoking Unedfined  Behavior,  either  in
        the  user  program  or  in libc, so it is still recommended to use this
        method.  The only correct way to use  different  types  in  an  API  is
        through  a  union.   However, that union must be implemented in the li‐
        brary, since the type must be shared between the library and user code,
        so libc should be fixed by implementing sockaddr_storage as a union.
Rich Felker Feb. 5, 2023, 11:43 p.m. UTC | #2
On Sun, Feb 05, 2023 at 04:28:36PM +0100, Alejandro Colomar wrote:
> As discussed before, and Bastien and I seem to agree, ideally we should
> define the following types:
> 
>     struct sockaddr_storage {
>         union {
>             struct {
>                 sa_family_t      ss_family;
>             };
>             struct sockaddr_in   sin;
>             struct sockaddr_in6  sin6;
>             struct sockaddr_un   sun;
>             // ...
>         };
>     };

AFAIK this is not permitted because of namespace. sys/socket.h is not
permitted to expose sockaddr_{in,in6,un}. And if you defined
differently-tagged structures with the same contents, it would not do
any good; accessing the members with the wrong-tagged struct type
would still be UB.

Really, there is no action needed here. Nothing is wrong on libc's
side. The problem is just that the type is *not useful for anything*
and should not be used except in the context of sizeof, which is
purely a documentation issue.

>     struct [[deprecated]] sockaddr {
>         sa_family_t              sa_family;
>     };
> 
>     union [[gnu::transparent_union]] sockaddr_ptr {
>         struct sockaddr_storage  *ss;
>         struct sockaddr          *sa;
>     };
> 
> And then we could define APIs like:
> 
>     int bind(int sockfd, const union sockaddr_ptr *addr, socklen_t len);

You cannot just change APIs because you wish they were different.
Ideally bind, etc. would just take void *, which is what the struct
sockaddr * is being used as. But they don't, so callers have to cast.
It's ugly but it's really not a big deal. Much less of a big deal than
breaking the interface because you think it would look prettier if it
had been done differently.

Rich
Alejandro Colomar Feb. 5, 2023, 11:59 p.m. UTC | #3
Hi Rich,

On 2/6/23 00:43, Rich Felker wrote:
> On Sun, Feb 05, 2023 at 04:28:36PM +0100, Alejandro Colomar wrote:
>> As discussed before, and Bastien and I seem to agree, ideally we should
>> define the following types:
>>
>>      struct sockaddr_storage {
>>          union {
>>              struct {
>>                  sa_family_t      ss_family;
>>              };
>>              struct sockaddr_in   sin;
>>              struct sockaddr_in6  sin6;
>>              struct sockaddr_un   sun;
>>              // ...
>>          };
>>      };
> 
> AFAIK this is not permitted because of namespace. sys/socket.h is not
> permitted to expose sockaddr_{in,in6,un}. And if you defined
> differently-tagged structures with the same contents, it would not do
> any good; accessing the members with the wrong-tagged struct type
> would still be UB.

I'm not sure.  ISO C has that restriction that a header can only define what the 
standard says it defines.  However, does POSIX have the same restriction? 
Doesn't POSIX allow including any other POSIX headers (maybe it does, but IIRC 
it doesn't)?  Since <sys/socket.h> is just a POSIX thing, that's the only 
standard we should care about.

> 
> Really, there is no action needed here. Nothing is wrong on libc's
> side. The problem is just that the type is *not useful for anything*
> and should not be used except in the context of sizeof, which is
> purely a documentation issue.
> 
>>      struct [[deprecated]] sockaddr {
>>          sa_family_t              sa_family;
>>      };
>>
>>      union [[gnu::transparent_union]] sockaddr_ptr {
>>          struct sockaddr_storage  *ss;
>>          struct sockaddr          *sa;
>>      };
>>
>> And then we could define APIs like:
>>
>>      int bind(int sockfd, const union sockaddr_ptr *addr, socklen_t len);
> 
> You cannot just change APIs because you wish they were different.

This API is compatible.  In fact, it already is now like that:

alx@debian:/usr/include$ grepc bind
./x86_64-linux-gnu/sys/socket.h:112:
extern int bind (int __fd, __CONST_SOCKADDR_ARG __addr, socklen_t __len)
      __THROW;

alx@debian:/usr/include$ sed -n 83,84p x86_64-linux-gnu/sys/socket.h
typedef union { __SOCKADDR_ALLTYPES
	      } __CONST_SOCKADDR_ARG __attribute__ ((__transparent_union__));


> Ideally bind, etc. would just take void *,

void * is a bit too much unsafe.  GCC's transparent unions are a restricted 
catch-many pointer, rather than a catch-all.

> which is what the struct
> sockaddr * is being used as.

And in fact, void* wouldn't sole the union problem.

> But they don't, so callers have to cast.

With the current glibc implementation, you don't need to cast, thanks to the 
[[gnu::transparent_union]]:


alx@debian:~/tmp$ cat sock.c
#define _GNU_SOURCE
#include <sys/socket.h>
#include <sys/un.h>

int
bind_un(int sfd, const struct sockaddr_un *addr, socklen_t len)
{
	return bind(sfd, addr, len);
}
alx@debian:~/tmp$ cc -Wall -Wextra sock.c -S
alx@debian:~/tmp$ clang -Wall -Wextra sock.c -S
alx@debian:~/tmp$


> It's ugly but it's really not a big deal. Much less of a big deal than
> breaking the interface because you think it would look prettier if it
> had been done differently.

It's not breaking the interface; not in GNU C.  Current code still falls back to 
the a POSIX-complying UB-invoking interface when you don't ask for _GNU_SOURCE, 
but we can keep that.  I'm only asking that we fix the GNU C version.  Moreover, 
in POSIX-complying code, you can keep the interface pointer, since you'll need 
to cast anyway, but can still make sockaddr_storage be implemented through an 
anonymous union.

Cheers,

Alex

> 
> Rich
Rich Felker Feb. 6, 2023, 12:15 a.m. UTC | #4
On Mon, Feb 06, 2023 at 12:59:48AM +0100, Alejandro Colomar wrote:
> Hi Rich,
> 
> On 2/6/23 00:43, Rich Felker wrote:
> >On Sun, Feb 05, 2023 at 04:28:36PM +0100, Alejandro Colomar wrote:
> >>As discussed before, and Bastien and I seem to agree, ideally we should
> >>define the following types:
> >>
> >>     struct sockaddr_storage {
> >>         union {
> >>             struct {
> >>                 sa_family_t      ss_family;
> >>             };
> >>             struct sockaddr_in   sin;
> >>             struct sockaddr_in6  sin6;
> >>             struct sockaddr_un   sun;
> >>             // ...
> >>         };
> >>     };
> >
> >AFAIK this is not permitted because of namespace. sys/socket.h is not
> >permitted to expose sockaddr_{in,in6,un}. And if you defined
> >differently-tagged structures with the same contents, it would not do
> >any good; accessing the members with the wrong-tagged struct type
> >would still be UB.
> 
> I'm not sure.  ISO C has that restriction that a header can only
> define what the standard says it defines.  However, does POSIX have
> the same restriction?

Yes.

> Doesn't POSIX allow including any other POSIX
> headers (maybe it does, but IIRC it doesn't)?

Not except where it's explicitly allowed.

> Since <sys/socket.h>
> is just a POSIX thing, that's the only standard we should care
> about.

The relevant text is here:

https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_02_02

> >Really, there is no action needed here. Nothing is wrong on libc's
> >side. The problem is just that the type is *not useful for anything*
> >and should not be used except in the context of sizeof, which is
> >purely a documentation issue.
> >
> >>     struct [[deprecated]] sockaddr {
> >>         sa_family_t              sa_family;
> >>     };
> >>
> >>     union [[gnu::transparent_union]] sockaddr_ptr {
> >>         struct sockaddr_storage  *ss;
> >>         struct sockaddr          *sa;
> >>     };
> >>
> >>And then we could define APIs like:
> >>
> >>     int bind(int sockfd, const union sockaddr_ptr *addr, socklen_t len);
> >
> >You cannot just change APIs because you wish they were different.
> 
> This API is compatible.  In fact, it already is now like that:

Unless I'm mistaken, it's not. The function pointer the name 'bind'
produces will not have the right type, and will have problems being
stored in a function pointer object with the right type, as well as
wrong results with _Generic, etc. Only plain calls to it are
unaffected.

> alx@debian:/usr/include$ grepc bind
> ./x86_64-linux-gnu/sys/socket.h:112:
> extern int bind (int __fd, __CONST_SOCKADDR_ARG __addr, socklen_t __len)
>      __THROW;
> 
> alx@debian:/usr/include$ sed -n 83,84p x86_64-linux-gnu/sys/socket.h
> typedef union { __SOCKADDR_ALLTYPES
> 	      } __CONST_SOCKADDR_ARG __attribute__ ((__transparent_union__));
> 
> 
> >Ideally bind, etc. would just take void *,
> 
> void * is a bit too much unsafe.  GCC's transparent unions are a
> restricted catch-many pointer, rather than a catch-all.
> 
> >which is what the struct
> >sockaddr * is being used as.
> 
> And in fact, void* wouldn't sole the union problem.
> 
> >But they don't, so callers have to cast.
> 
> With the current glibc implementation, you don't need to cast,
> thanks to the [[gnu::transparent_union]]:

This is just facilitating writing non-portable code, since correct
code written to the spec *does* have to cast.

> >It's ugly but it's really not a big deal. Much less of a big deal than
> >breaking the interface because you think it would look prettier if it
> >had been done differently.
> 
> It's not breaking the interface; not in GNU C.  Current code still
> falls back to the a POSIX-complying UB-invoking interface when you
> don't ask for _GNU_SOURCE, but we can keep that.  I'm only asking
> that we fix the GNU C version.  Moreover, in POSIX-complying code,
> you can keep the interface pointer, since you'll need to cast
> anyway, but can still make sockaddr_storage be implemented through
> an anonymous union.

If it's only for _GNU_SOURCE it's probably mostly harmless, and
allowable to violate the namespace rules, but also not terribly
helpful...

Rich
Xi Ruoyao Feb. 6, 2023, 6:02 a.m. UTC | #5
On Sun, 2023-02-05 at 16:31 +0100, Alejandro Colomar via Libc-alpha wrote:

> The only correct way to use  different  types  in  an  API  is
> through  a  union.

I don't think this statement is true (in general).  Technically we can
write something like this:

struct sockaddr { ... };
struct sockaddr_in { ... };
struct sockaddr_in6 { ... };

int bind(int fd, const struct sockaddr *addr, socklen_t addrlen)
{
    if (addrlen < sizeof(struct sockaddr) {
        errno = EINVAL;
        return -1;
    }

    /* cannot use "addr->sa_family" directly: it will be an UB */
    sa_family_t sa_family;
    memcpy(&sa_family, addr, sizeof(sa_family));

    switch (sa_family) {
        case AF_INET:
            return _do_bind_in(fd, (struct sockaddr_in *)addr, addrlen);
        case AF_INET6:
            return _do_bind_in6(fd, (struct sockaddr_in6 *)addr, addrlen);
        /* more cases follow here */
        default:
            errno = EINVAL;
            return -1;
        }
    }
}

In this way we can use sockaddr_{in,in6,...} for bind() safely, as long
as we can distinguish the "real" type of addr using the leading byte
sequence (and the caller uses it carefully).

But obviously sockaddr_storage can't be distinguished here, so casting a
struct sockaddr_stroage * to struct sockaddr * and passing it to bind()
will still be wrong (unless we make sockaddr_storage an union or add
[[gnu::may_alias]]).
Rich Felker Feb. 6, 2023, 11:20 a.m. UTC | #6
On Mon, Feb 06, 2023 at 02:02:23PM +0800, Xi Ruoyao wrote:
> On Sun, 2023-02-05 at 16:31 +0100, Alejandro Colomar via Libc-alpha wrote:
> 
> > The only correct way to use  different  types  in  an  API  is
> > through  a  union.
> 
> I don't think this statement is true (in general).  Technically we can
> write something like this:
> 
> struct sockaddr { ... };
> struct sockaddr_in { ... };
> struct sockaddr_in6 { ... };
> 
> int bind(int fd, const struct sockaddr *addr, socklen_t addrlen)
> {
>     if (addrlen < sizeof(struct sockaddr) {
>         errno = EINVAL;
>         return -1;
>     }
> 
>     /* cannot use "addr->sa_family" directly: it will be an UB */
>     sa_family_t sa_family;
>     memcpy(&sa_family, addr, sizeof(sa_family));
> 
>     switch (sa_family) {
>         case AF_INET:
>             return _do_bind_in(fd, (struct sockaddr_in *)addr, addrlen);
>         case AF_INET6:
>             return _do_bind_in6(fd, (struct sockaddr_in6 *)addr, addrlen);
>         /* more cases follow here */
>         default:
>             errno = EINVAL;
>             return -1;
>         }
>     }
> }
> 
> In this way we can use sockaddr_{in,in6,...} for bind() safely, as long
> as we can distinguish the "real" type of addr using the leading byte
> sequence (and the caller uses it carefully).
> 
> But obviously sockaddr_storage can't be distinguished here, so casting a
> struct sockaddr_stroage * to struct sockaddr * and passing it to bind()
> will still be wrong (unless we make sockaddr_storage an union or add
> [[gnu::may_alias]]).

If you wanted to make this work, you can just memcpy sockaddr_storage
to a local object of the right declared type to access it. But this is
only relevant for a userspace implementation of bind() rather than one
that just marshalls it across some syscall boundary to a kernel.

Rich
Alejandro Colomar Feb. 6, 2023, 11:55 a.m. UTC | #7
Hi Xi,

On 2/6/23 07:02, Xi Ruoyao wrote:
> On Sun, 2023-02-05 at 16:31 +0100, Alejandro Colomar via Libc-alpha wrote:
> 
>> The only correct way to use  different  types  in  an  API  is
>> through  a  union.
> 
> I don't think this statement is true (in general).  Technically we can
> write something like this:
> 
> struct sockaddr { ... };
> struct sockaddr_in { ... };
> struct sockaddr_in6 { ... };
> 
> int bind(int fd, const struct sockaddr *addr, socklen_t addrlen)
> {
>      if (addrlen < sizeof(struct sockaddr) {
>          errno = EINVAL;
>          return -1;
>      }
> 
>      /* cannot use "addr->sa_family" directly: it will be an UB */
>      sa_family_t sa_family;
>      memcpy(&sa_family, addr, sizeof(sa_family));
> 
>      switch (sa_family) {
>          case AF_INET:
>              return _do_bind_in(fd, (struct sockaddr_in *)addr, addrlen);
>          case AF_INET6:
>              return _do_bind_in6(fd, (struct sockaddr_in6 *)addr, addrlen);
>          /* more cases follow here */
>          default:
>              errno = EINVAL;
>              return -1;
>          }
>      }
> }
> 
> In this way we can use sockaddr_{in,in6,...} for bind() safely, as long
> as we can distinguish the "real" type of addr using the leading byte
> sequence (and the caller uses it carefully).

True; I hadn't thought of memcpy()ing the first member of the struct.  That's 
valid; overcomplicated but valid.

> 
> But obviously sockaddr_storage can't be distinguished here, so casting a
> struct sockaddr_stroage * to struct sockaddr * and passing it to bind()
> will still be wrong (unless we make sockaddr_storage an union or add
> [[gnu::may_alias]]).

But as you say, it still leaves us with a question.  What should one declare for 
passing to the standard APIs?  It can only be a union.  So we can either tell 
users to each create their own union, or we can make sockaddr_storage be a 
union.  The latter slightly violates POSIX due to namespaces as Rich noted, but 
that's a minor violation, and POSIX can be changed to accomodate for that.

If we change sockaddr_storage to be a union, we have two benefits:

-  Old code which uses sockaddr_storage is made conforming (non-UB) without 
modifying the source.
-  Users can inspect the structures.

If we don't, and deprecate sockaddr_storage, we should tell users to declare 
their own unions _and_ treat all these structures as black boxes which can only 
be read by memcpy()ing their contents.

Which of the two do we want?  I think fixing sockaddr_storage is simpler, and 
allows existing practice of reading these structures.  The other one just makes 
(or rather acknowledges, since it has always been like that) a lot of existing 
code invoke UB, and acknowledges that you can't safely use these structures 
without a lot of workarounding.

Cheers,

Alex
Rich Felker Feb. 6, 2023, 1:38 p.m. UTC | #8
On Mon, Feb 06, 2023 at 12:55:12PM +0100, Alejandro Colomar wrote:
> Hi Xi,
> 
> On 2/6/23 07:02, Xi Ruoyao wrote:
> >On Sun, 2023-02-05 at 16:31 +0100, Alejandro Colomar via Libc-alpha wrote:
> >
> >>The only correct way to use  different  types  in  an  API  is
> >>through  a  union.
> >
> >I don't think this statement is true (in general).  Technically we can
> >write something like this:
> >
> >struct sockaddr { ... };
> >struct sockaddr_in { ... };
> >struct sockaddr_in6 { ... };
> >
> >int bind(int fd, const struct sockaddr *addr, socklen_t addrlen)
> >{
> >     if (addrlen < sizeof(struct sockaddr) {
> >         errno = EINVAL;
> >         return -1;
> >     }
> >
> >     /* cannot use "addr->sa_family" directly: it will be an UB */
> >     sa_family_t sa_family;
> >     memcpy(&sa_family, addr, sizeof(sa_family));
> >
> >     switch (sa_family) {
> >         case AF_INET:
> >             return _do_bind_in(fd, (struct sockaddr_in *)addr, addrlen);
> >         case AF_INET6:
> >             return _do_bind_in6(fd, (struct sockaddr_in6 *)addr, addrlen);
> >         /* more cases follow here */
> >         default:
> >             errno = EINVAL;
> >             return -1;
> >         }
> >     }
> >}
> >
> >In this way we can use sockaddr_{in,in6,...} for bind() safely, as long
> >as we can distinguish the "real" type of addr using the leading byte
> >sequence (and the caller uses it carefully).
> 
> True; I hadn't thought of memcpy()ing the first member of the
> struct.  That's valid; overcomplicated but valid.
> 
> >
> >But obviously sockaddr_storage can't be distinguished here, so casting a
> >struct sockaddr_stroage * to struct sockaddr * and passing it to bind()
> >will still be wrong (unless we make sockaddr_storage an union or add
> >[[gnu::may_alias]]).
> 
> But as you say, it still leaves us with a question.  What should one
> declare for passing to the standard APIs?  It can only be a union.
> So we can either tell users to each create their own union, or we
> can make sockaddr_storage be a union.  The latter slightly violates
> POSIX due to namespaces as Rich noted, but that's a minor violation,
> and POSIX can be changed to accomodate for that.

There is absolutely not any need to declare the union for application
code calling the socket APIs. You declare whatever type you will be
using. For binding or connecting a unix socket, sockaddr_un. For IPv6,
sockaddr_in6. Etc. Then you cast the pointer to struct sockaddr * and
pass it to the function.

But normally you don't use declared-type objects for this anyway. You
use the struct sockaddr * obtained from getaddrinfo as an abstract
pointer and never dereference it at all.

> If we change sockaddr_storage to be a union, we have two benefits:
> 
> -  Old code which uses sockaddr_storage is made conforming (non-UB)
> without modifying the source.

It's not conforming. It's just no longer UB as a result of unspecified
implementation choices you'd be making.

> -  Users can inspect the structures.
> 
> If we don't, and deprecate sockaddr_storage, we should tell users to
> declare their own unions _and_ treat all these structures as black
> boxes which can only be read by memcpy()ing their contents.

No, there is no need to tell users to do any such thing. No action is
needed here except for folks to stop using sockaddr_storage.

Rich
Alejandro Colomar Feb. 6, 2023, 2:11 p.m. UTC | #9
Hi Rich,

On 2/6/23 14:38, Rich Felker wrote:
> There is absolutely not any need to declare the union for application
> code calling the socket APIs. You declare whatever type you will be
> using. For binding or connecting a unix socket, sockaddr_un. For IPv6,
> sockaddr_in6. Etc. Then you cast the pointer to struct sockaddr * and
> pass it to the function.

Except that you may be using generic code that may use either of AF_UNIX, 
AF_INET, and AF_INET6.  A web server may very well use all the three.

> 
> But normally you don't use declared-type objects for this anyway. You
> use the struct sockaddr * obtained from getaddrinfo as an abstract
> pointer and never dereference it at all.

That's right.  Most of the time, we should be using getaddrinfo(3), which 
already provides the storage.  I don't know for sure if there are any cases that 
can't be rewritten to work that way.

However, there are some APIs that require you to allocate an object.  For 
example recvfrom(2).  How would you recommend using recvfrom(2), or is it some 
API to avoid?  Example of usage: 
<Mhttps://man7.org/tlpi/code/online/dist/sockets/id_echo_sv.c.html>.


Cheers,

Alex
Zack Weinberg Feb. 6, 2023, 5:21 p.m. UTC | #10
On Mon, Feb 6, 2023, at 9:11 AM, Alejandro Colomar via Libc-alpha wrote:
> On 2/6/23 14:38, Rich Felker wrote:
>> There is absolutely not any need to declare the union for application
>> code calling the socket APIs. You declare whatever type you will be
>> using. For binding or connecting a unix socket, sockaddr_un. For IPv6,
>> sockaddr_in6. Etc. Then you cast the pointer to struct sockaddr * and
>> pass it to the function.
>
> Except that you may be using generic code that may use either of AF_UNIX, 
> AF_INET, and AF_INET6.  A web server may very well use all the three.

I have personally tripped over systems where struct sockaddr_un was _bigger_
than struct sockaddr_storage.  Also, AFAIK modern kernels (not just Linux)
do not actually impose a 108-byte (or whatever) limit on the length of
sun_path; application code can treat the structure definition as being

   struct sockaddr_un {
       sa_family_t sun_family;
       char sun_path[];  // C99 flexible array member
   };

as long as the `addrlen` parameter to whatever system call you're using
accurately reflects the size of the address object you passed in.  Kind
of the same as how you can make your own bigger fd_set to call select()
with, if you want.  Point being, even if sockaddr_storage is bigger than
the _default_ sockaddr_un, that still might not be big enough.

I'd also like to point out that none of these structures can change size
without breaking ABI compatibility.  In particular, namespace issues
aside, glibc _cannot_ make the definition of struct sockaddr be either

    struct sockaddr {
        sa_family_t sa_family;
    };

or

    struct sockaddr {
        union {
            // ...
            struct sockaddr_in6 in6;
        };
    };

because a variable declaration `struct sockaddr sa;` must allocate 16
bytes of space -- no more and no less.

> However, there are some APIs that require you to allocate an object.  For 
> example recvfrom(2).  How would you recommend using recvfrom(2)

Well, most address families have fixed-size addresses: if you called
socket(AF_INET6, SOCK_DGRAM, 0) then you know recvfrom on that socket
needs enough space for struct sockaddr_in6.  If you receive a socket
descriptor as an argument and you don't know its address family, you
can use `getsockopt(sock, SOL_SOCKET, SO_DOMAIN)` to look it up.

This won't work for AF_UNIX, though.  recvfrom() _will_ tell you if
you didn't give it enough space (by updating `addrlen` to a bigger
number than you passed in); it's not idempotent, so you can't call
it again, but you _could_ call getpeername() instead.  In principle
you could have a connectionless (SOCK_DGRAM) AF_UNIX socket and then
getpeername() wouldn't work, but does anyone actually _want_ the
peer address when serving from an AF_UNIX socket, as opposed to
the SCM_CREDENTIALS ancillary message or the SO_PEERCRED sockopt query?

zw
Rich Felker Feb. 6, 2023, 5:48 p.m. UTC | #11
On Mon, Feb 06, 2023 at 03:11:10PM +0100, Alejandro Colomar wrote:
> Hi Rich,
> 
> On 2/6/23 14:38, Rich Felker wrote:
> >There is absolutely not any need to declare the union for application
> >code calling the socket APIs. You declare whatever type you will be
> >using. For binding or connecting a unix socket, sockaddr_un. For IPv6,
> >sockaddr_in6. Etc. Then you cast the pointer to struct sockaddr * and
> >pass it to the function.
> 
> Except that you may be using generic code that may use either of
> AF_UNIX, AF_INET, and AF_INET6.  A web server may very well use all
> the three.

If you have generic code, the generic code is not what's creating the
object. It got the object from the calling code or from some callback
that allocated it, in which case it doesn't have to care about any of
this.

> >But normally you don't use declared-type objects for this anyway. You
> >use the struct sockaddr * obtained from getaddrinfo as an abstract
> >pointer and never dereference it at all.
> 
> That's right.  Most of the time, we should be using getaddrinfo(3),
> which already provides the storage.  I don't know for sure if there
> are any cases that can't be rewritten to work that way.
> 
> However, there are some APIs that require you to allocate an object.
> For example recvfrom(2).  How would you recommend using recvfrom(2),
> or is it some API to avoid?  Example of usage:
> <Mhttps://man7.org/tlpi/code/online/dist/sockets/id_echo_sv.c.html>.

If using allocated storage, there's nothing special you have to do.
But if you want to avoid malloc and use a declared object, you have a
few options:

If it's your socket and you know what address family it's associated
with, you just pass an object of that type. recvfrom will always
produce output of the same AF.

If you're in a situation like having been invoked from inetd, you can
use getsockname into a sockaddr_storage object or union or whatever,
then read out the family field (with memcpy if needed). Then you know
the AF to use. Or, you can just recvfrom into a sockaddr_storage
object, determine the family at that time, then memcpy the the
appropriate object type. Alternatively, for the common case where you
want a printable name for the address, you just pass it to getnameinfo
as-is and let getnameinfo deal with how to read it.

Rich
Eric Blake Feb. 6, 2023, 6:45 p.m. UTC | #12
On Sun, Feb 05, 2023 at 04:28:36PM +0100, Alejandro Colomar wrote:

Regardless of the merits of the patch, let's not introduce typos:

> +++ b/man3type/sockaddr.3type
> @@ -120,6 +120,26 @@ .SH NOTES
>  .I <netinet/in.h>
>  and
>  .IR <sys/un.h> .
> +.SH BUGS
> +.I sockaddr_storage
> +was designed back when strict aliasing wasn't a problem.
> +Back then,
> +one would define a variable of that type,
> +and then access it as any of the other
> +.IR sockaddr_ *
> +types,
> +depending on the value of the first member.
> +This is Undefined Behavior.
> +However, there is no way to use these APIs without invoking Unedfined Behavior,

Undefined

> +either in the user program or in libc,
> +so it is still recommended to use this method.
> +The only correct way to use different types in an API is through a union.
> +However,
> +that union must be implemented in the library,
> +since the type must be shared between the library and user code,
> +so libc should be fixed by implementing
> +.I sockaddr_storage
> +as a union.
>  .SH SEE ALSO
>  .BR accept (2),
>  .BR bind (2),

Also, while I could raise the issue with the Austin Group on your
behalf to get the POSIX wording improved, I think it would work better
if you initiate a bug report rather than having me do it:

https://www.austingroupbugs.net/main_page.php
Alejandro Colomar Feb. 7, 2023, 1:21 a.m. UTC | #13
Hello Eric,

On 2/6/23 19:45, Eric Blake wrote:
> On Sun, Feb 05, 2023 at 04:28:36PM +0100, Alejandro Colomar wrote:
> 
> Regardless of the merits of the patch, let's not introduce typos:
> 
>> +++ b/man3type/sockaddr.3type
>> @@ -120,6 +120,26 @@ .SH NOTES
>>   .I <netinet/in.h>
>>   and
>>   .IR <sys/un.h> .
>> +.SH BUGS
>> +.I sockaddr_storage
>> +was designed back when strict aliasing wasn't a problem.
>> +Back then,
>> +one would define a variable of that type,
>> +and then access it as any of the other
>> +.IR sockaddr_ *
>> +types,
>> +depending on the value of the first member.
>> +This is Undefined Behavior.
>> +However, there is no way to use these APIs without invoking Unedfined Behavior,
> 
> Undefined
> 
>> +either in the user program or in libc,
>> +so it is still recommended to use this method.
>> +The only correct way to use different types in an API is through a union.
>> +However,
>> +that union must be implemented in the library,
>> +since the type must be shared between the library and user code,
>> +so libc should be fixed by implementing
>> +.I sockaddr_storage
>> +as a union.
>>   .SH SEE ALSO
>>   .BR accept (2),
>>   .BR bind (2),
> 
> Also, while I could raise the issue with the Austin Group on your
> behalf to get the POSIX wording improved, I think it would work better
> if you initiate a bug report rather than having me do it:
> 
> https://www.austingroupbugs.net/main_page.php

I gave it a try after a year since my last attempt.  The open group website is 
one of the worst I've ever seen.  I'm sorry, I can't get it to work.  Could you 
please report a bug on my behalf?

I have an opengroup account, and supposedly am subscribed to the austin-group-l 
mailing list.  But I'm not able to log it.

Thanks,

Alex

>
roucaries bastien March 18, 2023, 7:54 a.m. UTC | #14
Hi,

I have opened a defect at austin group
https://www.austingroupbugs.net/view.php?id=1641

Bastien

Le lun. 6 févr. 2023 à 18:45, Eric Blake <eblake@redhat.com> a écrit :
>
> On Sun, Feb 05, 2023 at 04:28:36PM +0100, Alejandro Colomar wrote:
>
> Regardless of the merits of the patch, let's not introduce typos:
>
> > +++ b/man3type/sockaddr.3type
> > @@ -120,6 +120,26 @@ .SH NOTES
> >  .I <netinet/in.h>
> >  and
> >  .IR <sys/un.h> .
> > +.SH BUGS
> > +.I sockaddr_storage
> > +was designed back when strict aliasing wasn't a problem.
> > +Back then,
> > +one would define a variable of that type,
> > +and then access it as any of the other
> > +.IR sockaddr_ *
> > +types,
> > +depending on the value of the first member.
> > +This is Undefined Behavior.
> > +However, there is no way to use these APIs without invoking Unedfined Behavior,
>
> Undefined
>
> > +either in the user program or in libc,
> > +so it is still recommended to use this method.
> > +The only correct way to use different types in an API is through a union.
> > +However,
> > +that union must be implemented in the library,
> > +since the type must be shared between the library and user code,
> > +so libc should be fixed by implementing
> > +.I sockaddr_storage
> > +as a union.
> >  .SH SEE ALSO
> >  .BR accept (2),
> >  .BR bind (2),
>
> Also, while I could raise the issue with the Austin Group on your
> behalf to get the POSIX wording improved, I think it would work better
> if you initiate a bug report rather than having me do it:
>
> https://www.austingroupbugs.net/main_page.php
>
> --
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.           +1-919-301-3266
> Virtualization:  qemu.org | libvirt.org
>
Alejandro Colomar March 20, 2023, 10:49 a.m. UTC | #15
Hi Bastien and Eric!

On 3/18/23 08:54, roucaries bastien wrote:
> Hi,
> 
> I have opened a defect at austin group
> https://www.austingroupbugs.net/view.php?id=1641

Sorry for not having prepared yet my report as I promised to Eric.
I've been busy with other stuff, but still had in mind doing it :).
But yes, basically that report is what I would have written, so you
have my +1.  ;)

Cheers,

Alex

> 
> Bastien
> 
> Le lun. 6 févr. 2023 à 18:45, Eric Blake <eblake@redhat.com> a écrit :
>>
>> On Sun, Feb 05, 2023 at 04:28:36PM +0100, Alejandro Colomar wrote:
>>
>> Regardless of the merits of the patch, let's not introduce typos:
>>
>>> +++ b/man3type/sockaddr.3type
>>> @@ -120,6 +120,26 @@ .SH NOTES
>>>  .I <netinet/in.h>
>>>  and
>>>  .IR <sys/un.h> .
>>> +.SH BUGS
>>> +.I sockaddr_storage
>>> +was designed back when strict aliasing wasn't a problem.
>>> +Back then,
>>> +one would define a variable of that type,
>>> +and then access it as any of the other
>>> +.IR sockaddr_ *
>>> +types,
>>> +depending on the value of the first member.
>>> +This is Undefined Behavior.
>>> +However, there is no way to use these APIs without invoking Unedfined Behavior,
>>
>> Undefined
>>
>>> +either in the user program or in libc,
>>> +so it is still recommended to use this method.
>>> +The only correct way to use different types in an API is through a union.
>>> +However,
>>> +that union must be implemented in the library,
>>> +since the type must be shared between the library and user code,
>>> +so libc should be fixed by implementing
>>> +.I sockaddr_storage
>>> +as a union.
>>>  .SH SEE ALSO
>>>  .BR accept (2),
>>>  .BR bind (2),
>>
>> Also, while I could raise the issue with the Austin Group on your
>> behalf to get the POSIX wording improved, I think it would work better
>> if you initiate a bug report rather than having me do it:
>>
>> https://www.austingroupbugs.net/main_page.php
>>
>> --
>> Eric Blake, Principal Software Engineer
>> Red Hat, Inc.           +1-919-301-3266
>> Virtualization:  qemu.org | libvirt.org
>>
diff mbox series

Patch

diff --git a/man3type/sockaddr.3type b/man3type/sockaddr.3type
index 319a5e552..239e836fc 100644
--- a/man3type/sockaddr.3type
+++ b/man3type/sockaddr.3type
@@ -120,6 +120,26 @@  .SH NOTES
 .I <netinet/in.h>
 and
 .IR <sys/un.h> .
+.SH BUGS
+.I sockaddr_storage
+was designed back when strict aliasing wasn't a problem.
+Back then,
+one would define a variable of that type,
+and then access it as any of the other
+.IR sockaddr_ *
+types,
+depending on the value of the first member.
+This is Undefined Behavior.
+However, there is no way to use these APIs without invoking Unedfined Behavior,
+either in the user program or in libc,
+so it is still recommended to use this method.
+The only correct way to use different types in an API is through a union.
+However,
+that union must be implemented in the library,
+since the type must be shared between the library and user code,
+so libc should be fixed by implementing
+.I sockaddr_storage
+as a union.
 .SH SEE ALSO
 .BR accept (2),
 .BR bind (2),