Message ID | 20190110142529.14315-1-dh.herrmann@gmail.com |
---|---|
State | Changes Requested |
Delegated to: | David Miller |
Series | [1/3] net: introduce SO_BINDTOIF sockopt |
On Thu, Jan 10, 2019 at 3:25 PM David Herrmann <dh.herrmann@gmail.com> wrote:
>
> This introduces a new generic SOL_SOCKET-level socket option called
> SO_BINDTOIF. It behaves similar to SO_BINDTODEVICE, but takes a network
> interface index as argument, rather than the network interface name.
>
> [...]
>
> Signed-off-by: David Herrmann <dh.herrmann@gmail.com>

Reviewed-by: Tom Gundersen <teg@jklm.no>
On 1/10/19 7:25 AM, David Herrmann wrote:
> This introduces a new generic SOL_SOCKET-level socket option called
> SO_BINDTOIF. It behaves similar to SO_BINDTODEVICE, but takes a network
> interface index as argument, rather than the network interface name.

SO_BINDTOIF is not very descriptive related to SO_BINDTODEVICE.

SO_BINDTOINDEX, SO_BINDTODEVINDEX or SO_BINDTODEVIDX would be clearer
about this option versus SO_BINDTODEVICE.
On Thu, Jan 10, 2019 at 8:38 AM David Ahern <dsahern@gmail.com> wrote:
>
> On 1/10/19 7:25 AM, David Herrmann wrote:
> > This introduces a new generic SOL_SOCKET-level socket option called
> > SO_BINDTOIF. It behaves similar to SO_BINDTODEVICE, but takes a network
> > interface index as argument, rather than the network interface name.
>
> SO_BINDTOIF is not very descriptive related to SO_BINDTODEVICE.
>
> SO_BINDTOINDEX, SO_BINDTODEVINDEX or SO_BINDTODEVIDX would be clearer
> about this option versus SO_BINDTODEVICE.
>

+1, also SO_BINDTOIFINDEX
Hi

On Thu, Jan 10, 2019 at 5:38 PM David Ahern <dsahern@gmail.com> wrote:
>
> On 1/10/19 7:25 AM, David Herrmann wrote:
> > This introduces a new generic SOL_SOCKET-level socket option called
> > SO_BINDTOIF. It behaves similar to SO_BINDTODEVICE, but takes a network
> > interface index as argument, rather than the network interface name.
>
> SO_BINDTOIF is not very descriptive related to SO_BINDTODEVICE.
>
> SO_BINDTOINDEX, SO_BINDTODEVINDEX or SO_BINDTODEVIDX would be clearer
> about this option versus SO_BINDTODEVICE.

I am open for these suggestions. I don't particularly have any
preference on the names, but I agree that BINDTOIF is not very easy to
read. For v2 I will pick SO_BINDTOIFINDEX (as suggested by Roopa),
unless there are any objections.

Thanks!
David
Hi David,

I love your patch! Yet something to improve:

[auto build test ERROR on net/master]
[also build test ERROR on v5.0-rc1 next-20190111]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/David-Herrmann/net-introduce-SO_BINDTOIF-sockopt/20190111-124603
config: sparc-sparc64_defconfig (attached as .config)
compiler: sparc64-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        GCC_VERSION=7.2.0 make.cross ARCH=sparc

All errors (new ones prefixed by >>):

   net/core/sock.c: In function 'sock_getsockopt':
>> net/core/sock.c:1424:2: error: duplicate case value
     case SO_BINDTOIF:
     ^~~~
   net/core/sock.c:1258:2: note: previously used here
     case SO_PEERCRED:
     ^~~~

vim +1424 net/core/sock.c

  1403	
  1404		case SO_COOKIE:
  1405			lv = sizeof(u64);
  1406			if (len < lv)
  1407				return -EINVAL;
  1408			v.val64 = sock_gen_cookie(sk);
  1409			break;
  1410	
  1411		case SO_ZEROCOPY:
  1412			v.val = sock_flag(sk, SOCK_ZEROCOPY);
  1413			break;
  1414	
  1415		case SO_TXTIME:
  1416			lv = sizeof(v.txtime);
  1417			v.txtime.clockid = sk->sk_clockid;
  1418			v.txtime.flags |= sk->sk_txtime_deadline_mode ?
  1419					  SOF_TXTIME_DEADLINE_MODE : 0;
  1420			v.txtime.flags |= sk->sk_txtime_report_errors ?
  1421					  SOF_TXTIME_REPORT_ERRORS : 0;
  1422			break;
  1423	
> 1424		case SO_BINDTOIF:
  1425			v.val = sk->sk_bound_dev_if;
  1426			break;
  1427	
  1428		default:
  1429			/* We implement the SO_SNDLOWAT etc to not be settable
  1430			 * (1003.1g 7).
  1431			 */
  1432			return -ENOPROTOOPT;
  1433		}
  1434	
  1435		if (len > lv)
  1436			len = lv;
  1437		if (copy_to_user(optval, &v, len))
  1438			return -EFAULT;
  1439	lenout:
  1440		if (put_user(len, optlen))
  1441			return -EFAULT;
  1442		return 0;
  1443	}
  1444	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
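The robot's error is a numbering collision rather than a logic bug: on sparc the patch assigns SO_BINDTOIF the value 0x0040, and the compiler's "previously used here" note shows that SO_PEERCRED already expands to that same value, so the two `case` labels in sock_getsockopt() become identical. A minimal user-space sketch of the same check, using only values visible in this thread; the SPARC_* macro names and the find_duplicate() helper are hypothetical and chosen so they do not clash with the host's real <sys/socket.h>.

```c
#include <stddef.h>

/* sparc values from this thread: SO_TXTIME and SO_BINDTOIF from the patch,
 * SO_PEERCRED inferred from the "previously used here" compiler note. */
#define SPARC_SO_TXTIME   0x003f
#define SPARC_SO_PEERCRED 0x0040
#define SPARC_SO_BINDTOIF 0x0040

struct sockopt_num {
	const char *name;
	int value;
};

static const struct sockopt_num sparc_opts[] = {
	{ "SO_TXTIME",   SPARC_SO_TXTIME },
	{ "SO_PEERCRED", SPARC_SO_PEERCRED },
	{ "SO_BINDTOIF", SPARC_SO_BINDTOIF },
};

/*
 * Find the first entry whose value was already used by an earlier entry,
 * which is what makes a switch over these names fail with
 * "duplicate case value". Returns the index of the later entry and stores
 * the earlier one in *first, or returns -1 if all values are distinct.
 */
static int find_duplicate(const struct sockopt_num *opts, size_t n,
			  size_t *first)
{
	for (size_t i = 1; i < n; i++) {
		for (size_t j = 0; j < i; j++) {
			if (opts[i].value == opts[j].value) {
				*first = j;
				return (int)i;
			}
		}
	}
	return -1;
}
```

Because sparc does not allocate these numbers sequentially from the last entry, a replacement value has to be checked against every existing SO_* definition in the sparc header, not only against SO_TXTIME.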
diff --git a/arch/alpha/include/uapi/asm/socket.h b/arch/alpha/include/uapi/asm/socket.h
index 065fb372e355..6e346e51eec7 100644
--- a/arch/alpha/include/uapi/asm/socket.h
+++ b/arch/alpha/include/uapi/asm/socket.h
@@ -115,4 +115,6 @@
 #define SO_TXTIME		61
 #define SCM_TXTIME		SO_TXTIME
 
+#define SO_BINDTOIF		62
+
 #endif /* _UAPI_ASM_SOCKET_H */
diff --git a/arch/ia64/include/uapi/asm/socket.h b/arch/ia64/include/uapi/asm/socket.h
index c872c4e6bafb..ece83ba17b9d 100644
--- a/arch/ia64/include/uapi/asm/socket.h
+++ b/arch/ia64/include/uapi/asm/socket.h
@@ -117,4 +117,6 @@
 #define SO_TXTIME		61
 #define SCM_TXTIME		SO_TXTIME
 
+#define SO_BINDTOIF		62
+
 #endif /* _ASM_IA64_SOCKET_H */
diff --git a/arch/mips/include/uapi/asm/socket.h b/arch/mips/include/uapi/asm/socket.h
index 71370fb3ceef..27f7f761ace5 100644
--- a/arch/mips/include/uapi/asm/socket.h
+++ b/arch/mips/include/uapi/asm/socket.h
@@ -126,4 +126,6 @@
 #define SO_TXTIME		61
 #define SCM_TXTIME		SO_TXTIME
 
+#define SO_BINDTOIF		62
+
 #endif /* _UAPI_ASM_SOCKET_H */
diff --git a/arch/parisc/include/uapi/asm/socket.h b/arch/parisc/include/uapi/asm/socket.h
index 061b9cf2a779..efd3917f23e1 100644
--- a/arch/parisc/include/uapi/asm/socket.h
+++ b/arch/parisc/include/uapi/asm/socket.h
@@ -107,4 +107,6 @@
 #define SO_TXTIME		0x4036
 #define SCM_TXTIME		SO_TXTIME
 
+#define SO_BINDTOIF		0x4037
+
 #endif /* _UAPI_ASM_SOCKET_H */
diff --git a/arch/s390/include/uapi/asm/socket.h b/arch/s390/include/uapi/asm/socket.h
index 39d901476ee5..c8ba542e69e6 100644
--- a/arch/s390/include/uapi/asm/socket.h
+++ b/arch/s390/include/uapi/asm/socket.h
@@ -114,4 +114,6 @@
 #define SO_TXTIME		61
 #define SCM_TXTIME		SO_TXTIME
 
+#define SO_BINDTOIF		62
+
 #endif /* _ASM_SOCKET_H */
diff --git a/arch/sparc/include/uapi/asm/socket.h b/arch/sparc/include/uapi/asm/socket.h
index 7ea35e5601b6..50006bde7dc0 100644
--- a/arch/sparc/include/uapi/asm/socket.h
+++ b/arch/sparc/include/uapi/asm/socket.h
@@ -104,6 +104,8 @@
 #define SO_TXTIME		0x003f
 #define SCM_TXTIME		SO_TXTIME
 
+#define SO_BINDTOIF		0x0040
+
 /* Security levels - as per NRL IPv6 - don't actually do anything */
 #define SO_SECURITY_AUTHENTICATION		0x5001
 #define SO_SECURITY_ENCRYPTION_TRANSPORT	0x5002
diff --git a/arch/xtensa/include/uapi/asm/socket.h b/arch/xtensa/include/uapi/asm/socket.h
index 1de07a7f7680..a36241ffbd86 100644
--- a/arch/xtensa/include/uapi/asm/socket.h
+++ b/arch/xtensa/include/uapi/asm/socket.h
@@ -119,4 +119,6 @@
 #define SO_TXTIME		61
 #define SCM_TXTIME		SO_TXTIME
 
+#define SO_BINDTOIF		62
+
 #endif	/* _XTENSA_SOCKET_H */
diff --git a/include/uapi/asm-generic/socket.h b/include/uapi/asm-generic/socket.h
index a12692e5f7a8..31fb8414ea4c 100644
--- a/include/uapi/asm-generic/socket.h
+++ b/include/uapi/asm-generic/socket.h
@@ -110,4 +110,6 @@
 #define SO_TXTIME		61
 #define SCM_TXTIME		SO_TXTIME
 
+#define SO_BINDTOIF		62
+
 #endif /* __ASM_GENERIC_SOCKET_H */
diff --git a/net/core/sock.c b/net/core/sock.c
index 6aa2e7e0b4fb..df8f83bc22b3 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -520,20 +520,43 @@ struct dst_entry *sk_dst_check(struct sock *sk, u32 cookie)
 }
 EXPORT_SYMBOL(sk_dst_check);
 
-static int sock_setbindtodevice(struct sock *sk, char __user *optval,
-				int optlen)
+static int sock_setbindtodevice_locked(struct sock *sk, int ifindex)
 {
 	int ret = -ENOPROTOOPT;
 #ifdef CONFIG_NETDEVICES
 	struct net *net = sock_net(sk);
-	char devname[IFNAMSIZ];
-	int index;
 
 	/* Sorry... */
 	ret = -EPERM;
 	if (!ns_capable(net->user_ns, CAP_NET_RAW))
 		goto out;
 
+	ret = -EINVAL;
+	if (ifindex < 0)
+		goto out;
+
+	sk->sk_bound_dev_if = ifindex;
+	if (sk->sk_prot->rehash)
+		sk->sk_prot->rehash(sk);
+	sk_dst_reset(sk);
+
+	ret = 0;
+
+out:
+#endif
+
+	return ret;
+}
+
+static int sock_setbindtodevice(struct sock *sk, char __user *optval,
+				int optlen)
+{
+	int ret = -ENOPROTOOPT;
+#ifdef CONFIG_NETDEVICES
+	struct net *net = sock_net(sk);
+	char devname[IFNAMSIZ];
+	int index;
+
 	ret = -EINVAL;
 	if (optlen < 0)
 		goto out;
@@ -566,14 +589,9 @@ static int sock_setbindtodevice(struct sock *sk, char __user *optval,
 	}
 
 	lock_sock(sk);
-	sk->sk_bound_dev_if = index;
-	if (sk->sk_prot->rehash)
-		sk->sk_prot->rehash(sk);
-	sk_dst_reset(sk);
+	ret = sock_setbindtodevice_locked(sk, index);
 	release_sock(sk);
 
-	ret = 0;
-
 out:
 #endif
 
@@ -1055,6 +1073,10 @@ int sock_setsockopt(struct socket *sock, int level, int optname,
 		}
 		break;
 
+	case SO_BINDTOIF:
+		ret = sock_setbindtodevice_locked(sk, val);
+		break;
+
 	default:
 		ret = -ENOPROTOOPT;
 		break;
@@ -1399,6 +1421,10 @@ int sock_getsockopt(struct socket *sock, int level, int optname,
 				  SOF_TXTIME_REPORT_ERRORS : 0;
 		break;
 
+	case SO_BINDTOIF:
+		v.val = sk->sk_bound_dev_if;
+		break;
+
 	default:
 		/* We implement the SO_SNDLOWAT etc to not be settable
 		 * (1003.1g 7).
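The getsockopt() half of the diff simply reports sk->sk_bound_dev_if, so user space can read back which interface index a socket is bound to, with 0 meaning "not bound". A minimal sketch, assuming the option reaches user space as SO_BINDTOIFINDEX (the name chosen for v2 in the thread above); get_bound_ifindex() is a hypothetical helper. When the headers lack the macro it reports -ENOPROTOOPT instead of hard-coding a number, because the value differs per architecture (62 in asm-generic, 0x4037 on parisc, 0x0040 on sparc in this version of the patch).

```c
#include <errno.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: read back the interface index a socket is bound to.
 * Returns the index (0 means "not bound"), or a negative errno on failure.
 */
static int get_bound_ifindex(int fd)
{
#ifdef SO_BINDTOIFINDEX
	int ifindex = 0;
	socklen_t len = sizeof(ifindex);

	/* Mirrors the new sock_getsockopt() case: v.val = sk->sk_bound_dev_if */
	if (getsockopt(fd, SOL_SOCKET, SO_BINDTOIFINDEX, &ifindex, &len) < 0)
		return -errno;
	return ifindex;
#else
	/* Headers predate the option; avoid guessing a per-arch number. */
	(void)fd;
	return -ENOPROTOOPT;
#endif
}
```

On a kernel that predates the option, getsockopt() itself fails with ENOPROTOOPT, so callers see the same error whether the headers or the kernel are too old.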
This introduces a new generic SOL_SOCKET-level socket option called SO_BINDTOIF. It behaves similarly to SO_BINDTODEVICE, but takes a network interface index as its argument rather than the network interface name.

User-space often refers to network interfaces by their index, but has to temporarily resolve the index to a name in order to call SO_BINDTODEVICE. This can cause problems when the network device is renamed asynchronously by other parts of the system. When that happens, SO_BINDTODEVICE might either fail or, worse, bind to the wrong device.

In most cases, user-space programs only ever operate on devices that they either manage themselves or for which they otherwise have a guarantee that the device name will not change (e.g., devices that are UP cannot be renamed). However, particularly in libraries, this guarantee is non-obvious, and it would be nice if the race condition simply did not exist. That would make it easier for those libraries to operate even in situations where the device name might change under the hood.

A real use case that we recently hit is trying to start the network stack early in the initrd but make it survive into the real system. Existing distributions rename network interfaces during the transition from the initrd into the real system. This, obviously, cannot affect devices that are up and running (unless you also consider moving them between network namespaces). However, the network manager now has to make sure its management engine for dormant devices does not run in parallel to these renames. In particular, when you offload operations like DHCP into separate processes, these might set up their sockets early and thus have to resolve the device name, possibly running into this race condition.

By avoiding the call to resolve the device name, we no longer depend on the name and can run network setup of dormant devices in parallel to the transition out of the initrd. The SO_BINDTOIF socket option plugs this race.
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
---
 arch/alpha/include/uapi/asm/socket.h  |  2 ++
 arch/ia64/include/uapi/asm/socket.h   |  2 ++
 arch/mips/include/uapi/asm/socket.h   |  2 ++
 arch/parisc/include/uapi/asm/socket.h |  2 ++
 arch/s390/include/uapi/asm/socket.h   |  2 ++
 arch/sparc/include/uapi/asm/socket.h  |  2 ++
 arch/xtensa/include/uapi/asm/socket.h |  2 ++
 include/uapi/asm-generic/socket.h     |  2 ++
 net/core/sock.c                       | 46 +++++++++++++++++++++------
 9 files changed, 52 insertions(+), 10 deletions(-)