diff mbox

[net-next] sockopt: Change getsockopt() of SO_BINDTODEVICE to return an interface name

Message ID 509184D9.8030103@hp.com
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

Brian Haley Oct. 31, 2012, 8:06 p.m. UTC
Instead of having the getsockopt() of SO_BINDTODEVICE return an index, which
will then require another call like if_indextoname() to get the actual interface
name, have it return the name directly.

This also matches the existing man page description on socket(7) which mentions
the argument being an interface name.

If the value has not been set, zero is returned and optlen will be set to zero
to indicate there is no interface name present.

Signed-off-by: Brian Haley <brian.haley@hp.com>

--

 	}
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Andi Kleen Oct. 31, 2012, 8:47 p.m. UTC | #1
Brian Haley <brian.haley@hp.com> writes:

> Instead of having the getsockopt() of SO_BINDTODEVICE return an index, which
> will then require another call like if_indextoname() to get the actual interface
> name, have it return the name directly.
>
> This also matches the existing man page description on socket(7) which mentions
> the argument being an interface name.
>
> If the value has not been set, zero is returned and optlen will be set to zero
> to indicate there is no interface name present.

That will break all existing programs using the return value, right?
Better to fix the manpage

-Andi
Brian Haley Nov. 1, 2012, 2:02 p.m. UTC | #2
On 10/31/2012 04:47 PM, Andi Kleen wrote:
> Brian Haley <brian.haley@hp.com> writes:
> 
>> Instead of having the getsockopt() of SO_BINDTODEVICE return an index, which
>> will then require another call like if_indextoname() to get the actual interface
>> name, have it return the name directly.
>>
>> This also matches the existing man page description on socket(7) which mentions
>> the argument being an interface name.
>>
>> If the value has not been set, zero is returned and optlen will be set to zero
>> to indicate there is no interface name present.
> 
> That will break all existing programs using the return value, right?
> Better to fix the manpage

Dave just accepted the original code for this into net-next last week, so I
don't think it's too late to change it to be correct.

-Brian
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Nov. 1, 2012, 2:52 p.m. UTC | #3
From: Andi Kleen <andi@firstfloor.org>
Date: Wed, 31 Oct 2012 13:47:01 -0700

> Brian Haley <brian.haley@hp.com> writes:
> 
>> Instead of having the getsockopt() of SO_BINDTODEVICE return an index, which
>> will then require another call like if_indextoname() to get the actual interface
>> name, have it return the name directly.
>>
>> This also matches the existing man page description on socket(7) which mentions
>> the argument being an interface name.
>>
>> If the value has not been set, zero is returned and optlen will be set to zero
>> to indicate there is no interface name present.
> 
> That will break all existing programs using the return value, right?
> Better to fix the manpage

It never returned a value before, you could not getsockopt() on
SO_BINDTODEVICE previously.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Pavel Emelyanov Nov. 2, 2012, 9:36 a.m. UTC | #4
> +static int sock_getbindtodevice(struct sock *sk, char __user *optval,
> +				int __user *optlen, int len)
> +{
> +	int ret = -ENOPROTOOPT;
> +#ifdef CONFIG_NETDEVICES
> +	struct net *net = sock_net(sk);
> +	struct net_device *dev;
> +	char devname[IFNAMSIZ];
> +
> +	if (sk->sk_bound_dev_if == 0) {
> +		len = 0;
> +		goto zero;
> +	}
> +
> +	ret = -EINVAL;
> +	if (len < IFNAMSIZ)
> +		goto out;
> +
> +	rcu_read_lock();
> +	dev = dev_get_by_index_rcu(net, sk->sk_bound_dev_if);
> +	if (dev)
> +		strcpy(devname, dev->name);

This still races with the device name change, potentially providing
a name which never existed in the system, doesn't it?

> +	rcu_read_unlock();
> +	ret = -ENODEV;
> +	if (!dev)
> +		goto out;
> +
> +	len = strlen(devname) + 1;
> +
> +	ret = -EFAULT;
> +	if (copy_to_user(optval, devname, len))
> +		goto out;
> +
> +zero:
> +	ret = -EFAULT;
> +	if (put_user(len, optlen))
> +		goto out;
> +
> +	ret = 0;
> +
> +out:
> +#endif
> +
> +	return ret;
> +}

Thanks,
Pavel
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Brian Haley Nov. 2, 2012, 3:02 p.m. UTC | #5
On 11/02/2012 05:36 AM, Pavel Emelyanov wrote:
>> +static int sock_getbindtodevice(struct sock *sk, char __user *optval,
>> +				int __user *optlen, int len)
>> +{
>> +	int ret = -ENOPROTOOPT;
>> +#ifdef CONFIG_NETDEVICES
>> +	struct net *net = sock_net(sk);
>> +	struct net_device *dev;
>> +	char devname[IFNAMSIZ];
>> +
>> +	if (sk->sk_bound_dev_if == 0) {
>> +		len = 0;
>> +		goto zero;
>> +	}
>> +
>> +	ret = -EINVAL;
>> +	if (len < IFNAMSIZ)
>> +		goto out;
>> +
>> +	rcu_read_lock();
>> +	dev = dev_get_by_index_rcu(net, sk->sk_bound_dev_if);
>> +	if (dev)
>> +		strcpy(devname, dev->name);
> 
> This still races with the device name change, potentially providing
> a name which never existed in the system, doesn't it?

My only argument here is that SIOCGIFNAME has had this same code forever, and
noone has ever complained about that returning a garbled name.  Even
dev_get_by_name() only holds an rcu lock when doing a strncmp().

We'd need to audit the whole kernel to catch all the places where we potentially
look at dev->name while it could change.  Is it really worth it?

-Brian
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ben Hutchings Nov. 2, 2012, 11:34 p.m. UTC | #6
On Fri, 2012-11-02 at 11:02 -0400, Brian Haley wrote:
> On 11/02/2012 05:36 AM, Pavel Emelyanov wrote:
> >> +static int sock_getbindtodevice(struct sock *sk, char __user *optval,
> >> +				int __user *optlen, int len)
> >> +{
> >> +	int ret = -ENOPROTOOPT;
> >> +#ifdef CONFIG_NETDEVICES
> >> +	struct net *net = sock_net(sk);
> >> +	struct net_device *dev;
> >> +	char devname[IFNAMSIZ];
> >> +
> >> +	if (sk->sk_bound_dev_if == 0) {
> >> +		len = 0;
> >> +		goto zero;
> >> +	}
> >> +
> >> +	ret = -EINVAL;
> >> +	if (len < IFNAMSIZ)
> >> +		goto out;
> >> +
> >> +	rcu_read_lock();
> >> +	dev = dev_get_by_index_rcu(net, sk->sk_bound_dev_if);
> >> +	if (dev)
> >> +		strcpy(devname, dev->name);
> > 
> > This still races with the device name change, potentially providing
> > a name which never existed in the system, doesn't it?
> 
> My only argument here is that SIOCGIFNAME has had this same code forever, and
> noone has ever complained about that returning a garbled name.  Even
> dev_get_by_name() only holds an rcu lock when doing a strncmp().
> 
> We'd need to audit the whole kernel to catch all the places where we potentially
> look at dev->name while it could change.  Is it really worth it?

A net device name can't be changed while the device is up, or while
another task holds the RTNL lock.  I think that covers almost all uses.
I don't know whether it's worth going out to look for exceptions, but we
might as well fix the cases we know about.

Ben.
Brian Haley Nov. 6, 2012, 4:01 a.m. UTC | #7
On 11/02/2012 05:34 PM, Ben Hutchings wrote:
> On Fri, 2012-11-02 at 11:02 -0400, Brian Haley wrote:
>> On 11/02/2012 05:36 AM, Pavel Emelyanov wrote:
>>>> +static int sock_getbindtodevice(struct sock *sk, char __user *optval,
>>>> +				int __user *optlen, int len)
>>>> +{
>>>> +	int ret = -ENOPROTOOPT;
>>>> +#ifdef CONFIG_NETDEVICES
>>>> +	struct net *net = sock_net(sk);
>>>> +	struct net_device *dev;
>>>> +	char devname[IFNAMSIZ];
>>>> +
>>>> +	if (sk->sk_bound_dev_if == 0) {
>>>> +		len = 0;
>>>> +		goto zero;
>>>> +	}
>>>> +
>>>> +	ret = -EINVAL;
>>>> +	if (len < IFNAMSIZ)
>>>> +		goto out;
>>>> +
>>>> +	rcu_read_lock();
>>>> +	dev = dev_get_by_index_rcu(net, sk->sk_bound_dev_if);
>>>> +	if (dev)
>>>> +		strcpy(devname, dev->name);
>>>
>>> This still races with the device name change, potentially providing
>>> a name which never existed in the system, doesn't it?
>>
>> My only argument here is that SIOCGIFNAME has had this same code forever, and
>> noone has ever complained about that returning a garbled name.  Even
>> dev_get_by_name() only holds an rcu lock when doing a strncmp().
>>
>> We'd need to audit the whole kernel to catch all the places where we potentially
>> look at dev->name while it could change.  Is it really worth it?
>
> A net device name can't be changed while the device is up, or while
> another task holds the RTNL lock.  I think that covers almost all uses.
> I don't know whether it's worth going out to look for exceptions, but we
> might as well fix the cases we know about.

So do you think we can fix these corner cases later and get the API 
right first?

-Brian
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ben Hutchings Nov. 8, 2012, 12:23 a.m. UTC | #8
On Mon, 2012-11-05 at 21:01 -0700, Brian Haley wrote:
> On 11/02/2012 05:34 PM, Ben Hutchings wrote:
> > On Fri, 2012-11-02 at 11:02 -0400, Brian Haley wrote:
> >> On 11/02/2012 05:36 AM, Pavel Emelyanov wrote:
> >>>> +static int sock_getbindtodevice(struct sock *sk, char __user *optval,
> >>>> +				int __user *optlen, int len)
> >>>> +{
> >>>> +	int ret = -ENOPROTOOPT;
> >>>> +#ifdef CONFIG_NETDEVICES
> >>>> +	struct net *net = sock_net(sk);
> >>>> +	struct net_device *dev;
> >>>> +	char devname[IFNAMSIZ];
> >>>> +
> >>>> +	if (sk->sk_bound_dev_if == 0) {
> >>>> +		len = 0;
> >>>> +		goto zero;
> >>>> +	}
> >>>> +
> >>>> +	ret = -EINVAL;
> >>>> +	if (len < IFNAMSIZ)
> >>>> +		goto out;
> >>>> +
> >>>> +	rcu_read_lock();
> >>>> +	dev = dev_get_by_index_rcu(net, sk->sk_bound_dev_if);
> >>>> +	if (dev)
> >>>> +		strcpy(devname, dev->name);
> >>>
> >>> This still races with the device name change, potentially providing
> >>> a name which never existed in the system, doesn't it?
> >>
> >> My only argument here is that SIOCGIFNAME has had this same code forever, and
> >> noone has ever complained about that returning a garbled name.  Even
> >> dev_get_by_name() only holds an rcu lock when doing a strncmp().
> >>
> >> We'd need to audit the whole kernel to catch all the places where we potentially
> >> look at dev->name while it could change.  Is it really worth it?
> >
> > A net device name can't be changed while the device is up, or while
> > another task holds the RTNL lock.  I think that covers almost all uses.
> > I don't know whether it's worth going out to look for exceptions, but we
> > might as well fix the cases we know about.
> 
> So do you think we can fix these corner cases later and get the API 
> right first?

Well you can avoid the problem here by using rtnl_{,un}lock() and
__dev_get_by_index().  But if that's too heavyweight (it depends on how
often you think this might need to be called) then I suppose it can be
left until we have a general solution for races with renaming.

Ben.
Brian Haley Nov. 8, 2012, 3:36 p.m. UTC | #9
On 11/05/2012 09:01 PM, Brian Haley wrote:
>>>>> +    rcu_read_lock();
>>>>> +    dev = dev_get_by_index_rcu(net, sk->sk_bound_dev_if);
>>>>> +    if (dev)
>>>>> +        strcpy(devname, dev->name);
>>>>
>>>> This still races with the device name change, potentially providing
>>>> a name which never existed in the system, doesn't it?
>>>
>>> My only argument here is that SIOCGIFNAME has had this same code
>>> forever, and
>>> noone has ever complained about that returning a garbled name.  Even
>>> dev_get_by_name() only holds an rcu lock when doing a strncmp().
>>>
>>> We'd need to audit the whole kernel to catch all the places where we
>>> potentially
>>> look at dev->name while it could change.  Is it really worth it?
>>
>> A net device name can't be changed while the device is up, or while
>> another task holds the RTNL lock.  I think that covers almost all uses.
>> I don't know whether it's worth going out to look for exceptions, but we
>> might as well fix the cases we know about.
>
> So do you think we can fix these corner cases later and get the API
> right first?

Hi Dave,

I noticed this isn't in patchwork anymore.  Is it possible to get this 
in and then work on the other issue in a separate patch since it's not 
just this code that has this problem?

Of course I'm still not convinced that extra check is necessary (but 
I'll do it to make others happy) because if you do a setsockopt("eth0"), 
all you care is that a getsockopt() returns "eth0" - if it returns 
anything else, eth0_renamed, eth0_foo, etc you'll notice it's not the 
same and take action.  And the fact that the name can't change when UP 
means the odds are small it can happen anyways.

Thanks,

-Brian

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Nov. 8, 2012, 8:02 p.m. UTC | #10
From: Brian Haley <brian.haley@hp.com>
Date: Thu, 08 Nov 2012 08:36:02 -0700

> I noticed this isn't in patchwork anymore.

I want all the races resolved, and in any event the thing to do
when something isn't in patchwork IS TO RESUBMIT IT!

Every time you ask me stuff like this you take up more of my already
low amount of time.

Today I'm again without internet access other than my cell phone, so
everything you can do TO REDUCE my overhead would be greatly
appreciated.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/core/sock.c b/net/core/sock.c
index 0a023b8..9172ff4 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -505,7 +505,8 @@  struct dst_entry *sk_dst_check(struct sock *sk, u32 cookie)
 }
 EXPORT_SYMBOL(sk_dst_check);

-static int sock_bindtodevice(struct sock *sk, char __user *optval, int optlen)
+static int sock_setbindtodevice(struct sock *sk, char __user *optval,
+				int optlen)
 {
 	int ret = -ENOPROTOOPT;
 #ifdef CONFIG_NETDEVICES
@@ -562,6 +563,52 @@  out:
 	return ret;
 }

+static int sock_getbindtodevice(struct sock *sk, char __user *optval,
+				int __user *optlen, int len)
+{
+	int ret = -ENOPROTOOPT;
+#ifdef CONFIG_NETDEVICES
+	struct net *net = sock_net(sk);
+	struct net_device *dev;
+	char devname[IFNAMSIZ];
+
+	if (sk->sk_bound_dev_if == 0) {
+		len = 0;
+		goto zero;
+	}
+
+	ret = -EINVAL;
+	if (len < IFNAMSIZ)
+		goto out;
+
+	rcu_read_lock();
+	dev = dev_get_by_index_rcu(net, sk->sk_bound_dev_if);
+	if (dev)
+		strcpy(devname, dev->name);
+	rcu_read_unlock();
+	ret = -ENODEV;
+	if (!dev)
+		goto out;
+
+	len = strlen(devname) + 1;
+
+	ret = -EFAULT;
+	if (copy_to_user(optval, devname, len))
+		goto out;
+
+zero:
+	ret = -EFAULT;
+	if (put_user(len, optlen))
+		goto out;
+
+	ret = 0;
+
+out:
+#endif
+
+	return ret;
+}
+
 static inline void sock_valbool_flag(struct sock *sk, int bit, int valbool)
 {
 	if (valbool)
@@ -589,7 +636,7 @@  int sock_setsockopt(struct socket *sock, int level, int optname,
 	 */

 	if (optname == SO_BINDTODEVICE)
-		return sock_bindtodevice(sk, optval, optlen);
+		return sock_setbindtodevice(sk, optval, optlen);

 	if (optlen < sizeof(int))
 		return -EINVAL;
@@ -1074,9 +1121,10 @@  int sock_getsockopt(struct socket *sock, int level, int
optname,
 	case SO_NOFCS:
 		v.val = sock_flag(sk, SOCK_NOFCS);
 		break;
+
 	case SO_BINDTODEVICE:
-		v.val = sk->sk_bound_dev_if;
-		break;
+		return sock_getbindtodevice(sk, optval, optlen, len);
+
 	default:
 		return -ENOPROTOOPT;