diff mbox

linux kernel's IPV6_MULTICAST_HOPS default is 64; should be 1?

Message ID 4BE031FA.6040006@hp.com
State Not Applicable, archived
Delegated to: David Miller
Headers show

Commit Message

Brian Haley May 4, 2010, 2:40 p.m. UTC
David Miller wrote:
> From: David Stevens <dlstevens@us.ibm.com>
> Date: Tue, 4 May 2010 00:48:46 -0700
> 
>> It's set to -1 by default, but the common code for unicast and
>> multicast in getsockopt is falling through to use the dst_entry.
>>
>> I believe (though I haven't actually tried it recently) it actually
>> uses "1" for the default value for multicast;

No, on-the-wire it's actually 64.

> It doesn't, all of the uses in the ipv6 stack say something like:
> 
> 	if (multicast)
> 		hlimit = np->mcast_hops;
> 	else
> 		hlimit = np->hop_limit;
> 	if (hlimit < 0)
> 		hlimit = ip6_dst_hoplimit(dst);
> 
> Therefore, the change suggested by Elliot and which I committed is the
> way to get the correct behavior and fix this.

Not exactly.  It fixes the case where it's wrong by default, but
the corner case of setting it to -1 via setsockopt() says:

    x == -1:       use kernel default

But that will revert back to the kernel using 64 on the next transmit.
I can work on an update to this that makes a new mcast_hops per-interface
setting and makes ip6_dst_hoplimit() aware of it.  Or even easier, just
have setsockopt() trap the -1 and set np->mcast_hops to 1.  Built but
untested patch below.

-Brian

--


Specifying -1 for setsockopt(IPV6_MULTICAST_HOPS) should set the socket
value back to the system default value of IPV6_DEFAULT_MCASTHOPS (1).

Signed-off-by: Brian Haley <brian.haley@hp.com>

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

David Stevens May 4, 2010, 4:12 p.m. UTC | #1
I think the original code was intending to do late binding -- carry "-1" 
as
meaning "not set by user" and use the default value _at_the_time_of_
_the_send_, and in its context. For that to have worked, the checks for
"<0" in the send paths should've checked for multicast and used the
multicast default as you're saying, Brian. And doing that not on the
set, but when generating packets, is what I would've expected.

I don't see anything that's broken by changing it to use the default at
the time of the set since for mcast the default is really a constant,
and in fact, it looks like in addition to not actually using the default 
of 1,
it was returning "-1" in the cmsg when not set by the user (and it, too,
should've been "1", which it would return now).

But if the default is different for each destination or interface in
the multicast case (ie, by adding conf settings for mcast), then
it really should do late binding and leave it as "-1" in the set, right?
That's what I thought it was already doing, but apparently not;
I think it used to, but maybe I just didn't notice.

                                        +-DLS

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Brian Haley May 4, 2010, 4:43 p.m. UTC | #2
David Stevens wrote:
> I think the original code was intending to do late binding -- carry "-1" 
> as
> meaning "not set by user" and use the default value _at_the_time_of_
> _the_send_, and in its context. For that to have worked, the checks for
> "<0" in the send paths should've checked for multicast and used the
> multicast default as you're saying, Brian. And doing that not on the
> set, but when generating packets, is what I would've expected.

Right, we could do it that way, but then how far do we unravel the thread?
Unicast hoplimit is settable in the route, do we add a mcast_hops there
too, in addition to the per-interface tunable?  I think just having it
the recommended default is good enough here, until someone shows they
have the need to do more.

> I don't see anything that's broken by changing it to use the default at
> the time of the set since for mcast the default is really a constant,
> and in fact, it looks like in addition to not actually using the default 
> of 1,
> it was returning "-1" in the cmsg when not set by the user (and it, too,
> should've been "1", which it would return now).
> 
> But if the default is different for each destination or interface in
> the multicast case (ie, by adding conf settings for mcast), then
> it really should do late binding and leave it as "-1" in the set, right?
> That's what I thought it was already doing, but apparently not;
> I think it used to, but maybe I just didn't notice.

Yes, that would be the ideal fix, and give the admin more control over
the value, but it seems like overkill to me.  It's been 64 for a while,
and it's always been changeable by apps.  I guess the only thing to
think about is there could be an app that works because it being 64
today, but will break tomorrow.  Having a tunable parameter will let
you get the app working without re-writing it.

-Brian
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Stevens May 4, 2010, 5:05 p.m. UTC | #3
> Yes, that would be the ideal fix, and give the admin more control over
> the value, but it seems like overkill to me.  It's been 64 for a while,
> and it's always been changeable by apps.  I guess the only thing to
> think about is there could be an app that works because it being 64
> today, but will break tomorrow.  Having a tunable parameter will let
> you get the app working without re-writing it.

        Well, it should've been 1, and any app relying on having
multicast routing really should set it explicitly.
        I think per-interface defaulting to 1 should be ok. I'd
prefer carrying the "-1" so apps that set it get what they want
and apps that don't carry the current default, rather than the
value at the time the socket was created, but practically it probably
doesn't matter. In reality, apps that need more than one will
already be setting it to a non-default value.

                                                        +-DLS

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller May 4, 2010, 9:38 p.m. UTC | #4
From: Brian Haley <brian.haley@hp.com>
Date: Tue, 04 May 2010 10:40:58 -0400

> Not exactly.  It fixes the case where it's wrong by default, but
> the corner case of setting it to -1 via setsockopt() says:
> 
>     x == -1:       use kernel default
> 
> But that will revert back to the kernel using 64 on the next transmit.
> I can work on an update to this that makes a new mcast_hops per-interface
> setting and makes ip6_dst_hoplimit() aware of it.  Or even easier, just
> have setsockopt() trap the -1 and set np->mcast_hops to 1.  Built but
> untested patch below.

I thought that we agreed that when the user explicitly asks for "-1"
it should get the behavior right now, with is to use
ip6_dst_hoplimit()?  I think I even acknowledged when Elliot mentioned
this explicitly, and I think it's a good idea.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller May 4, 2010, 9:39 p.m. UTC | #5
From: David Stevens <dlstevens@us.ibm.com>
Date: Tue, 4 May 2010 09:12:59 -0700

> But if the default is different for each destination or interface in
> the multicast case (ie, by adding conf settings for mcast), then
> it really should do late binding and leave it as "-1" in the set, right?
> That's what I thought it was already doing, but apparently not;
> I think it used to, but maybe I just didn't notice.

Unlike other people in this thread who sometimes aren't even checking
how the current code works, I checked all of the available source
control history in this area and this code has always behaved this way.

Right from day one.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller May 4, 2010, 9:46 p.m. UTC | #6
From: Brian Haley <brian.haley@hp.com>
Date: Tue, 04 May 2010 10:40:58 -0400

> Specifying -1 for setsockopt(IPV6_MULTICAST_HOPS) should set the socket
> value back to the system default value of IPV6_DEFAULT_MCASTHOPS (1).
> 
> Signed-off-by: Brian Haley <brian.haley@hp.com>

In cast it wasn't clear from my other reply, I'm not applying this
patch because I intentionally left this behavior there based upon
some comments from Elliot in that this lets developers get the
old default by asking for "-1" explicitly with a setsockopt.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
enh May 4, 2010, 10:26 p.m. UTC | #7
On Tue, May 4, 2010 at 14:46, David Miller <davem@davemloft.net> wrote:
> From: Brian Haley <brian.haley@hp.com>
> Date: Tue, 04 May 2010 10:40:58 -0400
>
>> Specifying -1 for setsockopt(IPV6_MULTICAST_HOPS) should set the socket
>> value back to the system default value of IPV6_DEFAULT_MCASTHOPS (1).
>>
>> Signed-off-by: Brian Haley <brian.haley@hp.com>
>
> In cast it wasn't clear from my other reply, I'm not applying this
> patch because I intentionally left this behavior there based upon
> some comments from Elliot in that this lets developers get the
> old default by asking for "-1" explicitly with a setsockopt.

(for the record, i don't need that behavior myself, and have no
opinion on whether or not it makes sense for you guys... i'll only
ever call setsockopt with 0 <= value <= 255. all i need is for the
default when i never call setsockopt to be 1. for now, i've added a
work-around where i explicitly call setsockopt with 1 when i create
the socket.)
David Miller May 4, 2010, 11:07 p.m. UTC | #8
From: enh <enh@google.com>
Date: Tue, 4 May 2010 15:26:52 -0700

> On Tue, May 4, 2010 at 14:46, David Miller <davem@davemloft.net> wrote:
>> From: Brian Haley <brian.haley@hp.com>
>> Date: Tue, 04 May 2010 10:40:58 -0400
>>
>>> Specifying -1 for setsockopt(IPV6_MULTICAST_HOPS) should set the socket
>>> value back to the system default value of IPV6_DEFAULT_MCASTHOPS (1).
>>>
>>> Signed-off-by: Brian Haley <brian.haley@hp.com>
>>
>> In cast it wasn't clear from my other reply, I'm not applying this
>> patch because I intentionally left this behavior there based upon
>> some comments from Elliot in that this lets developers get the
>> old default by asking for "-1" explicitly with a setsockopt.
> 
> (for the record, i don't need that behavior myself, and have no
> opinion on whether or not it makes sense for you guys... i'll only
> ever call setsockopt with 0 <= value <= 255. all i need is for the
> default when i never call setsockopt to be 1. for now, i've added a
> work-around where i explicitly call setsockopt with 1 when i create
> the socket.)

It's more of an issue of having at least some kind of compatability
story when we change this.

With what's in the tree now we can at least say "if you explicitly
setsockopt() the value to '-1' you will get the same behavior now
as beforehand"

Whereas with what others are suggesting, we can't give people a way
in their applications to do that other than to suggest they use
disgusting concoctions like "set non-multicast hoplimit to '-1',
getsockopt() that, then set the multicast hop explicitly to that"

And even that won't work the same as now, in that changes to the
per-route metric will be ignored.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Brian Haley May 5, 2010, 3:36 p.m. UTC | #9
David Miller wrote:
> From: Brian Haley <brian.haley@hp.com>
> Date: Tue, 04 May 2010 10:40:58 -0400
> 
>> Specifying -1 for setsockopt(IPV6_MULTICAST_HOPS) should set the socket
>> value back to the system default value of IPV6_DEFAULT_MCASTHOPS (1).
>>
>> Signed-off-by: Brian Haley <brian.haley@hp.com>
> 
> In cast it wasn't clear from my other reply, I'm not applying this
> patch because I intentionally left this behavior there based upon
> some comments from Elliot in that this lets developers get the
> old default by asking for "-1" explicitly with a setsockopt.

I now see that in Elliot's email, but I think it's incorrect.  The RFC
says that setting it to -1 should get you the kernel default, which is
now 1.  Without this change, setting it to -1 will get you 64, the
old behavior.  If the user wants to, they can always just set it to
64 themselves, that's better than assuming when you set it to -1
you're going to get 64.

I'm just trying to make this follow the RFC and behave like other OSes
for consistency.

-Brian
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller May 5, 2010, 10 p.m. UTC | #10
From: Brian Haley <brian.haley@hp.com>
Date: Wed, 05 May 2010 11:36:31 -0400

> I now see that in Elliot's email, but I think it's incorrect.  The RFC
> says that setting it to -1 should get you the kernel default, which is
> now 1.  Without this change, setting it to -1 will get you 64, the
> old behavior.  If the user wants to, they can always just set it to
> 64 themselves, that's better than assuming when you set it to -1
> you're going to get 64.

It's not 64, it's whatever the per-route metric is.

> I'm just trying to make this follow the RFC and behave like other OSes
> for consistency.

I'm just trying to have a real compatability story, and we don't with
any of the proposals you guys are giving me because it complete ignores
the fact that the old default comes from the route and is not some
constant.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Brian Haley May 6, 2010, 1:50 a.m. UTC | #11
David Miller wrote:
> From: Brian Haley <brian.haley@hp.com>
> Date: Wed, 05 May 2010 11:36:31 -0400
> 
>> I now see that in Elliot's email, but I think it's incorrect.  The RFC
>> says that setting it to -1 should get you the kernel default, which is
>> now 1.  Without this change, setting it to -1 will get you 64, the
>> old behavior.  If the user wants to, they can always just set it to
>> 64 themselves, that's better than assuming when you set it to -1
>> you're going to get 64.
> 
> It's not 64, it's whatever the per-route metric is.

Not unless that metric's been set via RTAX_HOPLIMIT (and I believe
this is the unicast hop limit value anyways), and that metric
defaults to -1.  Routes added via a Router Advertisement are most
likely going to have a hop limit of 64, but I believe that's only
supposed to apply to unicast.

I *did* search the kernel code and test this before my original reply - it
uses the unicast hop limit from the interface as Elliot originally showed.

~# sysctl net.ipv6.conf.eth2.hop_limit
net.ipv6.conf.eth2.hop_limit = 64

21:04:48.766181 IP6 (hlim 64, next-header UDP (17) payload length: 108)
    fe80::21f:29ff:fef0:2f46.48914 > ip6-allrouters.7639: UDP, length 100

~# sysctl net.ipv6.conf.eth2.hop_limit=63
net.ipv6.conf.eth2.hop_limit = 63

21:05:09.670190 IP6 (hlim 63, next-header UDP (17) payload length: 108)
    fe80::21f:29ff:fef0:2f46.48914 > ip6-allrouters.7639: UDP, length 100

At this point in time I'll gladly implement a per-interface sysctl
to end this discussion.

-Brian
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller May 6, 2010, 7:10 a.m. UTC | #12
From: Brian Haley <brian.haley@hp.com>
Date: Wed, 05 May 2010 21:50:18 -0400

> David Miller wrote:
>> From: Brian Haley <brian.haley@hp.com>
>> Date: Wed, 05 May 2010 11:36:31 -0400
>> 
>>> I now see that in Elliot's email, but I think it's incorrect.  The RFC
>>> says that setting it to -1 should get you the kernel default, which is
>>> now 1.  Without this change, setting it to -1 will get you 64, the
>>> old behavior.  If the user wants to, they can always just set it to
>>> 64 themselves, that's better than assuming when you set it to -1
>>> you're going to get 64.
>> 
>> It's not 64, it's whatever the per-route metric is.
> 
> Not unless that metric's been set via RTAX_HOPLIMIT (and I believe
> this is the unicast hop limit value anyways), and that metric
> defaults to -1.

Right, if it is, and anyone who does set it and expects the default
multicast hop limit to follow along have no portable way to code their
application in a way that works before and after fixing the RFC
issues.

I gave them a way, by making explicit setting of "-1" do what it's
always done.

> At this point in time I'll gladly implement a per-interface sysctl
> to end this discussion.

The game is over, the result decided, and this is just post-game
discussion as far as I'm concerned. :-)

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
index bd43f01..fa6875b 100644
--- a/net/ipv6/ipv6_sockglue.c
+++ b/net/ipv6/ipv6_sockglue.c
@@ -486,7 +486,10 @@  done:
 			goto e_inval;
 		if (val > 255 || val < -1)
 			goto e_inval;
-		np->mcast_hops = val;
+		if (val == -1)
+			np->mcast_hops = IPV6_DEFAULT_MCASTHOPS;
+		else
+			np->mcast_hops = val;
 		retv = 0;
 		break;