PATCH: Multicast: Filter multicast traffic per socket mc_list

Message ID	alpine.DEB.1.10.0904161035390.19650@qirst.com
State	Rejected, archived
Delegated to:	David Miller
Headers	Return-Path: <netdev-owner@vger.kernel.org>
	Date: Thu, 16 Apr 2009 10:38:23 -0400 (EDT)
	From: Christoph Lameter <cl@linux.com>
	To: David Miller <davem@davemloft.net>
	cc: netdev@vger.kernel.org, vladislav.yasevich@hp.com, nhorman@tuxdriver.com, dlstevens@us.ibm.com
	Subject: PATCH: Multicast: Filter multicast traffic per socket mc_list
	Message-ID: <alpine.DEB.1.10.0904161035390.19650@qirst.com>
	User-Agent: Alpine 1.10 (DEB 962 2008-03-14)
	MIME-Version: 1.0
	Content-Type: TEXT/PLAIN; charset=US-ASCII
	Sender: netdev-owner@vger.kernel.org
	Precedence: bulk

Christoph Lameter (Ampere) April 16, 2009, 2:38 p.m. UTC

Do what David Stevens suggested: add a per-socket option.



Subject: Multicast: Filter Multicast traffic per socket mc_list

If two processes open the same port as a multicast socket and then
join two different multicast groups, traffic for both multicast groups
is delivered to each process. This means that applications receive
data that they did not ask for. Applications have to filter this data out in
order to work correctly if multiple apps run on the same system.

These are pretty strange semantics but they have been around since the
beginning of multicast support on Unix systems. Most of the other operating
systems supporting Multicast have since changed to only supplying multicast
traffic to a socket that was selected through multicast join operations.

This patch changes Linux to behave in the same way. But there may be
applications that rely on the old behavior. Therefore we provide a means
to switch back to the old behavior using a new multicast socket option:

	IP_MULTICAST_ALL

If set, all multicast traffic to the port is forwarded to the socket
(additional constraints are the SSM inclusion and exclusion lists!).
If not set (default), only traffic for multicast groups that were
joined by the socket is received.

Signed-off-by: Christoph Lameter <cl@linux.com>

---
 include/linux/in.h      |    1 +
 include/net/inet_sock.h |    3 ++-
 net/ipv4/igmp.c         |    4 ++--
 net/ipv4/ip_sockglue.c  |   11 +++++++++++
 4 files changed, 16 insertions(+), 3 deletions(-)
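
For illustration only, a minimal user-space sketch of how an application might toggle and read back the option described above, assuming a kernel that carries an option of this name. The fallback value 49 is taken from the in.h hunk of this patch; the helper names set_mc_all() and get_mc_all() are hypothetical and not part of the patch.

```c
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Fallback for headers without the option; 49 is the value this patch uses. */
#ifndef IP_MULTICAST_ALL
#define IP_MULTICAST_ALL 49
#endif

/* Hypothetical helper: set the per-socket flag. Returns 0, or -1 with errno set. */
static int set_mc_all(int fd, int on)
{
	assert(on == 0 || on == 1);	/* the patch rejects any other value */
	return setsockopt(fd, IPPROTO_IP, IP_MULTICAST_ALL, &on, sizeof(on));
}

/* Hypothetical helper: read the flag back. Returns 0, 1, or -1 on error. */
static int get_mc_all(int fd)
{
	int val = 0;
	socklen_t len = sizeof(val);

	if (getsockopt(fd, IPPROTO_IP, IP_MULTICAST_ALL, &val, &len) < 0)
		return -1;
	return val;
}
```

An application would call set_mc_all() on its UDP socket before or after joining its groups; on a kernel without the option the call is expected to fail, so the return value should be checked.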

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

David Stevens April 16, 2009, 3:09 p.m. UTC | #1

This isn't what I suggested-- you have the default backwards. It must default
to current behavior, or it's pointless.

The text you have with it is overstated, too. Of course applications using
your model can still receive unexpected data-- it does not reserve the
port or multicast address to just your sender or to multicast traffic alone.

My suggestion is to do nothing. :-) But if that's too difficult, an alternative
would be a socket option that delivers traffic for joined groups only and
defaults off. In fact, it'd probably be most useful if it also prevents unicast
traffic for sockets using that port, too. None of these things have the magic
effect of preventing unwanted data delivery, but it'd allow you to receive
multiple, specific groups on a single socket with just the joins to indicate
which.

                                                +-DLS


netdev-owner@vger.kernel.org wrote on 04/16/2009 07:38:23 AM:

> Do what David Stevens suggest: Add a per socket option
> 
> 
> 
> Subject: Multicast: Filter Multicast traffic per socket mc_list
> 
> If two processes open the same port as a multicast socket and then
> join two different multicast groups then traffic for both multicast groups
> is forwarded to either process. This means that application will get surprising
> data that they did not ask for. Applications will have to filter these out in
> order to work correctly if multiple apps run on the same system.
> 
> These are pretty strange semantics but they have been around since the
> beginning of multicast support on Unix systems. Most of the other operating
> systems supporting Multicast have since changed to only supplying multicast
> traffic to a socket that was selected through multicast join operations.
> 
> This patch does change Linux to behave in the same way. But there may be
> applications that rely on the old behavior. Therefore we provide a means
> to switch back to the old behavior using a new multicast socket option
> 
>    IP_MULTICAST_ALL
> 
> If set then all multicast traffic to the port is forwarded to the socket
> (additional constraints are the SSM inclusion and exclusion lists!).
> If not set (default) then only traffic for multicast groups that were
> joined by thesocket is received.
> 
> Signed-off-by: Christoph Lameter <cl@linux.com>
> 
> ---
>  include/linux/in.h      |    1 +
>  include/net/inet_sock.h |    3 ++-
>  net/ipv4/igmp.c         |    4 ++--
>  net/ipv4/ip_sockglue.c  |   11 +++++++++++
>  4 files changed, 16 insertions(+), 3 deletions(-)
> 
> Index: linux-2.6/include/net/inet_sock.h
> ===================================================================
> --- linux-2.6.orig/include/net/inet_sock.h   2009-04-16 08:59:20.000000000 -0500
> +++ linux-2.6/include/net/inet_sock.h   2009-04-16 09:04:47.000000000 -0500
> @@ -130,7 +130,8 @@ struct inet_sock {
>              freebind:1,
>              hdrincl:1,
>              mc_loop:1,
> -            transparent:1;
> +            transparent:1,
> +            mc_all:1;
>     int         mc_index;
>     __be32         mc_addr;
>     struct ip_mc_socklist   *mc_list;
> Index: linux-2.6/net/ipv4/igmp.c
> ===================================================================
> --- linux-2.6.orig/net/ipv4/igmp.c   2009-04-16 08:54:47.000000000 -0500
> +++ linux-2.6/net/ipv4/igmp.c   2009-04-16 09:04:06.000000000 -0500
> @@ -2187,7 +2187,7 @@ int ip_mc_sf_allow(struct sock *sk, __be
>     struct ip_sf_socklist *psl;
>     int i;
> 
> -   if (!ipv4_is_multicast(loc_addr))
> +   if (ipv4_is_lbcast(loc_addr) || !ipv4_is_multicast(loc_addr))
>        return 1;
> 
>     for (pmc=inet->mc_list; pmc; pmc=pmc->next) {
> @@ -2196,7 +2196,7 @@ int ip_mc_sf_allow(struct sock *sk, __be
>           break;
>     }
>     if (!pmc)
> -      return 1;
> +      return inet->mc_all;
>     psl = pmc->sflist;
>     if (!psl)
>        return pmc->sfmode == MCAST_EXCLUDE;
> Index: linux-2.6/include/linux/in.h
> ===================================================================
> --- linux-2.6.orig/include/linux/in.h   2009-04-16 09:05:41.000000000 -0500
> +++ linux-2.6/include/linux/in.h   2009-04-16 09:32:52.000000000 -0500
> @@ -107,6 +107,7 @@ struct in_addr {
>  #define MCAST_JOIN_SOURCE_GROUP      46
>  #define MCAST_LEAVE_SOURCE_GROUP   47
>  #define MCAST_MSFILTER         48
> +#define IP_MULTICAST_ALL      49
> 
>  #define MCAST_EXCLUDE   0
>  #define MCAST_INCLUDE   1
> Index: linux-2.6/net/ipv4/ip_sockglue.c
> ===================================================================
> --- linux-2.6.orig/net/ipv4/ip_sockglue.c   2009-04-16 09:09:52.000000000 -0500
> +++ linux-2.6/net/ipv4/ip_sockglue.c   2009-04-16 09:31:40.000000000 -0500
> @@ -449,6 +449,7 @@ static int do_ip_setsockopt(struct sock
>                (1<<IP_ROUTER_ALERT) | (1<<IP_FREEBIND) |
>                (1<<IP_PASSSEC) | (1<<IP_TRANSPARENT))) ||
>         optname == IP_MULTICAST_TTL ||
> +       optname == IP_MULTICAST_ALL ||
>         optname == IP_MULTICAST_LOOP ||
>         optname == IP_RECVORIGDSTADDR) {
>        if (optlen >= sizeof(int)) {
> @@ -895,6 +896,13 @@ static int do_ip_setsockopt(struct sock
>        kfree(gsf);
>        break;
>     }
> +   case IP_MULTICAST_ALL:
> +      if (optlen<1)
> +         goto e_inval;
> +      if (val != 0 && val != 1)
> +         goto e_inval;
> +      inet->mc_all = val;
> +      break;
>     case IP_ROUTER_ALERT:
>        err = ip_ra_control(sk, val ? 1 : 0, NULL);
>        break;
> @@ -1147,6 +1155,9 @@ static int do_ip_getsockopt(struct sock
>        release_sock(sk);
>        return err;
>     }
> +   case IP_MULTICAST_ALL:
> +      val = inet->mc_all;
> +      break;
>     case IP_PKTOPTIONS:
>     {
>        struct msghdr msg;

Neil Horman April 16, 2009, 3:15 p.m. UTC | #2

On Thu, Apr 16, 2009 at 10:38:23AM -0400, Christoph Lameter wrote:
> Do what David Stevens suggest: Add a per socket option
> 
> 
> 
> Subject: Multicast: Filter Multicast traffic per socket mc_list
> 
> If two processes open the same port as a multicast socket and then
> join two different multicast groups then traffic for both multicast groups
> is forwarded to either process. This means that application will get surprising
> data that they did not ask for. Applications will have to filter these out in
> order to work correctly if multiple apps run on the same system.
> 
> These are pretty strange semantics but they have been around since the
> beginning of multicast support on Unix systems. Most of the other operating
> systems supporting Multicast have since changed to only supplying multicast
> traffic to a socket that was selected through multicast join operations.
> 
> This patch does change Linux to behave in the same way. But there may be
> applications that rely on the old behavior. Therefore we provide a means
> to switch back to the old behavior using a new multicast socket option
> 
> 	IP_MULTICAST_ALL
> 
> If set then all multicast traffic to the port is forwarded to the socket
> (additional constraints are the SSM inclusion and exclusion lists!).
> If not set (default) then only traffic for multicast groups that were
> joined by thesocket is received.
> 
I think your comment is reversed here, isn't it?  The default you have below is
that mc_all is set, which defaults you to the existing behavior, rather than the
new behavior introduced by this patch.


Ack to the patch though
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Neil



Vlad Yasevich April 16, 2009, 3:24 p.m. UTC | #3

Christoph Lameter wrote:
> Do what David Stevens suggest: Add a per socket option
> 
> 
> 
> Subject: Multicast: Filter Multicast traffic per socket mc_list
> 
> If two processes open the same port as a multicast socket and then
> join two different multicast groups then traffic for both multicast groups
> is forwarded to either process. This means that application will get surprising
> data that they did not ask for. Applications will have to filter these out in
> order to work correctly if multiple apps run on the same system.
> 
> These are pretty strange semantics but they have been around since the
> beginning of multicast support on Unix systems. Most of the other operating
> systems supporting Multicast have since changed to only supplying multicast
> traffic to a socket that was selected through multicast join operations.
> 
> This patch does change Linux to behave in the same way. But there may be
> applications that rely on the old behavior. Therefore we provide a means
> to switch back to the old behavior using a new multicast socket option
> 
> 	IP_MULTICAST_ALL
> 
> If set then all multicast traffic to the port is forwarded to the socket
> (additional constraints are the SSM inclusion and exclusion lists!).
> If not set (default) then only traffic for multicast groups that were
> joined by thesocket is received.
> 
> Signed-off-by: Christoph Lameter <cl@linux.com>
> 
> ---
>  include/linux/in.h      |    1 +
>  include/net/inet_sock.h |    3 ++-
>  net/ipv4/igmp.c         |    4 ++--
>  net/ipv4/ip_sockglue.c  |   11 +++++++++++
>  4 files changed, 16 insertions(+), 3 deletions(-)
> 
> Index: linux-2.6/include/net/inet_sock.h
> ===================================================================
> --- linux-2.6.orig/include/net/inet_sock.h	2009-04-16 08:59:20.000000000 -0500
> +++ linux-2.6/include/net/inet_sock.h	2009-04-16 09:04:47.000000000 -0500
> @@ -130,7 +130,8 @@ struct inet_sock {
>  				freebind:1,
>  				hdrincl:1,
>  				mc_loop:1,
> -				transparent:1;
> +				transparent:1,
> +				mc_all:1;
>  	int			mc_index;
>  	__be32			mc_addr;
>  	struct ip_mc_socklist	*mc_list;
> Index: linux-2.6/net/ipv4/igmp.c
> ===================================================================
> --- linux-2.6.orig/net/ipv4/igmp.c	2009-04-16 08:54:47.000000000 -0500
> +++ linux-2.6/net/ipv4/igmp.c	2009-04-16 09:04:06.000000000 -0500
> @@ -2187,7 +2187,7 @@ int ip_mc_sf_allow(struct sock *sk, __be
>  	struct ip_sf_socklist *psl;
>  	int i;
> 
> -	if (!ipv4_is_multicast(loc_addr))
> +	if (ipv4_is_lbcast(loc_addr) || !ipv4_is_multicast(loc_addr))
>  		return 1;

I don't think this change is needed.  ipv4_is_lbcast() checks if the
address is 255.255.255.255.  That address is already !ipv4_is_multicast().

Subnet broadcasts are also !ipv4_is_multicast.

> 
>  	for (pmc=inet->mc_list; pmc; pmc=pmc->next) {
> @@ -2196,7 +2196,7 @@ int ip_mc_sf_allow(struct sock *sk, __be
>  			break;
>  	}
>  	if (!pmc)
> -		return 1;
> +		return inet->mc_all;
>  	psl = pmc->sflist;
>  	if (!psl)
>  		return pmc->sfmode == MCAST_EXCLUDE;
> Index: linux-2.6/include/linux/in.h
> ===================================================================
> --- linux-2.6.orig/include/linux/in.h	2009-04-16 09:05:41.000000000 -0500
> +++ linux-2.6/include/linux/in.h	2009-04-16 09:32:52.000000000 -0500
> @@ -107,6 +107,7 @@ struct in_addr {
>  #define MCAST_JOIN_SOURCE_GROUP		46
>  #define MCAST_LEAVE_SOURCE_GROUP	47
>  #define MCAST_MSFILTER			48
> +#define IP_MULTICAST_ALL		49
> 
>  #define MCAST_EXCLUDE	0
>  #define MCAST_INCLUDE	1
> Index: linux-2.6/net/ipv4/ip_sockglue.c
> ===================================================================
> --- linux-2.6.orig/net/ipv4/ip_sockglue.c	2009-04-16 09:09:52.000000000 -0500
> +++ linux-2.6/net/ipv4/ip_sockglue.c	2009-04-16 09:31:40.000000000 -0500
> @@ -449,6 +449,7 @@ static int do_ip_setsockopt(struct sock
>  			     (1<<IP_ROUTER_ALERT) | (1<<IP_FREEBIND) |
>  			     (1<<IP_PASSSEC) | (1<<IP_TRANSPARENT))) ||
>  	    optname == IP_MULTICAST_TTL ||
> +	    optname == IP_MULTICAST_ALL ||
>  	    optname == IP_MULTICAST_LOOP ||
>  	    optname == IP_RECVORIGDSTADDR) {
>  		if (optlen >= sizeof(int)) {
> @@ -895,6 +896,13 @@ static int do_ip_setsockopt(struct sock
>  		kfree(gsf);
>  		break;
>  	}
> +	case IP_MULTICAST_ALL:
> +		if (optlen<1)
> +			goto e_inval;
> +		if (val != 0 && val != 1)
> +			goto e_inval;
> +		inet->mc_all = val;
> +		break;
>  	case IP_ROUTER_ALERT:
>  		err = ip_ra_control(sk, val ? 1 : 0, NULL);
>  		break;
> @@ -1147,6 +1155,9 @@ static int do_ip_getsockopt(struct sock
>  		release_sock(sk);
>  		return err;
>  	}
> +	case IP_MULTICAST_ALL:
> +		val = inet->mc_all;
> +		break;
>  	case IP_PKTOPTIONS:
>  	{
>  		struct msghdr msg;

You might need to set inet->mc_all to 1 in inet_create() since I am not sure if
we want to change the default behavior.  The knowledge that some apps have
a very "unique" way of doing multicast makes me a little hesitant.

-vlad
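
To make the effect of the `return inet->mc_all;` hunk concrete, here is a simplified user-space model of the delivery decision, offered as a sketch rather than the kernel code. It assumes host-byte-order addresses and ignores both the interface match and the source-filter lists that the real ip_mc_sf_allow() consults; struct sock_model and mc_allow_model() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the socket state; not the kernel structure. */
struct sock_model {
	const uint32_t *groups;	/* joined groups, host byte order */
	size_t ngroups;
	int mc_all;		/* 1: also deliver groups that were not joined */
};

/* 224.0.0.0/4 test in host byte order. */
static int is_multicast(uint32_t addr)
{
	return (addr & 0xf0000000u) == 0xe0000000u;
}

/* Returns 1 if a datagram addressed to dst would be delivered, else 0. */
static int mc_allow_model(const struct sock_model *sk, uint32_t dst)
{
	size_t i;

	assert(sk != NULL);
	if (!is_multicast(dst))
		return 1;		/* unicast and broadcast are not filtered here */
	for (i = 0; i < sk->ngroups; i++)
		if (sk->groups[i] == dst)
			return 1;	/* joined; source filters omitted in this model */
	return sk->mc_all;		/* not joined: the flag decides */
}
```

With mc_all left at 0, as in the patch as posted, a socket that joined only 239.1.1.1 would no longer see traffic for 239.1.1.2; initializing mc_all to 1 at socket creation would preserve the existing behavior.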

Christoph Lameter (Ampere) April 16, 2009, 3:36 p.m. UTC | #4

On Thu, 16 Apr 2009, David Stevens wrote:

> This isn't what I suggested-- you have the default backwards. It must
> default
> to current behavior, or it's pointless.

If it defaulted to the current behavior, it would be incompatible
with the behavior of other operating systems, and the surprising behavior
of the Linux multicast stack would continue to exist. The unusual behavior
needs to be switched on if wanted for legacy or other reasons.

> The text you have with it is overstated, too. Of course applications using
> your model can still receive unexpected data-- it does not reserve the
> port or multicast address to just your sender or to multicast traffic
> alone.

The application will no longer receive traffic from multicast groups that
it did not subscribe to. Yes unicast can still result in unexpected
traffic.

Christoph Lameter (Ampere) April 16, 2009, 3:36 p.m. UTC | #5

On Thu, 16 Apr 2009, Neil Horman wrote:

> I think your comment is reversed here, isn't it?  The default you have below is
> that mc_all is set, which defaults you to the existing behavior, rather than the
> new behavior introduced by this patch.

mc_all is 0 by default.

> Ack to the patch though
> Acked-by: Neil Horman <nhorman@tuxdriver.com>
> Neil

Thanks.


Christoph Lameter (Ampere) April 16, 2009, 3:39 p.m. UTC | #6

On Thu, 16 Apr 2009, Vlad Yasevich wrote:

> > -	if (!ipv4_is_multicast(loc_addr))
> > +	if (ipv4_is_lbcast(loc_addr) || !ipv4_is_multicast(loc_addr))
> >  		return 1;
>
> I don't think this change is needed.  ipv4_is_lbcast() checks if the
> address is 255.255.255.255.  That address is already !ipv4_is_multicast().
>
> Subnet broadcasts are also !ipv4_is_multicast.

ok will drop this.
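
The redundancy can be checked with a small user-space sketch. It uses host-byte-order equivalents of the two kernel predicates (the kernel versions take a __be32); the function names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 224.0.0.0/4, host byte order. */
static int is_multicast_h(uint32_t addr)
{
	return (addr & 0xf0000000u) == 0xe0000000u;
}

/* 255.255.255.255, host byte order. */
static int is_lbcast_h(uint32_t addr)
{
	return addr == 0xffffffffu;
}

/* The early-return condition as written in the patch hunk. */
static int early_return_patched(uint32_t addr)
{
	return is_lbcast_h(addr) || !is_multicast_h(addr);
}

/* The original early-return condition. */
static int early_return_original(uint32_t addr)
{
	return !is_multicast_h(addr);
}

/* Spot-check that both conditions agree on a few representative addresses. */
static void check_equivalent(void)
{
	static const uint32_t samples[] = {
		0xffffffffu, 0xe0000001u, 0xefffffffu, 0xc0a800ffu, 0x7f000001u
	};
	size_t i;

	for (i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
		assert(early_return_patched(samples[i]) ==
		       early_return_original(samples[i]));
}
```

Because 255.255.255.255 has its top four bits set to 1111 rather than 1110, it already fails the multicast test, so the extra ipv4_is_lbcast() term never changes the result.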

> >  	{
> >  		struct msghdr msg;
>
> You might need to set inet->mc_all to 1 in inet_create() since I am not sure if
> we want to change the default behavior.  The knowledge that some apps have
> a very "unique" way of doing multicast makes me a little hesitant.

Those "unique" applications would only be able to run on Linux.
Applications are mostly written for multiple Unix variants. Since the
other Unix variants have changed their default behavior, it is reasonable
to also change the default under Linux.


Neil Horman April 16, 2009, 5:44 p.m. UTC | #7

On Thu, Apr 16, 2009 at 11:36:56AM -0400, Christoph Lameter wrote:
> On Thu, 16 Apr 2009, Neil Horman wrote:
> 
> > I think your comment is reversed here, isn't it?  The default you have below is
> > that mc_all is set, which defaults you to the existing behavior, rather than the
> > new behavior introduced by this patch.
> 
> mc_all is 0 by default.
> 
> > Ack to the patch though
> > Acked-by: Neil Horman <nhorman@tuxdriver.com>
> > Neil
> 
> Thanks.
> 
I'm sorry, I misread it (confused the definition of a bitfield with its default
value).  As Dave noted, the default needs to be the current behavior, not your
new behavior.  Until that's changed, I rescind my Ack.
Neil


Christoph Lameter (Ampere) April 16, 2009, 7:12 p.m. UTC | #8

On Thu, 16 Apr 2009, Neil Horman wrote:

> I'm sorry, I misread it (confused the definition of a bitfield with its default
> value).  As Dave noted, the default needs to be the current behavior, not your
> new behavior.  Until that's changed, I rescind my Ack.

Well, I guess then we need the global proc setting after all. With the
current misbehavior as the default, applications need to be rebuilt, and
source code that now runs on multiple OSes would have to be customized
to special-case Linux.

So add a global proc setting to determine the initial setting of IP_MULTICAST_ALL?


David Stevens April 16, 2009, 8:56 p.m. UTC | #9

> Well guess then we need the global proc setting after all. With the
> current misbehavior as a default applications need to be rebuilt and

        The current behavior, as either your or Vlad's RFC quotes pointed
out as easily as the history to go with it, is exactly the expected behavior
for decades. I think it is not misbehavior so much as your misconception,
though a common one.

> source code that is running on multiple OSes now would have to customized
> to special case for Linux.

        No, actually. If you write it for the current behavior, it'll work
fine on an OS like Solaris that has departed from the original socket
behavior. If you're sloppy and don't handle unexpected traffic, it'll be
wrong on both-- you just won't know it until someone runs something with
the same port and multicast address on your network and wrecks your app.

> So add a global proc setting to determine the initial setting of IP_MULTICAST_ALL?

        This breaks unknown existing applications that are correctly
written. I think it's clearly wrong to change the behavior of someone
else's socket to match your idea of how it should've been done 25 years
too late. An option that enables new behavior for your own socket, which
must be a new app, is fine. Adding a socket option as part of a port
is no great hurdle, and I'm guessing you aren't trying to run a Solaris
binary on Linux. So what's the problem?

                                                                +-DLS
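
One way an application can handle unexpected traffic itself, on either kind of system, is to look at the destination address of each datagram and drop what it did not ask for. The following is a sketch under stated assumptions: the socket has IP_PKTINFO enabled via setsockopt() and the datagram was read with recvmsg() into a msghdr with a control buffer (neither step is shown). dst_from_msg() and keep_datagram() are hypothetical helper names.

```c
#define _GNU_SOURCE		/* for struct in_pktinfo with glibc */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <netinet/in.h>
#include <sys/socket.h>

/*
 * Extract the destination address of a datagram from the ancillary data
 * that recvmsg() fills in once IP_PKTINFO is enabled on the socket.
 * Returns 0 and stores the address in *dst, or -1 if none was found.
 */
static int dst_from_msg(struct msghdr *msg, struct in_addr *dst)
{
	struct cmsghdr *c;

	assert(msg != NULL && dst != NULL);
	for (c = CMSG_FIRSTHDR(msg); c != NULL; c = CMSG_NXTHDR(msg, c)) {
		if (c->cmsg_level == IPPROTO_IP && c->cmsg_type == IP_PKTINFO) {
			struct in_pktinfo pi;

			memcpy(&pi, CMSG_DATA(c), sizeof(pi));
			*dst = pi.ipi_addr;	/* destination address from the IP header */
			return 0;
		}
	}
	return -1;
}

/* Hypothetical policy: keep a datagram only if it was sent to `wanted`. */
static int keep_datagram(struct msghdr *msg, struct in_addr wanted)
{
	struct in_addr dst;

	if (dst_from_msg(msg, &dst) < 0)
		return 0;		/* no destination info: drop conservatively */
	return dst.s_addr == wanted.s_addr;
}
```

An application that joined several groups would compare against each of them instead of a single address; the point is only that the check lives in the application and does not depend on the kernel's default.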


Christoph Lameter (Ampere) April 16, 2009, 9:04 p.m. UTC | #10

On Thu, 16 Apr 2009, David Stevens wrote:

> must be a new app, is fine. Adding a socket option as part of a port
> is no great hurdle, and I'm guessing you aren't trying to run a Solaris
> binary on Linux. So what's the problem?

Guess it's the obvious: software should run on multiple OSes without
too much special casing. Linux is the only special case that I am aware of
that misbehaves.

Adding a socket is no easy thing given the architecture of the software
(and of other software) that did not anticipate Linux faithfully
replicating bugs from 25 years ago that no longer exist in other OSes.

Cannot imagine there to be too much software out there that relies on this
strange behavior. Otherwise the software would not work on various other
platforms.

Can you give us a list of products that verifiably rely on the current
behavior?

Vlad Yasevich April 16, 2009, 9:19 p.m. UTC | #11

David Stevens wrote:
>> Well guess then we need the global proc setting after all. With the
>> current misbehavior as a default applications need to be rebuilt and
> 
>         The current behavior, as either your or Vlad's RFC quotes pointed
> out as easily as the history to go with it, is exactly the expected behavior
> for decades. I think it is not misbehavior so much as your misconception,
> though a common one.
> 

What seems to be happening though, is that there is an expectation that
this behavior would change with advent of IGMPv3, which adds the additional
filtering text.  Now, we could point out that there is no normative text
that requires this filtering on groups, only on sources, but the expectation
is still there.

>> source code that is running on multiple OSes now would have to  customized
>> to special case for Linux.
> 
>         No, actually. If you write it for the current behavior, it'll work
> fine on an OS like Solaris that has departed from the original socket
> behavior. If you're sloppy and don't handle unexpected traffic, it'll be
> wrong on both-- you just won't know it until someone runs something with
> the same port and multicast address on your network and wrecks your app.

I'd have to reluctantly agree here.  Any application that expects original
multicast behavior will be broken by a system-wide change.  I think existing
applications have already figured out all the workarounds they need.

> 
>> So add a global proc setting to determine the initial setting of IP_MULTICAST_ALL?
> 
>         This breaks unknown existing applications that are correctly
> written. I think it's clearly wrong to change the behavior of someone
> else's socket to match your idea of how it should've been done 25 years
> too late. An option that enables new behavior for your own socket, which
> must be a new app, is fine. Adding a socket option as part of a port
> is no great hurdle, and I'm guessing you aren't trying to run a Solaris
> binary on Linux. So what's the problem?
> 
>                                                                 +-DLS

I wonder how BSD and Solaris got away with it?  They both filter on multicast
groups and source addresses.  This is not meant as rhetorical or provocative,
just genuinely wondering.

-vlad

David Stevens April 16, 2009, 9:54 p.m. UTC | #12

Christoph Lameter <cl@linux.com> wrote on 04/16/2009 02:04:30 PM:

> Guess its the obvious: Software should run on multiple OSes without
> too much special casing. Linux is the only special case that I am aware of
> that misbehaves.

        All flavors of UNIX did it this way originally. I never tried
it on Windows. I heard years ago when Solaris changed their behavior
and it's been reported in this thread that current BSD does, too.
But, again, this is not in the least misbehavior. It simply doesn't
follow your model of how you thought it behaved. Linux does exactly
what Steve Deering wanted multicasting to do when he wrote the RFC
for it. It adds an address on the interface, and the binding determines
whether it's delivered to a particular socket or not. That is the
"ANY" in INADDR_ANY, just like unicasting. If you want particular
addresses only, the bind system call does that already. It makes
perfect sense to me.
 
> Adding a socket is no easy thing given the architecture of the software
> (and of other software) that did not consider that Linux faithfully
> replicating bugs from 25 years ago that no longer exist in other OSes.

        I don't have any say in what other OSes do, but I'd call it a bug
in them, too. 

> Cannot imagine there to be too much software out there that relies on this
> strange behavior. Otherwise the software would not work on various other
> platforms.

        I don't know the extent of your survey, but Linux legacy is the
problem with changing the default behavior for sockets other than your
app. You don't need any special code at all-- write them all to assume
they may receive packets not for them, because they are broken if they
don't. That works fine on Solaris, too.

> Can you give us a list of products that verifiably rely on the current
> behavior?

        I don't do app surveys any more than you do OS surveys. But
I don't want to change the semantics of multicast sockets and you do.
Can you guarantee nothing will break from this change?

                                                                +-DLS
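
The remark above that bind() already restricts a socket to particular addresses can be sketched as follows: instead of binding to INADDR_ANY, the application binds to the group address itself. This is an illustration under assumptions, not code from the thread; make_group_bind_addr() is a hypothetical helper, and the caller is still expected to pass the result to bind() and to join the group separately.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>

/*
 * Hypothetical helper: fill *out with an address suitable for bind() so
 * that the socket is bound to one multicast group rather than INADDR_ANY.
 * Returns 0 on success, -1 if `group` is not a valid IPv4 multicast address.
 */
static int make_group_bind_addr(const char *group, uint16_t port,
				struct sockaddr_in *out)
{
	struct in_addr a;

	assert(group != NULL && out != NULL);
	if (inet_pton(AF_INET, group, &a) != 1)
		return -1;		/* not a dotted-quad IPv4 address */
	if (!IN_MULTICAST(ntohl(a.s_addr)))
		return -1;		/* outside 224.0.0.0/4 */

	memset(out, 0, sizeof(*out));
	out->sin_family = AF_INET;
	out->sin_port = htons(port);
	out->sin_addr = a;
	return 0;
}
```

A socket bound this way serves one group only, so an application that wants several specific groups would need one socket per group with this approach.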


David Miller April 16, 2009, 10:15 p.m. UTC | #13

From: Christoph Lameter <cl@linux.com>
Date: Thu, 16 Apr 2009 11:36:10 -0400 (EDT)

> On Thu, 16 Apr 2009, David Stevens wrote:
> 
>> This isn't what I suggested-- you have the default backwards. It must
>> default
>> to current behavior, or it's pointless.
> 
> If it would default to the current behavior then it would be incompatible
> with the behavior of other operating systems and the surprising behavior
> of the Linux multicast stack would continue to exist. The unusual behavior
> needs to be switched on if wanted for legacy or other reasons.

Umm, no.

We don't break existing applications "by default".

You're being entirely selfish here, you want your application to work
without having to specify the socket option to get the new behavior.

Well guess what?  Under Linux you will have to!

David Miller April 16, 2009, 10:16 p.m. UTC | #14

From: Christoph Lameter <cl@linux.com>
Date: Thu, 16 Apr 2009 15:12:43 -0400 (EDT)

> On Thu, 16 Apr 2009, Neil Horman wrote:
> 
>> I'm sorry, I misread it (confused the definition of a bitfield with its default
>> value).  As Dave noted, the default needs to be the current behavior, not your
>> new behavior.  Until that's changed, I rescind my Ack.
> 
> Well guess then we need the global proc setting after all.

No Christoph, do this right.

Linux by default will behave the way it has for 15+ years.  And if an
application wants new behavior, you have to ask for it.

End of story.

David Miller April 16, 2009, 10:19 p.m. UTC | #15

From: Christoph Lameter <cl@linux.com>
Date: Thu, 16 Apr 2009 17:04:30 -0400 (EDT)

> Can you give us a list of products that verifiably rely on the current
> behavior?

Christoph, just drop this, we're not creating a system-wide default
selection that backs away from 15+ years of precedent.

Maybe Solaris has so few users that it's OK for them to go down
that path, but for us it's unacceptable to do things like this.

Fix your application.  And as David noted, it will be not only
more robust, but also still work on those "other systems."

So even your "works on all systems" argument is groundless.  If
you make it work under Linux it will in fact work on all systems,
and be more robust in the case of other applications using the
same multicast address and port.

David Miller April 16, 2009, 10:20 p.m. UTC | #16

From: Vlad Yasevich <vladislav.yasevich@hp.com>
Date: Thu, 16 Apr 2009 17:19:14 -0400

> I wonder how BSD and Solaris got away with it?  They both filter on
> multicast groups and source addresses.  This is not meant as
> rhetorical or provocative, just genuinely wondering.

Smaller user base.

David Stevens April 16, 2009, 10:22 p.m. UTC | #17

Vlad Yasevich wrote on 04/16/2009 02:19:14 PM:

> What seems to be happening though, is that there is an expectation that
> this behavior would change with advent of IGMPv3, which adds the additional
> filtering text.  Now, we could point out that there is no normative text
> that requires this filtering on groups, only on sources, but the expectation
> is still there.

        I have no such expectation. :-) The additional filters are (already)
applied per-socket, but existing apps not using source filters behave as
they did before IGMPv3. That's what I'd expect.
        The RFC you quoted for SSM applies to only the SSM address space,
mentions this behavior explicitly as the norm for outside of that space,
and Linux doesn't support that RFC. If it did, it would include an
address range check as part of it.

> I wonder how BSD and Solaris got away with it?  They both filter on multicast
> groups and source addresses.  This is not meant as rhetorical or provocative,
> just genuinely wondering.

        I think in practice, it doesn't come up much. That's why people
seem so surprised to learn it works this way, and not the way they
thought it did after using it, sometimes for years. But the documentation
doesn't say a join limits what you receive on a socket, or that it
has to be the same socket you're doing I/O on; people simply assume it.

                                                                +-DLS


stephen hemminger April 16, 2009, 11:30 p.m. UTC | #18

On Thu, 16 Apr 2009 15:22:49 -0700
David Stevens <dlstevens@us.ibm.com> wrote:

> Vlad Yasevich wrote on 04/16/2009 02:19:14 PM:
> 
> > What seems to be happening though, is that there is an expectation that
> > this behavior would change with advent of IGMPv3, which adds the additional
> > filtering text.  Now, we could point out that there is no normative text
> > that requires this filtering on groups, only on sources, but the expectation
> > is still there.
> 
>         I have no such expectation. :-) The additional filters are (already)
> applied per-socket, but existing apps not using source filters behave as
> they did before IGMPv3. That's what I'd expect.
>         The RFC you quoted for SSM applies to only the SSM address space,
> mentions this behavior explicitly as the norm for outside of that space,
> and Linux doesn't support that RFC. If it did, it would include an
> address range check as part of it.
> 
> > I wonder how BSD and Solaris got away with it?  They both filter on multicast
> > groups and source addresses.  This is not meant as rhetorical or provocative,
> > just genuinely wondering.
> 
>         I think in practice, it doesn't come up much. That's why people
> seem so surprised to learn it works this way, and not the way they
> thought it did after using it, sometimes for years. But the documentation
> doesn't say a join limits what you receive on a socket, or that it
> has to be the same socket you're doing I/O on; people simply assume it.
> 
>                                                                 +-DLS

You could always use packet/socket filter to keep the packets from
coming out to user space.

Vlad Yasevich April 17, 2009, 12:01 a.m. UTC | #19

David Stevens wrote:
> Vlad Yasevich wrote on 04/16/2009 02:19:14 PM:
> 
>> What seems to be happening though, is that there is an expectation that
>> this behavior would change with advent of IGMPv3, which adds the additional
>> filtering text.  Now, we could point out that there is no normative text
>> that requires this filtering on groups, only on sources, but the expectation
>> is still there.
> 
>         I have no such expectation. :-) The additional filters are (already)
> applied per-socket, but existing apps not using source filters behave as
> they did before IGMPv3. That's what I'd expect.
>         The RFC you quoted for SSM applies to only the SSM address space,
> mentions this behavior explicitly as the norm for outside of that space,
> and Linux doesn't support that RFC. If it did, it would include an
> address range check as part of it.

Yes, after reading more of SSM spec, it definitely only applies to SSM
addresses that we don't support yet.  Just to clear this one item up,
I think the expectation comes from the IGMPv3 spec:

     Filtering of packets based upon a socket's multicast reception
     state is a new feature of this service interface.  The previous
     service interface [RFC1112] described no filtering based upon
     multicast join state; rather, a join on a socket simply caused the
     host to join a group on the given interface, and packets destined
     for that group could be delivered to all sockets whether they had
     joined or not.

It could be inferred from this rather vague text that, in addition to source
filtering, group filtering should be done.  Hence the expectation that we've
been dealing with.

That's the last I'll mention this, since most salient points have been
agreed on.

Thanks
-vlad

> 
>> I wonder how BSD and Solaris got away with it?  They both filter on multicast
>> groups and source addresses.  This is not meant as rhetorical or provocative,
>> just genuinely wondering.
> 
>         I think in practice, it doesn't come up much. That's why people
> seem so surprised to learn it works this way, and not the way they
> thought it did after using it, sometimes for years. But the documentation
> doesn't say a join limits what you receive on a socket, or that it
> has to be the same socket you're doing I/O on; people simply assume it.
> 
>                                                                 +-DLS
> 


Christoph Lameter (Ampere) April 17, 2009, 1:56 p.m. UTC | #20

On Thu, 16 Apr 2009, David Miller wrote:

> No Christoph, do this right.
>
> Linux by default will behave the way it has for 15+ years.  And if an
> application wants new behavior, you have to ask for it.
>
> End of story.

This is not right. All other OSes filter multicast traffic according to
the multicast groups subscribed to (and that includes the evil one).
There is no requirement of asking for "new" behavior. Why should multicast
applications have to add special code to request something that comes by
default on other platforms?

The old behavior does not seem to be usable anyway, and it certainly
looks buggy if multicast packets are duplicated by the kernel and sent to
applications that never asked for them. An OS should do the sane thing
by default and not only if someone asks for it.


Nivedita Singhvi April 17, 2009, 3:37 p.m. UTC | #21

Christoph Lameter wrote:
> On Thu, 16 Apr 2009, David Miller wrote:
> 
>> No Christoph, do this right.
>>
>> Linux by default will behave the way it has for 15+ years.  And if an
>> application wants new behavior, you have to ask for it.
>>
>> End of story.
> 
> This is not right. All other OSes filter multicast traffic according to
> the multicast groups subscribed to (and that includes the evil one).
> There is no requirement of asking for "new" behavior. Why should multicast
> applications have to add special code to request something that comes by
> default on other platforms?

I need the current behaviour to not change, as it would
break some people I support.  DaveM is making the right
decision here, and I fully support this.

And I'm one of those people working on low latency and
hoping messaging clients get better in their multicast
usage..just that this is not one of those ways.

Ideally, you could tweak an OS environment configuration
setting if you don't want to do it per socket. But it cannot
be the default.

thanks,
Nivedita


Christoph Lameter (Ampere) April 17, 2009, 4:02 p.m. UTC | #22

On Fri, 17 Apr 2009, Nivedita Singhvi wrote:

> I need the current behaviour to not change, as it would
> break some people I support.  DaveM is making the right
> decision here, and I fully support this.

People or applications? There are applications that only run on Linux and
fail on other OS? How does this work? Special casing depending on the OS
running?

> Ideally, you could tweak OS environment configuration
> setting, if you don't want per socket. But it cannot
> be the default.

Would you support an additional OS config variable that would set the
default for socket operations? Then we could have a per socket option that
would allow overriding the OS config variable?

Nivedita Singhvi April 17, 2009, 4:28 p.m. UTC | #23

Christoph Lameter wrote:

>> Ideally, you could tweak OS environment configuration
>> setting, if you don't want per socket. But it cannot
>> be the default.
> 
> Would you support an additional OS config variable that would set the
> default for socket operations? Then we could have a per socket option that
> would allow overriding the OS config variable?

That would be my choice personally, because it would be
easier than scripting some solution to modify potentially
hundreds of sockets on a system...

Does that sound acceptable?

thanks,
Nivedita

David Stevens April 17, 2009, 9:31 p.m. UTC | #24

netdev-owner@vger.kernel.org wrote on 04/17/2009 06:56:04 AM:

> On Thu, 16 Apr 2009, David Miller wrote:
> 
> > No Christoph, do this right.
> >
> > Linux by default will behave the way it has for 15+ years.  And if an
> > application wants new behavior, you have to ask for it.
> >
> > End of story.
> 
> This is not right. All other OSes filter multicast traffic according to
> the multicast groups subscribed to (and that includes the evil one).

        This is not true.

> There is no requirement of asking for "new" behavior. Why should multicast
> applications have to add special code to request something that comes by
> default on other platforms?

        Linux is not Solaris. I think Solaris is wrong to change the
behavior from the original BSD behavior, but it should be no surprise
that there are other differences in the API's, too. It's not difficult
to write code that works as intended on both, and the case Solaris is
trying to avoid is not really avoided since you can still receive
unicast traffic, or totally unrelated multicast traffic on the shared
port and multicast address space. If the app doesn't use the port to
distinguish it, it simply should bind the multicast address it wants,
use PKTINFO, SO_BINDTODEVICE or the like as well. In your case, multiple
sockets or filtering based on the "to" address are possibilities that
work on Solaris too, and fix more unintended traffic problems than
just a different group.
        A per-socket option is a more trivial way to do this, but
turning it on for sockets that want the existing, intended and
long-standing behavior is obviously wrong.

                                                                +-DLS


David Miller April 17, 2009, 10:24 p.m. UTC | #25

From: Christoph Lameter <cl@linux.com>
Date: Fri, 17 Apr 2009 12:02:19 -0400 (EDT)

> On Fri, 17 Apr 2009, Nivedita Singhvi wrote:
> 
>> I need the current behaviour to not change, as it would
>> break some people I support.  DaveM is making the right
>> decision here, and I fully support this.
> 
> People or applications? There are applications that only run on Linux and
> fail on other OS? How does this work? Special casing depending on the OS
> running?

Christoph I just want to let you know that I'm totally ignoring
everything further you say on this issue, because you're way out of
line and totally ignoring the real issues here.

What's next?  Tomorrow, if you think Linux's open() system call
behavior doesn't suit your needs, I want you to send a sysctl patch to
Al Viro that changes the system wide behavior and we'll see how far
you get with that.

The fact is, you cannot just say "oops we didn't mean to do that" when
something has behaved a certain way, visible to users, for more than
15 years.

And the fact is, WE DID MEAN to do things this way.

As David Stevens explained, the original creator of multicasting, the
original BSD code, and the RFCs, INTENDED this behavior from the very
beginning.

You want to ignore all of this, as if none of it matters and that what
you want to achieve is so much more important.

Christoph Lameter (Ampere) April 20, 2009, 4:43 p.m. UTC | #26

On Fri, 17 Apr 2009, David Stevens wrote:

>         Linux is not Solaris. I think Solaris is wrong to change the
> behavior from the original BSD behavior, but it should be no surprise
> that there are other differences in the API's, too. It's not difficult
> to write code that works as intended on both, and the case Solaris is
> trying to avoid is not really avoided since you can still receive
> unicast traffic, or totally unrelated multicast traffic on the shared
> port and multicast address space. If the app doesn't use the port to

By that you mean unrelated multicast traffic destined to the same
multicast address and port?


Christoph Lameter (Ampere) April 20, 2009, 6:10 p.m. UTC | #27

On Fri, 17 Apr 2009, David Miller wrote:

> And the fact is, WE DID MEAN to do things this way.

I fully agree. We meant to do this.

> As David Stevens explained, the original creator of multicasting, the
> original BSD code, and the RFCs, INTENDED this behavior from the very
> beginning.
>
> You want to ignore all of this, as if none of it matters and that what
> you want to achieve is so much more important.

I am not ignoring it. It just seems that other OSes have moved away from this
and we are one of the last holdouts. It's not only Solaris but also BSD and
Windoze. Best to have a solution that is consistent across multiple OSes.


David Stevens April 20, 2009, 6:46 p.m. UTC | #28

Christoph Lameter <cl@linux.com> wrote on 04/20/2009 09:43:45 AM:

> On Fri, 17 Apr 2009, David Stevens wrote:
> 
> >         Linux is not Solaris. I think Solaris is wrong to change the
> > behavior from the original BSD behavior, but it should be no surprise
> > that there are other differences in the API's, too. It's not difficult
> > to write code that works as intended on both, and the case Solaris is
> > trying to avoid is not really avoided since you can still receive
> > unicast traffic, or totally unrelated multicast traffic on the shared
> > port and multicast address space. If the app doesn't use the port to
> 
> By that you mean unrelated multicast traffic destined to the same
> multicast address and port?

        Yes. If neither the port nor the multicast address is
registered, then anyone on your network can use them for anything.
Even if they are registered, someone may still use them; sending
requires no special privilege, and neither does joining groups or
binding to ports above 1024. Anyone on your network, or within
your multicast routing domain, may reuse both (even if they
intend it for a different machine) and your app will receive
them.

        I think generally the best approach is to bind to the
particular multicast address and use SO_BINDTODEVICE if it
matters to the app. But the app still has to handle receiving
data from a different source or totally unrelated data;
it certainly can receive those, because anyone can send those.

        I can see the value of a per-socket, default-off option
in the case where you want multiple groups on a single socket,
and I encourage you to submit that as a patch. It reduces the
work the receiver has to do, but doesn't eliminate it. The
way I'd do that is to use multiple sockets, one bound to each
group, but ok. As long as it doesn't change the existing
behavior out from under existing, unknown apps.

                                                                +-DLS

