
igmp: Staggered igmp report intervals for unsolicited igmp reports

Message ID alpine.DEB.2.00.1009221631520.32661@router.home
State Changes Requested, archived
Delegated to: David Miller

Commit Message

Christoph Lameter (Ampere) Sept. 22, 2010, 9:33 p.m. UTC
On Wed, 22 Sep 2010, Bob Arendt wrote:

> multicast traffic is received. While IGMPv2 defines an "Unsolicited Report
> Interval" default of 10 seconds, it appears that this is a significant enough
> issue that the later IGMPv3 document calls out a default of 1 second, and
> goes on to define a "Robustness Variable" and talks about the same case that
> Christoph is trying to mitigate.

Actually that suggests a different way to reach the same goal:


Subject: igmp: Make unsolicited report interval conform to RFC3376

RFC3376 specifies a shorter time interval for sending igmp joins.
This can address issues where joins are slow because the initial join is
frequently lost.

Also increase the report count so that 10 reports are sent over a
few seconds.

Signed-off-by: Christoph Lameter <cl@linux.com>


---
 net/ipv4/igmp.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

David Stevens Sept. 22, 2010, 9:41 p.m. UTC | #1
Christoph Lameter <cl@linux.com> wrote on 09/22/2010 02:33:14 PM:

> This can address issues where joins are slow because the initial join is
> frequently lost.
> 
> Also increment the frequency so that we get a 10 reports send over a
> few seconds.

        Except you want to conform and not conform at the same time. :-)
IGMPv2 should be: default count 2, interval 10secs
IGMPv3 should be: default count 2, interval 1sec

...and no way is it a good idea to send 10 unsolicited reports on an
Ethernet.

I think system-wide defaults must be as suggested (which allows for
v3 being shortened to 1sec, but not v2) and if you want to use longer
values, you should have either a *per-interface* tunable [ie, the default
value for your interface only] or make these per-interface variables and
have the IB code bump them up for IB interfaces only. An attached
Ethernet on the same system shouldn't be using larger values unless
bumped for some reason by an administrator.

There is no problem with current values on Ethernet; let's not create
one. :-)

                                                                +-DLS
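
One way to picture the per-interface approach suggested above is a small parameter block per interface, with protocol defaults that a driver or an administrator can override. This is a sketch under stated assumptions, not kernel code: `struct igmp_if_params`, `igmp_if_params_default` and `igmp_if_params_override` are hypothetical names, and times are in seconds rather than jiffies.

```c
/* Hypothetical per-interface IGMP tunables (not an existing kernel struct). */
struct igmp_if_params {
    unsigned int version;              /* 2 or 3 */
    unsigned int unsolicited_interval; /* seconds between unsolicited reports */
    unsigned int unsolicited_count;    /* default report count */
};

/* Defaults as stated in the thread: v2 -> 10 s, v3 -> 1 s, count 2. */
static struct igmp_if_params igmp_if_params_default(unsigned int version)
{
    struct igmp_if_params p;

    p.version = version;
    p.unsolicited_interval = (version == 3) ? 1 : 10;
    p.unsolicited_count = 2;
    return p;
}

/* Override the values for one interface only, e.g. from an IB driver
 * or on request of an administrator. Other interfaces are untouched. */
static void igmp_if_params_override(struct igmp_if_params *p,
                                    unsigned int interval,
                                    unsigned int count)
{
    p->unsolicited_interval = interval;
    p->unsolicited_count = count;
}
```

With this layout an IB interface could be bumped with `igmp_if_params_override(&ib, 1, 10)` while an Ethernet interface on the same system keeps the defaults.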

Christoph Lameter (Ampere) Sept. 23, 2010, 3:37 p.m. UTC | #2
On Wed, 22 Sep 2010, David Stevens wrote:

> >
> > Also increment the frequency so that we get a 10 reports send over a
> > few seconds.
>
>         Except you want to conform and not conform at the same time. :-)
> IGMPv2 should be: default count 2, interval 10secs
> IGMPv3 should be: default count 2, interval 1sec

This is during the period of unsolicited igmp reports. We do not know if
this group is managed using V3 or V2 since no igmp query/report has been
received yet.

> ...and no way is it a good idea to send 10 unsolicited reports on an
> Ethernet.

Why would that be an issue?

The IGMPv2 RFC has no strict limit and RFC3376
mentions that the retransmission occurs "Robustness Variable" times
minus one. Choosing 10 for the "Robustness Variable" is certainly ok.

If we do not increase the number of reports but just shorten the interval,
then the chance that an outage of a second or so during mc group creation
causes routers to miss igmp reports is significantly increased.

David Stevens Sept. 27, 2010, 7:24 p.m. UTC | #3
Christoph Lameter <cl@linux.com> wrote on 09/23/2010 08:37:48 AM:

> 
> On Wed, 22 Sep 2010, David Stevens wrote:
> 
> > >
> > > Also increment the frequency so that we get a 10 reports send over a
> > > few seconds.
> >
> >         Except you want to conform and not conform at the same time. :-)
> > IGMPv2 should be: default count 2, interval 10secs
> > IGMPv3 should be: default count 2, interval 1sec
> 
> This is during the period of unsolicited igmp reports. We do not know if
> this group is managed using V3 or V2 since no igmp query/report has been
> received yet.

        The default is IGMPv3 unless a v2 querier is present. You can force
it to be IGMPv2 by having an IGMPv2 querier on the network or by using
the force_igmp_version tunable.

> > ...and no way is it a good idea to send 10 unsolicited reports on an
> > Ethernet.
> 
> Why would that be an issue?

        Because the traffic for all joins is multiplied by >3. If you're
joining 1 group, maybe that wouldn't be an issue, but what if I join
100, and what if hundreds of other hosts on that network do too? And
applications that dynamically join and leave groups may do this "normally."
Even 3 reports on switched networks with low loss is really unnecessary
overkill; 10 is just wasted bandwidth.

> The IGMPv2 RFC has no strict limit and RFC3376
> mentions that the retransmission occurs "Robustness Variable" times
> minus one. Choosing 10 for the "Robustness Variable" is certainly ok.

        Both of them specify the default value and say a querier is the
mechanism for changing that. If you want to follow the RFC, the default
is "2", not "10." While it'd be reasonable for a sysadmin to tune this
per-interface without a querier, it's not reasonable to make all linux
systems on all networks more than triple the number of reports they send
from the RFC-specified default. Right?!? :-)
 
> If we do not increase the number of reports but just limit the interval
> then the chance of outages of a second or so during mc group creation
> causing routers missing igmp reports is significantly increased.

        If you can't send on a group for 1 second, all of the initial
IGMPv3 reports will be lost about half of the time if we make that
conformant (it looks like it now uses the 10sec v2 time instead of the
1 sec v3 time it should). That's a problem IB needs to solve. Ideally,
you wouldn't want to return from the hardware join until you can actually
send the reports, but I expect there are locks held and that can't be 1
second of spinning on a processor. So, I think you really should put a queue in
IB for that hardware multicast address and send those packets when/if you
get positive acknowledgement (much as done for ARP completion, but maybe
queue more than 1) from the fabric that you can use it. If you don't get
any sort of ACK for that, then you can instrument a delay for it, but
any fixed number you use may be either too big or too small for a
particular fabric.

                                                                +-DLS
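
The scaling concern above can be put into rough numbers. The sketch below is illustrative only: `total_reports`, `reports_per_join` and `print_comparison` are hypothetical helpers, and the per-join figures (3 with a count of 2, 11 with a count of 10) assume one initial report plus `IGMP_Unsolicited_Report_Count` retransmissions, which is one reading of the ">3" multiplier.

```c
#include <stdio.h>

/* Assumption: one initial report plus `count` retransmissions per join. */
static unsigned long reports_per_join(unsigned long count)
{
    return count + 1;
}

/* Unsolicited reports seen on the network when every host joins
 * `groups_per_host` groups once. Hypothetical helper, not kernel code. */
static unsigned long total_reports(unsigned long hosts,
                                   unsigned long groups_per_host,
                                   unsigned long per_join)
{
    return hosts * groups_per_host * per_join;
}

/* Print the report totals for a count of 2 and a count of 10. */
static void print_comparison(unsigned long hosts, unsigned long groups)
{
    unsigned long before = total_reports(hosts, groups, reports_per_join(2));
    unsigned long after = total_reports(hosts, groups, reports_per_join(10));

    printf("count=2:  %lu reports\n", before);
    printf("count=10: %lu reports (x%.2f)\n",
           after, (double)after / (double)before);
}
```

For example, `print_comparison(300, 100)` models 300 hosts each joining 100 groups; the host count is an arbitrary stand-in for "hundreds of other hosts".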




Patch

Index: linux-2.6/net/ipv4/igmp.c
===================================================================
--- linux-2.6.orig/net/ipv4/igmp.c	2010-09-22 16:28:17.000000000 -0500
+++ linux-2.6/net/ipv4/igmp.c	2010-09-22 16:28:54.000000000 -0500
@@ -114,9 +114,9 @@ 

 #define IGMP_V1_Router_Present_Timeout		(400*HZ)
 #define IGMP_V2_Router_Present_Timeout		(400*HZ)
-#define IGMP_Unsolicited_Report_Interval	(10*HZ)
+#define IGMP_Unsolicited_Report_Interval	(HZ)
 #define IGMP_Query_Response_Interval		(10*HZ)
-#define IGMP_Unsolicited_Report_Count		2
+#define IGMP_Unsolicited_Report_Count		10


 #define IGMP_Initial_Report_Delay		(1)
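
To see what the two changed constants mean in wall-clock terms, here is a small user-space sketch. It assumes, following RFC 3376, that each retransmission is delayed by a random time drawn from (0, interval), so the worst case is `count * interval` and the average is about half of that. `HZ_ASSUMED` and the helper names are hypothetical; the real `HZ` depends on the kernel configuration.

```c
#include <stdio.h>

#define HZ_ASSUMED 1000UL /* assumption: ticks per second; kernel HZ varies */

/* Upper bound, in seconds, on the time spent retransmitting:
 * `count` retransmissions, each delayed by at most `interval_ticks`. */
static double worst_case_span_s(unsigned long interval_ticks,
                                unsigned long count)
{
    return (double)(interval_ticks * count) / (double)HZ_ASSUMED;
}

/* Mean span if each delay is uniform on (0, interval): half the bound. */
static double mean_span_s(unsigned long interval_ticks, unsigned long count)
{
    return worst_case_span_s(interval_ticks, count) / 2.0;
}

/* Compare the values before and after the patch. */
static void print_spans(void)
{
    printf("old (10*HZ, 2): worst %.1f s, mean %.1f s\n",
           worst_case_span_s(10 * HZ_ASSUMED, 2),
           mean_span_s(10 * HZ_ASSUMED, 2));
    printf("new (HZ, 10):   worst %.1f s, mean %.1f s\n",
           worst_case_span_s(HZ_ASSUMED, 10),
           mean_span_s(HZ_ASSUMED, 10));
}
```

Under these assumptions the patched values spread the reports over a shorter window than before while sending more of them, which is the trade-off the thread is arguing about.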