[2/2] sctp: fix heartbeat count of path failure

Message ID 1250665268-29770-2-git-send-email-chunbo.luo@windriver.com
State Rejected, archived
Delegated to: David Miller

Commit Message

chunbo.luo@windriver.com Aug. 19, 2009, 7:01 a.m. UTC
RFC 4960 Section 8.2 specifies that the transport should enter the INACTIVE
state only when the value in the error counter exceeds the protocol
parameter 'Path.Max.Retrans' of that destination address. This means
that the transport should enter the INACTIVE state only after pathmaxrxt+1
heartbeats have gone unacknowledged.


Signed-off-by: Chunbo Luo <chunbo.luo@windriver.com>
---
 net/sctp/sm_sideeffect.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

Comments

Vlad Yasevich Aug. 19, 2009, 2:48 p.m. UTC | #1
Chunbo Luo wrote:
> RFC 4960 Section 8.2 specifies that the transport should enter the INACTIVE
> state only when the value in the error counter exceeds the protocol
> parameter 'Path.Max.Retrans' of that destination address. This means
> that the transport should enter the INACTIVE state only after pathmaxrxt+1
> heartbeats have gone unacknowledged.
> 
> 
> Signed-off-by: Chunbo Luo <chunbo.luo@windriver.com>

NAK.  This patch seems to resurface periodically and I have to keep
explaining that it's wrong.

Every time we send a HB, we tick up the error count, and we clear it when
the HB-ACK is received.  Each HB is separate and not a retransmission,
so once we reach pathmaxrxt, we've already sent max+1 HBs and we
have timed out.  Walk through the code with some values and you'll see
what I mean.

-vlad

> ---
>  net/sctp/sm_sideeffect.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c
> index 86426aa..0e2e269 100644
> --- a/net/sctp/sm_sideeffect.c
> +++ b/net/sctp/sm_sideeffect.c
> @@ -447,7 +447,7 @@ static void sctp_do_8_2_transport_strike(struct sctp_association *asoc,
>  		asoc->overall_error_count++;
>  
>  	if (transport->state != SCTP_INACTIVE &&
> -	    (transport->error_count++ >= transport->pathmaxrxt)) {
> +	    (transport->error_count++ > transport->pathmaxrxt)) {
>  		SCTP_DEBUG_PRINTK_IPADDR("transport_strike:association %p",
>  					 " transport IP: port:%d failed.\n",
>  					 asoc,

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

chunbo.luo@windriver.com Aug. 20, 2009, 1:36 a.m. UTC | #2
On Wed, 2009-08-19 at 10:48 -0400, Vlad Yasevich wrote:
> Chunbo Luo wrote:
> > RFC 4960 Section 8.2 specifies that the transport should enter the INACTIVE
> > state only when the value in the error counter exceeds the protocol
> > parameter 'Path.Max.Retrans' of that destination address. This means
> > that the transport should enter the INACTIVE state only after pathmaxrxt+1
> > heartbeats have gone unacknowledged.
> > 
> > 
> > Signed-off-by: Chunbo Luo <chunbo.luo@windriver.com>
> 
> NAK.  This patch seems to resurface periodically and I have to keep
> explaining that it's wrong.
> 
> Every time we send a HB, we tick up the error count, and we clear it when
> the HB-ACK is received.  Each HB is separate and not a retransmission,
> so once we reach pathmaxrxt, we've already sent max+1 HBs and we
> have timed out.  Walk through the code with some values and you'll see
> what I mean.

Although we've already sent max+1 HBs, the code sets the transport to the
INACTIVE state immediately, which is equivalent to not sending the
(max+1)th HB at all.  We should wait for one more period and make sure the
(max+1)th HB was not acknowledged.

Chunbo 

> 
> -vlad
> 
> > ---
> >  net/sctp/sm_sideeffect.c |    2 +-
> >  1 files changed, 1 insertions(+), 1 deletions(-)
> > 
> > diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c
> > index 86426aa..0e2e269 100644
> > --- a/net/sctp/sm_sideeffect.c
> > +++ b/net/sctp/sm_sideeffect.c
> > @@ -447,7 +447,7 @@ static void sctp_do_8_2_transport_strike(struct sctp_association *asoc,
> >  		asoc->overall_error_count++;
> >  
> >  	if (transport->state != SCTP_INACTIVE &&
> > -	    (transport->error_count++ >= transport->pathmaxrxt)) {
> > +	    (transport->error_count++ > transport->pathmaxrxt)) {
> >  		SCTP_DEBUG_PRINTK_IPADDR("transport_strike:association %p",
> >  					 " transport IP: port:%d failed.\n",
> >  					 asoc,
> 

Vlad Yasevich Aug. 20, 2009, 2:10 p.m. UTC | #3
Luo Chunbo wrote:
> On Wed, 2009-08-19 at 10:48 -0400, Vlad Yasevich wrote:
>> Chunbo Luo wrote:
>>> RFC 4960 Section 8.2 specifies that the transport should enter the INACTIVE
>>> state only when the value in the error counter exceeds the protocol
>>> parameter 'Path.Max.Retrans' of that destination address. This means
>>> that the transport should enter the INACTIVE state only after pathmaxrxt+1
>>> heartbeats have gone unacknowledged.
>>>
>>>
>>> Signed-off-by: Chunbo Luo <chunbo.luo@windriver.com>
>> NAK.  This patch seems to resurface periodically and I have to keep
>> explaining that it's wrong.
>>
>> Every time we send a HB, we tick up the error count, and we clear it when
>> the HB-ACK is received.  Each HB is separate and not a retransmission,
>> so once we reach pathmaxrxt, we've already sent max+1 HBs and we
>> have timed out.  Walk through the code with some values and you'll see
>> what I mean.
> 
> Although we've already sent max+1 HBs, the code sets the transport to the
> INACTIVE state immediately, which is equivalent to not sending the
> (max+1)th HB at all.  We should wait for one more period and make sure the
> (max+1)th HB was not acknowledged.

Ok, I just re-read the section, and although we don't quite wait long
enough before marking the transport DOWN, this patch will cause us to
send an extra HB chunk, thus exceeding pathmaxrxt by 2.

What the spec really says is that the error count is incremented when an
outstanding HB is not acknowledged.

So this code needs a bit of a rework.  This patch is still NAKed.

-vlad

> 
> Chunbo 
> 
>> -vlad
>>
>>> ---
>>>  net/sctp/sm_sideeffect.c |    2 +-
>>>  1 files changed, 1 insertions(+), 1 deletions(-)
>>>
>>> diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c
>>> index 86426aa..0e2e269 100644
>>> --- a/net/sctp/sm_sideeffect.c
>>> +++ b/net/sctp/sm_sideeffect.c
>>> @@ -447,7 +447,7 @@ static void sctp_do_8_2_transport_strike(struct sctp_association *asoc,
>>>  		asoc->overall_error_count++;
>>>  
>>>  	if (transport->state != SCTP_INACTIVE &&
>>> -	    (transport->error_count++ >= transport->pathmaxrxt)) {
>>> +	    (transport->error_count++ > transport->pathmaxrxt)) {
>>>  		SCTP_DEBUG_PRINTK_IPADDR("transport_strike:association %p",
>>>  					 " transport IP: port:%d failed.\n",
>>>  					 asoc,
> 

Patch

diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c
index 86426aa..0e2e269 100644
--- a/net/sctp/sm_sideeffect.c
+++ b/net/sctp/sm_sideeffect.c
@@ -447,7 +447,7 @@  static void sctp_do_8_2_transport_strike(struct sctp_association *asoc,
 		asoc->overall_error_count++;
 
 	if (transport->state != SCTP_INACTIVE &&
-	    (transport->error_count++ >= transport->pathmaxrxt)) {
+	    (transport->error_count++ > transport->pathmaxrxt)) {
 		SCTP_DEBUG_PRINTK_IPADDR("transport_strike:association %p",
 					 " transport IP: port:%d failed.\n",
 					 asoc,