Message ID: 66945532-2470-4240-b213-bc36791b934b@huawei.com
State: Changes Requested
Delegated to: David Miller
Series: [net-next] tcp: Optimize the recovery of tcp when lack of SACK
On Fri, Jul 17, 2020 at 7:43 AM hujunwei <hujunwei4@huawei.com> wrote:
>
> From: Junwei Hu <hujunwei4@huawei.com>
>
> The document RFC2582 (https://tools.ietf.org/html/rfc2582)
> introduces two separate scenarios for tcp congestion control:
> There are two separate scenarios in which the TCP sender could
> receive three duplicate acknowledgements acknowledging "send_high"
> but no more than "send_high". One scenario would be that the data
> sender transmitted four packets with sequence numbers higher than
> "send_high", that the first packet was dropped in the network, and
> the following three packets triggered three duplicate
> acknowledgements acknowledging "send_high". The second scenario
> would be that the sender unnecessarily retransmitted three packets
> below "send_high", and that these three packets triggered three
> duplicate acknowledgements acknowledging "send_high". In the absence
> of SACK, the TCP sender is unable to distinguish between these two
> scenarios.
>
> We encountered the second scenario when third-party switches
> did not support SACK, and I used kprobes to find that TCP stayed
> in the CA_Loss state when high_seq was equal to snd_nxt.
>
> All of the packets are acked if high_seq is equal to snd_nxt, so the
> TCP sender is able to distinguish between the two scenarios described
> in RFC2582, and the current state can be switched.

Can you please elaborate on how the sender is able to distinguish
between the two scenarios, after your patch?

It seems to me that with this proposed patch, there is the risk of
spurious fast recoveries due to 3 dupacks in the second scenario
(the sender unnecessarily retransmitted three packets below
"send_high"). Can you please post a packetdrill test to demonstrate
that with this patch the TCP sender does not spuriously enter fast
recovery in such a scenario?

> This patch enhances the TCP congestion control algorithm for lack
> of SACK.

You describe this as an enhancement.
Can you please elaborate on the drawback/downside of staying in
CA_Loss in this case you are describing (where you used kprobes to
find that TCP stayed in the CA_Loss state when high_seq was equal to
snd_nxt)?

> Signed-off-by: Junwei Hu <hujunwei4@huawei.com>
> Reviewed-by: XiaoGang Wang <wangxiaogang3@huawei.com>
> Reviewed-by: Yiting Jin <jinyiting@huawei.com>
> ---
>  net/ipv4/tcp_input.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index 9615e72..d5573123 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -2385,7 +2385,8 @@ static bool tcp_try_undo_recovery(struct sock *sk)
>  	} else if (tp->rack.reo_wnd_persist) {
>  		tp->rack.reo_wnd_persist--;
>  	}
> -	if (tp->snd_una == tp->high_seq && tcp_is_reno(tp)) {
> +	if (tp->snd_una == tp->high_seq &&
> +	    tcp_is_reno(tp) && tp->snd_nxt > tp->high_seq) {

To deal with sequence number wrap-around, sequence number comparisons
in TCP need to use the before() and after() helpers, rather than
comparison operators. Here it seems the patch should use after()
rather than >. However, I think the larger concern is the concern
mentioned above.

best,
neal
On 2020/7/17 22:44, Neal Cardwell wrote:
> On Fri, Jul 17, 2020 at 7:43 AM hujunwei <hujunwei4@huawei.com> wrote:
>>
>> From: Junwei Hu <hujunwei4@huawei.com>
>>
>> The document RFC2582 (https://tools.ietf.org/html/rfc2582)
>> introduces two separate scenarios for tcp congestion control:
>
> Can you please elaborate on how the sender is able to distinguish
> between the two scenarios, after your patch?
>
> It seems to me that with this proposed patch, there is the risk of
> spurious fast recoveries due to 3 dupacks in the second scenario
> (the sender unnecessarily retransmitted three packets below
> "send_high"). Can you please post a packetdrill test to demonstrate
> that with this patch the TCP sender does not spuriously enter fast
> recovery in such a scenario?
>
Hi Neal,
Thanks for your quick reply!
What I want to say is that when these three numbers, snd_una, high_seq
and snd_nxt, are the same, all data outstanding when the Loss state
began has successfully been acknowledged. So it is time for the sender
to exit to the Open state.
I'm not sure whether my understanding is correct.

>> This patch enhances the TCP congestion control algorithm for lack
>> of SACK.
>
> You describe this as an enhancement. Can you please elaborate on the
> drawback/downside of staying in CA_Loss in this case you are
> describing (where you used kprobes to find that TCP stayed in CA_Loss
> state when high_seq was equal to snd_nxt)?
>
I tried, but I can't reproduce it with packetdrill. This problem
appeared in our production environment.
Here is part of the trace message:

First ack:
#tcp_ack: (tcp_ack+0x0/0x920) skb_tcp_seq=0x1dc21196 skb_tcp_ack_seq=0x9d5e4bcc(3427491485)
packets_out=4 retrans_out=1 sacked_out=0 lost_out=4 snd_nxt=3427491485
snd_una=3427485917 high_seq=3427491485 reordering=1 mss_cache=1392
icsk_ca_state=4 sack_ok=0 undo_retrans=1 snd_cwnd=1

#tcp_fastretrans_alert: (tcp_fastretrans_alert+0x0/0x7b0) prior_snd_una=3427485917
num_dupack=0 packets_out=0 retrans_out=0 sacked_out=0 lost_out=0
snd_nxt=3427491485 snd_una=3427491485 high_seq=3427491485 reordering=1
mss_cache=1392 icsk_ca_state=4 sack_ok=0 undo_retrans=1 snd_cwnd=1

As we can see from tcp_fastretrans_alert, icsk_ca_state remains CA_Loss (4),
and the numbers snd_nxt, snd_una and high_seq are the same.

First dup ack:
#tcp_ack: (tcp_ack+0x0/0x920) skb_tcp_seq=0x1dc21196 skb_tcp_ack_seq=0x9d5e4bcc(3427491485)
packets_out=2 retrans_out=0 sacked_out=0 lost_out=0 snd_nxt=3427494269
snd_una=3427491485 high_seq=3427491485 reordering=1 mss_cache=1392
icsk_ca_state=4 sack_ok=0 undo_retrans=1 snd_cwnd=2

#tcp_fastretrans_alert: (tcp_fastretrans_alert+0x0/0x7b0) num_dupack=1 packets_out=2
retrans_out=0 sacked_out=0 lost_out=0 snd_nxt=3427494269 snd_una=3427491485
high_seq=3427491485 reordering=1 icsk_ca_state=4 sack_ok=0 undo_retrans=1 snd_cwnd=2

Second dup ack:
#tcp_ack: (tcp_ack+0x0/0x920) skb_tcp_seq=0x1dc21196 skb_tcp_ack_seq=0x9d5e4bcc(3427491485)
packets_out=4 retrans_out=0 sacked_out=0 lost_out=0 snd_nxt=3427497053
snd_una=3427491485 high_seq=3427491485 reordering=1 mss_cache=1392
icsk_ca_state=4 sack_ok=0 undo_retrans=1 snd_cwnd=4

So, I really hope someone can answer whether my understanding is correct.

> To deal with sequence number wrap-around, sequence number comparisons
> in TCP need to use the before() and after() helpers, rather than
> comparison operators. Here it seems the patch should use after()
> rather than >.
> However, I think the larger concern is the concern
> mentioned above.

If this patch is useful, I will modify this.

Regards
Junwei
On Sat, Jul 18, 2020 at 6:43 AM hujunwei <hujunwei4@huawei.com> wrote:
>
> On 2020/7/17 22:44, Neal Cardwell wrote:
> > On Fri, Jul 17, 2020 at 7:43 AM hujunwei <hujunwei4@huawei.com> wrote:
> >>
> >> From: Junwei Hu <hujunwei4@huawei.com>
> >>
> >> The document RFC2582 (https://tools.ietf.org/html/rfc2582)
> >> introduces two separate scenarios for tcp congestion control:
> >
> > Can you please elaborate on how the sender is able to distinguish
> > between the two scenarios, after your patch?
> >
> > It seems to me that with this proposed patch, there is the risk of
> > spurious fast recoveries due to 3 dupacks in the second scenario
> > (the sender unnecessarily retransmitted three packets below
> > "send_high"). Can you please post a packetdrill test to demonstrate
> > that with this patch the TCP sender does not spuriously enter fast
> > recovery in such a scenario?
> >
> Hi Neal,
> Thanks for your quick reply!
> What I want to say is that when these three numbers, snd_una, high_seq
> and snd_nxt, are the same, all data outstanding when the Loss state
> began has successfully been acknowledged.

Yes, that seems true.

> So it is time for the sender to exit to the Open state.
> I'm not sure whether my understanding is correct.

I don't think we want the sender to exit to the CA_Open state in these
circumstances. I think section 5 ("Avoiding Multiple Fast
Retransmits") of RFC 2582 states convincingly that senders should take
steps to avoid having duplicate acknowledgements at high_seq trigger a
new fast recovery. Linux TCP implements those steps by *not* exiting
to the Open state, and instead staying in CA_Loss or CA_Recovery.

To make things more concrete, here is the kind of timeline/scenario I
am concerned about with your proposed patch.
I have not had cycles to cook a packetdrill test like this, but this
is the basic idea:

[connection does not have SACK or TCP timestamps enabled]
app writes 4*SMSS
send packets P1, P2, P3, P4
TLP, spurious retransmit of P4
spurious RTO, set cwnd to 1, enter CA_Loss, retransmit P1
receive ACK for P1 (original copy)
slow-start, increase cwnd to 2, retransmit P2, P3
receive ACK for P2 (original copy)
slow-start, increase cwnd to 3, retransmit P4
receive ACK for P3 (original copy)
slow-start, increase cwnd to 4
receive ACK for P4 (original copy)
slow-start, increase cwnd to 5
[with your patch, at this point the sender does not meet the
conditions for "Hold old state until something *above* high_seq is ACKed.",
so the sender exits CA_Loss and enters Open]
app writes 4*MSS
send P5, P6, P7, P8
receive dupack for P4 (due to spurious TLP retransmit of P4)
receive dupack for P4 (due to spurious CA_Loss retransmit of P1)
receive dupack for P4 (due to spurious CA_Loss retransmit of P2)
[with your patch, at this point we risk spuriously entering
fast recovery because we have received 3 duplicate ACKs for P4]

A packetdrill test that shows that this is not the behavior of your
proposed patch would help support your proposed patch (presuming > is
replaced by after()).

best,
neal

> >> This patch enhances the TCP congestion control algorithm for lack
> >> of SACK.
> >
> > You describe this as an enhancement. Can you please elaborate on the
> > drawback/downside of staying in CA_Loss in this case you are
> > describing (where you used kprobes to find that TCP stayed in CA_Loss
> > state when high_seq was equal to snd_nxt)?
> >
> I tried, but I can't reproduce it with packetdrill. This problem
> appeared in our production environment.
> Here is part of the trace message:
>
> First ack:
> #tcp_ack: (tcp_ack+0x0/0x920) skb_tcp_seq=0x1dc21196 skb_tcp_ack_seq=0x9d5e4bcc(3427491485)
> packets_out=4 retrans_out=1 sacked_out=0 lost_out=4 snd_nxt=3427491485
> snd_una=3427485917 high_seq=3427491485 reordering=1 mss_cache=1392
> icsk_ca_state=4 sack_ok=0 undo_retrans=1 snd_cwnd=1
>
> #tcp_fastretrans_alert: (tcp_fastretrans_alert+0x0/0x7b0) prior_snd_una=3427485917
> num_dupack=0 packets_out=0 retrans_out=0 sacked_out=0 lost_out=0
> snd_nxt=3427491485 snd_una=3427491485 high_seq=3427491485 reordering=1
> mss_cache=1392 icsk_ca_state=4 sack_ok=0 undo_retrans=1 snd_cwnd=1
>
> As we can see from tcp_fastretrans_alert, icsk_ca_state remains CA_Loss (4),
> and the numbers snd_nxt, snd_una and high_seq are the same.
>
> First dup ack:
> #tcp_ack: (tcp_ack+0x0/0x920) skb_tcp_seq=0x1dc21196 skb_tcp_ack_seq=0x9d5e4bcc(3427491485)
> packets_out=2 retrans_out=0 sacked_out=0 lost_out=0 snd_nxt=3427494269
> snd_una=3427491485 high_seq=3427491485 reordering=1 mss_cache=1392
> icsk_ca_state=4 sack_ok=0 undo_retrans=1 snd_cwnd=2
>
> #tcp_fastretrans_alert: (tcp_fastretrans_alert+0x0/0x7b0) num_dupack=1 packets_out=2
> retrans_out=0 sacked_out=0 lost_out=0 snd_nxt=3427494269 snd_una=3427491485
> high_seq=3427491485 reordering=1 icsk_ca_state=4 sack_ok=0 undo_retrans=1 snd_cwnd=2
>
> Second dup ack:
> #tcp_ack: (tcp_ack+0x0/0x920) skb_tcp_seq=0x1dc21196 skb_tcp_ack_seq=0x9d5e4bcc(3427491485)
> packets_out=4 retrans_out=0 sacked_out=0 lost_out=0 snd_nxt=3427497053
> snd_una=3427491485 high_seq=3427491485 reordering=1 mss_cache=1392
> icsk_ca_state=4 sack_ok=0 undo_retrans=1 snd_cwnd=4
>
> So, I really hope someone can answer whether my understanding is correct.
>
> > To deal with sequence number wrap-around, sequence number comparisons
> > in TCP need to use the before() and after() helpers, rather than
> > comparison operators. Here it seems the patch should use after()
> > rather than >.
> > However, I think the larger concern is the concern
> > mentioned above.
>
> If this patch is useful, I will modify this.
>
> Regards
> Junwei
On 2020/7/19 11:29, Neal Cardwell wrote:
> On Sat, Jul 18, 2020 at 6:43 AM hujunwei <hujunwei4@huawei.com> wrote:
>>
>> On 2020/7/17 22:44, Neal Cardwell wrote:
>>> On Fri, Jul 17, 2020 at 7:43 AM hujunwei <hujunwei4@huawei.com> wrote:
>>>>
>>>> From: Junwei Hu <hujunwei4@huawei.com>
>>>>
>>>> The document RFC2582 (https://tools.ietf.org/html/rfc2582)
>>>> introduces two separate scenarios for tcp congestion control:
>>>
>>> Can you please elaborate on how the sender is able to distinguish
>>> between the two scenarios, after your patch?
>>>
>>> It seems to me that with this proposed patch, there is the risk of
>>> spurious fast recoveries due to 3 dupacks in the second scenario
>>> (the sender unnecessarily retransmitted three packets below
>>> "send_high"). Can you please post a packetdrill test to demonstrate
>>> that with this patch the TCP sender does not spuriously enter fast
>>> recovery in such a scenario?
>>>
>> Hi Neal,
>> Thanks for your quick reply!
>> What I want to say is that when these three numbers, snd_una, high_seq
>> and snd_nxt, are the same, all data outstanding when the Loss state
>> began has successfully been acknowledged.
>
> Yes, that seems true.
>
>> So it is time for the sender to exit to the Open state.
>> I'm not sure whether my understanding is correct.
>
> I don't think we want the sender to exit to the CA_Open state in these
> circumstances. I think section 5 ("Avoiding Multiple Fast
> Retransmits") of RFC 2582 states convincingly that senders should take
> steps to avoid having duplicate acknowledgements at high_seq trigger a
> new fast recovery. Linux TCP implements those steps by *not* exiting
> to the Open state, and instead staying in CA_Loss or CA_Recovery.

Thanks Neal,
After reading section 5 of RFC 2582, I think you are right.

> To make things more concrete, here is the kind of timeline/scenario I
> am concerned about with your proposed patch.
> I have not had cycles to cook a packetdrill test like this, but this
> is the basic idea:
>
> [connection does not have SACK or TCP timestamps enabled]
> app writes 4*SMSS
> send packets P1, P2, P3, P4
> TLP, spurious retransmit of P4
> spurious RTO, set cwnd to 1, enter CA_Loss, retransmit P1
> receive ACK for P1 (original copy)
> slow-start, increase cwnd to 2, retransmit P2, P3
> receive ACK for P2 (original copy)
> slow-start, increase cwnd to 3, retransmit P4
> receive ACK for P3 (original copy)
> slow-start, increase cwnd to 4
> receive ACK for P4 (original copy)
> slow-start, increase cwnd to 5
> [with your patch, at this point the sender does not meet the
> conditions for "Hold old state until something *above* high_seq is ACKed.",
> so the sender exits CA_Loss and enters Open]
> app writes 4*MSS
> send P5, P6, P7, P8
> receive dupack for P4 (due to spurious TLP retransmit of P4)
> receive dupack for P4 (due to spurious CA_Loss retransmit of P1)
> receive dupack for P4 (due to spurious CA_Loss retransmit of P2)
> [with your patch, at this point we risk spuriously entering
> fast recovery because we have received 3 duplicate ACKs for P4]
>
> A packetdrill test that shows that this is not the behavior of your
> proposed patch would help support your proposed patch (presuming > is
> replaced by after()).
>
> best,
> neal

Thank you for your detailed answer; I feel that I learned a lot.

Regards
Junwei
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 9615e72..d5573123 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2385,7 +2385,8 @@ static bool tcp_try_undo_recovery(struct sock *sk)
 	} else if (tp->rack.reo_wnd_persist) {
 		tp->rack.reo_wnd_persist--;
 	}
-	if (tp->snd_una == tp->high_seq && tcp_is_reno(tp)) {
+	if (tp->snd_una == tp->high_seq &&
+	    tcp_is_reno(tp) && tp->snd_nxt > tp->high_seq) {
 		/* Hold old state until something *above* high_seq
 		 * is ACKed. For Reno it is MUST to prevent false
 		 * fast retransmits (RFC2582). SACK TCP is safe. */