Message ID | 1363625464-21633-2-git-send-email-dmitry@broadcom.com |
---|---|
State | Accepted, archived |
Delegated to: | David Miller |
Headers | show |
From: "Dmitry Kravkov" <dmitry@broadcom.com> Date: Mon, 18 Mar 2013 18:51:04 +0200 > The patch drives FW to perform RSS for GRE traffic, > based on inner headers. > > Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com> > Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Applied. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, 2013-03-18 at 18:51 +0200, Dmitry Kravkov wrote: > The patch drives FW to perform RSS for GRE traffic, > based on inner headers. > > Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com> > Signed-off-by: Eilon Greenstein <eilong@broadcom.com> > --- > drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h | 3 +++ > drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c | 23 ++++++++++++----------- > drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.h | 9 +++++++++ > 3 files changed, 24 insertions(+), 11 deletions(-) This works very well. Problem is we skb_set_queue_mapping(skb, 0); in __skb_tunnel_rx() (this was a patch from Tom Herbert, commit 693019e90ca45d881109d32c0c6d29adf03f6447 (net: reset skb queue mapping when rx'ing over tunnel ) Meaning we hit a single cpu for the GRO stuff in ip_gre. I have to think about it. Another question is : Can bnx2x check the tcp checksum if GRE encapsulated ? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> -----Original Message----- > From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org] On Behalf Of Eric Dumazet > Sent: Tuesday, March 19, 2013 2:08 AM > To: Dmitry Kravkov > Cc: davem@davemloft.net; netdev@vger.kernel.org; Eilon Greenstein; Tom Herbert; Maciej Żenczykowski > Subject: Re: [PATCH net-next 2/2] bnx2x: add RSS capability for GRE traffic > > On Mon, 2013-03-18 at 18:51 +0200, Dmitry Kravkov wrote: > > The patch drives FW to perform RSS for GRE traffic, > > based on inner headers. > > > > Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com> > > Signed-off-by: Eilon Greenstein <eilong@broadcom.com> > > --- > > drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h | 3 +++ > > drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c | 23 ++++++++++++----------- > > drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.h | 9 +++++++++ > > 3 files changed, 24 insertions(+), 11 deletions(-) > > This works very well. > > Problem is we skb_set_queue_mapping(skb, 0); in __skb_tunnel_rx() > > (this was a patch from Tom Herbert, commit > 693019e90ca45d881109d32c0c6d29adf03f6447 (net: reset skb queue mapping > when rx'ing over tunnel ) > > Meaning we hit a single cpu for the GRO stuff in ip_gre. > > I have to think about it. > > > Another question is : > > Can bnx2x check the tcp checksum if GRE encapsulated ? > Current HW can't provide this. Probably, it's possible to separate CSUM from GRO/TPA then stack will have to handle CSUM validation for huge packets. Is it worth?
Can the HW calculate and return a 1s complement sum of the entire packet (or a large portion there-of)? Fixing that up to be only of the outer IPv4, inner IPv4 and inner TCP relevant portions should still be simpler (well faster) than calculating the TCP checksum. I'm pretty sure that some relationship between 1s complement sum of all bytes, outer IPv4 checksum, inner IPv4 checksum and TCP checksum could be pulled out of a hat with some deeper thought. (similarly for IPv4/GRE/IPv6/TCP and other combinations) What portions of the packet can the HW/FW [partially] checksum - and return the value to the driver for further processing? Can it return 1s complement sum of data portion of outer IPv4 (ie. in IPv4/GRE/IPv4/TCP return a 1s complement sum of GRE/IPv4/TCP bytes) Maciej Żenczykowski, Kernel Networking Developer @ Google On Mon, Mar 18, 2013 at 11:30 PM, Dmitry Kravkov <dmitry@broadcom.com> wrote: >> -----Original Message----- >> From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org] On Behalf Of Eric Dumazet >> Sent: Tuesday, March 19, 2013 2:08 AM >> To: Dmitry Kravkov >> Cc: davem@davemloft.net; netdev@vger.kernel.org; Eilon Greenstein; Tom Herbert; Maciej Żenczykowski >> Subject: Re: [PATCH net-next 2/2] bnx2x: add RSS capability for GRE traffic >> >> On Mon, 2013-03-18 at 18:51 +0200, Dmitry Kravkov wrote: >> > The patch drives FW to perform RSS for GRE traffic, >> > based on inner headers. >> > >> > Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com> >> > Signed-off-by: Eilon Greenstein <eilong@broadcom.com> >> > --- >> > drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h | 3 +++ >> > drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c | 23 ++++++++++++----------- >> > drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.h | 9 +++++++++ >> > 3 files changed, 24 insertions(+), 11 deletions(-) >> >> This works very well. >> >> Problem is we skb_set_queue_mapping(skb, 0); in __skb_tunnel_rx() >> >> (this was a patch from Tom Herbert, commit >> 693019e90ca45d881109d32c0c6d29adf03f6447 (net: reset skb queue mapping >> when rx'ing over tunnel ) >> >> Meaning we hit a single cpu for the GRO stuff in ip_gre. >> >> I have to think about it. >> >> >> Another question is : >> >> Can bnx2x check the tcp checksum if GRE encapsulated ? >> > Current HW can't provide this. Probably, it's possible to separate CSUM from GRO/TPA then stack will have to handle CSUM validation for huge packets. Is it worth? > -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> -----Original Message----- > From: Maciej Żenczykowski [mailto:maze@google.com] > Sent: Tuesday, March 19, 2013 11:18 AM > To: Dmitry Kravkov > Cc: Eric Dumazet; davem@davemloft.net; netdev@vger.kernel.org; Eilon Greenstein; Tom Herbert > Subject: Re: [PATCH net-next 2/2] bnx2x: add RSS capability for GRE traffic > > Can the HW calculate and return a 1s complement sum of the entire > packet (or a large portion there-of)? No, it can't :( > Fixing that up to be only of the outer IPv4, inner IPv4 and inner TCP > relevant portions should still be simpler (well faster) than > calculating the TCP checksum. > I'm pretty sure that some relationship between 1s complement sum of > all bytes, outer IPv4 checksum, inner IPv4 checksum and TCP checksum > could be pulled out of a hat with some deeper thought. (similarly for > IPv4/GRE/IPv6/TCP and other combinations) > > What portions of the packet can the HW/FW [partially] checksum - and > return the value to the driver for further processing? > Can it return 1s complement sum of data portion of outer IPv4 (ie. in > IPv4/GRE/IPv4/TCP return a 1s complement sum of GRE/IPv4/TCP bytes) > > Maciej Żenczykowski, Kernel Networking Developer @ Google > > On Mon, Mar 18, 2013 at 11:30 PM, Dmitry Kravkov <dmitry@broadcom.com> wrote: > >> -----Original Message----- > >> From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org] On Behalf Of Eric Dumazet > >> Sent: Tuesday, March 19, 2013 2:08 AM > >> To: Dmitry Kravkov > >> Cc: davem@davemloft.net; netdev@vger.kernel.org; Eilon Greenstein; Tom Herbert; Maciej Żenczykowski > >> Subject: Re: [PATCH net-next 2/2] bnx2x: add RSS capability for GRE traffic > >> > >> On Mon, 2013-03-18 at 18:51 +0200, Dmitry Kravkov wrote: > >> > The patch drives FW to perform RSS for GRE traffic, > >> > based on inner headers. > >> > > >> > Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com> > >> > Signed-off-by: Eilon Greenstein <eilong@broadcom.com> > >> > --- > >> > drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h | 3 +++ > >> > drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c | 23 ++++++++++++----------- > >> > drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.h | 9 +++++++++ > >> > 3 files changed, 24 insertions(+), 11 deletions(-) > >> > >> This works very well. > >> > >> Problem is we skb_set_queue_mapping(skb, 0); in __skb_tunnel_rx() > >> > >> (this was a patch from Tom Herbert, commit > >> 693019e90ca45d881109d32c0c6d29adf03f6447 (net: reset skb queue mapping > >> when rx'ing over tunnel ) > >> > >> Meaning we hit a single cpu for the GRO stuff in ip_gre. > >> > >> I have to think about it. > >> > >> > >> Another question is : > >> > >> Can bnx2x check the tcp checksum if GRE encapsulated ? > >> > > Current HW can't provide this. Probably, it's possible to separate CSUM from GRO/TPA then stack will have to handle CSUM > validation for huge packets. Is it worth? > >
Please Maciej do not top post on lkml or netdev mailing lists. On Tue, 2013-03-19 at 02:18 -0700, Maciej Żenczykowski wrote: > Can the HW calculate and return a 1s complement sum of the entire > packet (or a large portion there-of)? > Fixing that up to be only of the outer IPv4, inner IPv4 and inner TCP > relevant portions should still be simpler (well faster) than > calculating the TCP checksum. > I'm pretty sure that some relationship between 1s complement sum of > all bytes, outer IPv4 checksum, inner IPv4 checksum and TCP checksum > could be pulled out of a hat with some deeper thought. (similarly for > IPv4/GRE/IPv6/TCP and other combinations) > > What portions of the packet can the HW/FW [partially] checksum - and > return the value to the driver for further processing? > Can it return 1s complement sum of data portion of outer IPv4 (ie. in > IPv4/GRE/IPv4/TCP return a 1s complement sum of GRE/IPv4/TCP bytes) > I assume Dmitry was speaking of this possibility, and our stack should handle this just fine. NIC providing these kind of checksums set : skb->ip_summed = CHECKSUM_COMPLETE; skb->csum = csum; before feeding the packet to the stack. When we pull some header, we have to call skb_postpull_rcsum() to adjust the skb->csum so that final check can be done. About 20 drivers currently provide these kind of checksumming. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> -----Original Message----- > From: Eric Dumazet [mailto:eric.dumazet@gmail.com] > Sent: Tuesday, March 19, 2013 2:26 PM > To: Maciej Żenczykowski > Cc: Dmitry Kravkov; davem@davemloft.net; netdev@vger.kernel.org; Eilon Greenstein; Tom Herbert > Subject: Re: [PATCH net-next 2/2] bnx2x: add RSS capability for GRE traffic > > Please Maciej do not top post on lkml or netdev mailing lists. > > On Tue, 2013-03-19 at 02:18 -0700, Maciej Żenczykowski wrote: > > Can the HW calculate and return a 1s complement sum of the entire > > packet (or a large portion there-of)? > > Fixing that up to be only of the outer IPv4, inner IPv4 and inner TCP > > relevant portions should still be simpler (well faster) than > > calculating the TCP checksum. > > I'm pretty sure that some relationship between 1s complement sum of > > all bytes, outer IPv4 checksum, inner IPv4 checksum and TCP checksum > > could be pulled out of a hat with some deeper thought. (similarly for > > IPv4/GRE/IPv6/TCP and other combinations) > > > > What portions of the packet can the HW/FW [partially] checksum - and > > return the value to the driver for further processing? > > Can it return 1s complement sum of data portion of outer IPv4 (ie. in > > IPv4/GRE/IPv4/TCP return a 1s complement sum of GRE/IPv4/TCP bytes) > > > > I assume Dmitry was speaking of this possibility, and our stack should > handle this just fine. In case of tunneling bnx2x HW can not provide csum of any portion of the packet. Flag for XSUM_NO_VALIDATION on cqe will be set for all gre packets. As a result driver will leave: skb->ip_summed = CHECKSUM_NONE; > > NIC providing these kind of checksums set : > > skb->ip_summed = CHECKSUM_COMPLETE; > skb->csum = csum; > > before feeding the packet to the stack. > > When we pull some header, we have to call skb_postpull_rcsum() > to adjust the skb->csum so that final check can be done. > > About 20 drivers currently provide these kind of checksumming. >
Dmitry, On Mon, Mar 18, 2013 at 11:30 PM, Dmitry Kravkov <dmitry@broadcom.com> wrote: >> -----Original Message----- >> From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org] On Behalf Of Eric Dumazet >> Sent: Tuesday, March 19, 2013 2:08 AM >> To: Dmitry Kravkov >> Cc: davem@davemloft.net; netdev@vger.kernel.org; Eilon Greenstein; Tom Herbert; Maciej Żenczykowski >> Subject: Re: [PATCH net-next 2/2] bnx2x: add RSS capability for GRE traffic >> >> On Mon, 2013-03-18 at 18:51 +0200, Dmitry Kravkov wrote: >> > The patch drives FW to perform RSS for GRE traffic, >> > based on inner headers. >> > >> > Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com> >> > Signed-off-by: Eilon Greenstein <eilong@broadcom.com> >> > --- >> > drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h | 3 +++ >> > drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c | 23 ++++++++++++----------- >> > drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.h | 9 +++++++++ >> > 3 files changed, 24 insertions(+), 11 deletions(-) >> >> This works very well. >> >> Problem is we skb_set_queue_mapping(skb, 0); in __skb_tunnel_rx() >> >> (this was a patch from Tom Herbert, commit >> 693019e90ca45d881109d32c0c6d29adf03f6447 (net: reset skb queue mapping >> when rx'ing over tunnel ) >> >> Meaning we hit a single cpu for the GRO stuff in ip_gre. >> >> I have to think about it. >> >> >> Another question is : >> >> Can bnx2x check the tcp checksum if GRE encapsulated ? >> > Current HW can't provide this. Probably, it's possible to separate CSUM from GRO/TPA then stack will have to handle CSUM validation for huge packets. Is it worth? Could you elaborate on what you meant above? (I'm looking for any form of h/w assist for csum validation for GRO/TPA pkts since csum computation can be expensive and as you said below CHECKSUM_COMPLETE is out of the question.) Thanks, Jerry > -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sat, Aug 17, 2013 at 8:53 PM, Jerry Chu <hkchu@google.com> wrote: > Dmitry, > > On Mon, Mar 18, 2013 at 11:30 PM, Dmitry Kravkov <dmitry@broadcom.com> wrote: >>> -----Original Message----- >>> From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org] On Behalf Of Eric Dumazet >>> Sent: Tuesday, March 19, 2013 2:08 AM >>> To: Dmitry Kravkov >>> Cc: davem@davemloft.net; netdev@vger.kernel.org; Eilon Greenstein; Tom Herbert; Maciej Żenczykowski >>> Subject: Re: [PATCH net-next 2/2] bnx2x: add RSS capability for GRE traffic >>> >>> On Mon, 2013-03-18 at 18:51 +0200, Dmitry Kravkov wrote: >>> > The patch drives FW to perform RSS for GRE traffic, >>> > based on inner headers. >>> > >>> > Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com> >>> > Signed-off-by: Eilon Greenstein <eilong@broadcom.com> >>> > --- >>> > drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h | 3 +++ >>> > drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c | 23 ++++++++++++----------- >>> > drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.h | 9 +++++++++ >>> > 3 files changed, 24 insertions(+), 11 deletions(-) >>> >>> This works very well. >>> >>> Problem is we skb_set_queue_mapping(skb, 0); in __skb_tunnel_rx() >>> >>> (this was a patch from Tom Herbert, commit >>> 693019e90ca45d881109d32c0c6d29adf03f6447 (net: reset skb queue mapping >>> when rx'ing over tunnel ) >>> >>> Meaning we hit a single cpu for the GRO stuff in ip_gre. >>> >>> I have to think about it. >>> >>> >>> Another question is : >>> >>> Can bnx2x check the tcp checksum if GRE encapsulated ? >>> >> Current HW can't provide this. Probably, it's possible to separate CSUM from GRO/TPA then stack will have to handle CSUM validation for huge packets. Is it worth? > > Could you elaborate on what you meant above? (I'm looking for any form > of h/w assist for > csum validation for GRO/TPA pkts since csum computation can be expensive and as > you said below CHECKSUM_COMPLETE is out of the question.) > Current bnx2x HW is not able to perform CSUM validation for encapsulated packets, so in any case host needs to do that. Today GRO/TPA feature depends on CSUM, but theoretically (i did not investigate it) and probably HW can provide aggregated packets w/o csum validation - this will save headers processing for host. > Thanks, > > Jerry > >> -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sat, 2013-08-17 at 21:52 +0300, Dmitry Kravkov wrote: > > Current bnx2x HW is not able to perform CSUM validation for > encapsulated packets, so in any case host needs to do that. > Today GRO/TPA feature depends on CSUM, but theoretically (i did not > investigate it) and probably HW can provide aggregated packets w/o > csum validation - this will save headers processing for host. I am not sure I understand this. Aggregation cannot be done if csums are not validated. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sat, Aug 17, 2013 at 10:01 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > On Sat, 2013-08-17 at 21:52 +0300, Dmitry Kravkov wrote: > >> >> Current bnx2x HW is not able to perform CSUM validation for >> encapsulated packets, so in any case host needs to do that. >> Today GRO/TPA feature depends on CSUM, but theoretically (i did not >> investigate it) and probably HW can provide aggregated packets w/o >> csum validation - this will save headers processing for host. > > I am not sure I understand this. > > Aggregation cannot be done if csums are not validated. > > Thanks. Eric. So we can't do with current bnx2x HW. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sat, Aug 17, 2013 at 12:01 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > On Sat, 2013-08-17 at 21:52 +0300, Dmitry Kravkov wrote: > >> >> Current bnx2x HW is not able to perform CSUM validation for >> encapsulated packets, so in any case host needs to do that. >> Today GRO/TPA feature depends on CSUM, but theoretically (i did not >> investigate it) and probably HW can provide aggregated packets w/o >> csum validation - this will save headers processing for host. > > I am not sure I understand this. > > Aggregation cannot be done if csums are not validated. Unless all the csums from aggregated pkts are also aggregated into one (so that the host computes s/w csum on the large pkt and validates against the aggregated csum...) There may be some performance benefit for doing this but it may arguably weaken already weak 1's complement csum. Jerry > > -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sun, 2013-08-18 at 04:55 -0700, Jerry Chu wrote: > On Sat, Aug 17, 2013 at 12:01 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > > On Sat, 2013-08-17 at 21:52 +0300, Dmitry Kravkov wrote: > > > >> > >> Current bnx2x HW is not able to perform CSUM validation for > >> encapsulated packets, so in any case host needs to do that. > >> Today GRO/TPA feature depends on CSUM, but theoretically (i did not > >> investigate it) and probably HW can provide aggregated packets w/o > >> csum validation - this will save headers processing for host. > > > > I am not sure I understand this. > > > > Aggregation cannot be done if csums are not validated. > > Unless all the csums from aggregated pkts are also aggregated into one > (so that the host computes s/w csum on the large pkt and validates against > the aggregated csum...) There may be some performance benefit for doing > this but it may arguably weaken already weak 1's complement csum. To aggregate two packets, you first have to make sure the checksums of each packet are ok. If the hardware does not validate checksum, then it cannot aggregate packets. Sure, CHECKSUM_COMPLETE support would be nice, so that linux can perform aggregation without having to bring into cpu cache the whole frame. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
+mwdalton, uday On Sun, Aug 18, 2013 at 8:37 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > On Sun, 2013-08-18 at 04:55 -0700, Jerry Chu wrote: >> On Sat, Aug 17, 2013 at 12:01 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: >> > On Sat, 2013-08-17 at 21:52 +0300, Dmitry Kravkov wrote: >> > >> >> >> >> Current bnx2x HW is not able to perform CSUM validation for >> >> encapsulated packets, so in any case host needs to do that. >> >> Today GRO/TPA feature depends on CSUM, but theoretically (i did not >> >> investigate it) and probably HW can provide aggregated packets w/o >> >> csum validation - this will save headers processing for host. >> > >> > I am not sure I understand this. >> > >> > Aggregation cannot be done if csums are not validated. >> >> Unless all the csums from aggregated pkts are also aggregated into one >> (so that the host computes s/w csum on the large pkt and validates against >> the aggregated csum...) There may be some performance benefit for doing >> this but it may arguably weaken already weak 1's complement csum. > > To aggregate two packets, you first have to make sure the checksums of > each packet are ok. If the hardware does not validate checksum, then it > cannot aggregate packets. It can if the csum validation is deferred. But this requires the stack code to aggregate csum from individual pkt header, which can get ugly as one needs to remove the header part from the individual csum first. It may also weaken the already weak TCP csum as i mentioned previously. Also one corrupted pkt will cause the whole GRO pkt to be thrown away. But I won't worry about it for the data center environment where pkt corruption is rare. > > Sure, CHECKSUM_COMPLETE support would be nice, so that linux can perform > aggregation without having to bring into cpu cache the whole frame. I wish CHECKSUM_COMPLETE can be supported by the h/w - we are currently running an older kernel w/o RSS for GRE support so all the GRE traffic between two IPs hits the same NIC queue. With my GRO-GRE patch (to be upstreamed later) I'm 20% below line rate (7.1Gbps) on 10GbE multi-TCP stream test because RSS will saturate a single CPU that is handling softirq. The majority of CPU cost was from bcopy (~20%) followed by csum computation (4%). If I comment out the csum validation in the GRO code then I hit line rate (9.1Gbps) right away on the same test. In order for me to saturate the line rate but without h/w CHECKSUM_COMPLETE support, one other alternative is to back port the RSS for GRE support, which seems a daunting task (if not please tell me). If I defer the csum validation after GRO then RPS will spread out traffic and the csum cost so we are no longer bottlenecked on one CPU (and hopefully hit the line rate). Jerry > > > -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sun, 2013-08-18 at 14:11 -0700, Jerry Chu wrote: > It can if the csum validation is deferred. But this requires the stack code to > aggregate csum from individual pkt header, which can get ugly as one needs > to remove the header part from the individual csum first. You can _not_ aggregate packets _before_ validating their checksum, or else you could end up with a correct final checksum while two or more segments had buggy checksums. Checksum are already quite weak, please do not make them even weaker ;) GRO should not throw out any segments, it is not its role. If a router receives a bunch of packets, it should forward them regardless of the (TCP) checksum being good or not. The final receiver has full responsibility for ultimately validate the checksums and update various SNMP counters. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sun, Aug 18, 2013 at 4:44 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > On Sun, 2013-08-18 at 14:11 -0700, Jerry Chu wrote: > >> It can if the csum validation is deferred. But this requires the stack code to >> aggregate csum from individual pkt header, which can get ugly as one needs >> to remove the header part from the individual csum first. > > You can _not_ aggregate packets _before_ validating their checksum, or > else you could end up with a correct final checksum while two or more > segments had buggy checksums. Checksum are already quite weak, please > do not make them even weaker ;) Yes I know how weak the 16bit 1's complement inet csum (see my previous email). I've had the pleasure to debug a number of bugs in the past where 16bit words were flipped by the buggy h/w hence went undetected by the weak TCP csum. But I also think it's a tradeoff call. For WAN it's probably a bad idea but for data center traffic, pkt corruption is usually rare. If the performance gain is large enough, perhaps it justifies this hack (?) > > GRO should not throw out any segments, it is not its role. I did not suggest GRO to throw away any pkt. GRO would just fix the csum field in the header of the aggregated pkt and passes it on. > > If a router receives a bunch of packets, it should forward them > regardless of the (TCP) checksum being good or not. It won't affect the forwarding path (i think). > > The final receiver has full responsibility for ultimately validate the > checksums and update various SNMP counters. When the GRO pkt with the csum field fixed (aggregated) get to the receiving TCP endpoint, it will be csum validated by the TCP stack because skb->ip_summed == CHECKSUM_NONE, and the pkt will be tossed if the csum check fails like non-GRO pkts. To be clear, this hack is not something I'd be proud of and I doubt the performance gain will be enough to justify it. This idea grew more out of desperation for not having csum offload for encap'ed pkts, which I think is a P0 bug! Jerry > > > -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h index 8f96372..f9098d8 100644 --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h @@ -973,6 +973,9 @@ static inline int bnx2x_func_start(struct bnx2x *bp) else /* CHIP_IS_E1X */ start_params->network_cos_mode = FW_WRR; + start_params->gre_tunnel_mode = IPGRE_TUNNEL; + start_params->gre_tunnel_rss = GRE_INNER_HEADERS_RSS; + return bnx2x_func_state_change(bp, &func_params); } diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c index 66ab259..5bdc1d6 100644 --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c @@ -5679,17 +5679,18 @@ static inline int bnx2x_func_send_start(struct bnx2x *bp, memset(rdata, 0, sizeof(*rdata)); /* Fill the ramrod data with provided parameters */ - rdata->function_mode = (u8)start_params->mf_mode; - rdata->sd_vlan_tag = cpu_to_le16(start_params->sd_vlan_tag); - rdata->path_id = BP_PATH(bp); - rdata->network_cos_mode = start_params->network_cos_mode; - - /* - * No need for an explicit memory barrier here as long we would - * need to ensure the ordering of writing to the SPQ element - * and updating of the SPQ producer which involves a memory - * read and we will have to put a full memory barrier there - * (inside bnx2x_sp_post()). + rdata->function_mode = (u8)start_params->mf_mode; + rdata->sd_vlan_tag = cpu_to_le16(start_params->sd_vlan_tag); + rdata->path_id = BP_PATH(bp); + rdata->network_cos_mode = start_params->network_cos_mode; + rdata->gre_tunnel_mode = start_params->gre_tunnel_mode; + rdata->gre_tunnel_rss = start_params->gre_tunnel_rss; + + /* No need for an explicit memory barrier here as long we would + * need to ensure the ordering of writing to the SPQ element + * and updating of the SPQ producer which involves a memory + * read and we will have to put a full memory barrier there + * (inside bnx2x_sp_post()). */ return bnx2x_sp_post(bp, RAMROD_CMD_ID_COMMON_FUNCTION_START, 0, diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.h index 064dba2..35479da 100644 --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.h +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.h @@ -1123,6 +1123,15 @@ struct bnx2x_func_start_params { /* Function cos mode */ u8 network_cos_mode; + + /* NVGRE classification enablement */ + u8 nvgre_clss_en; + + /* NO_GRE_TUNNEL/NVGRE_TUNNEL/L2GRE_TUNNEL/IPGRE_TUNNEL */ + u8 gre_tunnel_mode; + + /* GRE_OUTER_HEADERS_RSS/GRE_INNER_HEADERS_RSS/NVGRE_KEY_ENTROPY_RSS */ + u8 gre_tunnel_rss; }; struct bnx2x_func_switch_update_params {