From patchwork Fri Sep 11 08:48:41 2015
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Alin Serdean
X-Patchwork-Id: 516626
From: Alin Serdean
To: "dev@openvswitch.org"
Date: Fri, 11 Sep 2015 08:48:41 +0000
Message-ID: <1441961323-11256-1-git-send-email-aserdean@cloudbasesolutions.com>
Subject: [ovs-dev] [PATCH]
datapath-windows: Compute checksums for VXLAN inner packets

Windows does not support VXLAN hardware offloading.

Currently we do not compute the IP/TCP/UDP checksums for the inner packet.
This patch computes them according to the checksum offload settings
requested for the packet: if IP checksum offloading is enabled for the
inner packet, we compute the IP checksum in software. The same applies to
TCP and UDP packets.

This patch also revises the ones' complement computation over multiple
memory blocks so that it handles blocks with odd lengths.

Also, per the documentation:
https://msdn.microsoft.com/en-us/library/windows/hardware/ff568840%28v=vs.85%29.aspx
set the TCP flags FIN and PSH only on the last segment when LSO is enabled.

Signed-off-by: Alin Gabriel Serdean
---
This patch is also intended for branch 2.4
---
 datapath-windows/ovsext/BufferMgmt.c | 32 ++++++++++----
 datapath-windows/ovsext/Checksum.c   | 64 +++++++++++++++++----------
 datapath-windows/ovsext/Vxlan.c      | 83 +++++++++++++++++++++++++++++++-----
 3 files changed, 138 insertions(+), 41 deletions(-)

diff --git a/datapath-windows/ovsext/BufferMgmt.c b/datapath-windows/ovsext/BufferMgmt.c
index 3550e20..bf53fc3 100644
--- a/datapath-windows/ovsext/BufferMgmt.c
+++ b/datapath-windows/ovsext/BufferMgmt.c
@@ -1116,7 +1116,8 @@ GetSegmentHeaderInfo(PNET_BUFFER_LIST nbl,
  * --------------------------------------------------------------------------
  */
 static NDIS_STATUS
-FixSegmentHeader(PNET_BUFFER nb, UINT16 segmentSize, UINT32 seqNumber)
+FixSegmentHeader(PNET_BUFFER nb, UINT16 segmentSize, UINT32 seqNumber,
+                 BOOLEAN lastPacket, UINT16 packetCounter)
 {
     EthHdr *dstEth;
     IPHdr *dstIP;
@@ -1141,18 +1142,29 @@ FixSegmentHeader(PNET_BUFFER nb, UINT16 segmentSize, UINT32 seqNumber)
     /* Fix IP length and checksum */
     ASSERT(dstIP->protocol == IPPROTO_TCP);
     dstIP->tot_len = htons(segmentSize + dstIP->ihl * 4 + TCP_HDR_LEN(dstTCP));
+    dstIP->id += packetCounter;
     dstIP->check = 0;
     dstIP->check = IPChecksum((UINT8 *)dstIP, dstIP->ihl * 4, 0);
 
     /* Fix TCP checksum */
     dstTCP->seq = htonl(seqNumber);
-    dstTCP->check =
-        IPPseudoChecksum((UINT32 *)&dstIP->saddr,
-                         (UINT32 *)&dstIP->daddr,
-                         IPPROTO_TCP, segmentSize + TCP_HDR_LEN(dstTCP));
+
+    if (dstTCP->fin) {
+        dstTCP->fin = lastPacket;
+    }
+    if (dstTCP->psh) {
+        dstTCP->psh = lastPacket;
+    }
+
+    UINT16 csumLength = segmentSize + TCP_HDR_LEN(dstTCP);
+    dstTCP->check = IPPseudoChecksum(&dstIP->saddr,
+                                     &dstIP->daddr,
+                                     IPPROTO_TCP,
+                                     csumLength);
     dstTCP->check = CalculateChecksumNB(nb,
-            (UINT16)(NET_BUFFER_DATA_LENGTH(nb) - sizeof *dstEth - dstIP->ihl * 4),
-            sizeof *dstEth + dstIP->ihl * 4);
+                                        csumLength,
+                                        sizeof *dstEth + dstIP->ihl * 4);
+
     return STATUS_SUCCESS;
 }
 
@@ -1190,6 +1202,7 @@ OvsTcpSegmentNBL(PVOID ovsContext,
     NDIS_STATUS status;
     UINT16 segmentSize;
     ULONG copiedSize;
+    UINT16 packetCounter = 0;
 
     srcCtx = (POVS_BUFFER_CONTEXT)NET_BUFFER_LIST_CONTEXT_DATA_START(nbl);
     if (srcCtx == NULL || srcCtx->magic != OVS_CTX_MAGIC) {
@@ -1232,7 +1245,9 @@ OvsTcpSegmentNBL(PVOID ovsContext,
             goto nblcopy_error;
         }
 
-        status = FixSegmentHeader(newNb, segmentSize, seqNumber);
+        status = FixSegmentHeader(newNb, segmentSize, seqNumber,
+                                  NET_BUFFER_NEXT_NB(newNb) == NULL,
+                                  packetCounter);
         if (status != NDIS_STATUS_SUCCESS) {
             goto nblcopy_error;
         }
@@ -1241,6 +1256,7 @@ OvsTcpSegmentNBL(PVOID ovsContext,
         /* Move on to the next segment */
         size -= segmentSize;
         seqNumber += segmentSize;
+        packetCounter++;
     }
 
     status = OvsAllocateNBLContext(context, newNbl);
diff --git a/datapath-windows/ovsext/Checksum.c b/datapath-windows/ovsext/Checksum.c
index 510a094..5d9b035 100644
--- a/datapath-windows/ovsext/Checksum.c
+++ b/datapath-windows/ovsext/Checksum.c
@@ -68,34 +68,48 @@ CalculateOnesComplement(UINT8 *start,
 {
     UINT64 sum = 0, val;
     UINT64 *src = (UINT64 *)start;
-    union {
-        UINT32 val;
-        UINT8 b8[4];
-    } tmp;
-
     while (totalLength > 7) {
         val = *src;
-        sum += (val >> 32) + (val & 0xffffffff);
+        sum += val;
+        if (sum < val) sum++;
         src++;
         totalLength -= 8;
     }
+
+    start = (UINT8 *)src;
+
     if (totalLength > 3) {
-        sum += *(UINT32 *)src;
-        src = (UINT64 *)((UINT8 *)src + 4);
+        UINT32 val = *(UINT32 *)start;
+        sum += val;
+        if (sum < val) sum++;
+        start += 4;
         totalLength -= 4;
     }
-    start = (UINT8 *)src;
-    tmp.val = 0;
-    switch (totalLength) {
-    case 3:
-        tmp.b8[2] = start[2];
-    case 2:
-        tmp.b8[1] = start[1];
-    case 1:
-        tmp.b8[0] = start[0];
-        sum += tmp.val;
+
+    if (totalLength > 1) {
+        UINT16 val = *(UINT16 *)start;
+        sum += val;
+        if (sum < val) sum++;
+        start += 2;
+        totalLength -= 2;
     }
-    sum = (isEvenStart ? sum : swap64(sum)) + initial;
+
+    if (totalLength > 0) {
+        UINT8 val = *start;
+        sum += val;
+        if (sum < val) sum++;
+        start += 1;
+        totalLength -= 1;
+    }
+    ASSERT(totalLength == 0);
+
+    if (!isEvenStart) {
+        sum = _byteswap_uint64(sum);
+    }
+
+    sum += initial;
+    if (sum < initial) sum++;
+
     return sum;
 }
 
@@ -428,6 +442,7 @@ CalculateChecksumNB(const PNET_BUFFER nb,
     ULONG firstMdlLen;
     /* Running count of bytes in remainder of the MDLs including current. */
     ULONG packetLen;
+    BOOLEAN swapEnd = 1 & csumDataLen;
 
     if ((nb == NULL) || (csumDataLen == 0)
         || (offset >= NET_BUFFER_DATA_LENGTH(nb))
@@ -482,10 +497,8 @@ CalculateChecksumNB(const PNET_BUFFER nb,
     while (csumDataLen && (currentMdl != NULL)) {
         ASSERT(mdlLen < 65536);
         csLen = MIN((UINT16) mdlLen, csumDataLen);
-        //XXX Not handling odd bytes yet.
-        ASSERT(((csLen & 0x1) == 0) || csumDataLen <= mdlLen);
 
-        csum = CalculateOnesComplement(src, csLen, csum, TRUE);
+        csum = CalculateOnesComplement(src, csLen, csum, !(1 & csumDataLen));
         fold64(csum);
 
         csumDataLen -= csLen;
@@ -504,9 +517,14 @@ CalculateChecksumNB(const PNET_BUFFER nb,
         }
     }
 
+    fold64(csum);
     ASSERT(csumDataLen == 0);
     ASSERT((csum & ~0xffff) == 0);
-    return (UINT16) ~csum;
+    csum = (UINT16)~csum;
+    if (swapEnd) {
+        return _byteswap_ushort((UINT16)csum);
+    }
+    return (UINT16)csum;
 }
 
 /*
diff --git a/datapath-windows/ovsext/Vxlan.c b/datapath-windows/ovsext/Vxlan.c
index 2364f28..a179fbe 100644
--- a/datapath-windows/ovsext/Vxlan.c
+++ b/datapath-windows/ovsext/Vxlan.c
@@ -152,9 +152,9 @@ OvsCleanupVxlanTunnel(PIRP irp,
 
     if (vxlanPort->filterID != 0) {
         status = OvsTunnelFilterDelete(irp,
-            vxlanPort->filterID,
-            callback,
-            tunnelContext);
+                                       vxlanPort->filterID,
+                                       callback,
+                                       tunnelContext);
     } else {
         OvsFreeMemoryWithTag(vport->priv, OVS_VXLAN_POOL_TAG);
         vport->priv = NULL;
@@ -190,6 +190,9 @@ OvsDoEncapVxlan(POVS_VPORT_ENTRY vport,
     POVS_VXLAN_VPORT vportVxlan;
     UINT32 headRoom = OvsGetVxlanTunHdrSize();
     UINT32 packetLength;
+    NDIS_TCP_IP_CHECKSUM_NET_BUFFER_LIST_INFO csumInfo;
+    csumInfo.Value = NET_BUFFER_LIST_INFO(curNbl,
+                                          TcpIpChecksumNetBufferListInfo);
 
     /*
      * XXX: the assumption currently is that the NBL is owned by OVS, and
@@ -198,20 +201,24 @@ OvsDoEncapVxlan(POVS_VPORT_ENTRY vport,
      */
     curNb = NET_BUFFER_LIST_FIRST_NB(curNbl);
     packetLength = NET_BUFFER_DATA_LENGTH(curNb);
+
     if (layers->isTcp) {
         NDIS_TCP_LARGE_SEND_OFFLOAD_NET_BUFFER_LIST_INFO tsoInfo;
 
         tsoInfo.Value = NET_BUFFER_LIST_INFO(curNbl,
                                              TcpLargeSendNetBufferListInfo);
-        OVS_LOG_TRACE("MSS %u packet len %u", tsoInfo.LsoV1Transmit.MSS, packetLength);
+        OVS_LOG_TRACE("MSS %u packet len %u", tsoInfo.LsoV1Transmit.MSS,
+                      packetLength);
         if (tsoInfo.LsoV1Transmit.MSS) {
             OVS_LOG_TRACE("l4Offset %d", layers->l4Offset);
             *newNbl = OvsTcpSegmentNBL(switchContext, curNbl, layers,
-                        tsoInfo.LsoV1Transmit.MSS, headRoom);
+                                       tsoInfo.LsoV1Transmit.MSS, headRoom);
             if (*newNbl == NULL) {
                 OVS_LOG_ERROR("Unable to segment NBL");
                 return NDIS_STATUS_FAILURE;
             }
+            /* Clear out LSO flags after this point */
+            NET_BUFFER_LIST_INFO(curNbl, TcpLargeSendNetBufferListInfo) = 0;
         }
     }
 
@@ -226,6 +233,61 @@ OvsDoEncapVxlan(POVS_VPORT_ENTRY vport,
             OVS_LOG_ERROR("Unable to copy NBL");
             return NDIS_STATUS_FAILURE;
         }
+        /*
+         * To this point we do not have VXLAN offloading.
+         * Apply defined checksums
+         */
+        curNb = NET_BUFFER_LIST_FIRST_NB(*newNbl);
+        curMdl = NET_BUFFER_CURRENT_MDL(curNb);
+        bufferStart = (PUINT8)MmGetSystemAddressForMdlSafe(curMdl, LowPagePriority);
+        bufferStart += NET_BUFFER_CURRENT_MDL_OFFSET(curNb);
+
+        if (layers->isIPv4) {
+            IPHdr *ip = (IPHdr *)(bufferStart + layers->l3Offset);
+
+            if (csumInfo.Transmit.IpHeaderChecksum) {
+                ip->check = 0;
+                ip->check = IPChecksum((UINT8 *)ip, 4 * ip->ihl, 0);
+            }
+
+            if (layers->isTcp && csumInfo.Transmit.TcpChecksum) {
+                UINT16 csumLength = (UINT16)(packetLength - layers->l4Offset);
+                TCPHdr *tcp = (TCPHdr *)(bufferStart + layers->l4Offset);
+                tcp->check = IPPseudoChecksum(&ip->saddr, &ip->daddr,
+                                              IPPROTO_TCP, csumLength);
+                tcp->check = CalculateChecksumNB(curNb, csumLength,
+                                                 (UINT32)(layers->l4Offset));
+            }
+            else if (layers->isUdp && csumInfo.Transmit.UdpChecksum) {
+                UINT16 csumLength = (UINT16)(packetLength - layers->l4Offset);
+                UDPHdr *udp = (UDPHdr *)((PCHAR)ip + sizeof *ip);
+                udp->check = IPPseudoChecksum(&ip->saddr, &ip->daddr,
+                                              IPPROTO_UDP, csumLength);
+                udp->check = CalculateChecksumNB(curNb, csumLength,
+                                                 (UINT32)(layers->l4Offset));
+            }
+        } else if (layers->isIPv6) {
+            IPv6Hdr *ip = (IPv6Hdr *)(bufferStart + layers->l3Offset);
+
+            if (layers->isTcp && csumInfo.Transmit.TcpChecksum) {
+                UINT16 csumLength = (UINT16)(packetLength - layers->l4Offset);
+                TCPHdr *tcp = (TCPHdr *)(bufferStart + layers->l4Offset);
+                tcp->check = IPv6PseudoChecksum((UINT32 *) &ip->saddr,
+                                                (UINT32 *) &ip->daddr,
+                                                IPPROTO_TCP, csumLength);
+                tcp->check = CalculateChecksumNB(curNb, csumLength,
+                                                 (UINT32)(layers->l4Offset));
+            }
+            else if (layers->isUdp && csumInfo.Transmit.UdpChecksum) {
+                UINT16 csumLength = (UINT16)(packetLength - layers->l4Offset);
+                UDPHdr *udp = (UDPHdr *)((PCHAR)ip + sizeof *ip);
+                udp->check = IPv6PseudoChecksum((UINT32 *) &ip->saddr,
+                                                (UINT32 *) &ip->daddr,
+                                                IPPROTO_UDP, csumLength);
+                udp->check = CalculateChecksumNB(curNb, csumLength,
+                                                 (UINT32)(layers->l4Offset));
+            }
+        }
     }
 
     curNbl = *newNbl;
@@ -257,9 +319,6 @@ OvsDoEncapVxlan(POVS_VPORT_ENTRY vport,
                sizeof ethHdr->Destination + sizeof ethHdr->Source);
         ethHdr->Type = htons(ETH_TYPE_IPV4);
 
-        // XXX: question: there are fields in the OvsIPv4TunnelKey for ttl and such,
-        // should we use those values instead? or will they end up being
-        // uninitialized;
         /* IP header */
         ipHdr = (IPHdr *)((PCHAR)ethHdr + sizeof *ethHdr);
 
@@ -277,8 +336,12 @@ OvsDoEncapVxlan(POVS_VPORT_ENTRY vport,
         ASSERT(tunKey->src == fwdInfo->srcIpAddr || tunKey->src == 0);
         ipHdr->saddr = fwdInfo->srcIpAddr;
         ipHdr->daddr = fwdInfo->dstIpAddr;
-        ipHdr->check = 0;
-        ipHdr->check = IPChecksum((UINT8 *)ipHdr, sizeof *ipHdr, 0);
+
+        /* Compute IP checksum only if the NIC does not support offloading */
+        if (!csumInfo.Transmit.IpHeaderChecksum) {
+            ipHdr->check = 0;
+            ipHdr->check = IPChecksum((UINT8 *)ipHdr, sizeof *ipHdr, 0);
+        }
 
         /* UDP header */
         udpHdr = (UDPHdr *)((PCHAR)ipHdr + sizeof *ipHdr);
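
Note on the odd-length handling: the commit message says the ones' complement computation now copes with memory blocks of odd length. The sketch below is not the driver code. It is a minimal, portable illustration of the underlying property (RFC 1071): a block that begins at an odd byte offset can be summed on its own, and its 16-bit partial sum is byte-swapped before being added to the running total. The names fold16, block_sum, swap16 and checksum_blocks are hypothetical and appear nowhere in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: fold a wide sum to 16 bits with end-around carry. */
static uint16_t fold16(uint64_t sum)
{
    while (sum >> 16) {
        sum = (sum & 0xffff) + (sum >> 16);
    }
    return (uint16_t)sum;
}

/* Ones' complement sum of one block, read as big-endian 16-bit words.
 * A trailing odd byte is treated as the high byte of a zero-padded word. */
static uint16_t block_sum(const uint8_t *p, size_t len)
{
    uint64_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2) {
        sum += (uint64_t)((p[i] << 8) | p[i + 1]);
    }
    if (i < len) {
        sum += (uint64_t)p[i] << 8;
    }
    return fold16(sum);
}

/* Swap the two bytes of a 16-bit value. */
static uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/* Internet checksum over data scattered across n blocks of any length.
 * A block that starts at an odd offset within the overall data has its
 * partial sum byte-swapped, so every byte lands in the correct half of
 * its 16-bit word. */
static uint16_t checksum_blocks(const uint8_t *const *blocks,
                                const size_t *lens, size_t n)
{
    uint64_t sum = 0;
    size_t offset = 0;
    size_t i;

    assert(blocks != NULL && lens != NULL);
    for (i = 0; i < n; i++) {
        uint16_t part = block_sum(blocks[i], lens[i]);

        if (offset & 1) {
            part = swap16(part);
        }
        sum += part;
        offset += lens[i];
    }
    return (uint16_t)~fold16(sum);
}
```

With the sample bytes 00 01 f2 03 f4 f5 f6 f7 from RFC 1071, the checksum is 0x220d whether the data is passed as one 8-byte block, as blocks of 3 and 5 bytes, or as blocks of 1, 2 and 5 bytes.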
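
Note on the LSO flag handling: the commit message states that FIN and PSH are set only on the last segment when LSO is enabled. The following is a rough, standalone sketch of that rule, assuming a simplified view in which each segment is described by a sequence number, a payload length and a flags byte. The segment_info type and the plan_segments function are hypothetical and are not part of the driver; only the TCP flag bit values (FIN 0x01, PSH 0x08) are standard.

```c
#include <stddef.h>
#include <stdint.h>

#define TCP_FLAG_FIN 0x01u
#define TCP_FLAG_PSH 0x08u

/* Hypothetical per-segment description, for illustration only. */
typedef struct {
    uint32_t seq;    /* sequence number of the first payload byte */
    uint16_t len;    /* payload bytes carried by this segment */
    uint8_t  flags;  /* TCP flags to place in the segment header */
} segment_info;

/* Split payload_len bytes into segments of at most mss bytes each.
 * Every segment inherits the original flags, except that FIN and PSH
 * are cleared on all segments but the last one.
 * Returns the number of segments written to out (at most max_out). */
static size_t plan_segments(uint32_t seq, uint32_t payload_len, uint16_t mss,
                            uint8_t flags, segment_info *out, size_t max_out)
{
    size_t count = 0;

    if (mss == 0) {
        return 0;
    }
    while (payload_len > 0 && count < max_out) {
        uint16_t len = payload_len > mss ? mss : (uint16_t)payload_len;
        int last = (payload_len == len);

        out[count].seq = seq;
        out[count].len = len;
        out[count].flags = last
            ? flags
            : (uint8_t)(flags & ~(TCP_FLAG_FIN | TCP_FLAG_PSH));

        seq += len;
        payload_len -= len;
        count++;
    }
    return count;
}
```

For a 2500-byte payload with an MSS of 1000 and original flags FIN|PSH|ACK (0x19), this yields three segments: the first two carry only ACK (0x10) and the last carries the original flags.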