From patchwork Sun Oct 30 13:30:02 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Stephen Finucane
X-Patchwork-Id: 688949
X-Patchwork-Delegate: rbryant@redhat.com
From: Stephen Finucane
To: dev@openvswitch.org
Date: Sun, 30 Oct 2016 13:30:02 +0000
Subject: [ovs-dev] [PATCH 16/23] doc: Convert datapath/README to rST
Message-Id: <1477834209-11414-17-git-send-email-stephen@that.guru>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1477834209-11414-1-git-send-email-stephen@that.guru>
References: <1477834209-11414-1-git-send-email-stephen@that.guru>

Signed-off-by: Stephen Finucane
---
 datapath/Modules.mk      |   2 +-
 datapath/README.md       | 265 -----------------------------------------------
 datapath/README.rst      | 265 +++++++++++++++++++++++++++++++++++++++++++++++
 lib/dpif.h               |   2 +-
 vtep/README.ovs-vtep.rst |   2 +-
 5 files changed, 268 insertions(+), 268 deletions(-)
 delete mode 100644 datapath/README.md
 create mode 100644 datapath/README.rst

diff --git a/datapath/Modules.mk b/datapath/Modules.mk
index 8e9a169..2ffab2b 100644
--- a/datapath/Modules.mk
+++ b/datapath/Modules.mk
@@ -46,7 +46,7 @@ openvswitch_headers = \
 	vport-netdev.h
 
 openvswitch_extras = \
-	README.md
+	README.rst
 
 dist_sources = $(foreach module,$(dist_modules),$($(module)_sources))
 dist_headers = $(foreach module,$(dist_modules),$($(module)_headers))
diff --git a/datapath/README.md b/datapath/README.md
deleted file mode 100644
index 8faecc0..0000000
--- a/datapath/README.md
+++ /dev/null
@@ -1,265 +0,0 @@
-Open vSwitch datapath developer documentation
-=============================================
-
-The Open vSwitch kernel module allows flexible userspace control over
-flow-level packet processing on selected network devices. It can be
-used to implement a plain Ethernet switch, network device bonding,
-VLAN processing, network access control, flow-based network control,
-and so on.
-
-The kernel module implements multiple "datapaths" (analogous to
-bridges), each of which can have multiple "vports" (analogous to ports
-within a bridge). Each datapath also has associated with it a "flow
-table" that userspace populates with "flows" that map from keys based
-on packet headers and metadata to sets of actions. The most common
-action forwards the packet to another vport; other actions are also
-implemented.
-
-When a packet arrives on a vport, the kernel module processes it by
-extracting its flow key and looking it up in the flow table. If there
-is a matching flow, it executes the associated actions. If there is
-no match, it queues the packet to userspace for processing (as part of
-its processing, userspace will likely set up a flow to handle further
-packets of the same type entirely in-kernel).
-
-
-Flow key compatibility
------------------------
-
-Network protocols evolve over time. New protocols become important
-and existing protocols lose their prominence. For the Open vSwitch
-kernel module to remain relevant, it must be possible for newer
-versions to parse additional protocols as part of the flow key. It
It -might even be desirable, someday, to drop support for parsing -protocols that have become obsolete. Therefore, the Netlink interface -to Open vSwitch is designed to allow carefully written userspace -applications to work with any version of the flow key, past or future. - -To support this forward and backward compatibility, whenever the -kernel module passes a packet to userspace, it also passes along the -flow key that it parsed from the packet. Userspace then extracts its -own notion of a flow key from the packet and compares it against the -kernel-provided version: - - - If userspace's notion of the flow key for the packet matches the - kernel's, then nothing special is necessary. - - - If the kernel's flow key includes more fields than the userspace - version of the flow key, for example if the kernel decoded IPv6 - headers but userspace stopped at the Ethernet type (because it - does not understand IPv6), then again nothing special is - necessary. Userspace can still set up a flow in the usual way, - as long as it uses the kernel-provided flow key to do it. - - - If the userspace flow key includes more fields than the - kernel's, for example if userspace decoded an IPv6 header but - the kernel stopped at the Ethernet type, then userspace can - forward the packet manually, without setting up a flow in the - kernel. This case is bad for performance because every packet - that the kernel considers part of the flow must go to userspace, - but the forwarding behavior is correct. (If userspace can - determine that the values of the extra fields would not affect - forwarding behavior, then it could set up a flow anyway.) - -How flow keys evolve over time is important to making this work, so -the following sections go into detail. - - -Flow key format ---------------- - -A flow key is passed over a Netlink socket as a sequence of Netlink -attributes. 
Some attributes represent packet metadata, defined as any -information about a packet that cannot be extracted from the packet -itself, e.g. the vport on which the packet was received. Most -attributes, however, are extracted from headers within the packet, -e.g. source and destination addresses from Ethernet, IP, or TCP -headers. - -The header file defines the exact format of the -flow key attributes. For informal explanatory purposes here, we write -them as comma-separated strings, with parentheses indicating arguments -and nesting. For example, the following could represent a flow key -corresponding to a TCP packet that arrived on vport 1: - - in_port(1), eth(src=e0:91:f5:21:d0:b2, dst=00:02:e3:0f:80:a4), - eth_type(0x0800), ipv4(src=172.16.0.20, dst=172.18.0.52, proto=17, tos=0, - frag=no), tcp(src=49163, dst=80) - -Often we ellipsize arguments not important to the discussion, e.g.: - - in_port(1), eth(...), eth_type(0x0800), ipv4(...), tcp(...) - - -Wildcarded flow key format --------------------------- - -A wildcarded flow is described with two sequences of Netlink attributes -passed over the Netlink socket. A flow key, exactly as described above, and an -optional corresponding flow mask. - -A wildcarded flow can represent a group of exact match flows. Each '1' bit -in the mask specifies an exact match with the corresponding bit in the flow key. -A '0' bit specifies a don't care bit, which will match either a '1' or '0' bit -of an incoming packet. Using a wildcarded flow can improve the flow set up rate -by reducing the number of new flows that need to be processed by the user space -program. - -Support for the mask Netlink attribute is optional for both the kernel and user -space program. The kernel can ignore the mask attribute, installing an exact -match flow, or reduce the number of don't care bits in the kernel to less than -what was specified by the user space program. 
In this case, variations in bits -that the kernel does not implement will simply result in additional flow setups. -The kernel module will also work with user space programs that neither support -nor supply flow mask attributes. - -Since the kernel may ignore or modify wildcard bits, it can be difficult for -the userspace program to know exactly what matches are installed. There are -two possible approaches: reactively install flows as they miss the kernel -flow table (and therefore not attempt to determine wildcard changes at all) -or use the kernel's response messages to determine the installed wildcards. - -When interacting with userspace, the kernel should maintain the match portion -of the key exactly as originally installed. This will provides a handle to -identify the flow for all future operations. However, when reporting the -mask of an installed flow, the mask should include any restrictions imposed -by the kernel. - -The behavior when using overlapping wildcarded flows is undefined. It is the -responsibility of the user space program to ensure that any incoming packet -can match at most one flow, wildcarded or not. The current implementation -performs best-effort detection of overlapping wildcarded flows and may reject -some but not all of them. However, this behavior may change in future versions. - - -Unique flow identifiers ------------------------ - -An alternative to using the original match portion of a key as the handle for -flow identification is a unique flow identifier, or "UFID". UFIDs are optional -for both the kernel and user space program. - -User space programs that support UFID are expected to provide it during flow -setup in addition to the flow, then refer to the flow using the UFID for all -future operations. The kernel is not required to index flows by the original -flow key if a UFID is specified. 
-
-
-Basic rule for evolving flow keys
-----------------------------------
-
-Some care is needed to really maintain forward and backward
-compatibility for applications that follow the rules listed under
-"Flow key compatibility" above.
-
-The basic rule is obvious:
-
-    ------------------------------------------------------------------
-    New network protocol support must only supplement existing flow
-    key attributes. It must not change the meaning of already defined
-    flow key attributes.
-    ------------------------------------------------------------------
-
-This rule does have less-obvious consequences so it is worth working
-through a few examples. Suppose, for example, that the kernel module
-did not already implement VLAN parsing. Instead, it just interpreted
-the 802.1Q TPID (0x8100) as the Ethertype then stopped parsing the
-packet. The flow key for any packet with an 802.1Q header would look
-essentially like this, ignoring metadata:
-
-    eth(...), eth_type(0x8100)
-
-Naively, to add VLAN support, it makes sense to add a new "vlan" flow
-key attribute to contain the VLAN tag, then continue to decode the
-encapsulated headers beyond the VLAN tag using the existing field
-definitions. With this change, a TCP packet in VLAN 10 would have a
-flow key much like this:
-
-    eth(...), vlan(vid=10, pcp=0), eth_type(0x0800), ip(proto=6, ...), tcp(...)
-
-But this change would negatively affect a userspace application that
-has not been updated to understand the new "vlan" flow key attribute.
-The application could, following the flow compatibility rules above,
-ignore the "vlan" attribute that it does not understand and therefore
-assume that the flow contained IP packets. This is a bad assumption
-(the flow only contains IP packets if one parses and skips over the
-802.1Q header) and it could cause the application's behavior to change
-across kernel versions even though it follows the compatibility rules.
-
-The solution is to use a set of nested attributes. This is, for
-example, why 802.1Q support uses nested attributes. A TCP packet in
-VLAN 10 is actually expressed as:
-
-    eth(...), eth_type(0x8100), vlan(vid=10, pcp=0), encap(eth_type(0x0800),
-    ip(proto=6, ...), tcp(...)))
-
-Notice how the "eth_type", "ip", and "tcp" flow key attributes are
-nested inside the "encap" attribute. Thus, an application that does
-not understand the "vlan" key will not see either of those attributes
-and therefore will not misinterpret them. (Also, the outer eth_type
-is still 0x8100, not changed to 0x0800.)
-
-Handling malformed packets
----------------------------
-
-Don't drop packets in the kernel for malformed protocol headers, bad
-checksums, etc. This would prevent userspace from implementing a
-simple Ethernet switch that forwards every packet.
-
-Instead, in such a case, include an attribute with "empty" content.
-It doesn't matter if the empty content could be valid protocol values,
-as long as those values are rarely seen in practice, because userspace
-can always forward all packets with those values to userspace and
-handle them individually.
-
-For example, consider a packet that contains an IP header that
-indicates protocol 6 for TCP, but which is truncated just after the IP
-header, so that the TCP header is missing. The flow key for this
-packet would include a tcp attribute with all-zero src and dst, like
-this:
-
-    eth(...), eth_type(0x0800), ip(proto=6, ...), tcp(src=0, dst=0)
-
-As another example, consider a packet with an Ethernet type of 0x8100,
-indicating that a VLAN TCI should follow, but which is truncated just
-after the Ethernet type. The flow key for this packet would include
-an all-zero-bits vlan and an empty encap attribute, like this:
-
-    eth(...), eth_type(0x8100), vlan(0), encap()
-
-Unlike a TCP packet with source and destination ports 0, an
-all-zero-bits VLAN TCI is not that rare, so the CFI bit (aka
-VLAN_TAG_PRESENT inside the kernel) is ordinarily set in a vlan
-attribute expressly to allow this situation to be distinguished.
-Thus, the flow key in this second example unambiguously indicates a
-missing or malformed VLAN TCI.
-
-Other rules
-------------
-
-The other rules for flow keys are much less subtle:
-
-    - Duplicate attributes are not allowed at a given nesting level.
-
-    - Ordering of attributes is not significant.
-
-    - When the kernel sends a given flow key to userspace, it always
-      composes it the same way. This allows userspace to hash and
-      compare entire flow keys that it may not be able to fully
-      interpret.
-
-
-Coding rules
-============
-
-Compatibility
---------------
-
-Please implement the headers and codes for compatibility with older kernel
-in linux/compat/ directory. All public functions should be exported using
-EXPORT_SYMBOL macro. Public function replacing the same-named kernel
-function should be prefixed with 'rpl_'. Otherwise, the function should be
-prefixed with 'ovs_'. For special case when it is not possible to follow
-this rule (e.g., the pskb_expand_head() function), the function name must
-be added to linux/compat/build-aux/export-check-whitelist, otherwise, the
-compilation check 'check-export-symbol' will fail.
diff --git a/datapath/README.rst b/datapath/README.rst
new file mode 100644
index 0000000..47e0e23
--- /dev/null
+++ b/datapath/README.rst
@@ -0,0 +1,265 @@
+..
+      Licensed under the Apache License, Version 2.0 (the "License"); you may
+      not use this file except in compliance with the License. You may obtain
+      a copy of the License at
+
+          http://www.apache.org/licenses/LICENSE-2.0
+
+      Unless required by applicable law or agreed to in writing, software
+      distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+      WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+      License for the specific language governing permissions and limitations
+      under the License.
+
+      Convention for heading levels in Open vSwitch documentation:
+
+      =======  Heading 0 (reserved for the title in a document)
+      -------  Heading 1
+      ~~~~~~~  Heading 2
+      +++++++  Heading 3
+      '''''''  Heading 4
+
+      Avoid deeper levels because they do not render well.
+
+=======================================
+Open vSwitch Datapath Development Guide
+=======================================
+
+The Open vSwitch kernel module allows flexible userspace control over
+flow-level packet processing on selected network devices. It can be used to
+implement a plain Ethernet switch, network device bonding, VLAN processing,
+network access control, flow-based network control, and so on.
+
+The kernel module implements multiple "datapaths" (analogous to bridges), each
+of which can have multiple "vports" (analogous to ports within a bridge). Each
+datapath also has associated with it a "flow table" that userspace populates
+with "flows" that map from keys based on packet headers and metadata to sets of
+actions. The most common action forwards the packet to another vport; other
+actions are also implemented.
+
+When a packet arrives on a vport, the kernel module processes it by extracting
+its flow key and looking it up in the flow table. If there is a matching flow,
+it executes the associated actions. If there is no match, it queues the packet
+to userspace for processing (as part of its processing, userspace will likely
+set up a flow to handle further packets of the same type entirely in-kernel).
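The lookup-or-upcall cycle described above can be modeled in a few lines of C. This is a minimal, hypothetical sketch, not the kernel module's code. The ``toy_*`` names, the three-field key, the fixed-size linear flow table, and the single "output to vport" action are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified flow key.  The real module exchanges
 * keys as Netlink attributes and matches on many more fields. */
struct toy_key {
    uint32_t in_port;    /* Metadata: vport the packet arrived on. */
    uint16_t eth_type;   /* Extracted from the Ethernet header. */
    uint8_t ip_proto;    /* Extracted from the IP header, 0 if none. */
};

/* A flow maps a key to actions; the only action modeled here is
 * "output to vport". */
struct toy_flow {
    struct toy_key key;
    uint32_t out_port;
};

#define TOY_MAX_FLOWS 16
#define TOY_UPCALL UINT32_MAX   /* Sentinel meaning "queued to userspace". */

struct toy_datapath {
    struct toy_flow flows[TOY_MAX_FLOWS];
    size_t n_flows;
};

static bool
toy_key_equal(const struct toy_key *a, const struct toy_key *b)
{
    /* Compare field by field so struct padding never affects the result. */
    return a->in_port == b->in_port
           && a->eth_type == b->eth_type
           && a->ip_proto == b->ip_proto;
}

/* Fast path: look up the extracted key.  On a hit, return the output
 * vport; on a miss, return TOY_UPCALL to model queueing the packet to
 * userspace. */
static uint32_t
toy_dp_receive(const struct toy_datapath *dp, const struct toy_key *key)
{
    for (size_t i = 0; i < dp->n_flows; i++) {
        if (toy_key_equal(&dp->flows[i].key, key)) {
            return dp->flows[i].out_port;
        }
    }
    return TOY_UPCALL;
}

/* Slow path: after deciding how to forward the packet, userspace installs
 * a flow so that later packets with the same key stay in the kernel. */
static void
toy_userspace_install(struct toy_datapath *dp, const struct toy_key *key,
                      uint32_t out_port)
{
    assert(dp->n_flows < TOY_MAX_FLOWS);
    dp->flows[dp->n_flows].key = *key;
    dp->flows[dp->n_flows].out_port = out_port;
    dp->n_flows++;
}
```

Starting from an empty ``struct toy_datapath``, the first ``toy_dp_receive()`` call for a key returns ``TOY_UPCALL``. Once ``toy_userspace_install()`` has added a flow for that key, the same call returns the installed output port. A production flow table would use hashing rather than this linear scan.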
+
+Flow Key Compatibility
+----------------------
+
+Network protocols evolve over time. New protocols become important and
+existing protocols lose their prominence. For the Open vSwitch kernel module
+to remain relevant, it must be possible for newer versions to parse additional
+protocols as part of the flow key. It might even be desirable, someday, to
+drop support for parsing protocols that have become obsolete. Therefore, the
+Netlink interface to Open vSwitch is designed to allow carefully written
+userspace applications to work with any version of the flow key, past or
+future.
+
+To support this forward and backward compatibility, whenever the kernel module
+passes a packet to userspace, it also passes along the flow key that it parsed
+from the packet. Userspace then extracts its own notion of a flow key from the
+packet and compares it against the kernel-provided version:
+
+- If userspace's notion of the flow key for the packet matches the kernel's,
+  then nothing special is necessary.
+
+- If the kernel's flow key includes more fields than the userspace version of
+  the flow key, for example if the kernel decoded IPv6 headers but userspace
+  stopped at the Ethernet type (because it does not understand IPv6), then
+  again nothing special is necessary. Userspace can still set up a flow in the
+  usual way, as long as it uses the kernel-provided flow key to do it.
+
+- If the userspace flow key includes more fields than the kernel's, for example
+  if userspace decoded an IPv6 header but the kernel stopped at the Ethernet
+  type, then userspace can forward the packet manually, without setting up a
+  flow in the kernel. This case is bad for performance because every packet
+  that the kernel considers part of the flow must go to userspace, but the
+  forwarding behavior is correct. (If userspace can determine that the values
+  of the extra fields would not affect forwarding behavior, then it could set
+  up a flow anyway.)
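The three cases above can be expressed as a small decision function. The following hypothetical C sketch models each flow key only by which attributes a parser extracted, with one bit per attribute and field values ignored. The ``FK_*`` bits, ``enum upcall_decision``, and ``classify_upcall()`` are invented for illustration and are not Open vSwitch APIs. Mapping the "could set up a flow anyway" case to "install using the kernel's key" is an interpretation of the text, not a documented rule.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: one bit per flow key attribute that a parser extracted.
 * These are not the real OVS_KEY_ATTR_* values. */
enum {
    FK_IN_PORT  = 1u << 0,
    FK_ETH      = 1u << 1,
    FK_ETH_TYPE = 1u << 2,
    FK_IPV6     = 1u << 3,
    FK_TCP      = 1u << 4,
};

/* Hypothetical outcomes of comparing the two notions of the flow key. */
enum upcall_decision {
    INSTALL_FLOW,              /* Keys agree: set up a flow as usual. */
    INSTALL_WITH_KERNEL_KEY,   /* Set up a flow using the kernel's key. */
    FORWARD_IN_USERSPACE,      /* Forward manually; install no flow. */
};

/* Decide how to handle an upcall, given the attributes the kernel parsed
 * and the attributes userspace parsed from the same packet. */
static enum upcall_decision
classify_upcall(uint32_t kernel_fields, uint32_t user_fields,
                bool extra_fields_affect_forwarding)
{
    /* Attributes that userspace understood but the kernel did not. */
    uint32_t user_only = user_fields & ~kernel_fields;

    if (kernel_fields == user_fields) {
        return INSTALL_FLOW;
    }
    if (!user_only) {
        /* The kernel parsed more than userspace (e.g. IPv6 headers that
         * userspace does not understand): reuse the kernel's key. */
        return INSTALL_WITH_KERNEL_KEY;
    }
    /* Userspace parsed fields the kernel did not.  A kernel flow can only
     * match on the kernel's fields, so unless the extra fields are known
     * not to matter, such packets must be handled in userspace. */
    return extra_fields_affect_forwarding ? FORWARD_IN_USERSPACE
                                          : INSTALL_WITH_KERNEL_KEY;
}
```

For example, take ``base = FK_IN_PORT | FK_ETH | FK_ETH_TYPE``. A kernel key of ``base | FK_IPV6`` compared against a userspace key of ``base`` yields ``INSTALL_WITH_KERNEL_KEY``. The reverse yields ``FORWARD_IN_USERSPACE``, unless the caller states that the extra IPv6 fields do not affect forwarding.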
+
+How flow keys evolve over time is important to making this work, so
+the following sections go into detail.
+
+Flow Key Format
+---------------
+
+A flow key is passed over a Netlink socket as a sequence of Netlink attributes.
+Some attributes represent packet metadata, defined as any information about a
+packet that cannot be extracted from the packet itself, e.g. the vport on which
+the packet was received. Most attributes, however, are extracted from headers
+within the packet, e.g. source and destination addresses from Ethernet, IP, or
+TCP headers.
+
+The ```` header file defines the exact format of the flow
+key attributes. For informal explanatory purposes here, we write them as
+comma-separated strings, with parentheses indicating arguments and nesting.
+For example, the following could represent a flow key corresponding to a TCP
+packet that arrived on vport 1::
+
+    in_port(1), eth(src=e0:91:f5:21:d0:b2, dst=00:02:e3:0f:80:a4),
+    eth_type(0x0800), ipv4(src=172.16.0.20, dst=172.18.0.52, proto=6, tos=0,
+    frag=no), tcp(src=49163, dst=80)
+
+Often we ellipsize arguments not important to the discussion, e.g.::
+
+    in_port(1), eth(...), eth_type(0x0800), ipv4(...), tcp(...)
+
+Wildcarded Flow Key Format
+--------------------------
+
+A wildcarded flow is described with two sequences of Netlink attributes passed
+over the Netlink socket: a flow key, exactly as described above, and an
+optional corresponding flow mask.
+
+A wildcarded flow can represent a group of exact match flows. Each ``1`` bit
+in the mask specifies an exact match with the corresponding bit in the flow key.
+A ``0`` bit specifies a don't care bit, which will match either a ``1`` or
+``0`` bit of an incoming packet. Using a wildcarded flow can improve the flow
+setup rate by reducing the number of new flows that need to be processed by
+the user space program.
+
+Support for the mask Netlink attribute is optional for both the kernel and user
+space program. The kernel can ignore the mask attribute, installing an exact
+match flow, or reduce the number of don't care bits in the kernel to fewer than
+what was specified by the user space program. In this case, variations in bits
+that the kernel does not implement will simply result in additional flow
+setups. The kernel module will also work with user space programs that neither
+support nor supply flow mask attributes.
+
+Since the kernel may ignore or modify wildcard bits, it can be difficult for
+the userspace program to know exactly what matches are installed. There are two
+possible approaches: reactively install flows as they miss the kernel flow
+table (and therefore not attempt to determine wildcard changes at all) or use
+the kernel's response messages to determine the installed wildcards.
+
+When interacting with userspace, the kernel should maintain the match portion
+of the key exactly as originally installed. This provides a handle to
+identify the flow for all future operations. However, when reporting the mask
+of an installed flow, the mask should include any restrictions imposed by the
+kernel.
+
+The behavior when using overlapping wildcarded flows is undefined. It is the
+responsibility of the user space program to ensure that any incoming packet can
+match at most one flow, wildcarded or not. The current implementation performs
+best-effort detection of overlapping wildcarded flows and may reject some but
+not all of them. However, this behavior may change in future versions.
+
+Unique Flow Identifiers
+-----------------------
+
+An alternative to using the original match portion of a key as the handle for
+flow identification is a unique flow identifier, or "UFID". UFIDs are optional
+for both the kernel and user space program.
+
+User space programs that support UFID are expected to provide it during flow
+setup in addition to the flow, then refer to the flow using the UFID for all
+future operations. The kernel is not required to index flows by the original
+flow key if a UFID is specified.
+
+Basic Rule for Evolving Flow Keys
+---------------------------------
+
+Some care is needed to really maintain forward and backward compatibility for
+applications that follow the rules listed under `Flow Key Compatibility`_ above.
+
+The basic rule is obvious:
+
+    New network protocol support must only supplement existing flow key
+    attributes. It must not change the meaning of already defined flow key
+    attributes.
+
+This rule does have less-obvious consequences, so it is worth working through a
+few examples. Suppose, for example, that the kernel module did not already
+implement VLAN parsing. Instead, it just interpreted the 802.1Q TPID
+(``0x8100``) as the Ethertype and then stopped parsing the packet. The flow key
+for any packet with an 802.1Q header would look essentially like this, ignoring
+metadata::
+
+    eth(...), eth_type(0x8100)
+
+Naively, to add VLAN support, it makes sense to add a new "vlan" flow key
+attribute to contain the VLAN tag, then continue to decode the encapsulated
+headers beyond the VLAN tag using the existing field definitions. With this
+change, a TCP packet in VLAN 10 would have a flow key much like this::
+
+    eth(...), vlan(vid=10, pcp=0), eth_type(0x0800), ip(proto=6, ...), tcp(...)
+
+But this change would negatively affect a userspace application that has not
+been updated to understand the new "vlan" flow key attribute. The application
+could, following the flow compatibility rules above, ignore the "vlan"
+attribute that it does not understand and therefore assume that the flow
+contained IP packets. This is a bad assumption (the flow only contains IP
+packets if one parses and skips over the 802.1Q header) and it could cause the
+application's behavior to change across kernel versions even though it follows
+the compatibility rules.
+
+The solution is to use a set of nested attributes. This is, for example, why
+802.1Q support uses nested attributes. A TCP packet in VLAN 10 is actually
+expressed as::
+
+    eth(...), eth_type(0x8100), vlan(vid=10, pcp=0), encap(eth_type(0x0800),
+    ip(proto=6, ...), tcp(...))
+
+Notice how the ``eth_type``, ``ip``, and ``tcp`` flow key attributes are nested
+inside the ``encap`` attribute. Thus, an application that does not understand
+the ``vlan`` key will not see any of those attributes and therefore will not
+misinterpret them. (Also, the outer ``eth_type`` is still ``0x8100``, not
+changed to ``0x0800``.)
+
+Handling Malformed Packets
+--------------------------
+
+Don't drop packets in the kernel for malformed protocol headers, bad checksums,
+etc. This would prevent userspace from implementing a simple Ethernet switch
+that forwards every packet.
+
+Instead, in such a case, include an attribute with "empty" content. It doesn't
+matter if the empty content could be valid protocol values, as long as those
+values are rarely seen in practice, because userspace can always arrange for
+packets with those values to be sent to userspace and handled individually.
+
+For example, consider a packet that contains an IP header that indicates
+protocol 6 for TCP, but which is truncated just after the IP header, so that
+the TCP header is missing. The flow key for this packet would include a tcp
+attribute with all-zero ``src`` and ``dst``, like this::
+
+    eth(...), eth_type(0x0800), ip(proto=6, ...), tcp(src=0, dst=0)
+
+As another example, consider a packet with an Ethernet type of ``0x8100``,
+indicating that a VLAN TCI should follow, but which is truncated just after the
+Ethernet type. The flow key for this packet would include an all-zero-bits
+vlan and an empty encap attribute, like this::
+
+    eth(...), eth_type(0x8100), vlan(0), encap()
+
+Unlike a TCP packet with source and destination ports 0, an all-zero-bits VLAN
+TCI is not that rare, so the CFI bit (aka ``VLAN_TAG_PRESENT`` inside the kernel)
+is ordinarily set in a vlan attribute expressly to allow this situation to be
+distinguished. Thus, the flow key in this second example unambiguously
+indicates a missing or malformed VLAN TCI.
+
+Other Rules
+-----------
+
+The other rules for flow keys are much less subtle:
+
+- Duplicate attributes are not allowed at a given nesting level.
+
+- Ordering of attributes is not significant.
+
+- When the kernel sends a given flow key to userspace, it always composes it
+  the same way. This allows userspace to hash and compare entire flow keys
+  that it may not be able to fully interpret.
+
+Coding Rules
+------------
+
+Implement the headers and code for compatibility with older kernels in the
+``linux/compat/`` directory. All public functions should be exported using the
+``EXPORT_SYMBOL`` macro. A public function replacing a same-named kernel
+function should be prefixed with ``rpl_``. Otherwise, the function should be
+prefixed with ``ovs_``. For the special case when it is not possible to follow
+this rule (e.g., the ``pskb_expand_head()`` function), the function name must
+be added to ``linux/compat/build-aux/export-check-whitelist``; otherwise, the
+compilation check ``check-export-symbol`` will fail.
diff --git a/lib/dpif.h b/lib/dpif.h
index cade046..e69087d 100644
--- a/lib/dpif.h
+++ b/lib/dpif.h
@@ -113,7 +113,7 @@
  *
  * In Open vSwitch userspace, "struct flow" is the typical way to describe
  * a flow, but the datapath interface uses a different data format to
- * allow ABI forward- and backward-compatibility. datapath/README.md
+ * allow ABI forward- and backward-compatibility. datapath/README.rst
  * describes the rationale and design. Refer to OVS_KEY_ATTR_* and
  * "struct ovs_key_*" in include/odp-netlink.h for details.
  * lib/odp-util.h defines several functions for working with these flows.
diff --git a/vtep/README.ovs-vtep.rst b/vtep/README.ovs-vtep.rst
index 9e9883b..75f03d0 100644
--- a/vtep/README.ovs-vtep.rst
+++ b/vtep/README.ovs-vtep.rst
@@ -154,7 +154,7 @@ using the debian packages as mentioned in step 2 of the "Requirements" section.
 
 6. Start the VTEP emulator. If you installed the components following the
    `installation guide <../INSTALL.rst>`__ file, run the following from the
-   same directory as this README.md:
+   same directory as this README:
 
    ::