deleted file mode 100644
@@ -1,1093 +0,0 @@
-Design Decisions In Open vSwitch
-================================
-
-This document describes design decisions that went into implementing
-Open vSwitch. While we believe these to be reasonable decisions, it is
-impossible to predict how Open vSwitch will be used in all environments.
-Understanding assumptions made by Open vSwitch is critical to a
-successful deployment. The end of this document contains contact
-information that can be used to let us know how we can make Open vSwitch
-more generally useful.
-
-Asynchronous Messages
-=====================
-
-Over time, Open vSwitch has added many knobs that control whether a
-given controller receives OpenFlow asynchronous messages. This
-section describes how all of these features interact.
-
-First, a service controller never receives any asynchronous messages
-unless it changes its miss_send_len from the service controller
-default of zero in one of the following ways:
-
- - Sending an OFPT_SET_CONFIG message with nonzero miss_send_len.
-
- - Sending any NXT_SET_ASYNC_CONFIG message: as a side effect, this
- message changes the miss_send_len to
- OFP_DEFAULT_MISS_SEND_LEN (128) for service controllers.
-
-Second, OFPT_FLOW_REMOVED and NXT_FLOW_REMOVED messages are generated
-only if the flow that was removed had the OFPFF_SEND_FLOW_REM flag
-set.
-
-Third, OFPT_PACKET_IN and NXT_PACKET_IN messages are sent only to
-OpenFlow controller connections that have the correct connection ID
-(see "struct nx_controller_id" and "struct nx_action_controller"):
-
- - For packet-in messages generated by an NXAST_CONTROLLER action,
- the controller ID specified in the action.
-
- - For other packet-in messages, controller ID zero. (This is the
- default ID when an OpenFlow controller does not configure one.)
-
-Finally, Open vSwitch consults a per-connection table indexed by the
-message type, reason code, and current role. The following table
-shows how this table is initialized by default when an OpenFlow
-connection is made. An entry labeled "yes" means that the message is
-sent; an entry labeled "---" means that the message is suppressed.
-
-```
-                                             master/
-    message and reason code                  other     slave
-    ---------------------------------------- -------   -----
-    OFPT_PACKET_IN / NXT_PACKET_IN
-      OFPR_NO_MATCH                            yes      ---
-      OFPR_ACTION                              yes      ---
-      OFPR_INVALID_TTL                         ---      ---
-      OFPR_ACTION_SET (OF1.4+)                 yes      ---
-      OFPR_GROUP (OF1.4+)                      yes      ---
-
-    OFPT_FLOW_REMOVED / NXT_FLOW_REMOVED
-      OFPRR_IDLE_TIMEOUT                       yes      ---
-      OFPRR_HARD_TIMEOUT                       yes      ---
-      OFPRR_DELETE                             yes      ---
-      OFPRR_GROUP_DELETE (OF1.4+)              yes      ---
-      OFPRR_METER_DELETE (OF1.4+)              yes      ---
-      OFPRR_EVICTION (OF1.4+)                  yes      ---
-
-    OFPT_PORT_STATUS
-      OFPPR_ADD                                yes      yes
-      OFPPR_DELETE                             yes      yes
-      OFPPR_MODIFY                             yes      yes
-
-    OFPT_ROLE_REQUEST / OFPT_ROLE_REPLY (OF1.4+)
-      OFPCRR_MASTER_REQUEST                    ---      ---
-      OFPCRR_CONFIG                            ---      ---
-      OFPCRR_EXPERIMENTER                      ---      ---
-
-    OFPT_TABLE_STATUS (OF1.4+)
-      OFPTR_VACANCY_DOWN                       ---      ---
-      OFPTR_VACANCY_UP                         ---      ---
-
-    OFPT_REQUESTFORWARD (OF1.4+)
-      OFPRFR_GROUP_MOD                         ---      ---
-      OFPRFR_METER_MOD                         ---      ---
-```
-
-The NXT_SET_ASYNC_CONFIG message directly sets all of the values in
-this table for the current connection. The
-OFPC_INVALID_TTL_TO_CONTROLLER bit in the OFPT_SET_CONFIG message
-controls the setting for OFPR_INVALID_TTL for the "master" role.
-
-
-OFPAT_ENQUEUE
-=============
-
-The OpenFlow 1.0 specification requires the output port of the OFPAT_ENQUEUE
-action to "refer to a valid physical port (i.e. < OFPP_MAX) or OFPP_IN_PORT".
-Although OFPP_LOCAL is not less than OFPP_MAX, it is an 'internal' port which
-can have QoS applied to it in Linux. Since we allow the OFPAT_ENQUEUE to apply
-to 'internal' ports whose port numbers are less than OFPP_MAX, we interpret
-OFPP_LOCAL as a physical port and support OFPAT_ENQUEUE on it as well.
-
-
-OFPT_FLOW_MOD
-=============
-
-The OpenFlow specification for the behavior of OFPT_FLOW_MOD is
-confusing. The following tables summarize the Open vSwitch
-implementation of its behavior in the following categories:
-
- - "match on priority": Whether the flow_mod acts only on flows
- whose priority matches that included in the flow_mod message.
-
- - "match on out_port": Whether the flow_mod acts only on flows
- that output to the out_port included in the flow_mod message (if
- out_port is not OFPP_NONE). OpenFlow 1.1 and later have a
- similar feature (not listed separately here) for out_group.
-
- - "match on flow_cookie": Whether the flow_mod acts only on flows
- whose flow_cookie matches an optional controller-specified value
- and mask.
-
- - "updates flow_cookie": Whether the flow_mod changes the
- flow_cookie of the flow or flows that it matches to the
- flow_cookie included in the flow_mod message.
-
- - "updates OFPFF_SEND_FLOW_REM": Whether the flow_mod changes the
- OFPFF_SEND_FLOW_REM flag of the flow or flows that it matches to
- the setting included in the flags of the flow_mod message.
-
- - "honors OFPFF_CHECK_OVERLAP": Whether the OFPFF_CHECK_OVERLAP
- flag in the flow_mod is significant.
-
- - "updates idle_timeout" and "updates hard_timeout": Whether the
- idle_timeout and hard_timeout in the flow_mod, respectively,
- have an effect on the flow or flows matched by the flow_mod.
-
- - "resets idle timer": Whether the flow_mod resets the per-flow
- timer that measures how long a flow has been idle.
-
- - "resets hard timer": Whether the flow_mod resets the per-flow
- timer that measures how long it has been since a flow was
- modified.
-
- - "zeros counters": Whether the flow_mod resets per-flow packet
- and byte counters to zero.
-
- - "may add a new flow": Whether the flow_mod may add a new flow to
- the flow table. (Obviously this is always true for "add"
- commands but in some OpenFlow versions "modify" and
- "modify-strict" can also add new flows.)
-
- - "sends flow_removed message": Whether the flow_mod generates a
- flow_removed message for the flow or flows that it affects.
-
-An entry labeled "yes" means that the flow mod type does have the
-indicated behavior, "---" means that it does not, an empty cell means
-that the property is not applicable, and other values are explained
-below the table.
-
-OpenFlow 1.0
-------------
-
-```
-                                          MODIFY          DELETE
-                             ADD  MODIFY  STRICT  DELETE  STRICT
-                             ===  ======  ======  ======  ======
-match on priority            yes    ---     yes     ---     yes
-match on out_port            ---    ---     ---     yes     yes
-match on flow_cookie         ---    ---     ---     ---     ---
-match on table_id            ---    ---     ---     ---     ---
-controller chooses table_id  ---    ---     ---
-updates flow_cookie          yes    yes     yes
-updates OFPFF_SEND_FLOW_REM  yes     +       +
-honors OFPFF_CHECK_OVERLAP   yes     +       +
-updates idle_timeout         yes     +       +
-updates hard_timeout         yes     +       +
-resets idle timer            yes     +       +
-resets hard timer            yes    yes     yes
-zeros counters               yes     +       +
-may add a new flow           yes    yes     yes
-sends flow_removed message   ---    ---     ---      %       %
-
-(+) "modify" and "modify-strict" only take these actions when they
- create a new flow, not when they update an existing flow.
-
-(%) "delete" and "delete_strict" generate a flow_removed message if
- the deleted flow or flows have the OFPFF_SEND_FLOW_REM flag set.
- (Each controller can separately control whether it wants to
- receive the generated messages.)
-```
-
-OpenFlow 1.1
-------------
-
-OpenFlow 1.1 makes these changes:
-
- - The controller must now specify the table_id of the flow table
- that is searched and into which a flow may be inserted. Behavior
- for a table_id of 255 is undefined.
-
- - A flow_mod, except an "add", can now match on the flow_cookie.
-
- - When a flow_mod matches on the flow_cookie, "modify" and
- "modify-strict" never insert a new flow.
-
-```
-                                          MODIFY          DELETE
-                             ADD  MODIFY  STRICT  DELETE  STRICT
-                             ===  ======  ======  ======  ======
-match on priority            yes    ---     yes     ---     yes
-match on out_port            ---    ---     ---     yes     yes
-match on flow_cookie         ---    yes     yes     yes     yes
-match on table_id            yes    yes     yes     yes     yes
-controller chooses table_id  yes    yes     yes
-updates flow_cookie          yes    ---     ---
-updates OFPFF_SEND_FLOW_REM  yes     +       +
-honors OFPFF_CHECK_OVERLAP   yes     +       +
-updates idle_timeout         yes     +       +
-updates hard_timeout         yes     +       +
-resets idle timer            yes     +       +
-resets hard timer            yes    yes     yes
-zeros counters               yes     +       +
-may add a new flow           yes     #       #
-sends flow_removed message   ---    ---     ---      %       %
-
-(+) "modify" and "modify-strict" only take these actions when they
- create a new flow, not when they update an existing flow.
-
-(%) "delete" and "delete_strict" generate a flow_removed message if
- the deleted flow or flows have the OFPFF_SEND_FLOW_REM flag set.
- (Each controller can separately control whether it wants to
- receive the generated messages.)
-
-(#) "modify" and "modify-strict" only add a new flow if the flow_mod
- does not match on any bits of the flow cookie.
-```
-
-OpenFlow 1.2
-------------
-
-OpenFlow 1.2 makes these changes:
-
- - Only "add" commands ever add flows; "modify" and "modify-strict"
- never do.
-
- - A new flag OFPFF_RESET_COUNTS now controls whether "modify" and
- "modify-strict" reset counters, whereas previously they never
- reset counters (except when they inserted a new flow).
-
-```
-                                          MODIFY          DELETE
-                             ADD  MODIFY  STRICT  DELETE  STRICT
-                             ===  ======  ======  ======  ======
-match on priority            yes    ---     yes     ---     yes
-match on out_port            ---    ---     ---     yes     yes
-match on flow_cookie         ---    yes     yes     yes     yes
-match on table_id            yes    yes     yes     yes     yes
-controller chooses table_id  yes    yes     yes
-updates flow_cookie          yes    ---     ---
-updates OFPFF_SEND_FLOW_REM  yes    ---     ---
-honors OFPFF_CHECK_OVERLAP   yes    ---     ---
-updates idle_timeout         yes    ---     ---
-updates hard_timeout         yes    ---     ---
-resets idle timer            yes    ---     ---
-resets hard timer            yes    yes     yes
-zeros counters               yes     &       &
-may add a new flow           yes    ---     ---
-sends flow_removed message   ---    ---     ---      %       %
-
-(%) "delete" and "delete_strict" generate a flow_removed message if
- the deleted flow or flows have the OFPFF_SEND_FLOW_REM flag set.
- (Each controller can separately control whether it wants to
- receive the generated messages.)
-
-(&) "modify" and "modify-strict" reset counters if the
- OFPFF_RESET_COUNTS flag is specified.
-```
-
-OpenFlow 1.3
-------------
-
-OpenFlow 1.3 makes these changes:
-
- - Behavior for a table_id of 255 is now defined, for "delete" and
- "delete-strict" commands, as meaning to delete from all tables.
- A table_id of 255 is now explicitly invalid for other commands.
-
- - New flags OFPFF_NO_PKT_COUNTS and OFPFF_NO_BYT_COUNTS for "add"
- operations.
-
-The table for 1.3 is the same as the one shown above for 1.2.
-
-
-OpenFlow 1.4
-------------
-
-OpenFlow 1.4 makes these changes:
-
- - Adds the "importance" field to flow_mods, but it does not
- explicitly specify which kinds of flow_mods set the importance.
- For consistency, Open vSwitch uses the same rule for importance
- as for idle_timeout and hard_timeout, that is, only an "ADD"
- flow_mod sets the importance. (This issue has been filed with
- the ONF as EXT-496.)
-
- - Adds an eviction mechanism that automatically deletes entries of
- lower importance to make space for newer entries.
-
-
-OpenFlow 1.4 Bundles
-====================
-
-Open vSwitch makes all flow table modifications atomically, i.e., any
-datapath packet only sees flow table configurations either before or
-after any change made by any flow_mod. For example, if a controller
-removes all flows with a single OpenFlow "flow_mod", no packet sees an
-intermediate version of the OpenFlow pipeline where only some of the
-flows have been deleted.
-
-It should be noted that Open vSwitch caches datapath flows, and that
-the cached flows are NOT flushed immediately when a flow table
-changes. Instead, the datapath flows are revalidated against the new
-flow table as soon as possible, and usually within one second of the
-modification. This design amortizes the cost of datapath cache
-flushing across multiple flow table changes, and has a significant
-performance effect during simultaneous heavy flow table churn and high
-traffic load. This means that different cached datapath flows may
-have been computed based on different flow table configurations, but
-each of the datapath flows is guaranteed to have been computed over a
-coherent view of the flow tables, as described above.
-
-With OpenFlow 1.4 bundles this atomicity can be extended across an
-arbitrary set of flow_mods. Bundles are supported for flow_mod and
-port_mod messages only. For flow_mods, both 'atomic' and 'ordered'
-bundle flags are trivially supported, as all bundled messages are
-executed in the order they were added and all flow table modifications
-are now atomic to the datapath. Port mods may not appear in atomic
-bundles, as port status modifications are not atomic.
-
-To support bundles, ovs-ofctl has a '--bundle' option that makes the
-flow mod commands ('add-flow', 'add-flows', 'mod-flows', 'del-flows',
-and 'replace-flows') use an OpenFlow 1.4 bundle to apply the
-modifications as a single atomic transaction. If any of the flow mods
-in a transaction fails, none of them is executed. All flow mods in a
-bundle appear to datapath lookups simultaneously.
-
-Furthermore, ovs-ofctl 'add-flow' and 'add-flows' commands now accept
-arbitrary flow mods as an input by allowing the flow specification to
-start with an explicit 'add', 'modify', 'modify_strict', 'delete', or
-'delete_strict' keyword. A missing keyword is treated as 'add', so
-this is fully backwards compatible. With the new '--bundle' option
-all the flow mods are executed as a single atomic transaction using an
-OpenFlow 1.4 bundle. Without the '--bundle' option the flow mods are
-executed in order up to the first failing flow_mod, and in case of an
-error the earlier successful flow_mods are not rolled back.
-
-
-OFPT_PACKET_IN
-==============
-
-The OpenFlow 1.1 specification for OFPT_PACKET_IN is confusing. The
-definition in OF1.1 openflow.h is[*]:
-
-```
-    /* Packet received on port (datapath -> controller). */
-    struct ofp_packet_in {
-        struct ofp_header header;
-        uint32_t buffer_id;     /* ID assigned by datapath. */
-        uint32_t in_port;       /* Port on which frame was received. */
-        uint32_t in_phy_port;   /* Physical Port on which frame was received. */
-        uint16_t total_len;     /* Full length of frame. */
-        uint8_t reason;         /* Reason packet is being sent (one of OFPR_*) */
-        uint8_t table_id;       /* ID of the table that was looked up */
-        uint8_t data[0];        /* Ethernet frame, halfway through 32-bit word,
-                                   so the IP header is 32-bit aligned. The
-                                   amount of data is inferred from the length
-                                   field in the header. Because of padding,
-                                   offsetof(struct ofp_packet_in, data) ==
-                                   sizeof(struct ofp_packet_in) - 2. */
-    };
-    OFP_ASSERT(sizeof(struct ofp_packet_in) == 24);
-```
-
-The confusing part is the comment on the data[] member. This comment
-is a leftover from OF1.0 openflow.h, in which the comment was correct:
-sizeof(struct ofp_packet_in) is 20 in OF1.0 and offsetof(struct
-ofp_packet_in, data) is 18. When OF1.1 was written, the structure
-members were changed but the comment was carelessly not updated, and
-the comment became wrong: sizeof(struct ofp_packet_in) and
-offsetof(struct ofp_packet_in, data) are both 24 in OF1.1.
-
-That leaves the question of how to implement ofp_packet_in in OF1.1.
-The OpenFlow reference implementation for OF1.1 does not include any
-padding, that is, the first byte of the encapsulated frame immediately
-follows the 'table_id' member without a gap. Open vSwitch therefore
-implements it the same way for compatibility.
-
-For an earlier discussion, please see the thread archived at:
-https://mailman.stanford.edu/pipermail/openflow-discuss/2011-August/002604.html
-
-[*] The quoted definition is directly from OF1.1. Definitions used
- inside OVS omit the 8-byte ofp_header members, so the sizes in
- this discussion are 8 bytes larger than those declared in OVS
- header files.
-
-
-VLAN Matching
-=============
-
-The 802.1Q VLAN header causes more trouble than any other 4 bytes in
-networking. More specifically, three versions of OpenFlow and Open
-vSwitch have among them four different ways to match the contents and
-presence of the VLAN header. The following table describes how each
-version works.
-
-    Match  NXM        OF1.0        OF1.1        OF1.2
-    -----  ---------  -----------  -----------  ------------
-     [1]   0000/0000  ????/1,??/?  ????/1,??/?  0000/0000,--
-     [2]   0000/ffff  ffff/0,??/?  ffff/0,??/?  0000/ffff,--
-     [3]   1xxx/1fff  0xxx/0,??/1  0xxx/0,??/1  1xxx/ffff,--
-     [4]   z000/f000  ????/1,0y/0  fffe/0,0y/0  1000/1000,0y
-     [5]   zxxx/ffff  0xxx/0,0y/0  0xxx/0,0y/0  1xxx/ffff,0y
-     [6]   0000/0fff  <none>       <none>       <none>
-     [7]   0000/f000  <none>       <none>       <none>
-     [8]   0000/efff  <none>       <none>       <none>
-     [9]   1001/1001  <none>       <none>       1001/1001,--
-    [10]   3000/3000  <none>       <none>       <none>
-    [11]   1000/1000  <none>       fffe/0,??/1  1000/1000,--
-
-Each column is interpreted as follows.
-
- - Match: See the list below.
-
- - NXM: xxxx/yyyy means NXM_OF_VLAN_TCI_W with value xxxx and mask
- yyyy. A mask of 0000 is equivalent to omitting
- NXM_OF_VLAN_TCI(_W), a mask of ffff is equivalent to
- NXM_OF_VLAN_TCI.
-
- - OF1.0 and OF1.1: wwww/x,yy/z means dl_vlan wwww, OFPFW_DL_VLAN x,
- dl_vlan_pcp yy, and OFPFW_DL_VLAN_PCP z. If OFPFW_DL_VLAN or
- OFPFW_DL_VLAN_PCP is 1, the corresponding field value is
- wildcarded, otherwise it is matched. ? means that the given bits
- are ignored (their conventional values are 0000/x,00/0 in OF1.0,
- 0000/x,00/1 in OF1.1; x is never ignored). <none> means that the
- given match is not supported.
-
- - OF1.2: xxxx/yyyy,zz means OXM_OF_VLAN_VID_W with value xxxx and
- mask yyyy, and OXM_OF_VLAN_PCP (which is not maskable) with
- value zz. A mask of 0000 is equivalent to omitting
- OXM_OF_VLAN_VID(_W), a mask of ffff is equivalent to
- OXM_OF_VLAN_VID. -- means that OXM_OF_VLAN_PCP is omitted.
- <none> means that the given match is not supported.
-
-The matches are:
-
- [1] Matches any packet, that is, one without an 802.1Q header or with
- an 802.1Q header with any TCI value.
-
- [2] Matches only packets without an 802.1Q header.
-
- NXM: Any match with (vlan_tci == 0) and (vlan_tci_mask & 0x1000)
- != 0 is equivalent to the one listed in the table.
-
- OF1.0: The spec doesn't define behavior if dl_vlan is set to
- 0xffff and OFPFW_DL_VLAN_PCP is not set.
-
- OF1.1: The spec says explicitly to ignore dl_vlan_pcp when
- dl_vlan is set to 0xffff.
-
- OF1.2: The spec doesn't say what should happen if (vlan_vid == 0)
- and (vlan_vid_mask & 0x1000) != 0 but (vlan_vid_mask != 0x1000),
- but it would be straightforward to also interpret as [2].
-
- [3] Matches only packets that have an 802.1Q header with VID xxx (and
- any PCP).
-
- [4] Matches only packets that have an 802.1Q header with PCP y (and
- any VID).
-
- NXM: z is ((y << 1) | 1).
-
- OF1.0: The spec isn't very clear, but OVS implements it this way.
-
- OF1.2: Presumably other masks such that (vlan_vid_mask & 0x1fff)
- == 0x1000 would also work, but the spec doesn't define their
- behavior.
-
- [5] Matches only packets that have an 802.1Q header with VID xxx and
- PCP y.
-
- NXM: z is ((y << 1) | 1).
-
- OF1.2: Presumably other masks such that (vlan_vid_mask & 0x1fff)
- == 0x1fff would also work.
-
- [6] Matches packets with no 802.1Q header or with an 802.1Q header
- with a VID of 0. Only possible with NXM.
-
- [7] Matches packets with no 802.1Q header or with an 802.1Q header
- with a PCP of 0. Only possible with NXM.
-
- [8] Matches packets with no 802.1Q header or with an 802.1Q header
- with both VID and PCP of 0. Only possible with NXM.
-
- [9] Matches only packets that have an 802.1Q header with an
- odd-numbered VID (and any PCP). Only possible with NXM and
- OF1.2. (This is just an example; one can match on any desired
- VID bit pattern.)
-
-[10] Matches only packets that have an 802.1Q header with an
- odd-numbered PCP (and any VID). Only possible with NXM. (This
- is just an example; one can match on any desired PCP bit
- pattern.)
-
-[11] Matches any packet with an 802.1Q header, regardless of VID or
- PCP.
-
-Additional notes:
-
- - OF1.2: The top three bits of OXM_OF_VLAN_VID are fixed to zero,
- so bits 13, 14, and 15 in the masks listed in the table may be
- set to arbitrary values, as long as the corresponding value bits
- are also zero. The suggested ffff mask for [2], [3], and [5]
- allows a shorter OXM representation (the mask is omitted) than
- the minimal 1fff mask.
-
-
-Flow Cookies
-============
-
-OpenFlow 1.0 and later versions have the concept of a "flow cookie",
-which is a 64-bit integer value attached to each flow. The treatment
-of the flow cookie has varied greatly across OpenFlow versions,
-however.
-
-In OpenFlow 1.0:
-
- - OFPFC_ADD set the cookie in the flow that it added.
-
- - OFPFC_MODIFY and OFPFC_MODIFY_STRICT updated the cookie for
- the flow or flows that it modified.
-
- - OFPST_FLOW messages included the flow cookie.
-
- - OFPT_FLOW_REMOVED messages reported the cookie of the flow
- that was removed.
-
-OpenFlow 1.1 made the following changes:
-
- - Flow mod operations OFPFC_MODIFY, OFPFC_MODIFY_STRICT,
- OFPFC_DELETE, and OFPFC_DELETE_STRICT, plus flow stats
- requests and aggregate stats requests, gained the ability to
- match on flow cookies with an arbitrary mask.
-
- - OFPFC_MODIFY and OFPFC_MODIFY_STRICT were changed to add a
- new flow, in the case of no match, only if the flow table
- modification operation did not match on the cookie field.
- (In OpenFlow 1.0, modify operations always added a new flow
- when there was no match.)
-
- - OFPFC_MODIFY and OFPFC_MODIFY_STRICT no longer updated flow
- cookies.
-
-OpenFlow 1.2 made the following changes:
-
- - OFPFC_MODIFY and OFPFC_MODIFY_STRICT were changed to never
- add a new flow, regardless of whether the flow cookie was
- used for matching.
-
-Open vSwitch support for OpenFlow 1.0 implements the OpenFlow 1.0
-behavior with the following extensions:
-
- - An NXM extension field NXM_NX_COOKIE(_W) allows the NXM
- versions of OFPFC_MODIFY, OFPFC_MODIFY_STRICT, OFPFC_DELETE,
- and OFPFC_DELETE_STRICT flow_mods, plus flow stats requests
- and aggregate stats requests, to match on flow cookies with
- arbitrary masks. This is much like the equivalent OpenFlow
- 1.1 feature.
-
- - Like OpenFlow 1.1, OFPFC_MODIFY and OFPFC_MODIFY_STRICT add a
- new flow if there is no match and the mask is zero (or not
- given).
-
- - The "cookie" field in OFPT_FLOW_MOD and NXT_FLOW_MOD messages
- is used as the cookie value for OFPFC_ADD commands, as
- described in OpenFlow 1.0. For OFPFC_MODIFY and
- OFPFC_MODIFY_STRICT commands, the "cookie" field is used as a
- new cookie for flows that match unless it is UINT64_MAX, in
- which case the flow's cookie is not updated.
-
- - NXT_PACKET_IN (the Nicira extended version of
- OFPT_PACKET_IN) reports the cookie of the rule that
- generated the packet, or all-1-bits if no rule generated the
- packet. (Older versions of OVS used all-0-bits instead of
- all-1-bits.)
-
-The following table shows the handling of different protocols when
-receiving OFPFC_MODIFY and OFPFC_MODIFY_STRICT messages. A mask of 0
-indicates either an explicit mask of zero or an implicit one by not
-specifying the NXM_NX_COOKIE(_W) field.
-
-```
-                Match   Update   Add on miss   Add on miss
-                cookie  cookie     mask!=0       mask==0
-                ======  ======   ===========   ===========
-OpenFlow 1.0      no     yes        <always add on miss>
-OpenFlow 1.1     yes      no          no           yes
-OpenFlow 1.2     yes      no          no            no
-NXM              yes     yes*         no           yes
-
-* Updates the flow's cookie unless the "cookie" field is UINT64_MAX.
-```
-
-Multiple Table Support
-======================
-
-OpenFlow 1.0 has only rudimentary support for multiple flow tables.
-Notably, OpenFlow 1.0 does not allow the controller to specify the
-flow table to which a flow is to be added. Open vSwitch adds an
-extension for this purpose, which is enabled on a per-OpenFlow
-connection basis using the NXT_FLOW_MOD_TABLE_ID message. When the
-extension is enabled, the upper 8 bits of the 'command' member in an
-OFPT_FLOW_MOD or NXT_FLOW_MOD message designates the table to which a
-flow is to be added.
-
-The Open vSwitch software switch implementation offers 255 flow
-tables. On packet ingress, only the first flow table (table 0) is
-searched, and the contents of the remaining tables are not considered
-in any way. Tables other than table 0 only come into play when an
-NXAST_RESUBMIT_TABLE action specifies another table to search.
-
-Tables 128 and above are reserved for use by the switch itself.
-Controllers should use only tables 0 through 127.
-
-
-OFPTC_* Table Configuration
-===========================
-
-This section covers the history of the OFPTC_* table configuration
-bits across OpenFlow versions.
-
-OpenFlow 1.0 flow tables had fixed configurations.
-
-OpenFlow 1.1 enabled controllers to configure behavior upon flow table
-miss and added the OFPTC_MISS_* constants for that purpose. OFPTC_*
-did not control anything else but it was nevertheless conceptualized
-as a set of bit-fields instead of an enum. OF1.1 added the
-OFPT_TABLE_MOD message to set OFPTC_MISS_* for a flow table and added
-the 'config' field to the OFPST_TABLE reply to report the current
-setting.
-
-OpenFlow 1.2 did not change anything in this regard.
-
-OpenFlow 1.3 switched to another means of changing flow table miss
-behavior and deprecated OFPTC_MISS_* without adding any more OFPTC_*
-constants. This meant that OFPT_TABLE_MOD now had no purpose at all,
-but OF1.3 kept it around "for backward compatibility with older and
-newer versions of the specification." At the same time, OF1.3
-introduced a new message OFPMP_TABLE_FEATURES that included a field
-'config' documented as reporting the OFPTC_* values set with
-OFPT_TABLE_MOD; of course this served no real purpose because no
-OFPTC_* values are defined. OF1.3 did remove the OFPTC_* field from
-OFPMP_TABLE (previously named OFPST_TABLE).
-
-OpenFlow 1.4 defined two new OFPTC_* constants, OFPTC_EVICTION and
-OFPTC_VACANCY_EVENTS, using bits that did not overlap with
-OFPTC_MISS_* even though those bits had not been defined since OF1.2.
-OFPT_TABLE_MOD still controlled these settings. The field for OFPTC_*
-values in OFPMP_TABLE_FEATURES was renamed from 'config' to
-'capabilities' and documented as reporting the flags that are
-supported in an OFPT_TABLE_MOD message. The OFPMP_TABLE_DESC message
-newly added in OF1.4 reported the OFPTC_* setting.
-
-OpenFlow 1.5 did not change anything in this regard.
-
-The following table summarizes. The columns say:
-
- - OpenFlow version(s).
-
- - The OFPTC_* flags defined in those versions.
-
- - Whether OFPT_TABLE_MOD can modify OFPTC_* flags.
-
- - Whether OFPST_TABLE/OFPMP_TABLE reports the OFPTC_* flags.
-
- - What OFPMP_TABLE_FEATURES reports (if it exists): either the
- current configuration or the switch's capabilities.
-
- - Whether OFPMP_TABLE_DESC reports the current configuration.
-
-OpenFlow  OFPTC_* flags           TABLE_MOD stats? TABLE_FEATURES TABLE_DESC
---------- ----------------------- --------- ------ -------------- ----------
-OF1.0     none                    no[*][+]  no[*]  nothing[*][+]  no[*][+]
-OF1.1/1.2 MISS_*                  yes       yes    nothing[+]     no[+]
-OF1.3     none                    yes[*]    no[*]  config[*]      no[*][+]
-OF1.4/1.5 EVICTION/VACANCY_EVENTS yes       no     capabilities   yes
-
- [*] Nothing to report/change anyway.
-
- [+] No such message.
-
-
-IPv6
-====
-
-Open vSwitch supports stateless handling of IPv6 packets. Flows can be
-written to support matching TCP, UDP, and ICMPv6 headers within an IPv6
-packet. Deeper matching of some Neighbor Discovery messages is also
-supported.
-
-IPv6 was not designed to interact well with middle-boxes. This,
-combined with Open vSwitch's stateless nature, has affected the
-processing of IPv6 traffic, which is detailed below.
-
-Extension Headers
------------------
-
-The base IPv6 header is incredibly simple with the intention of only
-containing information relevant for routing packets between two
-endpoints. IPv6 relies heavily on the use of extension headers to
-provide any other functionality. Unfortunately, the extension headers
-were designed in such a way that it is impossible to move to the next
-header (including the layer-4 payload) unless the current header is
-understood.
-
-Open vSwitch will process the following extension headers and continue
-to the next header:
-
- * Fragment (see the next section)
- * AH (Authentication Header)
- * Hop-by-Hop Options
- * Routing
- * Destination Options
-
-When a header is encountered that is not in that list, it is considered
-"terminal". A terminal header's IPv6 protocol value is stored in
-"nw_proto" for matching purposes. If a terminal header is TCP, UDP, or
-ICMPv6, the packet will be further processed in an attempt to extract
-layer-4 information.
-
-Fragments
----------
-
-IPv6 requires that every link in the internet have an MTU of 1280 octets
-or greater (RFC 2460). As such, a terminal header (as described above in
-"Extension Headers") in the first fragment should generally be
-reachable. In this case, the terminal header's IPv6 protocol type is
-stored in the "nw_proto" field for matching purposes. If a terminal
-header cannot be found in the first fragment (one with a fragment offset
-of zero), the "nw_proto" field is set to 0. Subsequent fragments (those
-with a non-zero fragment offset) have the "nw_proto" field set to the
-IPv6 protocol type for fragments (44).
-
-Jumbograms
-----------
-
-An IPv6 jumbogram (RFC 2675) is a packet containing a payload longer
-than 65,535 octets. A jumbogram is only relevant in subnets with a link
-MTU greater than 65,575 octets, and is not required to be supported on
-nodes that do not connect to links with such large MTUs. Currently, Open
-vSwitch doesn't process jumbograms.
-
-
-In-Band Control
-===============
-
-Motivation
-----------
-
-An OpenFlow switch must establish and maintain a TCP network
-connection to its controller. There are two basic ways to categorize
-the network that this connection traverses: either it is completely
-separate from the one that the switch is otherwise controlling, or its
-path may overlap the network that the switch controls. We call the
-former case "out-of-band control", the latter case "in-band control".
-
-Out-of-band control has the following benefits:
-
- - Simplicity: Out-of-band control slightly simplifies the switch
- implementation.
-
- - Reliability: Excessive switch traffic volume cannot interfere
- with control traffic.
-
- - Integrity: Machines not on the control network cannot
- impersonate a switch or a controller.
-
- - Confidentiality: Machines not on the control network cannot
- snoop on control traffic.
-
-In-band control, on the other hand, has the following advantages:
-
- - No dedicated port: There is no need to dedicate a physical
- switch port to control, which is important on switches that have
- few ports (e.g. wireless routers, low-end embedded platforms).
-
- - No dedicated network: There is no need to build and maintain a
- separate control network. This is important in many
- environments because it reduces proliferation of switches and
- wiring.
-
-Open vSwitch supports both out-of-band and in-band control. This
-section describes the principles behind in-band control. See the
-description of the Controller table in ovs-vswitchd.conf.db(5) to
-configure OVS for in-band control.
-
-Principles
-----------
-
-The fundamental principle of in-band control is that an OpenFlow
-switch must recognize and switch control traffic without involving the
-OpenFlow controller. All the details of implementing in-band control
-are special cases of this principle.
-
-The rationale for this principle is simple. If the switch does not
-handle in-band control traffic itself, then it will be caught in a
-contradiction: it must contact the controller, but it cannot, because
-only the controller can set up the flows that are needed to contact
-the controller.
-
-The following points describe important special cases of this
-principle.
-
- - In-band control must be implemented regardless of whether the
- switch is connected.
-
- It is tempting to implement the in-band control rules only when
- the switch is not connected to the controller, using the
- reasoning that the controller should have complete control once
- it has established a connection with the switch.
-
- This does not work in practice. Consider the case where the
- switch is connected to the controller. Occasionally it can
- happen that the controller forgets or otherwise needs to obtain
- the MAC address of the switch. To do so, the controller sends a
- broadcast ARP request. A switch that implements the in-band
- control rules only when it is disconnected will then send an
- OFPT_PACKET_IN message up to the controller. The controller will
- be unable to respond, because it does not know the MAC address of
- the switch. This is a deadlock situation that can only be
- resolved by the switch noticing that its connection to the
- controller has hung and reconnecting.
-
- - In-band control must override flows set up by the controller.
-
- It is reasonable to assume that flows set up by the OpenFlow
- controller should take precedence over in-band control, on the
- basis that the controller should be in charge of the switch.
-
- Again, this does not work in practice. Reasonable controller
- implementations may set up a "last resort" fallback rule that
- wildcards every field and, e.g., sends it up to the controller or
- discards it. If a controller does that, then it will isolate
- itself from the switch.
-
- - The switch must recognize all control traffic.
-
- The fundamental principle of in-band control states, in part,
- that a switch must recognize control traffic without involving
- the OpenFlow controller. More specifically, the switch must
- recognize *all* control traffic. "False negatives", that is,
- packets that constitute control traffic but that the switch does
- not recognize as control traffic, lead to control traffic storms.
-
- Consider an OpenFlow switch that only recognizes control packets
- sent to or from that switch. Now suppose that two switches of
- this type, named A and B, are connected to ports on an Ethernet
- hub (not a switch) and that an OpenFlow controller is connected
- to a third hub port. In this setup, control traffic sent by
- switch A will be seen by switch B, which will send it to the
- controller as part of an OFPT_PACKET_IN message. Switch A will
- then see the OFPT_PACKET_IN message's packet, re-encapsulate it
- in another OFPT_PACKET_IN, and send it to the controller. Switch
- B will then see that OFPT_PACKET_IN, and so on in an infinite
- loop.
-
- Incidentally, the consequences of "false positives", where
- packets that are not control traffic are nevertheless recognized
- as control traffic, are much less severe. The controller will
- not be able to control their behavior, but the network will
- remain in working order. False positives do constitute a
- security problem.
-
- - The switch should use echo-requests to detect disconnection.
-
- TCP will notice that a connection has hung, but this can take a
- considerable amount of time. For example, with default settings
- the Linux kernel TCP implementation will retransmit for between
- 13 and 30 minutes, depending on the connection's retransmission
- timeout, according to kernel documentation. This is far too long
- for a switch to be disconnected, so an OpenFlow switch should
- implement its own connection timeout. OpenFlow OFPT_ECHO_REQUEST
- messages are the best way to do this, since they test the
- OpenFlow connection itself.
-
-Implementation
---------------
-
-This section describes how Open vSwitch implements in-band control.
-Correctly implementing in-band control has proven difficult due to its
-many subtleties, and has thus gone through many iterations. Please
-read through and understand the reasoning behind the chosen rules
-before making modifications.
-
-Open vSwitch implements in-band control as "hidden" flows, that is,
-flows that are not visible through OpenFlow, and at a higher priority
-than wildcarded flows can be set up through OpenFlow. This is done so
-that the OpenFlow controller cannot interfere with them and possibly
-break connectivity with its switches. It is possible to see all
-flows, including in-band ones, with the ovs-appctl "bridge/dump-flows"
-command.
-
-The Open vSwitch implementation of in-band control can hide traffic to
-arbitrary "remotes", where each remote is one TCP port on one IP address.
-Currently the remotes are automatically configured as the in-band OpenFlow
-controllers plus the OVSDB managers, if any. (The latter is a requirement
-because OVSDB managers are responsible for configuring OpenFlow controllers,
-so if the manager cannot be reached then OpenFlow cannot be reconfigured.)
-
-The following rules (with the OFPP_NORMAL action) are set up on any bridge
-that has any remotes:
-
- (a) DHCP requests sent from the local port.
- (b) ARP replies to the local port's MAC address.
- (c) ARP requests from the local port's MAC address.
-
-In-band also sets up the following rules for each unique next-hop MAC
-address for the remotes' IPs (the "next hop" is either the remote
-itself, if it is on a local subnet, or the gateway to reach the remote):
-
- (d) ARP replies to the next hop's MAC address.
- (e) ARP requests from the next hop's MAC address.
-
-In-band also sets up the following rules for each unique remote IP address:
-
- (f) ARP replies containing the remote's IP address as a target.
- (g) ARP requests containing the remote's IP address as a source.
-
-In-band also sets up the following rules for each unique remote (IP,port)
-pair:
-
- (h) TCP traffic to the remote's IP and port.
- (i) TCP traffic from the remote's IP and port.
-
-The goal of these rules is to be as narrow as possible to allow a
-switch to join a network and be able to communicate with the
-remotes. As mentioned earlier, these rules have higher priority
-than the controller's rules, so if they are too broad, they may
-prevent the controller from implementing its policy. As such,
-in-band actively monitors some aspects of flow and packet processing
-so that the rules can be made more precise.
-
-In-band control monitors attempts to add flows into the datapath that
-could interfere with its duties. The datapath only allows exact
-match entries, so in-band control is able to be very precise about
-the flows it prevents. Flows that miss in the datapath are sent to
-userspace to be processed, so preventing these flows from being
-cached in the "fast path" does not affect correctness. The only type
-of flow that is currently prevented is one that would prevent DHCP
-replies from being seen by the local port. For example, a rule that
-forwarded all DHCP traffic to the controller would not be allowed,
-but one that forwarded to all ports (including the local port) would.
-
-As mentioned earlier, packets that miss in the datapath are sent to
-the userspace for processing. The userspace has its own flow table,
-the "classifier", so in-band checks whether any special processing
-is needed before the classifier is consulted. If a packet is a DHCP
-response to a request from the local port, the packet is forwarded to
-the local port, regardless of the flow table. Note that this requires
-L7 processing of DHCP replies to determine whether the 'chaddr' field
-matches the MAC address of the local port.
-
-It is interesting to note that for an L3-based in-band control
-mechanism, the majority of rules are devoted to ARP traffic. At first
-glance, some of these rules appear redundant. However, each serves an
-important role. First, in order to determine the MAC address of the
-remote side (controller or gateway) for other ARP rules, we must allow
-ARP traffic for our local port with rules (b) and (c). If we are
-between a switch and its connection to the remote, we have to
-allow the other switch's ARP traffic through. This is done with
-rules (d) and (e), since we do not know the addresses of the other
-switches a priori, but do know the remote's or gateway's. Finally,
-if the remote is running in a local guest VM that is not reached
-through the local port, the switch that is connected to the VM must
-allow ARP traffic based on the remote's IP address, since it will
-not know the MAC address of the local port that is sending the traffic
-or the MAC address of the remote in the guest VM.
-
-With a few notable exceptions below, in-band should work in most
-network setups. The following are considered "supported" in the
-current implementation:
-
- - Locally Connected. The switch and remote are on the same
- subnet. This uses rules (a), (b), (c), (h), and (i).
-
- - Reached through Gateway. The switch and remote are on
- different subnets and must go through a gateway. This uses
- rules (a), (b), (c), (h), and (i).
-
- - Between Switch and Remote. This switch is between another
- switch and the remote, and we want to allow the other
- switch's traffic through. This uses rules (d), (e), (h), and
- (i). It uses (b) and (c) indirectly in order to know the MAC
- address for rules (d) and (e). Note that DHCP for the other
- switch will not work unless an OpenFlow controller explicitly lets this
- switch pass the traffic.
-
- - Between Switch and Gateway. This switch is between another
- switch and the gateway, and we want to allow the other switch's
- traffic through. This uses the same rules and logic as the
- "Between Switch and Remote" configuration described earlier.
-
- - Remote on Local VM. The remote is a guest VM on the
- system running in-band control. This uses rules (a), (b), (c),
- (h), and (i).
-
- - Remote on Local VM with Different Networks. The remote
- is a guest VM on the system running in-band control, but the
- local port is not used to connect to the remote. For
- example, an IP address is configured on eth0 of the switch. The
- remote's VM is connected through eth1 of the switch, but an
- IP address has not been configured for that port on the switch.
- As such, the switch will use eth0 to connect to the remote,
- and eth1's rules about the local port will not work. In the
- example, the switch attached to eth0 would use rules (a), (b),
- (c), (h), and (i) on eth0. The switch attached to eth1 would use
- rules (f), (g), (h), and (i).
-
-The following are explicitly *not* supported by in-band control:
-
- - Specify Remote by Name. Currently, the remote must be
- identified by IP address. A naive approach would be to permit
- all DNS traffic. Unfortunately, this would prevent the
- controller from defining any policy over DNS. Since switches
- that are located behind us need to connect to the remote,
- in-band cannot simply add a rule that allows DNS traffic from
- the local port. The "correct" way to support this is to parse
- DNS requests to allow all traffic related to a request for the
- remote's name through. Due to the potential security
- problems and amount of processing, we decided to hold off for
- the time being.
-
- - Differing Remotes for Switches. All switches must know
- the L3 addresses for all the remotes that other switches
- may use, since rules need to be set up to allow traffic related
- to those remotes through. See rules (f), (g), (h), and (i).
-
- - Differing Routes for Switches. In order for the switch to
- allow other switches to connect to a remote through a
- gateway, it allows the gateway's traffic through with rules (d)
- and (e). If the routes to the remote differ for the two
- switches, we will not know the MAC address of the alternate
- gateway.
-
-
-Action Reproduction
-===================
-
-It seems likely that many controllers, at least at startup, use the
-OpenFlow "flow statistics" request to obtain existing flows, then
-compare the flows' actions against the actions that they expect to
-find. Before version 1.8.0, Open vSwitch always returned exact,
-byte-for-byte copies of the actions that had been added to the flow
-table. The current version of Open vSwitch does not always do this in
-some exceptional cases. This section lists the exceptions that
-controller authors must keep in mind if they compare actual actions
-against desired actions in a bytewise fashion:
-
- - Open vSwitch zeros padding bytes in action structures,
- regardless of their values when the flows were added.
-
- - Open vSwitch "normalizes" the instructions in OpenFlow 1.1
- (and later) in the following way:
-
- * OVS sorts the instructions into the following order:
- Apply-Actions, Clear-Actions, Write-Actions,
- Write-Metadata, Goto-Table.
-
- * OVS drops Apply-Actions instructions that have empty
- action lists.
-
- * OVS drops Write-Actions instructions that have empty
- action sets.
-
-Please report other discrepancies, if you notice any, so that we can
-fix or document them.
-
-
-Suggestions
-===========
-
-Suggestions to improve Open vSwitch are welcome at discuss@openvswitch.org.
new file mode 100644
@@ -0,0 +1,1163 @@
+..
+ Licensed under the Apache License, Version 2.0 (the "License"); you may
+ not use this file except in compliance with the License. You may obtain
+ a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ License for the specific language governing permissions and limitations
+ under the License.
+
+ Convention for heading levels in Open vSwitch documentation:
+
+ ======= Heading 0 (reserved for the title in a document)
+ ------- Heading 1
+ ~~~~~~~ Heading 2
+ +++++++ Heading 3
+ ''''''' Heading 4
+
+ Avoid deeper levels because they do not render well.
+
+================================
+Design Decisions In Open vSwitch
+================================
+
+This document describes design decisions that went into implementing Open
+vSwitch. While we believe these to be reasonable decisions, it is impossible
+to predict how Open vSwitch will be used in all environments. Understanding
+assumptions made by Open vSwitch is critical to a successful deployment. The
+end of this document contains contact information that can be used to let us
+know how we can make Open vSwitch more generally useful.
+
+Asynchronous Messages
+---------------------
+
+Over time, Open vSwitch has added many knobs that control whether a given
+controller receives OpenFlow asynchronous messages. This section describes how
+all of these features interact.
+
+First, a service controller never receives any asynchronous messages unless it
+changes its miss_send_len from the service controller default of zero in one of
+the following ways:
+
+- Sending an ``OFPT_SET_CONFIG`` message with nonzero ``miss_send_len``.
+
+- Sending any ``NXT_SET_ASYNC_CONFIG`` message: as a side effect, this message
+ changes the ``miss_send_len`` to ``OFP_DEFAULT_MISS_SEND_LEN`` (128) for
+ service controllers.
+
+Second, ``OFPT_FLOW_REMOVED`` and ``NXT_FLOW_REMOVED`` messages are generated
+only if the flow that was removed had the ``OFPFF_SEND_FLOW_REM`` flag set.
+
+Third, ``OFPT_PACKET_IN`` and ``NXT_PACKET_IN`` messages are sent only to
+OpenFlow controller connections that have the correct connection ID (see
+``struct nx_controller_id`` and ``struct nx_action_controller``):
+
+- For packet-in messages generated by a ``NXAST_CONTROLLER`` action, the
+ controller ID specified in the action.
+
+- For other packet-in messages, controller ID zero. (This is the default ID
+ when an OpenFlow controller does not configure one.)
+
+Finally, Open vSwitch consults a per-connection table indexed by the message
+type, reason code, and current role. The following table shows how this table
+is initialized by default when an OpenFlow connection is made. An entry
+labeled ``yes`` means that the message is sent; an entry labeled ``---`` means
+that the message is suppressed.
+
+.. table:: ``OFPT_PACKET_IN`` / ``NXT_PACKET_IN``
+
+   =========================================== ======= =====
+                                               master/
+   message and reason code                     other   slave
+   =========================================== ======= =====
+   ``OFPR_NO_MATCH``                           yes     ---
+   ``OFPR_ACTION``                             yes     ---
+   ``OFPR_INVALID_TTL``                        ---     ---
+   ``OFPR_ACTION_SET`` (OF1.4+)                yes     ---
+   ``OFPR_GROUP`` (OF1.4+)                     yes     ---
+   =========================================== ======= =====
+
+.. table:: ``OFPT_FLOW_REMOVED`` / ``NXT_FLOW_REMOVED``
+
+   =========================================== ======= =====
+                                               master/
+   message and reason code                     other   slave
+   =========================================== ======= =====
+   ``OFPRR_IDLE_TIMEOUT``                      yes     ---
+   ``OFPRR_HARD_TIMEOUT``                      yes     ---
+   ``OFPRR_DELETE``                            yes     ---
+   ``OFPRR_GROUP_DELETE`` (OF1.4+)             yes     ---
+   ``OFPRR_METER_DELETE`` (OF1.4+)             yes     ---
+   ``OFPRR_EVICTION`` (OF1.4+)                 yes     ---
+   =========================================== ======= =====
+
+.. table:: ``OFPT_PORT_STATUS``
+
+   =========================================== ======= =====
+                                               master/
+   message and reason code                     other   slave
+   =========================================== ======= =====
+   ``OFPPR_ADD``                               yes     yes
+   ``OFPPR_DELETE``                            yes     yes
+   ``OFPPR_MODIFY``                            yes     yes
+   =========================================== ======= =====
+
+.. table:: ``OFPT_ROLE_REQUEST`` / ``OFPT_ROLE_REPLY`` (OF1.4+)
+
+   =========================================== ======= =====
+                                               master/
+   message and reason code                     other   slave
+   =========================================== ======= =====
+   ``OFPCRR_MASTER_REQUEST``                   ---     ---
+   ``OFPCRR_CONFIG``                           ---     ---
+   ``OFPCRR_EXPERIMENTER``                     ---     ---
+   =========================================== ======= =====
+
+.. table:: ``OFPT_TABLE_STATUS`` (OF1.4+)
+
+   =========================================== ======= =====
+                                               master/
+   message and reason code                     other   slave
+   =========================================== ======= =====
+   ``OFPTR_VACANCY_DOWN``                      ---     ---
+   ``OFPTR_VACANCY_UP``                        ---     ---
+   =========================================== ======= =====
+
+
+.. table:: ``OFPT_REQUESTFORWARD`` (OF1.4+)
+
+   =========================================== ======= =====
+                                               master/
+   message and reason code                     other   slave
+   =========================================== ======= =====
+   ``OFPRFR_GROUP_MOD``                        ---     ---
+   ``OFPRFR_METER_MOD``                        ---     ---
+   =========================================== ======= =====
+
+The ``NXT_SET_ASYNC_CONFIG`` message directly sets all of the values in this
+table for the current connection. The ``OFPC_INVALID_TTL_TO_CONTROLLER`` bit
+in the ``OFPT_SET_CONFIG`` message controls the setting for
+``OFPR_INVALID_TTL`` for the "master" role.
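+
+As a rough illustration of the final lookup step, the following Python sketch
+models a subset of the default table above and the per-role decision. The
+names ``DEFAULT_ASYNC`` and ``is_sent`` are hypothetical, chosen only for this
+example; this is not how Open vSwitch represents the table internally.
+
+::
+
+    # Illustrative model only: a subset of the default table shown above.
+    # Keys are (message, reason); values are (master_or_other, slave).
+    DEFAULT_ASYNC = {
+        ("PACKET_IN", "OFPR_NO_MATCH"): (True, False),
+        ("PACKET_IN", "OFPR_ACTION"): (True, False),
+        ("PACKET_IN", "OFPR_INVALID_TTL"): (False, False),
+        ("FLOW_REMOVED", "OFPRR_IDLE_TIMEOUT"): (True, False),
+        ("FLOW_REMOVED", "OFPRR_HARD_TIMEOUT"): (True, False),
+        ("FLOW_REMOVED", "OFPRR_DELETE"): (True, False),
+        ("PORT_STATUS", "OFPPR_ADD"): (True, True),
+        ("PORT_STATUS", "OFPPR_DELETE"): (True, True),
+        ("PORT_STATUS", "OFPPR_MODIFY"): (True, True),
+    }
+
+
+    def is_sent(message, reason, role, table=DEFAULT_ASYNC):
+        """Return True if the message is sent to a connection in 'role'.
+
+        'role' is "master", "other", or "slave".
+        """
+        master_or_other, slave = table[(message, reason)]
+        return slave if role == "slave" else master_or_other
+
+With these defaults, a slave connection receives only port status messages,
+which matches the ``slave`` column of the tables.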
+
+``OFPAT_ENQUEUE``
+-----------------
+
+The OpenFlow 1.0 specification requires the output port of the
+``OFPAT_ENQUEUE`` action to "refer to a valid physical port (i.e. <
+``OFPP_MAX``) or ``OFPP_IN_PORT``". Although ``OFPP_LOCAL`` is not less than
+``OFPP_MAX``, it is an 'internal' port which can have QoS applied to it in
+Linux. Since we allow the ``OFPAT_ENQUEUE`` to apply to 'internal' ports whose
+port numbers are less than ``OFPP_MAX``, we interpret ``OFPP_LOCAL`` as a
+physical port and support ``OFPAT_ENQUEUE`` on it as well.
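+
+The difference between the specification's rule and the Open vSwitch
+interpretation can be sketched in Python as follows. The port constants are
+the OpenFlow 1.0 values; the two function names are hypothetical and exist
+only for this illustration.
+
+::
+
+    # OpenFlow 1.0 reserved port numbers.
+    OFPP_MAX = 0xFF00
+    OFPP_IN_PORT = 0xFFF8
+    OFPP_LOCAL = 0xFFFE
+
+
+    def spec_allows_enqueue(port):
+        """Literal reading of the OpenFlow 1.0 text quoted above."""
+        return port < OFPP_MAX or port == OFPP_IN_PORT
+
+
+    def ovs_allows_enqueue(port):
+        """Open vSwitch additionally treats OFPP_LOCAL as a physical port."""
+        return spec_allows_enqueue(port) or port == OFPP_LOCAL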
+
+``OFPT_FLOW_MOD``
+-----------------
+
+The OpenFlow specification for the behavior of ``OFPT_FLOW_MOD`` is confusing.
+The following tables summarize the Open vSwitch implementation of its behavior
+in the following categories:
+
+"match on priority"
+ Whether the ``flow_mod`` acts only on flows whose priority matches that
+ included in the ``flow_mod`` message.
+
+"match on out_port"
+ Whether the ``flow_mod`` acts only on flows that output to the out_port
+ included in the flow_mod message (if out_port is not ``OFPP_NONE``).
+ OpenFlow 1.1 and later have a similar feature (not listed separately here)
+ for ``out_group``.
+
+"match on flow_cookie":
+ Whether the ``flow_mod`` acts only on flows whose ``flow_cookie`` matches an
+ optional controller-specified value and mask.
+
+"updates flow_cookie":
+ Whether the ``flow_mod`` changes the ``flow_cookie`` of the flow or flows
+ that it matches to the ``flow_cookie`` included in the flow_mod message.
+
+"updates ``OFPFF_`` flags":
+ Whether the flow_mod changes the ``OFPFF_SEND_FLOW_REM`` flag of the flow or
+ flows that it matches to the setting included in the flags of the flow_mod
+ message.
+
+"honors ``OFPFF_CHECK_OVERLAP``":
+ Whether the ``OFPFF_CHECK_OVERLAP`` flag in the flow_mod is significant.
+
+"updates ``idle_timeout``" and "updates ``hard_timeout``":
+ Whether the ``idle_timeout`` and ``hard_timeout`` in the ``flow_mod``,
+ respectively, have an effect on the flow or flows matched by the
+ ``flow_mod``.
+
+"resets idle timer":
+ Whether the ``flow_mod`` resets the per-flow timer that measures how long a
+ flow has been idle.
+
+"resets hard timer":
+ Whether the ``flow_mod`` resets the per-flow timer that measures how long it
+ has been since a flow was modified.
+
+"zeros counters":
+ Whether the ``flow_mod`` resets per-flow packet and byte counters to zero.
+
+"may add a new flow":
+ Whether the ``flow_mod`` may add a new flow to the flow table. (Obviously
+ this is always true for "add" commands but in some OpenFlow versions "modify"
+ and "modify-strict" can also add new flows.)
+
+"sends ``flow_removed`` message":
+ Whether the flow_mod generates a flow_removed message for the flow or flows
+ that it affects.
+
+An entry labeled ``yes`` means that the flow mod type does have the indicated
+behavior, ``---`` means that it does not, an empty cell means that the property
+is not applicable, and other values are explained below the table.
+
+OpenFlow 1.0
+~~~~~~~~~~~~
+
+================================ === ====== ====== ====== ======
+                                            MODIFY        DELETE
+RULE                             ADD MODIFY STRICT DELETE STRICT
+================================ === ====== ====== ====== ======
+match on ``priority``            yes ---    yes    ---    yes
+match on ``out_port``            --- ---    ---    yes    yes
+match on ``flow_cookie``         --- ---    ---    ---    ---
+match on ``table_id``            --- ---    ---    ---    ---
+controller chooses ``table_id``  --- ---    ---
+updates ``flow_cookie``          yes yes    yes
+updates ``OFPFF_SEND_FLOW_REM``  yes +      +
+honors ``OFPFF_CHECK_OVERLAP``   yes +      +
+updates ``idle_timeout``         yes +      +
+updates ``hard_timeout``         yes +      +
+resets idle timer                yes +      +
+resets hard timer                yes yes    yes
+zeros counters                   yes +      +
+may add a new flow               yes yes    yes
+sends ``flow_removed`` message   --- ---    ---    %      %
+================================ === ====== ====== ====== ======
+
+where:
+
+``+``
+ "modify" and "modify-strict" only take these actions when they create a new
+ flow, not when they update an existing flow.
+
+``%``
+ "delete" and "delete_strict" generate a flow_removed message if the deleted
+ flow or flows have the ``OFPFF_SEND_FLOW_REM`` flag set. (Each controller
+ can separately control whether it wants to receive the generated messages.)
+
+OpenFlow 1.1
+~~~~~~~~~~~~
+
+OpenFlow 1.1 makes these changes:
+
+- The controller now must specify the ``table_id`` of the flow match searched
+ and into which a flow may be inserted. Behavior for a ``table_id`` of 255 is
+ undefined.
+
+- A ``flow_mod``, except an "add", can now match on the ``flow_cookie``.
+
+- When a ``flow_mod`` matches on the ``flow_cookie``, "modify" and
+ "modify-strict" never insert a new flow.
+
+================================ === ====== ====== ====== ======
+                                            MODIFY        DELETE
+RULE                             ADD MODIFY STRICT DELETE STRICT
+================================ === ====== ====== ====== ======
+match on ``priority``            yes ---    yes    ---    yes
+match on ``out_port``            --- ---    ---    yes    yes
+match on ``flow_cookie``         --- yes    yes    yes    yes
+match on ``table_id``            yes yes    yes    yes    yes
+controller chooses ``table_id``  yes yes    yes
+updates ``flow_cookie``          yes ---    ---
+updates ``OFPFF_SEND_FLOW_REM``  yes +      +
+honors ``OFPFF_CHECK_OVERLAP``   yes +      +
+updates ``idle_timeout``         yes +      +
+updates ``hard_timeout``         yes +      +
+resets idle timer                yes +      +
+resets hard timer                yes yes    yes
+zeros counters                   yes +      +
+may add a new flow               yes #      #
+sends ``flow_removed`` message   --- ---    ---    %      %
+================================ === ====== ====== ====== ======
+
+where:
+
+``+``
+ "modify" and "modify-strict" only take these actions when they create a new
+ flow, not when they update an existing flow.
+
+``%``
+ "delete" and "delete_strict" generate a flow_removed message if the deleted
+ flow or flows have the ``OFPFF_SEND_FLOW_REM`` flag set. (Each controller
+ can separately control whether it wants to receive the generated messages.)
+
+``#``
+ "modify" and "modify-strict" only add a new flow if the flow_mod does not
+ match on any bits of the flow cookie.
+
+OpenFlow 1.2
+~~~~~~~~~~~~
+
+OpenFlow 1.2 makes these changes:
+
+- Only "add" commands ever add flows; "modify" and "modify-strict" never do.
+
+- A new flag ``OFPFF_RESET_COUNTS`` now controls whether "modify" and
+ "modify-strict" reset counters, whereas previously they never reset counters
+ (except when they inserted a new flow).
+
+================================ === ====== ====== ====== ======
+                                            MODIFY        DELETE
+RULE                             ADD MODIFY STRICT DELETE STRICT
+================================ === ====== ====== ====== ======
+match on ``priority``            yes ---    yes    ---    yes
+match on ``out_port``            --- ---    ---    yes    yes
+match on ``flow_cookie``         --- yes    yes    yes    yes
+match on ``table_id``            yes yes    yes    yes    yes
+controller chooses ``table_id``  yes yes    yes
+updates ``flow_cookie``          yes ---    ---
+updates ``OFPFF_SEND_FLOW_REM``  yes ---    ---
+honors ``OFPFF_CHECK_OVERLAP``   yes ---    ---
+updates ``idle_timeout``         yes ---    ---
+updates ``hard_timeout``         yes ---    ---
+resets idle timer                yes ---    ---
+resets hard timer                yes yes    yes
+zeros counters                   yes &      &
+may add a new flow               yes ---    ---
+sends ``flow_removed`` message   --- ---    ---    %      %
+================================ === ====== ====== ====== ======
+
+where:
+
+``%``
+ "delete" and "delete_strict" generate a flow_removed message if the deleted
+ flow or flows have the ``OFPFF_SEND_FLOW_REM`` flag set. (Each controller
+ can separately control whether it wants to receive the generated messages.)
+
+``&``
+ "modify" and "modify-strict" reset counters if the ``OFPFF_RESET_COUNTS``
+ flag is specified.
+
+OpenFlow 1.3
+~~~~~~~~~~~~
+
+OpenFlow 1.3 makes these changes:
+
+- Behavior for a table_id of 255 is now defined, for "delete" and
+ "delete-strict" commands, as meaning to delete from all tables. A table_id
+ of 255 is now explicitly invalid for other commands.
+
+- New flags ``OFPFF_NO_PKT_COUNTS`` and ``OFPFF_NO_BYT_COUNTS`` for "add"
+ operations.
+
+The table for 1.3 is the same as the one shown above for 1.2.
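+
+The "may add a new flow" row is the one that changes most across versions. As
+a minimal sketch, assuming a hypothetical helper named ``may_add_flow`` that
+takes the OpenFlow version as a ``(major, minor)`` tuple, the rule from the
+tables above can be written in Python as:
+
+::
+
+    def may_add_flow(version, command, cookie_mask=0):
+        """Return True if the flow_mod may insert a new flow.
+
+        Illustration of the "may add a new flow" rows only.  'cookie_mask'
+        is nonzero when the flow_mod matches on any bits of the flow cookie.
+        """
+        if command == "add":
+            return True
+        if command not in ("modify", "modify_strict"):
+            return False              # "delete" commands never add flows
+        if version == (1, 0):
+            return True               # OF1.0: modify may always add
+        if version == (1, 1):
+            return cookie_mask == 0   # OF1.1: the "#" footnote
+        return False                  # OF1.2 and later: only "add" adds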
+
+OpenFlow 1.4
+~~~~~~~~~~~~
+
+OpenFlow 1.4 makes these changes:
+
+- Adds the "importance" field to ``flow_mods``, but it does not explicitly
+ specify which kinds of ``flow_mods`` set the importance. For consistency,
+ Open vSwitch uses the same rule for importance as for ``idle_timeout`` and
+ ``hard_timeout``, that is, only an "ADD" flow_mod sets the importance. (This
+ issue has been filed with the ONF as EXT-496.)
+
+.. TODO(stephenfin) Link to EXT-496
+
+- Eviction Mechanism to automatically delete entries of lower importance to
+ make space for newer entries.
+
+OpenFlow 1.4 Bundles
+--------------------
+
+Open vSwitch makes all flow table modifications atomically, i.e., any datapath
+packet only sees flow table configurations either before or after any change
+made by any ``flow_mod``. For example, if a controller removes all flows with
+a single OpenFlow ``flow_mod``, no packet sees an intermediate version of the
+OpenFlow pipeline where only some of the flows have been deleted.
+
+It should be noted that Open vSwitch caches datapath flows, and that the cached
+flows are *NOT* flushed immediately when a flow table changes. Instead, the
+datapath flows are revalidated against the new flow table as soon as possible,
+and usually within one second of the modification. This design amortizes the
+cost of datapath cache flushing across multiple flow table changes, and has a
+significant performance effect during simultaneous heavy flow table churn and
+high traffic load. This means that different cached datapath flows may have
+been computed based on different flow table configurations, but each of the
+datapath flows is guaranteed to have been computed over a coherent view of the
+flow tables, as described above.
+
+With OpenFlow 1.4 bundles this atomicity can be extended across an arbitrary
+set of ``flow_mod`` messages. Bundles are supported for ``flow_mod`` and
+``port_mod`` messages only. For ``flow_mod``, both ``atomic`` and ``ordered``
+bundle flags are trivially supported, as all bundled messages are executed in
+the order they were added and all flow table modifications are now atomic to
+the datapath. Port mods may not appear in atomic bundles, as port status
+modifications are not atomic.
+
+To support bundles, ovs-ofctl has a ``--bundle`` option that makes the
+flow mod commands (``add-flow``, ``add-flows``, ``mod-flows``, ``del-flows``,
+and ``replace-flows``) use an OpenFlow 1.4 bundle to apply the
+modifications as a single atomic transaction. If any of the flow mods
+in a transaction fail, none of them are executed. All flow mods in a
+bundle appear to datapath lookups simultaneously.
+
+Furthermore, ovs-ofctl ``add-flow`` and ``add-flows`` commands now accept
+arbitrary flow mods as an input by allowing the flow specification to
+start with an explicit ``add``, ``modify``, ``modify_strict``, ``delete``, or
+``delete_strict`` keyword. A missing keyword is treated as ``add``, so
+this is fully backwards compatible. With the new ``--bundle`` option
+all the flow mods are executed as a single atomic transaction using an
+OpenFlow 1.4 bundle. Without the ``--bundle`` option the flow mods are
+executed in order up to the first failing ``flow_mod``, and in case of an
+error the earlier successful ``flow_mod`` calls are not rolled back.
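+
+The difference between the two modes can be illustrated with a toy Python
+model in which the flow table is a plain dictionary and each flow mod is a
+callable. All names here are hypothetical and the model says nothing about how
+Open vSwitch implements bundles; it only shows the all-or-nothing behavior
+described above.
+
+::
+
+    def apply_sequential(table, mods):
+        """Without --bundle: apply in order and stop at the first failure.
+
+        Mods that already succeeded stay in effect.
+        """
+        for mod in mods:
+            mod(table)        # may raise; nothing is rolled back
+
+
+    def apply_bundle(table, mods):
+        """With --bundle: either every mod takes effect or none does."""
+        staged = dict(table)  # work on a private copy
+        for mod in mods:
+            mod(staged)       # a failure here leaves 'table' untouched
+        table.clear()
+        table.update(staged)  # replace the contents with the staged result
+
+
+    def add(match, actions):
+        """Build a toy "add" flow mod."""
+        def mod(table):
+            table[match] = actions
+        return mod
+
+
+    def failing(table):
+        """A toy flow mod that is always rejected."""
+        raise ValueError("flow mod rejected")
+
+Applying ``[add("a", "x"), failing, add("b", "y")]`` to an empty table leaves
+flow ``a`` installed in the sequential case and leaves the table empty in the
+bundle case.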
+
+``OFPT_PACKET_IN``
+------------------
+
+The OpenFlow 1.1 specification for ``OFPT_PACKET_IN`` is confusing. The
+definition in OF1.1 ``openflow.h`` is[*]:
+
+::
+
+    /* Packet received on port (datapath -> controller). */
+    struct ofp_packet_in {
+        struct ofp_header header;
+        uint32_t buffer_id;     /* ID assigned by datapath. */
+        uint32_t in_port;       /* Port on which frame was received. */
+        uint32_t in_phy_port;   /* Physical Port on which frame was received. */
+        uint16_t total_len;     /* Full length of frame. */
+        uint8_t reason;         /* Reason packet is being sent (one of OFPR_*) */
+        uint8_t table_id;       /* ID of the table that was looked up */
+        uint8_t data[0];        /* Ethernet frame, halfway through 32-bit word,
+                                   so the IP header is 32-bit aligned. The
+                                   amount of data is inferred from the length
+                                   field in the header. Because of padding,
+                                   offsetof(struct ofp_packet_in, data) ==
+                                   sizeof(struct ofp_packet_in) - 2. */
+    };
+    OFP_ASSERT(sizeof(struct ofp_packet_in) == 24);
+
+The confusing part is the comment on the ``data[]`` member. This comment is a
+leftover from OF1.0 ``openflow.h``, in which the comment was correct:
+``sizeof(struct ofp_packet_in)`` is 20 in OF1.0 and ``offsetof(struct
+ofp_packet_in, data)`` is 18. When OF1.1 was written, the structure members
+were changed but the comment was carelessly not updated, and the comment became
+wrong: ``sizeof(struct ofp_packet_in)`` and ``offsetof(struct ofp_packet_in,
+data)`` are both 24 in OF1.1.
+
+That leaves the question of how to implement ``ofp_packet_in`` in OF1.1. The
+OpenFlow reference implementation for OF1.1 does not include any padding, that
+is, the first byte of the encapsulated frame immediately follows the
+``table_id`` member without a gap. Open vSwitch therefore implements it the
+same way for compatibility.
+
+For an earlier discussion, please see the thread archived at:
+https://mailman.stanford.edu/pipermail/openflow-discuss/2011-August/002604.html
+
+[*] The quoted definition is directly from OF1.1. Definitions used inside OVS
+omit the 8-byte ``ofp_header`` members, so the sizes in this discussion are
+8 bytes larger than those declared in OVS header files.
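+
+The sizes and offsets discussed above can be checked with a short Python
+sketch that mirrors both layouts using ``ctypes``. The OF1.1 fields follow the
+definition quoted above; the OF1.0 field order is reproduced from the OF1.0
+``openflow.h`` as an assumption of this example. The class names are
+hypothetical, and both structures include the 8-byte ``ofp_header``.
+
+::
+
+    import ctypes
+
+
+    class OfpHeader(ctypes.Structure):
+        _fields_ = [
+            ("version", ctypes.c_uint8),
+            ("type", ctypes.c_uint8),
+            ("length", ctypes.c_uint16),
+            ("xid", ctypes.c_uint32),
+        ]
+
+
+    class PacketIn10(ctypes.Structure):
+        """Assumed OF1.0 layout, with an explicit pad byte before data."""
+        _fields_ = [
+            ("header", OfpHeader),
+            ("buffer_id", ctypes.c_uint32),
+            ("total_len", ctypes.c_uint16),
+            ("in_port", ctypes.c_uint16),
+            ("reason", ctypes.c_uint8),
+            ("pad", ctypes.c_uint8),
+            ("data", ctypes.c_uint8 * 0),
+        ]
+
+
+    class PacketIn11(ctypes.Structure):
+        """OF1.1 layout as quoted above."""
+        _fields_ = [
+            ("header", OfpHeader),
+            ("buffer_id", ctypes.c_uint32),
+            ("in_port", ctypes.c_uint32),
+            ("in_phy_port", ctypes.c_uint32),
+            ("total_len", ctypes.c_uint16),
+            ("reason", ctypes.c_uint8),
+            ("table_id", ctypes.c_uint8),
+            ("data", ctypes.c_uint8 * 0),
+        ]
+
+
+    # Print sizeof(struct) and offsetof(struct, data) for each layout.
+    for cls in (PacketIn10, PacketIn11):
+        print(cls.__name__, ctypes.sizeof(cls), cls.data.offset)
+
+On platforms where ``uint32_t`` is 4-byte aligned, the OF1.0 structure has two
+trailing bytes between the offset of ``data`` and the structure size, while in
+OF1.1 the two values coincide, which is why the inherited comment no longer
+holds.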
+
+VLAN Matching
+-------------
+
+The 802.1Q VLAN header causes more trouble than any other 4 bytes in
+networking. More specifically, three versions of OpenFlow and Open vSwitch
+have among them four different ways to match the contents and presence of the
+VLAN header. The following table describes how each version works.
+
+======== ============= =============== =============== ================
+ Match   NXM           OF1.0           OF1.1           OF1.2
+======== ============= =============== =============== ================
+ ``[1]`` ``0000/0000`` ``????/1,??/?`` ``????/1,??/?`` ``0000/0000,--``
+ ``[2]`` ``0000/ffff`` ``ffff/0,??/?`` ``ffff/0,??/?`` ``0000/ffff,--``
+ ``[3]`` ``1xxx/1fff`` ``0xxx/0,??/1`` ``0xxx/0,??/1`` ``1xxx/ffff,--``
+ ``[4]`` ``z000/f000`` ``????/1,0y/0`` ``fffe/0,0y/0`` ``1000/1000,0y``
+ ``[5]`` ``zxxx/ffff`` ``0xxx/0,0y/0`` ``0xxx/0,0y/0`` ``1xxx/ffff,0y``
+ ``[6]`` ``0000/0fff`` ``<none>``      ``<none>``      ``<none>``
+ ``[7]`` ``0000/f000`` ``<none>``      ``<none>``      ``<none>``
+ ``[8]`` ``0000/efff`` ``<none>``      ``<none>``      ``<none>``
+ ``[9]`` ``1001/1001`` ``<none>``      ``<none>``      ``1001/1001,--``
+``[10]`` ``3000/3000`` ``<none>``      ``<none>``      ``<none>``
+``[11]`` ``1000/1000`` ``<none>``      ``fffe/0,??/1`` ``1000/1000,--``
+======== ============= =============== =============== ================
+
+where:
+
+Match:
+ See the list below.
+
+NXM:
+ ``xxxx/yyyy`` means ``NXM_OF_VLAN_TCI_W`` with value ``xxxx`` and mask
+ ``yyyy``. A mask of ``0000`` is equivalent to omitting
+ ``NXM_OF_VLAN_TCI(_W)``, a mask of ``ffff`` is equivalent to
+ ``NXM_OF_VLAN_TCI``.
+
+OF1.0, OF1.1:
+ ``wwww/x,yy/z`` means ``dl_vlan`` ``wwww``, ``OFPFW_DL_VLAN`` ``x``,
+ ``dl_vlan_pcp`` ``yy``, and ``OFPFW_DL_VLAN_PCP`` ``z``. If
+ ``OFPFW_DL_VLAN`` or ``OFPFW_DL_VLAN_PCP`` is 1, the corresponding field
+ value is wildcarded, otherwise it is matched. ``?`` means that the given
+ bits are ignored (their conventional values are ``0000/x,00/0`` in OF1.0,
+ ``0000/x,00/1`` in OF1.1; ``x`` is never ignored). ``<none>`` means that the
+ given match is not supported.
+
+OF1.2:
+ ``xxxx/yyyy,zz`` means ``OXM_OF_VLAN_VID_W`` with value ``xxxx`` and mask
+ ``yyyy``, and ``OXM_OF_VLAN_PCP`` (which is not maskable) with value ``zz``.
+ A mask of ``0000`` is equivalent to omitting ``OXM_OF_VLAN_VID(_W)``, a mask
+ of ``ffff`` is equivalent to ``OXM_OF_VLAN_VID``. ``--`` means that
+ ``OXM_OF_VLAN_PCP`` is omitted. ``<none>`` means that the given match is not
+ supported.
+
+The matches are:
+
+``[1]``
+ Matches any packet, that is, one without an 802.1Q header or with an 802.1Q
+ header with any TCI value.
+
+``[2]``
+ Matches only packets without an 802.1Q header.
+
+ NXM:
+ Any match with ``vlan_tci == 0`` and ``(vlan_tci_mask & 0x1000) != 0`` is
+ equivalent to the one listed in the table.
+
+ OF1.0:
+ The spec doesn't define behavior if ``dl_vlan`` is set to ``0xffff`` and
+ ``OFPFW_DL_VLAN_PCP`` is not set.
+
+ OF1.1:
+ The spec says explicitly to ignore ``dl_vlan_pcp`` when ``dl_vlan`` is set
+ to ``0xffff``.
+
+ OF1.2:
+ The spec doesn't say what should happen if ``vlan_vid == 0`` and
+ ``(vlan_vid_mask & 0x1000) != 0`` but ``vlan_vid_mask != 0x1000``, but it
+ would be straightforward to also interpret as ``[2]``.
+
+``[3]``
+ Matches only packets that have an 802.1Q header with VID ``xxx`` (and any
+ PCP).
+
+``[4]``
+ Matches only packets that have an 802.1Q header with PCP ``y`` (and any VID).
+
+ NXM:
+ ``z`` is ``(y << 1) | 1``.
+
+ OF1.0:
+ The spec isn't very clear, but OVS implements it this way.
+
+ OF1.2:
+ Presumably other masks such that ``(vlan_vid_mask & 0x1fff) == 0x1000``
+ would also work, but the spec doesn't define their behavior.
+
+``[5]``
+ Matches only packets that have an 802.1Q header with VID ``xxx`` and PCP
+ ``y``.
+
+ NXM:
+ ``z`` is ``((y << 1) | 1)``.
+
+ OF1.2:
+ Presumably other masks such that ``(vlan_vid_mask & 0x1fff) == 0x1fff``
+ would also work.
+
+``[6]``
+ Matches packets with no 802.1Q header or with an 802.1Q header with a VID of
+ 0. Only possible with NXM.
+
+``[7]``
+ Matches packets with no 802.1Q header or with an 802.1Q header with a PCP of
+ 0. Only possible with NXM.
+
+``[8]``
+ Matches packets with no 802.1Q header or with an 802.1Q header with both VID
+ and PCP of 0. Only possible with NXM.
+
+``[9]``
+ Matches only packets that have an 802.1Q header with an odd-numbered VID (and
+ any PCP). Only possible with NXM and OF1.2. (This is just an example; one
+ can match on any desired VID bit pattern.)
+
+``[10]``
+ Matches only packets that have an 802.1Q header with an odd-numbered PCP (and
+ any VID). Only possible with NXM. (This is just an example; one can match
+ on any desired PCP bit pattern.)
+
+``[11]``
+ Matches any packet with an 802.1Q header, regardless of VID or PCP.
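+
+The NXM column can be read as a plain value/mask comparison on a 16-bit TCI.
+The following is a minimal sketch under two assumptions stated here rather
+than taken from the table: a packet without an 802.1Q header is represented
+by a TCI of 0, and a packet with one always has the CFI bit (``0x1000``) set.
+The function names are hypothetical and are not part of OVS.
+
```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VLAN_CFI 0x1000u   /* Assumed "802.1Q header present" bit. */

/* Builds the TCI used for matching: 0 for an untagged packet, otherwise
 * PCP in bits 13-15, the CFI bit forced to 1, and VID in bits 0-11. */
static uint16_t packet_tci(bool tagged, unsigned vid, unsigned pcp)
{
    if (!tagged) {
        return 0;
    }
    return (uint16_t) (((pcp & 7u) << 13) | VLAN_CFI | (vid & 0xfffu));
}

/* A value/mask match in the style of NXM_OF_VLAN_TCI_W. */
static bool nxm_tci_matches(uint16_t tci, uint16_t value, uint16_t mask)
{
    return (tci & mask) == value;
}

/* The "z" nibble from matches [4] and [5]: PCP shifted left by one bit,
 * with the CFI bit set. */
static unsigned nxm_pcp_nibble(unsigned pcp)
{
    return ((pcp & 7u) << 1) | 1u;
}
```
+
+For example, match ``[6]`` (``0000/0fff``) accepts both an untagged packet and
+a tagged packet with VID 0, because the mask does not cover the CFI bit, while
+match ``[2]`` (``0000/ffff``) accepts only the untagged packet.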
+
+Additional notes:
+
+OF1.2:
+ The top three bits of ``OXM_OF_VLAN_VID`` are fixed to zero, so bits 13, 14,
+ and 15 in the masks listed in the table may be set to arbitrary values, as
+ long as the corresponding value bits are also zero. The suggested ``ffff``
+ mask for [2], [3], and [5] allows a shorter OXM representation (the mask is
+ omitted) than the minimal ``1fff`` mask.
+
+Flow Cookies
+------------
+
+OpenFlow 1.0 and later versions have the concept of a "flow cookie", which is a
+64-bit integer value attached to each flow. The treatment of the flow cookie
+has varied greatly across OpenFlow versions, however.
+
+In OpenFlow 1.0:
+
+- ``OFPFC_ADD`` set the cookie in the flow that it added.
+
+- ``OFPFC_MODIFY`` and ``OFPFC_MODIFY_STRICT`` updated the cookie for the flow
+ or flows that it modified.
+
+- ``OFPST_FLOW`` messages included the flow cookie.
+
+- ``OFPT_FLOW_REMOVED`` messages reported the cookie of the flow that was
+ removed.
+
+OpenFlow 1.1 made the following changes:
+
+- Flow mod operations ``OFPFC_MODIFY``, ``OFPFC_MODIFY_STRICT``,
+ ``OFPFC_DELETE``, and ``OFPFC_DELETE_STRICT``, plus flow stats requests and
+ aggregate stats requests, gained the ability to match on flow cookies with an
+ arbitrary mask.
+
+- ``OFPFC_MODIFY`` and ``OFPFC_MODIFY_STRICT`` were changed to add a new flow,
+ in the case of no match, only if the flow table modification operation did
+ not match on the cookie field. (In OpenFlow 1.0, modify operations always
+ added a new flow when there was no match.)
+
+- ``OFPFC_MODIFY`` and ``OFPFC_MODIFY_STRICT`` no longer updated flow cookies.
+
+OpenFlow 1.2 made the following changes:
+
+- ``OFPFC_MODIFY`` and ``OFPFC_MODIFY_STRICT`` were changed to never add a new
+ flow, regardless of whether the flow cookie was used for matching.
+
+Open vSwitch support for OpenFlow 1.0 implements the OpenFlow 1.0 behavior with
+the following extensions:
+
+- An NXM extension field ``NXM_NX_COOKIE(_W)`` allows the NXM versions of
+ ``OFPFC_MODIFY``, ``OFPFC_MODIFY_STRICT``, ``OFPFC_DELETE``, and
+ ``OFPFC_DELETE_STRICT`` ``flow_mod`` calls, plus flow stats requests and
+ aggregate stats requests, to match on flow cookies with arbitrary masks.
+ This is much like the equivalent OpenFlow 1.1 feature.
+
+- Like OpenFlow 1.1, ``OFPFC_MODIFY`` and ``OFPFC_MODIFY_STRICT`` add a new flow
+ if there is no match and the mask is zero (or not given).
+
+- The ``cookie`` field in ``OFPT_FLOW_MOD`` and ``NXT_FLOW_MOD`` messages is
+ used as the cookie value for ``OFPFC_ADD`` commands, as described in OpenFlow
+ 1.0. For ``OFPFC_MODIFY`` and ``OFPFC_MODIFY_STRICT`` commands, the
+ ``cookie`` field is used as a new cookie for flows that match unless it is
+ ``UINT64_MAX``, in which case the flow's cookie is not updated.
+
+- ``NXT_PACKET_IN`` (the Nicira extended version of ``OFPT_PACKET_IN``) reports
+ the cookie of the rule that generated the packet, or all-1-bits if no rule
+ generated the packet. (Older versions of OVS used all-0-bits instead of
+ all-1-bits.)
+
+The following table shows the handling of different protocols when receiving
+``OFPFC_MODIFY`` and ``OFPFC_MODIFY_STRICT`` messages. A mask of 0 indicates
+either an explicit mask of zero or an implicit one by not specifying the
+``NXM_NX_COOKIE(_W)`` field.
+
+============== ====== ====== ============= =============
+               Match  Update Add on miss   Add on miss
+               cookie cookie mask!=0       mask==0
+============== ====== ====== ============= =============
+OpenFlow 1.0   no     yes    (add on miss) (add on miss)
+OpenFlow 1.1   yes    no     no            yes
+OpenFlow 1.2   yes    no     no            no
+NXM            yes    yes\*  no            yes
+============== ====== ====== ============= =============
+
+\* Updates the flow's cookie unless the ``cookie`` field is ``UINT64_MAX``.
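+
+The table can also be written as two small decision functions. This is a
+sketch that encodes the rows exactly as printed above; the enum and function
+names are hypothetical and do not correspond to OVS source code.
+
```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Rows of the table (illustrative names). */
enum flow_proto { PROTO_OF10, PROTO_OF11, PROTO_OF12, PROTO_NXM };

/* Whether a modify command that matches no flow adds a new one. */
static bool modify_adds_on_miss(enum flow_proto proto, uint64_t cookie_mask)
{
    switch (proto) {
    case PROTO_OF10:
        return true;                /* Cannot match on the cookie at all. */
    case PROTO_OF11:
    case PROTO_NXM:
        return cookie_mask == 0;    /* Only when not matching on cookie. */
    case PROTO_OF12:
        return false;               /* Modify never adds. */
    }
    return false;
}

/* The cookie a matched flow carries after a modify command. */
static uint64_t cookie_after_modify(enum flow_proto proto,
                                    uint64_t old_cookie, uint64_t new_cookie)
{
    switch (proto) {
    case PROTO_OF10:
        return new_cookie;
    case PROTO_NXM:
        /* UINT64_MAX means "leave the cookie unchanged". */
        return new_cookie == UINT64_MAX ? old_cookie : new_cookie;
    case PROTO_OF11:
    case PROTO_OF12:
        return old_cookie;
    }
    return old_cookie;
}
```
+
+Keeping the two questions separate mirrors the table: whether a miss adds a
+flow depends on the cookie mask, while whether the cookie is rewritten
+depends only on the protocol and the ``UINT64_MAX`` sentinel.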
+
+Multiple Table Support
+----------------------
+
+OpenFlow 1.0 has only rudimentary support for multiple flow tables. Notably,
+OpenFlow 1.0 does not allow the controller to specify the flow table to which a
+flow is to be added. Open vSwitch adds an extension for this purpose, which is
+enabled on a per-OpenFlow connection basis using the ``NXT_FLOW_MOD_TABLE_ID``
+message. When the extension is enabled, the upper 8 bits of the ``command``
+member in an ``OFPT_FLOW_MOD`` or ``NXT_FLOW_MOD`` message designate the table
+to which a flow is to be added.
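+
+As an illustration of this encoding, the sketch below packs and unpacks a
+table ID in the 16-bit ``command`` member. It assumes that the low 8 bits
+carry the ordinary ``OFPFC_*`` command and works on host-byte-order values
+(the field is in network byte order on the wire). The helper names are
+hypothetical.
+
```c
#include <assert.h>
#include <stdint.h>

/* OpenFlow 1.0 value, repeated here so that the sketch stands alone. */
enum { OFPFC_MODIFY = 1 };

/* Combines a table ID (upper 8 bits) with a flow mod command (lower 8 bits,
 * an assumption of this sketch). */
static uint16_t flow_mod_command_with_table(uint8_t table_id, uint8_t command)
{
    return (uint16_t) ((table_id << 8) | command);
}

/* Extracts the table ID from a combined command value. */
static uint8_t flow_mod_table_id(uint16_t command)
{
    return (uint8_t) (command >> 8);
}

/* Extracts the plain OFPFC_* command from a combined command value. */
static uint8_t flow_mod_base_command(uint16_t command)
{
    return (uint8_t) (command & 0xff);
}
```
+
+With these helpers, ``flow_mod_command_with_table(5, OFPFC_MODIFY)`` yields
+``0x0501``, that is, a modify command directed at table 5.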
+
+The Open vSwitch software switch implementation offers 255 flow tables. On
+packet ingress, only the first flow table (table 0) is searched, and the
+contents of the remaining tables are not considered in any way. Tables other
+than table 0 only come into play when an ``NXAST_RESUBMIT_TABLE`` action
+specifies another table to search.
+
+Tables 128 and above are reserved for use by the switch itself. Controllers
+should use only tables 0 through 127.
+
+``OFPTC_*`` Table Configuration
+-------------------------------
+
+This section covers the history of the ``OFPTC_*`` table configuration bits
+across OpenFlow versions.
+
+OpenFlow 1.0 flow tables had fixed configurations.
+
+OpenFlow 1.1 enabled controllers to configure behavior upon flow table miss and
+added the ``OFPTC_MISS_*`` constants for that purpose. ``OFPTC_*`` did not
+control anything else but it was nevertheless conceptualized as a set of
+bit-fields instead of an enum. OF1.1 added the ``OFPT_TABLE_MOD`` message to
+set ``OFPTC_MISS_*`` for a flow table and added the ``config`` field to the
+``OFPST_TABLE`` reply to report the current setting.
+
+OpenFlow 1.2 did not change anything in this regard.
+
+OpenFlow 1.3 switched to another means of changing flow table miss behavior and
+deprecated ``OFPTC_MISS_*`` without adding any more ``OFPTC_*`` constants.
+This meant that ``OFPT_TABLE_MOD`` now had no purpose at all, but OF1.3 kept it
+around "for backward compatibility with older and newer versions of the
+specification." At the same time, OF1.3 introduced a new message
+``OFPMP_TABLE_FEATURES`` that included a field ``config`` documented as
+reporting the ``OFPTC_*`` values set with ``OFPT_TABLE_MOD``; of course this
+served no real purpose because no ``OFPTC_*`` values are defined. OF1.3 did
+remove the ``OFPTC_*`` field from ``OFPMP_TABLE`` (previously named
+``OFPST_TABLE``).
+
+OpenFlow 1.4 defined two new ``OFPTC_*`` constants, ``OFPTC_EVICTION`` and
+``OFPTC_VACANCY_EVENTS``, using bits that did not overlap with ``OFPTC_MISS_*``
+even though those bits had not been defined since OF1.2. ``OFPT_TABLE_MOD``
+still controlled these settings. The field for ``OFPTC_*`` values in
+``OFPMP_TABLE_FEATURES`` was renamed from ``config`` to ``capabilities`` and
+documented as reporting the flags that are supported in an ``OFPT_TABLE_MOD``
+message. The ``OFPMP_TABLE_DESC`` message newly added in OF1.4 reported the
+``OFPTC_*`` setting.
+
+OpenFlow 1.5 did not change anything in this regard.
+
+.. list-table:: Revisions
+   :header-rows: 1
+
+   * - OpenFlow
+     - ``OFPTC_*`` flags
+     - ``TABLE_MOD``
+     - Statistics
+     - ``TABLE_FEATURES``
+     - ``TABLE_DESC``
+   * - OF1.0
+     - none
+     - no (\*)(+)
+     - no (\*)
+     - nothing (\*)(+)
+     - no (\*)(+)
+   * - OF1.1/1.2
+     - ``MISS_*``
+     - yes
+     - yes
+     - nothing (+)
+     - no (+)
+   * - OF1.3
+     - none
+     - yes (\*)
+     - no (\*)
+     - config (\*)
+     - no (\*)(+)
+   * - OF1.4/1.5
+     - ``EVICTION``/``VACANCY_EVENTS``
+     - yes
+     - no
+     - capabilities
+     - yes
+
+where:
+
+OpenFlow:
+ The OpenFlow version(s).
+
+``OFPTC_*`` flags:
+ The ``OFPTC_*`` flags defined in those versions.
+
+``TABLE_MOD``:
+ Whether ``OFPT_TABLE_MOD`` can modify ``OFPTC_*`` flags.
+
+Statistics:
+ Whether ``OFPST_TABLE/OFPMP_TABLE`` reports the ``OFPTC_*`` flags.
+
+``TABLE_FEATURES``:
+ What ``OFPMP_TABLE_FEATURES`` reports (if it exists): either the current
+ configuration or the switch's capabilities.
+
+``TABLE_DESC``:
+ Whether ``OFPMP_TABLE_DESC`` reports the current configuration.
+
+(\*): Nothing to report/change anyway.
+
+(+): No such message.
+
+IPv6
+----
+
+Open vSwitch supports stateless handling of IPv6 packets. Flows can be written
+to support matching TCP, UDP, and ICMPv6 headers within an IPv6 packet. Deeper
+matching of some Neighbor Discovery messages is also supported.
+
+IPv6 was not designed to interact well with middle-boxes. This, combined with
+Open vSwitch's stateless nature, has affected the processing of IPv6 traffic,
+which is detailed below.
+
+Extension Headers
+~~~~~~~~~~~~~~~~~
+
+The base IPv6 header is incredibly simple with the intention of only containing
+information relevant for routing packets between two endpoints. IPv6 relies
+heavily on the use of extension headers to provide any other functionality.
+Unfortunately, the extension headers were designed in such a way that it is
+impossible to move to the next header (including the layer-4 payload) unless
+the current header is understood.
+
+Open vSwitch will process the following extension headers and continue to the
+next header:
+
+- Fragment (see the next section)
+- AH (Authentication Header)
+- Hop-by-Hop Options
+- Routing
+- Destination Options
+
+When a header is encountered that is not in that list, it is considered
+"terminal". A terminal header's IPv6 protocol value is stored in ``nw_proto``
+for matching purposes. If a terminal header is TCP, UDP, or ICMPv6, the packet
+will be further processed in an attempt to extract layer-4 information.
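+
+The walk over extension headers can be sketched as follows. This is a
+simplified illustration, not the OVS parser: it ignores the fragment-specific
+handling described in the next section, the function name is hypothetical,
+and the header lengths follow the usual IPv6 conventions (8-octet units for
+Hop-by-Hop, Routing, and Destination Options; 4-octet units plus 2 for AH;
+a fixed 8 octets for Fragment).
+
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* IPv6 protocol numbers of the extension headers that are skipped. */
enum { HOPOPTS = 0, ROUTING = 43, FRAGMENT = 44, AH = 51, DSTOPTS = 60 };

/* Skips the extension headers listed above, starting with protocol 'next'
 * at 'p'.  On success stores the terminal header's protocol number in
 * '*nw_proto' and returns true; returns false if the data is truncated. */
static bool find_terminal_header(uint8_t next, const uint8_t *p, size_t len,
                                 uint8_t *nw_proto)
{
    for (;;) {
        size_t hdr_len;

        if (next != HOPOPTS && next != ROUTING && next != FRAGMENT
            && next != AH && next != DSTOPTS) {
            *nw_proto = next;       /* Anything else is terminal. */
            return true;
        }
        if (len < 8) {
            return false;           /* Every extension header is >= 8 bytes. */
        }
        if (next == FRAGMENT) {
            hdr_len = 8;
        } else if (next == AH) {
            hdr_len = ((size_t) p[1] + 2) * 4;
        } else {
            hdr_len = ((size_t) p[1] + 1) * 8;
        }
        if (len < hdr_len) {
            return false;
        }
        next = p[0];                /* "Next Header" is always byte 0. */
        p += hdr_len;
        len -= hdr_len;
    }
}
```
+
+Because each header's length must be read from the header itself, an
+unrecognized header cannot be skipped, which is why the walk stops and
+reports it as terminal.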
+
+Fragments
+~~~~~~~~~
+
+IPv6 requires that every link in the internet have an MTU of 1280 octets or
+greater (RFC 2460). As such, a terminal header (as described above in
+"Extension Headers") in the first fragment should generally be reachable. In
+this case, the terminal header's IPv6 protocol type is stored in the
+``nw_proto`` field for matching purposes. If a terminal header cannot be found
+in the first fragment (one with a fragment offset of zero), the ``nw_proto``
+field is set to 0. Subsequent fragments (those with a non-zero fragment
+offset) have the ``nw_proto`` field set to the IPv6 protocol type for fragments
+(44).
+
+Jumbograms
+~~~~~~~~~~
+
+An IPv6 jumbogram (RFC 2675) is a packet containing a payload longer than
+65,535 octets. Jumbograms are relevant only in subnets with a link MTU greater
+than 65,575 octets, and nodes that do not connect to links with such large MTUs
+are not required to support them. Currently, Open vSwitch doesn't process
+jumbograms.
+
+In-Band Control
+---------------
+
+Motivation
+~~~~~~~~~~
+
+An OpenFlow switch must establish and maintain a TCP network connection to its
+controller. There are two basic ways to categorize the network that this
+connection traverses: either it is completely separate from the one that the
+switch is otherwise controlling, or its path may overlap the network that the
+switch controls. We call the former case "out-of-band control", the latter
+case "in-band control".
+
+Out-of-band control has the following benefits:
+
+- Simplicity: Out-of-band control slightly simplifies the switch
+ implementation.
+
+- Reliability: Excessive switch traffic volume cannot interfere with control
+ traffic.
+
+- Integrity: Machines not on the control network cannot impersonate a switch or
+ a controller.
+
+- Confidentiality: Machines not on the control network cannot snoop on control
+ traffic.
+
+In-band control, on the other hand, has the following advantages:
+
+- No dedicated port: There is no need to dedicate a physical switch port to
+ control, which is important on switches that have few ports (e.g. wireless
+ routers, low-end embedded platforms).
+
+- No dedicated network: There is no need to build and maintain a separate
+ control network. This is important in many environments because it reduces
+ proliferation of switches and wiring.
+
+Open vSwitch supports both out-of-band and in-band control. This section
+describes the principles behind in-band control. See the description of the
+Controller table in ovs-vswitchd.conf.db(5) to configure OVS for in-band
+control.
+
+Principles
+~~~~~~~~~~
+
+The fundamental principle of in-band control is that an OpenFlow switch must
+recognize and switch control traffic without involving the OpenFlow controller.
+All the details of implementing in-band control are special cases of this
+principle.
+
+The rationale for this principle is simple. If the switch does not handle
+in-band control traffic itself, then it will be caught in a contradiction: it
+must contact the controller, but it cannot, because only the controller can set
+up the flows that are needed to contact the controller.
+
+The following points describe important special cases of this principle.
+
+- In-band control must be implemented regardless of whether the switch is
+ connected.
+
+ It is tempting to implement the in-band control rules only when the switch is
+ not connected to the controller, using the reasoning that the controller
+ should have complete control once it has established a connection with the
+ switch.
+
+ This does not work in practice. Consider the case where the switch is
+ connected to the controller. Occasionally it can happen that the controller
+ forgets or otherwise needs to obtain the MAC address of the switch. To do
+ so, the controller sends a broadcast ARP request. A switch that implements
+ the in-band control rules only when it is disconnected will then send an
+ ``OFPT_PACKET_IN`` message up to the controller. The controller will be
+ unable to respond, because it does not know the MAC address of the switch.
+ This is a deadlock situation that can only be resolved by the switch noticing
+ that its connection to the controller has hung and reconnecting.
+
+- In-band control must override flows set up by the controller.
+
+ It is reasonable to assume that flows set up by the OpenFlow controller
+ should take precedence over in-band control, on the basis that the controller
+ should be in charge of the switch.
+
+ Again, this does not work in practice. Reasonable controller implementations
+ may set up a "last resort" fallback rule that wildcards every field and,
+ e.g., sends it up to the controller or discards it. If a controller does
+ that, then it will isolate itself from the switch.
+
+- The switch must recognize all control traffic.
+
+ The fundamental principle of in-band control states, in part, that a switch
+ must recognize control traffic without involving the OpenFlow controller.
+ More specifically, the switch must recognize *all* control traffic. "False
+ negatives", that is, packets that constitute control traffic but that the
+ switch does not recognize as control traffic, lead to control traffic storms.
+
+ Consider an OpenFlow switch that only recognizes control packets sent to or
+ from that switch. Now suppose that two switches of this type, named A and B,
+ are connected to ports on an Ethernet hub (not a switch) and that an OpenFlow
+ controller is connected to a third hub port. In this setup, control traffic
+ sent by switch A will be seen by switch B, which will send it to the
+ controller as part of an ``OFPT_PACKET_IN`` message. Switch A will then see
+ the ``OFPT_PACKET_IN`` message's packet, re-encapsulate it in another
+ ``OFPT_PACKET_IN``, and send it to the controller. Switch B will then see
+ that ``OFPT_PACKET_IN``, and so on in an infinite loop.
+
+ Incidentally, the consequences of "false positives", where packets that are
+ not control traffic are nevertheless recognized as control traffic, are much
+ less severe. The controller will not be able to control their behavior, but
+ the network will remain in working order. False positives do constitute a
+ security problem.
+
+- The switch should use echo-requests to detect disconnection.
+
+ TCP will notice that a connection has hung, but this can take a considerable
+ amount of time. For example, with default settings the Linux kernel TCP
+ implementation will retransmit for between 13 and 30 minutes, depending on
+ the connection's retransmission timeout, according to kernel documentation.
+ This is far too long for a switch to be disconnected, so an OpenFlow switch
+ should implement its own connection timeout. OpenFlow ``OFPT_ECHO_REQUEST``
+ messages are the best way to do this, since they test the OpenFlow connection
+ itself.
+
+Implementation
+~~~~~~~~~~~~~~
+
+This section describes how Open vSwitch implements in-band control. Correctly
+implementing in-band control has proven difficult due to its many subtleties,
+and has thus gone through many iterations. Please read through and understand
+the reasoning behind the chosen rules before making modifications.
+
+Open vSwitch implements in-band control as "hidden" flows, that is, flows that
+are not visible through OpenFlow and that have a higher priority than any
+wildcarded flow that can be set up through OpenFlow. This is done so that the
+OpenFlow controller cannot interfere with them and possibly break connectivity
+with its switches. It is possible to see all flows, including in-band ones,
+with the ovs-appctl "bridge/dump-flows" command.
+
+The Open vSwitch implementation of in-band control can hide traffic to
+arbitrary "remotes", where each remote is one TCP port on one IP address.
+Currently the remotes are automatically configured as the in-band OpenFlow
+controllers plus the OVSDB managers, if any. (The latter is a requirement
+because OVSDB managers are responsible for configuring OpenFlow controllers, so
+if the manager cannot be reached then OpenFlow cannot be reconfigured.)
+
+The following rules (with the ``OFPP_NORMAL`` action) are set up on any bridge
+that has any remotes:
+
+(a)
+ DHCP requests sent from the local port.
+(b)
+ ARP replies to the local port's MAC address.
+(c)
+ ARP requests from the local port's MAC address.
+
+In-band also sets up the following rules for each unique next-hop MAC address
+for the remotes' IPs (the "next hop" is either the remote itself, if it is on a
+local subnet, or the gateway to reach the remote):
+
+(d)
+ ARP replies to the next hop's MAC address.
+(e)
+ ARP requests from the next hop's MAC address.
+
+In-band also sets up the following rules for each unique remote IP address:
+
+(f)
+ ARP replies containing the remote's IP address as a target.
+(g)
+ ARP requests containing the remote's IP address as a source.
+
+In-band also sets up the following rules for each unique remote (IP,port) pair:
+
+(h)
+ TCP traffic to the remote's IP and port.
+(i)
+ TCP traffic from the remote's IP and port.
+
+The goal of these rules is to be as narrow as possible to allow a switch to
+join a network and be able to communicate with the remotes. As mentioned
+earlier, these rules have higher priority than the controller's rules, so if
+they are too broad, they may prevent the controller from implementing its
+policy. As such, in-band actively monitors some aspects of flow and packet
+processing so that the rules can be made more precise.
+
+In-band control monitors attempts to add flows into the datapath that could
+interfere with its duties. The datapath only allows exact match entries, so
+in-band control is able to be very precise about the flows it prevents. Flows
+that miss in the datapath are sent to userspace to be processed, so preventing
+these flows from being cached in the "fast path" does not affect correctness.
+The only type of flow that is currently prevented is one that would prevent
+DHCP replies from being seen by the local port. For example, a rule that
+forwarded all DHCP traffic to the controller would not be allowed, but one that
+forwarded to all ports (including the local port) would.
+
+As mentioned earlier, packets that miss in the datapath are sent to the
+userspace for processing. The userspace has its own flow table, the
+"classifier", so in-band checks whether any special processing is needed before
+the classifier is consulted. If a packet is a DHCP response to a request from
+the local port, the packet is forwarded to the local port, regardless of the
+flow table. Note that this requires L7 processing of DHCP replies to determine
+whether the 'chaddr' field matches the MAC address of the local port.
+
+It is interesting to note that for an L3-based in-band control mechanism, the
+majority of rules are devoted to ARP traffic. At first glance, some of these
+rules appear redundant. However, each serves an important role. First, in
+order to determine the MAC address of the remote side (controller or gateway)
+for other ARP rules, we must allow ARP traffic for our local port with rules
+(b) and (c). If we are between a switch and its connection to the remote, we
+have to allow the other switch's ARP traffic through. This is done with
+rules (d) and (e), since we do not know the addresses of the other switches a
+priori, but do know the remote's or gateway's. Finally, if the remote is
+running in a local guest VM that is not reached through the local port, the
+switch that is connected to the VM must allow ARP traffic based on the remote's
+IP address, since it will not know the MAC address of the local port that is
+sending the traffic or the MAC address of the remote in the guest VM.
+
+With a few notable exceptions below, in-band should work in most network
+setups. The following are considered "supported" in the current
+implementation:
+
+- Locally Connected. The switch and remote are on the same subnet. This uses
+ rules (a), (b), (c), (h), and (i).
+
+- Reached through Gateway. The switch and remote are on different subnets and
+ must go through a gateway. This uses rules (a), (b), (c), (h), and (i).
+
+- Between Switch and Remote. This switch is between another switch and the
+ remote, and we want to allow the other switch's traffic through. This uses
+ rules (d), (e), (h), and (i). It uses (b) and (c) indirectly in order to
+ know the MAC address for rules (d) and (e). Note that DHCP for the other
+ switch will not work unless an OpenFlow controller explicitly lets this
+ switch pass the traffic.
+
+- Between Switch and Gateway. This switch is between another switch and the
+ gateway, and we want to allow the other switch's traffic through. This uses
+ the same rules and logic as the "Between Switch and Remote" configuration
+ described earlier.
+
+- Remote on Local VM. The remote is a guest VM on the system running in-band
+ control. This uses rules (a), (b), (c), (h), and (i).
+
+- Remote on Local VM with Different Networks. The remote is a guest VM on the
+ system running in-band control, but the local port is not used to connect to
+ the remote. For example, an IP address is configured on eth0 of the switch.
+ The remote's VM is connected through eth1 of the switch, but an IP address
+ has not been configured for that port on the switch. As such, the switch
+ will use eth0 to connect to the remote, and eth1's rules about the local port
+ will not work. In the example, the switch attached to eth0 would use rules
+ (a), (b), (c), (h), and (i) on eth0. The switch attached to eth1 would use
+ rules (f), (g), (h), and (i).
+
+The following are explicitly *not* supported by in-band control:
+
+- Specify Remote by Name. Currently, the remote must be identified by IP
+ address. A naive approach would be to permit all DNS traffic.
+ Unfortunately, this would prevent the controller from defining any policy
+ over DNS. Since switches that are located behind us need to connect to the
+ remote, in-band cannot simply add a rule that allows DNS traffic from the
+ local port. The "correct" way to support this is to parse DNS requests to
+ allow all traffic related to a request for the remote's name through. Due to
+ the potential security problems and amount of processing, we decided to hold
+ off for the time being.
+
+- Differing Remotes for Switches. All switches must know the L3 addresses for
+ all the remotes that other switches may use, since rules need to be set up to
+ allow traffic related to those remotes through. See rules (f), (g), (h), and
+ (i).
+
+- Differing Routes for Switches. In order for the switch to allow other
+ switches to connect to a remote through a gateway, it allows the gateway's
+ traffic through with rules (d) and (e). If the routes to the remote differ
+ for the two switches, we will not know the MAC address of the alternate
+ gateway.
+
+Action Reproduction
+-------------------
+
+It seems likely that many controllers, at least at startup, use the OpenFlow
+"flow statistics" request to obtain existing flows, then compare the flows'
+actions against the actions that they expect to find. Before version 1.8.0,
+Open vSwitch always returned exact, byte-for-byte copies of the actions that
+had been added to the flow table. The current version of Open vSwitch does not
+do this in some exceptional cases. This section lists the exceptions that
+controller authors must keep in mind if they compare actual actions against
+desired actions in a bytewise fashion:
+
+- Open vSwitch zeros padding bytes in action structures, regardless of their
+ values when the flows were added.
+
+- Open vSwitch "normalizes" the instructions in OpenFlow 1.1 (and later) in the
+ following way:
+
+ * OVS sorts the instructions into the following order: Apply-Actions,
+ Clear-Actions, Write-Actions, Write-Metadata, Goto-Table.
+
+ * OVS drops Apply-Actions instructions that have empty action lists.
+
+ * OVS drops Write-Actions instructions that have empty action sets.
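+
+A controller that wants to compare instructions after this normalization
+could apply the same steps to its own copy first. The sketch below shows one
+way to do that; the types and function names are hypothetical, and each
+instruction is reduced to a type plus an action count for brevity.
+
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Instruction types, declared in the order that OVS sorts them into. */
enum inst_type {
    APPLY_ACTIONS, CLEAR_ACTIONS, WRITE_ACTIONS, WRITE_METADATA, GOTO_TABLE
};

/* Simplified instruction: only the type and the number of actions. */
struct inst {
    enum inst_type type;
    size_t n_actions;
};

static int compare_inst(const void *a_, const void *b_)
{
    const struct inst *a = a_;
    const struct inst *b = b_;

    return (int) a->type - (int) b->type;
}

/* Drops empty Apply-Actions and Write-Actions instructions, then sorts the
 * rest into the normalized order.  Returns the number kept. */
static size_t normalize_insts(struct inst *insts, size_t n)
{
    size_t kept = 0;

    for (size_t i = 0; i < n; i++) {
        bool empty_list = (insts[i].type == APPLY_ACTIONS
                           || insts[i].type == WRITE_ACTIONS)
                          && insts[i].n_actions == 0;
        if (!empty_list) {
            insts[kept++] = insts[i];
        }
    }
    qsort(insts, kept, sizeof *insts, compare_inst);
    return kept;
}
```
+
+Comparing the normalized arrays, rather than the bytes originally sent,
+avoids false mismatches caused by instruction order or empty action lists.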
+
+Please report other discrepancies, if you notice any, so that we can fix or
+document them.
+
+Suggestions
+-----------
+
+Suggestions to improve Open vSwitch are welcome at discuss@openvswitch.org.
@@ -68,7 +68,7 @@ PYCOV_CLEAN_FILES = build-aux/check-structs,cover
docs = \
CONTRIBUTING.rst \
CodingStyle.rst \
- DESIGN.md \
+ DESIGN.rst \
DocumentationStyle.rst \
FAQ.rst \
INSTALL.rst \
@@ -813,7 +813,7 @@ struct ofputil_table_features {
* supported, otherwise 0. For other versions, they are decoded as -1 and
* ignored for encoding.
*
- * See the section "OFPTC_* Table Configuration" in DESIGN.md for more
+ * See the section "OFPTC_* Table Configuration" in DESIGN.rst for more
* details of how OpenFlow has changed in this area.
*/
enum ofputil_table_miss miss_config; /* OF1.1 and 1.2 only. */
@@ -5675,7 +5675,7 @@ ofputil_encode_table_config(enum ofputil_table_miss miss,
enum ofp_version version)
{
uint32_t config = 0;
- /* See the section "OFPTC_* Table Configuration" in DESIGN.md for more
+ /* See the section "OFPTC_* Table Configuration" in DESIGN.rst for more
* information on the crazy evolution of this field. */
switch (version) {
case OFP10_VERSION:
@@ -732,7 +732,7 @@ pinctrl_recv(const struct ofp_header *oh, enum ofptype type)
queue_msg(make_echo_reply(oh));
} else if (type == OFPTYPE_GET_CONFIG_REPLY) {
/* Enable asynchronous messages (see "Asynchronous Messages" in
- * DESIGN.md for more information). */
+ * DESIGN.rst for more information). */
struct ofputil_switch_config config;
ofputil_decode_get_config_reply(oh, &config);
@@ -342,7 +342,7 @@
It's possible, however, for some other bridge in the same system to have
an in-band remote controller, and in that case this suppresses the flows
that in-band control would ordinarily set up. See <code>In-Band
- Control</code> in <code>DESIGN.md</code> for more information.
+ Control</code> in <code>DESIGN.rst</code> for more information.
</dd>
</dl>
@@ -474,7 +474,7 @@ fi
%{_mandir}/man8/ovs-vswitchd.8*
%{_mandir}/man8/ovs-parse-backtrace.8*
%{_mandir}/man8/ovs-testcontroller.8*
-%doc COPYING DESIGN.md INSTALL.SSL.md NOTICE README.rst WHY-OVS.rst
+%doc COPYING DESIGN.rst INSTALL.SSL.md NOTICE README.rst WHY-OVS.rst
%doc FAQ.rst NEWS INSTALL.DPDK.rst rhel/README.RHEL
/var/lib/openvswitch
/var/log/openvswitch
@@ -247,7 +247,7 @@ exit 0
/usr/share/openvswitch/scripts/sysconfig.template
/usr/share/openvswitch/vswitch.ovsschema
/usr/share/openvswitch/vtep.ovsschema
-%doc COPYING DESIGN.md INSTALL.SSL.md NOTICE README.rst WHY-OVS.rst FAQ.rst NEWS
+%doc COPYING DESIGN.rst INSTALL.SSL.md NOTICE README.rst WHY-OVS.rst FAQ.rst NEWS
%doc INSTALL.DPDK.rst rhel/README.RHEL README-native-tunneling.rst
/var/lib/openvswitch
/var/log/openvswitch
This is a top-level document, so plain old rST is preferred.

Signed-off-by: Stephen Finucane <stephen@that.guru>
---
v2:
- Split the OF message table into multiple tables
- Resolve some issues with table headers not displaying
---
 DESIGN.md                       | 1093 ------------------------------------
 DESIGN.rst                      | 1163 +++++++++++++++++++++++++++++++++++++++
 Makefile.am                     |    2 +-
 include/openvswitch/ofp-util.h  |    2 +-
 lib/ofp-util.c                  |    2 +-
 ovn/controller/pinctrl.c        |    2 +-
 ovn/ovn-architecture.7.xml      |    2 +-
 rhel/openvswitch-fedora.spec.in |    2 +-
 rhel/openvswitch.spec.in        |    2 +-
 9 files changed, 1170 insertions(+), 1100 deletions(-)
 delete mode 100644 DESIGN.md
 create mode 100644 DESIGN.rst