Message ID | 20221104154941.365187-1-amorenoz@redhat.com |
---|---|
Series | Add ovn drop debugging |
On Fri, Nov 4, 2022 at 4:50 PM Adrian Moreno <amorenoz@redhat.com> wrote:
> Very often when troubleshooting networking issues in an OVN cluster, one
> would like to know whether any packet (or a specific one) is being
> dropped by OVN.
>
> Currently, this cannot be known for two main reasons:
>
> 1 - Implicit drops: Some tables do not have a default action
> (priority=0, match=1). In this case, a packet that does not match any
> rule is silently dropped.
>
> 2 - Even on explicit drops, we only know that a packet was dropped. We
> lack information about that packet.
>
> In order to improve this, this series introduces a two-fold solution:
>
> - First, make all drops explicit:
>   - northd adds a default (match = "1") "drop;" action to those tables
>     that currently lack one.
>   - ovn-controller adds an explicit drop action to those tables that are
>     not associated with logical flows (i.e., physical-to-logical mappings).
>
> - Secondly, allow sampling of all drops. By introducing a new OVN
>   action, "sample" (equivalent to OVS's), OVN can make OVS sample the
>   packets as they are dropped. In order to correlate those samples back
>   to the exact rule that generated them, the user specifies an 8-bit
>   observation_domain_id. Based on that, the samples contain the
>   following fields:
>   - obs_domain_id:
>     - 8 most significant bits = the provided observation_domain_id.
>     - 24 least significant bits = the datapath's tunnel key if the
>       drop comes from an lflow, or zero otherwise.
>   - obs_point_id: the first 32 bits of the lflow's UUID (i.e., the
>     cookie) if the drop comes from an lflow, or the table number
>     otherwise.
>
> Based on the above changes in the flows, all of which are optional,
> users can collect IPFIX samples of the packets that are dropped by OVN,
> which contain header information useful for debugging.
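The obs_domain_id/obs_point_id layout described above can be decoded with plain shell arithmetic. The helper below is a minimal sketch, not part of OVN or nfdump: `decode_obs_ids` is a hypothetical name, it assumes the IDs arrive as hex strings like those printed by nfdump's `%odid`/`%opid` format fields, and it assumes datapath tunnel keys are never 0, so that a zero lower half marks a drop that did not come from a logical flow.

```shell
#!/bin/sh
# Hypothetical helper, not part of OVN: split the IDs of an IPFIX drop
# sample following the layout described in the cover letter:
#   obs_domain_id: bits 31..24 = user-provided observation_domain_id
#                  bits 23..0  = datapath tunnel key (0 if not from an lflow)
#   obs_point_id:  lflow cookie (first 32 bits of the lflow UUID), or the
#                  table number if the drop is not from an lflow
decode_obs_ids() {    # $1 = obsDomainID, $2 = obsPointID (hex strings)
    odid=$(( $1 ))
    opid=$(( $2 ))
    domain=$(( (odid >> 24) & 0xff ))
    tunnel_key=$(( odid & 0xffffff ))
    if [ "$tunnel_key" -ne 0 ]; then
        # Drop from a logical flow: the point ID is the lflow cookie.
        printf 'domain=%d dp_tunnel_key=%d lflow_cookie=%08x\n' \
            "$domain" "$tunnel_key" "$opid"
    else
        # Drop from a table not associated with logical flows:
        # the point ID is the table number.
        printf 'domain=%d table=%d\n' "$domain" "$opid"
    fi
}

# First sample from the nfdump example; prints:
# domain=1 dp_tunnel_key=9 lflow_cookie=d8dd23c7
decode_obs_ids 0x001000009 0x00d8dd23c7
```

Because the application identity lives in the top 8 bits, a second sampling application (such as the ACL sampling the cover letter mentions as possible future work) would show up with a different `domain` value while the lower 24 bits keep identifying the datapath.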
>
> * Note on observation_domain_ids:
> By allowing the user to specify only the 8 most significant bits of the
> obs_domain_id and having OVN combine them with the datapath's tunnel
> key, OVN could be extended to support more than one "sampling"
> application. For instance, ACL sampling could be developed in the
> future and, by specifying a different observation_domain_id, it could
> co-exist with the drop sampling mode implemented in this series while
> still allowing the flow that created each sample to be uniquely
> identified.
>
> * Notes on testing and usage:
> Any IPFIX collector that parses the ObservationPointID and
> ObservationDomainID fields can be used. For instance, nfdump 1.7
> supports these fields. Example of how to capture and analyze drops:
>
>   # Enable debug sampling:
>   $ ovn-nbctl set NB_Global . options:debug_drop_collector_set=1 \
>         options:debug_drop_domain_id=1
>   # Start nfcapd:
>   $ nfcapd -p 2055 -l nfcap &
>   # Configure sampling on the OVS you want to inspect:
>   $ ovs-vsctl --id=@br get Bridge br-int \
>         -- --id=@i create IPFIX targets=\"172.18.0.1:2055\" \
>         -- create Flow_Sample_Collector_Set bridge=@br id=1 ipfix=@i
>   # Inspect samples and figure out which logical flow caused them:
>   $ nfdump -r nfcap -o fmt:'%line %odid %opid'
>   Date first seen          Duration      Proto  Src IP Addr:Port    Dst IP Addr:Port      Packets  Bytes  Flows  obsDomainID  obsPointID
>   1970-01-01 01:09:36.000  00:00:00.000  UDP    172.18.0.1:49230 -> 239.255.255.250:1900  12       6356   1      0x001000009  0x00d8dd23c7
>   1970-01-01 01:01:34.000  00:00:00.000  UDP    172.18.0.1:5353  -> 224.0.0.251:5353      165      89257  1      0x001000009  0x00d8dd23c7
>   [...]
>   $ ovn-sbctl list Logical_Flow | grep -A 11 d8dd23c7
>   _uuid               : d8dd23c7-1451-4ea3-add7-8d68b4be4691
>   actions             : "sample(probability=65535,collector_set=1,obs_domain=1,obs_point=$cookie); /* drop */"
>   controller_meter    : []
>   external_ids        : {source="northd.c:12504", stage-name=lr_in_ip_input}
>   logical_datapath    : []
>   logical_dp_group    : 0dc1b195-c647-4277-aea0-0bad5e896f51
>   match               : "ip4.mcast || ip6.mcast"
>   pipeline            : ingress
>   priority            : 82
>   table_id            : 3
>   tags                : {}
>   hash                : 0
>
> V4 -> V5: Added documentation
> V3 -> V4: Make explicit drops the default behavior.
> V2 -> V3: Fix rebase problem on unit test
> V1 -> V2
> - Rebased and addressed Mark's comments.
> - Added NEWS section.
>
>
> Adrian Moreno (3):
>   actions: add sample action
>   northd: make default drops explicit
>   northd: add drop sampling
>
>  NEWS                        |   2 +
>  controller/lflow.c          |   1 +
>  controller/ovn-controller.c |  44 ++++++
>  controller/physical.c       |  77 ++++++++-
>  controller/physical.h       |   6 +
>  include/ovn/actions.h       |  16 ++
>  lib/actions.c               | 120 ++++++++++++++
>  northd/automake.mk          |   2 +
>  northd/debug.c              |  98 ++++++++++++
>  northd/debug.h              |  30 ++++
>  northd/northd.c             | 109 ++++++++-----
>  northd/ovn-northd.8.xml     |  66 +++++++-
>  ovn-nb.xml                  |  28 ++++
>  ovn-sb.xml                  |  81 ++++++++++
>  tests/ovn-northd.at         |  84 ++++++++++
>  tests/ovn.at                | 303 ++++++++++++++++++++++++++++++++----
>  tests/test-ovn.c            |   3 +
>  utilities/ovn-trace.c       |   2 +
>  18 files changed, 996 insertions(+), 76 deletions(-)
>  create mode 100644 northd/debug.c
>  create mode 100644 northd/debug.h
>
> --
> 2.37.3
>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
>
The whole series looks good to me, thanks.

Reviewed-by: Ales Musil <amusil@redhat.com>
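The Logical_Flow lookup in the usage example relies on the lflow cookie being the first 32 bits (the first 8 hex digits) of the Logical_Flow UUID. A minimal sketch of that correlation, with hypothetical helper names: `uuid_to_cookie` and `opid_matches_uuid` are pure string and arithmetic checks, while `show_drop_source` assumes access to the Southbound DB and uses only `ovn-sbctl list` (as in the example) plus the generic `find` database command.

```shell
#!/bin/sh
# Hypothetical helpers, not part of OVN.

# The lflow cookie is the first 32 bits of the Logical_Flow UUID,
# i.e. the UUID's first group of 8 hex digits.
uuid_to_cookie() {
    printf '%s\n' "${1%%-*}"
}

# Succeeds if an IPFIX obsPointID (hex, as printed by nfdump) equals the
# cookie of the given Logical_Flow UUID.
opid_matches_uuid() {
    cookie=$(uuid_to_cookie "$2")
    [ $(( $1 )) -eq $(( 0x$cookie )) ]
}

# Print the Logical_Flow and Datapath_Binding rows behind an lflow drop
# sample. Needs access to the OVN Southbound DB; defined but not run here.
show_drop_source() {    # $1 = obsDomainID, $2 = obsPointID
    tunnel_key=$(( $1 & 0xffffff ))
    cookie=$(printf '%08x' $(( $2 )))
    ovn-sbctl list Logical_Flow | grep -A 11 "^_uuid *: $cookie"
    ovn-sbctl find Datapath_Binding tunnel_key="$tunnel_key"
}
```

With the same Southbound contents, `show_drop_source 0x001000009 0x00d8dd23c7` should print the `lr_in_ip_input` flow and the datapath with tunnel key 9. The tunnel key matters here: that flow has an empty `logical_datapath` and is shared through a `logical_dp_group`, so the cookie alone identifies the flow but not which datapath dropped the packet.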
On 11/15/22 10:49, Ales Musil wrote:
> On Fri, Nov 4, 2022 at 4:50 PM Adrian Moreno <amorenoz@redhat.com> wrote:
>
>> [...]
>
> The whole series looks good to me, thanks.
>
> Reviewed-by: Ales Musil <amusil@redhat.com>
>

Thanks Adrian, Ales, Mark, Numan!

The series looks OK to me too.
I only have a few minor comments; I replied to the individual patches.

I'm OK to take care of fixing those minor issues myself before pushing
the patches. Just let me know what you prefer.

Thanks,
Dumitru
On Fri, Nov 18, 2022 at 9:56 AM Dumitru Ceara <dceara@redhat.com> wrote:
>
> On 11/15/22 10:49, Ales Musil wrote:
> > On Fri, Nov 4, 2022 at 4:50 PM Adrian Moreno <amorenoz@redhat.com> wrote:
> >
> >> [...]
> >
> > The whole series looks good to me, thanks.
> >
> > Reviewed-by: Ales Musil <amusil@redhat.com>
> >
>
> Thanks Adrian, Ales, Mark, Numan!
>
> The series looks OK to me too. I only have a few minor comments; I
> replied to the individual patches.
>
> I'm OK to take care of fixing those minor issues myself before pushing
> the patches. Just let me know what you prefer.

Sounds good to me.

Numan