Message ID | SY4PR01MB84388ADC0207878D8925992CCD8D9@SY4PR01MB8438.ausprd01.prod.outlook.com |
---|---|
State | Changes Requested |
Headers | show |
Series | Add netdev-dpdk support for ingress and egress pkts policer. | expand |
Context | Check | Description |
---|---|---|
ovsrobot/apply-robot | success | apply and check: success |
ovsrobot/github-robot-_Build_and_Test | fail | github build: failed |
On 4/2/23 11:30, miterv@outlook.com wrote: > From: Lin Huang <linhuang@ruijie.com.cn> > > add netdev-dpdk egress pkts policer. > > Signed-off-by: Lin Huang <linhuang@ruijie.com.cn> > --- Hi. Thanks for the patch! I didn't test this, but I have a few general comments. First, please, add some of the relevant information that you posted in reply to the cover letter into a commit message. I.e. why the feature is needed and how to configure it. Also, since it is a user-visible change, we need an entry in the NEWS file for it. See some more comments inline. Best regards, Ilya Maximets. > lib/netdev-dpdk.c | 118 +++++++++++++++++++++++++++++++++++++++++++ > vswitchd/vswitch.xml | 32 ++++++++++++ > 2 files changed, 150 insertions(+) > > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c > index fb0dd43f7..6709c459c 100644 > --- a/lib/netdev-dpdk.c > +++ b/lib/netdev-dpdk.c > @@ -19,6 +19,7 @@ > > #include <errno.h> > #include <signal.h> > +#include <stdint.h> > #include <stdlib.h> > #include <string.h> > #include <unistd.h> > @@ -59,6 +60,7 @@ > #include "openvswitch/ofp-parse.h" > #include "openvswitch/ofp-print.h" > #include "openvswitch/shash.h" > +#include "openvswitch/token-bucket.h" > #include "openvswitch/vlog.h" > #include "ovs-numa.h" > #include "ovs-rcu.h" > @@ -345,6 +347,7 @@ struct dpdk_qos_ops { > > /* dpdk_qos_ops for each type of user space QoS implementation. */ > static const struct dpdk_qos_ops egress_policer_ops; > +static const struct dpdk_qos_ops egress_pkts_policer_ops; > static const struct dpdk_qos_ops trtcm_policer_ops; > > /* > @@ -353,6 +356,7 @@ static const struct dpdk_qos_ops trtcm_policer_ops; > */ > static const struct dpdk_qos_ops *const qos_confs[] = { > &egress_policer_ops, > + &egress_pkts_policer_ops, > &trtcm_policer_ops, > NULL > }; > @@ -2335,6 +2339,29 @@ srtcm_policer_run_single_packet(struct rte_meter_srtcm *meter, > return cnt; > } > > +static int > +pkts_policer_run_single_packet(struct token_bucket *tb, struct rte_mbuf **pkts, > + int pkt_cnt, bool should_steal) > +{ > + int cnt = 0; > + struct rte_mbuf *pkt; I know that existing code is written this way, but it's better to keep variable definitions in the reverse x-mass tree order, i.e. from the longest line to the shortest. > + > + for (int i = 0; i < pkt_cnt; i++) { > + pkt = pkts[i]; > + /* Handle current packet */ Period at the end of a sentence. > + if (token_bucket_withdraw(tb, 1000)) { Why a 1000 and not 1 ? > + if (cnt != i) { > + pkts[cnt] = pkt; > + } > + cnt++; > + } else if (should_steal) { > + rte_pktmbuf_free(pkt); > + } > + } I wonder if it makes sense to have a per-packet granularity here. I mean, what if we just call token_bucket_withdraw(tb, pkt_cnt) and drop the whole batch if it doesn't fit? It sounds unlikely that someone will care about extra 31 packets being dropped in a worst case scenario. We're already dropping packets, it might be better to allow 1 batch in 32, instead of 1 packet from each of 32 batches (just an example). The single token_bucket_withdraw() and the batched mbuf free will be much more efficient. What do you think? > + > + return cnt; > +} > + > static int > ingress_policer_run(struct ingress_policer *policer, struct rte_mbuf **pkts, > int pkt_cnt, bool should_steal) > @@ -4858,6 +4885,97 @@ static const struct dpdk_qos_ops egress_policer_ops = { > .qos_run = egress_policer_run > }; > > +/* egress-pkts-policer details */ Period at the end. > + Empty line is unnecessary. > +struct egress_pkts_policer { > + struct qos_conf qos_conf; > + struct token_bucket tb; > +}; > + > +static int > +egress_pkts_policer_qos_construct(const struct smap *details, > + struct qos_conf **conf) > +{ > + uint32_t rate, burst; > + struct egress_pkts_policer *policer; > + > + policer = xmalloc(sizeof *policer); > + rate = smap_get_uint(details, "pkts_rate", 0); > + burst = smap_get_uint(details, "pkts_burst", 0); We use kpkts as a vlue for ingress policing instead of pkts, should we use the same here? > + > + /* > + * Force to 0 if no rate specified, > + * default to rate if burst is 0, > + * else stick with user-specified value. > + */ > + burst = (!rate ? 0 : !burst ? rate : burst); > + > + qos_conf_init(&policer->qos_conf, &egress_pkts_policer_ops); > + token_bucket_init(&policer->tb, rate, burst * 1000); Why * 1000 ? > + > + *conf = &policer->qos_conf; > + > + return 0; > +} > + > +static void > +egress_pkts_policer_qos_destruct(struct qos_conf *conf) > +{ > + struct egress_pkts_policer *policer = > + CONTAINER_OF(conf, struct egress_pkts_policer, qos_conf); > + > + free(policer); > +} > + > +static int > +egress_pkts_policer_qos_get(const struct qos_conf *conf, struct smap *details) > +{ > + struct egress_pkts_policer *policer = > + CONTAINER_OF(conf, struct egress_pkts_policer, qos_conf); > + > + smap_add_format(details, "pkts_rate", "%"PRIu32, policer->tb.rate); > + smap_add_format(details, "pkts_burst", "%"PRIu32, policer->tb.burst); > + > + return 0; > +} > + > +static bool > +egress_pkts_policer_qos_is_equal(const struct qos_conf *conf, > + const struct smap *details) > +{ > + uint32_t rate, burst; > + struct egress_pkts_policer *policer = > + CONTAINER_OF(conf, struct egress_pkts_policer, qos_conf); Reverse x-mass tree. > + > + rate = smap_get_uint(details, "pkts_rate", 0); > + burst = smap_get_uint(details, "pkts_burst", 0); > + > + return (policer->tb.rate == rate && policer->tb.burst == burst); > +} > + > +static int > +egress_pkts_policer_run(struct qos_conf *conf, struct rte_mbuf **pkts, > + int pkt_cnt, bool should_steal) > +{ > + int cnt = 0; > + struct egress_pkts_policer *policer = > + CONTAINER_OF(conf, struct egress_pkts_policer, qos_conf); Reverse x-mass tree. > + > + cnt = pkts_policer_run_single_packet(&policer->tb, pkts, pkt_cnt, > + should_steal); > + > + return cnt; > +} > + > +static const struct dpdk_qos_ops egress_pkts_policer_ops = { > + .qos_name = "egress-pkts-policer", > + .qos_construct = egress_pkts_policer_qos_construct, > + .qos_destruct = egress_pkts_policer_qos_destruct, > + .qos_get = egress_pkts_policer_qos_get, > + .qos_is_equal = egress_pkts_policer_qos_is_equal, > + .qos_run = egress_pkts_policer_run > +}; > + > /* trtcm-policer details */ > > struct trtcm_policer { > diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml > index edb5eafa0..94bfe54eb 100644 > --- a/vswitchd/vswitch.xml > +++ b/vswitchd/vswitch.xml > @@ -4791,6 +4791,19 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 type=patch options:peer=p1 \ > in performance will be noticed in terms of overall aggregate traffic > throughput. > </dd> > + <dt><code>egress-pkts-policer</code></dt> > + <dd> > + A DPDK egress packet per second policer algorithm using the ovs > + token bucket library. The token bucket library provides an > + implementation which allows the policing of packets traffic. The > + implementation in OVS essentially creates a single token bucket used > + to police traffic. I'd replace the text above with just: "An egress packet per second policer for DPDK ports using a simple tocken bucket algorithm." > It should be noted that when the token bucket is > + configured as part of QoS there will be a performance overhead as the > + token bucket itself will consume CPU cycles in order to police > + traffic. These CPU cycles ordinarily are used for packet proccessing. > + As such the drop in performance will be noticed in terms of overall > + aggregate traffic throughput. > + </dd> > <dt><code>trtcm-policer</code></dt> > <dd> > A DPDK egress policer algorithm using RFC 4115's Two-Rate, > @@ -4896,6 +4909,25 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 type=patch options:peer=p1 \ > </column> > </group> > > + <group title="Configuration for egress-pkts-policer QoS"> > + <p> > + <ref table="QoS"/> <ref table="QoS" column="type"/> > + <code>egress-pkts-policer</code> provides egress pkts policing for > + userspace port types with DPDK. > + > + It has the following key-value pairs defined. > + </p> > + > + <column name="other_config" key="pkts_rate" type='{"type": "integer"}'> > + The Packets Per Second (pps) represents the packet per second rate at > + which the token bucket will be updated. > + </column> > + <column name="other_config" key="pkts_burst" type='{"type": "integer"}'> > + The Packets Per Second Burst Size is measured in count and represents a > + token bucket. > + </column> The available value range should be specified for these. > + </group> > + > <group title="Configuration for linux-sfq"> > <p> > The <code>linux-sfq</code> QoS supports the following key-value pairs:
Hi Ilya, Thanks for looking at this patch and for your feedback. There are some explanations about your comments. > -----Original Message----- > From: Ilya Maximets <i.maximets@ovn.org> > Sent: Tuesday, April 4, 2023 10:58 PM > To: miterv@outlook.com; dev@openvswitch.org > Cc: i.maximets@ovn.org; Lin Huang <linhuang@ruijie.com.cn> > Subject: Re: [PATCH v1 1/2] netdev-dpdk: add netdev-dpdk egress pkts policer. > > On 4/2/23 11:30, miterv@outlook.com wrote: > > From: Lin Huang <linhuang@ruijie.com.cn> > > > > add netdev-dpdk egress pkts policer. > > > > Signed-off-by: Lin Huang <linhuang@ruijie.com.cn> > > --- > > Hi. Thanks for the patch! I didn't test this, but I have a few > general comments. > > First, please, add some of the relevant information that you posted > in reply to the cover letter into a commit message. I.e. why the > feature is needed and how to configure it. > > Also, since it is a user-visible change, we need an entry in the > NEWS file for it. Yes, I will add more information about my work later. > > See some more comments inline. > > Best regards, Ilya Maximets. > > > lib/netdev-dpdk.c | 118 > +++++++++++++++++++++++++++++++++++++++++++ > > vswitchd/vswitch.xml | 32 ++++++++++++ > > 2 files changed, 150 insertions(+) > > > > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c > > index fb0dd43f7..6709c459c 100644 > > --- a/lib/netdev-dpdk.c > > +++ b/lib/netdev-dpdk.c > > @@ -19,6 +19,7 @@ > > > > #include <errno.h> > > #include <signal.h> > > +#include <stdint.h> > > #include <stdlib.h> > > #include <string.h> > > #include <unistd.h> > > @@ -59,6 +60,7 @@ > > #include "openvswitch/ofp-parse.h" > > #include "openvswitch/ofp-print.h" > > #include "openvswitch/shash.h" > > +#include "openvswitch/token-bucket.h" > > #include "openvswitch/vlog.h" > > #include "ovs-numa.h" > > #include "ovs-rcu.h" > > @@ -345,6 +347,7 @@ struct dpdk_qos_ops { > > > > /* dpdk_qos_ops for each type of user space QoS implementation. */ > > static const struct dpdk_qos_ops egress_policer_ops; > > +static const struct dpdk_qos_ops egress_pkts_policer_ops; > > static const struct dpdk_qos_ops trtcm_policer_ops; > > > > /* > > @@ -353,6 +356,7 @@ static const struct dpdk_qos_ops trtcm_policer_ops; > > */ > > static const struct dpdk_qos_ops *const qos_confs[] = { > > &egress_policer_ops, > > + &egress_pkts_policer_ops, > > &trtcm_policer_ops, > > NULL > > }; > > @@ -2335,6 +2339,29 @@ srtcm_policer_run_single_packet(struct > rte_meter_srtcm *meter, > > return cnt; > > } > > > > +static int > > +pkts_policer_run_single_packet(struct token_bucket *tb, struct rte_mbuf > **pkts, > > + int pkt_cnt, bool should_steal) > > +{ > > + int cnt = 0; > > + struct rte_mbuf *pkt; > > I know that existing code is written this way, but it's better to keep > variable definitions in the reverse x-mass tree order, i.e. from the > longest line to the shortest. Yes, I will modify it. > > > + > > + for (int i = 0; i < pkt_cnt; i++) { > > + pkt = pkts[i]; > > + /* Handle current packet */ > > Period at the end of a sentence. > > > + if (token_bucket_withdraw(tb, 1000)) { > > Why a 1000 and not 1 ? By default tokens is added per millisecond. Rate in packets/second, bucket 1/1000 packets. That's to say. 1 token means 1/1000 packet. 1 packet means 1000 tokens. > > > + if (cnt != i) { > > + pkts[cnt] = pkt; > > + } > > + cnt++; > > + } else if (should_steal) { > > + rte_pktmbuf_free(pkt); > > + } > > + } > > I wonder if it makes sense to have a per-packet granularity here. > I mean, what if we just call token_bucket_withdraw(tb, pkt_cnt) > and drop the whole batch if it doesn't fit? > It sounds unlikely that someone will care about extra 31 packets > being dropped in a worst case scenario. We're already dropping > packets, it might be better to allow 1 batch in 32, instead of > 1 packet from each of 32 batches (just an example). > The single token_bucket_withdraw() and the batched mbuf free will > be much more efficient. > > What do you think? > This code is reference to srtcm_policer_run_single_packet which have a per-packet check. I think batched work will be more efficient, but the granularity here will be less inaccurate. from my point of view, using per-packet check is more reasonable. > > + > > + return cnt; > > +} > > + > > static int > > ingress_policer_run(struct ingress_policer *policer, struct rte_mbuf **pkts, > > int pkt_cnt, bool should_steal) > > @@ -4858,6 +4885,97 @@ static const struct dpdk_qos_ops > egress_policer_ops = { > > .qos_run = egress_policer_run > > }; > > > > +/* egress-pkts-policer details */ > > Period at the end. > > > + > > Empty line is unnecessary. > > > +struct egress_pkts_policer { > > + struct qos_conf qos_conf; > > + struct token_bucket tb; > > +}; > > + > > +static int > > +egress_pkts_policer_qos_construct(const struct smap *details, > > + struct qos_conf **conf) > > +{ > > + uint32_t rate, burst; > > + struct egress_pkts_policer *policer; > > + > > + policer = xmalloc(sizeof *policer); > > + rate = smap_get_uint(details, "pkts_rate", 0); > > + burst = smap_get_uint(details, "pkts_burst", 0); > > We use kpkts as a vlue for ingress policing instead of pkts, should > we use the same here? OK, I will convert the value of unit to kilo to keep both side the same. That's to say. Rate in kilo packets/second, bucket 1 packets. 1 token means 1 packet. 1 packet means 1 tokens. > > > + > > + /* > > + * Force to 0 if no rate specified, > > + * default to rate if burst is 0, > > + * else stick with user-specified value. > > + */ > > + burst = (!rate ? 0 : !burst ? rate : burst); > > + > > + qos_conf_init(&policer->qos_conf, &egress_pkts_policer_ops); > > + token_bucket_init(&policer->tb, rate, burst * 1000); > > Why * 1000 ? By default tokens is added per millisecond. Rate in packets/second, bucket 1/1000 packets. So we need to scale the burst value by a factor of 1000. > > > + > > + *conf = &policer->qos_conf; > > + > > + return 0; > > +} > > + > > +static void > > +egress_pkts_policer_qos_destruct(struct qos_conf *conf) > > +{ > > + struct egress_pkts_policer *policer = > > + CONTAINER_OF(conf, struct egress_pkts_policer, > qos_conf); > > + > > + free(policer); > > +} > > + > > +static int > > +egress_pkts_policer_qos_get(const struct qos_conf *conf, struct smap > *details) > > +{ > > + struct egress_pkts_policer *policer = > > + CONTAINER_OF(conf, struct egress_pkts_policer, > qos_conf); > > + > > + smap_add_format(details, "pkts_rate", "%"PRIu32, policer->tb.rate); > > + smap_add_format(details, "pkts_burst", "%"PRIu32, policer->tb.burst); > > + > > + return 0; > > +} > > + > > +static bool > > +egress_pkts_policer_qos_is_equal(const struct qos_conf *conf, > > + const struct smap *details) > > +{ > > + uint32_t rate, burst; > > + struct egress_pkts_policer *policer = > > + CONTAINER_OF(conf, struct egress_pkts_policer, qos_conf); > > Reverse x-mass tree. Yes, I will modify it. > > > + > > + rate = smap_get_uint(details, "pkts_rate", 0); > > + burst = smap_get_uint(details, "pkts_burst", 0); > > + > > + return (policer->tb.rate == rate && policer->tb.burst == burst); > > +} > > + > > +static int > > +egress_pkts_policer_run(struct qos_conf *conf, struct rte_mbuf **pkts, > > + int pkt_cnt, bool should_steal) > > +{ > > + int cnt = 0; > > + struct egress_pkts_policer *policer = > > + CONTAINER_OF(conf, struct egress_pkts_policer, > qos_conf); > > Reverse x-mass tree. Yes, I will modify it. > > > + > > + cnt = pkts_policer_run_single_packet(&policer->tb, pkts, pkt_cnt, > > + should_steal); > > + > > + return cnt; > > +} > > + > > +static const struct dpdk_qos_ops egress_pkts_policer_ops = { > > + .qos_name = "egress-pkts-policer", > > + .qos_construct = egress_pkts_policer_qos_construct, > > + .qos_destruct = egress_pkts_policer_qos_destruct, > > + .qos_get = egress_pkts_policer_qos_get, > > + .qos_is_equal = egress_pkts_policer_qos_is_equal, > > + .qos_run = egress_pkts_policer_run > > +}; > > + > > /* trtcm-policer details */ > > > > struct trtcm_policer { > > diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml > > index edb5eafa0..94bfe54eb 100644 > > --- a/vswitchd/vswitch.xml > > +++ b/vswitchd/vswitch.xml > > @@ -4791,6 +4791,19 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 > type=patch options:peer=p1 \ > > in performance will be noticed in terms of overall aggregate > traffic > > throughput. > > </dd> > > + <dt><code>egress-pkts-policer</code></dt> > > + <dd> > > + A DPDK egress packet per second policer algorithm using the > ovs > > + token bucket library. The token bucket library provides an > > + implementation which allows the policing of packets traffic. The > > + implementation in OVS essentially creates a single token bucket > used > > + to police traffic. > > I'd replace the text above with just: > > "An egress packet per second policer for DPDK ports using > a simple tocken bucket algorithm." Yes, I will modify it. > > It should be noted that when the token bucket is > > + configured as part of QoS there will be a performance overhead > as the > > + token bucket itself will consume CPU cycles in order to police > > + traffic. These CPU cycles ordinarily are used for packet > proccessing. > > + As such the drop in performance will be noticed in terms of > overall > > + aggregate traffic throughput. > > + </dd> > > <dt><code>trtcm-policer</code></dt> > > <dd> > > A DPDK egress policer algorithm using RFC 4115's Two-Rate, > > @@ -4896,6 +4909,25 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 > type=patch options:peer=p1 \ > > </column> > > </group> > > > > + <group title="Configuration for egress-pkts-policer QoS"> > > + <p> > > + <ref table="QoS"/> <ref table="QoS" column="type"/> > > + <code>egress-pkts-policer</code> provides egress pkts policing for > > + userspace port types with DPDK. > > + > > + It has the following key-value pairs defined. > > + </p> > > + > > + <column name="other_config" key="pkts_rate" type='{"type": > "integer"}'> > > + The Packets Per Second (pps) represents the packet per second > rate at > > + which the token bucket will be updated. > > + </column> > > + <column name="other_config" key="pkts_burst" type='{"type": > "integer"}'> > > + The Packets Per Second Burst Size is measured in count and > represents a > > + token bucket. > > + </column> > > The available value range should be specified for these. > > > + </group> > > + > > <group title="Configuration for linux-sfq"> > > <p> > > The <code>linux-sfq</code> QoS supports the following key-value > pairs: Best regards, Huang Lin.
On 4/4/23 18:58, lin huang wrote: > Hi Ilya, > > Thanks for looking at this patch and for your feedback. > There are some explanations about your comments. > >> -----Original Message----- >> From: Ilya Maximets <i.maximets@ovn.org> >> Sent: Tuesday, April 4, 2023 10:58 PM >> To: miterv@outlook.com; dev@openvswitch.org >> Cc: i.maximets@ovn.org; Lin Huang <linhuang@ruijie.com.cn> >> Subject: Re: [PATCH v1 1/2] netdev-dpdk: add netdev-dpdk egress pkts policer. >> >> On 4/2/23 11:30, miterv@outlook.com wrote: >>> From: Lin Huang <linhuang@ruijie.com.cn> >>> >>> add netdev-dpdk egress pkts policer. >>> >>> Signed-off-by: Lin Huang <linhuang@ruijie.com.cn> >>> --- >> >> Hi. Thanks for the patch! I didn't test this, but I have a few >> general comments. >> >> First, please, add some of the relevant information that you posted >> in reply to the cover letter into a commit message. I.e. why the >> feature is needed and how to configure it. >> >> Also, since it is a user-visible change, we need an entry in the >> NEWS file for it. > > Yes, I will add more information about my work later. > >> >> See some more comments inline. >> >> Best regards, Ilya Maximets. >> >>> lib/netdev-dpdk.c | 118 >> +++++++++++++++++++++++++++++++++++++++++++ >>> vswitchd/vswitch.xml | 32 ++++++++++++ >>> 2 files changed, 150 insertions(+) >>> >>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c >>> index fb0dd43f7..6709c459c 100644 >>> --- a/lib/netdev-dpdk.c >>> +++ b/lib/netdev-dpdk.c >>> @@ -19,6 +19,7 @@ >>> >>> #include <errno.h> >>> #include <signal.h> >>> +#include <stdint.h> >>> #include <stdlib.h> >>> #include <string.h> >>> #include <unistd.h> >>> @@ -59,6 +60,7 @@ >>> #include "openvswitch/ofp-parse.h" >>> #include "openvswitch/ofp-print.h" >>> #include "openvswitch/shash.h" >>> +#include "openvswitch/token-bucket.h" >>> #include "openvswitch/vlog.h" >>> #include "ovs-numa.h" >>> #include "ovs-rcu.h" >>> @@ -345,6 +347,7 @@ struct dpdk_qos_ops { >>> >>> /* dpdk_qos_ops for each type of user space QoS implementation. */ >>> static const struct dpdk_qos_ops egress_policer_ops; >>> +static const struct dpdk_qos_ops egress_pkts_policer_ops; >>> static const struct dpdk_qos_ops trtcm_policer_ops; >>> >>> /* >>> @@ -353,6 +356,7 @@ static const struct dpdk_qos_ops trtcm_policer_ops; >>> */ >>> static const struct dpdk_qos_ops *const qos_confs[] = { >>> &egress_policer_ops, >>> + &egress_pkts_policer_ops, >>> &trtcm_policer_ops, >>> NULL >>> }; >>> @@ -2335,6 +2339,29 @@ srtcm_policer_run_single_packet(struct >> rte_meter_srtcm *meter, >>> return cnt; >>> } >>> >>> +static int >>> +pkts_policer_run_single_packet(struct token_bucket *tb, struct rte_mbuf >> **pkts, >>> + int pkt_cnt, bool should_steal) >>> +{ >>> + int cnt = 0; >>> + struct rte_mbuf *pkt; >> >> I know that existing code is written this way, but it's better to keep >> variable definitions in the reverse x-mass tree order, i.e. from the >> longest line to the shortest. > > Yes, I will modify it. > >> >>> + >>> + for (int i = 0; i < pkt_cnt; i++) { >>> + pkt = pkts[i]; >>> + /* Handle current packet */ >> >> Period at the end of a sentence. >> >>> + if (token_bucket_withdraw(tb, 1000)) { >> >> Why a 1000 and not 1 ? > > By default tokens is added per millisecond. > Rate in packets/second, bucket 1/1000 packets. > That's to say. > 1 token means 1/1000 packet. > 1 packet means 1000 tokens. Oh, I see. I didn't read the token-bucket implementation carefully enough. It might make sense to have a comment about this at the point of initialization. It might be less important if we're going to use kilo- values, but still might be worth mentioning that "kilo-packets per second" equals "packets per millisecond" that the token bucket is using, to avoid potential confusion. > >> >>> + if (cnt != i) { >>> + pkts[cnt] = pkt; >>> + } >>> + cnt++; >>> + } else if (should_steal) { >>> + rte_pktmbuf_free(pkt); >>> + } >>> + } >> >> I wonder if it makes sense to have a per-packet granularity here. >> I mean, what if we just call token_bucket_withdraw(tb, pkt_cnt) >> and drop the whole batch if it doesn't fit? >> It sounds unlikely that someone will care about extra 31 packets >> being dropped in a worst case scenario. We're already dropping >> packets, it might be better to allow 1 batch in 32, instead of >> 1 packet from each of 32 batches (just an example). >> The single token_bucket_withdraw() and the batched mbuf free will >> be much more efficient. >> >> What do you think? >> > > This code is reference to srtcm_policer_run_single_packet which have a per-packet check. > I think batched work will be more efficient, but the granularity here will be less inaccurate. > from my point of view, using per-packet check is more reasonable. Sure. I don't have a strong opinion here. The only thing that bothers me is that we may end up calling time_msec() per packet, if the bucket is empty. That might cost us some performance. Best regards, Ilya Maximets.
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index fb0dd43f7..6709c459c 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -19,6 +19,7 @@ #include <errno.h> #include <signal.h> +#include <stdint.h> #include <stdlib.h> #include <string.h> #include <unistd.h> @@ -59,6 +60,7 @@ #include "openvswitch/ofp-parse.h" #include "openvswitch/ofp-print.h" #include "openvswitch/shash.h" +#include "openvswitch/token-bucket.h" #include "openvswitch/vlog.h" #include "ovs-numa.h" #include "ovs-rcu.h" @@ -345,6 +347,7 @@ struct dpdk_qos_ops { /* dpdk_qos_ops for each type of user space QoS implementation. */ static const struct dpdk_qos_ops egress_policer_ops; +static const struct dpdk_qos_ops egress_pkts_policer_ops; static const struct dpdk_qos_ops trtcm_policer_ops; /* @@ -353,6 +356,7 @@ static const struct dpdk_qos_ops trtcm_policer_ops; */ static const struct dpdk_qos_ops *const qos_confs[] = { &egress_policer_ops, + &egress_pkts_policer_ops, &trtcm_policer_ops, NULL }; @@ -2335,6 +2339,29 @@ srtcm_policer_run_single_packet(struct rte_meter_srtcm *meter, return cnt; } +static int +pkts_policer_run_single_packet(struct token_bucket *tb, struct rte_mbuf **pkts, + int pkt_cnt, bool should_steal) +{ + int cnt = 0; + struct rte_mbuf *pkt; + + for (int i = 0; i < pkt_cnt; i++) { + pkt = pkts[i]; + /* Handle current packet */ + if (token_bucket_withdraw(tb, 1000)) { + if (cnt != i) { + pkts[cnt] = pkt; + } + cnt++; + } else if (should_steal) { + rte_pktmbuf_free(pkt); + } + } + + return cnt; +} + static int ingress_policer_run(struct ingress_policer *policer, struct rte_mbuf **pkts, int pkt_cnt, bool should_steal) @@ -4858,6 +4885,97 @@ static const struct dpdk_qos_ops egress_policer_ops = { .qos_run = egress_policer_run }; +/* egress-pkts-policer details */ + +struct egress_pkts_policer { + struct qos_conf qos_conf; + struct token_bucket tb; +}; + +static int +egress_pkts_policer_qos_construct(const struct smap *details, + struct qos_conf **conf) +{ + uint32_t rate, burst; + struct egress_pkts_policer *policer; + + policer = xmalloc(sizeof *policer); + rate = smap_get_uint(details, "pkts_rate", 0); + burst = smap_get_uint(details, "pkts_burst", 0); + + /* + * Force to 0 if no rate specified, + * default to rate if burst is 0, + * else stick with user-specified value. + */ + burst = (!rate ? 0 : !burst ? rate : burst); + + qos_conf_init(&policer->qos_conf, &egress_pkts_policer_ops); + token_bucket_init(&policer->tb, rate, burst * 1000); + + *conf = &policer->qos_conf; + + return 0; +} + +static void +egress_pkts_policer_qos_destruct(struct qos_conf *conf) +{ + struct egress_pkts_policer *policer = + CONTAINER_OF(conf, struct egress_pkts_policer, qos_conf); + + free(policer); +} + +static int +egress_pkts_policer_qos_get(const struct qos_conf *conf, struct smap *details) +{ + struct egress_pkts_policer *policer = + CONTAINER_OF(conf, struct egress_pkts_policer, qos_conf); + + smap_add_format(details, "pkts_rate", "%"PRIu32, policer->tb.rate); + smap_add_format(details, "pkts_burst", "%"PRIu32, policer->tb.burst); + + return 0; +} + +static bool +egress_pkts_policer_qos_is_equal(const struct qos_conf *conf, + const struct smap *details) +{ + uint32_t rate, burst; + struct egress_pkts_policer *policer = + CONTAINER_OF(conf, struct egress_pkts_policer, qos_conf); + + rate = smap_get_uint(details, "pkts_rate", 0); + burst = smap_get_uint(details, "pkts_burst", 0); + + return (policer->tb.rate == rate && policer->tb.burst == burst); +} + +static int +egress_pkts_policer_run(struct qos_conf *conf, struct rte_mbuf **pkts, + int pkt_cnt, bool should_steal) +{ + int cnt = 0; + struct egress_pkts_policer *policer = + CONTAINER_OF(conf, struct egress_pkts_policer, qos_conf); + + cnt = pkts_policer_run_single_packet(&policer->tb, pkts, pkt_cnt, + should_steal); + + return cnt; +} + +static const struct dpdk_qos_ops egress_pkts_policer_ops = { + .qos_name = "egress-pkts-policer", + .qos_construct = egress_pkts_policer_qos_construct, + .qos_destruct = egress_pkts_policer_qos_destruct, + .qos_get = egress_pkts_policer_qos_get, + .qos_is_equal = egress_pkts_policer_qos_is_equal, + .qos_run = egress_pkts_policer_run +}; + /* trtcm-policer details */ struct trtcm_policer { diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml index edb5eafa0..94bfe54eb 100644 --- a/vswitchd/vswitch.xml +++ b/vswitchd/vswitch.xml @@ -4791,6 +4791,19 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 type=patch options:peer=p1 \ in performance will be noticed in terms of overall aggregate traffic throughput. </dd> + <dt><code>egress-pkts-policer</code></dt> + <dd> + A DPDK egress packet per second policer algorithm using the ovs + token bucket library. The token bucket library provides an + implementation which allows the policing of packets traffic. The + implementation in OVS essentially creates a single token bucket used + to police traffic. It should be noted that when the token bucket is + configured as part of QoS there will be a performance overhead as the + token bucket itself will consume CPU cycles in order to police + traffic. These CPU cycles ordinarily are used for packet proccessing. + As such the drop in performance will be noticed in terms of overall + aggregate traffic throughput. + </dd> <dt><code>trtcm-policer</code></dt> <dd> A DPDK egress policer algorithm using RFC 4115's Two-Rate, @@ -4896,6 +4909,25 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 type=patch options:peer=p1 \ </column> </group> + <group title="Configuration for egress-pkts-policer QoS"> + <p> + <ref table="QoS"/> <ref table="QoS" column="type"/> + <code>egress-pkts-policer</code> provides egress pkts policing for + userspace port types with DPDK. + + It has the following key-value pairs defined. + </p> + + <column name="other_config" key="pkts_rate" type='{"type": "integer"}'> + The Packets Per Second (pps) represents the packet per second rate at + which the token bucket will be updated. + </column> + <column name="other_config" key="pkts_burst" type='{"type": "integer"}'> + The Packets Per Second Burst Size is measured in count and represents a + token bucket. + </column> + </group> + <group title="Configuration for linux-sfq"> <p> The <code>linux-sfq</code> QoS supports the following key-value pairs: