Message ID | 20200311173356.38181-4-petrm@mellanox.com |
---|---|
State | Changes Requested |
Delegated to: | David Miller |
Headers | show |
Series | RED: Introduce an ECN tail-dropping mode | expand |
On 3/11/20 10:33 AM, Petr Machata wrote: > When the RED Qdisc is currently configured to enable ECN, the RED algorithm > is used to decide whether a certain SKB should be marked. If that SKB is > not ECN-capable, it is early-dropped. > > It is also possible to keep all traffic in the queue, and just mark the > ECN-capable subset of it, as appropriate under the RED algorithm. Some > switches support this mode, and some installations make use of it. > > To that end, add a new RED flag, TC_RED_TAILDROP. When the Qdisc is > configured with this flag, non-ECT traffic is enqueued (and tail-dropped > when the queue size is exhausted) instead of being early-dropped. > I find the naming of the feature very confusing. When enabling this new feature, we no longer drop packets that could not be CE marked. Tail drop is already in the packet scheduler, you want to disable it. How this feature has been named elsewhere ??? (you mentioned in your cover letter : "Some switches support this mode, and some installations make use of it.")
Eric Dumazet <eric.dumazet@gmail.com> writes: > On 3/11/20 10:33 AM, Petr Machata wrote: >> When the RED Qdisc is currently configured to enable ECN, the RED algorithm >> is used to decide whether a certain SKB should be marked. If that SKB is >> not ECN-capable, it is early-dropped. >> >> It is also possible to keep all traffic in the queue, and just mark the >> ECN-capable subset of it, as appropriate under the RED algorithm. Some >> switches support this mode, and some installations make use of it. >> >> To that end, add a new RED flag, TC_RED_TAILDROP. When the Qdisc is >> configured with this flag, non-ECT traffic is enqueued (and tail-dropped >> when the queue size is exhausted) instead of being early-dropped. >> > > I find the naming of the feature very confusing. > > When enabling this new feature, we no longer drop packets > that could not be CE marked. > Tail drop is already in the packet scheduler, you want to disable it. > > > How this feature has been named elsewhere ??? > (you mentioned in your cover letter : > "Some switches support this mode, and some installations make use of it.") The two interfaces that I know about are Juniper and Cumulus. I don't know either from direct experience, but from the manual, Cumulus seems to allow enablement of either ECN on its own[0], or ECN with RED[1]. (Or RED on its own I presume, but I couldn't actually find that.) In Juniper likewise, "on ECN-enabled queues, the switch [...] uses the tail-drop algorithm to drop non-ECN-capable packets during periods of congestion"[2]. You need to direct non-ECT traffic to a different queue and configure RED on that to get the RED+ECN behavior ala Linux. So this is unlike the RED qdisc, where RED is implied, and needs to be turned off again by an anti-RED flag. The logic behind the chosen flag name is that the opposite of early dropping is tail dropping. Note that Juniper actually calls it that as well. That said, I agree that from the perspective of the qdisc itself the name does not make sense. We can make it "nodrop", or "nored", or maybe "keep_non_ect"... I guess "nored" is closest to the desired effect. [0] https://docs.cumulusnetworks.com/cumulus-linux-40/Layer-1-and-Switch-Ports/Buffer-and-Queue-Management/ [1] https://docs.cumulusnetworks.com/version/cumulus-linux-37/Network-Solutions/RDMA-over-Converged-Ethernet-RoCE/ [2] https://www.juniper.net/documentation/en_US/junos/topics/concept/cos-qfx-series-tail-drop-understanding.html
On 3/11/20 5:42 PM, Petr Machata wrote: > > Eric Dumazet <eric.dumazet@gmail.com> writes: > >> On 3/11/20 10:33 AM, Petr Machata wrote: >>> When the RED Qdisc is currently configured to enable ECN, the RED algorithm >>> is used to decide whether a certain SKB should be marked. If that SKB is >>> not ECN-capable, it is early-dropped. >>> >>> It is also possible to keep all traffic in the queue, and just mark the >>> ECN-capable subset of it, as appropriate under the RED algorithm. Some >>> switches support this mode, and some installations make use of it. >>> >>> To that end, add a new RED flag, TC_RED_TAILDROP. When the Qdisc is >>> configured with this flag, non-ECT traffic is enqueued (and tail-dropped >>> when the queue size is exhausted) instead of being early-dropped. >>> >> >> I find the naming of the feature very confusing. >> >> When enabling this new feature, we no longer drop packets >> that could not be CE marked. >> Tail drop is already in the packet scheduler, you want to disable it. >> >> >> How this feature has been named elsewhere ??? >> (you mentioned in your cover letter : >> "Some switches support this mode, and some installations make use of it.") > > The two interfaces that I know about are Juniper and Cumulus. I don't > know either from direct experience, but from the manual, Cumulus seems > to allow enablement of either ECN on its own[0], or ECN with RED[1]. (Or > RED on its own I presume, but I couldn't actually find that.) > > In Juniper likewise, "on ECN-enabled queues, the switch [...] uses the > tail-drop algorithm to drop non-ECN-capable packets during periods of > congestion"[2]. You need to direct non-ECT traffic to a different queue > and configure RED on that to get the RED+ECN behavior ala Linux. > > So this is unlike the RED qdisc, where RED is implied, and needs to be > turned off again by an anti-RED flag. The logic behind the chosen flag > name is that the opposite of early dropping is tail dropping. Note that > Juniper actually calls it that as well. > > That said, I agree that from the perspective of the qdisc itself the > name does not make sense. We can make it "nodrop", or "nored", or maybe > "keep_non_ect"... I guess "nored" is closest to the desired effect. Well, red algo is still used to decide if we attempt ECN marking, so "nodrop" seems better to me :) > > [0] https://docs.cumulusnetworks.com/cumulus-linux-40/Layer-1-and-Switch-Ports/Buffer-and-Queue-Management/ > [1] https://docs.cumulusnetworks.com/version/cumulus-linux-37/Network-Solutions/RDMA-over-Converged-Ethernet-RoCE/ > [2] https://www.juniper.net/documentation/en_US/junos/topics/concept/cos-qfx-series-tail-drop-understanding.html >
On Wed, 11 Mar 2020 18:01:38 -0700 Eric Dumazet wrote: > > That said, I agree that from the perspective of the qdisc itself the > > name does not make sense. We can make it "nodrop", or "nored", or maybe > > "keep_non_ect"... I guess "nored" is closest to the desired effect. > > Well, red algo is still used to decide if we attempt ECN marking, so "nodrop" > seems better to me :) nodrop + harddrop is a valid combination and that'd look a little strange. Maybe something like ect-only?
Jakub Kicinski <kuba@kernel.org> writes: > On Wed, 11 Mar 2020 18:01:38 -0700 Eric Dumazet wrote: >> > That said, I agree that from the perspective of the qdisc itself the >> > name does not make sense. We can make it "nodrop", or "nored", or maybe >> > "keep_non_ect"... I guess "nored" is closest to the desired effect. >> >> Well, red algo is still used to decide if we attempt ECN marking, so "nodrop" >> seems better to me :) I know that the D in RED could stand for "detection", but the one in RED as a qdisc kind certainly stands for "drop" :) > nodrop + harddrop is a valid combination and that'd look a little > strange. Maybe something like ect-only? But then "ecn ect_only" would look equally strange. Like, of course ECN is ECT only... I don't actually mind "nodrop harddrop". You can't guess what these flags do from just their names anyway, so the fact they don't seem to make sense next to each other is not an issue. And "nodrop" does describe what the immediate behavior in context of the RED qdisc is.
Petr Machata <petrm@mellanox.com> writes: > @@ -60,6 +60,11 @@ static inline int red_use_harddrop(struct red_sched_data *q) > return q->flags & TC_RED_HARDDROP; > } > > +static inline int red_use_taildrop(struct red_sched_data *q) Forgot to take care of the static inline :-| > +{ > + return q->flags & TC_RED_TAILDROP; > +} > +
diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h index 341a66af8d59..9ad369aba678 100644 --- a/include/net/pkt_cls.h +++ b/include/net/pkt_cls.h @@ -727,6 +727,7 @@ struct tc_red_qopt_offload_params { u32 limit; bool is_ecn; bool is_harddrop; + bool is_taildrop; struct gnet_stats_queue *qstats; }; diff --git a/include/net/red.h b/include/net/red.h index 5718d2b25637..372b4988118c 100644 --- a/include/net/red.h +++ b/include/net/red.h @@ -200,6 +200,11 @@ static inline bool red_get_flags(unsigned char flags, return false; } + if ((*p_flags & TC_RED_TAILDROP) && !(*p_flags & TC_RED_ECN)) { + NL_SET_ERR_MSG_MOD(extack, "taildrop mode is only meaningful with ECN"); + return false; + } + *p_userbits = flags & ~historic_mask; return true; } diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h index 277df546e1a9..45d1c0e6444e 100644 --- a/include/uapi/linux/pkt_sched.h +++ b/include/uapi/linux/pkt_sched.h @@ -286,6 +286,7 @@ struct tc_red_qopt { #define TC_RED_ECN 1 #define TC_RED_HARDDROP 2 #define TC_RED_ADAPTATIVE 4 +#define TC_RED_TAILDROP 8 }; #define TC_RED_HISTORIC_FLAGS (TC_RED_ECN | TC_RED_HARDDROP | TC_RED_ADAPTATIVE) diff --git a/net/sched/sch_red.c b/net/sched/sch_red.c index 61d7c5a61279..1474f973ec6d 100644 --- a/net/sched/sch_red.c +++ b/net/sched/sch_red.c @@ -48,7 +48,7 @@ struct red_sched_data { struct Qdisc *qdisc; }; -#define RED_SUPPORTED_FLAGS TC_RED_HISTORIC_FLAGS +#define RED_SUPPORTED_FLAGS (TC_RED_HISTORIC_FLAGS | TC_RED_TAILDROP) static inline int red_use_ecn(struct red_sched_data *q) { @@ -60,6 +60,11 @@ static inline int red_use_harddrop(struct red_sched_data *q) return q->flags & TC_RED_HARDDROP; } +static inline int red_use_taildrop(struct red_sched_data *q) +{ + return q->flags & TC_RED_TAILDROP; +} + static int red_enqueue(struct sk_buff *skb, struct Qdisc *sch, struct sk_buff **to_free) { @@ -80,23 +85,36 @@ static int red_enqueue(struct sk_buff *skb, struct Qdisc *sch, case RED_PROB_MARK: qdisc_qstats_overlimit(sch); - if (!red_use_ecn(q) || !INET_ECN_set_ce(skb)) { + if (!red_use_ecn(q)) { q->stats.prob_drop++; goto congestion_drop; } - q->stats.prob_mark++; + if (INET_ECN_set_ce(skb)) { + q->stats.prob_mark++; + } else if (!red_use_taildrop(q)) { + q->stats.prob_drop++; + goto congestion_drop; + } + + /* Non-ECT packet in ECN taildrop mode: queue it. */ break; case RED_HARD_MARK: qdisc_qstats_overlimit(sch); - if (red_use_harddrop(q) || !red_use_ecn(q) || - !INET_ECN_set_ce(skb)) { + if (red_use_harddrop(q) || !red_use_ecn(q)) { q->stats.forced_drop++; goto congestion_drop; } - q->stats.forced_mark++; + if (INET_ECN_set_ce(skb)) { + q->stats.forced_mark++; + } else if (!red_use_taildrop(q)) { + q->stats.forced_drop++; + goto congestion_drop; + } + + /* Non-ECT packet in ECN taildrop mode: queue it. */ break; } @@ -171,6 +189,7 @@ static int red_offload(struct Qdisc *sch, bool enable) opt.set.limit = q->limit; opt.set.is_ecn = red_use_ecn(q); opt.set.is_harddrop = red_use_harddrop(q); + opt.set.is_taildrop = red_use_taildrop(q); opt.set.qstats = &sch->qstats; } else { opt.command = TC_RED_DESTROY;
When the RED Qdisc is currently configured to enable ECN, the RED algorithm is used to decide whether a certain SKB should be marked. If that SKB is not ECN-capable, it is early-dropped. It is also possible to keep all traffic in the queue, and just mark the ECN-capable subset of it, as appropriate under the RED algorithm. Some switches support this mode, and some installations make use of it. To that end, add a new RED flag, TC_RED_TAILDROP. When the Qdisc is configured with this flag, non-ECT traffic is enqueued (and tail-dropped when the queue size is exhausted) instead of being early-dropped. Signed-off-by: Petr Machata <petrm@mellanox.com> --- Notes: v2: - Fix red_use_taildrop() condition in red_enqueue switch for probabilistic case. include/net/pkt_cls.h | 1 + include/net/red.h | 5 +++++ include/uapi/linux/pkt_sched.h | 1 + net/sched/sch_red.c | 31 +++++++++++++++++++++++++------ 4 files changed, 32 insertions(+), 6 deletions(-)