From patchwork Sun Sep 18 22:03:41 2016
X-Patchwork-Submitter: Neal Cardwell
X-Patchwork-Id: 671487
X-Patchwork-Delegate: davem@davemloft.net
From: Neal Cardwell
To: David Miller
Cc: netdev@vger.kernel.org, Eric Dumazet, Van Jacobson, Neal Cardwell,
 Yuchung Cheng, Nandita Dukkipati, Soheil Hassas Yeganeh
Subject: [PATCH v3 net-next 04/16] net_sched: sch_fq: add low_rate_threshold parameter
Date: Sun, 18 Sep 2016 18:03:41 -0400
Message-Id: <1474236233-28511-5-git-send-email-ncardwell@google.com>
In-Reply-To: <1474236233-28511-1-git-send-email-ncardwell@google.com>
References: <1474236233-28511-1-git-send-email-ncardwell@google.com>
X-Mailing-List: netdev@vger.kernel.org

From: Eric Dumazet

This commit adds to the fq module a low_rate_threshold parameter to
insert a delay after all packets if the socket requests a pacing rate
below the threshold.
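For intuition, here is a rough userspace sketch (plain C, not the
kernel code itself; the rate and packet size are made-up examples) of
the per-packet delay arithmetic used below the threshold, where the
flow's credit is zeroed so the delay applies to every packet rather
than to quantum-sized bursts:

	/* Minimal sketch: delay_ns = plen * NSEC_PER_SEC / rate, mirroring
	 * the fq_dequeue() hunk below. Under low_rate_threshold, f->credit
	 * is zeroed so this delay is charged after every packet.
	 */
	#include <stdint.h>
	#include <stdio.h>

	#define NSEC_PER_SEC 1000000000ULL

	int main(void)
	{
		uint32_t rate = 512000 / 8; /* bytes/sec: e.g. a 512Kbps policed link */
		uint32_t plen = 1500;       /* one full-size packet, in bytes */
		uint64_t delay_ns = (uint64_t)plen * NSEC_PER_SEC / rate;

		printf("inter-packet gap: %.1f ms\n", delay_ns / 1e6); /* ~23.4 ms */
		return 0;
	}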
This helps achieve more precise control of the sending rate with
low-rate paths, especially policers. The basic issue is that if a
congestion control module detects a policer at a certain rate, it may
want fq to be able to shape to that policed rate. That way the sender
can avoid policer drops by having the packets arrive at the policer at
or just under the policed rate.

The default threshold of 550Kbps was chosen analytically so that for
policers or links at 500Kbps or 512Kbps fq would very likely invoke
this mechanism, even if the pacing rate was briefly slightly above the
available bandwidth. This value was then empirically validated with
two years of production testing on YouTube video servers.

Signed-off-by: Van Jacobson
Signed-off-by: Neal Cardwell
Signed-off-by: Yuchung Cheng
Signed-off-by: Nandita Dukkipati
Signed-off-by: Eric Dumazet
Signed-off-by: Soheil Hassas Yeganeh
---
 include/uapi/linux/pkt_sched.h |  2 ++
 net/sched/sch_fq.c             | 22 +++++++++++++++++++---
 2 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
index 2382eed..f8e39db 100644
--- a/include/uapi/linux/pkt_sched.h
+++ b/include/uapi/linux/pkt_sched.h
@@ -792,6 +792,8 @@ enum {
 	TCA_FQ_ORPHAN_MASK,	/* mask applied to orphaned skb hashes */
 
+	TCA_FQ_LOW_RATE_THRESHOLD, /* per packet delay under this rate */
+
 	__TCA_FQ_MAX
 };

diff --git a/net/sched/sch_fq.c b/net/sched/sch_fq.c
index e5458b9..40ad4fc 100644
--- a/net/sched/sch_fq.c
+++ b/net/sched/sch_fq.c
@@ -94,6 +94,7 @@ struct fq_sched_data {
 	u32		flow_max_rate;	/* optional max rate per flow */
 	u32		flow_plimit;	/* max packets per flow */
 	u32		orphan_mask;	/* mask for orphaned skb */
+	u32		low_rate_threshold;
 	struct rb_root	*fq_root;
 	u8		rate_enable;
 	u8		fq_trees_log;
@@ -433,7 +434,7 @@ static struct sk_buff *fq_dequeue(struct Qdisc *sch)
 	struct fq_flow_head *head;
 	struct sk_buff *skb;
 	struct fq_flow *f;
-	u32 rate;
+	u32 rate, plen;
 
 	skb = fq_dequeue_head(sch, &q->internal);
 	if (skb)
@@ -482,7 +483,7 @@ begin:
 	prefetch(&skb->end);
 	f->credit -= qdisc_pkt_len(skb);
 
-	if (f->credit > 0 || !q->rate_enable)
+	if (!q->rate_enable)
 		goto out;
 
 	/* Do not pace locally generated ack packets */
@@ -493,8 +494,15 @@ begin:
 	if (skb->sk)
 		rate = min(skb->sk->sk_pacing_rate, rate);
 
+	if (rate <= q->low_rate_threshold) {
+		f->credit = 0;
+		plen = qdisc_pkt_len(skb);
+	} else {
+		plen = max(qdisc_pkt_len(skb), q->quantum);
+		if (f->credit > 0)
+			goto out;
+	}
 	if (rate != ~0U) {
-		u32 plen = max(qdisc_pkt_len(skb), q->quantum);
 		u64 len = (u64)plen * NSEC_PER_SEC;
 
 		if (likely(rate))
@@ -662,6 +670,7 @@ static const struct nla_policy fq_policy[TCA_FQ_MAX + 1] = {
 	[TCA_FQ_FLOW_MAX_RATE]		= { .type = NLA_U32 },
 	[TCA_FQ_BUCKETS_LOG]		= { .type = NLA_U32 },
 	[TCA_FQ_FLOW_REFILL_DELAY]	= { .type = NLA_U32 },
+	[TCA_FQ_LOW_RATE_THRESHOLD]	= { .type = NLA_U32 },
 };
 
 static int fq_change(struct Qdisc *sch, struct nlattr *opt)
@@ -716,6 +725,10 @@ static int fq_change(struct Qdisc *sch, struct nlattr *opt)
 	if (tb[TCA_FQ_FLOW_MAX_RATE])
 		q->flow_max_rate = nla_get_u32(tb[TCA_FQ_FLOW_MAX_RATE]);
 
+	if (tb[TCA_FQ_LOW_RATE_THRESHOLD])
+		q->low_rate_threshold =
+			nla_get_u32(tb[TCA_FQ_LOW_RATE_THRESHOLD]);
+
 	if (tb[TCA_FQ_RATE_ENABLE]) {
 		u32 enable = nla_get_u32(tb[TCA_FQ_RATE_ENABLE]);
@@ -781,6 +794,7 @@ static int fq_init(struct Qdisc *sch, struct nlattr *opt)
 	q->fq_root		= NULL;
 	q->fq_trees_log		= ilog2(1024);
 	q->orphan_mask		= 1024 - 1;
+	q->low_rate_threshold	= 550000 / 8;
 	qdisc_watchdog_init(&q->watchdog, sch);
 
 	if (opt)
@@ -811,6 +825,8 @@ static int fq_dump(struct Qdisc *sch, struct sk_buff *skb)
 	    nla_put_u32(skb, TCA_FQ_FLOW_REFILL_DELAY,
 			jiffies_to_usecs(q->flow_refill_delay)) ||
 	    nla_put_u32(skb, TCA_FQ_ORPHAN_MASK, q->orphan_mask) ||
+	    nla_put_u32(skb, TCA_FQ_LOW_RATE_THRESHOLD,
+			q->low_rate_threshold) ||
 	    nla_put_u32(skb, TCA_FQ_BUCKETS_LOG, q->fq_trees_log))
 		goto nla_put_failure;
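With a matching iproute2, the new TCA_FQ_LOW_RATE_THRESHOLD attribute
is exposed as a tc qdisc parameter; assuming the option is named after
the attribute, something like the following should set it (illustrative
usage only, not part of this patch; eth0 is a placeholder device):

	tc qdisc replace dev eth0 root fq low_rate_threshold 550kbit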