From patchwork Sat Sep 17 17:35:37 2016
From: Neal Cardwell
To: David Miller
Cc: netdev@vger.kernel.org, Eric Dumazet, Van Jacobson, Neal Cardwell,
 Yuchung Cheng, Nandita Dukkipati, Soheil Hassas Yeganeh
Subject: [PATCH v2 net-next 04/16] net_sched: sch_fq: add low_rate_threshold
 parameter
Date: Sat, 17 Sep 2016 13:35:37 -0400
Message-Id: <1474133749-12895-5-git-send-email-ncardwell@google.com>
In-Reply-To: <1474133749-12895-1-git-send-email-ncardwell@google.com>
References: <1474133749-12895-1-git-send-email-ncardwell@google.com>
List-ID: netdev@vger.kernel.org

From: Eric Dumazet

This commit adds to the fq module a low_rate_threshold parameter to
insert a delay after all packets if the socket requests a pacing rate
below the threshold.
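For example, with the matching fq option in iproute2 (which ships
separately from this kernel patch; the device name below is a
placeholder), the threshold could be set with a command along these
lines:

  tc qdisc replace dev eth0 root fq low_rate_threshold 550kbit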
This helps achieve more precise control of the sending rate with
low-rate paths, especially policers. The basic issue is that if a
congestion control module detects a policer at a certain rate, it may
want fq to be able to shape to that policed rate. That way the sender
can avoid policer drops by having the packets arrive at the policer at
or just under the policed rate.

The default threshold of 550Kbps was chosen analytically so that for
policers or links at 500Kbps or 512Kbps fq would very likely invoke
this mechanism, even if the pacing rate was briefly slightly above the
available bandwidth. This value was then empirically validated with
two years of production testing on YouTube video servers.

Signed-off-by: Van Jacobson
Signed-off-by: Neal Cardwell
Signed-off-by: Yuchung Cheng
Signed-off-by: Nandita Dukkipati
Signed-off-by: Eric Dumazet
Signed-off-by: Soheil Hassas Yeganeh
---
 include/uapi/linux/pkt_sched.h |  2 ++
 net/sched/sch_fq.c             | 22 +++++++++++++++++++---
 2 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
index 2382eed..f8e39db 100644
--- a/include/uapi/linux/pkt_sched.h
+++ b/include/uapi/linux/pkt_sched.h
@@ -792,6 +792,8 @@ enum {
 	TCA_FQ_ORPHAN_MASK,	/* mask applied to orphaned skb hashes */
 
+	TCA_FQ_LOW_RATE_THRESHOLD, /* per packet delay under this rate */
+
 	__TCA_FQ_MAX
 };
 
diff --git a/net/sched/sch_fq.c b/net/sched/sch_fq.c
index e5458b9..40ad4fc 100644
--- a/net/sched/sch_fq.c
+++ b/net/sched/sch_fq.c
@@ -94,6 +94,7 @@ struct fq_sched_data {
 	u32		flow_max_rate;	/* optional max rate per flow */
 	u32		flow_plimit;	/* max packets per flow */
 	u32		orphan_mask;	/* mask for orphaned skb */
+	u32		low_rate_threshold;
 	struct rb_root	*fq_root;
 	u8		rate_enable;
 	u8		fq_trees_log;
@@ -433,7 +434,7 @@ static struct sk_buff *fq_dequeue(struct Qdisc *sch)
 	struct fq_flow_head *head;
 	struct sk_buff *skb;
 	struct fq_flow *f;
-	u32 rate;
+	u32 rate, plen;
 
 	skb = fq_dequeue_head(sch, &q->internal);
 	if (skb)
@@ -482,7 +483,7 @@ begin:
 	prefetch(&skb->end);
 	f->credit -= qdisc_pkt_len(skb);
 
-	if (f->credit > 0 || !q->rate_enable)
+	if (!q->rate_enable)
 		goto out;
 
 	/* Do not pace locally generated ack packets */
@@ -493,8 +494,15 @@ begin:
 	if (skb->sk)
 		rate = min(skb->sk->sk_pacing_rate, rate);
 
+	if (rate <= q->low_rate_threshold) {
+		f->credit = 0;
+		plen = qdisc_pkt_len(skb);
+	} else {
+		plen = max(qdisc_pkt_len(skb), q->quantum);
+		if (f->credit > 0)
+			goto out;
+	}
 	if (rate != ~0U) {
-		u32 plen = max(qdisc_pkt_len(skb), q->quantum);
 		u64 len = (u64)plen * NSEC_PER_SEC;
 
 		if (likely(rate))
@@ -662,6 +670,7 @@ static const struct nla_policy fq_policy[TCA_FQ_MAX + 1] = {
 	[TCA_FQ_FLOW_MAX_RATE]		= { .type = NLA_U32 },
 	[TCA_FQ_BUCKETS_LOG]		= { .type = NLA_U32 },
 	[TCA_FQ_FLOW_REFILL_DELAY]	= { .type = NLA_U32 },
+	[TCA_FQ_LOW_RATE_THRESHOLD]	= { .type = NLA_U32 },
 };
 
 static int fq_change(struct Qdisc *sch, struct nlattr *opt)
@@ -716,6 +725,10 @@ static int fq_change(struct Qdisc *sch, struct nlattr *opt)
 	if (tb[TCA_FQ_FLOW_MAX_RATE])
 		q->flow_max_rate = nla_get_u32(tb[TCA_FQ_FLOW_MAX_RATE]);
 
+	if (tb[TCA_FQ_LOW_RATE_THRESHOLD])
+		q->low_rate_threshold =
+			nla_get_u32(tb[TCA_FQ_LOW_RATE_THRESHOLD]);
+
 	if (tb[TCA_FQ_RATE_ENABLE]) {
 		u32 enable = nla_get_u32(tb[TCA_FQ_RATE_ENABLE]);
 
@@ -781,6 +794,7 @@ static int fq_init(struct Qdisc *sch, struct nlattr *opt)
 	q->fq_root		= NULL;
 	q->fq_trees_log		= ilog2(1024);
 	q->orphan_mask		= 1024 - 1;
+	q->low_rate_threshold	= 550000 / 8;
 	qdisc_watchdog_init(&q->watchdog, sch);
 
 	if (opt)
@@ -811,6 +825,8 @@ static int fq_dump(struct Qdisc *sch, struct sk_buff *skb)
 	    nla_put_u32(skb, TCA_FQ_FLOW_REFILL_DELAY,
 			jiffies_to_usecs(q->flow_refill_delay)) ||
 	    nla_put_u32(skb, TCA_FQ_ORPHAN_MASK, q->orphan_mask) ||
+	    nla_put_u32(skb, TCA_FQ_LOW_RATE_THRESHOLD,
+			q->low_rate_threshold) ||
 	    nla_put_u32(skb, TCA_FQ_BUCKETS_LOG, q->fq_trees_log))
 		goto nla_put_failure;
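
For reference, a minimal standalone sketch (not kernel code) of the
delay computation the patched fq_dequeue() performs. The helper name
pacing_delay_ns and the quantum value are illustrative assumptions:

#include <stdint.h>
#include <stdio.h>

#define NSEC_PER_SEC 1000000000ULL

/* Mirrors the patched fq_dequeue() branches; rates are in bytes/sec. */
static uint64_t pacing_delay_ns(uint32_t pkt_len, uint32_t quantum,
                                uint32_t rate, uint32_t low_rate_threshold)
{
        uint32_t plen;

        if (rate <= low_rate_threshold) {
                /* Low-rate path: pace on the true packet length; in fq
                 * the credit is also zeroed so every packet is delayed. */
                plen = pkt_len;
        } else {
                /* Normal path: round short packets up to a quantum
                 * (and positive credit skips pacing entirely). */
                plen = pkt_len > quantum ? pkt_len : quantum;
        }
        return (uint64_t)plen * NSEC_PER_SEC / rate;
}

int main(void)
{
        uint32_t threshold = 550000 / 8;  /* default: 68750 bytes/sec */
        uint32_t policed = 512000 / 8;    /* e.g. a 512Kbit/s policer */

        /* 1500 bytes at 64000 bytes/sec -> 23437500 ns (~23.4 ms). */
        printf("%llu ns between packets\n",
               (unsigned long long)pacing_delay_ns(1500, 2 * 1514,
                                                   policed, threshold));
        return 0;
}

At 512Kbps this yields roughly one 1500-byte packet every 23 ms,
keeping arrivals at or just under the policed rate as described above.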