From patchwork Mon Feb 25 19:09:59 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Leslie Monis
X-Patchwork-Id: 1047958
X-Patchwork-Delegate: davem@davemloft.net
From: Leslie Monis
To: davem@davemloft.net
Cc: netdev@vger.kernel.org, "Mohit P. Tahiliani", Dave Taht,
 Jamal Hadi Salim, Dhaval Khandla, Hrishikesh Hiraskar,
 Manish Kumar B, "Sachin D . Patil", Leslie Monis
Subject: [PATCH net-next v3 5/7] net: sched: pie: add more cases to auto-tune alpha and beta
Date: Tue, 26 Feb 2019 00:39:59 +0530
Message-Id: <20190225191001.26797-6-lesliemonis@gmail.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20190225191001.26797-1-lesliemonis@gmail.com>
References: <20190225191001.26797-1-lesliemonis@gmail.com>
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org

From: "Mohit P. Tahiliani"

The current implementation scales the local alpha and beta
variables in the calculate_probability function by the same
amount for all values of drop probability below 1%.

RFC 8033 suggests using additional cases for auto-tuning
alpha and beta when the drop probability is less than 1%.

In order to add more auto-tuning cases, MAX_PROB must be
scaled by u64 instead of u32 to prevent underflow when
scaling the local alpha and beta variables in the
calculate_probability function.

Signed-off-by: Mohit P. Tahiliani
Signed-off-by: Dhaval Khandla
Signed-off-by: Hrishikesh Hiraskar
Signed-off-by: Manish Kumar B
Signed-off-by: Sachin D. Patil
Signed-off-by: Leslie Monis
Acked-by: Dave Taht
Acked-by: Jamal Hadi Salim
---
 include/uapi/linux/pkt_sched.h |  2 +-
 net/sched/sch_pie.c            | 65 +++++++++++++++++-----------------
 2 files changed, 33 insertions(+), 34 deletions(-)

diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
index 0d18b1d1fbbc..1eb572ef3f27 100644
--- a/include/uapi/linux/pkt_sched.h
+++ b/include/uapi/linux/pkt_sched.h
@@ -954,7 +954,7 @@ enum {
 #define TCA_PIE_MAX (__TCA_PIE_MAX - 1)
 
 struct tc_pie_xstats {
-	__u32 prob; /* current probability */
+	__u64 prob; /* current probability */
 	__u32 delay; /* current delay in ms */
 	__u32 avg_dq_rate; /* current average dq_rate in bits/pie_time */
 	__u32 packets_in; /* total number of packets enqueued */
diff --git a/net/sched/sch_pie.c b/net/sched/sch_pie.c
index d88ab53593b3..30f158582499 100644
--- a/net/sched/sch_pie.c
+++ b/net/sched/sch_pie.c
@@ -33,7 +33,7 @@
 
 #define QUEUE_THRESHOLD 16384
 #define DQCOUNT_INVALID -1
-#define MAX_PROB 0xffffffff
+#define MAX_PROB 0xffffffffffffffff
 #define PIE_SCALE 8
 
 /* parameters used */
@@ -49,7 +49,7 @@ struct pie_params {
 
 /* variables used */
 struct pie_vars {
-	u32 prob; /* probability but scaled by u32 limit. */
+	u64 prob; /* probability but scaled by u64 limit. */
 	psched_time_t burst_time;
 	psched_time_t qdelay;
 	psched_time_t qdelay_old;
@@ -99,8 +99,8 @@ static void pie_vars_init(struct pie_vars *vars)
 static bool drop_early(struct Qdisc *sch, u32 packet_size)
 {
 	struct pie_sched_data *q = qdisc_priv(sch);
-	u32 rnd;
-	u32 local_prob = q->vars.prob;
+	u64 rnd;
+	u64 local_prob = q->vars.prob;
 	u32 mtu = psched_mtu(qdisc_dev(sch));
 
 	/* If there is still burst allowance left skip random early drop */
@@ -124,11 +124,11 @@ static bool drop_early(struct Qdisc *sch, u32 packet_size)
 	 * probablity. Smaller packets will have lower drop prob in this case
 	 */
 	if (q->params.bytemode && packet_size <= mtu)
-		local_prob = (local_prob / mtu) * packet_size;
+		local_prob = (u64)packet_size * div_u64(local_prob, mtu);
 	else
 		local_prob = q->vars.prob;
 
-	rnd = prandom_u32();
+	prandom_bytes(&rnd, 8);
 	if (rnd < local_prob)
 		return true;
 
@@ -317,9 +317,10 @@ static void calculate_probability(struct Qdisc *sch)
 	u32 qlen = sch->qstats.backlog; /* queue size in bytes */
 	psched_time_t qdelay = 0; /* in pschedtime */
 	psched_time_t qdelay_old = q->vars.qdelay; /* in pschedtime */
-	s32 delta = 0; /* determines the change in probability */
-	u32 oldprob;
-	u32 alpha, beta;
+	s64 delta = 0; /* determines the change in probability */
+	u64 oldprob;
+	u64 alpha, beta;
+	u32 power;
 	bool update_prob = true;
 
 	q->vars.qdelay_old = q->vars.qdelay;
@@ -339,38 +340,36 @@ static void calculate_probability(struct Qdisc *sch)
 	 * value for alpha as 0.125. In this implementation, we use values 0-32
 	 * passed from user space to represent this. Also, alpha and beta have
 	 * unit of HZ and need to be scaled before they can used to update
-	 * probability. alpha/beta are updated locally below by 1) scaling them
-	 * appropriately 2) scaling down by 16 to come to 0-2 range.
-	 * Please see paper for details.
-	 *
-	 * We scale alpha and beta differently depending on whether we are in
-	 * light, medium or high dropping mode.
+	 * probability. alpha/beta are updated locally below by scaling down
+	 * by 16 to come to 0-2 range.
 	 */
-	if (q->vars.prob < MAX_PROB / 100) {
-		alpha =
-		    (q->params.alpha * (MAX_PROB / PSCHED_TICKS_PER_SEC)) >> 7;
-		beta =
-		    (q->params.beta * (MAX_PROB / PSCHED_TICKS_PER_SEC)) >> 7;
-	} else if (q->vars.prob < MAX_PROB / 10) {
-		alpha =
-		    (q->params.alpha * (MAX_PROB / PSCHED_TICKS_PER_SEC)) >> 5;
-		beta =
-		    (q->params.beta * (MAX_PROB / PSCHED_TICKS_PER_SEC)) >> 5;
-	} else {
-		alpha =
-		    (q->params.alpha * (MAX_PROB / PSCHED_TICKS_PER_SEC)) >> 4;
-		beta =
-		    (q->params.beta * (MAX_PROB / PSCHED_TICKS_PER_SEC)) >> 4;
+	alpha = ((u64)q->params.alpha * (MAX_PROB / PSCHED_TICKS_PER_SEC)) >> 4;
+	beta = ((u64)q->params.beta * (MAX_PROB / PSCHED_TICKS_PER_SEC)) >> 4;
+
+	/* We scale alpha and beta differently depending on how heavy the
+	 * congestion is. Please see RFC 8033 for details.
+	 */
+	if (q->vars.prob < MAX_PROB / 10) {
+		alpha >>= 1;
+		beta >>= 1;
+
+		power = 100;
+		while (q->vars.prob < div_u64(MAX_PROB, power) &&
+		       power <= 1000000) {
+			alpha >>= 2;
+			beta >>= 2;
+			power *= 10;
+		}
 	}
 
 	/* alpha and beta should be between 0 and 32, in multiples of 1/16 */
-	delta += alpha * ((qdelay - q->params.target));
-	delta += beta * ((qdelay - qdelay_old));
+	delta += alpha * (u64)(qdelay - q->params.target);
+	delta += beta * (u64)(qdelay - qdelay_old);
 
 	oldprob = q->vars.prob;
 
 	/* to ensure we increase probability in steps of no more than 2% */
-	if (delta > (s32)(MAX_PROB / (100 / 2)) &&
+	if (delta > (s64)(MAX_PROB / (100 / 2)) &&
 	    q->vars.prob >= MAX_PROB / 10)
 		delta = (MAX_PROB / 100) * 2;
 
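
For readers who want to check the auto-tuning thresholds outside the kernel, the scaling added to calculate_probability can be approximated in a small userspace sketch. This is only an illustration, not part of the patch: the helper names pie_tune_shift and pie_scale_gain and the ticks_per_sec parameter (standing in for PSCHED_TICKS_PER_SEC) are hypothetical, and plain 64-bit division replaces div_u64.

```c
#include <assert.h>
#include <stdint.h>

/* Same value as the patched MAX_PROB: probability scaled to the u64 range. */
#define MAX_PROB UINT64_MAX

/* Hypothetical helper (not in the patch): number of extra right shifts the
 * patched code applies to alpha and beta, on top of the base ">> 4",
 * for a given drop probability.
 */
static unsigned int pie_tune_shift(uint64_t prob)
{
	unsigned int shift = 0;
	uint32_t power;

	if (prob < MAX_PROB / 10) {
		shift += 1;			/* below 10%: divide by 2 */

		power = 100;
		while (prob < MAX_PROB / power && power <= 1000000) {
			shift += 2;		/* each decade down: divide by 4 */
			power *= 10;
		}
	}
	return shift;
}

/* Hypothetical helper (not in the patch): scale a user-space gain (0-32)
 * the way the patched code scales alpha and beta. ticks_per_sec is an
 * assumed stand-in for PSCHED_TICKS_PER_SEC.
 */
static uint64_t pie_scale_gain(uint32_t gain, uint64_t prob,
			       uint64_t ticks_per_sec)
{
	uint64_t scaled;

	assert(ticks_per_sec != 0);

	/* Base scaling: convert to probability units and divide by 16. */
	scaled = ((uint64_t)gain * (MAX_PROB / ticks_per_sec)) >> 4;

	/* Auto-tuning: shift further as the drop probability gets smaller. */
	return scaled >> pie_tune_shift(prob);
}
```

Under these assumptions the extra shift is 0 at or above 10%, 1 between 1% and 10%, and grows by 2 for each further decade until it reaches 11 below 0.0001%, which corresponds to the divisors 2, 8, 32, 128, 512 and 2048. With a u32 MAX_PROB the base value for a small gain has far fewer bits, so the deepest shifts would leave little or nothing of it; that is presumably the loss the commit message refers to as underflow.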