From patchwork Wed Sep 15 12:10:21 2010
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Gerrit Renker
X-Patchwork-Id: 64801
X-Patchwork-Delegate: davem@davemloft.net
Return-Path:
X-Original-To: patchwork-incoming@ozlabs.org
Delivered-To: patchwork-incoming@ozlabs.org
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by ozlabs.org (Postfix) with ESMTP id AB7D2B6F01
	for ; Wed, 15 Sep 2010 22:10:50 +1000 (EST)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1753284Ab0IOMKm (ORCPT ); Wed, 15 Sep 2010 08:10:42 -0400
Received: from dee.erg.abdn.ac.uk ([139.133.204.82]:56033 "EHLO erg.abdn.ac.uk"
	rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP
	id S1753195Ab0IOMKl (ORCPT ); Wed, 15 Sep 2010 08:10:41 -0400
Received: from laptev.erg.abdn.ac.uk (Debian-exim@ra-gerrit.erg.abdn.ac.uk [139.133.204.38])
	by erg.abdn.ac.uk (8.13.4/8.13.4) with ESMTP id o8FCANxB020467
	(version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NOT);
	Wed, 15 Sep 2010 13:10:23 +0100 (BST)
Received: from root by laptev.erg.abdn.ac.uk with local (Exim 4.69)
	(envelope-from ) id 1Ovqoh-0004F5-Bv; Wed, 15 Sep 2010 14:10:23 +0200
From: Gerrit Renker
To: davem@davemloft.net
Cc: dccp@vger.kernel.org, netdev@vger.kernel.org, Gerrit Renker
Subject: [PATCH 1/3] dccp ccid-3: A lower bound for the inter-packet scheduling algorithm
Date: Wed, 15 Sep 2010 14:10:21 +0200
Message-Id: <1284552623-16283-2-git-send-email-gerrit@erg.abdn.ac.uk>
X-Mailer: git-send-email 1.6.0.rc2
In-Reply-To: <1284552623-16283-1-git-send-email-gerrit@erg.abdn.ac.uk>
References: <1284552623-16283-1-git-send-email-gerrit@erg.abdn.ac.uk>
X-ERG-MailScanner: Found to be clean
X-ERG-MailScanner-From: root@erg.abdn.ac.uk
X-Spam-Status: No
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: netdev@vger.kernel.org

This fixes a subtle bug in the calculation of the inter-packet gap and shows
that t_delta, as it is currently used, is not needed.

The algorithm from RFC 5348, 8.3 below continually computes a send time t_nom,
which is initialised with the current time t_now; t_gran = 1E6 / HZ specifies
the scheduling granularity, s the packet size, and X the sending rate:

	t_distance = t_nom - t_now;		// in microseconds
	t_delta    = min(t_ipi, t_gran) / 2;	// `delta' parameter in microseconds

	if (t_distance >= t_delta) {
		reschedule after (t_distance / 1000) milliseconds;
	} else {
		t_ipi  = s / X;			// inter-packet interval in usec
		t_nom += t_ipi;			// compute the next send time
		send packet now;
	}

Problem:
--------
Rescheduling requires a conversion into milliseconds (sk_reset_timer()). The
highest jiffy resolution with HZ=1000 is 1 millisecond, so using a higher
granularity does not make much sense here.

As a consequence, values of t_distance < 1000 are truncated to 0. This issue
has so far been resolved by using instead

	if (t_distance >= t_delta + 1000)
		reschedule after (t_distance / 1000) milliseconds;

This is unnecessarily large; a lower bound is t_delta' = max(t_delta, 1000),
and it implies a further simplification:

 a) when HZ >= 500, then t_delta <= t_gran/2 = 10^6/(2*HZ) <= 1000,
    so that t_delta' = max(1000, t_delta) = 1000 (constant value);

 b) when HZ < 500, then t_delta = 1/2 * min(rtt, t_ipi, t_gran) <= t_gran/2,
    so that 1000 <= t_delta' <= t_gran/2.

The maximum error of using a constant t_delta in (b) is less than half a jiffy.
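As a cross-check of the case analysis above (not part of the patch), here is a
small stand-alone C sketch; the names t_delta_prime and hz_values are invented
for illustration, and all quantities are in microseconds. For each HZ it prints
t_gran/2 and the range that t_delta' = max(t_delta, 1000) can take:

	#include <stdio.h>

	/* Illustrative only: lower bound t_delta' = max(t_delta, 1000) in usec */
	static unsigned int t_delta_prime(unsigned int t_delta_us)
	{
		return t_delta_us > 1000 ? t_delta_us : 1000;
	}

	int main(void)
	{
		const unsigned int hz_values[] = { 100, 250, 500, 1000 };
		unsigned int i;

		for (i = 0; i < sizeof(hz_values) / sizeof(hz_values[0]); i++) {
			unsigned int hz          = hz_values[i];
			unsigned int t_gran      = 1000000 / hz;	/* 1E6 / HZ */
			unsigned int t_delta_max = t_gran / 2;		/* t_delta <= t_gran/2 */

			/*
			 * Case (a): HZ >= 500 => t_delta <= 1000, so t_delta' == 1000.
			 * Case (b): HZ <  500 => 1000 <= t_delta' <= t_gran/2.
			 */
			printf("HZ=%4u  t_gran/2=%5u  t_delta' in [%u..%u]\n",
			       hz, t_delta_max,
			       t_delta_prime(0), t_delta_prime(t_delta_max));
		}
		return 0;
	}

For HZ >= 500 the printed range collapses to the constant 1000 usec, which is
exactly the constant that the fix below uses.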
Fix:
----
The patch replaces t_delta with a constant, whose value depends on CONFIG_HZ,
changing the above algorithm to:

	if (t_distance >= t_delta')
		reschedule after (t_distance / 1000) milliseconds;

where t_delta' = 10^6/(2*HZ) if HZ < 500, and t_delta' = 1000 otherwise.

Signed-off-by: Gerrit Renker
---
 net/dccp/ccids/ccid3.h |   18 +++++++++++++-----
 net/dccp/ccids/ccid3.c |   19 ++++++++-----------
 2 files changed, 21 insertions(+), 16 deletions(-)

-- 
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

--- a/net/dccp/ccids/ccid3.h
+++ b/net/dccp/ccids/ccid3.h
@@ -45,12 +45,22 @@
 /* Two seconds as per RFC 5348, 4.2 */
 #define TFRC_INITIAL_TIMEOUT	   (2 * USEC_PER_SEC)
 
-/* In usecs - half the scheduling granularity as per RFC3448 4.6 */
-#define TFRC_OPSYS_HALF_TIME_GRAN  (USEC_PER_SEC / (2 * HZ))
-
 /* Parameter t_mbi from [RFC 3448, 4.3]: backoff interval in seconds */
 #define TFRC_T_MBI		   64
 
+/*
+ * The t_delta parameter (RFC 5348, 8.3): delays of less than %USEC_PER_MSEC are
+ * rounded down to 0, since sk_reset_timer() here uses millisecond granularity.
+ * Hence we can use a constant t_delta = %USEC_PER_MSEC when HZ >= 500. A coarse
+ * resolution of HZ < 500 means that the error is below one timer tick (t_gran)
+ * when using the constant t_delta = t_gran / 2 = %USEC_PER_SEC / (2 * HZ).
+ */
+#if (HZ >= 500)
+# define TFRC_T_DELTA		   USEC_PER_MSEC
+#else
+# define TFRC_T_DELTA		   (USEC_PER_SEC / (2 * HZ))
+#endif
+
 enum ccid3_options {
 	TFRC_OPT_LOSS_EVENT_RATE = 192,
 	TFRC_OPT_LOSS_INTERVALS	 = 193,
@@ -90,7 +100,6 @@ enum ccid3_hc_tx_states {
  * @tx_no_feedback_timer: Handle to no feedback timer
  * @tx_t_ld: Time last doubled during slow start
  * @tx_t_nom: Nominal send time of next packet
- * @tx_delta: Send timer delta (RFC 3448, 4.6) in usecs
  * @tx_hist: Packet history
  * @tx_options_received: Parsed set of retrieved options
  */
@@ -109,7 +118,6 @@ struct ccid3_hc_tx_sock {
 	struct timer_list	tx_no_feedback_timer;
 	ktime_t			tx_t_ld;
 	ktime_t			tx_t_nom;
-	u32			tx_delta;
 	struct tfrc_tx_hist_entry *tx_hist;
 	struct ccid3_options_received tx_options_received;
 };
--- a/net/dccp/ccids/ccid3.c
+++ b/net/dccp/ccids/ccid3.c
@@ -91,19 +91,16 @@ static inline u64 rfc3390_initial_rate(struct sock *sk)
 	return scaled_div(w_init << 6, hc->tx_rtt);
 }
 
-/*
- * Recalculate t_ipi and delta (should be called whenever X changes)
+/**
+ * ccid3_update_send_interval - Calculate new t_ipi = s / X_inst
+ * This respects the granularity of X_inst (64 * bytes/second).
  */
 static void ccid3_update_send_interval(struct ccid3_hc_tx_sock *hc)
 {
-	/* Calculate new t_ipi = s / X_inst (X_inst is in 64 * bytes/second) */
 	hc->tx_t_ipi = scaled_div32(((u64)hc->tx_s) << 6, hc->tx_x);
 
-	/* Calculate new delta by delta = min(t_ipi / 2, t_gran / 2) */
-	hc->tx_delta = min_t(u32, hc->tx_t_ipi / 2, TFRC_OPSYS_HALF_TIME_GRAN);
-
-	ccid3_pr_debug("t_ipi=%u, delta=%u, s=%u, X=%u\n", hc->tx_t_ipi,
-		       hc->tx_delta, hc->tx_s, (unsigned)(hc->tx_x >> 6));
+	ccid3_pr_debug("t_ipi=%u, s=%u, X=%u\n", hc->tx_t_ipi,
+		       hc->tx_s, (unsigned)(hc->tx_x >> 6));
 }
 
 static u32 ccid3_hc_tx_idle_rtt(struct ccid3_hc_tx_sock *hc, ktime_t now)
@@ -332,15 +329,15 @@ static int ccid3_hc_tx_send_packet(struct sock *sk, struct sk_buff *skb)
 		delay = ktime_us_delta(hc->tx_t_nom, now);
 		ccid3_pr_debug("delay=%ld\n", (long)delay);
 		/*
-		 * Scheduling of packet transmissions [RFC 3448, 4.6]
+		 * Scheduling of packet transmissions (RFC 5348, 8.3)
 		 *
 		 * if (t_now > t_nom - delta)
 		 *       // send the packet now
 		 * else
 		 *       // send the packet in (t_nom - t_now) milliseconds.
 		 */
-		if (delay - (s64)hc->tx_delta >= 1000)
-			return (u32)delay / 1000L;
+		if (delay >= TFRC_T_DELTA)
+			return (u32)delay / USEC_PER_MSEC;
 
 		ccid3_hc_tx_update_win_count(hc, now);
 		break;
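
Purely as an illustration of the new test (not kernel code), the following
stand-alone C sketch re-derives TFRC_T_DELTA for an assumed HZ of 250 and
applies the patched send/reschedule decision; the helper name
send_packet_delay_ms and the sample delays are made up for this example:

	#include <stdio.h>

	#define HZ		250		/* assumed CONFIG_HZ for this user-space example */
	#define USEC_PER_MSEC	1000U
	#define USEC_PER_SEC	1000000U

	#if (HZ >= 500)
	# define TFRC_T_DELTA	USEC_PER_MSEC
	#else
	# define TFRC_T_DELTA	(USEC_PER_SEC / (2 * HZ))
	#endif

	/*
	 * Mirrors the patched test in ccid3_hc_tx_send_packet(): a return value
	 * of 0 means "send now", a positive value is the reschedule delay in ms.
	 */
	static unsigned int send_packet_delay_ms(long long delay_us)
	{
		if (delay_us >= TFRC_T_DELTA)
			return (unsigned int)(delay_us / USEC_PER_MSEC);
		return 0;
	}

	int main(void)
	{
		long long samples[] = { -500, 0, 900, 1999, 2000, 25000 };
		unsigned int i;

		for (i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
			printf("delay=%6lld us -> %s (%u ms)\n", samples[i],
			       send_packet_delay_ms(samples[i]) ? "reschedule" : "send now",
			       send_packet_delay_ms(samples[i]));
		return 0;
	}

With HZ=250 the constant works out to 2000 usec, i.e. half a timer tick,
matching case (b) of the analysis above.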