From patchwork Wed Jun 24 17:17:40 2020
From: Tom Herbert
To: netdev@vger.kernel.org
Cc: Tom Herbert
Subject: [RFC PATCH 01/11] cgroup: Export cgroup_{procs,threads}_start and cgroup_procs_next
Date: Wed, 24 Jun 2020 10:17:40 -0700
Message-Id: <20200624171749.11927-2-tom@herbertland.com>

Export the functions and put prototypes in linux/cgroup.h. This allows
creating cgroup entries that provide per task information.
---
 include/linux/cgroup.h | 3 +++
 kernel/cgroup/cgroup.c | 9 ++++++---
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 4598e4da6b1b..59837f6f4e54 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -119,6 +119,9 @@ int task_cgroup_path(struct task_struct *task, char *buf, size_t buflen);
 int cgroupstats_build(struct cgroupstats *stats, struct dentry *dentry);
 int proc_cgroup_show(struct seq_file *m, struct pid_namespace *ns,
 		     struct pid *pid, struct task_struct *tsk);
+void *cgroup_procs_start(struct seq_file *s, loff_t *pos);
+void *cgroup_threads_start(struct seq_file *s, loff_t *pos);
+void *cgroup_procs_next(struct seq_file *s, void *v, loff_t *pos);
 
 void cgroup_fork(struct task_struct *p);
 extern int cgroup_can_fork(struct task_struct *p,
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 1ea181a58465..69cd14201cf0 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -4597,7 +4597,7 @@ static void cgroup_procs_release(struct kernfs_open_file *of)
 	}
 }
 
-static void *cgroup_procs_next(struct seq_file *s, void *v, loff_t *pos)
+void *cgroup_procs_next(struct seq_file *s, void *v, loff_t *pos)
 {
 	struct kernfs_open_file *of = s->private;
 	struct css_task_iter *it = of->priv;
@@ -4607,6 +4607,7 @@ static void *cgroup_procs_next(struct seq_file *s, void *v, loff_t *pos)
 
 	return css_task_iter_next(it);
 }
+EXPORT_SYMBOL_GPL(cgroup_procs_next);
 
 static void *__cgroup_procs_start(struct seq_file *s, loff_t *pos,
 				  unsigned int iter_flags)
@@ -4637,7 +4638,7 @@ static void *__cgroup_procs_start(struct seq_file *s, loff_t *pos,
 	return cgroup_procs_next(s, NULL, NULL);
 }
 
-static void *cgroup_procs_start(struct seq_file *s, loff_t *pos)
+void *cgroup_procs_start(struct seq_file *s, loff_t *pos)
 {
 	struct cgroup *cgrp = seq_css(s)->cgroup;
 
@@ -4653,6 +4654,7 @@ static void *cgroup_procs_start(struct seq_file *s, loff_t *pos)
 	return __cgroup_procs_start(s, pos, CSS_TASK_ITER_PROCS |
 					    CSS_TASK_ITER_THREADED);
 }
+EXPORT_SYMBOL_GPL(cgroup_procs_start);
 
 static int cgroup_procs_show(struct seq_file *s, void *v)
 {
@@ -4764,10 +4766,11 @@ static ssize_t cgroup_procs_write(struct kernfs_open_file *of,
 	return ret ?: nbytes;
 }
 
-static void *cgroup_threads_start(struct seq_file *s, loff_t *pos)
+void *cgroup_threads_start(struct seq_file *s, loff_t *pos)
 {
 	return __cgroup_procs_start(s, pos, 0);
 }
+EXPORT_SYMBOL_GPL(cgroup_threads_start);
 
 static ssize_t cgroup_threads_write(struct kernfs_open_file *of,
 				    char *buf, size_t nbytes, loff_t off)
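With these iterators exported, a controller or module could build its own
per-task cgroup file on top of them. The following is only an illustrative
sketch under that assumption: the "foo.tasks" cftype and foo_tasks_show()
are hypothetical names, not anything introduced by this series, and
cgroup_procs_release() remains static, so a real consumer would also need
its own ->seq_stop (or a further export) to release the task iterator.

/* Hypothetical consumer of the newly exported iterators (not part of
 * this patch): a "foo.tasks" control file that lists member task PIDs.
 */
#include <linux/cgroup.h>
#include <linux/sched.h>
#include <linux/seq_file.h>

static int foo_tasks_show(struct seq_file *s, void *v)
{
	struct task_struct *task = v;

	/* One PID per line, as seen from the reader's PID namespace */
	seq_printf(s, "%d\n", task_pid_vnr(task));
	return 0;
}

static struct cftype foo_files[] = {
	{
		.name		= "foo.tasks",
		.seq_start	= cgroup_procs_start,	/* exported above */
		.seq_next	= cgroup_procs_next,	/* exported above */
		.seq_show	= foo_tasks_show,
		/* a real user also needs a ->seq_stop to release the
		 * css_task_iter; cgroup_procs_release() is not exported here
		 */
	},
	{ }	/* terminating entry */
};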
From patchwork Wed Jun 24 17:17:41 2020
From: Tom Herbert
To: netdev@vger.kernel.org
Cc: Tom Herbert
Subject: [RFC PATCH 02/11] net: Create netqueue.h and define NO_QUEUE
Date: Wed, 24 Jun 2020 10:17:41 -0700
Message-Id: <20200624171749.11927-3-tom@herbertland.com>

Create linux/netqueue.h to hold generic network queue definitions. Define
NO_QUEUE to replace NO_QUEUE_MAPPING in net/sock.h. NO_QUEUE can generally
be used to indicate that a 16 bit queue index does not refer to a queue.
Also, define net_queue_pair which will be used as a generic way to store a
transmit/receive pair of network queues.
---
 include/linux/netdevice.h |  1 +
 include/linux/netqueue.h  | 25 +++++++++++++++++++++++++
 include/net/sock.h        | 12 +++++-------
 net/core/filter.c         |  4 ++--
 4 files changed, 33 insertions(+), 9 deletions(-)
 create mode 100644 include/linux/netqueue.h

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 6fc613ed8eae..bf5f2a85da97 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -32,6 +32,7 @@
 #include
 #include
 #include
+#include <linux/netqueue.h>
 #include
 #include
diff --git a/include/linux/netqueue.h b/include/linux/netqueue.h
new file mode 100644
index 000000000000..5a4d39821ada
--- /dev/null
+++ b/include/linux/netqueue.h
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Network queue identifier definitions
+ *
+ * Copyright (c) 2020 Tom Herbert
+ */
+
+#ifndef _LINUX_NETQUEUE_H
+#define _LINUX_NETQUEUE_H
+
+/* Indicates no network queue is present in 16 bit queue number */
+#define NO_QUEUE	USHRT_MAX
+
+struct net_queue_pair {
+	unsigned short txq_id;
+	unsigned short rxq_id;
+};
+
+static inline void init_net_queue_pair(struct net_queue_pair *qpair)
+{
+	qpair->rxq_id = NO_QUEUE;
+	qpair->txq_id = NO_QUEUE;
+}
+
+#endif /* _LINUX_NETQUEUE_H */
diff --git a/include/net/sock.h b/include/net/sock.h
index c53cc42b5ab9..acb76cfaae1b 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1800,16 +1800,14 @@ static inline void sk_tx_queue_set(struct sock *sk, int tx_queue)
 	sk->sk_tx_queue_mapping = tx_queue;
 }
 
-#define NO_QUEUE_MAPPING	USHRT_MAX
-
 static inline void sk_tx_queue_clear(struct sock *sk)
 {
-	sk->sk_tx_queue_mapping = NO_QUEUE_MAPPING;
+	sk->sk_tx_queue_mapping = NO_QUEUE;
 }
 
 static inline int sk_tx_queue_get(const struct sock *sk)
 {
-	if (sk && sk->sk_tx_queue_mapping != NO_QUEUE_MAPPING)
+	if (sk && sk->sk_tx_queue_mapping != NO_QUEUE)
 		return sk->sk_tx_queue_mapping;
 
 	return -1;
@@ -1821,7 +1819,7 @@ static inline void sk_rx_queue_set(struct sock *sk, const struct sk_buff *skb)
 	if (skb_rx_queue_recorded(skb)) {
 		u16 rx_queue = skb_get_rx_queue(skb);
 
-		if (WARN_ON_ONCE(rx_queue == NO_QUEUE_MAPPING))
+		if (WARN_ON_ONCE(rx_queue == NO_QUEUE))
 			return;
 
 		sk->sk_rx_queue_mapping = rx_queue;
@@ -1832,14 +1830,14 @@ static inline void sk_rx_queue_set(struct sock *sk, const struct sk_buff *skb)
 static inline void sk_rx_queue_clear(struct sock *sk)
 {
 #ifdef CONFIG_XPS
-	sk->sk_rx_queue_mapping = NO_QUEUE_MAPPING;
+	sk->sk_rx_queue_mapping = NO_QUEUE;
 #endif
 }
 
 #ifdef CONFIG_XPS
 static inline int sk_rx_queue_get(const struct sock *sk)
 {
-	if (sk && sk->sk_rx_queue_mapping != NO_QUEUE_MAPPING)
+	if (sk && sk->sk_rx_queue_mapping != NO_QUEUE)
 		return sk->sk_rx_queue_mapping;
 
 	return -1;
diff --git a/net/core/filter.c b/net/core/filter.c
index 73395384afe2..d696aaabe3af 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -7544,7 +7544,7 @@ static u32 bpf_convert_ctx_access(enum bpf_access_type type,
 
 	case offsetof(struct __sk_buff, queue_mapping):
 		if (type == BPF_WRITE) {
-			*insn++ = BPF_JMP_IMM(BPF_JGE, si->src_reg, NO_QUEUE_MAPPING, 1);
+			*insn++ = BPF_JMP_IMM(BPF_JGE, si->src_reg, NO_QUEUE, 1);
 			*insn++ = BPF_STX_MEM(BPF_H, si->dst_reg, si->src_reg,
 					      bpf_target_off(struct sk_buff,
 							     queue_mapping,
@@ -7981,7 +7981,7 @@ u32 bpf_sock_convert_ctx_access(enum bpf_access_type type,
 				      sizeof_field(struct sock,
 						   sk_rx_queue_mapping),
 				      target_size));
-		*insn++ = BPF_JMP_IMM(BPF_JNE, si->dst_reg, NO_QUEUE_MAPPING,
+		*insn++ = BPF_JMP_IMM(BPF_JNE, si->dst_reg, NO_QUEUE,
 				      1);
 		*insn++ = BPF_MOV64_IMM(si->dst_reg, -1);
 #else
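As a usage illustration for the definitions above, a hedged sketch of how a
per-thread bookkeeping structure might embed a net_queue_pair follows;
example_thread_queues and the helper functions are hypothetical names and
are not part of this patch.

/* Illustrative sketch (not part of this patch): a hypothetical per-thread
 * structure using the net_queue_pair helpers defined above.  A consumer
 * treats NO_QUEUE as "no queue assigned".
 */
#include <linux/netqueue.h>

struct example_thread_queues {
	struct net_queue_pair qpair;
};

static void example_init(struct example_thread_queues *tq)
{
	/* Both txq_id and rxq_id start out as NO_QUEUE (USHRT_MAX) */
	init_net_queue_pair(&tq->qpair);
}

static bool example_has_rx_queue(const struct example_thread_queues *tq)
{
	return tq->qpair.rxq_id != NO_QUEUE;
}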
bh=70hby/uDfnlNOV84ZWyp8KNhkZ8UU7RwIhRI9s67NEc=; b=UvGBNPligWpkOLUgeO6diykQFU9kwAGn4KcpUkwzEp8GiQikk8VJPH3YWhmfrg+y2S 8elyf6tROQsDibrv2r/v8tv35hdQcklCZfkHSoWyRYneoIQMTdP+TlvzmKbDns4IMmkz 67E2UWndEv+DEezpsjSUqzmHeKp6hUfK7Iwv2Ho1IFe6xPDeu9H5U0Dnga7ACr2eccaj OxbnU6TC4HUg1lxmlHHpu4cTaNUlKjg2uDsY9rNjAZDSSAoyYaflm/uLmH7OjOuEwHTY JpeT0Eb0zvIb1bg5cwDW6Wxp0F8XDxK4x5DbBuwjdWU+CwEepecX4p63ju9KALyKjR5E corw== X-Gm-Message-State: AOAM533NrtolOPjjMftYPR58mYvsTmK9/KsFtkQG25ZQwUYX0nk1dvjF 1CyJ1Crk4IlJKvrkS0IE1ReRgti9cE8= X-Google-Smtp-Source: ABdhPJzoKpvimB8ijVg2M/UozoW3bWV5HuW9QTZNuuj67H0GJk1ky50V/+WzjqK3YqVMknOaKYFB1g== X-Received: by 2002:a65:6496:: with SMTP id e22mr23292609pgv.63.1593019158077; Wed, 24 Jun 2020 10:19:18 -0700 (PDT) Received: from localhost.localdomain (c-73-202-182-113.hsd1.ca.comcast.net. [73.202.182.113]) by smtp.gmail.com with ESMTPSA id w18sm17490241pgj.31.2020.06.24.10.19.16 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 24 Jun 2020 10:19:17 -0700 (PDT) From: Tom Herbert To: netdev@vger.kernel.org Cc: Tom Herbert Subject: [RFC PATCH 03/11] arfs: Create set_arfs_queue Date: Wed, 24 Jun 2020 10:17:42 -0700 Message-Id: <20200624171749.11927-4-tom@herbertland.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200624171749.11927-1-tom@herbertland.com> References: <20200624171749.11927-1-tom@herbertland.com> MIME-Version: 1.0 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Abstract out the code for steering a flow to an aRFS queue (via ndo_rx_flow_steer) into its own function. This allows the function to be called in other use cases. --- net/core/dev.c | 67 +++++++++++++++++++++++++++++--------------------- 1 file changed, 39 insertions(+), 28 deletions(-) diff --git a/net/core/dev.c b/net/core/dev.c index 6bc2388141f6..9f7a3e78e23a 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -4250,42 +4250,53 @@ EXPORT_SYMBOL(rps_needed); struct static_key_false rfs_needed __read_mostly; EXPORT_SYMBOL(rfs_needed); +#ifdef CONFIG_RFS_ACCEL +static void set_arfs_queue(struct net_device *dev, struct sk_buff *skb, + struct rps_dev_flow *rflow, u16 rxq_index) +{ + struct rps_dev_flow_table *flow_table; + struct netdev_rx_queue *rxqueue; + struct rps_dev_flow *old_rflow; + u32 flow_id; + int rc; + + rxqueue = dev->_rx + rxq_index; + + flow_table = rcu_dereference(rxqueue->rps_flow_table); + if (!flow_table) + return; + + flow_id = skb_get_hash(skb) & flow_table->mask; + rc = dev->netdev_ops->ndo_rx_flow_steer(dev, skb, + rxq_index, flow_id); + if (rc < 0) + return; + + old_rflow = rflow; + rflow = &flow_table->flows[flow_id]; + rflow->filter = rc; + if (old_rflow->filter == rflow->filter) + old_rflow->filter = RPS_NO_FILTER; +} +#endif + static struct rps_dev_flow * set_rps_cpu(struct net_device *dev, struct sk_buff *skb, struct rps_dev_flow *rflow, u16 next_cpu) { if (next_cpu < nr_cpu_ids) { #ifdef CONFIG_RFS_ACCEL - struct netdev_rx_queue *rxqueue; - struct rps_dev_flow_table *flow_table; - struct rps_dev_flow *old_rflow; - u32 flow_id; - u16 rxq_index; - int rc; /* Should we steer this flow to a different hardware queue? 
*/ - if (!skb_rx_queue_recorded(skb) || !dev->rx_cpu_rmap || - !(dev->features & NETIF_F_NTUPLE)) - goto out; - rxq_index = cpu_rmap_lookup_index(dev->rx_cpu_rmap, next_cpu); - if (rxq_index == skb_get_rx_queue(skb)) - goto out; - - rxqueue = dev->_rx + rxq_index; - flow_table = rcu_dereference(rxqueue->rps_flow_table); - if (!flow_table) - goto out; - flow_id = skb_get_hash(skb) & flow_table->mask; - rc = dev->netdev_ops->ndo_rx_flow_steer(dev, skb, - rxq_index, flow_id); - if (rc < 0) - goto out; - old_rflow = rflow; - rflow = &flow_table->flows[flow_id]; - rflow->filter = rc; - if (old_rflow->filter == rflow->filter) - old_rflow->filter = RPS_NO_FILTER; - out: + if (skb_rx_queue_recorded(skb) && dev->rx_cpu_rmap && + (dev->features & NETIF_F_NTUPLE)) { + u16 rxq_index; + + rxq_index = cpu_rmap_lookup_index(dev->rx_cpu_rmap, + next_cpu); + if (rxq_index != skb_get_rx_queue(skb)) + set_arfs_queue(dev, skb, rflow, rxq_index); + } #endif rflow->last_qtail = per_cpu(softnet_data, next_cpu).input_queue_head; From patchwork Wed Jun 24 17:17:43 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Tom Herbert X-Patchwork-Id: 1316465 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=herbertland.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=herbertland-com.20150623.gappssmtp.com header.i=@herbertland-com.20150623.gappssmtp.com header.a=rsa-sha256 header.s=20150623 header.b=A/zA8aE8; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by ozlabs.org (Postfix) with ESMTP id 49sVHw2RBnz9sPF for ; Thu, 25 Jun 2020 03:19:24 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2405459AbgFXRTX (ORCPT ); Wed, 24 Jun 2020 13:19:23 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48840 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2405391AbgFXRTX (ORCPT ); Wed, 24 Jun 2020 13:19:23 -0400 Received: from mail-pf1-x442.google.com (mail-pf1-x442.google.com [IPv6:2607:f8b0:4864:20::442]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CCB01C061573 for ; Wed, 24 Jun 2020 10:19:22 -0700 (PDT) Received: by mail-pf1-x442.google.com with SMTP id q17so1433615pfu.8 for ; Wed, 24 Jun 2020 10:19:22 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=herbertland-com.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=sqicd+21/6vyqmBwMtIT4lwIz534aUuSXUaVL318pzE=; b=A/zA8aE8oRFQllAOhI3c/+Ei1r4WrWirmQP/9z4mnJ7ByX5DuG8vDewRtTy0CZIMBU 7bdWKz4HOulVJupx3uQRzkJpbWvI7reQoDh5QWzhqpyfHv0tU05AR71HmneYsp6WRy/Z Is6wGSrUqPuM4JzA79zZWhIcSlhfY5mN9Po319Pq6YQ8b7PDpMrKkm1Yw/xAjJfHUVuI lT3j/H4/1CG2pUn8xWE67eTzkSurVO1Qry1ut1fuP+1hgoVXAIlOfxrpYj1m4X0m3+jN F3w86eyr+WQ1b/X+iMT45rcP2Sy4xMZto3sdIvdQfWazNa5V8xsdGGGkdNPtDJdUGGA/ x3Tw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to 
:references:mime-version:content-transfer-encoding; bh=sqicd+21/6vyqmBwMtIT4lwIz534aUuSXUaVL318pzE=; b=kavLAdqJ/fO2A3obsy9/J8Xoj6lbslB4b1/A/56puX/K6pYE0zRF2qJi1F+5mcOJ5x Jv2x+9/2Ph3x98zXBv31byX21qY4YcRUX2Zs77WJOLPqPxzWwL9PYLNVrnBCx3FDxTpy WfCeC5Pj+dAY/Tt880e2TqGRoGrjllG2fBoUbYlBFr9G+u88R9pCkEtYaEwsvKFFPsqr BSyknVYv/HYAUYiAx9zr0HT39bk9TtRFhUUIzj2T+BFuTEIOe2RaJb3nTbxCxxMLN7kx anOt5lgNY4GAq1tXQ4ELYVILNe0anMgJRtARgaRQDgnyXBS/WnVZpVJIwq0UuojMpKnS N61Q== X-Gm-Message-State: AOAM533+f/CgjwwR7YipiWuVPSIGrHNyhySwkfdO60ngPmMOrcys0X9a VkkKUjdnK2rVTnrRAtZPlV7ptWyzPoE= X-Google-Smtp-Source: ABdhPJzvU/Rr/WaOpr+VifMasEdAaxNBaA1XZUnHtNde3jeQzlhlEJoL9VXWV3z1cycV/Al8s8snkw== X-Received: by 2002:a63:5761:: with SMTP id h33mr22288890pgm.175.1593019161885; Wed, 24 Jun 2020 10:19:21 -0700 (PDT) Received: from localhost.localdomain (c-73-202-182-113.hsd1.ca.comcast.net. [73.202.182.113]) by smtp.gmail.com with ESMTPSA id w18sm17490241pgj.31.2020.06.24.10.19.20 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 24 Jun 2020 10:19:21 -0700 (PDT) From: Tom Herbert To: netdev@vger.kernel.org Cc: Tom Herbert Subject: [RFC PATCH 04/11] net-sysfs: Create rps_create_sock_flow_table Date: Wed, 24 Jun 2020 10:17:43 -0700 Message-Id: <20200624171749.11927-5-tom@herbertland.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200624171749.11927-1-tom@herbertland.com> References: <20200624171749.11927-1-tom@herbertland.com> MIME-Version: 1.0 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Move code for writing a sock_flow_table to its own function so that it can be called for other use cases. --- net/core/sysctl_net_core.c | 102 +++++++++++++++++++++---------------- 1 file changed, 57 insertions(+), 45 deletions(-) diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c index f93f8ace6c56..9c7d46fbb75a 100644 --- a/net/core/sysctl_net_core.c +++ b/net/core/sysctl_net_core.c @@ -46,66 +46,78 @@ int sysctl_devconf_inherit_init_net __read_mostly; EXPORT_SYMBOL(sysctl_devconf_inherit_init_net); #ifdef CONFIG_RPS +static int rps_create_sock_flow_table(size_t size, size_t orig_size, + struct rps_sock_flow_table *orig_table, + bool force) +{ + struct rps_sock_flow_table *sock_table; + int i; + + if (size) { + if (size > 1 << 29) { + /* Enforce limit to prevent overflow */ + return -EINVAL; + } + size = roundup_pow_of_two(size); + if (size != orig_size || force) { + sock_table = vmalloc(RPS_SOCK_FLOW_TABLE_SIZE(size)); + if (!sock_table) + return -ENOMEM; + + sock_table->mask = size - 1; + } else { + sock_table = orig_table; + } + + for (i = 0; i < size; i++) + sock_table->ents[i] = RPS_NO_CPU; + } else { + sock_table = NULL; + } + + if (sock_table != orig_table) { + rcu_assign_pointer(rps_sock_flow_table, sock_table); + if (sock_table) { + static_branch_inc(&rps_needed); + static_branch_inc(&rfs_needed); + } + if (orig_table) { + static_branch_dec(&rps_needed); + static_branch_dec(&rfs_needed); + synchronize_rcu(); + vfree(orig_table); + } + } + + return 0; +} + +static DEFINE_MUTEX(sock_flow_mutex); + static int rps_sock_flow_sysctl(struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos) { unsigned int orig_size, size; - int ret, i; + int ret; struct ctl_table tmp = { .data = &size, .maxlen = sizeof(size), .mode = table->mode }; - struct rps_sock_flow_table *orig_sock_table, *sock_table; - static DEFINE_MUTEX(sock_flow_mutex); + struct rps_sock_flow_table *sock_table; mutex_lock(&sock_flow_mutex); - orig_sock_table = 
rcu_dereference_protected(rps_sock_flow_table, - lockdep_is_held(&sock_flow_mutex)); - size = orig_size = orig_sock_table ? orig_sock_table->mask + 1 : 0; + sock_table = rcu_dereference_protected(rps_sock_flow_table, + lockdep_is_held(&sock_flow_mutex)); + size = sock_table ? sock_table->mask + 1 : 0; + orig_size = size; ret = proc_dointvec(&tmp, write, buffer, lenp, ppos); - if (write) { - if (size) { - if (size > 1<<29) { - /* Enforce limit to prevent overflow */ - mutex_unlock(&sock_flow_mutex); - return -EINVAL; - } - size = roundup_pow_of_two(size); - if (size != orig_size) { - sock_table = - vmalloc(RPS_SOCK_FLOW_TABLE_SIZE(size)); - if (!sock_table) { - mutex_unlock(&sock_flow_mutex); - return -ENOMEM; - } - rps_cpu_mask = roundup_pow_of_two(nr_cpu_ids) - 1; - sock_table->mask = size - 1; - } else - sock_table = orig_sock_table; - - for (i = 0; i < size; i++) - sock_table->ents[i] = RPS_NO_CPU; - } else - sock_table = NULL; - - if (sock_table != orig_sock_table) { - rcu_assign_pointer(rps_sock_flow_table, sock_table); - if (sock_table) { - static_branch_inc(&rps_needed); - static_branch_inc(&rfs_needed); - } - if (orig_sock_table) { - static_branch_dec(&rps_needed); - static_branch_dec(&rfs_needed); - synchronize_rcu(); - vfree(orig_sock_table); - } - } - } + if (write) + ret = rps_create_sock_flow_table(size, orig_size, + sock_table, false); mutex_unlock(&sock_flow_mutex); From patchwork Wed Jun 24 17:17:44 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Tom Herbert X-Patchwork-Id: 1316466 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=herbertland.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=herbertland-com.20150623.gappssmtp.com header.i=@herbertland-com.20150623.gappssmtp.com header.a=rsa-sha256 header.s=20150623 header.b=sPl29O/m; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by ozlabs.org (Postfix) with ESMTP id 49sVJ40Yfsz9sPF for ; Thu, 25 Jun 2020 03:19:32 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2405464AbgFXRTb (ORCPT ); Wed, 24 Jun 2020 13:19:31 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48862 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2405391AbgFXRTa (ORCPT ); Wed, 24 Jun 2020 13:19:30 -0400 Received: from mail-pl1-x644.google.com (mail-pl1-x644.google.com [IPv6:2607:f8b0:4864:20::644]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7BE8FC061573 for ; Wed, 24 Jun 2020 10:19:30 -0700 (PDT) Received: by mail-pl1-x644.google.com with SMTP id f2so1299217plr.8 for ; Wed, 24 Jun 2020 10:19:30 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=herbertland-com.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=ioFTKN5RCu0a7L5Oh0HoLLLpAlODISnRUTDIV10EYKA=; b=sPl29O/mx8F2uru+TrWX4LXSKhPye0yGhnAtyug/AWAjyMQ1adSirI0F3rUxBiRR4q SaBuVzauWxsQb2E4PN8v+WwpShaXwmE2H6wmR/vLiFpdtWsUuxPQc6ul2PHEY9S9OxJ6 
15hSnidYFjpOVMlffl5gbJBqNjSBmFqw+vaURBXJZRXhLdzqYNvvvR6jA4jtmFw+Ldf6 Jztt/BytR6iSZLsjhfrciRTbnO128ecpgA6gPRaYpXPMBmx2SVynFUqQ/ZarNunygaWj lxX87F4AHXqjg0+6jzPdydam85lKiaU7fKBQJsCMuAhnXc33DfTEvN+aX/kB36Jnft0Y OdJw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=ioFTKN5RCu0a7L5Oh0HoLLLpAlODISnRUTDIV10EYKA=; b=jKYk4qeRpua6YrsojBKwXYkjuOYEi5Ck+TkTL6vYjTHGsirq6Xl12j+GfsbLsFUBdT bLa/grEwmTlXgO36wwwOn31BklOf7uttNyvnRjRbmUhPeRrbrKWEOkkZOf7wHOlmYZza t7Xi4jdx2X62KUaWBxoTH6UK6CU9sUI1MBjGpaBoE6c/7LwRPu4FmIRlJKujfZS/N8ZM +FVRAk0urIgRhFgLa0s65Kh/G2kZHdTMG97xlFSRaEOyAy7fIc72Dq8VcAqqC8riEO8q GmZawEl5A/eLOnGAm4yuTtRHnGBtXqOPrz3ZcKSQKpI43jqPv34gyTiC+Ho9N+zQV8Ze e92Q== X-Gm-Message-State: AOAM531pePoNLjKandnyiMU9KKVxCbHMa3yHazl6Ua4a+27uKfDbxKtc TWW3SmSIpQPKnAG/cy9SvOi/KGqOzYM= X-Google-Smtp-Source: ABdhPJxK2iyD66bnB4O7ENYc0JXMaGhK8R1kOrrOMADFEwYYmGMzvheXxkTsDozG+Ebp66MI19LCTg== X-Received: by 2002:a17:902:c082:: with SMTP id j2mr29319573pld.175.1593019169376; Wed, 24 Jun 2020 10:19:29 -0700 (PDT) Received: from localhost.localdomain (c-73-202-182-113.hsd1.ca.comcast.net. [73.202.182.113]) by smtp.gmail.com with ESMTPSA id w18sm17490241pgj.31.2020.06.24.10.19.28 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 24 Jun 2020 10:19:28 -0700 (PDT) From: Tom Herbert To: netdev@vger.kernel.org Cc: Tom Herbert Subject: [RFC PATCH 05/11] net: Infrastructure for per queue aRFS Date: Wed, 24 Jun 2020 10:17:44 -0700 Message-Id: <20200624171749.11927-6-tom@herbertland.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200624171749.11927-1-tom@herbertland.com> References: <20200624171749.11927-1-tom@herbertland.com> MIME-Version: 1.0 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Infrastructure changes to allow aRFS to be based on Per Thread Queues instead of just CPU. The basic change is to create a field in rps_dev_flow to hold either a CPU or a queue index (not just a CPU that is). Changes include: - Replace u16 cpu field in rps_dev_flow structure with rps_cpu_qid structure that contains either a CPU or a device queue index. Note the structure is still sixteen bits - Helper functions to clear and set the cpu in the rps_cpu_qid of rps_dev_flow - Create a sock_masks structure that contains the partition of the thirty-two bit entry in rps_sock_flow_table. The structure contains two masks, one to extract the upper bits of the hash and one to extract the CPU number or queue index - Replace rps_cpu_mask with sock_masks from rps_sock_flow_table - Add rps_max_num_queues which will be used when creating sock_masks for queue entries in rps_sock_flow_table --- include/linux/netdevice.h | 94 +++++++++++++++++++++++++++++++++----- net/core/dev.c | 47 ++++++++++++------- net/core/net-sysfs.c | 2 +- net/core/sysctl_net_core.c | 6 ++- 4 files changed, 119 insertions(+), 30 deletions(-) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index bf5f2a85da97..d528aa61fea3 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -674,18 +674,65 @@ struct rps_map { }; #define RPS_MAP_SIZE(_num) (sizeof(struct rps_map) + ((_num) * sizeof(u16))) +/* The rps_cpu_qid structure is sixteen bits and holds either a CPU number or + * a queue index. The use_qid field specifies which type of value is set (i.e. 
+ * if use_qid is 1 then cpu_qid contains a fifteen bit queue identifier, and if + * use_qid is 0 then cpu_qid contains a fifteen bit CPU number). No entry is + * signified by RPS_NO_CPU_QID in val which is set to NO_QUEUE (0xffff). So the + * range of CPU numbers that can be stored is 0..32,767 (0x7fff) and the range + * of queue identifiers is 0..32,766. Note that CPU numbers are limited by + * CONFIG_NR_CPUS which currently has a maximum supported value of 8,192 (per + * arch/x86/Kconfig), so WARN_ON is used to check that a CPU number is less + * than 0x8000 when setting the cpu in rps_cpu_qid. The queue index is limited + * by configuration. + */ +struct rps_cpu_qid { + union { + u16 val; + struct { + u16 use_qid: 1; + union { + u16 cpu: 15; + u16 qid: 15; + }; + }; + }; +}; + +#define RPS_NO_CPU_QID NO_QUEUE /* No CPU or qid in rps_cpu_qid */ +#define RPS_MAX_CPU 0x7fff /* Maximum cpu in rps_cpu_qid */ +#define RPS_MAX_QID 0x7ffe /* Maximum qid in rps_cpu_qid */ + /* * The rps_dev_flow structure contains the mapping of a flow to a CPU, the * tail pointer for that CPU's input queue at the time of last enqueue, and * a hardware filter index. */ struct rps_dev_flow { - u16 cpu; + struct rps_cpu_qid cpu_qid; u16 filter; unsigned int last_qtail; }; #define RPS_NO_FILTER 0xffff +static inline void rps_dev_flow_clear(struct rps_dev_flow *dev_flow) +{ + dev_flow->cpu_qid.val = RPS_NO_CPU_QID; +} + +static inline void rps_dev_flow_set_cpu(struct rps_dev_flow *dev_flow, u16 cpu) +{ + struct rps_cpu_qid cpu_qid; + + if (WARN_ON(cpu > RPS_MAX_CPU)) + return; + + /* Set the rflow target to the CPU atomically */ + cpu_qid.use_qid = 0; + cpu_qid.cpu = cpu; + dev_flow->cpu_qid = cpu_qid; +} + /* * The rps_dev_flow_table structure contains a table of flow mappings. */ @@ -697,34 +744,57 @@ struct rps_dev_flow_table { #define RPS_DEV_FLOW_TABLE_SIZE(_num) (sizeof(struct rps_dev_flow_table) + \ ((_num) * sizeof(struct rps_dev_flow))) +struct rps_sock_masks { + u32 mask; + u32 hash_mask; +}; + /* - * The rps_sock_flow_table contains mappings of flows to the last CPU - * on which they were processed by the application (set in recvmsg). - * Each entry is a 32bit value. Upper part is the high-order bits - * of flow hash, lower part is CPU number. - * rps_cpu_mask is used to partition the space, depending on number of - * possible CPUs : rps_cpu_mask = roundup_pow_of_two(nr_cpu_ids) - 1 - * For example, if 64 CPUs are possible, rps_cpu_mask = 0x3f, - * meaning we use 32-6=26 bits for the hash. + * The rps_sock_flow_table contains mappings of flows to the last CPU on which + * they were processed by the application (set in recvmsg), or the mapping of + * the flow to a per thread queue for the application. Each entry is a 32bit + * value. The high order bit indicates whether a CPU number or a queue index is + * stored. The next high-order bits contain the flow hash, and the lower bits + * contain the CPU number or queue index. The sock_flow table contains two + * sets of masks, one for CPU entries (cpu_masks) and one for queue entries + * (queue_masks), that are to used partition the space between the hash bits + * and the CPU number or queue index. For the cpu masks, cpu_masks.mask is set + * to roundup_pow_of_two(nr_cpu_ids) - 1 and the corresponding hash mask, + * cpu_masks.hash_mask, is set to (~cpu_masks.mask & ~RPS_SOCK_FLOW_USE_QID). + * For example, if 64 CPUs are possible, cpu_masks.mask == 0x3f, meaning we use + * 31-6=25 bits for the hash (so cpu_masks.hash_mask == 0x7fffffc0). 
Similarly, + * queue_masks in rps_sock_flow_table is used to partition the space when a + * queue index is present. */ struct rps_sock_flow_table { u32 mask; + struct rps_sock_masks cpu_masks; + struct rps_sock_masks queue_masks; u32 ents[] ____cacheline_aligned_in_smp; }; #define RPS_SOCK_FLOW_TABLE_SIZE(_num) (offsetof(struct rps_sock_flow_table, ents[_num])) -#define RPS_NO_CPU 0xffff +#define RPS_SOCK_FLOW_USE_QID (1 << 31) +#define RPS_SOCK_FLOW_NO_IDENT -1U -extern u32 rps_cpu_mask; extern struct rps_sock_flow_table __rcu *rps_sock_flow_table; +extern unsigned int rps_max_num_queues; + +static inline void rps_init_sock_masks(struct rps_sock_masks *masks, u32 num) +{ + u32 mask = roundup_pow_of_two(num) - 1; + + masks->mask = mask; + masks->hash_mask = (~mask & ~RPS_SOCK_FLOW_USE_QID); +} static inline void rps_record_sock_flow(struct rps_sock_flow_table *table, u32 hash) { if (table && hash) { + u32 val = hash & table->cpu_masks.hash_mask; unsigned int index = hash & table->mask; - u32 val = hash & ~rps_cpu_mask; /* We only give a hint, preemption can change CPU under us */ val |= raw_smp_processor_id(); diff --git a/net/core/dev.c b/net/core/dev.c index 9f7a3e78e23a..946940bdd583 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -4242,8 +4242,7 @@ static inline void ____napi_schedule(struct softnet_data *sd, /* One global table that all flow-based protocols share. */ struct rps_sock_flow_table __rcu *rps_sock_flow_table __read_mostly; EXPORT_SYMBOL(rps_sock_flow_table); -u32 rps_cpu_mask __read_mostly; -EXPORT_SYMBOL(rps_cpu_mask); +unsigned int rps_max_num_queues; struct static_key_false rps_needed __read_mostly; EXPORT_SYMBOL(rps_needed); @@ -4302,7 +4301,7 @@ set_rps_cpu(struct net_device *dev, struct sk_buff *skb, per_cpu(softnet_data, next_cpu).input_queue_head; } - rflow->cpu = next_cpu; + rps_dev_flow_set_cpu(rflow, next_cpu); return rflow; } @@ -4349,22 +4348,39 @@ static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb, sock_flow_table = rcu_dereference(rps_sock_flow_table); if (flow_table && sock_flow_table) { + u32 next_cpu, comparator, ident; struct rps_dev_flow *rflow; - u32 next_cpu; - u32 ident; /* First check into global flow table if there is a match */ ident = sock_flow_table->ents[hash & sock_flow_table->mask]; - if ((ident ^ hash) & ~rps_cpu_mask) - goto try_rps; + comparator = ((ident & RPS_SOCK_FLOW_USE_QID) ? + sock_flow_table->queue_masks.hash_mask : + sock_flow_table->cpu_masks.hash_mask); - next_cpu = ident & rps_cpu_mask; + if ((ident ^ hash) & comparator) + goto try_rps; /* OK, now we know there is a match, * we can look at the local (per receive queue) flow table */ rflow = &flow_table->flows[hash & flow_table->mask]; - tcpu = rflow->cpu; + + /* The flow_sock entry may refer to either a queue or a + * CPU. Proceed accordingly. + */ + if (ident & RPS_SOCK_FLOW_USE_QID) { + /* A queue identifier is in the sock_flow_table entry */ + + /* Don't use aRFS to set CPU in this case, skip to + * trying RPS + */ + goto try_rps; + } + + /* A CPU number is in the sock_flow_table entry */ + + next_cpu = ident & sock_flow_table->cpu_masks.mask; + tcpu = rflow->cpu_qid.use_qid ? 
NO_QUEUE : rflow->cpu_qid.cpu; /* * If the desired CPU (where last recvmsg was done) is @@ -4396,10 +4412,8 @@ static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb, if (map) { tcpu = map->cpus[reciprocal_scale(hash, map->len)]; - if (cpu_online(tcpu)) { + if (cpu_online(tcpu)) cpu = tcpu; - goto done; - } } done: @@ -4424,17 +4438,18 @@ bool rps_may_expire_flow(struct net_device *dev, u16 rxq_index, { struct netdev_rx_queue *rxqueue = dev->_rx + rxq_index; struct rps_dev_flow_table *flow_table; + struct rps_cpu_qid cpu_qid; struct rps_dev_flow *rflow; bool expire = true; - unsigned int cpu; rcu_read_lock(); flow_table = rcu_dereference(rxqueue->rps_flow_table); if (flow_table && flow_id <= flow_table->mask) { rflow = &flow_table->flows[flow_id]; - cpu = READ_ONCE(rflow->cpu); - if (rflow->filter == filter_id && cpu < nr_cpu_ids && - ((int)(per_cpu(softnet_data, cpu).input_queue_head - + cpu_qid = READ_ONCE(rflow->cpu_qid); + if (rflow->filter == filter_id && !cpu_qid.use_qid && + cpu_qid.cpu < nr_cpu_ids && + ((int)(per_cpu(softnet_data, cpu_qid.cpu).input_queue_head - rflow->last_qtail) < (int)(10 * flow_table->mask))) expire = false; diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c index e353b822bb15..56d27463d466 100644 --- a/net/core/net-sysfs.c +++ b/net/core/net-sysfs.c @@ -858,7 +858,7 @@ static ssize_t store_rps_dev_flow_table_cnt(struct netdev_rx_queue *queue, table->mask = mask; for (count = 0; count <= mask; count++) - table->flows[count].cpu = RPS_NO_CPU; + rps_dev_flow_clear(&table->flows[count]); } else { table = NULL; } diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c index 9c7d46fbb75a..d09471f29d89 100644 --- a/net/core/sysctl_net_core.c +++ b/net/core/sysctl_net_core.c @@ -65,12 +65,16 @@ static int rps_create_sock_flow_table(size_t size, size_t orig_size, return -ENOMEM; sock_table->mask = size - 1; + rps_init_sock_masks(&sock_table->cpu_masks, + nr_cpu_ids); + rps_init_sock_masks(&sock_table->queue_masks, + rps_max_num_queues); } else { sock_table = orig_table; } for (i = 0; i < size; i++) - sock_table->ents[i] = RPS_NO_CPU; + sock_table->ents[i] = RPS_NO_CPU_QID; } else { sock_table = NULL; } From patchwork Wed Jun 24 17:17:45 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Tom Herbert X-Patchwork-Id: 1316467 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=herbertland.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=herbertland-com.20150623.gappssmtp.com header.i=@herbertland-com.20150623.gappssmtp.com header.a=rsa-sha256 header.s=20150623 header.b=uufpB6pM; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by ozlabs.org (Postfix) with ESMTP id 49sVJ91W4mz9sPF for ; Thu, 25 Jun 2020 03:19:37 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2405473AbgFXRTg (ORCPT ); Wed, 24 Jun 2020 13:19:36 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48878 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2405391AbgFXRTf 
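To make the rps_sock_flow_table encoding described in patch 05/11 concrete,
the sketch below composes a 32-bit table entry for either a CPU or a queue,
mirroring the CPU-side logic of rps_record_sock_flow(); the helper itself is
hypothetical and not part of the series.

/* Illustrative helper (not in the patch): build a sock_flow_table entry
 * from a flow hash plus either a CPU number or a queue index.  The top
 * bit (RPS_SOCK_FLOW_USE_QID) selects queue vs. CPU, the upper bits keep
 * part of the hash, and the low bits carry the CPU number or qid.
 */
#include <linux/netdevice.h>

static inline u32 example_sock_flow_ent(const struct rps_sock_flow_table *table,
					u32 hash, u16 cpu_or_qid, bool is_qid)
{
	if (is_qid)
		return RPS_SOCK_FLOW_USE_QID |
		       (hash & table->queue_masks.hash_mask) |
		       (cpu_or_qid & table->queue_masks.mask);

	/* CPU case, matching what rps_record_sock_flow() does */
	return (hash & table->cpu_masks.hash_mask) |
	       (cpu_or_qid & table->cpu_masks.mask);
}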
From: Tom Herbert
To: netdev@vger.kernel.org
Cc: Tom Herbert
Subject: [RFC PATCH 06/11] net: Function to check against maximum number for RPS queues
Date: Wed, 24 Jun 2020 10:17:45 -0700
Message-Id: <20200624171749.11927-7-tom@herbertland.com>

Add the rps_check_max_queues function, which checks whether the input
number is greater than rps_max_num_queues. If it is, rps_max_num_queues is
set to that value and the sock_flow_table is recreated to update the queue
masks used in its entries.
--- include/linux/netdevice.h | 10 ++++++++ net/core/sysctl_net_core.c | 48 ++++++++++++++++++++++++++++++++++++++ 2 files changed, 58 insertions(+) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index d528aa61fea3..48ba1c1fc644 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -804,6 +804,16 @@ static inline void rps_record_sock_flow(struct rps_sock_flow_table *table, } } +int __rps_check_max_queues(unsigned int idx); + +static inline int rps_check_max_queues(unsigned int idx) +{ + if (idx < rps_max_num_queues) + return 0; + + return __rps_check_max_queues(idx); +} + #ifdef CONFIG_RFS_ACCEL bool rps_may_expire_flow(struct net_device *dev, u16 rxq_index, u32 flow_id, u16 filter_id); diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c index d09471f29d89..743c46148135 100644 --- a/net/core/sysctl_net_core.c +++ b/net/core/sysctl_net_core.c @@ -127,6 +127,54 @@ static int rps_sock_flow_sysctl(struct ctl_table *table, int write, return ret; } + +int __rps_check_max_queues(unsigned int idx) +{ + unsigned int old; + size_t size; + int ret = 0; + + /* Assume maximum queues should be a least the number of CPUs. + * This avoids too much thrashing of the sock flow table at + * initialization. + */ + if (idx < nr_cpu_ids && nr_cpu_ids < RPS_MAX_QID) + idx = nr_cpu_ids; + + if (idx > RPS_MAX_QID) + return -EINVAL; + + mutex_lock(&sock_flow_mutex); + + old = rps_max_num_queues; + rps_max_num_queues = idx; + + /* No need to reallocate table since nothing is changing */ + + if (roundup_pow_of_two(old) != roundup_pow_of_two(idx)) { + struct rps_sock_flow_table *sock_table; + + sock_table = rcu_dereference_protected(rps_sock_flow_table, + lockdep_is_held(&sock_flow_mutex)); + size = sock_table ? sock_table->mask + 1 : 0; + + /* Force creation of a new rps_sock_flow_table. It's + * the same size as the existing table, but we expunge + * any stale queue entries that would refer to the old + * queue mask. 
+ */ + ret = rps_create_sock_flow_table(size, size, + sock_table, true); + if (ret) + rps_max_num_queues = old; + } + + mutex_unlock(&sock_flow_mutex); + + return ret; +} +EXPORT_SYMBOL(__rps_check_max_queues); + #endif /* CONFIG_RPS */ #ifdef CONFIG_NET_FLOW_LIMIT From patchwork Wed Jun 24 17:17:46 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Tom Herbert X-Patchwork-Id: 1316468 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=herbertland.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=herbertland-com.20150623.gappssmtp.com header.i=@herbertland-com.20150623.gappssmtp.com header.a=rsa-sha256 header.s=20150623 header.b=cwsVEsDd; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by ozlabs.org (Postfix) with ESMTP id 49sVJG0BlYz9sRR for ; Thu, 25 Jun 2020 03:19:42 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2405484AbgFXRTk (ORCPT ); Wed, 24 Jun 2020 13:19:40 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48892 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2405475AbgFXRTi (ORCPT ); Wed, 24 Jun 2020 13:19:38 -0400 Received: from mail-pj1-x1042.google.com (mail-pj1-x1042.google.com [IPv6:2607:f8b0:4864:20::1042]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id F0F39C061795 for ; Wed, 24 Jun 2020 10:19:37 -0700 (PDT) Received: by mail-pj1-x1042.google.com with SMTP id ne5so1409257pjb.5 for ; Wed, 24 Jun 2020 10:19:37 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=herbertland-com.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=uWEQN00J+EsTD7vc+2ZqKDA1RdwxuktjAkTxSvFMBrY=; b=cwsVEsDd05a8SHzQwLET/o5+fbJx1bp956sls0WQ9hI9wfDNkFYd/LSlHTtou+AgMB nFQcuj06TeKnXxCXTFtiKbCHqqoiMJnkhL90xFPQNcSC0NOkLUzzWeSoaavoDrJwP7dD JiGSZKX5sghpjnKJ1Y8Hg6WTyBFWr5RbM4TIgcFtVGNflSQlqOlHmMs20V7rWMVNbmQ9 UAR1RHtdeYpJQVBJ5k0w4NVEppPJeQ/8ZD8S91Df0tC1kG6s2paWYkufe+Y5HkYMXe2F pTkcFf4opTAXSI7H2D6Ckjgb+FHSbn5TzGjPSX79QEwP6pCI8QoUxMSy6yUA865mlMRg 5a7Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=uWEQN00J+EsTD7vc+2ZqKDA1RdwxuktjAkTxSvFMBrY=; b=GvPmuwihmYoVHQ3bukbB+cpgW4xCASez3ILlrdEYxrkDO/hS2tT5g/gLOSxaRBf+mf TyI2WXKt3IUD3/XWzvCbMaLaQ8yyifNF82/eDnfraOsm6573BfLB2hp1s5FpG7Au8c3G /aiHyKtxrwPV/nF9Z/r/NJi4y/jYQ28v0m2v84z2KZld4DtN3hDyuWjSXvseDLnKgosv NnhJb4hMox4D8rhoTLIIN5/PLXCQJmdWPMhmK7DWlSKs99Gmvcdy9tAgPfdFLPk5oNH2 uxrNgk3oAHrtIIkxQlmMPMCWdQyMy6UCTbWEgt/AUF9mPeLrqxDdbSGn0rrtqBwqnA12 vyPQ== X-Gm-Message-State: AOAM530a5tYMABAn7+eYiujWee9pCSaTKAbcPp6hA1Ju49A7vA9rhmtX pojTG96fsTdb0KHjQWmRuxQO7VtrHwQ= X-Google-Smtp-Source: ABdhPJy7ytS2Ni71K6mk+qkz7jf5KuIOMyPQ/xUACDDcqjCjZ36nNbnuzWv8T4t6N9aUvAbPxUEPvw== X-Received: by 2002:a17:90a:4f4b:: with SMTP id w11mr29868537pjl.11.1593019176774; Wed, 24 Jun 2020 10:19:36 -0700 (PDT) Received: 
From: Tom Herbert
To: netdev@vger.kernel.org
Cc: Tom Herbert
Subject: [RFC PATCH 07/11] net: Introduce global queues
Date: Wed, 24 Jun 2020 10:17:46 -0700
Message-Id: <20200624171749.11927-8-tom@herbertland.com>

Global queues, or gqids, are an abstract representation of NIC device
queues. They are global in the sense that each gqid can be mapped to a
queue in each device, i.e. if there are multiple devices in the system, a
gqid can map to a different queue, a dqid, in each device in a one-to-many
mapping. gqids are used for configuring packet steering on both send and
receive in a generic way that is not bound to a particular device.

Each transmit or receive device queue may be reverse mapped to one gqid.
Each device maintains a table mapping gqids to local device queues; these
tables are used in the data path to convert a gqid receive or transmit
queue into a device queue relative to the sending or receiving device.

Changes in the patch:
  - Add a simple index to netdev_queue and netdev_rx_queue. This serves as
    the dqid (it is just the index in the receive or transmit queue array
    for the device)
  - Add gqid to netdev_queue and netdev_rx_queue. This is the mapping of a
    device queue to a gqid. If gqid is NO_QUEUE then the queue is unmapped
  - The per device gqid to dqid maps are maintained in an array of
    netdev_queue_map structures in a net_device for both transmit and
    receive
  - Functions that return a dqid where the input is a gqid and a net_device
  - Sysfs to set device queue mappings in the global_queue_mapping
    attribute of the sysfs rx- and tx- queue directories
  - Create the per device gqid to dqid maps in the sysfs functions
---
 include/linux/netdevice.h |  75 ++++++++++++++
 net/core/dev.c            |  20 +++-
 net/core/net-sysfs.c      | 199 +++++++++++++++++++++++++++++++++++++-
 3 files changed, 290 insertions(+), 4 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 48ba1c1fc644..ca163925211a 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -606,6 +606,10 @@ struct netdev_queue {
 #endif
 #if defined(CONFIG_XPS) && defined(CONFIG_NUMA)
 	int			numa_node;
+#endif
+#ifdef CONFIG_RPS
+	u16			index;
+	u16			gqid;
 #endif
 	unsigned long		tx_maxrate;
 	/*
@@ -823,6 +827,8 @@ bool rps_may_expire_flow(struct net_device *dev, u16 rxq_index, u32 flow_id,
 /* This structure contains an instance of an RX queue.
*/ struct netdev_rx_queue { #ifdef CONFIG_RPS + u16 index; + u16 gqid; struct rps_map __rcu *rps_map; struct rps_dev_flow_table __rcu *rps_flow_table; #endif @@ -875,6 +881,25 @@ struct xps_dev_maps { #endif /* CONFIG_XPS */ +#ifdef CONFIG_RPS +/* Structure to map a global queue to a device queue */ +struct netdev_queue_map { + struct rcu_head rcu; + unsigned int max_ents; + unsigned int set_count; + u16 map[0]; +}; + +/* Allocate queue map in blocks to avoid thrashing */ +#define QUEUE_MAP_ALLOC_BLOCK 128 + +#define QUEUE_MAP_ALLOC_NUMBER(_num) \ + ((((_num - 1) / QUEUE_MAP_ALLOC_BLOCK) + 1) * QUEUE_MAP_ALLOC_BLOCK) + +#define QUEUE_MAP_ALLOC_SIZE(_num) (sizeof(struct netdev_queue_map) + \ + (_num) * sizeof(u16)) +#endif /* CONFIG_RPS */ + #define TC_MAX_QUEUE 16 #define TC_BITMASK 15 /* HW offloaded queuing disciplines txq count and offset maps */ @@ -2092,6 +2117,10 @@ struct net_device { rx_handler_func_t __rcu *rx_handler; void __rcu *rx_handler_data; +#ifdef CONFIG_RPS + struct netdev_queue_map __rcu *rx_gqueue_map; +#endif + #ifdef CONFIG_NET_CLS_ACT struct mini_Qdisc __rcu *miniq_ingress; #endif @@ -2122,6 +2151,9 @@ struct net_device { struct xps_dev_maps __rcu *xps_cpus_map; struct xps_dev_maps __rcu *xps_rxqs_map; #endif +#ifdef CONFIG_RPS + struct netdev_queue_map __rcu *tx_gqueue_map; +#endif #ifdef CONFIG_NET_CLS_ACT struct mini_Qdisc __rcu *miniq_egress; #endif @@ -2218,6 +2250,36 @@ struct net_device { }; #define to_net_dev(d) container_of(d, struct net_device, dev) +#ifdef CONFIG_RPS +static inline u16 netdev_gqid_to_dqid(const struct netdev_queue_map *map, + u16 gqid) +{ + return (map && gqid < map->max_ents) ? map->map[gqid] : NO_QUEUE; +} + +static inline u16 netdev_tx_gqid_to_dqid(const struct net_device *dev, u16 gqid) +{ + u16 dqid; + + rcu_read_lock(); + dqid = netdev_gqid_to_dqid(rcu_dereference(dev->tx_gqueue_map), gqid); + rcu_read_unlock(); + + return dqid; +} + +static inline u16 netdev_rx_gqid_to_dqid(const struct net_device *dev, u16 gqid) +{ + u16 dqid; + + rcu_read_lock(); + dqid = netdev_gqid_to_dqid(rcu_dereference(dev->rx_gqueue_map), gqid); + rcu_read_unlock(); + + return dqid; +} +#endif + static inline bool netif_elide_gro(const struct net_device *dev) { if (!(dev->features & NETIF_F_GRO) || dev->xdp_prog) @@ -2290,6 +2352,19 @@ static inline void netdev_for_each_tx_queue(struct net_device *dev, f(dev, &dev->_tx[i], arg); } +static inline void netdev_for_each_tx_queue_index(struct net_device *dev, + void (*f)(struct net_device *, + struct netdev_queue *, + unsigned int index, + void *), + void *arg) +{ + unsigned int i; + + for (i = 0; i < dev->num_tx_queues; i++) + f(dev, &dev->_tx[i], i, arg); +} + #define netdev_lockdep_set_classes(dev) \ { \ static struct lock_class_key qdisc_tx_busylock_key; \ diff --git a/net/core/dev.c b/net/core/dev.c index 946940bdd583..f64bf6608775 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -9331,6 +9331,10 @@ static int netif_alloc_rx_queues(struct net_device *dev) for (i = 0; i < count; i++) { rx[i].dev = dev; +#ifdef CONFIG_RPS + rx[i].index = i; + rx[i].gqid = NO_QUEUE; +#endif /* XDP RX-queue setup */ err = xdp_rxq_info_reg(&rx[i].xdp_rxq, dev, i); @@ -9363,7 +9367,8 @@ static void netif_free_rx_queues(struct net_device *dev) } static void netdev_init_one_queue(struct net_device *dev, - struct netdev_queue *queue, void *_unused) + struct netdev_queue *queue, + unsigned int index, void *_unused) { /* Initialize queue lock */ spin_lock_init(&queue->_xmit_lock); @@ -9371,6 +9376,10 @@ static void 
netdev_init_one_queue(struct net_device *dev, queue->xmit_lock_owner = -1; netdev_queue_numa_node_write(queue, NUMA_NO_NODE); queue->dev = dev; +#ifdef CONFIG_RPS + queue->index = index; + queue->gqid = NO_QUEUE; +#endif #ifdef CONFIG_BQL dql_init(&queue->dql, HZ); #endif @@ -9396,7 +9405,7 @@ static int netif_alloc_netdev_queues(struct net_device *dev) dev->_tx = tx; - netdev_for_each_tx_queue(dev, netdev_init_one_queue, NULL); + netdev_for_each_tx_queue_index(dev, netdev_init_one_queue, NULL); spin_lock_init(&dev->tx_global_lock); return 0; @@ -9884,7 +9893,7 @@ struct netdev_queue *dev_ingress_queue_create(struct net_device *dev) queue = kzalloc(sizeof(*queue), GFP_KERNEL); if (!queue) return NULL; - netdev_init_one_queue(dev, queue, NULL); + netdev_init_one_queue(dev, queue, 0, NULL); RCU_INIT_POINTER(queue->qdisc, &noop_qdisc); queue->qdisc_sleeping = &noop_qdisc; rcu_assign_pointer(dev->ingress_queue, queue); @@ -10041,6 +10050,11 @@ void free_netdev(struct net_device *dev) { struct napi_struct *p, *n; +#ifdef CONFIG_RPS + WARN_ON(rcu_dereference_protected(dev->tx_gqueue_map, 1)); + WARN_ON(rcu_dereference_protected(dev->rx_gqueue_map, 1)); +#endif + might_sleep(); netif_free_tx_queues(dev); netif_free_rx_queues(dev); diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c index 56d27463d466..3a9d3d9ee8e0 100644 --- a/net/core/net-sysfs.c +++ b/net/core/net-sysfs.c @@ -875,18 +875,166 @@ static ssize_t store_rps_dev_flow_table_cnt(struct netdev_rx_queue *queue, return len; } +static void queue_map_release(struct rcu_head *rcu) +{ + struct netdev_queue_map *q_map = container_of(rcu, + struct netdev_queue_map, rcu); + vfree(q_map); +} + +static int set_device_queue_mapping(struct netdev_queue_map **pmap, + u16 gqid, u16 dqid, u16 *p_gqid) +{ + static DEFINE_MUTEX(global_mapping_table); + struct netdev_queue_map *gq_map, *old_gq_map; + u16 old_gqid; + int ret = 0; + + mutex_lock(&global_mapping_table); + + old_gqid = *p_gqid; + if (old_gqid == gqid) { + /* Nothing changing */ + goto out; + } + + gq_map = rcu_dereference_protected(*pmap, + lockdep_is_held(&global_mapping_table)); + old_gq_map = gq_map; + + if (gqid == NO_QUEUE) { + /* Remove any old mapping (we know that old_gqid cannot be + * NO_QUEUE from above) + */ + if (!WARN_ON(!gq_map || old_gqid > gq_map->max_ents || + gq_map->map[old_gqid] != dqid)) { + /* Unset old mapping */ + gq_map->map[old_gqid] = NO_QUEUE; + if (--gq_map->set_count == 0) { + /* Done with map so free */ + rcu_assign_pointer(*pmap, NULL); + call_rcu(&gq_map->rcu, queue_map_release); + } + } + *p_gqid = NO_QUEUE; + + goto out; + } + + if (!gq_map || gqid >= gq_map->max_ents) { + unsigned int max_queues; + int i = 0; + + /* Need to create or expand queue map */ + + max_queues = QUEUE_MAP_ALLOC_NUMBER(gqid + 1); + + gq_map = vmalloc(QUEUE_MAP_ALLOC_SIZE(max_queues)); + if (!gq_map) { + ret = -ENOMEM; + goto out; + } + + gq_map->max_ents = max_queues; + + if (old_gq_map) { + /* Copy old map entries */ + + memcpy(gq_map->map, old_gq_map->map, + old_gq_map->max_ents * sizeof(gq_map->map[0])); + gq_map->set_count = old_gq_map->set_count; + i = old_gq_map->max_ents; + } else { + gq_map->set_count = 0; + } + + /* Initialize entries not copied from old map */ + for (; i < max_queues; i++) + gq_map->map[i] = NO_QUEUE; + } else if (gq_map->map[gqid] != NO_QUEUE) { + /* The global qid is already mapped to another device qid */ + ret = -EBUSY; + goto out; + } + + /* Set map entry */ + gq_map->map[gqid] = dqid; + gq_map->set_count++; + + if (old_gqid != NO_QUEUE) { + /* 
We know old_gqid is not equal to gqid */ + if (!WARN_ON(!old_gq_map || + old_gqid > old_gq_map->max_ents || + old_gq_map->map[old_gqid] != dqid)) { + /* Unset old mapping in (new) table */ + gq_map->map[old_gqid] = NO_QUEUE; + gq_map->set_count--; + } + } + + if (gq_map != old_gq_map) { + rcu_assign_pointer(*pmap, gq_map); + if (old_gq_map) + call_rcu(&old_gq_map->rcu, queue_map_release); + } + + /* Save for caller */ + *p_gqid = gqid; + +out: + mutex_unlock(&global_mapping_table); + + return ret; +} + +static ssize_t show_rx_queue_global_mapping(struct netdev_rx_queue *queue, + char *buf) +{ + u16 gqid = queue->gqid; + + if (gqid == NO_QUEUE) + return sprintf(buf, "none\n"); + else + return sprintf(buf, "%u\n", gqid); +} + +static ssize_t store_rx_queue_global_mapping(struct netdev_rx_queue *queue, + const char *buf, size_t len) +{ + unsigned long gqid; + int ret; + + if (!capable(CAP_NET_ADMIN)) + return -EPERM; + + ret = kstrtoul(buf, 0, &gqid); + if (ret < 0) + return ret; + + if (gqid > RPS_MAX_QID || WARN_ON(queue->index > RPS_MAX_QID)) + return -EINVAL; + + ret = set_device_queue_mapping(&queue->dev->rx_gqueue_map, + gqid, queue->index, &queue->gqid); + return ret ? : len; +} + static struct rx_queue_attribute rps_cpus_attribute __ro_after_init = __ATTR(rps_cpus, 0644, show_rps_map, store_rps_map); static struct rx_queue_attribute rps_dev_flow_table_cnt_attribute __ro_after_init = __ATTR(rps_flow_cnt, 0644, show_rps_dev_flow_table_cnt, store_rps_dev_flow_table_cnt); +static struct rx_queue_attribute rx_queue_global_mapping_attribute __ro_after_init = + __ATTR(global_queue_mapping, 0644, + show_rx_queue_global_mapping, store_rx_queue_global_mapping); #endif /* CONFIG_RPS */ static struct attribute *rx_queue_default_attrs[] __ro_after_init = { #ifdef CONFIG_RPS &rps_cpus_attribute.attr, &rps_dev_flow_table_cnt_attribute.attr, + &rx_queue_global_mapping_attribute.attr, #endif NULL }; @@ -896,8 +1044,11 @@ static void rx_queue_release(struct kobject *kobj) { struct netdev_rx_queue *queue = to_rx_queue(kobj); #ifdef CONFIG_RPS - struct rps_map *map; struct rps_dev_flow_table *flow_table; + struct rps_map *map; + + set_device_queue_mapping(&queue->dev->rx_gqueue_map, NO_QUEUE, + queue->index, &queue->gqid); map = rcu_dereference_protected(queue->rps_map, 1); if (map) { @@ -1152,6 +1303,46 @@ static ssize_t traffic_class_show(struct netdev_queue *queue, sprintf(buf, "%u\n", tc); } +#ifdef CONFIG_RPS +static ssize_t show_queue_global_queue_mapping(struct netdev_queue *queue, + char *buf) +{ + u16 gqid = queue->gqid; + + if (gqid == NO_QUEUE) + return sprintf(buf, "none\n"); + else + return sprintf(buf, "%u\n", gqid); + return 0; +} + +static ssize_t store_queue_global_queue_mapping(struct netdev_queue *queue, + const char *buf, size_t len) +{ + unsigned long gqid; + int ret; + + if (!capable(CAP_NET_ADMIN)) + return -EPERM; + + ret = kstrtoul(buf, 0, &gqid); + if (ret < 0) + return ret; + + if (gqid > RPS_MAX_QID || WARN_ON(queue->index > RPS_MAX_QID)) + return -EINVAL; + + ret = set_device_queue_mapping(&queue->dev->tx_gqueue_map, + gqid, queue->index, &queue->gqid); + return ret ? 
: len; +} + +static struct netdev_queue_attribute global_queue_mapping_attribute __ro_after_init = + __ATTR(global_queue_mapping, 0644, + show_queue_global_queue_mapping, + store_queue_global_queue_mapping); +#endif + #ifdef CONFIG_XPS static ssize_t tx_maxrate_show(struct netdev_queue *queue, char *buf) @@ -1483,6 +1674,9 @@ static struct netdev_queue_attribute xps_rxqs_attribute __ro_after_init static struct attribute *netdev_queue_default_attrs[] __ro_after_init = { &queue_trans_timeout.attr, &queue_traffic_class.attr, +#ifdef CONFIG_RPS + &global_queue_mapping_attribute.attr, +#endif #ifdef CONFIG_XPS &xps_cpus_attribute.attr, &xps_rxqs_attribute.attr, @@ -1496,6 +1690,9 @@ static void netdev_queue_release(struct kobject *kobj) { struct netdev_queue *queue = to_netdev_queue(kobj); + set_device_queue_mapping(&queue->dev->tx_gqueue_map, NO_QUEUE, + queue->index, &queue->gqid); + memset(kobj, 0, sizeof(*kobj)); dev_put(queue->dev); } From patchwork Wed Jun 24 17:17:47 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Tom Herbert X-Patchwork-Id: 1316469 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=herbertland.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=herbertland-com.20150623.gappssmtp.com header.i=@herbertland-com.20150623.gappssmtp.com header.a=rsa-sha256 header.s=20150623 header.b=ObvuDFFW; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by ozlabs.org (Postfix) with ESMTP id 49sVJJ6Th8z9sRR for ; Thu, 25 Jun 2020 03:19:44 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2405490AbgFXRTo (ORCPT ); Wed, 24 Jun 2020 13:19:44 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48902 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2405475AbgFXRTl (ORCPT ); Wed, 24 Jun 2020 13:19:41 -0400 Received: from mail-pg1-x542.google.com (mail-pg1-x542.google.com [IPv6:2607:f8b0:4864:20::542]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AB475C061573 for ; Wed, 24 Jun 2020 10:19:40 -0700 (PDT) Received: by mail-pg1-x542.google.com with SMTP id e18so1718141pgn.7 for ; Wed, 24 Jun 2020 10:19:40 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=herbertland-com.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=Tz/dNrPVum4it+Ly6TC8fULsSd8JuS1v0zXevdgvvhs=; b=ObvuDFFWTmxx/SvK/KVsAor3wSKsnk/ZUgjGsWyvjW8R55vRGplzyzfwHHZeJEgLX+ aHIRjs44PTPlsgq5u/lVovLtomhC8geTUM1MJ9YOjHKB6SccIZRiuqWRtjoY/H60v3YU VGaevvu3HyhJDGv3M9rGgW9gQhPNeLFmoaMAZxYfwTbn562ZQPl4s3pjhankdISF2iv7 Qn+UqvadTe9b5k1iBcrAU2qwV/uHuMzhQHfghdRjvK9fJdct/Ebzn9+j7ipYG0XC6Wuy c52L2LuDW75tnGDKv3eTvSieVH80LIVVkQX5Tbiaq0N5r498KhLhQNuM0/e/VYL/iirc AVKg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=Tz/dNrPVum4it+Ly6TC8fULsSd8JuS1v0zXevdgvvhs=; 
b=Va5+9GxCrvLkpuWr1UWvjxKGnTdc6Yx3LLzwG5VqwdzFsZkKy9CH8phUumJOwgswZy LWkx08+oiL65/zXD1idvSEVkCiXnNJi42Lq0VrDRRJGUm+HCh+MkFrhUHov9smAt52oW KQr/LmQyBAdRCbzGgUHFy1UdNg876fKT4OHmHUk0IZkDjt1vg6S9wQUOgHUpUTh5gkmO SKQfwdoVe6UQsDA+lOa6vor2qYDIoNEarkE4i5qLY9hOvwvipGLZEik/qB1/TocA18VX 5i4OYZiYz7vRsc21XbyCxwwyOa2oMhQffc6TQbQfsQrBpnQIwnlAXxtiV9kPPr2+idIn hy2g== X-Gm-Message-State: AOAM531UMMOzzim2QbmywJ5CwO4diZmdXRVhknp8ZsDmyLQtEv4/vNBi qHoinbDjwlmyMLtGiu4nuerlF/jj0cQ= X-Google-Smtp-Source: ABdhPJzKAtBETvPWNaCV0bwfd8SU/QRIzQ7FwF62ylXKfvhzUPnzTsJPfYTNTzXVNYO1YBbuaAbe4A== X-Received: by 2002:aa7:9404:: with SMTP id x4mr30433227pfo.158.1593019179276; Wed, 24 Jun 2020 10:19:39 -0700 (PDT) Received: from localhost.localdomain (c-73-202-182-113.hsd1.ca.comcast.net. [73.202.182.113]) by smtp.gmail.com with ESMTPSA id w18sm17490241pgj.31.2020.06.24.10.19.37 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 24 Jun 2020 10:19:38 -0700 (PDT) From: Tom Herbert To: netdev@vger.kernel.org Cc: Tom Herbert Subject: [RFC PATCH 08/11] ptq: Per Thread Queues Date: Wed, 24 Jun 2020 10:17:47 -0700 Message-Id: <20200624171749.11927-9-tom@herbertland.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200624171749.11927-1-tom@herbertland.com> References: <20200624171749.11927-1-tom@herbertland.com> MIME-Version: 1.0 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Per Thread Queues allows assigning a transmit and receive queue to each thread via a cgroup controller. These queues are "global queues" that are mapped to a real device queue when the sending or receiving device is known. Patch includes: - Create net_queues cgroup controller - Cgroup controller includes attributes to set a transmit and receive queue range, unique assignment attributes, and symmetric assignment attribute. Also, a read-only attribute that lists task of the cgroup and their assigned queues - Add a net_queue_pair to task_struct - Make ptq_cgroup_queue_desc which defines a index range by a base index and length of the range - Assign queues to tasks when they attach to the cgroup. For each of receive and transmit, a queue is selected from the perspective range configured in the cgroup. If the "assign" attribute is set for receive or transmit, a unique queue (one not previously assigned to another task) is chosen. If the "symmetric" attribute is set then the receive and transmit queues are selected to be the same number. If there are no queues available (e.g. assign attribute is set for receive and all the queues in the receive range are already assigned) then assignment silently fails. 
- The assigned transmit and receive queues are set in net_queue_pair structure for the task_struct --- include/linux/cgroup_subsys.h | 4 + include/linux/sched.h | 4 + include/net/ptq.h | 45 +++ kernel/fork.c | 4 + net/Kconfig | 18 + net/core/Makefile | 1 + net/core/ptq.c | 688 ++++++++++++++++++++++++++++++++++ 7 files changed, 764 insertions(+) create mode 100644 include/net/ptq.h create mode 100644 net/core/ptq.c diff --git a/include/linux/cgroup_subsys.h b/include/linux/cgroup_subsys.h index acb77dcff3b4..9f80cde69890 100644 --- a/include/linux/cgroup_subsys.h +++ b/include/linux/cgroup_subsys.h @@ -49,6 +49,10 @@ SUBSYS(perf_event) SUBSYS(net_prio) #endif +#if IS_ENABLED(CONFIG_CGROUP_NET_QUEUES) +SUBSYS(net_queues) +#endif + #if IS_ENABLED(CONFIG_CGROUP_HUGETLB) SUBSYS(hugetlb) #endif diff --git a/include/linux/sched.h b/include/linux/sched.h index b62e6aaf28f0..97cb8288faca 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -32,6 +32,7 @@ #include #include #include +#include /* task_struct member predeclarations (sorted alphabetically): */ struct audit_context; @@ -1313,6 +1314,9 @@ struct task_struct { __mce_reserved : 62; struct callback_head mce_kill_me; #endif +#ifdef CONFIG_PER_THREAD_QUEUES + struct net_queue_pair ptq_queues; +#endif /* * New fields for task_struct should be added above here, so that diff --git a/include/net/ptq.h b/include/net/ptq.h new file mode 100644 index 000000000000..a8ce39a85136 --- /dev/null +++ b/include/net/ptq.h @@ -0,0 +1,45 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* + * Per thread queues + * + * Copyright (c) 2020 Tom Herbert + */ + +#ifndef _NET_PTQ_H +#define _NET_PTQ_H + +#include +#include + +struct ptq_cgroup_queue_desc { + struct rcu_head rcu; + + unsigned short base; + unsigned short num; + unsigned long alloced[0]; +}; + +struct ptq_css { + struct cgroup_subsys_state css; + + struct ptq_cgroup_queue_desc __rcu *txqs; + struct ptq_cgroup_queue_desc __rcu *rxqs; + + unsigned short flags; +#define PTQ_F_RX_ASSIGN BIT(0) +#define PTQ_F_TX_ASSIGN BIT(1) +#define PTQ_F_SYMMETRIC BIT(2) +}; + +static inline struct ptq_css *css_to_ptq_css(struct cgroup_subsys_state *css) +{ + return (struct ptq_css *)css; +} + +static inline struct ptq_cgroup_queue_desc **pcqd_select_desc( + struct ptq_css *pss, bool doing_tx) +{ + return doing_tx ? &pss->txqs : &pss->rxqs; +} + +#endif /* _NET_PTQ_H */ diff --git a/kernel/fork.c b/kernel/fork.c index 142b23645d82..5d604e778f4d 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -958,6 +958,10 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node) #ifdef CONFIG_MEMCG tsk->active_memcg = NULL; #endif + +#ifdef CONFIG_PER_THREAD_QUEUES + init_net_queue_pair(&tsk->ptq_queues); +#endif return tsk; free_stack: diff --git a/net/Kconfig b/net/Kconfig index d1672280d6a4..fd2d1da89cb9 100644 --- a/net/Kconfig +++ b/net/Kconfig @@ -256,6 +256,24 @@ config RFS_ACCEL select CPU_RMAP default y +config CGROUP_NET_QUEUES + depends on PER_THREAD_QUEUES + depends on CGROUPS + bool + +config PER_THREAD_QUEUES + bool "Per thread queues" + depends on RPS + depends on RFS_ACCEL + select CGROUP_NET_QUEUES + default y + help + Assign network hardware queues to tasks. This creates a + cgroup subsys net_queues that allows associating a hardware + transmit queue and a receive queue with a thread. The interface + specifies a range of queues for each side from which queues + are assigned to each task. 
+ config XPS bool depends on SMP diff --git a/net/core/Makefile b/net/core/Makefile index 3e2c378e5f31..156a152e2b0a 100644 --- a/net/core/Makefile +++ b/net/core/Makefile @@ -35,3 +35,4 @@ obj-$(CONFIG_NET_DEVLINK) += devlink.o obj-$(CONFIG_GRO_CELLS) += gro_cells.o obj-$(CONFIG_FAILOVER) += failover.o obj-$(CONFIG_BPF_SYSCALL) += bpf_sk_storage.o +obj-$(CONFIG_PER_THREAD_QUEUES) += ptq.o diff --git a/net/core/ptq.c b/net/core/ptq.c new file mode 100644 index 000000000000..edf6718e0a71 --- /dev/null +++ b/net/core/ptq.c @@ -0,0 +1,688 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* net/core/ptq.c + * + * Copyright (c) 2020 Tom Herbert + */ +#include +#include +#include +#include +#include +#include +#include + +struct ptq_cgroup_queue_desc null_pcdesc; + +static DEFINE_MUTEX(ptq_mutex); + +#define NETPTQ_ID_MAX USHRT_MAX + +/* Check is a queue identifier is in the range of a descriptor */ + +static inline bool idx_in_range(struct ptq_cgroup_queue_desc *pcdesc, + unsigned short idx) +{ + return (idx >= pcdesc->base && idx < (pcdesc->base + pcdesc->num)); +} + +/* Mutex held */ +static int assign_one(struct ptq_cgroup_queue_desc *pcdesc, + bool assign, unsigned short requested_idx) +{ + unsigned short idx; + + if (!pcdesc->num) + return NO_QUEUE; + + if (idx_in_range(pcdesc, requested_idx)) { + /* Try to use requested queue id */ + + if (assign) { + idx = requested_idx - pcdesc->base; + if (!test_bit(idx, pcdesc->alloced)) { + set_bit(idx, pcdesc->alloced); + return requested_idx; + } + } else { + return requested_idx; + } + } + + /* Need new queue id */ + + if (assign) { + idx = find_first_zero_bit(pcdesc->alloced, pcdesc->num); + if (idx >= pcdesc->num) + return -EBUSY; + set_bit(idx, pcdesc->alloced); + return pcdesc->base + idx; + } + + /* Checked for zero ranged above */ + return pcdesc->base + (get_random_u32() % pcdesc->num); +} + +/* Compute the overlap between two queue ranges. Indicate + * the overlap by returning the relative offsets for the two + * queue descriptors where the overlap starts and the + * length of the overlap region. 
+ */ +static inline unsigned short +make_relative_idxs(struct ptq_cgroup_queue_desc *pcdesc0, + struct ptq_cgroup_queue_desc *pcdesc1, + unsigned short *rel_idx0, + unsigned short *rel_idx1) +{ + if (pcdesc0->base + pcdesc0->num <= pcdesc1->base || + pcdesc1->base + pcdesc1->num <= pcdesc0->base) { + /* No overlap */ + return 0; + } + + if (pcdesc0->base >= pcdesc1->base) { + *rel_idx0 = 0; + *rel_idx1 = pcdesc0->base - pcdesc1->base; + } else { + *rel_idx0 = pcdesc1->base - pcdesc0->base; + *rel_idx1 = 0; + } + + return min_t(unsigned short, pcdesc0->num - *rel_idx0, + pcdesc1->num - *rel_idx1); +} + +/* Mutex held */ +static int assign_symmetric(struct ptq_css *pss, + struct ptq_cgroup_queue_desc *tx_pcdesc, + struct ptq_cgroup_queue_desc *rx_pcdesc, + unsigned short requested_idx1, + unsigned short requested_idx2) +{ + unsigned short base_tidx, base_ridx, overlap; + unsigned short tidx, ridx, num_tx, num_rx; + unsigned int requested_idx = NO_QUEUE; + int ret; + + if (idx_in_range(tx_pcdesc, requested_idx1) && + idx_in_range(rx_pcdesc, requested_idx1)) + requested_idx = requested_idx1; + else if (idx_in_range(tx_pcdesc, requested_idx2) && + idx_in_range(rx_pcdesc, requested_idx2)) + requested_idx = requested_idx2; + + if (requested_idx != NO_QUEUE) { + unsigned short tidx = requested_idx - tx_pcdesc->base; + unsigned short ridx = requested_idx - rx_pcdesc->base; + + /* Try to use requested queue id */ + + ret = requested_idx; /* Be optimisitic */ + + if ((pss->flags & (PTQ_F_TX_ASSIGN | PTQ_F_RX_ASSIGN)) == + (PTQ_F_TX_ASSIGN | PTQ_F_RX_ASSIGN)) { + if (!test_bit(tidx, tx_pcdesc->alloced) && + !test_bit(ridx, rx_pcdesc->alloced)) { + set_bit(tidx, tx_pcdesc->alloced); + set_bit(ridx, rx_pcdesc->alloced); + + goto out; + } + } else if (pss->flags & PTQ_F_TX_ASSIGN) { + if (!test_bit(tidx, tx_pcdesc->alloced)) { + set_bit(tidx, tx_pcdesc->alloced); + + goto out; + } + } else if (pss->flags & PTQ_F_RX_ASSIGN) { + if (!test_bit(ridx, rx_pcdesc->alloced)) { + set_bit(ridx, rx_pcdesc->alloced); + + goto out; + } + } else { + goto out; + } + } + + /* Need new queue id */ + + overlap = make_relative_idxs(tx_pcdesc, rx_pcdesc, &base_tidx, + &base_ridx); + if (!overlap) { + /* No overlap in ranges */ + ret = -ERANGE; + goto out; + } + + num_tx = base_tidx + overlap; + num_rx = base_ridx + overlap; + + ret = -EBUSY; + + if ((pss->flags & (PTQ_F_TX_ASSIGN | PTQ_F_RX_ASSIGN)) == + (PTQ_F_TX_ASSIGN | PTQ_F_RX_ASSIGN)) { + /* Both sides need to be assigned, find common cleared + * bit in respective bitmaps + */ + for (tidx = base_tidx; + (tidx = find_next_zero_bit(tx_pcdesc->alloced, + num_tx, tidx)) < num_tx; + tidx++) { + ridx = base_ridx + (tidx - base_tidx); + if (!test_bit(ridx, rx_pcdesc->alloced)) + break; + } + if (tidx < num_tx) { + /* Found symmetric queue index that is unassigned + * for both transmit and receive + */ + + set_bit(tidx, tx_pcdesc->alloced); + set_bit(ridx, rx_pcdesc->alloced); + ret = tx_pcdesc->base + tidx; + } + } else if (pss->flags & PTQ_F_TX_ASSIGN) { + tidx = find_next_zero_bit(tx_pcdesc->alloced, + num_tx, base_tidx); + if (tidx < num_tx) { + set_bit(tidx, tx_pcdesc->alloced); + ret = tx_pcdesc->base + tidx; + } + } else if (pss->flags & PTQ_F_RX_ASSIGN) { + ridx = find_next_zero_bit(rx_pcdesc->alloced, + num_rx, base_ridx); + if (ridx < num_rx) { + set_bit(ridx, rx_pcdesc->alloced); + ret = rx_pcdesc->base + ridx; + } + } else { + /* Overlap can't be zero from check above */ + ret = tx_pcdesc->base + base_tidx + + (get_random_u32() % overlap); + } +out: + return ret; 
+} + +/* Mutex held */ +static int assign_queues(struct ptq_css *pss, struct task_struct *task) +{ + struct ptq_cgroup_queue_desc *tx_pcdesc, *rx_pcdesc; + unsigned short txq_id = NO_QUEUE, rxq_id = NO_QUEUE; + struct net_queue_pair *qpair = &task->ptq_queues; + int ret, ret2; + + tx_pcdesc = rcu_dereference_protected(pss->txqs, + mutex_is_locked(&ptq_mutex)); + rx_pcdesc = rcu_dereference_protected(pss->rxqs, + mutex_is_locked(&ptq_mutex)); + + if (pss->flags & PTQ_F_SYMMETRIC) { + /* Assigning symmetric queues. Requested identifier is from + * existing queue pair corresponding to side (TX or RX) + * that is being tracked based on assign flag. + */ + ret = assign_symmetric(pss, tx_pcdesc, rx_pcdesc, + qpair->txq_id, qpair->rxq_id); + if (ret >= 0) { + txq_id = ret; + rxq_id = ret; + ret = 0; + } + } else { + /* Not doing symmetric assignment. Assign transmit and + * receive queues independently. + */ + ret = assign_one(tx_pcdesc, pss->flags & PTQ_F_TX_ASSIGN, + qpair->txq_id); + if (ret >= 0) + txq_id = ret; + + ret2 = assign_one(rx_pcdesc, pss->flags & PTQ_F_RX_ASSIGN, + qpair->rxq_id); + if (ret2 >= 0) + rxq_id = ret2; + + /* Return error if either assignment failed. Note that one + * assignment for side may succeed and the other may fail. + */ + if (ret2 < 0) + ret = ret2; + else if (ret >= 0) + ret = 0; + } + + qpair->txq_id = txq_id; + qpair->rxq_id = rxq_id; + + return ret; +} + +/* Mutex held */ +static void unassign_one(struct ptq_cgroup_queue_desc *pcdesc, + unsigned short idx, bool assign) +{ + if (!pcdesc->num) { + WARN_ON(idx != NO_QUEUE); + return; + } + if (!assign || WARN_ON(!idx_in_range(pcdesc, idx))) + return; + + idx -= pcdesc->base; + clear_bit(idx, pcdesc->alloced); +} + +/* Mutex held */ +static void unassign_queues(struct ptq_css *pss, struct task_struct *task) +{ + struct ptq_cgroup_queue_desc *tx_pcdesc, *rx_pcdesc; + struct net_queue_pair *qpair = &task->ptq_queues; + + tx_pcdesc = rcu_dereference_protected(pss->txqs, + mutex_is_locked(&ptq_mutex)); + rx_pcdesc = rcu_dereference_protected(pss->rxqs, + mutex_is_locked(&ptq_mutex)); + + unassign_one(tx_pcdesc, qpair->txq_id, pss->flags & PTQ_F_TX_ASSIGN); + unassign_one(rx_pcdesc, qpair->rxq_id, pss->flags & PTQ_F_RX_ASSIGN); + + init_net_queue_pair(qpair); +} + +/* Mutex held */ +static void reassign_queues_all(struct ptq_css *pss) +{ + struct ptq_cgroup_queue_desc *tx_pcdesc, *rx_pcdesc; + struct task_struct *task; + struct css_task_iter it; + + tx_pcdesc = rcu_dereference_protected(pss->txqs, + mutex_is_locked(&ptq_mutex)); + rx_pcdesc = rcu_dereference_protected(pss->rxqs, + mutex_is_locked(&ptq_mutex)); + + /* PTQ configuration has changed, attempt to reassign queues for new + * configuration. The assignment functions try to keep threads using + * the same queues as much as possible to avoid thrashing. 
+ */ + + /* Clear the bitmaps, we will resonstruct them in the assignments */ + bitmap_zero(tx_pcdesc->alloced, tx_pcdesc->num); + bitmap_zero(rx_pcdesc->alloced, rx_pcdesc->num); + + css_task_iter_start(&pss->css, 0, &it); + while ((task = css_task_iter_next(&it))) + assign_queues(pss, task); + css_task_iter_end(&it); +} + +static struct cgroup_subsys_state * +cgrp_css_alloc(struct cgroup_subsys_state *parent_css) +{ + struct ptq_css *pss; + + pss = kzalloc(sizeof(*pss), GFP_KERNEL); + if (!pss) + return ERR_PTR(-ENOMEM); + + RCU_INIT_POINTER(pss->txqs, &null_pcdesc); + RCU_INIT_POINTER(pss->rxqs, &null_pcdesc); + + return &pss->css; +} + +static int cgrp_css_online(struct cgroup_subsys_state *css) +{ + struct cgroup_subsys_state *parent_css = css->parent; + int ret = 0; + + if (css->id > NETPTQ_ID_MAX) + return -ENOSPC; + + if (!parent_css) + return 0; + + /* Don't inherit from parent for the time being */ + + return ret; +} + +static void cgrp_css_free(struct cgroup_subsys_state *css) +{ + kfree(css); +} + +static u64 read_ptqidx(struct cgroup_subsys_state *css, struct cftype *cft) +{ + return css->id; +} + +/* Takes mutex */ +static int ptq_can_attach(struct cgroup_taskset *tset) +{ + struct cgroup_subsys_state *dst_css, *src_css; + struct task_struct *task; + + /* Unassign queues for tasks in preparation for attaching the tasks + * to a different css + */ + + mutex_lock(&ptq_mutex); + + cgroup_taskset_for_each(task, dst_css, tset) { + src_css = task_css(task, net_queues_cgrp_id); + unassign_queues(css_to_ptq_css(src_css), task); + } + + mutex_unlock(&ptq_mutex); + + return 0; +} + +/* Takes mutex */ +static void ptq_attach(struct cgroup_taskset *tset) +{ + struct cgroup_subsys_state *css; + struct task_struct *task; + + mutex_lock(&ptq_mutex); + + /* Assign queues for tasks their new css */ + + cgroup_taskset_for_each(task, css, tset) + assign_queues(css_to_ptq_css(css), task); + + mutex_unlock(&ptq_mutex); +} + +/* Takes mutex */ +static void ptq_cancel_attach(struct cgroup_taskset *tset) +{ + struct cgroup_subsys_state *dst_css, *src_css; + struct task_struct *task; + + mutex_lock(&ptq_mutex); + + /* Attach failed, reassign queues for tasks in their original + * cgroup (previously they were unassigned in can_attach) + */ + + cgroup_taskset_for_each(task, dst_css, tset) { + /* Reassign in old cgroup */ + src_css = task_css(task, net_queues_cgrp_id); + assign_queues(css_to_ptq_css(src_css), task); + } + + mutex_unlock(&ptq_mutex); +} + +/* Takes mutex */ +static void ptq_fork(struct task_struct *task) +{ + struct cgroup_subsys_state *css = + task_css(task, net_queues_cgrp_id); + + mutex_lock(&ptq_mutex); + assign_queues(css_to_ptq_css(css), task); + mutex_unlock(&ptq_mutex); +} + +/* Takes mutex */ +static void ptq_exit(struct task_struct *task) +{ + struct cgroup_subsys_state *css = + task_css(task, net_queues_cgrp_id); + + mutex_lock(&ptq_mutex); + unassign_queues(css_to_ptq_css(css), task); + mutex_unlock(&ptq_mutex); +} + +static u64 read_flag(struct cgroup_subsys_state *css, unsigned int flag) +{ + return !!(css_to_ptq_css(css)->flags & flag); +} + +/* Takes mutex */ +static int write_flag(struct cgroup_subsys_state *css, unsigned int flag, + u64 val) +{ + struct ptq_css *pss = css_to_ptq_css(css); + int ret = 0; + + mutex_lock(&ptq_mutex); + + if (val) + pss->flags |= flag; + else + pss->flags &= ~flag; + + /* If we've changed a flag that affects how queues are assigned then + * reassign the queues. 
+ */ + if (flag & (PTQ_F_TX_ASSIGN | PTQ_F_RX_ASSIGN | PTQ_F_SYMMETRIC)) + reassign_queues_all(pss); + + mutex_unlock(&ptq_mutex); + + return ret; +} + +static int show_queue_desc(struct seq_file *sf, + struct ptq_cgroup_queue_desc *pcdesc) +{ + seq_printf(sf, "%u:%u\n", pcdesc->base, pcdesc->num); + + return 0; +} + +static int parse_queues(char *buf, unsigned short *base, + unsigned short *num) +{ + return (sscanf(buf, "%hu:%hu", base, num) != 2) ? -EINVAL : 0; +} + +static void format_queue(char *buf, unsigned short idx) +{ + if (idx == NO_QUEUE) + sprintf(buf, "none"); + else + sprintf(buf, "%hu", idx); +} + +static int cgroup_procs_show(struct seq_file *sf, void *v) +{ + struct net_queue_pair *qpair; + struct task_struct *task = v; + char buf1[32], buf2[32]; + + qpair = &task->ptq_queues; + format_queue(buf1, qpair->txq_id); + format_queue(buf2, qpair->rxq_id); + + seq_printf(sf, "%d: %s %s\n", task_pid_vnr(v), buf1, buf2); + return 0; +} + +#define QDESC_LEN(NUM) (sizeof(struct ptq_cgroup_queue_desc) + \ + BITS_TO_LONGS(NUM) * sizeof(unsigned long)) + +/* Takes mutex */ +static int set_queue_desc(struct ptq_css *pss, + struct ptq_cgroup_queue_desc **pcdescp, + unsigned short base, unsigned short num) +{ + struct ptq_cgroup_queue_desc *new_pcdesc = &null_pcdesc, *old_pcdesc; + int ret = 0; + + /* Check if RPS maximum queues can accommodate the range */ + ret = rps_check_max_queues(base + num); + if (ret) + return ret; + + mutex_lock(&ptq_mutex); + + old_pcdesc = rcu_dereference_protected(*pcdescp, + mutex_is_locked(&ptq_mutex)); + + if (old_pcdesc && old_pcdesc->base == base && old_pcdesc->num == num) { + /* Nothing to do */ + goto out; + } + + if (num != 0) { + new_pcdesc = kzalloc(QDESC_LEN(num), GFP_KERNEL); + if (!new_pcdesc) { + ret = -ENOMEM; + goto out; + } + new_pcdesc->base = base; + new_pcdesc->num = num; + } + rcu_assign_pointer(*pcdescp, new_pcdesc); + if (old_pcdesc != &null_pcdesc) + kfree_rcu(old_pcdesc, rcu); + + reassign_queues_all(pss); +out: + mutex_unlock(&ptq_mutex); + + return ret; +} + +static ssize_t write_tx_queues(struct kernfs_open_file *of, + char *buf, size_t nbytes, loff_t off) +{ + struct ptq_css *pss = css_to_ptq_css(of_css(of)); + unsigned short base, num; + int ret; + + ret = parse_queues(buf, &base, &num); + if (ret < 0) + return ret; + + return set_queue_desc(pss, &pss->txqs, base, num) ? : nbytes; +} + +static int read_tx_queues(struct seq_file *sf, void *v) +{ + int ret; + + rcu_read_lock(); + ret = show_queue_desc(sf, rcu_dereference(css_to_ptq_css(seq_css(sf))-> + txqs)); + rcu_read_unlock(); + + return ret; +} + +static ssize_t write_rx_queues(struct kernfs_open_file *of, + char *buf, size_t nbytes, loff_t off) +{ + struct ptq_css *pss = css_to_ptq_css(of_css(of)); + unsigned short base, num; + int ret; + + ret = parse_queues(buf, &base, &num); + if (ret < 0) + return ret; + + return set_queue_desc(pss, &pss->rxqs, base, num) ? 
: nbytes; +} + +static int read_rx_queues(struct seq_file *sf, void *v) +{ + int ret; + + rcu_read_lock(); + ret = show_queue_desc(sf, rcu_dereference(css_to_ptq_css(seq_css(sf))-> + rxqs)); + rcu_read_unlock(); + + return ret; +} + +static u64 read_tx_assign(struct cgroup_subsys_state *css, struct cftype *cft) +{ + return read_flag(css, PTQ_F_TX_ASSIGN); +} + +static int write_tx_assign(struct cgroup_subsys_state *css, + struct cftype *cft, u64 val) +{ + return write_flag(css, PTQ_F_TX_ASSIGN, val); +} + +static u64 read_rx_assign(struct cgroup_subsys_state *css, struct cftype *cft) +{ + return read_flag(css, PTQ_F_RX_ASSIGN); +} + +static int write_rx_assign(struct cgroup_subsys_state *css, + struct cftype *cft, u64 val) +{ + return write_flag(css, PTQ_F_RX_ASSIGN, val); +} + +static u64 read_symmetric(struct cgroup_subsys_state *css, struct cftype *cft) +{ + return read_flag(css, PTQ_F_SYMMETRIC); +} + +static int write_symmetric(struct cgroup_subsys_state *css, + struct cftype *cft, u64 val) +{ + return write_flag(css, PTQ_F_SYMMETRIC, val); +} + +static struct cftype ss_files[] = { + { + .name = "ptqidx", + .read_u64 = read_ptqidx, + }, + { + .name = "rx-queues", + .seq_show = read_rx_queues, + .write = write_rx_queues, + }, + { + .name = "tx-queues", + .seq_show = read_tx_queues, + .write = write_tx_queues, + }, + { + .name = "rx-assign", + .read_u64 = read_rx_assign, + .write_u64 = write_rx_assign, + }, + { + .name = "tx-assign", + .read_u64 = read_tx_assign, + .write_u64 = write_tx_assign, + }, + { + .name = "symmetric", + .read_u64 = read_symmetric, + .write_u64 = write_symmetric, + }, + { + .name = "task-queues", + .seq_start = cgroup_threads_start, + .seq_next = cgroup_procs_next, + .seq_show = cgroup_procs_show, + }, + { } /* terminate */ +}; + +struct cgroup_subsys net_queues_cgrp_subsys = { + .css_alloc = cgrp_css_alloc, + .css_online = cgrp_css_online, + .css_free = cgrp_css_free, + .attach = ptq_attach, + .can_attach = ptq_can_attach, + .cancel_attach = ptq_cancel_attach, + .fork = ptq_fork, + .exit = ptq_exit, + .legacy_cftypes = ss_files, +}; From patchwork Wed Jun 24 17:17:48 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Tom Herbert X-Patchwork-Id: 1316470 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=herbertland.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=herbertland-com.20150623.gappssmtp.com header.i=@herbertland-com.20150623.gappssmtp.com header.a=rsa-sha256 header.s=20150623 header.b=gmhtc1+M; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by ozlabs.org (Postfix) with ESMTP id 49sVJP6RZtz9sPF for ; Thu, 25 Jun 2020 03:19:49 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2405427AbgFXRTs (ORCPT ); Wed, 24 Jun 2020 13:19:48 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48908 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2405486AbgFXRTn (ORCPT ); Wed, 24 Jun 2020 13:19:43 -0400 Received: from mail-pj1-x1042.google.com 
(mail-pj1-x1042.google.com [IPv6:2607:f8b0:4864:20::1042]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DDE02C061573 for ; Wed, 24 Jun 2020 10:19:42 -0700 (PDT) Received: by mail-pj1-x1042.google.com with SMTP id h22so1421788pjf.1 for ; Wed, 24 Jun 2020 10:19:42 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=herbertland-com.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=2OQX8d52ytdpe1I2jFepdSnY7MQ3IgjSgV7EDnS0Lvk=; b=gmhtc1+MV80WWG0qj3axJDEtLOorRgPozciZiqL3nxVjJaIMql+Ed5k4JxwLkez+jX /r8IZk7SIhmCggN+pk0x7s0nt0nzjW83K9nzKuq+M0DNR2exUew/QAc3P/AmuBR/RuVZ 5/1cueb+Iv1vrdi2dQUYRVRVWHeWupsC/2+G3jVCuJ85yT2iQsltRD3/+5HQceix75bx Cl85qB7jmQbjBppmOHynl90RGjmiPYbSq7dcZk2opNRP/lv3FfVWHgMXqqdWVwUQnJ6F hMdwbo2k+EtjfhUQaKaMNHCgPfO/PvBxnqkZYPAOC60Kp6HV9wCcm0hEb0U3tqUKPHLE AtyA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=2OQX8d52ytdpe1I2jFepdSnY7MQ3IgjSgV7EDnS0Lvk=; b=NEbWIThuyvgqSqpEeqxLv0OxedYYIASq71lAhJcxeOCrOlg9AaoMWoMjbu61y4ETYo wB0WST2jR5Jidudu6dKsnTnINpOJfL2tkNDeYvlZUnZubLj6jejeJKVGarKdssrZ8iFz lEeEktCdFPUouUWib1wYuvA8t74tzLHdY5f1X3YwUR/e2AEBFjijDcdWoaW9tUtx0FCV IzAaPUSX68MEaE7DwN/B40c/iPeoIwRthqma7YP8i/SgB0wDqnLJ2pP7r43/JEPulQN8 ciQnIJXjxIYQkwhu9ZwIUv96PqmI+JQWiirlLL7KhbkX4sRmU0tSAqye3XQYcQ4E2A87 ikZg== X-Gm-Message-State: AOAM530oUawO+eeHwdYg6O3pWoLNmrjcio4h6LIulxYXUbzFOOx74wN6 5kmpKkF5KZlJnAKcCiENpc7fWzerwtg= X-Google-Smtp-Source: ABdhPJxBY+EJ+1WlNxRYRcW9/DQ3X+LXUb6EKTotiex2pfRuujuWNtWYlvu87p0OHHBbXwPm70hnbQ== X-Received: by 2002:a17:902:ab98:: with SMTP id f24mr30802753plr.154.1593019181924; Wed, 24 Jun 2020 10:19:41 -0700 (PDT) Received: from localhost.localdomain (c-73-202-182-113.hsd1.ca.comcast.net. [73.202.182.113]) by smtp.gmail.com with ESMTPSA id w18sm17490241pgj.31.2020.06.24.10.19.40 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 24 Jun 2020 10:19:41 -0700 (PDT) From: Tom Herbert To: netdev@vger.kernel.org Cc: Tom Herbert Subject: [RFC PATCH 09/11] ptq: Hook up transmit side of Per Queue Threads Date: Wed, 24 Jun 2020 10:17:48 -0700 Message-Id: <20200624171749.11927-10-tom@herbertland.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200624171749.11927-1-tom@herbertland.com> References: <20200624171749.11927-1-tom@herbertland.com> MIME-Version: 1.0 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Support to select device queue for transmit based on the per thread transmit queue. 
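For a rough sense of the selection order this introduces, here is a minimal userspace sketch (the NO_QUEUE sentinel, the table layout and the helper names are illustrative assumptions for the example, not the kernel definitions):

#include <stdint.h>
#include <stdio.h>

#define NO_QUEUE 0xFFFFu	/* assumed "unset" sentinel */

/* Simplified per-device table mapping global queue ids to device queue ids */
struct queue_map {
	unsigned int max_ents;
	uint16_t map[16];
};

static uint16_t gqid_to_dqid(const struct queue_map *m, uint16_t gqid)
{
	return (m && gqid < m->max_ents) ? m->map[gqid] : (uint16_t)NO_QUEUE;
}

/* Selection order on transmit: if the socket carries a global queue id
 * recorded from the sending thread and the egress device maps it, use the
 * mapped device queue; otherwise keep whatever XPS/hashing would pick.
 */
static uint16_t pick_tx_queue(uint16_t sock_gqid, const struct queue_map *dev_map,
			      uint16_t xps_or_hash_pick)
{
	if (sock_gqid != NO_QUEUE) {
		uint16_t dqid = gqid_to_dqid(dev_map, sock_gqid);

		if (dqid != NO_QUEUE)
			return dqid;
	}
	return xps_or_hash_pick;
}

int main(void)
{
	struct queue_map dev = { .max_ents = 16 };
	unsigned int i;

	for (i = 0; i < dev.max_ents; i++)
		dev.map[i] = NO_QUEUE;
	dev.map[3] = 7;	/* global queue 3 -> device queue 7 on this NIC */

	printf("thread with gqid 3: txq %u\n", pick_tx_queue(3, &dev, 1));
	printf("thread with no gqid: txq %u\n", pick_tx_queue(NO_QUEUE, &dev, 1));
	return 0;
}

In this sketch a thread whose socket carries global queue 3 lands on device queue 7 of that NIC, while a thread with no assignment keeps the XPS/hash choice.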
Patch includes: - Add a global queue (gqid) mapping to sock - Function to convert gqid in a sock to a device queue (dqid) by calling sk_tx_gqid_to_dqid_get - Function sock_record_tx_queue to record a queue in a socket taken from ptq_threads in struct task - Call sock_record_tx_queue from af_inet send, listen, and accept functions to populate the socket's gqid for steerig - In netdev_pick_tx try to take the queue index from the socket using sk_tx_gqid_to_dqid_get --- include/net/sock.h | 63 ++++++++++++++++++++++++++++++++++++++++++++++ net/core/dev.c | 9 ++++--- net/ipv4/af_inet.c | 6 +++++ 3 files changed, 75 insertions(+), 3 deletions(-) diff --git a/include/net/sock.h b/include/net/sock.h index acb76cfaae1b..5ec9d02e7ad0 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -140,6 +140,7 @@ typedef __u64 __bitwise __addrpair; * @skc_node: main hash linkage for various protocol lookup tables * @skc_nulls_node: main hash linkage for TCP/UDP/UDP-Lite protocol * @skc_tx_queue_mapping: tx queue number for this connection + * @skc_tx_gqid_mapping: global tx queue number for sending * @skc_rx_queue_mapping: rx queue number for this connection * @skc_flags: place holder for sk_flags * %SO_LINGER (l_onoff), %SO_BROADCAST, %SO_KEEPALIVE, @@ -225,6 +226,9 @@ struct sock_common { struct hlist_nulls_node skc_nulls_node; }; unsigned short skc_tx_queue_mapping; +#ifdef CONFIG_RPS + unsigned short skc_tx_gqid_mapping; +#endif #ifdef CONFIG_XPS unsigned short skc_rx_queue_mapping; #endif @@ -353,6 +357,9 @@ struct sock { #define sk_nulls_node __sk_common.skc_nulls_node #define sk_refcnt __sk_common.skc_refcnt #define sk_tx_queue_mapping __sk_common.skc_tx_queue_mapping +#ifdef CONFIG_RPS +#define sk_tx_gqid_mapping __sk_common.skc_tx_gqid_mapping +#endif #ifdef CONFIG_XPS #define sk_rx_queue_mapping __sk_common.skc_rx_queue_mapping #endif @@ -1792,6 +1799,34 @@ static inline int sk_receive_skb(struct sock *sk, struct sk_buff *skb, return __sk_receive_skb(sk, skb, nested, 1, true); } +static inline int sk_tx_gqid_get(const struct sock *sk) +{ +#ifdef CONFIG_RPS + if (sk && sk->sk_tx_gqid_mapping != NO_QUEUE) + return sk->sk_tx_gqid_mapping; +#endif + + return -1; +} + +static inline void sk_tx_gqid_set(struct sock *sk, int gqid) +{ +#ifdef CONFIG_RPS + /* sk_tx_queue_mapping accept only up to RPS_MAX_QID (0x7ffe) */ + if (WARN_ON_ONCE((unsigned int)gqid > RPS_MAX_QID && + gqid != NO_QUEUE)) + return; + sk->sk_tx_gqid_mapping = gqid; +#endif +} + +static inline void sk_tx_gqid_clear(struct sock *sk) +{ +#ifdef CONFIG_RPS + sk->sk_tx_gqid_mapping = NO_QUEUE; +#endif +} + static inline void sk_tx_queue_set(struct sock *sk, int tx_queue) { /* sk_tx_queue_mapping accept only upto a 16-bit value */ @@ -1803,6 +1838,9 @@ static inline void sk_tx_queue_set(struct sock *sk, int tx_queue) static inline void sk_tx_queue_clear(struct sock *sk) { sk->sk_tx_queue_mapping = NO_QUEUE; + + /* Clear tx_gqid at same points */ + sk_tx_gqid_clear(sk); } static inline int sk_tx_queue_get(const struct sock *sk) @@ -1813,6 +1851,31 @@ static inline int sk_tx_queue_get(const struct sock *sk) return -1; } +static inline int sk_tx_gqid_to_dqid_get(const struct net_device *dev, + const struct sock *sk) +{ + int ret = -1; +#ifdef CONFIG_RPS + int gqid; + u16 dqid; + + gqid = sk_tx_gqid_get(sk); + if (gqid >= 0) { + dqid = netdev_tx_gqid_to_dqid(dev, gqid); + if (dqid != NO_QUEUE) + ret = dqid; + } +#endif + return ret; +} + +static inline void sock_record_tx_queue(struct sock *sk) +{ +#ifdef CONFIG_PER_THREAD_QUEUES + 
sk_tx_gqid_set(sk, current->ptq_queues.txq_id); +#endif +} + static inline void sk_rx_queue_set(struct sock *sk, const struct sk_buff *skb) { #ifdef CONFIG_XPS diff --git a/net/core/dev.c b/net/core/dev.c index f64bf6608775..f4478c9b1c9c 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -3982,10 +3982,13 @@ u16 netdev_pick_tx(struct net_device *dev, struct sk_buff *skb, if (queue_index < 0 || skb->ooo_okay || queue_index >= dev->real_num_tx_queues) { - int new_index = get_xps_queue(dev, sb_dev, skb); + int new_index = sk_tx_gqid_to_dqid_get(dev, sk); - if (new_index < 0) - new_index = skb_tx_hash(dev, sb_dev, skb); + if (new_index < 0) { + new_index = get_xps_queue(dev, sb_dev, skb); + if (new_index < 0) + new_index = skb_tx_hash(dev, sb_dev, skb); + } if (queue_index != new_index && sk && sk_fullsock(sk) && diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c index 02aa5cb3a4fd..9b36aa3d1622 100644 --- a/net/ipv4/af_inet.c +++ b/net/ipv4/af_inet.c @@ -201,6 +201,8 @@ int inet_listen(struct socket *sock, int backlog) lock_sock(sk); + sock_record_tx_queue(sk); + err = -EINVAL; if (sock->state != SS_UNCONNECTED || sock->type != SOCK_STREAM) goto out; @@ -630,6 +632,8 @@ int __inet_stream_connect(struct socket *sock, struct sockaddr *uaddr, } } + sock_record_tx_queue(sk); + switch (sock->state) { default: err = -EINVAL; @@ -742,6 +746,7 @@ int inet_accept(struct socket *sock, struct socket *newsock, int flags, lock_sock(sk2); sock_rps_record_flow(sk2); + sock_record_tx_queue(sk2); WARN_ON(!((1 << sk2->sk_state) & (TCPF_ESTABLISHED | TCPF_SYN_RECV | TCPF_CLOSE_WAIT | TCPF_CLOSE))); @@ -794,6 +799,7 @@ EXPORT_SYMBOL(inet_getname); int inet_send_prepare(struct sock *sk) { sock_rps_record_flow(sk); + sock_record_tx_queue(sk); /* We may need to bind the socket. 
*/ if (!inet_sk(sk)->inet_num && !sk->sk_prot->no_autobind && From patchwork Wed Jun 24 17:17:49 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Tom Herbert X-Patchwork-Id: 1316471 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=herbertland.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=herbertland-com.20150623.gappssmtp.com header.i=@herbertland-com.20150623.gappssmtp.com header.a=rsa-sha256 header.s=20150623 header.b=Br8IBGH2; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by ozlabs.org (Postfix) with ESMTP id 49sVJR42WPz9sPF for ; Thu, 25 Jun 2020 03:19:51 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2405498AbgFXRTv (ORCPT ); Wed, 24 Jun 2020 13:19:51 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48918 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2405491AbgFXRTq (ORCPT ); Wed, 24 Jun 2020 13:19:46 -0400 Received: from mail-pf1-x443.google.com (mail-pf1-x443.google.com [IPv6:2607:f8b0:4864:20::443]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 73FF1C061573 for ; Wed, 24 Jun 2020 10:19:45 -0700 (PDT) Received: by mail-pf1-x443.google.com with SMTP id z63so1455019pfb.1 for ; Wed, 24 Jun 2020 10:19:45 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=herbertland-com.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=kHpXfg6D3vEApiJp8FQJ17UW76ojekkZfCE2qlXVgXY=; b=Br8IBGH2Xmb73N/0o9TRZOormUqUmQirAesW+uM30rLSADM+9b0Gwt8ZnXWUBHSUsG D19XU8quYUF+2lp1lyCCBwOvpGBpv54j+KDdoJsmq8xBQ4uqlE5Feoby9nGL8aJTg0gP lJmLMvOOKLzHV7Q4Q0v1gyFX5fGmtJL+tL97zTyjs/rNgZeT510fQGCPOZpTOO6zHVcl SbUcPIsvZk/rAlLaCdwYQVbVlnFOMVHlpnAm//cZXzVTRFLYvyhvHu8C432zERcqCUca yhSy0TuY48gG6iVPggS2R3jrAFDdnZj1CWkErdhMZeQS3rxYNUUMXKe2KOXQukv+2WQC w92w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=kHpXfg6D3vEApiJp8FQJ17UW76ojekkZfCE2qlXVgXY=; b=gbbnQFqqKq4L+WKaszIiks4Fb4h/DHP3cgShATnVNimibYRmobMxkV1xKayI7N8fkr xQNo31yqaLEwxxxatNsQdu75jVBegvA2TILrdgTuVC2Rbly2WqViFeLDesmR23cop/+P FKBgjGQZRCX1PhGXMsjlxRGUlsebww1XnZJtBVHzh9La4eFlE2UDa87e63LF/ibkyKKI MMIBPQFlTkxTfAPtnbJSOugyG6PxVsog0TllDlZXuE9NUdRFEyk4l0MFFbQWUsiTaRUY NW7JBxyOQwT7GulrlsNqE+z9I7AOgCvJdT5PjrU3A/afaxc4nYS4MMFr+B/uvKDzDbXb QASw== X-Gm-Message-State: AOAM533OHip1sLxyDS1MA9QSpn451xDodBwOTywRyYaAB62Y8XZ6JMQn Whym1HRX35Z3bnwpk2ss8tJpH6J0vv4= X-Google-Smtp-Source: ABdhPJwYTaoHrrPEGizGjp15Mv9FGUcMrnWdlabyII631oaj14Xznqo2Dcto8bf6c7ccd6dX4UDwiw== X-Received: by 2002:a63:564e:: with SMTP id g14mr6655093pgm.326.1593019184487; Wed, 24 Jun 2020 10:19:44 -0700 (PDT) Received: from localhost.localdomain (c-73-202-182-113.hsd1.ca.comcast.net. 
[73.202.182.113]) by smtp.gmail.com with ESMTPSA id w18sm17490241pgj.31.2020.06.24.10.19.42 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 24 Jun 2020 10:19:43 -0700 (PDT) From: Tom Herbert To: netdev@vger.kernel.org Cc: Tom Herbert Subject: [RFC PATCH 10/11] ptq: Hook up receive side of Per Queue Threads Date: Wed, 24 Jun 2020 10:17:49 -0700 Message-Id: <20200624171749.11927-11-tom@herbertland.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200624171749.11927-1-tom@herbertland.com> References: <20200624171749.11927-1-tom@herbertland.com> MIME-Version: 1.0 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Add code to set the queue in an rflow as opposed to just setting the CPU in an rps_dev_flow entry. set_rps_qid is the analogue for set_rps_cpu but for setting queues. In get_rps_cpu, a check is performed that identifier in the sock_flow_table refers to a queue; when it does call set_rps_qid after converting the global qid in the sock_flow_table to a device qid. In rps_record_sock_flow check is there is a per task receive queue for current (i.e. current->ptq_queues.rxq_id != NO_QUEUE). If there is a queue then set in sock_flow_table instead of setting the running CPU. Subsequently, the receive queue for the flow can be programmed by aRFS logic (ndo_rx_flow_steer). --- include/linux/netdevice.h | 28 ++++++++++++++++++++++++---- net/core/dev.c | 36 ++++++++++++++++++++++++++++++++++++ 2 files changed, 60 insertions(+), 4 deletions(-) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index ca163925211a..3b39be470720 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -731,12 +731,25 @@ static inline void rps_dev_flow_set_cpu(struct rps_dev_flow *dev_flow, u16 cpu) if (WARN_ON(cpu > RPS_MAX_CPU)) return; - /* Set the rflow target to the CPU atomically */ + /* Set the device flow target to the CPU atomically */ cpu_qid.use_qid = 0; cpu_qid.cpu = cpu; dev_flow->cpu_qid = cpu_qid; } +static inline void rps_dev_flow_set_qid(struct rps_dev_flow *dev_flow, u16 qid) +{ + struct rps_cpu_qid cpu_qid; + + if (WARN_ON(qid > RPS_MAX_QID)) + return; + + /* Set the device flow target to the CPU atomically */ + cpu_qid.use_qid = 1; + cpu_qid.qid = qid; + dev_flow->cpu_qid = cpu_qid; +} + /* * The rps_dev_flow_table structure contains a table of flow mappings. 
*/ @@ -797,11 +810,18 @@ static inline void rps_record_sock_flow(struct rps_sock_flow_table *table, u32 hash) { if (table && hash) { - u32 val = hash & table->cpu_masks.hash_mask; unsigned int index = hash & table->mask; + u32 val; - /* We only give a hint, preemption can change CPU under us */ - val |= raw_smp_processor_id(); +#ifdef CONFIG_PER_THREAD_QUEUES + if (current->ptq_queues.rxq_id != NO_QUEUE) + val = RPS_SOCK_FLOW_USE_QID | + (hash & table->queue_masks.hash_mask) | + current->ptq_queues.rxq_id; + else +#endif + val = (hash & table->cpu_masks.hash_mask) | + raw_smp_processor_id(); if (table->ents[index] != val) table->ents[index] = val; diff --git a/net/core/dev.c b/net/core/dev.c index f4478c9b1c9c..1cad776e8847 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -4308,6 +4308,25 @@ set_rps_cpu(struct net_device *dev, struct sk_buff *skb, return rflow; } +static struct rps_dev_flow * +set_rps_qid(struct net_device *dev, struct sk_buff *skb, + struct rps_dev_flow *rflow, u16 qid) +{ + if (qid > RPS_MAX_QID) { + rps_dev_flow_clear(rflow); + return rflow; + } + +#ifdef CONFIG_RFS_ACCEL + /* Should we steer this flow to a different hardware queue? */ + if (skb_rx_queue_recorded(skb) && (dev->features & NETIF_F_NTUPLE) && + qid != skb_get_rx_queue(skb) && qid < dev->real_num_rx_queues) + set_arfs_queue(dev, skb, rflow, qid); +#endif + rps_dev_flow_set_qid(rflow, qid); + return rflow; +} + /* * get_rps_cpu is called from netif_receive_skb and returns the target * CPU from the RPS map of the receiving queue for a given skb. @@ -4356,6 +4375,10 @@ static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb, /* First check into global flow table if there is a match */ ident = sock_flow_table->ents[hash & sock_flow_table->mask]; + + if (ident == RPS_SOCK_FLOW_NO_IDENT) + goto try_rps; + comparator = ((ident & RPS_SOCK_FLOW_USE_QID) ? sock_flow_table->queue_masks.hash_mask : sock_flow_table->cpu_masks.hash_mask); @@ -4372,8 +4395,21 @@ static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb, * CPU. Proceed accordingly. */ if (ident & RPS_SOCK_FLOW_USE_QID) { + u16 dqid, gqid; + /* A queue identifier is in the sock_flow_table entry */ + gqid = ident & sock_flow_table->queue_masks.mask; + dqid = netdev_rx_gqid_to_dqid(dev, gqid); + + /* rflow has desired receive qid. Just set the qid in + * HW and return to use current CPU. Note that we + * don't consider OOO in this case. 
+ */ + rflow = set_rps_qid(dev, skb, rflow, dqid); + + *rflowp = rflow; + /* Don't use aRFS to set CPU in this case, skip to * trying RPS */ From patchwork Wed Jun 24 17:17:50 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Tom Herbert X-Patchwork-Id: 1316472 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=herbertland.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=herbertland-com.20150623.gappssmtp.com header.i=@herbertland-com.20150623.gappssmtp.com header.a=rsa-sha256 header.s=20150623 header.b=rXqb529D; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by ozlabs.org (Postfix) with ESMTP id 49sVJZ4vRqz9sSS for ; Thu, 25 Jun 2020 03:19:58 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2405510AbgFXRT4 (ORCPT ); Wed, 24 Jun 2020 13:19:56 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48930 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2405495AbgFXRTu (ORCPT ); Wed, 24 Jun 2020 13:19:50 -0400 Received: from mail-pj1-x1041.google.com (mail-pj1-x1041.google.com [IPv6:2607:f8b0:4864:20::1041]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0DA9DC061573 for ; Wed, 24 Jun 2020 10:19:50 -0700 (PDT) Received: by mail-pj1-x1041.google.com with SMTP id d6so1418490pjs.3 for ; Wed, 24 Jun 2020 10:19:50 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=herbertland-com.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=/5gyd7jvxJZUbppQ1X68CyB7v126AL9t5W3BAs6GaxY=; b=rXqb529DtF/YuUK9qskzQ0Qn9nEqOj9aPV4jKAlmSCPFN4GJ/uj59yCEZozKARpjfE nNUoDwXVxVHgDt/FIOElIljTF0+y1N65dqW77/ovJxpmi7TWKrruSiwFLWM1jRz6Xeui Lb1RH8kQA6AnJv1wIl7a9EIlA+xbW+bqahxD0r+rsTGNz66xTq18ezF21OnyA9DFRwx0 LYkylQcoTGzO5CifL3N2JelaNsRZZwBJ9YdwhsIiS49W6vDscEXdUylaz/f7r8C5Nbxc /obDWChyL5dfcP5HoZXpp1mQfQ3c14F86ci2cGJbf2QPmnqaByDb7IZ2muJMK1nW4T3S 7llw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=/5gyd7jvxJZUbppQ1X68CyB7v126AL9t5W3BAs6GaxY=; b=iZUsU9t2qO7wxGqaZC/XwqAhtTxM/oSuhrqSzVK6c7nk03DLCxyqOantI9mjz1fbgc d14B/Dt0ZEt30cq2benfq3U7U6+4Ys7s0EWEidwgE8X7UX5h7UoWYo1DoBmWGz2brQ7K H7rnfJPB7KdlCPbIQRS1HGYs+98L7pRWKJSV5buGOrCwUAUxbQtqZtY7KI4hBsSUOpSk 6F5y3bU3eAYrMBKbp9hpXx8A32tuw1OWVf94x4odew5WxziUDtFi/LtSd0Waoi8LHvVJ wSMzZFlY4hxndNU+Vfg4A0HTn6zR1Ym5/ZYn0lt9QdudeWF8mClxe5ualwjYwRZd203d 84sQ== X-Gm-Message-State: AOAM533tKr2qvMj6T5Ex/OztSLUX0o8V2DkSqgeEb8EDx+MoY7I29VrA JnI9npvAVY2QuojnhR1tkisQQVu3gkM= X-Google-Smtp-Source: ABdhPJxotMCEQ5yjrW2HzJVe+MPR0y1VlLmWWiKbJ3JIngBOzab8MSbGiPmBv56CNRG699mczYg9eA== X-Received: by 2002:a17:902:8b86:: with SMTP id ay6mr28572322plb.329.1593019188745; Wed, 24 Jun 2020 10:19:48 -0700 (PDT) Received: from localhost.localdomain (c-73-202-182-113.hsd1.ca.comcast.net. 
[73.202.182.113]) by smtp.gmail.com with ESMTPSA id w18sm17490241pgj.31.2020.06.24.10.19.47 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 24 Jun 2020 10:19:47 -0700 (PDT) From: Tom Herbert To: netdev@vger.kernel.org Cc: Tom Herbert Subject: [RFC PATCH 11/11] doc: Documentation for Per Thread Queues Date: Wed, 24 Jun 2020 10:17:50 -0700 Message-Id: <20200624171749.11927-12-tom@herbertland.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200624171749.11927-1-tom@herbertland.com> References: <20200624171749.11927-1-tom@herbertland.com> MIME-Version: 1.0 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Add a section on Per Thread Queues to scaling.rst. --- Documentation/networking/scaling.rst | 195 ++++++++++++++++++++++++++- 1 file changed, 194 insertions(+), 1 deletion(-) diff --git a/Documentation/networking/scaling.rst b/Documentation/networking/scaling.rst index 8f0347b9fb3d..42f1dc639ab7 100644 --- a/Documentation/networking/scaling.rst +++ b/Documentation/networking/scaling.rst @@ -250,7 +250,7 @@ RFS: Receive Flow Steering While RPS steers packets solely based on hash, and thus generally provides good load distribution, it does not take into account application locality. This is accomplished by Receive Flow Steering -(RFS). The goal of RFS is to increase datacache hitrate by steering +(RFS). The goal of RFS is to increase datacache hit rate by steering kernel processing of packets to the CPU where the application thread consuming the packet is running. RFS relies on the same RPS mechanisms to enqueue packets onto the backlog of another CPU and to wake up that @@ -508,6 +508,199 @@ a max-rate attribute is supported, by setting a Mbps value to:: A value of zero means disabled, and this is the default. +PTQ: Per Thread Queues +====================== + +Per Thread Queues allows application threads to be assigned dedicated +hardware network queues for both transmit and receive. This facility +provides a high degree of traffic isolation between applications and +can also help facilitate high performance due to fine grained packet +steering. + +PTQ has three major design components: + - A method to assign transmit and receive queues to threads + - A means to associate packets with threads and then to steer + those packets to the queues assigned to the threads + - Mechanisms to process the per thread hardware queues + +Global network queues +~~~~~~~~~~~~~~~~~~~~~ + +Global network queues are an abstraction of hardware networking +queues that can be used in generic non-device specific configuration. +Global queues may mapped to real device queues. The mapping is +performed on a per device queue basis. A device sysfs parameter +"global_queue_mapping" in queues/{tx,rx}- indicates the mapping +of a device queue to a global queue. Each device maintains a table +that maps global queues to device queues for the device. Note that +for a single device, the global to device queue mapping is 1 to 1, +however each device may map a global queue to a different device +queue. + +net_queues cgroup controller +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +For assigning queues to the threads, a cgroup controller named +"net_queues" is used. A cgroup can be configured with pools of transmit +and receive global queues from which individual threads are assigned +queues. The contents of the net_queues controller are described below in +the configuration section. 
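To make the indirection concrete (device names and queue numbers here are only illustrative): global receive queue 3 might be mapped to device queue 0 on one NIC and to device queue 5 on another via their respective global_queue_mapping files. A cgroup whose receive pool contains queue 3 can then hand that queue to a thread without knowing which NIC the thread's traffic will arrive on; each device's gqid-to-dqid table resolves the global index to the correct local queue at run time.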
+ +Handling PTQ in the transmit path +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +When a socket operation is performed that may result in sending packets +(e.g. listen, accept, sendmsg, sendpage), the task structure for the +current thread is consulted to see if there is an assigned transmit +queue for the thread. If there is a queue assignment, the queue index is +set in a field of the sock structure for the corresponding socket. +Subsequently, when transmit queue selection is performed, the sock +structure associated with the packet being sent is consulted. If a transmit +global queue is set in the sock then that index is mapped to a device +queue for the output networking device. If a valid device queue is +found then that queue is used; otherwise queue selection proceeds to XPS. + +Handling PTQ in the receive path +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The receive path uses the infrastructure of RFS, which is extended +to steer based on the assigned receive global queue for a thread in +addition to steering based on the CPU. The rps_sock_flow_table is +modified to contain either the desired CPU for a flow or the desired +receive global queue. A queue is updated at the same time that the +desired CPU would be updated during calls to recvmsg and sendmsg (see RFS +description above). The process is to consult the running task structure +to see if a receive queue is assigned to the task. If a queue is assigned +to the task then the corresponding queue index is set in the +rps_sock_flow_table; if no queue is assigned then the current CPU is +set as the desired CPU, per canonical RFS. + +When packets are received, the rps_sock_flow_table is consulted to check +if they were received on the proper queue. If the rps_sock_flow_table +entry for the flow of a received packet contains a global +queue index, then the index is mapped to a device queue on the receiving +device. If the mapped device queue is equal to the queue on which the +packet was received then packets are being steered properly. If there is +a mismatch then the local flow-to-queue mapping in the device is changed and +ndo_rx_flow_steer is invoked to set the receive queue for the flow in +the device as described in the aRFS section. + +Processing queues in Per Queue Threads +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +When Per Queue Threads is used, the queue "follows" the thread. So when +a thread is rescheduled from one CPU to another, we expect the device +queues that map to the thread to be processed on +the CPU where the thread is currently running. This is a bit tricky, +especially with respect to the canonical interrupt-driven device model. +There are at least three possible approaches: + - Arrange for interrupts to follow threads as they are + rescheduled, or alternatively pin threads to CPUs and + statically configure the interrupt mappings for the queues for + each thread + - Use busy polling (see the example below) + - Use "sleeping busy-poll" with completion queues. The basic + idea is to have one CPU busy poll a device completion queue + that reports device queues with received or completed transmit + packets. When a queue is ready, the thread associated with the + queue (derived by reverse mapping the queue back to its + assigned thread) is scheduled. When the thread runs it polls + its queues to process any packets. + +Future work may further elaborate on solutions in this area.
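For the busy-polling approach listed above, one possible starting point on current kernels is the existing system-wide busy-poll knobs (per socket, SO_BUSY_POLL can be used instead). This is only a rough sketch; the 50 microsecond budget is an arbitrary illustrative value and is not taken from this patch set::

    # Busy-poll for up to 50 usecs on blocking reads and on poll/select
    echo 50 > /proc/sys/net/core/busy_read
    echo 50 > /proc/sys/net/core/busy_poll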
+Reducing flow state in devices +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +PTQ (and aRFS as well) potentially creates per-flow state in a device. +This is costly in at least two ways: 1) State requires device memory, +which is almost always much more limited than host memory, and thus the +number of flows that can be instantiated in a device is smaller than +in the host. 2) State requires instantiation and synchronization +messages; e.g. ndo_rx_flow_steer causes a message over the PCIe bus, and if +there is a high turnover rate of connections this messaging becomes +a bottleneck. + +Mitigations to reduce the amount of flow state in the device should be +considered. + +In PTQ (and aRFS) the device flow state is considered a cache. A flow +entry is only set in the device on a cache miss, which occurs when the +receive queue for a packet doesn't match the desired receive queue. So +conceptually, if packets for a flow are always received on the desired +queue from the beginning of the flow then flow state might never need +to be instantiated in the device. This motivates a strategy to try to +use stateless steering mechanisms before resorting to stateful ones. + +As an example of applying this strategy, consider an application that +creates four threads where each thread creates a TCP listener socket +for some port that is shared amongst the threads via SO_REUSEPORT. +Four global queues can be assigned to the application (via a cgroup +for the application), and a filter rule can be set up in each device +that matches the listener port and any bound destination address. The +filter maps to a set of four device queues that map to the four global +queues for the application. When a packet is received that matches the +filter, one of the four queues is chosen via a hash over the packet's +four-tuple. In this manner, packets for the application are +distributed amongst the four threads. As long as processing for sockets +doesn't move between threads and the number of listener threads is +constant, packets are always received on the desired queue and no +flow state needs to be instantiated. In practice, we want to allow +elasticity in applications to create and destroy threads on demand, so +additional techniques, such as consistent hashing, are probably needed. + +Per Thread Queues Configuration +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Per Thread Queues is only available if the kernel is compiled with +CONFIG_PER_THREAD_QUEUES. For PTQ in the receive path, aRFS needs to be +supported and configured (see the aRFS section above). + +The net_queues cgroup controller is in: + /sys/fs/cgroup/<cgroup>/net_queues + +The net_queues controller contains the following attributes: + - tx-queues, rx-queues + Specifies the transmit queue pool and receive queue pool, + respectively, as a range of global queue indices. The + format of these entries is "<first>:<count>", where <first> + is the first queue index in the pool, and <count> + is the number of queues in the pool. + If <count> is zero the queue pool is empty. + - tx-assign, rx-assign + Boolean attributes ("0" or "1") that indicate unique + queue assignment from the respective transmit or receive + queue pool. When the "assign" attribute is enabled, a + thread is assigned a queue that is not already assigned + to another thread. + - symmetric + A boolean attribute ("0" or "1") that indicates the + receive and transmit queue assignment for a thread + should be the same. That is, the assigned transmit queue + index is equal to the assigned receive queue index. + - task-queues + A read-only attribute that lists the threads of the + cgroup and their assigned queues.
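As a rough illustration of the attributes above (a sketch only: the cgroup name "app" is arbitrary, and it is assumed here that the attributes appear as per-cgroup files named after the controller, e.g. net_queues.tx-queues, enabled through the standard cgroup v2 subtree_control mechanism), a cgroup could be given eight transmit and eight receive global queues starting at index 0, with unique per-thread assignment from both pools::

    # Enable the controller for child cgroups (standard cgroup v2 mechanism)
    echo +net_queues > /sys/fs/cgroup/cgroup.subtree_control
    mkdir /sys/fs/cgroup/app

    # Queue pools: first index 0, eight queues in each pool
    echo "0:8" > /sys/fs/cgroup/app/net_queues.tx-queues
    echo "0:8" > /sys/fs/cgroup/app/net_queues.rx-queues

    # Assign each thread its own queue from each pool
    echo 1 > /sys/fs/cgroup/app/net_queues.tx-assign
    echo 1 > /sys/fs/cgroup/app/net_queues.rx-assign

    # Inspect the per-thread assignments
    cat /sys/fs/cgroup/app/net_queues.task-queues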
+ +The mapping of global queues to device queues is in: + + /sys/class/net/<dev>/queues/tx-<n>/global_queue_mapping + - and - + /sys/class/net/<dev>/queues/rx-<n>/global_queue_mapping + +A value of "none" indicates no mapping; an integer value (up to +a maximum of 32,766) indicates a global queue. + +Suggested Configuration +~~~~~~~~~~~~~~~~~~~~~~~ + +Unlike aRFS, PTQ requires per-application configuration. To +use PTQ most effectively, some understanding of the threading model of +the application is warranted. The section above describes one possible +configuration strategy for a canonical application using SO_REUSEPORT. + + Further Information =================== RPS and RFS were introduced in kernel 2.6.35. XPS was incorporated into