From patchwork Wed Jun 19 22:31:54 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Wei Wang
X-Patchwork-Id: 1119046
X-Patchwork-Delegate: davem@davemloft.net
From: Wei Wang
To: David Miller, netdev@vger.kernel.org
Cc: Eric Dumazet, Mahesh Bandewar, Martin KaFai Lau, David Ahern, Wei Wang
Subject: [PATCH v2 net-next 1/5] ipv6: introduce RT6_LOOKUP_F_DST_NOREF flag in ip6_pol_route()
Date: Wed, 19 Jun 2019 15:31:54 -0700
Message-Id: <20190619223158.35829-2-tracywwnj@gmail.com>
X-Mailer: git-send-email 2.22.0.410.gd8fdbe21b5-goog
In-Reply-To: <20190619223158.35829-1-tracywwnj@gmail.com>
References: <20190619223158.35829-1-tracywwnj@gmail.com>
X-Mailing-List: netdev@vger.kernel.org

From: Wei Wang

This new flag instructs the route lookup function not to take a refcnt
on the dst entry. A caller that does a route lookup with this flag must
use proper rcu protection.
ip6_pol_route() is the major route lookup function for both tx and rx
path.
In this function:
Do not take refcnt on dst if RT6_LOOKUP_F_DST_NOREF flag is set, and
directly return the route entry. The caller should be holding rcu lock
when using this flag, and decide whether to take refcnt or not.

One note on the dst cache in the uncached_list:
As uncached_list does not consume refcnt, one refcnt is always returned
back to the caller even if RT6_LOOKUP_F_DST_NOREF flag is set.
Uncached dst is only possible in the output path. So in such call path,
caller MUST check if the dst is in the uncached_list before assuming
that there is no refcnt taken on the returned dst.

Signed-off-by: Wei Wang
Acked-by: Eric Dumazet
Acked-by: Mahesh Bandewar
---
 include/net/ip6_route.h |  1 +
 net/ipv6/route.c        | 73 +++++++++++++++++------------------------
 2 files changed, 31 insertions(+), 43 deletions(-)

diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h
index 7375a165fd98..82bced2fc1e3 100644
--- a/include/net/ip6_route.h
+++ b/include/net/ip6_route.h
@@ -36,6 +36,7 @@ struct route_info {
 #define RT6_LOOKUP_F_SRCPREF_PUBLIC	0x00000010
 #define RT6_LOOKUP_F_SRCPREF_COA	0x00000020
 #define RT6_LOOKUP_F_IGNORE_LINKSTATE	0x00000040
+#define RT6_LOOKUP_F_DST_NOREF		0x00000080
 
 /* We do not (yet ?) support IPv6 jumbograms (RFC 2675)
  * Unlike IPv4, hdr->seg_len doesn't include the IPv6 header
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index c4d285fe0adc..9dcbc56e4151 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1391,9 +1391,6 @@ static struct rt6_info *rt6_get_pcpu_route(const struct fib6_result *res)
 
 	pcpu_rt = this_cpu_read(*res->nh->rt6i_pcpu);
 
-	if (pcpu_rt)
-		ip6_hold_safe(NULL, &pcpu_rt);
-
 	return pcpu_rt;
 }
 
@@ -1403,12 +1400,9 @@ static struct rt6_info *rt6_make_pcpu_route(struct net *net,
 	struct rt6_info *pcpu_rt, *prev, **p;
 
 	pcpu_rt = ip6_rt_pcpu_alloc(res);
-	if (!pcpu_rt) {
-		dst_hold(&net->ipv6.ip6_null_entry->dst);
-		return net->ipv6.ip6_null_entry;
-	}
+	if (!pcpu_rt)
+		return NULL;
 
-	dst_hold(&pcpu_rt->dst);
 	p = this_cpu_ptr(res->nh->rt6i_pcpu);
 	prev = cmpxchg(p, NULL, pcpu_rt);
 	BUG_ON(prev);
@@ -2189,9 +2183,12 @@ struct rt6_info *ip6_pol_route(struct net *net, struct fib6_table *table,
 			       const struct sk_buff *skb, int flags)
 {
 	struct fib6_result res = {};
-	struct rt6_info *rt;
+	struct rt6_info *rt = NULL;
 	int strict = 0;
 
+	WARN_ON_ONCE((flags & RT6_LOOKUP_F_DST_NOREF) &&
+		     !rcu_read_lock_held());
+
 	strict |= flags & RT6_LOOKUP_F_IFACE;
 	strict |= flags & RT6_LOOKUP_F_IGNORE_LINKSTATE;
 	if (net->ipv6.devconf_all->forwarding == 0)
@@ -2200,23 +2197,15 @@ struct rt6_info *ip6_pol_route(struct net *net, struct fib6_table *table,
 	rcu_read_lock();
 
 	fib6_table_lookup(net, table, oif, fl6, &res, strict);
-	if (res.f6i == net->ipv6.fib6_null_entry) {
-		rt = net->ipv6.ip6_null_entry;
-		rcu_read_unlock();
-		dst_hold(&rt->dst);
-		return rt;
-	}
+	if (res.f6i == net->ipv6.fib6_null_entry)
+		goto out;
 
 	fib6_select_path(net, &res, fl6, oif, false, skb, strict);
 
 	/*Search through exception table */
 	rt = rt6_find_cached_rt(&res, &fl6->daddr, &fl6->saddr);
 	if (rt) {
-		if (ip6_hold_safe(net, &rt))
-			dst_use_noref(&rt->dst, jiffies);
-
-		rcu_read_unlock();
-		return rt;
+		goto out;
 	} else if (unlikely((fl6->flowi6_flags & FLOWI_FLAG_KNOWN_NH) &&
 			    !res.nh->fib_nh_gw_family)) {
 		/* Create a RTF_CACHE clone which will not be
@@ -2224,40 +2213,38 @@ struct rt6_info *ip6_pol_route(struct net *net, struct fib6_table *table,
 		 * the daddr in the skb during the neighbor look-up is different
 		 * from the fl6->daddr used to look-up route here.
 		 */
-		struct rt6_info *uncached_rt;
+		rt = ip6_rt_cache_alloc(&res, &fl6->daddr, NULL);
 
-		uncached_rt = ip6_rt_cache_alloc(&res, &fl6->daddr, NULL);
-
-		rcu_read_unlock();
-
-		if (uncached_rt) {
-			/* Uncached_rt's refcnt is taken during ip6_rt_cache_alloc()
-			 * No need for another dst_hold()
+		if (rt) {
+			/* 1 refcnt is taken during ip6_rt_cache_alloc().
+			 * As rt6_uncached_list_add() does not consume refcnt,
+			 * this refcnt is always returned to the caller even
+			 * if caller sets RT6_LOOKUP_F_DST_NOREF flag.
 			 */
-			rt6_uncached_list_add(uncached_rt);
+			rt6_uncached_list_add(rt);
 			atomic_inc(&net->ipv6.rt6_stats->fib_rt_uncache);
-		} else {
-			uncached_rt = net->ipv6.ip6_null_entry;
-			dst_hold(&uncached_rt->dst);
-		}
+			rcu_read_unlock();
 
-		return uncached_rt;
+			return rt;
+		}
 	} else {
 		/* Get a percpu copy */
-
-		struct rt6_info *pcpu_rt;
-
 		local_bh_disable();
-		pcpu_rt = rt6_get_pcpu_route(&res);
+		rt = rt6_get_pcpu_route(&res);
 
-		if (!pcpu_rt)
-			pcpu_rt = rt6_make_pcpu_route(net, &res);
+		if (!rt)
+			rt = rt6_make_pcpu_route(net, &res);
 
 		local_bh_enable();
-		rcu_read_unlock();
-
-		return pcpu_rt;
 	}
+out:
+	if (!rt)
+		rt = net->ipv6.ip6_null_entry;
+	if (!(flags & RT6_LOOKUP_F_DST_NOREF))
+		ip6_hold_safe(net, &rt);
+	rcu_read_unlock();
+
+	return rt;
 }
 EXPORT_SYMBOL_GPL(ip6_pol_route);
 

From patchwork Wed Jun 19 22:31:55 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Wei Wang
X-Patchwork-Id: 1119047
X-Patchwork-Delegate: davem@davemloft.net
From: Wei Wang
To: David Miller, netdev@vger.kernel.org
Cc: Eric Dumazet, Mahesh Bandewar, Martin KaFai Lau, David Ahern, Wei Wang
Subject: [PATCH v2 net-next 2/5] ipv6: initialize rt6->rt6i_uncached in all pre-allocated dst entries
Date: Wed, 19 Jun 2019 15:31:55 -0700
Message-Id: <20190619223158.35829-3-tracywwnj@gmail.com>
X-Mailer: git-send-email 2.22.0.410.gd8fdbe21b5-goog
In-Reply-To: <20190619223158.35829-1-tracywwnj@gmail.com>
References: <20190619223158.35829-1-tracywwnj@gmail.com>
X-Mailing-List: netdev@vger.kernel.org

From: Wei Wang

Initialize rt6->rt6i_uncached on the following pre-allocated dsts:
net->ipv6.ip6_null_entry
net->ipv6.ip6_prohibit_entry
net->ipv6.ip6_blk_hole_entry

This is a preparation patch for later commits to be able to distinguish
dst entries in uncached list by doing:
!list_empty(rt6->rt6i_uncached)

Signed-off-by: Wei Wang
Acked-by: Eric Dumazet
Acked-by: Mahesh Bandewar
---
 net/ipv6/route.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 9dcbc56e4151..33dc8af9a4bf 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -6010,6 +6010,7 @@ static int __net_init ip6_route_net_init(struct net *net)
 	net->ipv6.ip6_null_entry->dst.ops = &net->ipv6.ip6_dst_ops;
 	dst_init_metrics(&net->ipv6.ip6_null_entry->dst,
 			 ip6_template_metrics, true);
+	INIT_LIST_HEAD(&net->ipv6.ip6_null_entry->rt6i_uncached);
 
 #ifdef CONFIG_IPV6_MULTIPLE_TABLES
 	net->ipv6.fib6_has_custom_rules = false;
@@ -6021,6 +6022,7 @@ static int __net_init ip6_route_net_init(struct net *net)
 	net->ipv6.ip6_prohibit_entry->dst.ops = &net->ipv6.ip6_dst_ops;
 	dst_init_metrics(&net->ipv6.ip6_prohibit_entry->dst,
 			 ip6_template_metrics, true);
+	INIT_LIST_HEAD(&net->ipv6.ip6_prohibit_entry->rt6i_uncached);
 
 	net->ipv6.ip6_blk_hole_entry = kmemdup(&ip6_blk_hole_entry_template,
 					       sizeof(*net->ipv6.ip6_blk_hole_entry),
@@ -6030,6 +6032,7 @@ static int __net_init ip6_route_net_init(struct net *net)
 	net->ipv6.ip6_blk_hole_entry->dst.ops = &net->ipv6.ip6_dst_ops;
 	dst_init_metrics(&net->ipv6.ip6_blk_hole_entry->dst,
 			 ip6_template_metrics, true);
+	INIT_LIST_HEAD(&net->ipv6.ip6_blk_hole_entry->rt6i_uncached);
 #endif
 
 	net->ipv6.sysctl.flush_delay = 0;

From patchwork Wed Jun 19 22:31:56 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Wei Wang
X-Patchwork-Id: 1119048
X-Patchwork-Delegate: davem@davemloft.net
From: Wei Wang
To: David Miller, netdev@vger.kernel.org
Cc: Eric Dumazet, Mahesh Bandewar, Martin KaFai Lau, David Ahern, Wei Wang
Subject: [PATCH v2 net-next 3/5] ipv6: honor RT6_LOOKUP_F_DST_NOREF in rule lookup logic
Date: Wed, 19 Jun 2019 15:31:56 -0700
Message-Id: <20190619223158.35829-4-tracywwnj@gmail.com>
X-Mailer: git-send-email 2.22.0.410.gd8fdbe21b5-goog
In-Reply-To: <20190619223158.35829-1-tracywwnj@gmail.com>
References: <20190619223158.35829-1-tracywwnj@gmail.com>
X-Mailing-List: netdev@vger.kernel.org

From: Wei Wang

This patch specifically converts the rule lookup logic to honor this
flag and not release refcnt when traversing each rule and calling
lookup() on each routing table.
Similar to previous patch, we also need some special handling of dst
entries in uncached list because there is always 1 refcnt taken for them
even if RT6_LOOKUP_F_DST_NOREF flag is set.

Signed-off-by: Wei Wang
---
 include/net/ip6_route.h | 10 ++++++++++
 net/ipv6/fib6_rules.c   | 12 +++++++-----
 2 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h
index 82bced2fc1e3..0709835c01ad 100644
--- a/include/net/ip6_route.h
+++ b/include/net/ip6_route.h
@@ -94,6 +94,16 @@ static inline struct dst_entry *ip6_route_output(struct net *net,
 	return ip6_route_output_flags(net, sk, fl6, 0);
 }
 
+/* Only conditionally release dst if flags indicates
+ * !RT6_LOOKUP_F_DST_NOREF or dst is in uncached_list.
+ */
+static inline void ip6_rt_put_flags(struct rt6_info *rt, int flags)
+{
+	if (!(flags & RT6_LOOKUP_F_DST_NOREF) ||
+	    !list_empty(&rt->rt6i_uncached))
+		ip6_rt_put(rt);
+}
+
 struct dst_entry *ip6_route_lookup(struct net *net, struct flowi6 *fl6,
 				   const struct sk_buff *skb, int flags);
 struct rt6_info *ip6_pol_route(struct net *net, struct fib6_table *table,
diff --git a/net/ipv6/fib6_rules.c b/net/ipv6/fib6_rules.c
index bcfae13409b5..d22b6c140f23 100644
--- a/net/ipv6/fib6_rules.c
+++ b/net/ipv6/fib6_rules.c
@@ -113,14 +113,15 @@ struct dst_entry *fib6_rule_lookup(struct net *net, struct flowi6 *fl6,
 		rt = lookup(net, net->ipv6.fib6_local_tbl, fl6, skb, flags);
 		if (rt != net->ipv6.ip6_null_entry && rt->dst.error != -EAGAIN)
 			return &rt->dst;
-		ip6_rt_put(rt);
+		ip6_rt_put_flags(rt, flags);
 		rt = lookup(net, net->ipv6.fib6_main_tbl, fl6, skb, flags);
 		if (rt->dst.error != -EAGAIN)
 			return &rt->dst;
-		ip6_rt_put(rt);
+		ip6_rt_put_flags(rt, flags);
 	}
 
-	dst_hold(&net->ipv6.ip6_null_entry->dst);
+	if (!(flags & RT6_LOOKUP_F_DST_NOREF))
+		dst_hold(&net->ipv6.ip6_null_entry->dst);
 	return &net->ipv6.ip6_null_entry->dst;
 }
 
@@ -237,13 +238,14 @@ static int __fib6_rule_action(struct fib_rule *rule, struct flowi *flp,
 		goto out;
 	}
 again:
-	ip6_rt_put(rt);
+	ip6_rt_put_flags(rt, flags);
 	err = -EAGAIN;
 	rt = NULL;
 	goto out;
 
 discard_pkt:
-	dst_hold(&rt->dst);
+	if (!(flags & RT6_LOOKUP_F_DST_NOREF))
+		dst_hold(&rt->dst);
 out:
 	res->rt6 = rt;
 	return err;

From patchwork Wed Jun 19 22:31:57 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Wei Wang
X-Patchwork-Id: 1119049
X-Patchwork-Delegate: davem@davemloft.net
From: Wei Wang
To: David Miller, netdev@vger.kernel.org
Cc: Eric Dumazet, Mahesh Bandewar, Martin KaFai Lau, David Ahern, Wei Wang
Subject: [PATCH v2 net-next 4/5] ipv6: convert rx data path to not take refcnt on dst
Date: Wed, 19 Jun 2019 15:31:57 -0700
Message-Id: <20190619223158.35829-5-tracywwnj@gmail.com>
X-Mailer: git-send-email 2.22.0.410.gd8fdbe21b5-goog
In-Reply-To: <20190619223158.35829-1-tracywwnj@gmail.com>
References: <20190619223158.35829-1-tracywwnj@gmail.com>
X-Mailing-List: netdev@vger.kernel.org

From: Wei Wang

ip6_route_input() is the key function to do the route lookup in the
rx data path. All the callers to this function are already holding rcu
lock. So it is fairly easy to convert it to not take refcnt on the dst:
We pass in flag RT6_LOOKUP_F_DST_NOREF and do skb_dst_set_noref().
This saves a few atomic inc or dec operations and should boost
performance overall.
This also makes the logic more aligned with v4.
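
For illustration only, the saving described above can be sketched in a
small user-space model. This is not kernel code: struct model_dst,
model_route_lookup(), model_skb_dst_drop() and model_rx_round_trip() are
hypothetical names, and the model keys the release on the lookup flags,
whereas the real skb presumably records the noref state itself. Only the
flag value is taken from the series.

```c
#include <assert.h>
#include <stdatomic.h>

#define RT6_LOOKUP_F_DST_NOREF 0x00000080 /* value taken from the series */

/* Hypothetical stand-in for a cached dst entry; not the kernel structure. */
struct model_dst {
	atomic_int refcnt; /* models the dst reference count */
	int atomic_ops;    /* counts refcount updates, for illustration only */
};

/* Models the end of the lookup: take a reference unless NOREF is requested. */
static struct model_dst *model_route_lookup(struct model_dst *rt, int flags)
{
	if (!(flags & RT6_LOOKUP_F_DST_NOREF)) {
		atomic_fetch_add(&rt->refcnt, 1);
		rt->atomic_ops++;
	}
	return rt;
}

/* Models dropping the skb's dst: release only a reference that was taken. */
static void model_skb_dst_drop(struct model_dst *rt, int flags)
{
	if (!(flags & RT6_LOOKUP_F_DST_NOREF)) {
		atomic_fetch_sub(&rt->refcnt, 1);
		rt->atomic_ops++;
	}
}

/* Runs one receive-path round trip; returns the number of refcount updates. */
static int model_rx_round_trip(int flags)
{
	struct model_dst rt = { .atomic_ops = 0 };
	struct model_dst *dst;

	atomic_init(&rt.refcnt, 1);            /* reference owned by the table */
	dst = model_route_lookup(&rt, flags);  /* caller is assumed to hold rcu */
	model_skb_dst_drop(dst, flags);
	assert(atomic_load(&rt.refcnt) == 1);  /* balanced with or without NOREF */
	return rt.atomic_ops;
}
```

In this model a round trip without the flag costs two refcount updates
and a round trip with RT6_LOOKUP_F_DST_NOREF costs none, which is the
kind of saving the rx conversion is after.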

Signed-off-by: Wei Wang
Acked-by: Eric Dumazet
Acked-by: Mahesh Bandewar
---
 net/ipv6/route.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 33dc8af9a4bf..d2b287635aab 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -2375,11 +2375,12 @@ u32 rt6_multipath_hash(const struct net *net, const struct flowi6 *fl6,
 	return mhash >> 1;
 }
 
+/* Called with rcu held */
 void ip6_route_input(struct sk_buff *skb)
 {
 	const struct ipv6hdr *iph = ipv6_hdr(skb);
 	struct net *net = dev_net(skb->dev);
-	int flags = RT6_LOOKUP_F_HAS_SADDR;
+	int flags = RT6_LOOKUP_F_HAS_SADDR | RT6_LOOKUP_F_DST_NOREF;
 	struct ip_tunnel_info *tun_info;
 	struct flowi6 fl6 = {
 		.flowi6_iif = skb->dev->ifindex,
@@ -2401,8 +2402,8 @@ void ip6_route_input(struct sk_buff *skb)
 	if (unlikely(fl6.flowi6_proto == IPPROTO_ICMPV6))
 		fl6.mp_hash = rt6_multipath_hash(net, &fl6, skb, flkeys);
 	skb_dst_drop(skb);
-	skb_dst_set(skb,
-		    ip6_route_input_lookup(net, skb->dev, &fl6, skb, flags));
+	skb_dst_set_noref(skb, ip6_route_input_lookup(net, skb->dev,
+						      &fl6, skb, flags));
 }
 
 static struct rt6_info *ip6_pol_route_output(struct net *net,

From patchwork Wed Jun 19 22:31:58 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Wei Wang
X-Patchwork-Id: 1119050
X-Patchwork-Delegate: davem@davemloft.net
From: Wei Wang
To: David Miller, netdev@vger.kernel.org
Cc: Eric Dumazet, Mahesh Bandewar, Martin KaFai Lau, David Ahern, Wei Wang
Subject: [PATCH v2 net-next 5/5] ipv6: convert major tx path to use RT6_LOOKUP_F_DST_NOREF
Date: Wed, 19 Jun 2019 15:31:58 -0700
Message-Id: <20190619223158.35829-6-tracywwnj@gmail.com>
X-Mailer: git-send-email 2.22.0.410.gd8fdbe21b5-goog
In-Reply-To: <20190619223158.35829-1-tracywwnj@gmail.com>
References: <20190619223158.35829-1-tracywwnj@gmail.com>
X-Mailing-List: netdev@vger.kernel.org

From: Wei Wang

For tx path, in most cases, we still have to take refcnt on the dst
because the caller is caching the dst somewhere. But it is still
beneficial to make use of RT6_LOOKUP_F_DST_NOREF flag while doing the
route lookup. This is because the flag prevents manipulating refcnt on
net->ipv6.ip6_null_entry when doing fib6_rule_lookup() to traverse each
routing table. The null_entry is a shared object and constant updates on
it cause false sharing.

We converted the current major lookup function ip6_route_output_flags()
to make use of RT6_LOOKUP_F_DST_NOREF.

Together with the change in the rx path, we see a noticeable performance
boost:
I ran synflood tests between 2 hosts under the same switch. Both hosts
have 20G mlx NIC, and 8 tx/rx queues.
Sender sends pure SYN flood with random src IPs and ports using trafgen.
Receiver has a simple TCP listener on the target port.
Both hosts have multiple custom rules:
- For incoming packets, only local table is traversed.
- For outgoing packets, 3 tables are traversed to find the route.
The packet processing rate on the receiver is as follows:
- Before the fix: 3.78Mpps
- After the fix:  5.50Mpps

Signed-off-by: Wei Wang
Acked-by: Eric Dumazet
Acked-by: Mahesh Bandewar
---
 drivers/net/vrf.c       | 11 ++++++-----
 include/net/ip6_route.h | 25 +++++++++++++++++++++++--
 include/net/l3mdev.h    | 11 +++++++----
 net/ipv6/route.c        | 10 ++++++----
 net/l3mdev/l3mdev.c     | 22 +++++++++++-----------
 5 files changed, 53 insertions(+), 26 deletions(-)

diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index 11b9525dff27..1d1ac78b167e 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -1072,12 +1072,14 @@ static struct sk_buff *vrf_l3_rcv(struct net_device *vrf_dev,
 #if IS_ENABLED(CONFIG_IPV6)
 /* send to link-local or multicast address via interface enslaved to
  * VRF device. Force lookup to VRF table without changing flow struct
+ * No refcnt is taken on the dst.
  */
-static struct dst_entry *vrf_link_scope_lookup(const struct net_device *dev,
-					      struct flowi6 *fl6)
+static struct dst_entry *vrf_link_scope_lookup_noref(
+					    const struct net_device *dev,
+					    struct flowi6 *fl6)
 {
 	struct net *net = dev_net(dev);
-	int flags = RT6_LOOKUP_F_IFACE;
+	int flags = RT6_LOOKUP_F_IFACE | RT6_LOOKUP_F_DST_NOREF;
 	struct dst_entry *dst = NULL;
 	struct rt6_info *rt;
 
@@ -1087,7 +1089,6 @@ static struct dst_entry *vrf_link_scope_lookup(const struct net_device *dev,
 	 */
 	if (fl6->flowi6_oif == dev->ifindex) {
 		dst = &net->ipv6.ip6_null_entry->dst;
-		dst_hold(dst);
 		return dst;
 	}
 
@@ -1107,7 +1108,7 @@ static const struct l3mdev_ops vrf_l3mdev_ops = {
 	.l3mdev_l3_rcv		= vrf_l3_rcv,
 	.l3mdev_l3_out		= vrf_l3_out,
 #if IS_ENABLED(CONFIG_IPV6)
-	.l3mdev_link_scope_lookup = vrf_link_scope_lookup,
+	.l3mdev_link_scope_lookup_noref = vrf_link_scope_lookup_noref,
 #endif
 };
 
diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h
index 0709835c01ad..66802ecd81e5 100644
--- a/include/net/ip6_route.h
+++ b/include/net/ip6_route.h
@@ -84,8 +84,29 @@ struct dst_entry *ip6_route_input_lookup(struct net *net,
 					 struct flowi6 *fl6,
 					 const struct sk_buff *skb, int flags);
 
-struct dst_entry *ip6_route_output_flags(struct net *net, const struct sock *sk,
-					 struct flowi6 *fl6, int flags);
+struct dst_entry *ip6_route_output_flags_noref(struct net *net,
+					       const struct sock *sk,
+					       struct flowi6 *fl6, int flags);
+
+static inline struct dst_entry *ip6_route_output_flags(struct net *net,
+						       const struct sock *sk,
+						       struct flowi6 *fl6,
+						       int flags) {
+	struct dst_entry *dst;
+	struct rt6_info *rt6;
+
+	rcu_read_lock();
+	dst = ip6_route_output_flags_noref(net, sk, fl6, flags);
+	rt6 = (struct rt6_info *)dst;
+	/* For dst cached in uncached_list, refcnt is already taken. */
+	if (list_empty(&rt6->rt6i_uncached) && !dst_hold_safe(dst)) {
+		dst = &net->ipv6.ip6_null_entry->dst;
+		dst_hold(dst);
+	}
+	rcu_read_unlock();
+
+	return dst;
+}
 
 static inline struct dst_entry *ip6_route_output(struct net *net,
 						 const struct sock *sk,
diff --git a/include/net/l3mdev.h b/include/net/l3mdev.h
index e942372b077b..d8c37317bb86 100644
--- a/include/net/l3mdev.h
+++ b/include/net/l3mdev.h
@@ -31,8 +31,9 @@ struct l3mdev_ops {
 					  u16 proto);
 
 	/* IPv6 ops */
-	struct dst_entry * (*l3mdev_link_scope_lookup)(const struct net_device *dev,
-						 struct flowi6 *fl6);
+	struct dst_entry * (*l3mdev_link_scope_lookup_noref)(
+					    const struct net_device *dev,
+					    struct flowi6 *fl6);
 };
 
 #ifdef CONFIG_NET_L3_MASTER_DEV
@@ -140,7 +141,8 @@ static inline bool netif_index_is_l3_master(struct net *net, int ifindex)
 	return rc;
 }
 
-struct dst_entry *l3mdev_link_scope_lookup(struct net *net, struct flowi6 *fl6);
+struct dst_entry *l3mdev_link_scope_lookup_noref(struct net *net,
+						 struct flowi6 *fl6);
 
 static inline
 struct sk_buff *l3mdev_l3_rcv(struct sk_buff *skb, u16 proto)
@@ -251,7 +253,8 @@ static inline bool netif_index_is_l3_master(struct net *net, int ifindex)
 }
 
 static inline
-struct dst_entry *l3mdev_link_scope_lookup(struct net *net, struct flowi6 *fl6)
+struct dst_entry *l3mdev_link_scope_lookup_noref(struct net *net, + struct flowi6 *fl6) { return NULL; } diff --git a/net/ipv6/route.c b/net/ipv6/route.c index d2b287635aab..602d00794b30 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -2415,8 +2415,9 @@ static struct rt6_info *ip6_pol_route_output(struct net *net, return ip6_pol_route(net, table, fl6->flowi6_oif, fl6, skb, flags); } -struct dst_entry *ip6_route_output_flags(struct net *net, const struct sock *sk, - struct flowi6 *fl6, int flags) +struct dst_entry *ip6_route_output_flags_noref(struct net *net, + const struct sock *sk, + struct flowi6 *fl6, int flags) { bool any_src; @@ -2424,13 +2425,14 @@ struct dst_entry *ip6_route_output_flags(struct net *net, const struct sock *sk, (IPV6_ADDR_MULTICAST | IPV6_ADDR_LINKLOCAL)) { struct dst_entry *dst; - dst = l3mdev_link_scope_lookup(net, fl6); + dst = l3mdev_link_scope_lookup_noref(net, fl6); if (dst) return dst; } fl6->flowi6_iif = LOOPBACK_IFINDEX; + flags |= RT6_LOOKUP_F_DST_NOREF; any_src = ipv6_addr_any(&fl6->saddr); if ((sk && sk->sk_bound_dev_if) || rt6_need_strict(&fl6->daddr) || (fl6->flowi6_oif && any_src)) @@ -2443,7 +2445,7 @@ struct dst_entry *ip6_route_output_flags(struct net *net, const struct sock *sk, return fib6_rule_lookup(net, fl6, NULL, flags, ip6_pol_route_output); } -EXPORT_SYMBOL_GPL(ip6_route_output_flags); +EXPORT_SYMBOL_GPL(ip6_route_output_flags_noref); struct dst_entry *ip6_blackhole_route(struct net *net, struct dst_entry *dst_orig) { diff --git a/net/l3mdev/l3mdev.c b/net/l3mdev/l3mdev.c index cfc9fcb97465..06133426549b 100644 --- a/net/l3mdev/l3mdev.c +++ b/net/l3mdev/l3mdev.c @@ -114,35 +114,35 @@ u32 l3mdev_fib_table_by_index(struct net *net, int ifindex) EXPORT_SYMBOL_GPL(l3mdev_fib_table_by_index); /** - * l3mdev_link_scope_lookup - IPv6 route lookup based on flow for link - * local and multicast addresses + * l3mdev_link_scope_lookup_noref - IPv6 route lookup based on flow + * for link local and multicast addresses * 
@net: network namespace for device index lookup * @fl6: IPv6 flow struct for lookup + * This function does not hold refcnt on the returned dst. + * Caller must hold rcu_read_lock(). */ -struct dst_entry *l3mdev_link_scope_lookup(struct net *net, - struct flowi6 *fl6) +struct dst_entry *l3mdev_link_scope_lookup_noref(struct net *net, + struct flowi6 *fl6) { struct dst_entry *dst = NULL; struct net_device *dev; + WARN_ON_ONCE(!rcu_read_lock_held()); if (fl6->flowi6_oif) { - rcu_read_lock(); - dev = dev_get_by_index_rcu(net, fl6->flowi6_oif); if (dev && netif_is_l3_slave(dev)) dev = netdev_master_upper_dev_get_rcu(dev); if (dev && netif_is_l3_master(dev) && - dev->l3mdev_ops->l3mdev_link_scope_lookup) - dst = dev->l3mdev_ops->l3mdev_link_scope_lookup(dev, fl6); - - rcu_read_unlock(); + dev->l3mdev_ops->l3mdev_link_scope_lookup_noref) + dst = dev->l3mdev_ops-> + l3mdev_link_scope_lookup_noref(dev, fl6); } return dst; } -EXPORT_SYMBOL_GPL(l3mdev_link_scope_lookup); +EXPORT_SYMBOL_GPL(l3mdev_link_scope_lookup_noref); /** * l3mdev_fib_rule_match - Determine if flowi references an