From patchwork Fri Sep 18 18:17:59 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mathieu Desnoyers X-Patchwork-Id: 1367155 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=efficios.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=efficios.com header.i=@efficios.com header.a=rsa-sha256 header.s=default header.b=GiP7EP/B; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by ozlabs.org (Postfix) with ESMTP id 4BtMh40Ddrz9sSW for ; Sat, 19 Sep 2020 04:25:08 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726280AbgIRSZD (ORCPT ); Fri, 18 Sep 2020 14:25:03 -0400 Received: from mail.efficios.com ([167.114.26.124]:59980 "EHLO mail.efficios.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726156AbgIRSZC (ORCPT ); Fri, 18 Sep 2020 14:25:02 -0400 Received: from localhost (localhost [127.0.0.1]) by mail.efficios.com (Postfix) with ESMTP id 7B3602724CA; Fri, 18 Sep 2020 14:18:09 -0400 (EDT) Received: from mail.efficios.com ([127.0.0.1]) by localhost (mail03.efficios.com [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id 9835xoVbywmi; Fri, 18 Sep 2020 14:18:09 -0400 (EDT) Received: from localhost (localhost [127.0.0.1]) by mail.efficios.com (Postfix) with ESMTP id DC07E2720C7; Fri, 18 Sep 2020 14:18:08 -0400 (EDT) DKIM-Filter: OpenDKIM Filter v2.10.3 mail.efficios.com DC07E2720C7 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=efficios.com; s=default; t=1600453088; bh=BuUEY6vVZAx7YvCmo3Gcuwe0FMd/Hehs/wIysAuLkpc=; h=From:To:Date:Message-Id; b=GiP7EP/B3CRv0eXlfaTRwzTSZalzLlEufumkFHOwg9uiFQI2jA3Md+WG48gJGwRgR /0SLxPUoeXQK+yFB3wPkCh9F1D3VJLLK+hLB4JVoYuL/Uc1ZAGRJ5W1iR+RMoIBysm 8p1GZ6AWmbQvEF06eLmq0sR7O6odQfbaAtS/BHQvfS/xXPmQuC5/JWlZpB4GwWAEzE 1LzfmzsxOh7l0vTbVAxojadIW1Dpnd6pCGgOu1B1prhpl1iCsm9kgo2MVagVdaUuo5 5fNpjULVIDCM8ZPXVFzOC743cYi9TjWHwcBC0JaU+sbvAd+Uceh7t42P1ZZugvfcoV C6Xl+ZMR0b/iQ== X-Virus-Scanned: amavisd-new at efficios.com Received: from mail.efficios.com ([127.0.0.1]) by localhost (mail03.efficios.com [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id NdqFME_FHCcc; Fri, 18 Sep 2020 14:18:08 -0400 (EDT) Received: from localhost.localdomain (192-222-181-218.qc.cable.ebox.net [192.222.181.218]) by mail.efficios.com (Postfix) with ESMTPSA id A72292727B1; Fri, 18 Sep 2020 14:18:08 -0400 (EDT) From: Mathieu Desnoyers To: David Ahern , "David S . Miller" , netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org, Mathieu Desnoyers Subject: [RFC PATCH v2 1/3] ipv4/icmp: l3mdev: Perform icmp error route lookup on source device routing table (v2) Date: Fri, 18 Sep 2020 14:17:59 -0400 Message-Id: <20200918181801.2571-2-mathieu.desnoyers@efficios.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20200918181801.2571-1-mathieu.desnoyers@efficios.com> References: <20200918181801.2571-1-mathieu.desnoyers@efficios.com> Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org As per RFC792, ICMP errors should be sent to the source host. However, in configurations with Virtual Routing and Forwarding tables, looking up which routing table to use is currently done by using the destination net_device. commit 9d1a6c4ea43e ("net: icmp_route_lookup should use rt dev to determine L3 domain") changes the interface passed to l3mdev_master_ifindex() and inet_addr_type_dev_table() from skb_in->dev to skb_dst(skb_in)->dev. This effectively uses the destination device rather than the source device for choosing which routing table should be used to lookup where to send the ICMP error. Therefore, if the source and destination interfaces are within separate VRFs, or one in the global routing table and the other in a VRF, looking up the source host in the destination interface's routing table will fail if the destination interface's routing table contains no route to the source host. One observable effect of this issue is that traceroute does not work in the following cases: - Route leaking between global routing table and VRF - Route leaking between VRFs Preferably use the source device routing table when sending ICMP error messages. If no source device is set, fall-back on the destination device routing table. Else, use the main routing table (index 0). [ It has been pointed out that a similar issue may exist with ICMP errors triggered when forwarding between network namespaces. It would be worthwhile to investigate, but is outside of the scope of this investigation. ] [ It has also been pointed out that a similar issue exists with unreachable / fragmentation needed messages, which can be triggered by changing the MTU of eth1 in r1 to 1400 and running: ip netns exec h1 ping -s 1450 -Mdo -c1 172.16.2.2 Some investigation points to raw_icmp_error() and raw_err() as being involved in this last scenario. The focus of this patch is TTL expired ICMP messages, which go through icmp_route_lookup. Investigation of failure modes related to raw_icmp_error() is beyond this investigation's scope. ] Fixes: 9d1a6c4ea43e ("net: icmp_route_lookup should use rt dev to determine L3 domain") Link: https://tools.ietf.org/html/rfc792 Signed-off-by: Mathieu Desnoyers Cc: David Ahern Cc: David S. Miller Cc: netdev@vger.kernel.org --- Changes since v1: - Introduce icmp_get_route_lookup_dev. --- net/ipv4/icmp.c | 23 +++++++++++++++++++++-- 1 file changed, 21 insertions(+), 2 deletions(-) diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c index cf36f955bfe6..9ea66d903c41 100644 --- a/net/ipv4/icmp.c +++ b/net/ipv4/icmp.c @@ -457,6 +457,23 @@ static void icmp_reply(struct icmp_bxm *icmp_param, struct sk_buff *skb) local_bh_enable(); } +/* + * The device used for looking up which routing table to use for sending an ICMP + * error is preferably the source whenever it is set, which should ensure the + * icmp error can be sent to the source host, else lookup using the routing + * table of the destination device, else use the main routing table (index 0). + */ +static struct net_device *icmp_get_route_lookup_dev(struct sk_buff *skb) +{ + struct net_device *route_lookup_dev = NULL; + + if (skb->dev) + route_lookup_dev = skb->dev; + else if (skb_dst(skb)) + route_lookup_dev = skb_dst(skb)->dev; + return route_lookup_dev; +} + static struct rtable *icmp_route_lookup(struct net *net, struct flowi4 *fl4, struct sk_buff *skb_in, @@ -465,6 +482,7 @@ static struct rtable *icmp_route_lookup(struct net *net, int type, int code, struct icmp_bxm *param) { + struct net_device *route_lookup_dev; struct rtable *rt, *rt2; struct flowi4 fl4_dec; int err; @@ -479,7 +497,8 @@ static struct rtable *icmp_route_lookup(struct net *net, fl4->flowi4_proto = IPPROTO_ICMP; fl4->fl4_icmp_type = type; fl4->fl4_icmp_code = code; - fl4->flowi4_oif = l3mdev_master_ifindex(skb_dst(skb_in)->dev); + route_lookup_dev = icmp_get_route_lookup_dev(skb_in); + fl4->flowi4_oif = l3mdev_master_ifindex(route_lookup_dev); security_skb_classify_flow(skb_in, flowi4_to_flowi(fl4)); rt = ip_route_output_key_hash(net, fl4, skb_in); @@ -503,7 +522,7 @@ static struct rtable *icmp_route_lookup(struct net *net, if (err) goto relookup_failed; - if (inet_addr_type_dev_table(net, skb_dst(skb_in)->dev, + if (inet_addr_type_dev_table(net, route_lookup_dev, fl4_dec.saddr) == RTN_LOCAL) { rt2 = __ip_route_output_key(net, &fl4_dec); if (IS_ERR(rt2)) From patchwork Fri Sep 18 18:18:00 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mathieu Desnoyers X-Patchwork-Id: 1367156 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=efficios.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=efficios.com header.i=@efficios.com header.a=rsa-sha256 header.s=default header.b=cJHLomqS; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by ozlabs.org (Postfix) with ESMTP id 4BtMh60hBmz9sSW for ; Sat, 19 Sep 2020 04:25:10 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726306AbgIRSZE (ORCPT ); Fri, 18 Sep 2020 14:25:04 -0400 Received: from mail.efficios.com ([167.114.26.124]:59982 "EHLO mail.efficios.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726152AbgIRSZC (ORCPT ); Fri, 18 Sep 2020 14:25:02 -0400 Received: from localhost (localhost [127.0.0.1]) by mail.efficios.com (Postfix) with ESMTP id B12732727B5; Fri, 18 Sep 2020 14:18:09 -0400 (EDT) Received: from mail.efficios.com ([127.0.0.1]) by localhost (mail03.efficios.com [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id CPhPFy4Gzcoo; Fri, 18 Sep 2020 14:18:09 -0400 (EDT) Received: from localhost (localhost [127.0.0.1]) by mail.efficios.com (Postfix) with ESMTP id F0FEB2725C6; Fri, 18 Sep 2020 14:18:08 -0400 (EDT) DKIM-Filter: OpenDKIM Filter v2.10.3 mail.efficios.com F0FEB2725C6 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=efficios.com; s=default; t=1600453088; bh=yrIS8m3rknvWl9hL7dggbRVvRNlMDfO6Jcr+vUoXE2M=; h=From:To:Date:Message-Id; b=cJHLomqSDxRtxQI2tUIxCkf++c9Ln5Bhg1ClmYDc1Oi4ynMdpMcffQKX13NGydfeW N4DE7v/0Ia86k47S9VA7rTFGD78Ou4GPfBdXMcLLOip5xs/HFC7wHB/pp/MGnt1BV+ yjMhQuKMmvYuLvRchCL30ZWqkwhm1DoOTWW/dyvtXbEluPnRBQkjC35qA/EmgAo5q/ rZsppCC4XtoBAdRqhtg0s2f6AWq/R+SyExldJXkUx7D3cKrsJSC/YdqEw8Wd2c1KXQ SIVrFkBNWNgNfvE0o9h29XMtb2Yka9hAV61hqhw6dd9eLd4xuD5PRfg8t18aXUmlkU 7MK/A/1pDfWHA== X-Virus-Scanned: amavisd-new at efficios.com Received: from mail.efficios.com ([127.0.0.1]) by localhost (mail03.efficios.com [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id kAV2HjzB4I_W; Fri, 18 Sep 2020 14:18:08 -0400 (EDT) Received: from localhost.localdomain (192-222-181-218.qc.cable.ebox.net [192.222.181.218]) by mail.efficios.com (Postfix) with ESMTPSA id D01B92727B2; Fri, 18 Sep 2020 14:18:08 -0400 (EDT) From: Mathieu Desnoyers To: David Ahern , "David S . Miller" , netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org, Mathieu Desnoyers Subject: [RFC PATCH v2 2/3] ipv6/icmp: l3mdev: Perform icmp error route lookup on source device routing table (v2) Date: Fri, 18 Sep 2020 14:18:00 -0400 Message-Id: <20200918181801.2571-3-mathieu.desnoyers@efficios.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20200918181801.2571-1-mathieu.desnoyers@efficios.com> References: <20200918181801.2571-1-mathieu.desnoyers@efficios.com> Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org As per RFC4443, the destination address field for ICMPv6 error messages is copied from the source address field of the invoking packet. In configurations with Virtual Routing and Forwarding tables, looking up which routing table to use for sending ICMPv6 error messages is currently done by using the destination net_device. If the source and destination interfaces are within separate VRFs, or one in the global routing table and the other in a VRF, looking up the source address of the invoking packet in the destination interface's routing table will fail if the destination interface's routing table contains no route to the invoking packet's source address. One observable effect of this issue is that traceroute6 does not work in the following cases: - Route leaking between global routing table and VRF - Route leaking between VRFs Use the source device routing table when sending ICMPv6 error messages. [ In the context of ipv4, it has been pointed out that a similar issue may exist with ICMP errors triggered when forwarding between network namespaces. It would be worthwhile to investigate whether ipv6 has similar issues, but is outside of the scope of this investigation. ] [ Testing shows that similar issues exist with ipv6 unreachable / fragmentation needed messages. However, investigation of this additional failure mode is beyond this investigation's scope. ] Link: https://tools.ietf.org/html/rfc4443 Signed-off-by: Mathieu Desnoyers Cc: David Ahern Cc: David S. Miller Cc: netdev@vger.kernel.org --- Changes since v1: - Introduce icmp6_get_route_lookup_dev. - Use skb->dev for routing table lookup, because it is guaranteed to be non-NULL. --- net/ipv6/icmp.c | 7 +++++-- net/ipv6/ip6_output.c | 2 -- 2 files changed, 5 insertions(+), 4 deletions(-) diff --git a/net/ipv6/icmp.c b/net/ipv6/icmp.c index a4e4912ad607..91209a2760aa 100644 --- a/net/ipv6/icmp.c +++ b/net/ipv6/icmp.c @@ -501,8 +501,11 @@ void icmp6_send(struct sk_buff *skb, u8 type, u8 code, __u32 info, if (__ipv6_addr_needs_scope_id(addr_type)) { iif = icmp6_iif(skb); } else { - dst = skb_dst(skb); - iif = l3mdev_master_ifindex(dst ? dst->dev : skb->dev); + /* + * The source device is used for looking up which routing table + * to use for sending an ICMP error. + */ + iif = l3mdev_master_ifindex(skb->dev); } /* diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index c78e67d7747f..cd623068de53 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -468,8 +468,6 @@ int ip6_forward(struct sk_buff *skb) * check and decrement ttl */ if (hdr->hop_limit <= 1) { - /* Force OUTPUT device used as source address */ - skb->dev = dst->dev; icmpv6_send(skb, ICMPV6_TIME_EXCEED, ICMPV6_EXC_HOPLIMIT, 0); __IP6_INC_STATS(net, idev, IPSTATS_MIB_INHDRERRORS); From patchwork Fri Sep 18 18:18:01 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mathieu Desnoyers X-Patchwork-Id: 1367157 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=efficios.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=efficios.com header.i=@efficios.com header.a=rsa-sha256 header.s=default header.b=MlzvjXCw; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by ozlabs.org (Postfix) with ESMTP id 4BtMhD6gYkz9sSW for ; Sat, 19 Sep 2020 04:25:16 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726354AbgIRSZN (ORCPT ); Fri, 18 Sep 2020 14:25:13 -0400 Received: from mail.efficios.com ([167.114.26.124]:59978 "EHLO mail.efficios.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726139AbgIRSZD (ORCPT ); Fri, 18 Sep 2020 14:25:03 -0400 Received: from localhost (localhost [127.0.0.1]) by mail.efficios.com (Postfix) with ESMTP id 7BE052722EB; Fri, 18 Sep 2020 14:18:12 -0400 (EDT) Received: from mail.efficios.com ([127.0.0.1]) by localhost (mail03.efficios.com [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id h_HXxvXnVAXA; Fri, 18 Sep 2020 14:18:11 -0400 (EDT) Received: from localhost (localhost [127.0.0.1]) by mail.efficios.com (Postfix) with ESMTP id BC9AF2725C7; Fri, 18 Sep 2020 14:18:11 -0400 (EDT) DKIM-Filter: OpenDKIM Filter v2.10.3 mail.efficios.com BC9AF2725C7 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=efficios.com; s=default; t=1600453091; bh=y+aC+JhQOFlD15KBj3nejJKSzsxsVtGLbUvCAx5zp1k=; h=From:To:Date:Message-Id; b=MlzvjXCwBFTPgEo8YvgcN0Zt6prR8pn67FKJQf3bLXd4nt8k7F3JRgdCeZxgynOcy fbvoMBAez9YyXx1/8Z8+pKcJoVjec1+QQO8ZqJPhnj4rJTBNTkkKReCj3q4jSKCdPS W9U/B9Ppbh3kPQ60yN2SdGa4zgfOALgmF4eJBEnSVBQYF9jI1elw8az9pCUxmzWq63 k3zF2+IlhkmgxsLhDhbBvKxUJJ94fzjtgPGN8suFwC7rBBm9em65g61/ylxVZyL56S lKwcluZRivthFowWAeV04rLlhbhv7zz/xF2GITGkVVkOdhRV9gIOv0IQKwo6Za4fqG PzhhRqkAvmCAg== X-Virus-Scanned: amavisd-new at efficios.com Received: from mail.efficios.com ([127.0.0.1]) by localhost (mail03.efficios.com [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id dBi8f60TKXzG; Fri, 18 Sep 2020 14:18:11 -0400 (EDT) Received: from localhost.localdomain (192-222-181-218.qc.cable.ebox.net [192.222.181.218]) by mail.efficios.com (Postfix) with ESMTPSA id EF18D2726CF; Fri, 18 Sep 2020 14:18:08 -0400 (EDT) From: Mathieu Desnoyers To: David Ahern , "David S . Miller" , netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org, Michael Jeanson Subject: [RFC PATCH v2 3/3] selftests: Add VRF icmp error route lookup test Date: Fri, 18 Sep 2020 14:18:01 -0400 Message-Id: <20200918181801.2571-4-mathieu.desnoyers@efficios.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20200918181801.2571-1-mathieu.desnoyers@efficios.com> References: <20200918181801.2571-1-mathieu.desnoyers@efficios.com> Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Michael Jeanson The objective is to check that the incoming vrf routing table is selected to send an ICMP error back to the source. We test two scenarios: when the ttl of a packet reaches 1 while it is forwarded between different vrfs and when a packet is bigger than the mtu of the second interface is forwarded between different vrfs. The first ttl test sends a ping with a ttl of 1 from h1 to h2 and parses the output of the command to check that a ttl expired error is received. The second ttl test runs traceroute from h1 to h2 and parses the output to check for a hop on r1. The mtu test sends a ping with a payload of 1450 from h1 to h2, through r1 which has an interface with a mtu of 1400 and parses the output of the command to check that a fragmentation needed error is received. Signed-off-by: Michael Jeanson Cc: David Ahern Cc: David S. Miller Cc: netdev@vger.kernel.org --- Changes since v1: - Formating and indentation fixes - Added '-t' to getopts - Reworked verbose output of grep'd commands with a new function - Expanded ip command names - Added fragmentation tests --- tools/testing/selftests/net/Makefile | 1 + .../selftests/net/vrf_icmp_error_route.sh | 475 ++++++++++++++++++ 2 files changed, 476 insertions(+) create mode 100755 tools/testing/selftests/net/vrf_icmp_error_route.sh diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile index 9491bbaa0831..a716fbf780b3 100644 --- a/tools/testing/selftests/net/Makefile +++ b/tools/testing/selftests/net/Makefile @@ -19,6 +19,7 @@ TEST_PROGS += txtimestamp.sh TEST_PROGS += vrf-xfrm-tests.sh TEST_PROGS += rxtimestamp.sh TEST_PROGS += devlink_port_split.py +TEST_PROGS += vrf_icmp_error_route.sh TEST_PROGS_EXTENDED := in_netns.sh TEST_GEN_FILES = socket nettest TEST_GEN_FILES += psock_fanout psock_tpacket msg_zerocopy reuseport_addr_any diff --git a/tools/testing/selftests/net/vrf_icmp_error_route.sh b/tools/testing/selftests/net/vrf_icmp_error_route.sh new file mode 100755 index 000000000000..42c412bd79ab --- /dev/null +++ b/tools/testing/selftests/net/vrf_icmp_error_route.sh @@ -0,0 +1,475 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 +# +# Copyright (c) 2019 David Ahern . All rights reserved. +# Copyright (c) 2020 Michael Jeanson . All rights reserved. +# +# blue red +# .253 +----+ .253 +# +----| r1 |----+ +# | +----+ | +# +----+ | | +----+ +# | h1 |--------------+ +--------------| h2 | +# +----+ .1 | | .2 +----+ +# 172.16.1/24 | +----+ | 172.16.2/24 +# 2001:db8:16:1/64 +----| r2 |----+ 2001:db8:16:2/64 +# .254 +----+ .254 +# +# +# Route from h1 to h2 goes through r1, incoming vrf blue has a route to the +# outgoing vrf red for the n2 network but red doesn't have a route back to n1. +# Route from h2 to h1 goes through r2. +# +# The objective is to check that the incoming vrf routing table is selected +# to send an ICMP error back to the source when the ttl of a packet reaches 1 +# while it is forwarded between different vrfs. +# +# The first test sends a ping with a ttl of 1 from h1 to h2 and parses the +# output of the command to check that a ttl expired error is received. +# +# The second test runs traceroute from h1 to h2 and parses the output to check +# for a hop on r1. +# +# Requires CONFIG_NET_VRF, CONFIG_VETH, CONFIG_BRIDGE and CONFIG_NET_NS. + +VERBOSE=0 +PAUSE_ON_FAIL=no + +H1_N1_IP=172.16.1.1 +R1_N1_IP=172.16.1.253 +R2_N1_IP=172.16.1.254 + +H1_N1_IP6=2001:db8:16:1::1 +R1_N1_IP6=2001:db8:16:1::253 +R2_N1_IP6=2001:db8:16:1::254 + +H2_N2=172.16.2.0/24 +H2_N2_6=2001:db8:16:2::/64 + +H2_N2_IP=172.16.2.2 +R1_N2_IP=172.16.2.253 +R2_N2_IP=172.16.2.254 + +H2_N2_IP6=2001:db8:16:2::2 +R1_N2_IP6=2001:db8:16:2::253 +R2_N2_IP6=2001:db8:16:2::254 + +################################################################################ +# helpers + +log_section() +{ + echo + echo "###########################################################################" + echo "$*" + echo "###########################################################################" + echo +} + +log_test() +{ + local rc=$1 + local expected=$2 + local msg="$3" + + if [ "${rc}" -eq "${expected}" ]; then + printf "TEST: %-60s [ OK ]\n" "${msg}" + nsuccess=$((nsuccess+1)) + else + ret=1 + nfail=$((nfail+1)) + printf "TEST: %-60s [FAIL]\n" "${msg}" + if [ "${PAUSE_ON_FAIL}" = "yes" ]; then + echo + echo "hit enter to continue, 'q' to quit" + read -r a + [ "$a" = "q" ] && exit 1 + fi + fi +} + +run_cmd() +{ + local cmd="$*" + local out + local rc + + if [ "$VERBOSE" = "1" ]; then + echo "COMMAND: $cmd" + fi + + out=$(eval $cmd 2>&1) + rc=$? + if [ "$VERBOSE" = "1" ] && [ -n "$out" ]; then + echo "$out" + fi + + [ "$VERBOSE" = "1" ] && echo + + return $rc +} + +run_cmd_grep() +{ + local grep_pattern="$1" + shift + local cmd="$*" + local out + local rc + + if [ "$VERBOSE" = "1" ]; then + echo "COMMAND: $cmd" + fi + + out=$(eval $cmd 2>&1) + if [ "$VERBOSE" = "1" ] && [ -n "$out" ]; then + echo "$out" + fi + + echo "$out" | grep -q "$grep_pattern" + rc=$? + + [ "$VERBOSE" = "1" ] && echo + + return $rc +} + +################################################################################ +# setup and teardown + +cleanup() +{ + local ns + + setup=0 + + for ns in h1 h2 r1 r2; do + ip netns del $ns 2>/dev/null + done +} + +setup_vrf() +{ + local ns=$1 + + ip -netns "${ns}" rule del pref 0 + ip -netns "${ns}" rule add pref 32765 from all lookup local + ip -netns "${ns}" -6 rule del pref 0 + ip -netns "${ns}" -6 rule add pref 32765 from all lookup local +} + +create_vrf() +{ + local ns=$1 + local vrf=$2 + local table=$3 + + ip -netns "${ns}" link add "${vrf}" type vrf table "${table}" + ip -netns "${ns}" link set "${vrf}" up + ip -netns "${ns}" route add vrf "${vrf}" unreachable default metric 8192 + ip -netns "${ns}" -6 route add vrf "${vrf}" unreachable default metric 8192 + + ip -netns "${ns}" addr add 127.0.0.1/8 dev "${vrf}" + ip -netns "${ns}" -6 addr add ::1 dev "${vrf}" nodad +} + +setup() +{ + local ns + + if [ "${setup}" -eq 1 ]; then + return 0 + fi + + # make sure we are starting with a clean slate + cleanup + + setup=1 + + # + # create nodes as namespaces + # + for ns in h1 h2 r1 r2; do + ip netns add $ns + ip -netns $ns link set lo up + + case "${ns}" in + h[12]) ip netns exec $ns sysctl -q -w net.ipv6.conf.all.forwarding=0 + ip netns exec $ns sysctl -q -w net.ipv6.conf.all.keep_addr_on_down=1 + ;; + r[12]) ip netns exec $ns sysctl -q -w net.ipv4.ip_forward=1 + ip netns exec $ns sysctl -q -w net.ipv6.conf.all.forwarding=1 + esac + done + + # + # create interconnects + # + ip -netns h1 link add eth0 type veth peer name r1h1 + ip -netns h1 link set r1h1 netns r1 name eth0 up + + ip -netns h1 link add eth1 type veth peer name r2h1 + ip -netns h1 link set r2h1 netns r2 name eth0 up + + ip -netns h2 link add eth0 type veth peer name r1h2 + ip -netns h2 link set r1h2 netns r1 name eth1 up + + ip -netns h2 link add eth1 type veth peer name r2h2 + ip -netns h2 link set r2h2 netns r2 name eth1 up + + # + # h1 + # + ip -netns h1 link add br0 type bridge + ip -netns h1 link set br0 up + ip -netns h1 addr add dev br0 ${H1_N1_IP}/24 + ip -netns h1 -6 addr add dev br0 ${H1_N1_IP6}/64 nodad + ip -netns h1 link set eth0 master br0 up + ip -netns h1 link set eth1 master br0 up + + # h1 to h2 via r1 + ip -netns h1 route add ${H2_N2} via ${R1_N1_IP} dev br0 + ip -netns h1 -6 route add ${H2_N2_6} via "${R1_N1_IP6}" dev br0 + + # + # h2 + # + ip -netns h2 link add br0 type bridge + ip -netns h2 link set br0 up + ip -netns h2 addr add dev br0 ${H2_N2_IP}/24 + ip -netns h2 -6 addr add dev br0 ${H2_N2_IP6}/64 nodad + ip -netns h2 link set eth0 master br0 up + ip -netns h2 link set eth1 master br0 up + + ip -netns h2 route add default via ${R2_N2_IP} dev br0 + ip -netns h2 -6 route add default via ${R2_N2_IP6} dev br0 + + # + # r1 + # + setup_vrf r1 + create_vrf r1 blue 1101 + create_vrf r1 red 1102 + ip -netns r1 link set mtu 1400 dev eth1 + ip -netns r1 link set eth0 vrf blue up + ip -netns r1 link set eth1 vrf red up + ip -netns r1 addr add dev eth0 ${R1_N1_IP}/24 + ip -netns r1 -6 addr add dev eth0 ${R1_N1_IP6}/64 nodad + ip -netns r1 addr add dev eth1 ${R1_N2_IP}/24 + ip -netns r1 -6 addr add dev eth1 ${R1_N2_IP6}/64 nodad + + # Route leak from blue to red + ip -netns r1 route add vrf blue ${H2_N2} dev red + ip -netns r1 -6 route add vrf blue ${H2_N2_6} dev red + + # + # r2 + # + ip -netns r2 addr add dev eth0 ${R2_N1_IP}/24 + ip -netns r2 -6 addr add dev eth0 ${R2_N1_IP6}/64 nodad + ip -netns r2 addr add dev eth1 ${R2_N2_IP}/24 + ip -netns r2 -6 addr add dev eth1 ${R2_N2_IP6}/64 nodad + + # Wait for ip config to settle + sleep 2 +} + +check_connectivity4() +{ + ip netns exec h1 ping -c1 -w1 ${H2_N2_IP} >/dev/null 2>&1 +} + +check_connectivity6() +{ + ip netns exec h1 "${ping6}" -c1 -w1 ${H2_N2_IP6} >/dev/null 2>&1 +} + +ipv4_traceroute() +{ + log_section "IPv4: VRF ICMP error route lookup traceroute" + + if [ ! -x "$(command -v traceroute)" ]; then + echo "SKIP: Could not run IPV4 test without traceroute" + return + fi + + setup + + # verify connectivity + if ! check_connectivity4; then + echo "Error: Basic connectivity is broken" + ret=1 + return + fi + + run_cmd_grep "${R1_N1_IP}" ip netns exec h1 traceroute ${H2_N2_IP} + log_test $? 0 "Traceroute reports a hop on r1" +} + +ipv6_traceroute() +{ + log_section "IPv6: VRF ICMP error route lookup traceroute" + + if [ ! -x "$(command -v traceroute6)" ]; then + echo "SKIP: Could not run IPV6 test without traceroute6" + return + fi + + setup + + # verify connectivity + if ! check_connectivity6; then + echo "Error: Basic connectivity is broken" + ret=1 + return + fi + + run_cmd_grep "${R1_N1_IP6}" ip netns exec h1 traceroute6 ${H2_N2_IP6} + log_test $? 0 "Traceroute6 reports a hop on r1" +} + +ipv4_ping_ttl() +{ + log_section "IPv4: VRF ICMP ttl error route lookup ping" + + setup + + # verify connectivity + if ! check_connectivity4; then + echo "Error: Basic connectivity is broken" + ret=1 + return + fi + + run_cmd_grep "Time to live exceeded" ip netns exec h1 ping -t1 -c1 -W2 ${H2_N2_IP} + log_test $? 0 "Ping received ICMP ttl exceeded" +} + +ipv4_ping_frag() +{ + log_section "IPv4: VRF ICMP fragmentation error route lookup ping" + + setup + + # verify connectivity + if ! check_connectivity4; then + echo "Error: Basic connectivity is broken" + ret=1 + return + fi + + run_cmd_grep "Frag needed" ip netns exec h1 ping -s 1450 -Mdo -c1 -W2 ${H2_N2_IP} + log_test $? 0 "Ping received ICMP frag needed" +} + +ipv6_ping_ttl() +{ + log_section "IPv6: VRF ICMP ttl error route lookup ping" + + setup + + # verify connectivity + if ! check_connectivity6; then + echo "Error: Basic connectivity is broken" + ret=1 + return + fi + + run_cmd_grep "Time exceeded: Hop limit" ip netns exec h1 "${ping6}" -t1 -c1 -W2 ${H2_N2_IP6} + log_test $? 0 "Ping received ICMP frag" +} + +ipv6_ping_frag() +{ + log_section "IPv6: VRF ICMP fragmentation error route lookup ping" + + setup + + # verify connectivity + if ! check_connectivity6; then + echo "Error: Basic connectivity is broken" + ret=1 + return + fi + + run_cmd_grep "Packet too big" ip netns exec h1 "${ping6}" -s 1450 -Mdo -c1 -W2 ${H2_N2_IP6} + log_test $? 0 "Ping received ICMP frag needed" +} + +################################################################################ +# usage + +usage() +{ + cat < /dev/null 2>&1 && ping6=$(command -v ping6) || ping6=$(command -v ping) + +TESTS_IPV4="ipv4_ping_ttl ipv4_ping_frag ipv4_traceroute" +TESTS_IPV6="ipv6_ping_ttl ipv6_ping_frag ipv6_traceroute" + +ret=0 +nsuccess=0 +nfail=0 +setup=0 + +while getopts :46t:pvh o +do + case $o in + 4) TESTS=ipv4;; + 6) TESTS=ipv6;; + t) TESTS=$OPTARG;; + p) PAUSE_ON_FAIL=yes;; + v) VERBOSE=1;; + h) usage; exit 0;; + *) usage; exit 1;; + esac +done + +# +# show user test config +# +if [ -z "$TESTS" ]; then + TESTS="$TESTS_IPV4 $TESTS_IPV6" +elif [ "$TESTS" = "ipv4" ]; then + TESTS="$TESTS_IPV4" +elif [ "$TESTS" = "ipv6" ]; then + TESTS="$TESTS_IPV6" +fi + +for t in $TESTS +do + case $t in + ipv4_ping_ttl|ping) ipv4_ping_ttl;;& + ipv4_ping_frag|ping) ipv4_ping_frag;;& + ipv4_traceroute|traceroute) ipv4_traceroute;;& + + ipv6_ping_ttl|ping) ipv6_ping_ttl;;& + ipv6_ping_frag|ping) ipv6_ping_frag;;& + ipv6_traceroute|traceroute) ipv6_traceroute;;& + + # setup namespaces and config, but do not run any tests + setup) setup; exit 0;; + + help) echo "Test names: $TESTS"; exit 0;; + esac +done + +cleanup + +printf "\nTests passed: %3d\n" ${nsuccess} +printf "Tests failed: %3d\n" ${nfail} + +exit $ret