Message ID | 20190515004610.102519-1-tracywwnj@gmail.com |
---|---|
State | Superseded |
Delegated to: | David Miller |
Series | [net] ipv6: fix src addr routing with the exception table |
On 5/14/19 6:46 PM, Wei Wang wrote:
> From: Wei Wang <weiwan@google.com>
>
> When inserting route cache into the exception table, the key is
> generated with both src_addr and dest_addr with src addr routing.
> However, current logic always assumes the src_addr used to generate the
> key is a /128 host address. This is not true in the following scenarios:
> 1. When the route is a gateway route or does not have next hop.
>    (rt6_is_gw_or_nonexthop() == false)
> 2. When calling ip6_rt_cache_alloc(), saddr is passed in as NULL.
> This means, when looking for a route cache in the exception table, we
> have to do the lookup twice: first time with the passed in /128 host
> address, second time with the src_addr stored in fib6_info.
>
> This solves the pmtu discovery issue reported by Mikael Magnusson where
> a route cache with a lower mtu info is created for a gateway route with
> src addr. However, the lookup code is not able to find this route cache.
>
> Fixes: 2b760fcf5cfb ("ipv6: hook up exception table to store dst cache")
> Reported-by: Mikael Magnusson <mikael.kernel@lists.m7n.se>
> Bisected-by: David Ahern <dsahern@gmail.com>
> Signed-off-by: Wei Wang <weiwan@google.com>
> Acked-by: Eric Dumazet <edumazet@google.com>
> ---
>  net/ipv6/route.c | 33 ++++++++++++++++++++++++++++-----
>  1 file changed, 28 insertions(+), 5 deletions(-)
>
> [...]

What about rt6_remove_exception_rt?

You can add a 'cache' hook to ip/iproute.c to delete the cached routes
and verify that it works. I seem to have misplaced my patch to do it.
On 5/15/19 9:56 AM, David Ahern wrote:
> You can add a 'cache' hook to ip/iproute.c to delete the cached routes
> and verify that it works. I seem to have misplaced my patch to do it.

found it.

From 7a328753a93321a07a5228fb32ed881d82d7a537 Mon Sep 17 00:00:00 2001
From: David Ahern <dsahern@gmail.com>
Date: Mon, 6 May 2019 08:09:01 -0700
Subject: [PATCH iproute2-next] route: Add cache keyword to iproute_modify

Kernel supports deleting cached routes (e.g., exceptions). Add cache
keyword to iproute_modify to set RTM_F_CLONED in the request.

Signed-off-by: David Ahern <dsahern@gmail.com>
---
 ip/iproute.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/ip/iproute.c b/ip/iproute.c
index 2b3dcc5dbd53..d7a812a39047 100644
--- a/ip/iproute.c
+++ b/ip/iproute.c
@@ -74,7 +74,7 @@ static void usage(void)
 	" ip route { add | del | change | append | replace } ROUTE\n"
 	"SELECTOR := [ root PREFIX ] [ match PREFIX ] [ exact PREFIX ]\n"
 	" [ table TABLE_ID ] [ vrf NAME ] [ proto RTPROTO ]\n"
-	" [ type TYPE ] [ scope SCOPE ]\n"
+	" [ type TYPE ] [ scope SCOPE ] [ cache ]\n"
 	"ROUTE := NODE_SPEC [ INFO_SPEC ]\n"
 	"NODE_SPEC := [ TYPE ] PREFIX [ tos TOS ]\n"
 	" [ table TABLE_ID ] [ proto RTPROTO ]\n"
@@ -1444,6 +1444,8 @@ static int iproute_modify(int cmd, unsigned int flags, int argc, char **argv)
 			if (fastopen_no_cookie != 1 && fastopen_no_cookie != 0)
 				invarg("\"fastopen_no_cookie\" value should be 0 or 1\n", *argv);
 			rta_addattr32(mxrta, sizeof(mxbuf), RTAX_FASTOPEN_NO_COOKIE, fastopen_no_cookie);
+		} else if (!strcmp(*argv, "cache")) {
+			req.r.rtm_flags |= RTM_F_CLONED;
 		} else {
 			int type;
 			inet_prefix dst;
> What about rt6_remove_exception_rt?
>
> You can add a 'cache' hook to ip/iproute.c to delete the cached routes
> and verify that it works. I seem to have misplaced my patch to do it.

I don't think rt6_remove_exception_rt() needs any change. That is
because it gets the route cache rt6_info as its input parameter, not a
specific saddr or daddr from a flow or a packet. It is guaranteed that
the hash used in the exception table is generated from
rt6_info->rt6i_dst and rt6_info->rt6i_src.

For the case where the user tries to delete a cached route,
ip6_route_del() calls rt6_find_cached_rt() to find the cached route
first, and rt6_find_cached_rt() already takes care of finding the
cached route using both the passed-in src addr and f6i->fib6_src.
So I think we are good here.

From: David Ahern <dsahern@gmail.com>
Date: Wed, May 15, 2019 at 9:38 AM
To: Wei Wang, David Miller, <netdev@vger.kernel.org>
Cc: Martin KaFai Lau, Wei Wang, Mikael Magnusson, Eric Dumazet

> On 5/15/19 9:56 AM, David Ahern wrote:
> > You can add a 'cache' hook to ip/iproute.c to delete the cached routes
> > and verify that it works. I seem to have misplaced my patch to do it.
>
> found it.
From: Wei Wang <weiwan@google.com>
Date: Wed, May 15, 2019 at 10:25 AM
To: David Ahern
Cc: Wei Wang, David Miller, Linux Kernel Network Developers, Martin KaFai Lau, Mikael Magnusson, Eric Dumazet

> [...]
> > On 5/15/19 9:56 AM, David Ahern wrote:
> > > You can add a 'cache' hook to ip/iproute.c to delete the cached routes
> > > and verify that it works. I seem to have misplaced my patch to do it.
> >
> > found it.

Thanks. I patched it to iproute2 and tried it.
The route cache is removed by doing:
ip netns exec a ./ip -6 route del fd01::c from fd00::a cache
On 5/15/19 11:28 AM, Wei Wang wrote:
> [...]
> Thanks. I patched it to iproute2 and tried it.
> The route cache is removed by doing:
> ip netns exec a ./ip -6 route del fd01::c from fd00::a cache
>

you have to pass in a device. The first line in ip6_del_cached_rt:

	if (cfg->fc_ifindex && rt->dst.dev->ifindex != cfg->fc_ifindex)
		goto out;

'ip route get' is one way to check if it has been deleted. We really
need to add support for dumping exception routes.
From: David Ahern <dsahern@gmail.com>
Date: Wed, May 15, 2019 at 10:33 AM
To: Wei Wang
Cc: Wei Wang, David Miller, Linux Kernel Network Developers, Martin KaFai Lau, Mikael Magnusson, Eric Dumazet

> On 5/15/19 11:28 AM, Wei Wang wrote:
> [...]
> > Thanks. I patched it to iproute2 and tried it.
> > The route cache is removed by doing:
> > ip netns exec a ./ip -6 route del fd01::c from fd00::a cache
> >
>
> you have to pass in a device. The first line in ip6_del_cached_rt:
>
>         if (cfg->fc_ifindex && rt->dst.dev->ifindex != cfg->fc_ifindex)
>                 goto out;
>
> 'ip route get' is one way to check if it has been deleted. We really
> need to add support for dumping exception routes.

Without passing in dev, fc_ifindex = 0. So it won't goto out. Isn't it?
The way I checked if the route cache is being removed is by doing:
ip netns exec a cat /proc/net/rt6_stats
The 5th counter is the number of cached routes right now in the system.

The output I get after I run the reproducer:
# ip netns exec a cat /proc/net/rt6_stats
000b 0006 000e 0006 0001 0005 0000
# ip netns exec a ./ip -6 route del fd01::c from fd00::/64 cache
# ip netns exec a cat /proc/net/rt6_stats
000b 0006 0012 0006 0000 0004 0000

The same behavior if I pass in dev:
# ip netns exec a cat /proc/net/rt6_stats
000b 0006 000c 0006 0001 0004 0000
# ip netns exec a ./ip -6 route del fd01::c from fd00::/64 dev vethab cache
# ip netns exec a cat /proc/net/rt6_stats
000b 0006 0013 0006 0000 0003 0000
On 5/15/19 11:45 AM, Wei Wang wrote:
>>
>> you have to pass in a device. The first line in ip6_del_cached_rt:
>>
>>         if (cfg->fc_ifindex && rt->dst.dev->ifindex != cfg->fc_ifindex)
>>                 goto out;
>>
>> 'ip route get' is one way to check if it has been deleted. We really
>> need to add support for dumping exception routes.
>
> Without passing in dev, fc_ifindex = 0. So it won't goto out. Isn't it?

ugh, yes, blew right past that.

> The way I checked if the route cache is being removed is by doing:
> ip netns exec a cat /proc/net/rt6_stats
> The 5th counter is the number of cached routes right now in the system.
>
> The output I get after I run the reproducer:
> # ip netns exec a cat /proc/net/rt6_stats
> 000b 0006 000e 0006 0001 0005 0000
> # ip netns exec a ./ip -6 route del fd01::c from fd00::/64 cache
> # ip netns exec a cat /proc/net/rt6_stats
> 000b 0006 0012 0006 0000 0004 0000
>
> The same behavior if I pass in dev:
> # ip netns exec a cat /proc/net/rt6_stats
> 000b 0006 000c 0006 0001 0004 0000
> # ip netns exec a ./ip -6 route del fd01::c from fd00::/64 dev vethab cache
> # ip netns exec a cat /proc/net/rt6_stats
> 000b 0006 0013 0006 0000 0003 0000
>

ok.

Reviewed-by: David Ahern <dsahern@gmail.com>
On Tue, May 14, 2019 at 05:46:10PM -0700, Wei Wang wrote:
> From: Wei Wang <weiwan@google.com>
>
> [...]
> @@ -2683,12 +2696,22 @@ u32 ip6_mtu_from_fib6(const struct fib6_result *res,
>  #ifdef CONFIG_IPV6_SUBTREES
>  	if (f6i->fib6_src.plen)
>  		src_key = saddr;
> +find_ex:
>  #endif
> -
>  	bucket = rcu_dereference(f6i->rt6i_exception_bucket);
>  	rt6_ex = __rt6_find_exception_rcu(&bucket, daddr, src_key);
>  	if (rt6_ex && !rt6_check_expired(rt6_ex->rt6i))
>  		mtu = dst_metric_raw(&rt6_ex->rt6i->dst, RTAX_MTU);
> +#ifdef CONFIG_IPV6_SUBTREES
> +	/* Similar logic as in rt6_find_cached_rt().
> +	 * We need to use f6i->fib6_src to redo lookup in exception
> +	 * table if saddr did not yield any result.
> +	 */
> +	else if (src_key == saddr) {
> +		src_key = &f6i->fib6_src.addr;
> +		goto find_ex;
> +	}
> +#endif
Nit.
Instead of repeating this retry logic,
can it be consolidated into __rt6_find_exception_xxx()
by passing fib6_src.addr as a secondary matching
saddr?

>
>  	if (likely(!mtu)) {
>  		struct net_device *dev = nh->fib_nh_dev;
> --
> 2.21.0.1020.gf2820cf01a-goog
>
On Wed, May 15, 2019 at 2:51 PM Martin Lau <kafai@fb.com> wrote:
>
> On Tue, May 14, 2019 at 05:46:10PM -0700, Wei Wang wrote:
> > From: Wei Wang <weiwan@google.com>
> >
> > [...]
> > +#ifdef CONFIG_IPV6_SUBTREES
> > +	/* Similar logic as in rt6_find_cached_rt().
> > +	 * We need to use f6i->fib6_src to redo lookup in exception
> > +	 * table if saddr did not yield any result.
> > +	 */
> > +	else if (src_key == saddr) {
> > +		src_key = &f6i->fib6_src.addr;
> > +		goto find_ex;
> > +	}
> > +#endif
> Nit.
> Instead of repeating this retry logic,
> can it be consolidated into __rt6_find_exception_xxx()
> by passing fib6_src.addr as a secondary matching
> saddr?
>
Thanks Martin.
Changing __rt6_find_exception_xxx() might not be easy because other
callers of this function do not really need to back off and use
another saddr.
And the validation of the result is a bit different for different callers.
What about adding a new helper for the above 2 cases and just calling that
from both places?
On Wed, May 15, 2019 at 5:07 PM David Ahern <dsahern@gmail.com> wrote:
>
> On 5/15/19 6:03 PM, Wei Wang wrote:
> > Thanks Martin.
> > Changing __rt6_find_exception_xxx() might not be easy because other
> > callers of this function do not really need to back off and use
> > another saddr.
> > And the validation of the result is a bit different for different callers.
> > What about adding a new helper for the above 2 cases and just calling that
> > from both places?
>
> Since this needs to be backported to stable releases, I would say
> simplest patch for that is best.
>
> I have changes queued for this area once net-next opens; I can look at
> consolidating as part of that.

Thanks David...
In that case, I would prefer to stick with the current version.
Martin, what do you think?
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 23a20d62daac..c36900a07a78 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1574,23 +1574,36 @@ static struct rt6_info *rt6_find_cached_rt(const struct fib6_result *res,
 	struct rt6_exception *rt6_ex;
 	struct rt6_info *ret = NULL;
 
-	bucket = rcu_dereference(res->f6i->rt6i_exception_bucket);
-
 #ifdef CONFIG_IPV6_SUBTREES
 	/* fib6i_src.plen != 0 indicates f6i is in subtree
 	 * and exception table is indexed by a hash of
 	 * both fib6_dst and fib6_src.
-	 * Otherwise, the exception table is indexed by
-	 * a hash of only fib6_dst.
+	 * However, the src addr used to create the hash
+	 * might not be exactly the passed in saddr which
+	 * is a /128 addr from the flow.
+	 * So we need to use f6i->fib6_src to redo lookup
+	 * if the passed in saddr does not find anything.
+	 * (See the logic in ip6_rt_cache_alloc() on how
+	 * rt->rt6i_src is updated.)
 	 */
 	if (res->f6i->fib6_src.plen)
 		src_key = saddr;
+find_ex:
 #endif
+	bucket = rcu_dereference(res->f6i->rt6i_exception_bucket);
 	rt6_ex = __rt6_find_exception_rcu(&bucket, daddr, src_key);
 
 	if (rt6_ex && !rt6_check_expired(rt6_ex->rt6i))
 		ret = rt6_ex->rt6i;
 
+#ifdef CONFIG_IPV6_SUBTREES
+	/* Use fib6_src as src_key and redo lookup */
+	if (!ret && src_key == saddr) {
+		src_key = &res->f6i->fib6_src.addr;
+		goto find_ex;
+	}
+#endif
+
 	return ret;
 }
 
@@ -2683,12 +2696,22 @@ u32 ip6_mtu_from_fib6(const struct fib6_result *res,
 #ifdef CONFIG_IPV6_SUBTREES
 	if (f6i->fib6_src.plen)
 		src_key = saddr;
+find_ex:
 #endif
-
 	bucket = rcu_dereference(f6i->rt6i_exception_bucket);
 	rt6_ex = __rt6_find_exception_rcu(&bucket, daddr, src_key);
 	if (rt6_ex && !rt6_check_expired(rt6_ex->rt6i))
 		mtu = dst_metric_raw(&rt6_ex->rt6i->dst, RTAX_MTU);
+#ifdef CONFIG_IPV6_SUBTREES
+	/* Similar logic as in rt6_find_cached_rt().
+	 * We need to use f6i->fib6_src to redo lookup in exception
+	 * table if saddr did not yield any result.
+	 */
+	else if (src_key == saddr) {
+		src_key = &f6i->fib6_src.addr;
+		goto find_ex;
+	}
+#endif
 
 	if (likely(!mtu)) {
 		struct net_device *dev = nh->fib_nh_dev;