Message ID | 20200501145308.48766-1-dsahern@kernel.org |
---|---|
State | Accepted |
Delegated to: | David Miller |
Headers | show |
Series | [v2,net] ipv6: Use global sernum for dst validation with nexthop objects | expand |
From: David Ahern <dsahern@kernel.org> Date: Fri, 1 May 2020 08:53:08 -0600 > Nik reported a bug with pcpu dst cache when nexthop objects are > used illustrated by the following: ... > Conversion of FIB entries to work with external nexthop objects > missed an important difference between IPv4 and IPv6 - how dst > entries are invalidated when the FIB changes. IPv4 has a per-network > namespace generation id (rt_genid) that is bumped on changes to the FIB. > Checking if a dst_entry is still valid means comparing rt_genid in the > rtable to the current value of rt_genid for the namespace. > > IPv6 also has a per network namespace counter, fib6_sernum, but the > count is saved per fib6_node. With the per-node counter only dst_entries > based on fib entries under the node are invalidated when changes are > made to the routes - limiting the scope of invalidations. IPv6 uses a > reference in the rt6_info, 'from', to track the corresponding fib entry > used to create the dst_entry. When validating a dst_entry, the 'from' > is used to backtrack to the fib6_node and check the sernum of it to the > cookie passed to the dst_check operation. > > With the inline format (nexthop definition inline with the fib6_info), > dst_entries cached in the fib6_nh have a 1:1 correlation between fib > entries, nexthop data and dst_entries. With external nexthops, IPv6 > looks more like IPv4 which means multiple fib entries across disparate > fib6_nodes can all reference the same fib6_nh. That means validation > of dst_entries based on external nexthops needs to use the IPv4 format > - the per-network namespace counter. > > Add sernum to rt6_info and set it when creating a pcpu dst entry. Update > rt6_get_cookie to return sernum if it is set and update dst_check for > IPv6 to look for sernum set and based the check on it if so. Finally, > rt6_get_pcpu_route needs to validate the cached entry before returning > a pcpu entry (similar to the rt_cache_valid calls in __mkroute_input and > __mkroute_output for IPv4). > > This problem only affects routes using the new, external nexthops. > > Thanks to the kbuild test robot for catching the IS_ENABLED needed > around rt_genid_ipv6 before I sent this out. > > Fixes: 5b98324ebe29 ("ipv6: Allow routes to use nexthop objects") > Reported-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> > Signed-off-by: David Ahern <dsahern@kernel.org> > Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> > Tested-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Applied and queued up for -stable, thanks David.
diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h index 80262d2980f5..5bc78a99e8a2 100644 --- a/include/net/ip6_fib.h +++ b/include/net/ip6_fib.h @@ -203,6 +203,7 @@ struct fib6_info { struct rt6_info { struct dst_entry dst; struct fib6_info __rcu *from; + int sernum; struct rt6key rt6i_dst; struct rt6key rt6i_src; @@ -291,6 +292,9 @@ static inline u32 rt6_get_cookie(const struct rt6_info *rt) struct fib6_info *from; u32 cookie = 0; + if (rt->sernum) + return rt->sernum; + rcu_read_lock(); from = rcu_dereference(rt->from); diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h index ab96fb59131c..8e001e049497 100644 --- a/include/net/net_namespace.h +++ b/include/net/net_namespace.h @@ -437,6 +437,13 @@ static inline int rt_genid_ipv4(const struct net *net) return atomic_read(&net->ipv4.rt_genid); } +#if IS_ENABLED(CONFIG_IPV6) +static inline int rt_genid_ipv6(const struct net *net) +{ + return atomic_read(&net->ipv6.fib6_sernum); +} +#endif + static inline void rt_genid_bump_ipv4(struct net *net) { atomic_inc(&net->ipv4.rt_genid); diff --git a/net/ipv6/route.c b/net/ipv6/route.c index 310cbddaa533..5d83564d55aa 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -1385,9 +1385,18 @@ static struct rt6_info *ip6_rt_pcpu_alloc(const struct fib6_result *res) } ip6_rt_copy_init(pcpu_rt, res); pcpu_rt->rt6i_flags |= RTF_PCPU; + + if (f6i->nh) + pcpu_rt->sernum = rt_genid_ipv6(dev_net(dev)); + return pcpu_rt; } +static bool rt6_is_valid(const struct rt6_info *rt6) +{ + return rt6->sernum == rt_genid_ipv6(dev_net(rt6->dst.dev)); +} + /* It should be called with rcu_read_lock() acquired */ static struct rt6_info *rt6_get_pcpu_route(const struct fib6_result *res) { @@ -1395,6 +1404,19 @@ static struct rt6_info *rt6_get_pcpu_route(const struct fib6_result *res) pcpu_rt = this_cpu_read(*res->nh->rt6i_pcpu); + if (pcpu_rt && pcpu_rt->sernum && !rt6_is_valid(pcpu_rt)) { + struct rt6_info *prev, **p; + + p = this_cpu_ptr(res->nh->rt6i_pcpu); + prev = xchg(p, NULL); + if (prev) { + dst_dev_put(&prev->dst); + dst_release(&prev->dst); + } + + pcpu_rt = NULL; + } + return pcpu_rt; } @@ -2593,6 +2615,9 @@ static struct dst_entry *ip6_dst_check(struct dst_entry *dst, u32 cookie) rt = container_of(dst, struct rt6_info, dst); + if (rt->sernum) + return rt6_is_valid(rt) ? dst : NULL; + rcu_read_lock(); /* All IPV6 dsts are created with ->obsolete set to the value