Message ID | 1369723593-5307-2-git-send-email-timo.teras@iki.fi |
---|---|
State | Accepted, archived |
Delegated to: | David Miller |
Headers | show |
Hello, On Tue, 28 May 2013, Timo Teräs wrote: > The tunnel devices call update_pmtu for each packet sent, this causes > contention on the fnhe_lock. Ignore the pmtu update if pmtu is not > actually changed, and there is still plenty of time before the entry > expires. > > Signed-off-by: Timo Teräs <timo.teras@iki.fi> > --- > net/ipv4/route.c | 4 ++++ > 1 file changed, 4 insertions(+) > > diff --git a/net/ipv4/route.c b/net/ipv4/route.c > index 561a378..a4082be 100644 > --- a/net/ipv4/route.c > +++ b/net/ipv4/route.c > @@ -947,6 +947,10 @@ static void __ip_rt_update_pmtu(struct rtable *rt, struct flowi4 *fl4, u32 mtu) > if (mtu < ip_rt_min_pmtu) > mtu = ip_rt_min_pmtu; > > + if (rt->rt_pmtu == mtu && > + time_before(jiffies, dst->expires - ip_rt_mtu_expires / 2)) > + return; > + Can we also add logic in this patch in update_or_create_fnhe, so that we avoid invalidation for cached routes when only pmtu expiration is updated (same pmtu), i.e.: + if (gw || pmtu != fnhe->fnhe_pmtu) { + /* Exception created; mark the cached routes for the nexthop + ... + } BTW, I now see that previous patch should call for_each_possible_cpu for the both cases, not only when fnhe is created but also on update: bool invalidate; if (fnhe) { invalidate = gw || pmtu != fnhe->fnhe_pmtu; ... } else { ... invalidate = true; } if (invalidate) { /* Exception created; mark the cached routes for the nexthop ... } Regards -- Julian Anastasov <ja@ssi.bg>
On Tue, 28 May 2013 11:45:51 +0300 (EEST) Julian Anastasov <ja@ssi.bg> wrote: > On Tue, 28 May 2013, Timo Teräs wrote: > > > The tunnel devices call update_pmtu for each packet sent, this > > causes contention on the fnhe_lock. Ignore the pmtu update if pmtu > > is not actually changed, and there is still plenty of time before > > the entry expires. > > > > Signed-off-by: Timo Teräs <timo.teras@iki.fi> > > --- > > net/ipv4/route.c | 4 ++++ > > 1 file changed, 4 insertions(+) > > > > diff --git a/net/ipv4/route.c b/net/ipv4/route.c > > index 561a378..a4082be 100644 > > --- a/net/ipv4/route.c > > +++ b/net/ipv4/route.c > > @@ -947,6 +947,10 @@ static void __ip_rt_update_pmtu(struct rtable > > *rt, struct flowi4 *fl4, u32 mtu) if (mtu < ip_rt_min_pmtu) > > mtu = ip_rt_min_pmtu; > > > > + if (rt->rt_pmtu == mtu && > > + time_before(jiffies, dst->expires - > > ip_rt_mtu_expires / 2)) > > + return; > > + > > Can we also add logic in this patch in > update_or_create_fnhe, so that we avoid invalidation for cached > routes when only pmtu expiration is updated (same pmtu), i.e.: > > + if (gw || pmtu != fnhe->fnhe_pmtu) { > + /* Exception created; mark the cached routes for the > nexthop > + ... > + } > > BTW, I now see that previous patch should > call for_each_possible_cpu for the both cases, not > only when fnhe is created but also on update: Why would this be needed? The idea is that if there is no next hop exception, everyone is using the next hop's dsts. If there is a next hop exception hashed, they will be using those routes in the exception entry. When the exception is created, we need to invalidate the nexthop's routes, to force relookup of these dst's if should go to the exception. Basically it flushes the nexthop's 'default' dst. But if we have a valid exception, and we are just updating it. Everyone is already using the right dst. The update_or_create_fnhe() updates all exception's dst's that are effected. Since the set of hosts to which that exception applies is the same, I don't see why we would need to invalidate the nexthop's 'default' dst. - Timo -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hello, On Tue, 28 May 2013, Timo Teras wrote: > > > --- a/net/ipv4/route.c > > > +++ b/net/ipv4/route.c > > > @@ -947,6 +947,10 @@ static void __ip_rt_update_pmtu(struct rtable > > > *rt, struct flowi4 *fl4, u32 mtu) if (mtu < ip_rt_min_pmtu) > > > mtu = ip_rt_min_pmtu; > > > > > > + if (rt->rt_pmtu == mtu && > > > + time_before(jiffies, dst->expires - > > > ip_rt_mtu_expires / 2)) > > > + return; > > > + > > > > Can we also add logic in this patch in > > update_or_create_fnhe, so that we avoid invalidation for cached > > routes when only pmtu expiration is updated (same pmtu), i.e.: > > > > + if (gw || pmtu != fnhe->fnhe_pmtu) { > > + /* Exception created; mark the cached routes for the > > nexthop > > + ... > > + } > > > > BTW, I now see that previous patch should > > call for_each_possible_cpu for the both cases, not > > only when fnhe is created but also on update: > > Why would this be needed? > > The idea is that if there is no next hop exception, everyone is using > the next hop's dsts. If there is a next hop exception hashed, they will > be using those routes in the exception entry. > > When the exception is created, we need to invalidate the nexthop's > routes, to force relookup of these dst's if should go to the exception. > Basically it flushes the nexthop's 'default' dst. > > But if we have a valid exception, and we are just updating it. Everyone > is already using the right dst. The update_or_create_fnhe() updates all > exception's dst's that are effected. Since the set of hosts to which > that exception applies is the same, I don't see why we would need to > invalidate the nexthop's 'default' dst. Agreed for the NH route. But there is another case: when fnhe exists for fnhe_pmtu!=0 and fnhe_gw=0 and the cached FNHE route has just rt_pmtu!=0 and rt_uses_gateway=0. In the event of redirect we should invalidate fnhe_rth. Users should come again to get the new gw from the same fnhe. As you note, nh_pcpu_rth_output does not need to be invalidated, it was already invalidated when fnhe was created. So, my idea is something like this. If cached route detects change in gw on redirect, it calls update_or_create_fnhe but FNHE can already know this gw, we should invalidate the fnhe_rth only if gw changes. As caller has some stale cache route, it should be invalidated as before, I assume 'kill_route' is properly provided. if (fnhe) { if (fnhe->fnhe_gw != gw && gw) { rt = rcu_dereference_protected(fnhe->fnhe_rth, 1); if (rt) rt->dst.obsolete = DST_OBSOLETE_KILL; fnhe->fnhe_gw = gw; } ... } ... } Also, I don't know the XFRM code well but xfrm_bundle_ok() calls dst_check. Now ipv4_dst_check will not give any indication for recent change in rt_pmtu. I hope this is not a problem because I see some comparisons with cached pmtus. I.e. the main question is: are there any dst_check and ->check users that needs to know for changes in rt_pmtu or all of them just use dst_mtu(). Regards -- Julian Anastasov <ja@ssi.bg> -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 29 May 2013 00:04:47 +0300 (EEST) Julian Anastasov <ja@ssi.bg> wrote: > > Hello, > > On Tue, 28 May 2013, Timo Teras wrote: > > > > > --- a/net/ipv4/route.c > > > > +++ b/net/ipv4/route.c > > > > @@ -947,6 +947,10 @@ static void __ip_rt_update_pmtu(struct > > > > rtable *rt, struct flowi4 *fl4, u32 mtu) if (mtu < > > > > ip_rt_min_pmtu) mtu = ip_rt_min_pmtu; > > > > > > > > + if (rt->rt_pmtu == mtu && > > > > + time_before(jiffies, dst->expires - > > > > ip_rt_mtu_expires / 2)) > > > > + return; > > > > + > > > > > > Can we also add logic in this patch in > > > update_or_create_fnhe, so that we avoid invalidation for cached > > > routes when only pmtu expiration is updated (same pmtu), i.e.: > > > > > > + if (gw || pmtu != fnhe->fnhe_pmtu) { > > > + /* Exception created; mark the cached routes for > > > the nexthop > > > + ... > > > + } > > > > > > BTW, I now see that previous patch should > > > call for_each_possible_cpu for the both cases, not > > > only when fnhe is created but also on update: > > > > Why would this be needed? > > > > The idea is that if there is no next hop exception, everyone is > > using the next hop's dsts. If there is a next hop exception hashed, > > they will be using those routes in the exception entry. > > > > When the exception is created, we need to invalidate the nexthop's > > routes, to force relookup of these dst's if should go to the > > exception. Basically it flushes the nexthop's 'default' dst. > > > > But if we have a valid exception, and we are just updating it. > > Everyone is already using the right dst. The > > update_or_create_fnhe() updates all exception's dst's that are > > effected. Since the set of hosts to which that exception applies is > > the same, I don't see why we would need to invalidate the nexthop's > > 'default' dst. > > Agreed for the NH route. But there is another case: > when fnhe exists for fnhe_pmtu!=0 and fnhe_gw=0 and the cached > FNHE route has just rt_pmtu!=0 and rt_uses_gateway=0. In the > event of redirect we should invalidate fnhe_rth. > Users should come again to get the new gw from the > same fnhe. As you note, nh_pcpu_rth_output does not need > to be invalidated, it was already invalidated when > fnhe was created. > > So, my idea is something like this. If cached > route detects change in gw on redirect, it calls > update_or_create_fnhe but FNHE can already know this gw, > we should invalidate the fnhe_rth only if gw changes. > As caller has some stale cache route, it should be > invalidated as before, I assume 'kill_route' is properly > provided. > > if (fnhe) { > if (fnhe->fnhe_gw != gw && gw) { > rt = > rcu_dereference_protected(fnhe->fnhe_rth, 1); if (rt) > rt->dst.obsolete = DST_OBSOLETE_KILL; > fnhe->fnhe_gw = gw; > } > ... > } > ... > } This is already done by the caller of update_or_create_fnhe for the case of gw change. It might be worth to put all this to the same place, but would be worth of separate patch. > Also, I don't know the XFRM code well but > xfrm_bundle_ok() calls dst_check. Now ipv4_dst_check > will not give any indication for recent change in > rt_pmtu. I hope this is not a problem because I see > some comparisons with cached pmtus. I.e. the main > question is: are there any dst_check and ->check users > that needs to know for changes in rt_pmtu or all > of them just use dst_mtu(). In XFRM case, xfrm_bundle_ok() will propagate the pmtu by calling dst_mtu() for each target. All caching users need to use dst_mtu() to check it, so the route.c code is correct - this is how it worked before my patches too. - Timo -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hello, On Wed, 29 May 2013, Timo Teras wrote: > On Wed, 29 May 2013 00:04:47 +0300 (EEST) > Julian Anastasov <ja@ssi.bg> wrote: > > > So, my idea is something like this. If cached > > route detects change in gw on redirect, it calls > > update_or_create_fnhe but FNHE can already know this gw, > > we should invalidate the fnhe_rth only if gw changes. > > As caller has some stale cache route, it should be > > invalidated as before, I assume 'kill_route' is properly > > provided. > > > > if (fnhe) { > > if (fnhe->fnhe_gw != gw && gw) { > > rt = > > rcu_dereference_protected(fnhe->fnhe_rth, 1); if (rt) > > rt->dst.obsolete = DST_OBSOLETE_KILL; > > fnhe->fnhe_gw = gw; > > } > > ... > > } > > ... > > } > > This is already done by the caller of update_or_create_fnhe for the > case of gw change. It might be worth to put all this to the same place, > but would be worth of separate patch. Looking at do_redirect() for TCP and __sk_dst_check may be such race is very small because first gw change for another socket will invalidate our cached entry and our socket will fail to deliver the new gw to FNHE without a two-step validation as done in inet_csk_update_pmtu(). Still, it should be more safe to change and invalidate at the same place - update_or_create_fnhe. But we can investigate it later. > > Also, I don't know the XFRM code well but > > xfrm_bundle_ok() calls dst_check. Now ipv4_dst_check > > will not give any indication for recent change in > > rt_pmtu. I hope this is not a problem because I see > > some comparisons with cached pmtus. I.e. the main > > question is: are there any dst_check and ->check users > > that needs to know for changes in rt_pmtu or all > > of them just use dst_mtu(). > > In XFRM case, xfrm_bundle_ok() will propagate the pmtu by calling > dst_mtu() for each target. All caching users need to use dst_mtu() to > check it, so the route.c code is correct - this is how it worked before > my patches too. Aha, so nobody needs DST_OBSOLETE_KILL when fnhe_pmtu changes. Thanks for answering my questions! Regards -- Julian Anastasov <ja@ssi.bg> -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Timo Teräs <timo.teras@iki.fi> Date: Tue, 28 May 2013 09:46:32 +0300 > The tunnel devices call update_pmtu for each packet sent, this causes > contention on the fnhe_lock. Ignore the pmtu update if pmtu is not > actually changed, and there is still plenty of time before the entry > expires. > > Signed-off-by: Timo Teräs <timo.teras@iki.fi> Applied. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/net/ipv4/route.c b/net/ipv4/route.c index 561a378..a4082be 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -947,6 +947,10 @@ static void __ip_rt_update_pmtu(struct rtable *rt, struct flowi4 *fl4, u32 mtu) if (mtu < ip_rt_min_pmtu) mtu = ip_rt_min_pmtu; + if (rt->rt_pmtu == mtu && + time_before(jiffies, dst->expires - ip_rt_mtu_expires / 2)) + return; + rcu_read_lock(); if (fib_lookup(dev_net(dst->dev), fl4, &res) == 0) { struct fib_nh *nh = &FIB_RES_NH(res);
The tunnel devices call update_pmtu for each packet sent, this causes contention on the fnhe_lock. Ignore the pmtu update if pmtu is not actually changed, and there is still plenty of time before the entry expires. Signed-off-by: Timo Teräs <timo.teras@iki.fi> --- net/ipv4/route.c | 4 ++++ 1 file changed, 4 insertions(+)