diff mbox

[net-next,2/3] ipv4: rate limit updating of next hop exceptions with same pmtu

Message ID 1369723593-5307-2-git-send-email-timo.teras@iki.fi
State Accepted, archived
Delegated to: David Miller
Headers show

Commit Message

Timo Teras May 28, 2013, 6:46 a.m. UTC
The tunnel devices call update_pmtu for each packet sent, this causes
contention on the fnhe_lock. Ignore the pmtu update if pmtu is not
actually changed, and there is still plenty of time before the entry
expires.

Signed-off-by: Timo Teräs <timo.teras@iki.fi>
---
 net/ipv4/route.c | 4 ++++
 1 file changed, 4 insertions(+)

Comments

Julian Anastasov May 28, 2013, 8:45 a.m. UTC | #1
Hello,

On Tue, 28 May 2013, Timo Teräs wrote:

> The tunnel devices call update_pmtu for each packet sent, this causes
> contention on the fnhe_lock. Ignore the pmtu update if pmtu is not
> actually changed, and there is still plenty of time before the entry
> expires.
> 
> Signed-off-by: Timo Teräs <timo.teras@iki.fi>
> ---
>  net/ipv4/route.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> index 561a378..a4082be 100644
> --- a/net/ipv4/route.c
> +++ b/net/ipv4/route.c
> @@ -947,6 +947,10 @@ static void __ip_rt_update_pmtu(struct rtable *rt, struct flowi4 *fl4, u32 mtu)
>  	if (mtu < ip_rt_min_pmtu)
>  		mtu = ip_rt_min_pmtu;
>  
> +	if (rt->rt_pmtu == mtu &&
> +	    time_before(jiffies, dst->expires - ip_rt_mtu_expires / 2))
> +		return;
> +

	Can we also add logic in this patch in
update_or_create_fnhe, so that we avoid invalidation for cached
routes when only pmtu expiration is updated (same pmtu), i.e.:

+	if (gw || pmtu != fnhe->fnhe_pmtu) {
+		/* Exception created; mark the cached routes for the nexthop
+		...
+	}

	BTW, I now see that previous patch should
call for_each_possible_cpu for the both cases, not
only when fnhe is created but also on update:

	bool invalidate;

	if (fnhe) {
		invalidate = gw || pmtu != fnhe->fnhe_pmtu;
		...
	} else {
		...
		invalidate = true;
	}

	if (invalidate) {
		/* Exception created; mark the cached routes for the nexthop
		...
	}

Regards

--
Julian Anastasov <ja@ssi.bg>
Timo Teras May 28, 2013, 10:07 a.m. UTC | #2
On Tue, 28 May 2013 11:45:51 +0300 (EEST)
Julian Anastasov <ja@ssi.bg> wrote:

> On Tue, 28 May 2013, Timo Teräs wrote:
> 
> > The tunnel devices call update_pmtu for each packet sent, this
> > causes contention on the fnhe_lock. Ignore the pmtu update if pmtu
> > is not actually changed, and there is still plenty of time before
> > the entry expires.
> > 
> > Signed-off-by: Timo Teräs <timo.teras@iki.fi>
> > ---
> >  net/ipv4/route.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> > 
> > diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> > index 561a378..a4082be 100644
> > --- a/net/ipv4/route.c
> > +++ b/net/ipv4/route.c
> > @@ -947,6 +947,10 @@ static void __ip_rt_update_pmtu(struct rtable
> > *rt, struct flowi4 *fl4, u32 mtu) if (mtu < ip_rt_min_pmtu)
> >  		mtu = ip_rt_min_pmtu;
> >  
> > +	if (rt->rt_pmtu == mtu &&
> > +	    time_before(jiffies, dst->expires -
> > ip_rt_mtu_expires / 2))
> > +		return;
> > +
> 
> 	Can we also add logic in this patch in
> update_or_create_fnhe, so that we avoid invalidation for cached
> routes when only pmtu expiration is updated (same pmtu), i.e.:
> 
> +	if (gw || pmtu != fnhe->fnhe_pmtu) {
> +		/* Exception created; mark the cached routes for the
> nexthop
> +		...
> +	}
> 
> 	BTW, I now see that previous patch should
> call for_each_possible_cpu for the both cases, not
> only when fnhe is created but also on update:

Why would this be needed?

The idea is that if there is no next hop exception, everyone is using
the next hop's dsts. If there is a next hop exception hashed, they will
be using those routes in the exception entry.

When the exception is created, we need to invalidate the nexthop's
routes, to force relookup of these dst's if should go to the exception.
Basically it flushes the nexthop's 'default' dst.

But if we have a valid exception, and we are just updating it. Everyone
is already using the right dst. The update_or_create_fnhe() updates all
exception's dst's that are effected. Since the set of hosts to which
that exception applies is the same, I don't see why we would need to
invalidate the nexthop's 'default' dst.

- Timo
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Julian Anastasov May 28, 2013, 9:04 p.m. UTC | #3
Hello,

On Tue, 28 May 2013, Timo Teras wrote:

> > > --- a/net/ipv4/route.c
> > > +++ b/net/ipv4/route.c
> > > @@ -947,6 +947,10 @@ static void __ip_rt_update_pmtu(struct rtable
> > > *rt, struct flowi4 *fl4, u32 mtu) if (mtu < ip_rt_min_pmtu)
> > >  		mtu = ip_rt_min_pmtu;
> > >  
> > > +	if (rt->rt_pmtu == mtu &&
> > > +	    time_before(jiffies, dst->expires -
> > > ip_rt_mtu_expires / 2))
> > > +		return;
> > > +
> > 
> > 	Can we also add logic in this patch in
> > update_or_create_fnhe, so that we avoid invalidation for cached
> > routes when only pmtu expiration is updated (same pmtu), i.e.:
> > 
> > +	if (gw || pmtu != fnhe->fnhe_pmtu) {
> > +		/* Exception created; mark the cached routes for the
> > nexthop
> > +		...
> > +	}
> > 
> > 	BTW, I now see that previous patch should
> > call for_each_possible_cpu for the both cases, not
> > only when fnhe is created but also on update:
> 
> Why would this be needed?
> 
> The idea is that if there is no next hop exception, everyone is using
> the next hop's dsts. If there is a next hop exception hashed, they will
> be using those routes in the exception entry.
> 
> When the exception is created, we need to invalidate the nexthop's
> routes, to force relookup of these dst's if should go to the exception.
> Basically it flushes the nexthop's 'default' dst.
> 
> But if we have a valid exception, and we are just updating it. Everyone
> is already using the right dst. The update_or_create_fnhe() updates all
> exception's dst's that are effected. Since the set of hosts to which
> that exception applies is the same, I don't see why we would need to
> invalidate the nexthop's 'default' dst.

	Agreed for the NH route. But there is another case:
when fnhe exists for fnhe_pmtu!=0 and fnhe_gw=0 and the cached
FNHE route has just rt_pmtu!=0 and rt_uses_gateway=0. In the
event of redirect we should invalidate fnhe_rth.
Users should come again to get the new gw from the
same fnhe. As you note, nh_pcpu_rth_output does not need
to be invalidated, it was already invalidated when
fnhe was created.

	So, my idea is something like this. If cached
route detects change in gw on redirect, it calls
update_or_create_fnhe but FNHE can already know this gw,
we should invalidate the fnhe_rth only if gw changes.
As caller has some stale cache route, it should be
invalidated as before, I assume 'kill_route' is properly
provided.

	if (fnhe) {
		if (fnhe->fnhe_gw != gw && gw) {
			rt = rcu_dereference_protected(fnhe->fnhe_rth, 1);
			if (rt)
				rt->dst.obsolete = DST_OBSOLETE_KILL;
			fnhe->fnhe_gw = gw;
		}
		...
	}
	...
}

	Also, I don't know the XFRM code well but
xfrm_bundle_ok() calls dst_check. Now ipv4_dst_check
will not give any indication for recent change in
rt_pmtu. I hope this is not a problem because I see
some comparisons with cached pmtus. I.e. the main
question is: are there any dst_check and ->check users
that needs to know for changes in rt_pmtu or all
of them just use dst_mtu().

Regards

--
Julian Anastasov <ja@ssi.bg>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Timo Teras May 29, 2013, 5:07 a.m. UTC | #4
On Wed, 29 May 2013 00:04:47 +0300 (EEST)
Julian Anastasov <ja@ssi.bg> wrote:

> 
> 	Hello,
> 
> On Tue, 28 May 2013, Timo Teras wrote:
> 
> > > > --- a/net/ipv4/route.c
> > > > +++ b/net/ipv4/route.c
> > > > @@ -947,6 +947,10 @@ static void __ip_rt_update_pmtu(struct
> > > > rtable *rt, struct flowi4 *fl4, u32 mtu) if (mtu <
> > > > ip_rt_min_pmtu) mtu = ip_rt_min_pmtu;
> > > >  
> > > > +	if (rt->rt_pmtu == mtu &&
> > > > +	    time_before(jiffies, dst->expires -
> > > > ip_rt_mtu_expires / 2))
> > > > +		return;
> > > > +
> > > 
> > > 	Can we also add logic in this patch in
> > > update_or_create_fnhe, so that we avoid invalidation for cached
> > > routes when only pmtu expiration is updated (same pmtu), i.e.:
> > > 
> > > +	if (gw || pmtu != fnhe->fnhe_pmtu) {
> > > +		/* Exception created; mark the cached routes for
> > > the nexthop
> > > +		...
> > > +	}
> > > 
> > > 	BTW, I now see that previous patch should
> > > call for_each_possible_cpu for the both cases, not
> > > only when fnhe is created but also on update:
> > 
> > Why would this be needed?
> > 
> > The idea is that if there is no next hop exception, everyone is
> > using the next hop's dsts. If there is a next hop exception hashed,
> > they will be using those routes in the exception entry.
> > 
> > When the exception is created, we need to invalidate the nexthop's
> > routes, to force relookup of these dst's if should go to the
> > exception. Basically it flushes the nexthop's 'default' dst.
> > 
> > But if we have a valid exception, and we are just updating it.
> > Everyone is already using the right dst. The
> > update_or_create_fnhe() updates all exception's dst's that are
> > effected. Since the set of hosts to which that exception applies is
> > the same, I don't see why we would need to invalidate the nexthop's
> > 'default' dst.
> 
> 	Agreed for the NH route. But there is another case:
> when fnhe exists for fnhe_pmtu!=0 and fnhe_gw=0 and the cached
> FNHE route has just rt_pmtu!=0 and rt_uses_gateway=0. In the
> event of redirect we should invalidate fnhe_rth.
> Users should come again to get the new gw from the
> same fnhe. As you note, nh_pcpu_rth_output does not need
> to be invalidated, it was already invalidated when
> fnhe was created.
> 
> 	So, my idea is something like this. If cached
> route detects change in gw on redirect, it calls
> update_or_create_fnhe but FNHE can already know this gw,
> we should invalidate the fnhe_rth only if gw changes.
> As caller has some stale cache route, it should be
> invalidated as before, I assume 'kill_route' is properly
> provided.
> 
> 	if (fnhe) {
> 		if (fnhe->fnhe_gw != gw && gw) {
> 			rt =
> rcu_dereference_protected(fnhe->fnhe_rth, 1); if (rt)
> 				rt->dst.obsolete = DST_OBSOLETE_KILL;
> 			fnhe->fnhe_gw = gw;
> 		}
> 		...
> 	}
> 	...
> }

This is already done by the caller of update_or_create_fnhe for the
case of gw change. It might be worth to put all this to the same place,
but would be worth of separate patch.

> 	Also, I don't know the XFRM code well but
> xfrm_bundle_ok() calls dst_check. Now ipv4_dst_check
> will not give any indication for recent change in
> rt_pmtu. I hope this is not a problem because I see
> some comparisons with cached pmtus. I.e. the main
> question is: are there any dst_check and ->check users
> that needs to know for changes in rt_pmtu or all
> of them just use dst_mtu().

In XFRM case, xfrm_bundle_ok() will propagate the pmtu by calling
dst_mtu() for each target. All caching users need to use dst_mtu() to
check it, so the route.c code is correct - this is how it worked before
my patches too.

- Timo
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Julian Anastasov May 29, 2013, 10:49 p.m. UTC | #5
Hello,

On Wed, 29 May 2013, Timo Teras wrote:

> On Wed, 29 May 2013 00:04:47 +0300 (EEST)
> Julian Anastasov <ja@ssi.bg> wrote:
> 
> > 	So, my idea is something like this. If cached
> > route detects change in gw on redirect, it calls
> > update_or_create_fnhe but FNHE can already know this gw,
> > we should invalidate the fnhe_rth only if gw changes.
> > As caller has some stale cache route, it should be
> > invalidated as before, I assume 'kill_route' is properly
> > provided.
> > 
> > 	if (fnhe) {
> > 		if (fnhe->fnhe_gw != gw && gw) {
> > 			rt =
> > rcu_dereference_protected(fnhe->fnhe_rth, 1); if (rt)
> > 				rt->dst.obsolete = DST_OBSOLETE_KILL;
> > 			fnhe->fnhe_gw = gw;
> > 		}
> > 		...
> > 	}
> > 	...
> > }
> 
> This is already done by the caller of update_or_create_fnhe for the
> case of gw change. It might be worth to put all this to the same place,
> but would be worth of separate patch.

	Looking at do_redirect() for TCP and __sk_dst_check
may be such race is very small because first gw change
for another socket will invalidate our cached entry and
our socket will fail to deliver the new gw to FNHE
without a two-step validation as done in inet_csk_update_pmtu().
Still, it should be more safe to change and invalidate
at the same place - update_or_create_fnhe. But we can
investigate it later.

> > 	Also, I don't know the XFRM code well but
> > xfrm_bundle_ok() calls dst_check. Now ipv4_dst_check
> > will not give any indication for recent change in
> > rt_pmtu. I hope this is not a problem because I see
> > some comparisons with cached pmtus. I.e. the main
> > question is: are there any dst_check and ->check users
> > that needs to know for changes in rt_pmtu or all
> > of them just use dst_mtu().
> 
> In XFRM case, xfrm_bundle_ok() will propagate the pmtu by calling
> dst_mtu() for each target. All caching users need to use dst_mtu() to
> check it, so the route.c code is correct - this is how it worked before
> my patches too.

	Aha, so nobody needs DST_OBSOLETE_KILL when
fnhe_pmtu changes. Thanks for answering my questions!

Regards

--
Julian Anastasov <ja@ssi.bg>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller June 3, 2013, 7:10 a.m. UTC | #6
From: Timo Teräs <timo.teras@iki.fi>
Date: Tue, 28 May 2013 09:46:32 +0300

> The tunnel devices call update_pmtu for each packet sent, this causes
> contention on the fnhe_lock. Ignore the pmtu update if pmtu is not
> actually changed, and there is still plenty of time before the entry
> expires.
> 
> Signed-off-by: Timo Teräs <timo.teras@iki.fi>

Applied.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 561a378..a4082be 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -947,6 +947,10 @@  static void __ip_rt_update_pmtu(struct rtable *rt, struct flowi4 *fl4, u32 mtu)
 	if (mtu < ip_rt_min_pmtu)
 		mtu = ip_rt_min_pmtu;
 
+	if (rt->rt_pmtu == mtu &&
+	    time_before(jiffies, dst->expires - ip_rt_mtu_expires / 2))
+		return;
+
 	rcu_read_lock();
 	if (fib_lookup(dev_net(dst->dev), fl4, &res) == 0) {
 		struct fib_nh *nh = &FIB_RES_NH(res);