diff mbox

[PATCHv2,net,2/2] ipv4: update RTNH_F_LINKDOWN flag on UP event

Message ID 20151027045203.GG3567@gospo.home.greyhouse.net
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Andy Gospodarek Oct. 27, 2015, 4:52 a.m. UTC
On Mon, Oct 26, 2015 at 11:59:13PM +0200, Julian Anastasov wrote:
> When nexthop is part of multipath route we should clear the
> LINKDOWN flag when link goes UP or when first address is added.
> This is needed because we always set LINKDOWN flag when DEAD flag
> was set but now on UP the nexthop is not dead anymore. Examples when
> LINKDOWN bit can be forgotten when no NETDEV_CHANGE is delivered:
> 
> - link goes down (LINKDOWN is set), then link goes UP and device
> shows carrier OK but LINKDOWN remains set
> 
> - last address is deleted (LINKDOWN is set), then address is
> added and device shows carrier OK but LINKDOWN remains set
> 
> Steps to reproduce:
> modprobe dummy
> ifconfig dummy0 192.168.168.1 up
> 
> here add a multipath route where one nexthop is for dummy0:
> 
> ip route add 1.2.3.4 nexthop dummy0 nexthop SOME_OTHER_DEVICE
> ifconfig dummy0 down
> ifconfig dummy0 up
> 
> now ip route shows nexthop that is not dead. Now set the sysctl var:
> 
> echo 1 > /proc/sys/net/ipv4/conf/dummy0/ignore_routes_with_linkdown
> 
> now ip route will show a dead nexthop because the forgotten
> RTNH_F_LINKDOWN is propagated as RTNH_F_DEAD.

I tested this patch and I now see that your reported problem is a result
of dummy never taking carrier down.  There was a presumption that
carrier notification would go down when hardware went down (or when the
logical device backing the hardware went down, but this is clearly not
always the case.

> Fixes: 8a3d03166f19 ("net: track link-status of ipv4 nexthops")
> Signed-off-by: Julian Anastasov <ja@ssi.bg>
> ---
>  net/ipv4/fib_semantics.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
> index f493eff..f657418 100644
> --- a/net/ipv4/fib_semantics.c
> +++ b/net/ipv4/fib_semantics.c
> @@ -1445,6 +1445,13 @@ int fib_sync_up(struct net_device *dev, unsigned int nh_flags)
>  	if (!(dev->flags & IFF_UP))
>  		return 0;
>  
> +	if (nh_flags & RTNH_F_DEAD) {
> +		unsigned int flags = dev_get_flags(dev);
> +
> +		if (flags & (IFF_RUNNING | IFF_LOWER_UP))
> +			nh_flags |= RTNH_F_LINKDOWN;
> +	}
> +
>  	prev_fi = NULL;
>  	hash = fib_devindex_hashfn(dev->ifindex);
>  	head = &fib_info_devhash[hash];

Logically this patch makes sense, but I feel as though there may be a
slightly better option.  Possibly this:


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Julian Anastasov Oct. 27, 2015, 7:42 a.m. UTC | #1
Hello,

On Tue, 27 Oct 2015, Andy Gospodarek wrote:

> I tested this patch and I now see that your reported problem is a result
> of dummy never taking carrier down.  There was a presumption that
> carrier notification would go down when hardware went down (or when the
> logical device backing the hardware went down, but this is clearly not
> always the case.

	It seems not all devices play with the carrier
after IFF_UP is set and we can not rely on NETDEV_CHANGE
to update the flag.

> > +	if (nh_flags & RTNH_F_DEAD) {
> > +		unsigned int flags = dev_get_flags(dev);
> > +
> > +		if (flags & (IFF_RUNNING | IFF_LOWER_UP))
> > +			nh_flags |= RTNH_F_LINKDOWN;
> > +	}
> > +
> >  	prev_fi = NULL;
> >  	hash = fib_devindex_hashfn(dev->ifindex);
> >  	head = &fib_info_devhash[hash];
> 
> Logically this patch makes sense, but I feel as though there may be a
> slightly better option.  Possibly this:
> 
> diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
> index 42778d9..7eb7c40 100644
> --- a/net/ipv4/fib_semantics.c
> +++ b/net/ipv4/fib_semantics.c
> @@ -1376,7 +1376,8 @@ int fib_sync_down_dev(struct net_device *dev, unsigned long event)
>  					nexthop_nh->nh_flags |= RTNH_F_DEAD;
>  					/* fall through */
>  				case NETDEV_CHANGE:
> -					nexthop_nh->nh_flags |= RTNH_F_LINKDOWN;
> +					if (!netif_carrier_ok(dev))
> +						nexthop_nh->nh_flags |= RTNH_F_LINKDOWN;

	There is a problem with this approach. Once
the RTNH_F_DEAD flag is set, eg. when last address is removed,
any NETDEV_CHANGE events are ignored in this function.
As result, we may miss the link-down event if we first
remove the addresses, so we will not set RTNH_F_LINKDOWN.

	Also, when device link goes UP we (FIB) can not guess
just based on events what is the actual carrier state
because the NETDEV_CHANGE notification comes only when
IFF_UP is set. So, this check.

	I also attempted to fully recalculate the flag
in fib_sync_up, i.e. with the option not just to clear it
but also to add nexthop_nh->nh_flags |= nh_flags_set logic
but it complicates the code. So, while we always set
RTNH_F_LINKDOWN when DEAD is set, the logic to conditionally
clear RTNH_F_LINKDOWN in fib_sync_up looks the cheapest one.

	Of course, we have a semantic problem when setting
RTNH_F_LINKDOWN on last address removal, i.e. this event
has nothing to do with the link state. But it works because
RTNH_F_LINKDOWN is valid for lookups only when DEAD flag
is not set, so that is why my patch looks this way.

>  					break;
>  				}
>  				dead++;
> @@ -1396,7 +1397,8 @@ int fib_sync_down_dev(struct net_device *dev, unsigned long event)
>  				fi->fib_flags |= RTNH_F_DEAD;
>  				/* fall through */
>  			case NETDEV_CHANGE:
> -				fi->fib_flags |= RTNH_F_LINKDOWN;
> +				if (!netif_carrier_ok(dev))
> +					fi->fib_flags |= RTNH_F_LINKDOWN;
>  				break;
>  			}
>  			ret++;

	I think, we even do not need the RTNH_F_LINKDOWN flag
in fib_flags, currently it is set but never used.

Regards

--
Julian Anastasov <ja@ssi.bg>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andy Gospodarek Oct. 27, 2015, 6:37 p.m. UTC | #2
On Tue, Oct 27, 2015 at 09:42:25AM +0200, Julian Anastasov wrote:
> 
> 	Hello,
> 
> On Tue, 27 Oct 2015, Andy Gospodarek wrote:
> 
> > I tested this patch and I now see that your reported problem is a result
> > of dummy never taking carrier down.  There was a presumption that
> > carrier notification would go down when hardware went down (or when the
> > logical device backing the hardware went down, but this is clearly not
> > always the case.
> 
> 	It seems not all devices play with the carrier
> after IFF_UP is set and we can not rely on NETDEV_CHANGE
> to update the flag.
Agreed.

> 
> > > +	if (nh_flags & RTNH_F_DEAD) {
> > > +		unsigned int flags = dev_get_flags(dev);
> > > +
> > > +		if (flags & (IFF_RUNNING | IFF_LOWER_UP))
> > > +			nh_flags |= RTNH_F_LINKDOWN;
> > > +	}
> > > +
> > >  	prev_fi = NULL;
> > >  	hash = fib_devindex_hashfn(dev->ifindex);
> > >  	head = &fib_info_devhash[hash];
> > 
> > Logically this patch makes sense, but I feel as though there may be a
> > slightly better option.  Possibly this:
> > 
> > diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
> > index 42778d9..7eb7c40 100644
> > --- a/net/ipv4/fib_semantics.c
> > +++ b/net/ipv4/fib_semantics.c
> > @@ -1376,7 +1376,8 @@ int fib_sync_down_dev(struct net_device *dev, unsigned long event)
> >  					nexthop_nh->nh_flags |= RTNH_F_DEAD;
> >  					/* fall through */
> >  				case NETDEV_CHANGE:
> > -					nexthop_nh->nh_flags |= RTNH_F_LINKDOWN;
> > +					if (!netif_carrier_ok(dev))
> > +						nexthop_nh->nh_flags |= RTNH_F_LINKDOWN;
> 
> 	There is a problem with this approach. Once
> the RTNH_F_DEAD flag is set, eg. when last address is removed,
> any NETDEV_CHANGE events are ignored in this function.
> As result, we may miss the link-down event if we first
> remove the addresses, so we will not set RTNH_F_LINKDOWN.
Yes, I see that now.  I verified with dummy by setting carrier after a
flush.  Thanks for pointing that out.

> 	Also, when device link goes UP we (FIB) can not guess
> just based on events what is the actual carrier state
> because the NETDEV_CHANGE notification comes only when
> IFF_UP is set. So, this check.
> 
> 	I also attempted to fully recalculate the flag
> in fib_sync_up, i.e. with the option not just to clear it
> but also to add nexthop_nh->nh_flags |= nh_flags_set logic
> but it complicates the code. So, while we always set
> RTNH_F_LINKDOWN when DEAD is set, the logic to conditionally
> clear RTNH_F_LINKDOWN in fib_sync_up looks the cheapest one.
> 
> 	Of course, we have a semantic problem when setting
> RTNH_F_LINKDOWN on last address removal, i.e. this event
> has nothing to do with the link state. But it works because
> RTNH_F_LINKDOWN is valid for lookups only when DEAD flag
> is not set, so that is why my patch looks this way.
The problem you describe here was a concern of mine as well.  I would
really like the output of 'ip route show' to properly reflect the link
state and fix the problem you describe, but it seems like it will not in
this case with your current patch.  I'll do a bit more testing and let
you know.

> 
> >  					break;
> >  				}
> >  				dead++;
> > @@ -1396,7 +1397,8 @@ int fib_sync_down_dev(struct net_device *dev, unsigned long event)
> >  				fi->fib_flags |= RTNH_F_DEAD;
> >  				/* fall through */
> >  			case NETDEV_CHANGE:
> > -				fi->fib_flags |= RTNH_F_LINKDOWN;
> > +				if (!netif_carrier_ok(dev))
> > +					fi->fib_flags |= RTNH_F_LINKDOWN;
> >  				break;
> >  			}
> >  			ret++;
> 
> 	I think, we even do not need the RTNH_F_LINKDOWN flag
> in fib_flags, currently it is set but never used.
> 
> Regards
> 
> --
> Julian Anastasov <ja@ssi.bg>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Julian Anastasov Oct. 29, 2015, 8:15 p.m. UTC | #3
Hello,

On Tue, 27 Oct 2015, Andy Gospodarek wrote:

> > 	Of course, we have a semantic problem when setting
> > RTNH_F_LINKDOWN on last address removal, i.e. this event
> > has nothing to do with the link state. But it works because
> > RTNH_F_LINKDOWN is valid for lookups only when DEAD flag
> > is not set, so that is why my patch looks this way.
> The problem you describe here was a concern of mine as well.  I would
> really like the output of 'ip route show' to properly reflect the link
> state and fix the problem you describe, but it seems like it will not in
> this case with your current patch.  I'll do a bit more testing and let
> you know.

	Another option is to remove the RTNH_F_LINKDOWN
usage from any event, from FIB structs. In case only
netif_carrier_ok check is needed, it looks cheap to just
call it after the IN_DEV_IGNORE_ROUTES_WITH_LINKDOWN check.
When showing it to user space we can again use netif_carrier_ok.

	The new fib_rebalance() is special, it still
needs to be called, just like now on NETDEV_CHANGE
event but only when ignore_routes_with_linkdown=1, i.e.
we can avoid calling fib_sync_up/fib_sync_down_dev in
fib_netdev_event if the flag is not set, guarded by 
CONFIG_IP_ROUTE_MULTIPATH check.

	To make this work we can also call somehow
fib_rebalance() from devinet_conf_proc() context when
the flag is changed, all this is for the
CONFIG_IP_ROUTE_MULTIPATH case.

Regards

--
Julian Anastasov <ja@ssi.bg>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index 42778d9..7eb7c40 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -1376,7 +1376,8 @@  int fib_sync_down_dev(struct net_device *dev, unsigned long event)
 					nexthop_nh->nh_flags |= RTNH_F_DEAD;
 					/* fall through */
 				case NETDEV_CHANGE:
-					nexthop_nh->nh_flags |= RTNH_F_LINKDOWN;
+					if (!netif_carrier_ok(dev))
+						nexthop_nh->nh_flags |= RTNH_F_LINKDOWN;
 					break;
 				}
 				dead++;
@@ -1396,7 +1397,8 @@  int fib_sync_down_dev(struct net_device *dev, unsigned long event)
 				fi->fib_flags |= RTNH_F_DEAD;
 				/* fall through */
 			case NETDEV_CHANGE:
-				fi->fib_flags |= RTNH_F_LINKDOWN;
+				if (!netif_carrier_ok(dev))
+					fi->fib_flags |= RTNH_F_LINKDOWN;
 				break;
 			}
 			ret++;