Message ID | 1492693132-4708-1-git-send-email-rshearma@brocade.com |
---|---|
State | Superseded, archived |
Delegated to: | David Miller |
Headers | show |
On 4/20/17 6:58 AM, Robert Shearman wrote: > David reported that doing the following: > > ip li add red type vrf table 10 > ip link set dev eth1 vrf red > ip addr add 127.0.0.1/8 dev red > ip link set dev eth1 up > ip li set red up > ping -c1 -w1 -I red 127.0.0.1 > ip li del red > > results in a hang with this message: > > unregister_netdevice: waiting for red to become free. Usage count = 1 I think you misunderstood my comment. The above works fine today. There is no bug with refcnts. It breaks with your patches wanting to use a VRF device with the main table (254).
On 20/04/17 15:21, David Ahern wrote: > On 4/20/17 6:58 AM, Robert Shearman wrote: >> David reported that doing the following: >> >> ip li add red type vrf table 10 >> ip link set dev eth1 vrf red >> ip addr add 127.0.0.1/8 dev red >> ip link set dev eth1 up >> ip li set red up >> ping -c1 -w1 -I red 127.0.0.1 >> ip li del red >> >> results in a hang with this message: >> >> unregister_netdevice: waiting for red to become free. Usage count = 1 > > I think you misunderstood my comment. The above works fine today. There > is no bug with refcnts. > > It breaks with your patches wanting to use a VRF device with the main > table (254). That doesn't seem to match with my experience. I can reproduce this on the net tree with the listed commands and the behaviour matches what I see in the code. Thanks, Rob
On 4/20/17 8:39 AM, Robert Shearman wrote: > On 20/04/17 15:21, David Ahern wrote: >> On 4/20/17 6:58 AM, Robert Shearman wrote: >>> David reported that doing the following: >>> >>> ip li add red type vrf table 10 >>> ip link set dev eth1 vrf red >>> ip addr add 127.0.0.1/8 dev red >>> ip link set dev eth1 up >>> ip li set red up >>> ping -c1 -w1 -I red 127.0.0.1 >>> ip li del red >>> >>> results in a hang with this message: >>> >>> unregister_netdevice: waiting for red to become free. Usage count >>> = 1 >> >> I think you misunderstood my comment. The above works fine today. There >> is no bug with refcnts. >> >> It breaks with your patches wanting to use a VRF device with the main >> table (254). > > That doesn't seem to match with my experience. I can reproduce this on > the net tree with the listed commands and the behaviour matches what I > see in the code. Our mileage varies: root@kenny-jessie3:~# ip li add red type vrf table 10 root@kenny-jessie3:~# ip link set dev eth1 vrf red root@kenny-jessie3:~# ip addr add 127.0.0.1/8 dev red root@kenny-jessie3:~# ip link set dev eth1 up root@kenny-jessie3:~# ip li set red up root@kenny-jessie3:~# ping -c1 -w1 -I red 127.0.0.1 PING 127.0.0.1 (127.0.0.1) from 127.0.0.1 red: 56(84) bytes of data. 64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.059 ms --- 127.0.0.1 ping statistics --- 1 packets transmitted, 1 received, 0% packet loss, time 0ms rtt min/avg/max/mdev = 0.059/0.059/0.059/0.000 ms root@kenny-jessie3:~# ip li del red root@kenny-jessie3:~# uname -a Linux kenny-jessie3 4.11.0-rc6+ #8 SMP Wed Apr 19 11:53:48 PDT 2017 x86_64 GNU/Linux The above is one of many tests I run and never hit a problem deleting a VRF device. dst's attached to fnhe_rth_output and fnhe_rth_input appear to be properly flushed and device references released when the device is deleted (NETDEV_DOWN and then NETDEV_UNREGISTER). Can you send me your kernel config and "sysctl -a --pattern 'net.ipv4'"? Perhaps you have something enabled I don't.
On 20/04/17 15:59, David Ahern wrote: > On 4/20/17 8:39 AM, Robert Shearman wrote: >> On 20/04/17 15:21, David Ahern wrote: >>> On 4/20/17 6:58 AM, Robert Shearman wrote: >>>> David reported that doing the following: >>>> >>>> ip li add red type vrf table 10 >>>> ip link set dev eth1 vrf red >>>> ip addr add 127.0.0.1/8 dev red >>>> ip link set dev eth1 up >>>> ip li set red up >>>> ping -c1 -w1 -I red 127.0.0.1 >>>> ip li del red >>>> >>>> results in a hang with this message: >>>> >>>> unregister_netdevice: waiting for red to become free. Usage count >>>> = 1 >>> >>> I think you misunderstood my comment. The above works fine today. There >>> is no bug with refcnts. >>> >>> It breaks with your patches wanting to use a VRF device with the main >>> table (254). >> >> That doesn't seem to match with my experience. I can reproduce this on >> the net tree with the listed commands and the behaviour matches what I >> see in the code. > > Our mileage varies: > > root@kenny-jessie3:~# ip li add red type vrf table 10 > root@kenny-jessie3:~# ip link set dev eth1 vrf red > root@kenny-jessie3:~# ip addr add 127.0.0.1/8 dev red > root@kenny-jessie3:~# ip link set dev eth1 up > root@kenny-jessie3:~# ip li set red up > root@kenny-jessie3:~# ping -c1 -w1 -I red 127.0.0.1 > PING 127.0.0.1 (127.0.0.1) from 127.0.0.1 red: 56(84) bytes of data. > 64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.059 ms > > --- 127.0.0.1 ping statistics --- > 1 packets transmitted, 1 received, 0% packet loss, time 0ms > rtt min/avg/max/mdev = 0.059/0.059/0.059/0.000 ms > root@kenny-jessie3:~# ip li del red > root@kenny-jessie3:~# uname -a > Linux kenny-jessie3 4.11.0-rc6+ #8 SMP Wed Apr 19 11:53:48 PDT 2017 > x86_64 GNU/Linux > > The above is one of many tests I run and never hit a problem deleting a > VRF device. dst's attached to fnhe_rth_output and fnhe_rth_input appear > to be properly flushed and device references released when the device is > deleted (NETDEV_DOWN and then NETDEV_UNREGISTER). > > Can you send me your kernel config and "sysctl -a --pattern 'net.ipv4'"? > Perhaps you have something enabled I don't. The key thing I think is the ip rules: $ ip rule 0: from all lookup local 1000: from all lookup [l3mdev-table] 32766: from all lookup main 32767: from all lookup default Maybe you have the local rule moved down at startup? I'll send you the requested information if not. Thanks, Rob
On 4/20/17 9:05 AM, Robert Shearman wrote: > The key thing I think is the ip rules: > > $ ip rule > 0: from all lookup local > 1000: from all lookup [l3mdev-table] > 32766: from all lookup main > 32767: from all lookup default > > Maybe you have the local rule moved down at startup? yes that would be a problem. With this test the 127.0.0.1 lookup is matching on the wrong table: perf probe fib_table_lookup%return ret=%ax perf record -e fib:*,probe:* -a -- ping -c1 -w1 -I red 127.0.0.1 ... perf script ping 2803 [000] 70559.086446: fib:fib_table_lookup: table 255 oif 31 iif 1 src 0.0.0.0 dst 127.0.0.1 tos 0 scope 0 flags 4 ping 2803 [000] 70559.086451: fib:fib_table_lookup_nh: nexthop dev lo oif 1 src 127.0.0.1 ping 2803 [000] 70559.086453: probe:fib_table_lookup: (ffffffff8146aaaa <- ffffffff81470734) ret=0x0 ping 2803 [000] 70559.086458: fib:fib_table_lookup: table 255 oif 31 iif 1 src 127.0.0.1 dst 127.0.0.1 tos 0 scope 0 flags 4 ping 2803 [000] 70559.086459: fib:fib_table_lookup_nh: nexthop dev lo oif 1 src 127.0.0.1 ping 2803 [000] 70559.086460: probe:fib_table_lookup: (ffffffff8146aaaa <- ffffffff81470734) ret=0x0 ping 2803 [000] 70559.086752: fib:fib_table_lookup: table 255 oif 31 iif 1 src 0.0.0.0 dst 127.0.0.1 tos 0 scope 0 flags 4 ping 2803 [000] 70559.086754: fib:fib_table_lookup_nh: nexthop dev lo oif 1 src 127.0.0.1 ping 2803 [000] 70559.086755: probe:fib_table_lookup: (ffffffff8146aaaa <- ffffffff81470734) ret=0x0 ping 2803 [000] 70559.086768: fib:fib_table_lookup: table 255 oif 31 iif 1 src 127.0.0.1 dst 127.0.0.1 tos 0 scope 0 flags 5 ping 2803 [000] 70559.086769: fib:fib_table_lookup_nh: nexthop dev lo oif 1 src 127.0.0.1 ping 2803 [000] 70559.086770: probe:fib_table_lookup: (ffffffff8146aaaa <- ffffffff81470734) ret=0x0 ping 2803 [000] 70559.086781: fib:fib_table_lookup: table 255 oif 31 iif 1 src 127.0.0.1 dst 127.0.0.1 tos 0 scope 0 flags 4 ping 2803 [000] 70559.086782: fib:fib_table_lookup_nh: nexthop dev lo oif 1 src 127.0.0.1 ping 2803 [000] 70559.086782: probe:fib_table_lookup: (ffffffff8146aaaa <- ffffffff81470734) ret=0x0 ping 2803 [000] 70559.086787: fib:fib_table_lookup: table 255 oif 31 iif 1 src 127.0.0.1 dst 127.0.0.1 tos 0 scope 0 flags 5 ping 2803 [000] 70559.086788: fib:fib_table_lookup_nh: nexthop dev lo oif 1 src 127.0.0.1 ping 2803 [000] 70559.086789: probe:fib_table_lookup: (ffffffff8146aaaa <- ffffffff81470734) ret=0x0 Table 255 is the wrong table. That is the ultimate problem here.
On 20/04/17 16:16, David Ahern wrote: > On 4/20/17 9:05 AM, Robert Shearman wrote: >> The key thing I think is the ip rules: >> >> $ ip rule >> 0: from all lookup local >> 1000: from all lookup [l3mdev-table] >> 32766: from all lookup main >> 32767: from all lookup default >> >> Maybe you have the local rule moved down at startup? > > yes that would be a problem. With this test the 127.0.0.1 lookup is > matching on the wrong table: > > perf probe fib_table_lookup%return ret=%ax > perf record -e fib:*,probe:* -a -- ping -c1 -w1 -I red 127.0.0.1 > ... > perf script > ... > Table 255 is the wrong table. That is the ultimate problem here. > The table used for the lookup could be retrieved from the oif, but the check I introduced would still be required to cope with the unusual, but not disallowed, case of two VRF master devices referring to the same table. Thanks, Rob
diff --git a/net/ipv4/route.c b/net/ipv4/route.c index acd69cfe2951..f667783ffd19 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -2125,6 +2125,14 @@ static struct rtable *__mkroute_output(const struct fib_result *res, fi = NULL; } + /* If the flag to skip the nh oif check is set then the output + * device may not match the nh device, so cannot use or add to + * cache in that case. + */ + if (unlikely(fl4->flowi4_flags & FLOWI_FLAG_SKIP_NH_OIF && + FIB_RES_NH(*res).nh_dev != dev_out)) + do_cache = false; + fnhe = NULL; do_cache &= fi != NULL; if (do_cache) {
David reported that doing the following: ip li add red type vrf table 10 ip link set dev eth1 vrf red ip addr add 127.0.0.1/8 dev red ip link set dev eth1 up ip li set red up ping -c1 -w1 -I red 127.0.0.1 ip li del red results in a hang with this message: unregister_netdevice: waiting for red to become free. Usage count = 1 The problem is caused by caching the dst used for sending the packet out of the specified interface on the route that the lookup returned from the local table when the rule for the lookup in the local table is ordered before the rule for lookups using l3mdevs. Thus the dst could stay around until the route in the local table is deleted which may be never. Address the problem by not allocating a cacheable output dst if FLOWI_FLAG_SKIP_NH_OIF is set and the nh device differs from the device used for the dst. Fixes: ebfc102c566d ("net: vrf: Flip IPv4 output path from FIB lookup hook to out hook") Reported-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: Robert Shearman <rshearma@brocade.com> --- net/ipv4/route.c | 8 ++++++++ 1 file changed, 8 insertions(+)