diff mbox

[ovs-dev] bridge: Prohibit "default" bridge name.

Message ID 1493290930-12569-1-git-send-email-u9012063@gmail.com
State Superseded
Headers show

Commit Message

William Tu April 27, 2017, 11:02 a.m. UTC
Before the patch, when users create bridge named "default", although
ovs-vsctl fails but vswitchd in the background will keep retrying it,
causing the systemd-udev to reach 100% cpu utilization.  The reason is
due to frequent calls into kernel's register_netdevice function,
which will invoke several kernel elements who has registered on the
netdevice notifier chain.  One of the notifier, the inetdev_event rejects
this devname and register_netdevice fails.  The patch prohibits creating
"default" bridge name.

VMWare-BZ: #1842388
Signed-off-by: William Tu <u9012063@gmail.com>
---
 vswitchd/bridge.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Gregory Rose April 27, 2017, 1:55 p.m. UTC | #1
On Thu, 2017-04-27 at 04:02 -0700, William Tu wrote:
> Before the patch, when users create bridge named "default", although
> ovs-vsctl fails but vswitchd in the background will keep retrying it,
> causing the systemd-udev to reach 100% cpu utilization.  The reason is
> due to frequent calls into kernel's register_netdevice function,
> which will invoke several kernel elements who has registered on the
> netdevice notifier chain.  One of the notifier, the inetdev_event rejects
> this devname and register_netdevice fails.  The patch prohibits creating
> "default" bridge name.
> 
> VMWare-BZ: #1842388
> Signed-off-by: William Tu <u9012063@gmail.com>
> ---
>  vswitchd/bridge.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c
> index ebb6249416fa..e8a22f82e1d6 100644
> --- a/vswitchd/bridge.c
> +++ b/vswitchd/bridge.c
> @@ -1710,7 +1710,8 @@ add_del_bridges(const struct ovsrec_open_vswitch *cfg)
>          static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
>          const struct ovsrec_bridge *br_cfg = cfg->bridges[i];
>  
> -        if (strchr(br_cfg->name, '/') || strchr(br_cfg->name, '\\')) {
> +        if (strchr(br_cfg->name, '/') || strchr(br_cfg->name, '\\')
> +            || !strcmp(br_cfg->name, "default")) {
>              /* Prevent remote ovsdb-server users from accessing arbitrary
>               * directories, e.g. consider a bridge named "../../../etc/".
>               *

Acked-by: Greg Rose <gvrose8192@gmail.com>
Ben Pfaff April 27, 2017, 4:57 p.m. UTC | #2
On Thu, Apr 27, 2017 at 04:02:10AM -0700, William Tu wrote:
> Before the patch, when users create bridge named "default", although
> ovs-vsctl fails but vswitchd in the background will keep retrying it,
> causing the systemd-udev to reach 100% cpu utilization.  The reason is
> due to frequent calls into kernel's register_netdevice function,
> which will invoke several kernel elements who has registered on the
> netdevice notifier chain.  One of the notifier, the inetdev_event rejects
> this devname and register_netdevice fails.  The patch prohibits creating
> "default" bridge name.
> 
> VMWare-BZ: #1842388
> Signed-off-by: William Tu <u9012063@gmail.com>

It all seems very arbitrary. Do we understand why this is an invalid
device name?
William Tu April 27, 2017, 5:17 p.m. UTC | #3
On Thu, Apr 27, 2017 at 9:57 AM, Ben Pfaff <blp@ovn.org> wrote:
> On Thu, Apr 27, 2017 at 04:02:10AM -0700, William Tu wrote:
>> Before the patch, when users create bridge named "default", although
>> ovs-vsctl fails but vswitchd in the background will keep retrying it,
>> causing the systemd-udev to reach 100% cpu utilization.  The reason is
>> due to frequent calls into kernel's register_netdevice function,
>> which will invoke several kernel elements who has registered on the
>> netdevice notifier chain.  One of the notifier, the inetdev_event rejects
>> this devname and register_netdevice fails.  The patch prohibits creating
>> "default" bridge name.
>>
>> VMWare-BZ: #1842388
>> Signed-off-by: William Tu <u9012063@gmail.com>
>
> It all seems very arbitrary. Do we understand why this is an invalid
> device name?

Yes, when kernel is configured with CONFIG_SYSCTL, creating a new
netdev creates a dir in /proc/sys/net/ipv4/conf/<device name>

The <device name> "default" and "all" is pre-existed when SYSCTL
starts (which means we should also prohibit "all") for default
property of the system's netdev. So sysctl prevents creating dev->name
is "default" or "all". A call stack is below if interested:
sysctl_dev_name_is_allowed
   devinet_sysctl_register
     inetdev_event
       notifier_call_chain
         raw_notifier_call_chain
           call_netdevice_notifiers_info
             register_netdevice

Regards,
William
Ben Pfaff April 27, 2017, 6:17 p.m. UTC | #4
On Thu, Apr 27, 2017 at 10:17:20AM -0700, William Tu wrote:
> On Thu, Apr 27, 2017 at 9:57 AM, Ben Pfaff <blp@ovn.org> wrote:
> > On Thu, Apr 27, 2017 at 04:02:10AM -0700, William Tu wrote:
> >> Before the patch, when users create bridge named "default", although
> >> ovs-vsctl fails but vswitchd in the background will keep retrying it,
> >> causing the systemd-udev to reach 100% cpu utilization.  The reason is
> >> due to frequent calls into kernel's register_netdevice function,
> >> which will invoke several kernel elements who has registered on the
> >> netdevice notifier chain.  One of the notifier, the inetdev_event rejects
> >> this devname and register_netdevice fails.  The patch prohibits creating
> >> "default" bridge name.
> >>
> >> VMWare-BZ: #1842388
> >> Signed-off-by: William Tu <u9012063@gmail.com>
> >
> > It all seems very arbitrary. Do we understand why this is an invalid
> > device name?
> 
> Yes, when kernel is configured with CONFIG_SYSCTL, creating a new
> netdev creates a dir in /proc/sys/net/ipv4/conf/<device name>
> 
> The <device name> "default" and "all" is pre-existed when SYSCTL
> starts (which means we should also prohibit "all") for default
> property of the system's netdev. So sysctl prevents creating dev->name
> is "default" or "all". A call stack is below if interested:
> sysctl_dev_name_is_allowed
>    devinet_sysctl_register
>      inetdev_event
>        notifier_call_chain
>          raw_notifier_call_chain
>            call_netdevice_notifiers_info
>              register_netdevice

Do we get the same behavior (100% CPU) if creating a bridge fails for
other reasons?  For example, it might fail because a network device
already exists with the given name, or because the name is too long for
a network device name.  If we do get 100% CPU from such a failure, then
we should fix the root of the problem instead of blacklisting particular
names.
William Tu April 27, 2017, 9:47 p.m. UTC | #5
On Thu, Apr 27, 2017 at 11:17 AM, Ben Pfaff <blp@ovn.org> wrote:
> On Thu, Apr 27, 2017 at 10:17:20AM -0700, William Tu wrote:
>> On Thu, Apr 27, 2017 at 9:57 AM, Ben Pfaff <blp@ovn.org> wrote:
>> > On Thu, Apr 27, 2017 at 04:02:10AM -0700, William Tu wrote:
>> >> Before the patch, when users create bridge named "default", although
>> >> ovs-vsctl fails but vswitchd in the background will keep retrying it,
>> >> causing the systemd-udev to reach 100% cpu utilization.  The reason is
>> >> due to frequent calls into kernel's register_netdevice function,
>> >> which will invoke several kernel elements who has registered on the
>> >> netdevice notifier chain.  One of the notifier, the inetdev_event rejects
>> >> this devname and register_netdevice fails.  The patch prohibits creating
>> >> "default" bridge name.
>> >>
>> >> VMWare-BZ: #1842388
>> >> Signed-off-by: William Tu <u9012063@gmail.com>
>> >
>> > It all seems very arbitrary. Do we understand why this is an invalid
>> > device name?
>>
>> Yes, when kernel is configured with CONFIG_SYSCTL, creating a new
>> netdev creates a dir in /proc/sys/net/ipv4/conf/<device name>
>>
>> The <device name> "default" and "all" is pre-existed when SYSCTL
>> starts (which means we should also prohibit "all") for default
>> property of the system's netdev. So sysctl prevents creating dev->name
>> is "default" or "all". A call stack is below if interested:
>> sysctl_dev_name_is_allowed
>>    devinet_sysctl_register
>>      inetdev_event
>>        notifier_call_chain
>>          raw_notifier_call_chain
>>            call_netdevice_notifiers_info
>>              register_netdevice
>
> Do we get the same behavior (100% CPU) if creating a bridge fails for
> other reasons?  For example, it might fail because a network device
> already exists with the given name, or because the name is too long for
> a network device name.  If we do get 100% CPU from such a failure, then
> we should fix the root of the problem instead of blacklisting particular
> names.

There are two scenarios:
1) if the bridge name exists, ex: eth0
then "ovs-vsctl add-br eth0" fails due to EEXIST
2) if the bridge name does not exists, but is "default" or "all"
then "ovs-vsctl add-br default" fails due to EINVAL

From OVS's point of view it's the same, ovs-vsctl fails creating the
bridge, and keeps retry. From the whole system's point of view, (2)
has impact on other Linux subsystems, due to kernel netdev notifier
chain mechanism informing other subsystems when trying to add a new
device, while (1) fails pretty early in register_netdevice() and has
no impact.

Or instead of blacklisting, maybe add a max retry count?

Regards,
William
Gregory Rose April 28, 2017, 2:18 a.m. UTC | #6
On Thu, 2017-04-27 at 14:47 -0700, William Tu wrote:
> On Thu, Apr 27, 2017 at 11:17 AM, Ben Pfaff <blp@ovn.org> wrote:
> > On Thu, Apr 27, 2017 at 10:17:20AM -0700, William Tu wrote:
> >> On Thu, Apr 27, 2017 at 9:57 AM, Ben Pfaff <blp@ovn.org> wrote:
> >> > On Thu, Apr 27, 2017 at 04:02:10AM -0700, William Tu wrote:
> >> >> Before the patch, when users create bridge named "default", although
> >> >> ovs-vsctl fails but vswitchd in the background will keep retrying it,
> >> >> causing the systemd-udev to reach 100% cpu utilization.  The reason is
> >> >> due to frequent calls into kernel's register_netdevice function,
> >> >> which will invoke several kernel elements who has registered on the
> >> >> netdevice notifier chain.  One of the notifier, the inetdev_event rejects
> >> >> this devname and register_netdevice fails.  The patch prohibits creating
> >> >> "default" bridge name.
> >> >>
> >> >> VMWare-BZ: #1842388
> >> >> Signed-off-by: William Tu <u9012063@gmail.com>
> >> >
> >> > It all seems very arbitrary. Do we understand why this is an invalid
> >> > device name?
> >>
> >> Yes, when kernel is configured with CONFIG_SYSCTL, creating a new
> >> netdev creates a dir in /proc/sys/net/ipv4/conf/<device name>
> >>
> >> The <device name> "default" and "all" is pre-existed when SYSCTL
> >> starts (which means we should also prohibit "all") for default
> >> property of the system's netdev. So sysctl prevents creating dev->name
> >> is "default" or "all". A call stack is below if interested:
> >> sysctl_dev_name_is_allowed
> >>    devinet_sysctl_register
> >>      inetdev_event
> >>        notifier_call_chain
> >>          raw_notifier_call_chain
> >>            call_netdevice_notifiers_info
> >>              register_netdevice
> >
> > Do we get the same behavior (100% CPU) if creating a bridge fails for
> > other reasons?  For example, it might fail because a network device
> > already exists with the given name, or because the name is too long for
> > a network device name.  If we do get 100% CPU from such a failure, then
> > we should fix the root of the problem instead of blacklisting particular
> > names.
> 
> There are two scenarios:
> 1) if the bridge name exists, ex: eth0
> then "ovs-vsctl add-br eth0" fails due to EEXIST
> 2) if the bridge name does not exists, but is "default" or "all"
> then "ovs-vsctl add-br default" fails due to EINVAL
> 
> From OVS's point of view it's the same, ovs-vsctl fails creating the
> bridge, and keeps retry. From the whole system's point of view, (2)
> has impact on other Linux subsystems, due to kernel netdev notifier
> chain mechanism informing other subsystems when trying to add a new
> device, while (1) fails pretty early in register_netdevice() and has
> no impact.
> 
> Or instead of blacklisting, maybe add a max retry count?

Can you see if using a retry count still ensures this bug is fixed?

VMWare-BZ: #1842388

If so that's probably a better approach. Like Ben I'm a little queasy
about saying we can't have a 'default' bridge under any circumstance.

Thanks,

- Greg

> 
> Regards,
> William
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
William Tu April 28, 2017, 9:44 p.m. UTC | #7
> OK, if I understand properly here, the problem is that asking the Linux
> kernel datapath to add a bridge named "default" does two things.  First,
> it fails.  Second, it triggers a notifier that causes OVS to try again.
> The second part is what makes this different from other failures and
> what makes OVS use 100% CPU in this case but not other cases.  Is that
> right?
>
Yes

> If I'm right, then how about something like the following instead?  It
> pushes this down to Linux-specific code, which seems better because this
> is a Linux-specific issue, and it thoroughly documents the problem.  It
> compiles but I have not tested it.
>
> Thanks,
>
> Ben.
>
> diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
> index 9ff1333f8e85..79e827303d07 100644
> --- a/lib/netdev-linux.c
> +++ b/lib/netdev-linux.c
> @@ -1,5 +1,5 @@
>  /*
> - * Copyright (c) 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016 Nicira, Inc.
> + * Copyright (c) 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017 Nicira, Inc.
>   *
>   * Licensed under the Apache License, Version 2.0 (the "License");
>   * you may not use this file except in compliance with the License.
> @@ -773,10 +773,28 @@ netdev_linux_alloc(void)
>      return &netdev->up;
>  }
>
> -static void
> -netdev_linux_common_construct(struct netdev_linux *netdev)
> -{
> +static int
> +netdev_linux_common_construct(struct netdev *netdev_)
> +{
> +    /* Prevent any attempt to create (or open) a network device named "default"
> +     * or "all".  These device names are effectively reserved on Linux because
> +     * /proc/sys/net/ipv4/conf/ always contains directories by these names.  By
> +     * itself this wouldn't call for any special treatment, but in practice if
> +     * a program tries to create devices with these names, it causes the kernel
> +     * to fire a "new device" notification event even though creation failed,
> +     * and in turn that causes OVS to wake up and try to create them again,
> +     * which ends up as a 100% CPU loop. */
> +    struct netdev_linux *netdev = netdev_linux_cast(netdev_);
> +    const char *name = netdev_->name;
> +    if (!strcmp(name, "default") || !strcmp(name, "all")) {
> +        static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1);
> +        VLOG_WARN_RL(&rl, "%s: Linux forbids network device with this name",
> +                     name);
> +        return EINVAL;
> +    }
> +
>      ovs_mutex_init(&netdev->mutex);
> +    return 0;
>  }
>
>  /* Creates system and internal devices. */
> @@ -784,9 +802,10 @@ static int
>  netdev_linux_construct(struct netdev *netdev_)
>  {
>      struct netdev_linux *netdev = netdev_linux_cast(netdev_);
> -    int error;
> -
> -    netdev_linux_common_construct(netdev);
> +    int error = netdev_linux_common_construct(netdev_);
> +    if (error) {
> +        return error;
> +    }
>
>      error = get_flags(&netdev->up, &netdev->ifi_flags);
>      if (error == ENODEV) {
> @@ -817,9 +836,11 @@ netdev_linux_construct_tap(struct netdev *netdev_)
>      static const char tap_dev[] = "/dev/net/tun";
>      const char *name = netdev_->name;
>      struct ifreq ifr;
> -    int error;
>
> -    netdev_linux_common_construct(netdev);
> +    int error = netdev_linux_common_construct(netdev_);
> +    if (error) {
> +        return error;
> +    }
>
>      /* Open tap device. */
>      netdev->tap_fd = open(tap_dev, O_RDWR);

Thanks! I've tested it and it works ok.
Regards,
William
diff mbox

Patch

diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c
index ebb6249416fa..e8a22f82e1d6 100644
--- a/vswitchd/bridge.c
+++ b/vswitchd/bridge.c
@@ -1710,7 +1710,8 @@  add_del_bridges(const struct ovsrec_open_vswitch *cfg)
         static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
         const struct ovsrec_bridge *br_cfg = cfg->bridges[i];
 
-        if (strchr(br_cfg->name, '/') || strchr(br_cfg->name, '\\')) {
+        if (strchr(br_cfg->name, '/') || strchr(br_cfg->name, '\\')
+            || !strcmp(br_cfg->name, "default")) {
             /* Prevent remote ovsdb-server users from accessing arbitrary
              * directories, e.g. consider a bridge named "../../../etc/".
              *