Message ID | 1411768637-6809-6-git-send-email-mcgrof@do-not-panic.com |
---|---|
State | Not Applicable, archived |
Delegated to: | David Miller |
Hello,

On Fri, Sep 26, 2014 at 02:57:17PM -0700, Luis R. Rodriguez wrote:
...
> Systemd should consider enabling async probe on device drivers
> it loads through systemd-udev but probably does not want to
> enable it for modules loaded through systemd-modules-load
> (modules-load.d). At least on my booting enabling async probe
> for all modules fails to boot as such in order to make this

Did you find out why boot failed with those modules?

> a bit more useful we whitelist a few buses where it should be
> at least in theory safe to try to enable async probe. This
> way even if systemd tried to ask to enable async probe for all
> its device drivers the kernel won't blindly do this. We also
> have the sync_probe flag which device drivers can themselves
> enable *iff* it's known the device driver should never async
> probe.
>
> In order to help *test* things folks can use the bus.safe_mod_async_probe=1
> kernel parameter which will work as if userspace would have
> requested all modules to load with async probe. Daring folks can
> also use bus.force_mod_async_probe=1 which will enable asynch probe
> even on buses not tested in any way yet, if you use that though
> you're on your own.

If those two knobs are meant for debugging, let's please make that
fact immediately evident. e.g. Make them ugly boot params like
"__DEVEL__driver_force_mod_async_probe". Devel/debug options ending
up becoming stable interface are really nasty.

> +struct driver_attach_work {
> +	struct work_struct work;
> +	struct device_driver *driver;
> +};
> +
>  struct driver_private {
>  	struct kobject kobj;
>  	struct klist klist_devices;
>  	struct klist_node knode_bus;
>  	struct module_kobject *mkobj;
> +	struct driver_attach_work *attach_work;
>  	struct device_driver *driver;
>  };

How many bytes are we saving by allocating it separately? Can't we
just embed it in driver_private?

> +static void driver_attach_workfn(struct work_struct *work)
> +{
> +	int ret;
> +	struct driver_attach_work *attach_work =
> +		container_of(work, struct driver_attach_work, work);
> +	struct device_driver *drv = attach_work->driver;
> +	ktime_t calltime, delta, rettime;
> +	unsigned long long duration;

This could just be a personal preference but I think it's easier to
read if local vars w/ initializers come before the ones w/o.

> +
> +	calltime = ktime_get();
> +
> +	ret = driver_attach(drv);
> +	if (ret != 0) {
> +		remove_driver_private(drv);
> +		bus_put(drv->bus);
> +	}
> +
> +	rettime = ktime_get();
> +	delta = ktime_sub(rettime, calltime);
> +	duration = (unsigned long long) ktime_to_ns(delta) >> 10;
> +
> +	pr_debug("bus: '%s': add driver %s attach completed after %lld usecs\n",
> +		 drv->bus->name, drv->name, duration);

Why do we have the above printout for async path but not sync path?
It's kinda weird for the code path to diverge like this. Shouldn't
the only difference be the context probes are running from?

...
> +static bool drv_enable_async_probe(struct device_driver *drv,
> +				   struct bus_type *bus)
> +{
> +	struct module *mod;
> +
> +	if (!drv->owner || drv->sync_probe)
> +		return false;
> +
> +	if (force_mod_async)
> +		return true;
> +
> +	mod = drv->owner;
> +	if (!safe_mod_async && !mod->async_probe_requested)
> +		return false;
> +
> +	/* For now lets avoid stupid bug reports */
> +	if (!strcmp(bus->name, "pci") ||
> +	    !strcmp(bus->name, "pci_express") ||
> +	    !strcmp(bus->name, "hid") ||
> +	    !strcmp(bus->name, "sdio") ||
> +	    !strcmp(bus->name, "gameport") ||
> +	    !strcmp(bus->name, "mmc") ||
> +	    !strcmp(bus->name, "i2c") ||
> +	    !strcmp(bus->name, "platform") ||
> +	    !strcmp(bus->name, "usb"))
> +		return true;

Ugh... things like this tend to become permanent. Do we really need
this? And how are we gonna find out what's broken why w/o bug
reports?

> diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> index e4ffbcf..7999aba 100644
> --- a/drivers/base/dd.c
> +++ b/drivers/base/dd.c
> @@ -507,6 +507,13 @@ static void __device_release_driver(struct device *dev)
>
>  	drv = dev->driver;
>  	if (drv) {
> +		if (drv->owner && !drv->sync_probe) {
> +			struct module *mod = drv->owner;
> +			struct driver_private *priv = drv->p;
> +
> +			if (mod->async_probe_requested)
> +				flush_work(&priv->attach_work->work);

This can be unconditional flush_work(&priv->attach_work) if attach_work
isn't separately allocated.

>  static int unknown_module_param_cb(char *param, char *val, const char *modname,
>  				   void *arg)
>  {
> +	int ret;
> +	struct module *mod = arg;

Ditto with the order of definitions.

> +	if (strcmp(param, "async_probe") == 0) {
> +		mod->async_probe_requested = true;
> +		return 0;
> +	}

Generally looks good to me.

Thanks a lot for doing this! :)
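For readers following the "embed it in driver_private" suggestion above, here is a minimal userspace sketch of what embedding could look like. It is not kernel code and not the posted patch: `struct work_struct` and `struct device_driver` are simplified stand-ins, the `container_of()` macro is a plain-C approximation of the kernel one, and `driver_from_work()` is a hypothetical helper name used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel types; only the layout matters here. */
struct work_struct { int pending; };
struct device_driver { const char *name; };

/* Plain-C approximation of the kernel's container_of(): step back from a
 * member pointer to the structure that contains it. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Sketch of a private struct with the work item embedded, not allocated. */
struct driver_private {
	struct work_struct attach_work;	/* embedded: no separate alloc/free */
	struct device_driver *driver;
};

/* Hypothetical helper: recover the driver from the work pointer alone. */
static struct device_driver *driver_from_work(struct work_struct *work)
{
	struct driver_private *priv;

	assert(work != NULL);
	priv = container_of(work, struct driver_private, attach_work);
	return priv->driver;
}
```

With the work item embedded, the back-pointer wrapper struct and its separate allocation are no longer needed, and the work item always exists, which is what would allow an unconditional flush on release.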
Hi Luis,

Thanks for the patches and the detailed analysis.

Feel free to add

Acked-by: Tom Gundersen <teg@jklm.no>

Minor comments on the commit message below.

On Fri, Sep 26, 2014 at 11:57 PM, Luis R. Rodriguez
<mcgrof@do-not-panic.com> wrote:
> From: "Luis R. Rodriguez" <mcgrof@suse.com>
>
> Some init systems may wish to express the desire to have
> device drivers run their device driver's bus probe() run
> asynchronously. This implements support for this and
> allows userspace to request async probe as a preference
> through a generic shared device driver module parameter,
> async_probe. Implementation for async probe is supported
> through a module parameter given that since synchronous
> probe has been prevalent for years some userspace might
> exist which relies on the fact that the device driver will
> probe synchronously and the assumption that devices it
> provides will be immediately available after this.
>
> Some device driver might not be able to run async probe
> so we enable device drivers to annotate this to prevent
> this module parameter from having any effect on them.
>
> This implementation uses queue_work(system_unbound_wq)
> to queue async probes, this should enable probe to run
> slightly *faster* if the driver's probe path did not
> have much interaction with other workqueues otherwise
> it may run _slightly_ slower. Tests were done with cxgb4,
> which is known to take long on probe, both without
> having to run request_firmware() [0] and then by
> requiring it to use request_firmware() [1].
> The difference in run time are only measurable in microseconds:
>
> =====================================================================|
> strategy                          fw (usec)       no-fw (usec)       |
> ---------------------------------------------------------------------|
> synchronous                       24472569        1307563            |
> kthread                           25066415.5      1309868.5          |
> queue_work(system_unbound_wq)     24913661.5      1307631            |
> ---------------------------------------------------------------------|
>
> In practice, in seconds, the difference is barely noticeable:
>
> =====================================================================|
> strategy                          fw (s)          no-fw (s)          |
> ---------------------------------------------------------------------|
> synchronous                       24.47           1.31               |
> kthread                           25.07           1.31               |
> queue_work(system_unbound_wq)     24.91           1.31               |
> ---------------------------------------------------------------------|
>
> [0] http://ftp.suse.com/pub/people/mcgrof/async-probe/probe-cgxb4-no-firmware.png
> [1] http://ftp.suse.com/pub/people/mcgrof/async-probe/probe-cgxb4-firmware.png
>
> The rest of the commit log documents why this feature was implemented
> primarily first for systemd and things it should consider next.
>
> Systemd has a general timeout for all workers currently set to 180
> seconds after which it will send a sigkill signal. Systemd now has a
> warning which is issued once it reaches 1/3 of the timeout. The original
> motivation for the systemd timeout was to help track device drivers
> which do not use asynch firmware loading on init() and the timeout was
> originally set to 30 seconds.

Please note that the motivation for the timeout in systemd had nothing
to do with async firmware loading (that was just the case where
problems cropped up). The motivation was to not allow udev-workers to
stay around indefinitely, and hence put an upper-bound on their
duration (initially 180 s).
At some point the bound was reduced to 30 seconds to make sure
module-loading would bail out before the kernel's firmware loading
timeout would bail out (60s I believe). That is no longer relevant,
which is why it was safe to reset the timeout to 180 s.

> Since systemd + kernel are heavily tied in for the purposes of this
> patch it is assumed you have merged on systemd the following
> commits:
>
> 671174136525ddf208cdbe75d6d6bd159afa961f udev: timeout - warn after a third of the timeout before killing
> b5338a19864ac3f5632aee48069a669479621dca udev: timeout - increase timeout
> 2e92633dbae52f5ac9b7b2e068935990d475d2cd udev: bump event timeout to 60 seconds
> be2ea723b1d023b3d385d3b791ee4607cbfb20ca udev: remove userspace firmware loading support
> 9f20a8a376f924c8eb5423cfc1f98644fc1e2d1a udev: fixup commit
> dd5eddd28a74a49607a8fffcaf960040dba98479 udev: unify event timeout handling
> 9719859c07aa13539ed2cd4b31972cd30f678543 udevd: add --event-timeout commandline option
>
> Since we bundle together serially driver init() and probe()
> on module initialization systemd's imposed timeout put a limit on the
> amount of time a driver init() and probe routines can take. There's a
> few overlooked issues with this and the timeout in general:
>
> 0) Not all drivers are killed, the signal is just sent and
>    the kill will only be acted upon if the driver you loaded
>    happens to have some code path that either uses kthreads (which
>    as of 786235ee are now killable), or uses some code which checks for
>    fatal_signal_pending() on the kernel somewhere -- i.e: pci_read_vpd().

Shouldn't this be seen as something to be fixed in the kernel? I mean,
do we not want userspace to have the possibility to kill udev/modprobe
even disregarding the worker timeouts (say at shutdown, or before
switching from the initrd)?

> 1) Since systemd is the only one logging the sigkill debugging that
>    drivers are not loaded or in the worst case *failed to boot* because
>    of a sigkill has proven hard to debug.

Care to clarify this a bit? Are the udev logs somehow unclear? If you
think we can improve the logging from udev, please ping me about that
and I'll sort it out.

> 2) When and if the signal is received by the driver somehow
>    the driver may fail at different points in its initialization
>    and unless all error paths on the driver are implemented
>    perfectly this could mean leaving a device in a half
>    initialized state.
>
> 3) The timeout is penalizing device drivers that take long on
>    probe(), this wasn't the original motivation. Systemd seems
>    to have been under assumption that probe was asynchronous,
>    this perhaps is true as an *objective* and goal for *some
>    subsystems* but by no means is it true that we've been on a wide
>    crusade to ensure this for all device drivers. It may be a good
>    idea for *many* device drivers but penalizing them with a kill
>    for taking long on probe is simply unacceptable especially
>    when the timeout is completely arbitrary.

The point is really not to "penalize" anything, we just need to make
sure we put some sort of restrictions on our workers so they don't
hang around forever.

> 4) The driver core calls probe for *all* devices that a driver can
>    claim and it does so serially, so if a device driver will need
>    to probe 3 devices and if probe on the device driver is synchronous
>    the amount of time that module loading will take will be:
>
>    driver load time = init() + probe for 3 devices serially
>
>    The timeout ultimately ends up limiting the number of devices that
>    *any* device driver can support based on the following formula:
>
>    number_devices =          systemd_timeout
>                     -------------------------------------
>                        max known probe time for driver
>
>    Lastly since the error value passed down is the value of
>    the probe for the last device probed the module will fail
>    to load and all devices will fail to be available.
>
> In the Linux kernel we don't want to work around the timeout,
> instead systemd must be changed to take all the above into
> consideration when issuing any kills on device drivers, ideally
> the sigkill should be considered to be ignored at least for
> kmod. In addition to this we help systemd by giving it what it
> originally considered was there and enable it to ask device
> drivers to use asynchronous probe. This patch addresses that
> feature.
>
> Systemd should consider enabling async probe on device drivers
> it loads through systemd-udev but probably does not want to
> enable it for modules loaded through systemd-modules-load
> (modules-load.d). At least on my booting enabling async probe
> for all modules fails to boot as such in order to make this
> a bit more useful we whitelist a few buses where it should be
> at least in theory safe to try to enable async probe. This
> way even if systemd tried to ask to enable async probe for all
> its device drivers the kernel won't blindly do this. We also
> have the sync_probe flag which device drivers can themselves
> enable *iff* it's known the device driver should never async
> probe.
> > In order to help *test* things folks can use the bus.safe_mod_async_probe=1 > kernel parameter which will work as if userspace would have > requested all modules to load with async probe. Daring folks can > also use bus.force_mod_async_probe=1 which will enable asynch probe > even on buses not tested in any way yet, if you use that though > you're on your own. > > Cc: Tejun Heo <tj@kernel.org> > Cc: Arjan van de Ven <arjan@linux.intel.com> > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> > Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> > Cc: Joseph Salisbury <joseph.salisbury@canonical.com> > Cc: Kay Sievers <kay@vrfy.org> > Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk> > Cc: Tim Gardner <tim.gardner@canonical.com> > Cc: Pierre Fersing <pierre-fersing@pierref.org> > Cc: Andrew Morton <akpm@linux-foundation.org> > Cc: Oleg Nesterov <oleg@redhat.com> > Cc: Benjamin Poirier <bpoirier@suse.de> > Cc: Nagalakshmi Nandigama <nagalakshmi.nandigama@avagotech.com> > Cc: Praveen Krishnamoorthy <praveen.krishnamoorthy@avagotech.com> > Cc: Sreekanth Reddy <sreekanth.reddy@avagotech.com> > Cc: Abhijit Mahajan <abhijit.mahajan@avagotech.com> > Cc: Casey Leedom <leedom@chelsio.com> > Cc: Hariprasad S <hariprasad@chelsio.com> > Cc: Santosh Rastapur <santosh@chelsio.com> > Cc: MPT-FusionLinux.pdl@avagotech.com > Cc: linux-scsi@vger.kernel.org > Cc: linux-kernel@vger.kernel.org > Cc: netdev@vger.kernel.org > Signed-off-by: Luis R. 
Rodriguez <mcgrof@suse.com> > --- > drivers/base/base.h | 6 +++ > drivers/base/bus.c | 137 +++++++++++++++++++++++++++++++++++++++++++++++-- > drivers/base/dd.c | 7 +++ > include/linux/module.h | 2 + > kernel/module.c | 12 ++++- > 5 files changed, 159 insertions(+), 5 deletions(-) > > diff --git a/drivers/base/base.h b/drivers/base/base.h > index 251c5d3..24836f1 100644 > --- a/drivers/base/base.h > +++ b/drivers/base/base.h > @@ -43,11 +43,17 @@ struct subsys_private { > }; > #define to_subsys_private(obj) container_of(obj, struct subsys_private, subsys.kobj) > > +struct driver_attach_work { > + struct work_struct work; > + struct device_driver *driver; > +}; > + > struct driver_private { > struct kobject kobj; > struct klist klist_devices; > struct klist_node knode_bus; > struct module_kobject *mkobj; > + struct driver_attach_work *attach_work; > struct device_driver *driver; > }; > #define to_driver(obj) container_of(obj, struct driver_private, kobj) > diff --git a/drivers/base/bus.c b/drivers/base/bus.c > index a5f41e4..41e321e6 100644 > --- a/drivers/base/bus.c > +++ b/drivers/base/bus.c > @@ -85,6 +85,7 @@ static void driver_release(struct kobject *kobj) > struct driver_private *drv_priv = to_driver(kobj); > > pr_debug("driver: '%s': %s\n", kobject_name(kobj), __func__); > + kfree(drv_priv->attach_work); > kfree(drv_priv); > } > > @@ -662,10 +663,125 @@ static void remove_driver_private(struct device_driver *drv) > struct driver_private *priv = drv->p; > > kobject_put(&priv->kobj); > + kfree(priv->attach_work); > kfree(priv); > drv->p = NULL; > } > > +static void driver_attach_workfn(struct work_struct *work) > +{ > + int ret; > + struct driver_attach_work *attach_work = > + container_of(work, struct driver_attach_work, work); > + struct device_driver *drv = attach_work->driver; > + ktime_t calltime, delta, rettime; > + unsigned long long duration; > + > + calltime = ktime_get(); > + > + ret = driver_attach(drv); > + if (ret != 0) { > + 
remove_driver_private(drv); > + bus_put(drv->bus); > + } > + > + rettime = ktime_get(); > + delta = ktime_sub(rettime, calltime); > + duration = (unsigned long long) ktime_to_ns(delta) >> 10; > + > + pr_debug("bus: '%s': add driver %s attach completed after %lld usecs\n", > + drv->bus->name, drv->name, duration); > +} > + > +int bus_driver_async_probe(struct device_driver *drv) > +{ > + struct driver_private *priv = drv->p; > + > + priv->attach_work = kzalloc(sizeof(struct driver_attach_work), > + GFP_KERNEL); > + if (!priv->attach_work) > + return -ENOMEM; > + > + priv->attach_work->driver = drv; > + INIT_WORK(&priv->attach_work->work, driver_attach_workfn); > + > + /* Keep this as pr_info() until this is prevalent */ > + pr_info("bus: '%s': probe for driver %s is run asynchronously\n", > + drv->bus->name, drv->name); > + > + queue_work(system_unbound_wq, &priv->attach_work->work); > + > + return 0; > +} > + > +/* > + */ > +static bool safe_mod_async = false; > +module_param_named(safe_mod_async_probe, safe_mod_async, bool, 0400); > +MODULE_PARM_DESC(safe_mod_async_probe, > + "Enable async probe on all modules safely"); > + > +static bool force_mod_async = false; > +module_param_named(force_mod_async_probe, force_mod_async, bool, 0400); > +MODULE_PARM_DESC(force_mod_async_probe, > + "Force async probe on all modules"); > + > +/** > + * drv_enable_async_probe - evaluates if async probe should be used > + * @drv: device driver to evaluate > + * @bus: the bus for the device driver > + * > + * The driver core supports enabling asynchronous probe on device drivers > + * by requiring userspace to pass the module parameter "async_probe". > + * Currently only modules are enabled to use this feature. If a device > + * driver is known to not work properly with asynchronous probe they > + * can force disable asynchronous probe from being enabled through > + * userspace by adding setting sync_probe to true on the @drv. 
We require > + * async probe to be requested from userspace given that we have historically > + * supported synchronous probe and some userspaces may exist which depend > + * on this functionality. Userspace may wish to use asynchronous probe for > + * most device drivers but since this can fail boot in practice we only > + * enable it currently for a set of buses. > + * > + * If you'd like to test enabling async probe for all buses whitelisted > + * you can enable the safe_mod_async_probe module parameter. Note that its > + * not a good idea to always enable this, in particular you probably don't > + * want drivers under modules-load.d to use this. This module parameter should > + * only be used to help test. If you'd like to test even futher you can > + * use force_mod_async_probe, that will force enable async probe on all > + * drivers, regardless if its bus type, it should however be used with > + * caution. > + */ > +static bool drv_enable_async_probe(struct device_driver *drv, > + struct bus_type *bus) > +{ > + struct module *mod; > + > + if (!drv->owner || drv->sync_probe) > + return false; > + > + if (force_mod_async) > + return true; > + > + mod = drv->owner; > + if (!safe_mod_async && !mod->async_probe_requested) > + return false; > + > + /* For now lets avoid stupid bug reports */ > + if (!strcmp(bus->name, "pci") || > + !strcmp(bus->name, "pci_express") || > + !strcmp(bus->name, "hid") || > + !strcmp(bus->name, "sdio") || > + !strcmp(bus->name, "gameport") || > + !strcmp(bus->name, "mmc") || > + !strcmp(bus->name, "i2c") || > + !strcmp(bus->name, "platform") || > + !strcmp(bus->name, "usb")) > + return true; > + > + return false; > +} > + > /** > * bus_add_driver - Add a driver to the bus. > * @drv: driver. 
> @@ -675,6 +791,7 @@ int bus_add_driver(struct device_driver *drv) > struct bus_type *bus; > struct driver_private *priv; > int error = 0; > + bool async_probe = false; > > bus = bus_get(drv->bus); > if (!bus) > @@ -696,11 +813,19 @@ int bus_add_driver(struct device_driver *drv) > if (error) > goto out_unregister; > > + async_probe = drv_enable_async_probe(drv, bus); > + > klist_add_tail(&priv->knode_bus, &bus->p->klist_drivers); > if (drv->bus->p->drivers_autoprobe) { > - error = driver_attach(drv); > - if (error) > - goto out_unregister; > + if (async_probe) { > + error = bus_driver_async_probe(drv); > + if (error) > + goto out_unregister; > + } else { > + error = driver_attach(drv); > + if (error) > + goto out_unregister; > + } > } > module_add_driver(drv->owner, drv); > > @@ -1267,6 +1392,12 @@ EXPORT_SYMBOL_GPL(subsys_virtual_register); > > int __init buses_init(void) > { > + if (unlikely(safe_mod_async)) > + pr_info("Enabled safe_mod_async -- you may run into issues\n"); > + > + if (unlikely(force_mod_async)) > + pr_info("Enabling force_mod_async -- you're on your own!\n"); > + > bus_kset = kset_create_and_add("bus", &bus_uevent_ops, NULL); > if (!bus_kset) > return -ENOMEM; > diff --git a/drivers/base/dd.c b/drivers/base/dd.c > index e4ffbcf..7999aba 100644 > --- a/drivers/base/dd.c > +++ b/drivers/base/dd.c > @@ -507,6 +507,13 @@ static void __device_release_driver(struct device *dev) > > drv = dev->driver; > if (drv) { > + if (drv->owner && !drv->sync_probe) { > + struct module *mod = drv->owner; > + struct driver_private *priv = drv->p; > + > + if (mod->async_probe_requested) > + flush_work(&priv->attach_work->work); > + } > pm_runtime_get_sync(dev); > > driver_sysfs_remove(dev); > diff --git a/include/linux/module.h b/include/linux/module.h > index 71f282a..1e9e017 100644 > --- a/include/linux/module.h > +++ b/include/linux/module.h > @@ -271,6 +271,8 @@ struct module { > bool sig_ok; > #endif > > + bool async_probe_requested; > + > /* symbols that will 
be GPL-only in the near future. */ > const struct kernel_symbol *gpl_future_syms; > const unsigned long *gpl_future_crcs; > diff --git a/kernel/module.c b/kernel/module.c > index 88f3d6c..31d71ff 100644 > --- a/kernel/module.c > +++ b/kernel/module.c > @@ -3175,8 +3175,16 @@ out: > static int unknown_module_param_cb(char *param, char *val, const char *modname, > void *arg) > { > + int ret; > + struct module *mod = arg; > + > + if (strcmp(param, "async_probe") == 0) { > + mod->async_probe_requested = true; > + return 0; > + } > + > /* Check for magic 'dyndbg' arg */ > - int ret = ddebug_dyndbg_module_param_cb(param, val, modname); > + ret = ddebug_dyndbg_module_param_cb(param, val, modname); > if (ret != 0) > pr_warn("%s: unknown parameter '%s' ignored\n", modname, param); > return 0; > @@ -3278,7 +3286,7 @@ static int load_module(struct load_info *info, const char __user *uargs, > > /* Module is ready to execute: parsing args may do that. */ > after_dashes = parse_args(mod->name, mod->args, mod->kp, mod->num_kp, > - -32768, 32767, NULL, > + -32768, 32767, mod, > unknown_module_param_cb); > if (IS_ERR(after_dashes)) { > err = PTR_ERR(after_dashes); > -- > 2.1.0 > -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
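As a worked illustration of point 4 in the quoted commit message (load time = init() + serial probes, bounded by the worker timeout), the sketch below computes how many devices fit under a given timeout. It is only arithmetic, not anything from the patch: `max_serial_probes()` is a hypothetical name, and treating the synchronous cxgb4 figures from the table above as a per-device probe time is an assumption made for the example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of devices a driver can probe serially before the worker
 * timeout expires: floor((timeout - init) / probe), all in microseconds.
 * Returns 0 if init() alone already uses up the timeout.
 */
static uint64_t max_serial_probes(uint64_t timeout_us, uint64_t init_us,
				  uint64_t probe_us)
{
	assert(probe_us > 0);	/* a zero probe time would divide by zero */
	if (init_us >= timeout_us)
		return 0;
	return (timeout_us - init_us) / probe_us;
}
```

Under those assumptions, and ignoring init() time, a 180 s timeout with a 24472569 usec probe allows 7 devices, while the earlier 30 s timeout allows only 1.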
Hi Luis,

On Fri, Sep 26, 2014 at 02:57:17PM -0700, Luis R. Rodriguez wrote:
> +static bool drv_enable_async_probe(struct device_driver *drv,
> +				   struct bus_type *bus)
> +{
> +	struct module *mod;
> +
> +	if (!drv->owner || drv->sync_probe)
> +		return false;

This bit is one of the biggest issues I have with the patch set. Why is
async probing limited to modules only? I mentioned several times that we
need async probing for built-in drivers, and the way you are structuring
the flags (async by default for modules, possibly opt-out of async for
modules, forcibly sync for built-in) makes it hard to extend the
infrastructure for the built-in case.

Also, as far as I can see, you are only considering the case where a
driver is being bound to already registered devices. If you have a module
that creates a device for a driver that is already loaded and takes a long
time to probe, you would still be probing synchronously even if the
driver/module requested async behavior.

So for me it is NAK in the current form.

Thanks.
On Sun, Sep 28, 2014 at 11:03:29AM -0400, Tejun Heo wrote:
> Hello,
>
> On Fri, Sep 26, 2014 at 02:57:17PM -0700, Luis R. Rodriguez wrote:
> ...
> > Systemd should consider enabling async probe on device drivers
> > it loads through systemd-udev but probably does not want to
> > enable it for modules loaded through systemd-modules-load
> > (modules-load.d). At least on my booting enabling async probe
> > for all modules fails to boot as such in order to make this
>
> Did you find out why boot failed with those modules?

No, it seems this was early in boot and I haven't been able to capture the logs
yet of the faults. More on this below.

> > a bit more useful we whitelist a few buses where it should be
> > at least in theory safe to try to enable async probe. This
> > way even if systemd tried to ask to enable async probe for all
> > its device drivers the kernel won't blindly do this. We also
> > have the sync_probe flag which device drivers can themselves
> > enable *iff* it's known the device driver should never async
> > probe.
> >
> > In order to help *test* things folks can use the bus.safe_mod_async_probe=1
> > kernel parameter which will work as if userspace would have
> > requested all modules to load with async probe. Daring folks can
> > also use bus.force_mod_async_probe=1 which will enable asynch probe
> > even on buses not tested in any way yet, if you use that though
> > you're on your own.
>
> If those two knobs are meant for debugging, let's please make that
> fact immediately evident. e.g. Make them ugly boot params like
> "__DEVEL__driver_force_mod_async_probe". Devel/debug options ending
> up becoming stable interface are really nasty.

Sure, makes sense. I wasn't quite sure how to make this quite clear;
a naming convention seems good to me but I also had added at least
a print about this on the log.
Ideally I think a TAINT_DEBUG would
be best and it seems it could be useful for many other cases in
the kernel, we could also just re-use TAINT_CRAP as well. Thoughts?
Greg?

> > +struct driver_attach_work {
> > +	struct work_struct work;
> > +	struct device_driver *driver;
> > +};
> > +
> >  struct driver_private {
> >  	struct kobject kobj;
> >  	struct klist klist_devices;
> >  	struct klist_node knode_bus;
> >  	struct module_kobject *mkobj;
> > +	struct driver_attach_work *attach_work;
> >  	struct device_driver *driver;
> >  };
>
> How many bytes are we saving by allocating it separately?

This saves us 24 bytes per device driver.

> Can't we just embed it in driver_private?

We sure can and it is my preference to do that as well but just in case I
wanted to take the alternative space saving approach as well and let folks
decide. There's also the technical aspect of hiding that data structure from
drivers, and that may be worth doing but I personally also prefer the simplicity
of stuffing it on the public data structure, as you noted below we could then
also unconditionally flush_work() on __device_release_driver().

Greg, any preference?

> > +static void driver_attach_workfn(struct work_struct *work)
> > +{
> > +	int ret;
> > +	struct driver_attach_work *attach_work =
> > +		container_of(work, struct driver_attach_work, work);
> > +	struct device_driver *drv = attach_work->driver;
> > +	ktime_t calltime, delta, rettime;
> > +	unsigned long long duration;
>
> This could just be a personal preference but I think it's easier to
> read if local vars w/ initializers come before the ones w/o.

We gotta standardize on *something*, I tend to declare them in the order
in which they are used, in this case I failed to list calltime first, but
yeah I'll put initialized first, I don't care much.

> > +
> > +	calltime = ktime_get();
> > +
> > +	ret = driver_attach(drv);
> > +	if (ret != 0) {
> > +		remove_driver_private(drv);
> > +		bus_put(drv->bus);
> > +	}
> > +
> > +	rettime = ktime_get();
> > +	delta = ktime_sub(rettime, calltime);
> > +	duration = (unsigned long long) ktime_to_ns(delta) >> 10;
> > +
> > +	pr_debug("bus: '%s': add driver %s attach completed after %lld usecs\n",
> > +		 drv->bus->name, drv->name, duration);
>
> Why do we have the above printout for async path but not sync path?
> It's kinda weird for the code path to diverge like this. Shouldn't
> the only difference be the context probes are running from?

Yeah sure, I'll remove this, it was useful for me for testing purposes
in evaluation against kthreads / sync runs, but that certainly was mostly
for debugging.

> ...
> > +static bool drv_enable_async_probe(struct device_driver *drv,
> > +				   struct bus_type *bus)
> > +{
> > +	struct module *mod;
> > +
> > +	if (!drv->owner || drv->sync_probe)
> > +		return false;
> > +
> > +	if (force_mod_async)
> > +		return true;
> > +
> > +	mod = drv->owner;
> > +	if (!safe_mod_async && !mod->async_probe_requested)
> > +		return false;
> > +
> > +	/* For now lets avoid stupid bug reports */
> > +	if (!strcmp(bus->name, "pci") ||
> > +	    !strcmp(bus->name, "pci_express") ||
> > +	    !strcmp(bus->name, "hid") ||
> > +	    !strcmp(bus->name, "sdio") ||
> > +	    !strcmp(bus->name, "gameport") ||
> > +	    !strcmp(bus->name, "mmc") ||
> > +	    !strcmp(bus->name, "i2c") ||
> > +	    !strcmp(bus->name, "platform") ||
> > +	    !strcmp(bus->name, "usb"))
> > +		return true;
>
> Ugh... things like this tend to become permanent. Do we really need
> this? And how are we gonna find out what's broken why w/o bug
> reports?

Yeah... well we have two options, one is have something like this to
at least make it generally useful or remove this and let folks who
care start fixing async for all modules.
The downside to removing
this is it makes async probe pretty much useless on most systems
right now, it would mean systemd would have to probably consider
the list above if they wanted to start using this without expecting
systems to not work.

Let me know what is preferred.

> > diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> > index e4ffbcf..7999aba 100644
> > --- a/drivers/base/dd.c
> > +++ b/drivers/base/dd.c
> > @@ -507,6 +507,13 @@ static void __device_release_driver(struct device *dev)
> >
> >  	drv = dev->driver;
> >  	if (drv) {
> > +		if (drv->owner && !drv->sync_probe) {
> > +			struct module *mod = drv->owner;
> > +			struct driver_private *priv = drv->p;
> > +
> > +			if (mod->async_probe_requested)
> > +				flush_work(&priv->attach_work->work);
>
> This can be unconditional flush_work(&priv->attach_work) if attach_work
> isn't separately allocated.

Indeed.

> >  static int unknown_module_param_cb(char *param, char *val, const char *modname,
> >  				   void *arg)
> >  {
> > +	int ret;
> > +	struct module *mod = arg;
>
> Ditto with the order of definitions.

Amended.

> Generally looks good to me.
>
> Thanks a lot for doing this! :)

Thanks for the review and pointers so far.

  Luis
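A side note on the timing printout quoted in this message: `ktime_to_ns(delta) >> 10` divides by 1024, not 1000, so the value labelled "usecs" is an approximation that reads about 2.3% low. The sketch below only illustrates that arithmetic in userspace; the three function names are hypothetical and none of this is part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Approximation used in the quoted code: divide by 1024 via a shift. */
static uint64_t ns_to_usec_shift(uint64_t ns)
{
	return ns >> 10;
}

/* Exact conversion: 1 usec = 1000 ns. */
static uint64_t ns_to_usec_exact(uint64_t ns)
{
	return ns / 1000;
}

/* How many microseconds the shift under-reports for a given duration. */
static uint64_t shift_underestimate_usec(uint64_t ns)
{
	uint64_t exact = ns_to_usec_exact(ns);
	uint64_t approx = ns_to_usec_shift(ns);

	assert(exact >= approx);	/* ns/1000 >= ns/1024 for every ns */
	return exact - approx;
}
```

For a debug-only printout the cheap shift is a reasonable trade-off; it matters mainly when such numbers are compared against measurements taken with an exact conversion.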
Hello, Luis.

On Mon, Sep 29, 2014 at 11:22:08PM +0200, Luis R. Rodriguez wrote:
> > > +	/* For now lets avoid stupid bug reports */
> > > +	if (!strcmp(bus->name, "pci") ||
> > > +	    !strcmp(bus->name, "pci_express") ||
> > > +	    !strcmp(bus->name, "hid") ||
> > > +	    !strcmp(bus->name, "sdio") ||
> > > +	    !strcmp(bus->name, "gameport") ||
> > > +	    !strcmp(bus->name, "mmc") ||
> > > +	    !strcmp(bus->name, "i2c") ||
> > > +	    !strcmp(bus->name, "platform") ||
> > > +	    !strcmp(bus->name, "usb"))
> > > +		return true;
> >
> > Ugh... things like this tend to become permanent. Do we really need
> > this? And how are we gonna find out what's broken why w/o bug
> > reports?
>
> Yeah... well we have two options, one is have something like this to
> at least make it generally useful or remove this and let folks who
> care start fixing async for all modules. The downside to removing
> this is it makes async probe pretty much useless on most systems
> right now, it would mean systemd would have to probably consider
> the list above if they wanted to start using this without expecting
> systems to not work.

So, I'd much prefer blacklist approach if something like this is a
necessity. That way, we'd at least know what doesn't work.

Thanks.
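To make the blacklist idea concrete, here is a minimal userspace sketch of the inverted check: async probe is allowed on every bus unless that bus is explicitly listed as broken. This is an illustration only, not code from the thread. The function name `bus_allows_async_probe()` and both list entries are hypothetical placeholders, since nobody in the discussion has named a bus that is known to fail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical entries: the thread does not name any bus known to break. */
static const char *const async_probe_blacklist[] = {
	"example_broken_bus",
	"another_broken_bus",
};

/* Deny-list policy: async probe is allowed unless the bus is listed. */
static bool bus_allows_async_probe(const char *bus_name)
{
	size_t n = sizeof(async_probe_blacklist) /
		   sizeof(async_probe_blacklist[0]);
	size_t i;

	assert(bus_name != NULL);
	for (i = 0; i < n; i++) {
		if (strcmp(bus_name, async_probe_blacklist[i]) == 0)
			return false;
	}
	return true;
}
```

The practical difference from the whitelist in the patch is the default: a bus nobody has looked at yet gets async probe, so breakage shows up as a bug report and a new list entry instead of staying hidden.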
On Mon, Sep 29, 2014 at 11:22:08PM +0200, Luis R. Rodriguez wrote: > On Sun, Sep 28, 2014 at 11:03:29AM -0400, Tejun Heo wrote: > > Hello, > > > > On Fri, Sep 26, 2014 at 02:57:17PM -0700, Luis R. Rodriguez wrote: > > ... > > > Systemd should consider enabling async probe on device drivers > > > it loads through systemd-udev but probably does not want to > > > enable it for modules loaded through systemd-modules-load > > > (modules-load.d). At least on my booting enablign async probe > > > for all modules fails to boot as such in order to make this > > > > Did you find out why boot failed with those modules? > > No, it seems this was early in boot and I haven't been able to capture the logs > yet of the faults. More on this below. > > > > a bit more useful we whitelist a few buses where it should be > > > at least in theory safe to try to enable async probe. This > > > way even if systemd tried to ask to enable async probe for all > > > its device drivers the kernel won't blindly do this. We also > > > have the sync_probe flag which device drivers can themselves > > > enable *iff* its known the device driver should never async > > > probe. > > > > > > In order to help *test* things folks can use the bus.safe_mod_async_probe=1 > > > kernel parameter which will work as if userspace would have > > > requested all modules to load with async probe. Daring folks can > > > also use bus.force_mod_async_probe=1 which will enable asynch probe > > > even on buses not tested in any way yet, if you use that though > > > you're on your own. > > > > If those two knobs are meant for debugging, let's please make that > > fact immediately evident. e.g. Make them ugly boot params like > > "__DEVEL__driver_force_mod_async_probe". Devel/debug options ending > > up becoming stable interface are really nasty. > > Sure make sense, I wasn't quite sure how to make this quite clear, > a naming convention seems good to me but I also had added at least > a print about this on the log. 
Ideally I think a TAINT_DEBUG would
> be best and it seems it could be useful for many other cases in
> the kernel, we could also just re-use TAINT_CRAP as well. Thoughts?
> Greg?

TAINT_CRAP is for drivers/staging/ code, don't try to repurpose it for
some other horrid option.  There's no reason we can't add more taint
flags for this.

greg k-h
On Mon, Sep 29, 2014 at 2:59 PM, Greg KH <gregkh@linuxfoundation.org> wrote:
>> Sure make sense, I wasn't quite sure how to make this quite clear,
>> a naming convention seems good to me but I also had added at least
>> a print about this on the log. Ideally I think a TAINT_DEBUG would
>> be best and it seems it could be useful for many other cases in
>> the kernel, we could also just re-use TAINT_CRAP as well. Thoughts?
>> Greg?
>
> TAINT_CRAP is for drivers/staging/ code, don't try to repurpose it for
> some other horrid option.  There's no reason we can't add more taint
> flags for this.

OK thanks, I'll add TAINT_DEBUG. Any preference where to stuff struct
driver_attach_work *attach_work ? On the private data structure as
this patch currently implements, saving us 24 bytes and hiding it from
drivers, or stuffing it on the device driver and simplifying the core
code?

  Luis
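To make the trade-off in the question above concrete, here is a rough userspace sketch of the two layouts: the work item embedded in the private structure versus kept behind a pointer. struct mock_work is a made-up stand-in, much smaller than the kernel's real struct work_struct, so the byte counts it yields are not the 24 bytes mentioned above. All names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the kernel's struct work_struct. */
struct mock_work {
	void (*func)(struct mock_work *work);
	long data;
};

/* Same idea as the kernel's container_of(), reduced to plain C. */
#define mock_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Variant A: work item embedded directly, no separate allocation. */
struct priv_embedded {
	int id;
	struct mock_work attach_work;
};

/* Variant B: work item behind a pointer, allocated only when needed. */
struct priv_pointer {
	int id;
	struct mock_work *attach_work;
};

/* Recover the owning private structure from the work item it embeds. */
struct priv_embedded *work_to_priv(struct mock_work *work)
{
	assert(work != NULL);
	return mock_container_of(work, struct priv_embedded, attach_work);
}

/* Bytes added to every private structure by embedding instead of pointing. */
size_t embed_overhead(void)
{
	return sizeof(struct priv_embedded) - sizeof(struct priv_pointer);
}
```

Embedding costs a few bytes per driver but removes an allocation and its failure path, and the work callback can get back to its owner without storing a separate back-pointer. That is the simplification to the core code the question refers to.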
On Mon, Sep 29, 2014 at 03:10:22PM -0700, Luis R. Rodriguez wrote:
> On Mon, Sep 29, 2014 at 2:59 PM, Greg KH <gregkh@linuxfoundation.org> wrote:
> >> Sure make sense, I wasn't quite sure how to make this quite clear,
> >> a naming convention seems good to me but I also had added at least
> >> a print about this on the log. Ideally I think a TAINT_DEBUG would
> >> be best and it seems it could be useful for many other cases in
> >> the kernel, we could also just re-use TAINT_CRAP as well. Thoughts?
> >> Greg?
> >
> > TAINT_CRAP is for drivers/staging/ code, don't try to repurpose it for
> > some other horrid option.  There's no reason we can't add more taint
> > flags for this.
>
> OK thanks, I'll add TAINT_DEBUG. Any preference where to stuff struct
> driver_attach_work *attach_work ? On the private data structure as
> this patch currently implements, saving us 24 bytes and hiding it from
> drivers, or stuffing it on the device driver and simplifying the core
> code?

I honestly haven't even looked at this series, sorry.  It's too late,
near the close of the merge window for 3.18, and I have been on the road
for the past week in France.

greg k-h
On Sun, Sep 28, 2014 at 07:07:24PM +0200, Tom Gundersen wrote: > On Fri, Sep 26, 2014 at 11:57 PM, Luis R. Rodriguez > <mcgrof@do-not-panic.com> wrote: > > From: "Luis R. Rodriguez" <mcgrof@suse.com> > > Systemd has a general timeout for all workers currently set to 180 > > seconds after which it will send a sigkill signal. Systemd now has a > > warning which is issued once it reaches 1/3 of the timeout. The original > > motivation for the systemd timeout was to help track device drivers > > which do not use asynch firmware loading on init() and the timeout was > > originally set to 30 seconds. > > Please note that the motivation for the timeout in systemd had nothing > to do with async firmware loading (that was just the case where > problems cropped up). *Part *of the original kill logic, according to the commit log, was actually due to the assumption that the issues observed *were* synchronous firmware loading on module init(): commit e64fae5573e566ce4fd9b23c68ac8f3096603314 Author: Kay Sievers <kay.sievers@vrfy.org> Date: Wed Jan 18 05:06:18 2012 +0100 udevd: kill hanging event processes after 30 seconds Some broken kernel drivers load firmware synchronously in the module init path and block modprobe until the firmware request is fulfilled. <...> My point here is not to point fingers but to explain why we went on with this and how we failed to realize only until later that the driver core ran probe together with init. When a few folks pointed out the issues with the kill the issue was punted back to kernel developers and the assumption even among some kernel maintainers was that it was init paths with sync behaviour that was causing some delays and they were broken drivers. It is important to highlight these assumptions ended up setting us off on the wrong path for a while in a hunt to try to fix this issue either in driver or elsewhere. 
> The motivation was to not allow udev-workers to
> stay around indefinitely, and hence put an upper-bound on
> their duration (initially 180 s). At some point the bound was reduced
> to 30 seconds to make sure module-loading would bail out before the
> kernel's firmware loading timeout would bail out (60s I believe).

Sure, part of it was that, but folks beat on driver developers about the
kill insisting it was drivers that were broken. It was not until Chelsio
folks called bloody murder because their delays were on probe that we
realized there was a bit more to this than what was being pushed back on
to driver developers.

> That
> is no longer relevant, which is why it was safe to reset the timeout
> to 180 s.

Indeed :D

> > Since systemd + kernel are heavily tied in for the purposes of this
> > patch it is assumed you have merged on systemd the following
> > commits:
> >
> > 671174136525ddf208cdbe75d6d6bd159afa961f udev: timeout - warn after a third of the timeout before killing
> > b5338a19864ac3f5632aee48069a669479621dca udev: timeout - increase timeout
> > 2e92633dbae52f5ac9b7b2e068935990d475d2cd udev: bump event timeout to 60 seconds
> > be2ea723b1d023b3d385d3b791ee4607cbfb20ca udev: remove userspace firmware loading support
> > 9f20a8a376f924c8eb5423cfc1f98644fc1e2d1a udev: fixup commit
> > dd5eddd28a74a49607a8fffcaf960040dba98479 udev: unify event timeout handling
> > 9719859c07aa13539ed2cd4b31972cd30f678543 udevd: add --event-timeout commandline option
> >
> > Since we bundle together serially driver init() and probe()
> > on module initialization systemd's imposed timeout put a limit on the
> > amount of time a driver init() and probe routines can take.
There's a
> > few overlooked issues with this and the timeout in general:
> >
> > 0) Not all drivers are killed, the signal is just sent and
> >    the kill will only be acted upon if the driver you loaded
> >    happens to have some code path that either uses kthreads (which
> >    as of 786235ee are now killable), or uses some code which checks for
> >    fatal_signal_pending() on the kernel somewhere -- e.g. pci_read_vpd().
>
> Shouldn't this be seen as something to be fixed in the kernel?

That's a great question. In practice now after CVE-2012-4398 and its series of
patches added which enabled OOM to kill things followed by 786235ee to also
handle OOM on kthreads it seems imperative we strive towards this, in practice
however if you're getting OOMs on boot you have far more serious issues to be
concerned over than handling CVE-2012-4398. Another issue is that even if we
wanted to address this as critical right now, on module loading driver error
paths tend to be pretty buggy and we'd probably end up causing more issues than
fixing anything if the sigkill that triggered this was an arbitrary timeout,
especially if the timeout is not properly justified. Addressing sigkill due to
OOM is important, but as noted if you're running out of memory at load time you
have other problems to be concerned over.

So extending the kill onto more drivers *because* of the timeout is probably
not a good reason as it would probably create more issues than fix anything
right now.

> I mean,
> do we not want userspace to have the possibility to kill udev/modprobe
> even disregarding the worker timeouts (say at shutdown, or before
> switching from the initrd)?

That's a good point and I think the merit to handle a kill due to the
other reasons (shutdown, switching from the initrd) should be addressed
separately. I mean that validating addressing the kill for the other
reasons does not validate the existing kill on timeout for synchronous
probing.
If it's important to handle the kill on shutdown / switching initrd
that should be dealt with orthogonally.

> > 1) Since systemd is the only one logging the sigkill debugging that
> >    drivers are not loaded or in the worst case *failed to boot* because
> >    of a sigkill has proven hard to debug.
>
> Care to clarify this a bit? Are the udev logs somehow unclear?

Sure, so the problem is that folks debugging were not aware of what systemd was
doing. Let me be clear that the original 30 second sigkill timeout thing was
passed down onto driver maintainers as a non-documented new kernel policy
slap-in-the-face-you-must-obviously-be-doing-something-wrong (TM) approach.
This was a policy decision passed down as a *reactive* measure, not many folks
were aware of it and of what systemd was doing. What made the situation even
worse was that as noted on 1) even though the sigkill was being sent since
commit e64fae55 (January 2012) on systemd the sigkill was not being picked up
on many drivers. To be clear the sigkill was being picked up if you had a
driver that by chance had some code on init / probe that by chance checked for
fatal_signal_pending(), and even when that triggered folks debugging were in no
way shape or form expecting a sigkill from userspace on modprobe as it was not
well known that this was part of the policy they should follow. Shit started to
hit the fan a bit more widely when kernel commit 786235ee (Nov 2013) was merged
upstream which allowed kthreads to be killed, and more drivers started failing.

An example of an ancient bug that no one handled until recently:

https://bugzilla.kernel.org/show_bug.cgi?id=59581

There is a proper fix to this now but the kill was what was causing this
in the first place. The kill was justified as these drivers *should*
be using async probe but by no means does that mean the kill was
justified for all subsystems / drivers.
The bug also really sent people
on the wrong track and it was not until Alexander poked me about
the issue we were seeing on cxgb4 likely being related that we started
to really zero in on the real issue.

The first driver reported / studied due to the kill from systemd was mptsas:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1276705

A full bisect was done to even try to understand what the issue was.
Then there was the tug of war between either reverting the patch
that allowed the kthread to be killed or deciding this was a systemd issue
which required increasing the timeout. This was still a storage driver,
and increasing the timeout arbitrarily really would not have helped
address the root cause of the issue.

The next non-storage driver bug that was reported and heavily debugged was
cxgb4 and it wasn't easy to debug:

https://bugzilla.suse.com/show_bug.cgi?id=877622

Conclusion then is that folks were simply not aware of this new de-facto
policy, it was obviously incorrect but well intentioned, and no one
really was paying attention to systemd-udevd logs. If we want chatty
behaviour that people will pick up we probably instead want a WARN()
on the kernel especially before we kill a driver and even then I'm sure
this can irritate some folks.

> If you think we can improve the logging from udev, please ping me about that
> and I'll sort it out.

I think the logging done on systemd is fine, there are a few issues with
the way things trickled down and what we now need to do. First and foremost
there was a general communication issue about this new timing policy and
obviously it would have helped if this also had more design / review from
others. It's no one's fault, but we should learn from it. Design policies on
systemd that can affect the kernel / drivers could likely use a bit more
review from a wider audience and probably include folks who are probably going
to be more critical than those who likely would typically be favorable.
Without wider review we could end up with something like a filter
bubble [0] but applied to engineering, a design filter bubble, if you will. So
apart from addressing logging it's important to reflect on this issue and try
to aim for having something like a Red Team [1] on design involving systemd and
kernel. This is especially true if we are to really marry these two together
more and more. The more critical people can be the better, but of course those
need to provide constructive criticism, not just rants.

In terms of logging:

Do we know if distributions / users are reviewing systemd-udevd logs
for certain types of issues with as much diligence as they put to
kernel logs when systemd makes decisions affecting the kernel? If not we
should consider a way so that that happens. In this case the fact that
drivers were being killed while being loaded was missed since it was
unexpected that would happen so folks didn't know to look for that option,
but apart from that the *reason* for the kill probably could have helped
too. To help both of these we have to consider if we are going to keep
the sigkill on systemd on module loading due to a timeout.

As you clarified the goal of the timeout is to avoid having udev workers
stay around indefinitely, but I think we need to give kmod workers a bit
more consideration. The point of this patch set was partly to give systemd
what it assumed was there, but clearly we can't assume all drivers can be
loaded asynchronously without issues right now. That means that even with
this functionality merged systemd will have to cope with the fact that
some drivers will be loaded with synchronous probe.
A general timeout and
especially with a sigkill is probably not a good idea then, unless of course:

0) those device drivers / subsystem maintainers want a timeout
1) the above decision can distinguish between sync probe / async probe
   being done

To address 0) perhaps one solution is that if subsystem maintainers
feel this is needed they can express this on a data structure somewhere,
perhaps on the bus and/or have a driver value override, for example.

For 1) we could expose what we end up doing through sysfs.

Of course userspace could also simply want to put in place some
requirements but in terms of a timeout / kill it would have to also
accept that it cannot get what it might want. For instance we now know
it may be that an async probe is not possible on some drivers.

Perhaps it's best to think about this differently and address now a
way to do that efficiently instead of reactively. Apart from having
the ability to let systemd ask for async probe, what else do we want
to accomplish?

[0] http://en.wikipedia.org/wiki/Filter_bubble
[1] http://en.wikipedia.org/wiki/Red_team

> > 2) When and if the signal is received by the driver somehow
> >    the driver may fail at different points in its initialization
> >    and unless all error paths on the driver are implemented
> >    perfectly this could mean leaving a device in a half
> >    initialized state.
> >
> > 3) The timeout is penalizing device drivers that take long on
> >    probe(), this wasn't the original motivation. Systemd seems
> >    to have been under assumption that probe was asynchronous,
> >    this perhaps is true as an *objective* and goal for *some
> >    subsystems* but by no means is it true that we've been on a wide
> >    crusade to ensure this for all device drivers. It may be a good
> >    idea for *many* device drivers but penalizing them with a kill
> >    for taking long on probe is simply unacceptable specially
> >    when the timeout is completely arbitrary.
>
> The point is really not to "penalize" anything, we just need to make
> sure we put some sort of restrictions on our workers so they don't
> hang around forever.

Thanks for clarifying this, can you explain what issues could arise
from making an exception to allowing kmod workers to hang around
completing init + probe over a certain defined amount of time without
being killed?

  Luis
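As a rough sketch of the idea floated in the message above (a timeout expressed per bus, with an optional per-driver override), something along these lines might work. This is purely hypothetical: none of these structures, fields, or functions exist in the patch or in the kernel, and a value of 0 is assumed here to mean "the maintainer wants no timeout".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bus-level policy: 0 means no probe timeout is wanted. */
struct mock_bus {
	const char *name;
	unsigned int probe_timeout_secs;
};

/* Hypothetical driver-level policy that may override its bus. */
struct mock_driver {
	const char *name;
	const struct mock_bus *bus;
	bool has_timeout_override;
	unsigned int probe_timeout_secs;
};

/* The driver's own value wins if set, otherwise fall back to the bus. */
unsigned int effective_probe_timeout(const struct mock_driver *drv)
{
	assert(drv != NULL && drv->bus != NULL);

	if (drv->has_timeout_override)
		return drv->probe_timeout_secs;
	return drv->bus->probe_timeout_secs;
}

/* A timeout of 0 never expires, so such drivers are never killed for it. */
bool probe_timed_out(const struct mock_driver *drv, unsigned int elapsed_secs)
{
	unsigned int timeout = effective_probe_timeout(drv);

	return timeout != 0 && elapsed_secs > timeout;
}
```

Userspace would still need some way to read the effective value, which is where the sysfs exposure mentioned above would come in; that part is not sketched here.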
On Sun, Sep 28, 2014 at 12:22:47PM -0700, Dmitry Torokhov wrote:
> Hi Luis,
>
> On Fri, Sep 26, 2014 at 02:57:17PM -0700, Luis R. Rodriguez wrote:
> > +static bool drv_enable_async_probe(struct device_driver *drv,
> > +				   struct bus_type *bus)
> > +{
> > +	struct module *mod;
> > +
> > +	if (!drv->owner || drv->sync_probe)
> > +		return false;
>
> This bit is one of the biggest issues I have with the patch set. Why async
> probing is limited to modules only?

Because Tejun wanted to address this separately, so it's not that we will
restrict this but we should have a non-module solution added as an evolution
on top of this, as a secondary step.

> I mentioned several times that we need
> async probing for built-in drivers and the way you are structuring the flags
> (async by default for modules, possibly opt-out of async for modules, forcibly
> sync for built-in) it is hard to extend the infrastructure for built-in case.

I confess I haven't tried enabling built-in as a secondary step, but that is
just due to lack of time right now. I don't think it's impossible and
actually think it should be fairly trivial. Are there real blockers to do this
that you see as an evolutionary step?

> Also, as far as I can see, you are only considering the case where driver is
> being bound to already registered devices. If you have a module that creates a
> device for a driver that is already loaded and takes long time to probe you
> would still be probing synchronously even if driver/module requested async
> behavior.

Can you provide an example code path hit here? I'd certainly like to address
that as well.

  Luis
On Mon, Sep 29, 2014 at 05:26:01PM -0400, Tejun Heo wrote:
> Hello, Luis.
>
> On Mon, Sep 29, 2014 at 11:22:08PM +0200, Luis R. Rodriguez wrote:
> > > > +	/* For now lets avoid stupid bug reports */
> > > > +	if (!strcmp(bus->name, "pci") ||
> > > > +	    !strcmp(bus->name, "pci_express") ||
> > > > +	    !strcmp(bus->name, "hid") ||
> > > > +	    !strcmp(bus->name, "sdio") ||
> > > > +	    !strcmp(bus->name, "gameport") ||
> > > > +	    !strcmp(bus->name, "mmc") ||
> > > > +	    !strcmp(bus->name, "i2c") ||
> > > > +	    !strcmp(bus->name, "platform") ||
> > > > +	    !strcmp(bus->name, "usb"))
> > > > +		return true;
> > >
> > > Ugh... things like this tend to become permanent.  Do we really need
> > > this?  And how are we gonna find out what's broken why w/o bug
> > > reports?
> >
> > Yeah... well we have two options, one is have something like this to
> > at least make it generally useful or remove this and let folks who
> > care start fixing async for all modules. The downside to removing
> > this is it makes async probe pretty much useless on most systems
> > right now, it would mean systemd would have to probably consider
> > the list above if they wanted to start using this without expecting
> > systems to not work.
>
> So, I'd much prefer blacklist approach if something like this is a
> necessity.  That way, we'd at least know what doesn't work.

For buses? Or do you mean you'd want to wait until we have a decent
list of drivers with the sync probe flag set? If the latter it may take
a while to get that list for this to be somewhat useful.

  Luis
On Tue, Sep 30, 2014 at 04:27:51AM +0200, Luis R. Rodriguez wrote: > On Sun, Sep 28, 2014 at 07:07:24PM +0200, Tom Gundersen wrote: > > On Fri, Sep 26, 2014 at 11:57 PM, Luis R. Rodriguez > > <mcgrof@do-not-panic.com> wrote: > > > From: "Luis R. Rodriguez" <mcgrof@suse.com> > > > 0) Not all drivers are killed, the signal is just sent and > > > the kill will only be acted upoon if the driver you loaded > > > happens to have some code path that either uses kthreads (which > > > as of 786235ee are now killable), or uses some code which checks for > > > fatal_signal_pending() on the kernel somewhere -- i.e: pci_read_vpd(). > > > > Shouldn't this be seen as something to be fixed in the kernel? > > That's a great question. In practice now after CVE-2012-4398 and its series of > patches added which enabled OOM to kill things followed by 786235ee to also > handle OOM on kthreads it seems imperative we strive towards this, in practive > however if you're getting OOMs on boot you have far more serious issue to be > concerned over than handling CVE-2012-4398. Another issue is that even if we > wanted to address this a critical right now on module loading driver error > paths tend to be pretty buggy and we'd probably end up causing more issues than > fixing anything if the sigkill that triggered this was an arbitrary timeout, > specially if the timeout is not properly justified. <-- snip --> > So extending the kill onto more drivers *because* of the timeout is probably > not a good reason as it would probably create more issue than fix anything > right now. A bit more on this. Tejun had added devres while trying to convert libata to use iomap but in that process also help address buggy failure paths on drivers [0]. Even with devres in place and devm functions being available they actually haven't been popularized until recent kernels [1]. 
There is even further research on precisely these
sorts of errors, such as "Hector: Detecting Resource-Release Omission Faults in
error-handling code for systems software" [2] but unfortunately there is no
data over time. Another paper is "An approach to improving the structure of
error-handling code in the Linux kernel" [3] which tries to address moving
error handling code in the middle of the function to gotos to shared code at
the end of the function...

So we have buggy error paths on drivers and trusting them unfortunately isn't
a good idea at this point. They should be fixed but saying we should equally
kill all drivers right now would likely introduce more issues than anything.

[0] http://lwn.net/Articles/215861/
[1] http://www.slideshare.net/ennael/kernel-recipes-2013?qid=f0888b85-377b-4b29-95c3-f4e59822f5b3&v=default&b=&from_search=6
    See slide 6 on graph usage of devm functions over time
[2] http://coccinelle.lip6.fr/papers/dsn2013.pdf
[3] http://coccinelle.lip6.fr/papers/lctes11.pdf

  Luis
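For readers unfamiliar with the pattern that paper [3] moves error handling towards, a minimal userspace sketch of goto-based unwinding follows. The device structure, the fail_at knob, and the function names are all invented for illustration; malloc() and free() stand in for whatever resources a real probe routine would acquire. This is not devres, only the manual style devres is meant to make unnecessary.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical device state; the fields stand in for real resources. */
struct mock_dev {
	char *regs;
	char *ring;
	int irq_enabled;
};

/*
 * Acquire resources in order; on failure, release only what was already
 * acquired, in reverse order, through shared labels at the end.
 * fail_at picks which step fails (0 = none) to exercise each error path.
 */
int mock_probe(struct mock_dev *dev, int fail_at)
{
	assert(dev != NULL);

	dev->regs = (fail_at == 1) ? NULL : malloc(64);
	if (!dev->regs)
		goto err_out;

	dev->ring = (fail_at == 2) ? NULL : malloc(256);
	if (!dev->ring)
		goto err_free_regs;

	if (fail_at == 3)
		goto err_free_ring;
	dev->irq_enabled = 1;

	return 0;

err_free_ring:
	free(dev->ring);
	dev->ring = NULL;
err_free_regs:
	free(dev->regs);
	dev->regs = NULL;
err_out:
	return -1;
}

/* Tear down a successfully probed device, mirroring the unwind order. */
void mock_remove(struct mock_dev *dev)
{
	assert(dev != NULL);

	dev->irq_enabled = 0;
	free(dev->ring);
	dev->ring = NULL;
	free(dev->regs);
	dev->regs = NULL;
}
```

Each failure point has to jump to exactly the right label. A driver interrupted at an arbitrary point by a signal exercises these paths in ways that are rarely tested, which is the concern raised above about extending the kill to more drivers.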
On Tue, Sep 30, 2014 at 4:27 AM, Luis R. Rodriguez <mcgrof@suse.com> wrote: > On Sun, Sep 28, 2014 at 07:07:24PM +0200, Tom Gundersen wrote: >> On Fri, Sep 26, 2014 at 11:57 PM, Luis R. Rodriguez >> <mcgrof@do-not-panic.com> wrote: >> > From: "Luis R. Rodriguez" <mcgrof@suse.com> >> > Systemd has a general timeout for all workers currently set to 180 >> > seconds after which it will send a sigkill signal. Systemd now has a >> > warning which is issued once it reaches 1/3 of the timeout. The original >> > motivation for the systemd timeout was to help track device drivers >> > which do not use asynch firmware loading on init() and the timeout was >> > originally set to 30 seconds. >> >> Please note that the motivation for the timeout in systemd had nothing >> to do with async firmware loading (that was just the case where >> problems cropped up). > > *Part *of the original kill logic, according to the commit log, was actually > due to the assumption that the issues observed *were* synchronous firmware > loading on module init(): > > commit e64fae5573e566ce4fd9b23c68ac8f3096603314 > Author: Kay Sievers <kay.sievers@vrfy.org> > Date: Wed Jan 18 05:06:18 2012 +0100 > > udevd: kill hanging event processes after 30 seconds > > Some broken kernel drivers load firmware synchronously in the module init > path and block modprobe until the firmware request is fulfilled. > <...> This was a workaround to avoid a deadlock between udev and the kernel. The 180 s timeout was already in place before this change, and was not motivated by firmware loading. Also note that this patch was not about "tracking device drivers", just about avoiding dead-lock. > My point here is not to point fingers but to explain why we went on with > this and how we failed to realize only until later that the driver core > ran probe together with init. 
When a few folks pointed out the issues
> with the kill the issue was punted back to kernel developers and the
> assumption even among some kernel maintainers was that it was init paths
> with sync behaviour that was causing some delays and they were broken
> drivers. It is important to highlight these assumptions ended up setting
> us off on the wrong path for a while in a hunt to try to fix this issue
> either in driver or elsewhere.

Ok. I'm not sure the motivations for user-space changes are important
to include in the commit message, but if you do I'll try to clarify
things to avoid misunderstandings.

> Thanks for clarifying this, can you explain what issues could arise
> from making an exception to allowing kmod workers to hang around
> completing init + probe over a certain defined amount of time without
> being killed?

We could run out of udev workers and the whole boot would hang.

The way I see it, the current status from systemd's side is: our
short-term work-around is to increase the timeout, and at the moment
it appears no long-term solution is needed (i.e., it seems like the
right thing to do is to make sure insmod can be near instantaneous, it
appears people are working towards this goal, and so far no examples
have cropped up showing that it is fundamentally impossible (once/if
they do, we should of course revisit the problem)).

Cheers,

Tom
On Tue, Sep 30, 2014 at 11:22:14AM +0200, Tom Gundersen wrote: > On Tue, Sep 30, 2014 at 4:27 AM, Luis R. Rodriguez <mcgrof@suse.com> wrote: > > On Sun, Sep 28, 2014 at 07:07:24PM +0200, Tom Gundersen wrote: > >> On Fri, Sep 26, 2014 at 11:57 PM, Luis R. Rodriguez > >> <mcgrof@do-not-panic.com> wrote: > >> > From: "Luis R. Rodriguez" <mcgrof@suse.com> > >> > Systemd has a general timeout for all workers currently set to 180 > >> > seconds after which it will send a sigkill signal. Systemd now has a > >> > warning which is issued once it reaches 1/3 of the timeout. The original > >> > motivation for the systemd timeout was to help track device drivers > >> > which do not use asynch firmware loading on init() and the timeout was > >> > originally set to 30 seconds. > >> > >> Please note that the motivation for the timeout in systemd had nothing > >> to do with async firmware loading (that was just the case where > >> problems cropped up). > > > > *Part *of the original kill logic, according to the commit log, was actually > > due to the assumption that the issues observed *were* synchronous firmware > > loading on module init(): > > > > commit e64fae5573e566ce4fd9b23c68ac8f3096603314 > > Author: Kay Sievers <kay.sievers@vrfy.org> > > Date: Wed Jan 18 05:06:18 2012 +0100 > > > > udevd: kill hanging event processes after 30 seconds > > > > Some broken kernel drivers load firmware synchronously in the module init > > path and block modprobe until the firmware request is fulfilled. > > <...> > > This was a workaround to avoid a deadlock between udev and the kernel. > The 180 s timeout was already in place before this change, and was not > motivated by firmware loading. Also note that this patch was not about > "tracking device drivers", just about avoiding dead-lock. Thanks, can you elaborate on how a deadlock can occur if the kmod worker is not at some point sigkilled? 
> > My point here is not to point fingers but to explain why we went on with
> > this and how we failed to realize only until later that the driver core
> > ran probe together with init. When a few folks pointed out the issues
> > with the kill the issue was punted back to kernel developers and the
> > assumption even among some kernel maintainers was that it was init paths
> > with sync behaviour that was causing some delays and they were broken
> > drivers. It is important to highlight these assumptions ended up setting
> > us off on the wrong path for a while in a hunt to try to fix this issue
> > either in driver or elsewhere.
>
> Ok. I'm not sure the motivations for user-space changes is important
> to include in the commit message, but if you do I'll try to clarify
> things to avoid misunderstandings.

I can try to omit it on the next series.

> > Thanks for clarifying this, can you explain what issues could arise
> > from making an exception to allowing kmod workers to hang around
> > completing init + probe over a certain defined amount of time without
> > being killed?
>
> We could run out of udev workers and the whole boot would hang.

Is the issue that if there is no extra worker available and all are
idling on sleep / synchronous long work boot will potentially hang
unless a new worker becomes available to do more work? If so I can
see the sigkill helping for hanging tasks but it doesn't necessarily
mean it's a good idea to kill modules loading taking a while. Also
what if the sigkill is just avoided for *just* kmod workers?
> The way I see it, the current status from systemd's side is: our
> short-term work-around is to increase the timeout, and at the moment
> it appears no long-term solution is needed (i.e., it seems like the
> right thing to do is to make sure insmod can be near instantaneous, it
> appears people are working towards this goal, and so far no examples
> have cropped up showing that it is fundamentally impossible (once/if
> they do, we should of course revisit the problem)).

That again would be reactive behaviour, what would prevent avoiding the
sigkill only for kmod workers? Is it known the deadlock is imminent?
If the number of workers for kmod that would hit the timeout is
considered low I don't see how that's possible and why not just lift
the sigkill.

  Luis
On Tue, Sep 30, 2014 at 5:24 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
>> > commit e64fae5573e566ce4fd9b23c68ac8f3096603314
>> > Author: Kay Sievers <kay.sievers@vrfy.org>
>> > Date:   Wed Jan 18 05:06:18 2012 +0100
>> >
>> >     udevd: kill hanging event processes after 30 seconds
>> >
>> >     Some broken kernel drivers load firmware synchronously in the module init
>> >     path and block modprobe until the firmware request is fulfilled.
>> >     <...>
>>
>> This was a workaround to avoid a deadlock between udev and the kernel.
>> The 180 s timeout was already in place before this change, and was not
>> motivated by firmware loading. Also note that this patch was not about
>> "tracking device drivers", just about avoiding dead-lock.
>
> Thanks, can you elaborate on how a deadlock can occur if the kmod
> worker is not at some point sigkilled?

This was only relevant when udev did the firmware loading. modprobe
would wait for the kernel, which would wait for the firmware loading,
which would wait for modprobe. This is no longer a problem as udev
does not do firmware loading any more.

> Is the issue that if there is no extra worker available and all are
> idling on sleep / synchronous long work boot will potentially hang
> unless a new worker becomes available to do more work?

Correct.

> If so I can
> see the sigkill helping for hanging tasks but it doesn't necessarily
> mean its a good idea to kill modules loading taking a while. Also
> what if the sigkill is just avoided for *just* kmod workers?

Depending on the number of devices you have, I suppose we could still
exhaust the workers.
>> The way I see it, the current status from systemd's side is: our
>> short-term work-around is to increase the timeout, and at the moment
>> it appears no long-term solution is needed (i.e., it seems like the
>> right thing to do is to make sure insmod can be near instantaneous, it
>> appears people are working towards this goal, and so far no examples
>> have cropped up showing that it is fundamentally impossible (once/if
>> they do, we should of course revisit the problem)).
>
> That again would be reactive behaviour, what would prevent avoiding the
> sigkill only for kmod workers? Is it known the deadlock is imminent?
> If the amount of workers for kmod that would hit the timeout is
> considered low I don't see how that's possible and why not just lift
> the sigkill.

Making kmod a special case is of course possible. However, as long as
there is no fundamental reason why kmod should get this special
treatment, this just looks like a work-around to me. We already have a
work-around, which is to increase the global timeout. If you still
think we should do something different in systemd, it is probably best
to take the discussion to systemd-devel to make sure all the relevant
people are involved.

Cheers,

Tom
As per Tom, adding systemd-devel for advice / review of the request
to avoid the sigkill for kmod workers. Keeping others on Cc as it's a
discussion that I think can help if both camps are involved, especially
since we've been ping-ponging back and forth on this particular topic
for a long time now.

On Thu, Oct 02, 2014 at 08:12:37AM +0200, Tom Gundersen wrote:
> On Tue, Sep 30, 2014 at 5:24 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> >> > commit e64fae5573e566ce4fd9b23c68ac8f3096603314
> >> > Author: Kay Sievers <kay.sievers@vrfy.org>
> >> > Date:   Wed Jan 18 05:06:18 2012 +0100
> >> >
> >> >     udevd: kill hanging event processes after 30 seconds
> >> >
> >> >     Some broken kernel drivers load firmware synchronously in the module init
> >> >     path and block modprobe until the firmware request is fulfilled.
> >> >     <...>
> >>
> >> This was a workaround to avoid a deadlock between udev and the kernel.
> >> The 180 s timeout was already in place before this change, and was not
> >> motivated by firmware loading. Also note that this patch was not about
> >> "tracking device drivers", just about avoiding dead-lock.
> >
> > Thanks, can you elaborate on how a deadlock can occur if the kmod
> > worker is not at some point sigkilled?
>
> This was only relevant when udev did the firmware loading. modprobe
> would wait for the kernel, which would wait for the firmware loading,
> which would wait for modprobe. This is no longer a problem as udev
> does not do firmware loading any more.

Thanks for clarifying. So the deadlock concern is no longer there,
therefore it is not a reason to keep the sigkill for kmod.

> > Is the issue that if there is no extra worker available and all are
> > idling on sleep / synchronous long work boot will potentially hang
> > unless a new worker becomes available to do more work?
>
> Correct.

Ok.

> > If so I can
> > see the sigkill helping for hanging tasks but it doesn't necessarily
> > mean its a good idea to kill modules loading taking a while. Also
> > what if the sigkill is just avoided for *just* kmod workers?
>
> Depending on the number of devices you have, I suppose we could still
> exhaust the workers.

Ok, can systemd dynamically create a worker or set of workers per
device that creeps up? Async probe for most drivers will help with this
but having it dynamic should help as well, especially since there are
drivers that will require probe synchronously -- and given that the
async probe mechanism is not yet merged.

> >> The way I see it, the current status from systemd's side is: our
> >> short-term work-around is to increase the timeout, and at the moment
> >> it appears no long-term solution is needed (i.e., it seems like the
> >> right thing to do is to make sure insmod can be near instantaneous, it
> >> appears people are working towards this goal, and so far no examples
> >> have cropped up showing that it is fundamentally impossible (once/if
> >> they do, we should of course revisit the problem)).
> >
> > That again would be reactive behaviour, what would prevent avoiding the
> > sigkill only for kmod workers? Is it known the deadlock is imminent?
> > If the amount of workers for kmod that would hit the timeout is
> > considered low I don't see how that's possible and why not just lift
> > the sigkill.
>
> Making kmod a special case is of course possible. However, as long as
> there is no fundamental reason why kmod should get this special
> treatment, this just looks like a work-around to me.

I've mentioned a series of five reasons why it's a bad idea right now
to sigkill modules [0]. We've reviewed each of them and at least
items 2-4 remain particularly valid fundamental reasons to avoid it,
especially if the deadlock is no longer possible. Running out of workers
because they are loading modules and that is taking a while is not
really a good standing reason to be killing them, especially if the
timeout is already set to a high value.
All we're doing there is limiting Linux / the number of devices
arbitrarily just to help free workers, and it seems that should be dealt
with differently. Killing module loading arbitrarily in the middle is
not advisable and can cause more issues than it solves. The async probe
mechanism will help free workers faster but this patch series is still
evolving, so we should still address the sigkill for kmod workers
separately and see what remaining reasons we have for it in light of
the possible issues highlighted that it can introduce if kept. If we
want to capture drivers taking long on probe each subsystem should
handle that and WARN / pick up on it. We cannot however assume that
this is a generally bad thing, as discussed before. We will also not be
able to async probe *every* driver, which is why the series allows a
flag to specify this.

[0] https://lkml.org/lkml/2014/9/26/879

> We already have a
> work-around, which is to increase the global timeout. If you still
> think we should do something different in systemd, it is probably best
> to take the discussion to systemd-devel to make sure all the relevant
> people are involved.

Sure, I've included systemd-devel. The hope is we can have a
constructive discussion on the sigkill for kmod.

  Luis
On Tue, Sep 30, 2014 at 09:21:59AM +0200, Luis R. Rodriguez wrote:
> On Mon, Sep 29, 2014 at 05:26:01PM -0400, Tejun Heo wrote:
> > Hello, Luis.
> >
> > On Mon, Sep 29, 2014 at 11:22:08PM +0200, Luis R. Rodriguez wrote:
> > > > > +	/* For now lets avoid stupid bug reports */
> > > > > +	if (!strcmp(bus->name, "pci") ||
> > > > > +	    !strcmp(bus->name, "pci_express") ||
> > > > > +	    !strcmp(bus->name, "hid") ||
> > > > > +	    !strcmp(bus->name, "sdio") ||
> > > > > +	    !strcmp(bus->name, "gameport") ||
> > > > > +	    !strcmp(bus->name, "mmc") ||
> > > > > +	    !strcmp(bus->name, "i2c") ||
> > > > > +	    !strcmp(bus->name, "platform") ||
> > > > > +	    !strcmp(bus->name, "usb"))
> > > > > +		return true;
> > > >
> > > > Ugh... things like this tend to become permanent. Do we really need
> > > > this? And how are we gonna find out what's broken why w/o bug
> > > > reports?
> > >
> > > Yeah... well we have two options, one is have something like this to
> > > at least make it generally useful or remove this and let folks who
> > > care start fixing async for all modules. The downside to removing
> > > this is it makes async probe pretty much useless on most systems
> > > right now, it would mean systemd would have to probably consider
> > > the list above if they wanted to start using this without expecting
> > > systems to not work.
> >
> > So, I'd much prefer blacklist approach if something like this is a
> > necessity. That way, we'd at least know what doesn't work.
>
> For buses? Or do you mean you'd want to wait until we have a decent
> list of drivers with the sync probe flag set? If the latter it may take
> a while to get that list for this to be somewhat useful.

OK I'm removing this part and it works well for me now on my laptop
and an AMD server without a white list, so all the junk above will
be removed in the next series.

  Luis
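
For comparison with the whitelist quoted above, the blacklist approach
Tejun prefers could look roughly like the sketch below. This is only an
illustration in userspace C: the function name, the array and the
single placeholder entry are hypothetical, and no bus was actually
reported as broken in this thread.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical deny list: buses known NOT to cope with async probe.
 * The entry below is a placeholder for illustration only.
 */
static const char *const async_probe_denied_buses[] = {
	"example_fragile_bus",
};

/* Blacklist policy: async probe is allowed unless the bus is listed. */
bool bus_allows_async_probe(const char *bus_name)
{
	size_t n = sizeof(async_probe_denied_buses) /
		   sizeof(async_probe_denied_buses[0]);
	size_t i;

	for (i = 0; i < n; i++) {
		if (strcmp(bus_name, async_probe_denied_buses[i]) == 0)
			return false;
	}

	/* Anything not explicitly listed, including untested buses. */
	return true;
}
```

The difference from the whitelist is the default: an untested bus gets
async probe and has to fail visibly before it is listed, which is what
makes breakage show up as bug reports.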
On Tue, Sep 30, 2014 at 09:15:55AM +0200, Luis R. Rodriguez wrote:
> Can you provide an example code path hit here? I'll certainly like to address
> that as well.

I managed to enable built-in driver support on top of this series.
I'll send it as part of the next series but I suspect we'll want to
discuss blacklist/whitelist a bit more there.

  Luis
On Thu, Oct 2, 2014 at 10:06 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> On Thu, Oct 02, 2014 at 08:12:37AM +0200, Tom Gundersen wrote:
>> Making kmod a special case is of course possible. However, as long as
>> there is no fundamental reason why kmod should get this special
>> treatment, this just looks like a work-around to me.
>
> I've mentioned a series of five reasons why it's a bad idea right now to
> sigkill modules [0], we've reviewed them each and still at least
> items 2-4 remain particularly valid fundamental reasons to avoid it

So items 2-4 basically say "there currently are drivers that cannot
deal with sigkill after a three minute timeout".

In the short-term we already have the solution: increase the timeout.
In the long-term, we have two choices, either permanently add some
heuristic to udev to deal with drivers taking a very long time to be
inserted, or fix the drivers not to take such a long time. A priori,
it makes no sense to me that drivers spend unbounded amounts of time
to get inserted, so fixing the drivers seems like the most reasonable
approach to me. That said, I'm of course open to be proven wrong if
there are some drivers that fundamentally _must_ take a long time to
insert (but we should then discuss why that is and how we can best
deal with the situation, rather than adding some hack up-front when we
don't even know if it is needed).

Your patch series should go a long way towards fixing the drivers (and
I imagine there being a lot of low-hanging fruit that can easily be
fixed once your series has landed), and the fact that we have now
increased the udev timeout from 30 to 180 seconds should also greatly
reduce the problem.

Cheers,

Tom
On Fri, Oct 3, 2014 at 1:23 AM, Tom Gundersen <teg@jklm.no> wrote:
> On Thu, Oct 2, 2014 at 10:06 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
>> On Thu, Oct 02, 2014 at 08:12:37AM +0200, Tom Gundersen wrote:
>>> Making kmod a special case is of course possible. However, as long as
>>> there is no fundamental reason why kmod should get this special
>>> treatment, this just looks like a work-around to me.
>>
>> I've mentioned a series of five reasons why it's a bad idea right now to
>> sigkill modules [0], we've reviewed them each and still at least
>> items 2-4 remain particularly valid fundamental reasons to avoid it
>
> So items 2-4 basically say "there currently are drivers that cannot
> deal with sigkill after a three minute timeout".

No, dealing with the sigkill gracefully is all related to 2), as it
says it's probably a terrible idea to be triggering exit paths at random
points in device drivers on init / probe. And while one could argue
that perhaps that can be cleaned up, I provided tons of references and
even *research effort* on this particular area, so the issues over this
point should by no means be easily brushed off. And it may be true that
we can fix some things on Linux but a) that requires a kernel upgrade
for users and b) some users may end up buying hardware that is only
supported through a proprietary driver, and getting those fixes is not
trivial and almost impossible in some cases.
3) says it is fundamentally incorrect to limit the bus probe routine
with any arbitrary timeout.

4) talks about how the timeout is creating a limit on the number of
devices a device driver can support on Linux as follows, given the
driver core batches *all* probes for one device driver serially:

number_devices =          systemd_timeout
                  -------------------------------------
                     max known probe time for driver

We have device drivers which we *know* will take over 1 minute just on
*probe*. This means that by default for these device drivers folks can
only install 3 devices of that type on a system. One can surely address
things in the kernel but again, assuming folks use defaults and don't
upgrade their kernel, the sigkill is simply limiting Linux right now,
even if it is for the short term.

> In the short-term we already have the solution: increase the timeout.

Short term implies what will be supported for a while for tons of
deployments of systemd. The kernel command line work-around for
increasing the timeout is a reactive measure, it's not addressing the
problem architecturally. If the sigkill is going to be maintained for
kmod its implications should be well documented as well, in terms of the
impact and limitations on both device drivers and the number of devices
a driver can support.

> In the long-term, we have two choices, either permanently add some
> heuristic to udev to deal with drivers taking a very long time to be
> inserted, or fix the drivers not to take such a long time.

Drivers taking long on init should probably be addressed. Drivers
taking long on probe are not broken, especially since the driver core
probes all supported devices of one device driver serially, so the
probe time is actually cumulative.

> A priori,
> it makes no sense to me that drivers spend unbounded amounts of time
> to get inserted, so fixing the drivers seems like the most reasonable
> approach to me. That said, I'm of course open to be proven wrong if
> there are some drivers that fundamentally _must_ take a long time to
> insert (but we should then discuss why that is and how we can best
> deal with the situation, rather than adding some hack up-front when we
> don't even know if it is needed).

Ok hold on. Async probe on the driver core will be a new feature and
there are even caveats that Tejun pointed out which are important for
distributions to consider before embracing it. Of course folks can
ignore these, but by no means should it be considered that tons of
device drivers were broken; what we are providing is a new mechanism.
And then there are device drivers which will need work in order to use
async probe. Some will require fixes on init / probe assumptions as I
provided for the amd64_edac driver, but for others only time will tell
what is required.

> Your patch series should go a long way towards fixing the drivers (and
> I imagine there being a lot of low-hanging fruit that can easily be
> fixed once your series has landed), and the fact that we have now
> increased the udev timeout from 30 to 180 seconds should also greatly
> reduce the problem.

Sure, I do ask for folks to revisit the short term solution though, I
did my best to communicate / document the issues.

  Luis
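
To make the arithmetic in 4) concrete, here is a small sketch in C. It
assumes init() time is negligible and that every probe takes the
worst-case time; the function name max_devices_before_kill is made up
for illustration and is not part of systemd or the kernel.

```c
#include <limits.h>

/*
 * Rough upper bound on how many devices one driver can probe before the
 * udev worker that loaded the module hits the timeout, assuming the
 * driver core probes every device serially and init() takes no time.
 */
unsigned int max_devices_before_kill(unsigned int timeout_secs,
				     unsigned int max_probe_secs)
{
	/* A zero probe time gives no meaningful bound. */
	if (max_probe_secs == 0)
		return UINT_MAX;

	/* Integer division: a partially completed probe does not count. */
	return timeout_secs / max_probe_secs;
}
```

With the 180 second timeout and a 60 second probe this gives 3; a
61 second probe already drops it to 2, and under the earlier 30 second
timeout such a driver could not complete a single probe.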
On Fri, Sep 26, 2014 at 02:57:17PM -0700, Luis R. Rodriguez wrote:
> + queue_work(system_unbound_wq, &priv->attach_work->work);
Tejun,
based on my testing so far using system_highpri_wq instead of
system_unbound_wq yields close to par / better boot times
than synchronous probe support for all modules. How set are
you on using system_unbound_wq? About to punt out a new
series which also addresses built-in.
Luis
On Fri, Oct 03, 2014 at 10:11:26PM +0200, Luis R. Rodriguez wrote:
> On Fri, Sep 26, 2014 at 02:57:17PM -0700, Luis R. Rodriguez wrote:
> > +	queue_work(system_unbound_wq, &priv->attach_work->work);
>
> Tejun,
>
> based on my testing so far using system_highpri_wq instead of
> system_unbound_wq yields close to par / better boot times
> than synchronous probe support for all modules. How set are
> you on using system_unbound_wq? About to punt out a new
> series which also addresses built-in.

Nevermind, folks can change this later with better empirical testing
than I can provide. Right now the differences I see are not
conclusive, and I suspect we'll see more of a difference once the
right built-in drivers are selected to probe asynchronously.

  Luis
=====================================================================|
strategy                         fw (usec)        no-fw (usec)       |
---------------------------------------------------------------------|
synchronous                      24472569         1307563            |
kthread                          25066415.5       1309868.5          |
queue_work(system_unbound_wq)    24913661.5       1307631            |
---------------------------------------------------------------------|

In practice, in seconds, the difference is barely noticeable:

=====================================================================|
strategy                         fw (s)           no-fw (s)          |
---------------------------------------------------------------------|
synchronous                      24.47            1.31               |
kthread                          25.07            1.31               |
queue_work(system_unbound_wq)    24.91            1.31               |
---------------------------------------------------------------------|

[0] http://ftp.suse.com/pub/people/mcgrof/async-probe/probe-cgxb4-no-firmware.png
[1] http://ftp.suse.com/pub/people/mcgrof/async-probe/probe-cgxb4-firmware.png

The rest of the commit log documents why this feature was implemented
primarily first for systemd and things it should consider next.

Systemd has a general timeout for all workers currently set to 180
seconds after which it will send a sigkill signal. Systemd now has a
warning which is issued once it reaches 1/3 of the timeout. The original
motivation for the systemd timeout was to help track device drivers
which do not use asynchronous firmware loading on init() and the
timeout was originally set to 30 seconds.

Since systemd + kernel are heavily tied in for the purposes of this
patch it is assumed you have merged on systemd the following
commits:

671174136525ddf208cdbe75d6d6bd159afa961f  udev: timeout - warn after a third of the timeout before killing
b5338a19864ac3f5632aee48069a669479621dca  udev: timeout - increase timeout
2e92633dbae52f5ac9b7b2e068935990d475d2cd  udev: bump event timeout to 60 seconds
be2ea723b1d023b3d385d3b791ee4607cbfb20ca  udev: remove userspace firmware loading support
9f20a8a376f924c8eb5423cfc1f98644fc1e2d1a  udev: fixup commit
dd5eddd28a74a49607a8fffcaf960040dba98479  udev: unify event timeout handling
9719859c07aa13539ed2cd4b31972cd30f678543  udevd: add --event-timeout commandline option

Since we bundle together serially driver init() and probe()
on module initialization, systemd's imposed timeout puts a limit on the
amount of time a driver's init() and probe routines can take. There are
a few overlooked issues with this and the timeout in general:

0) Not all drivers are killed, the signal is just sent and
   the kill will only be acted upon if the driver you loaded
   happens to have some code path that either uses kthreads (which
   as of 786235ee are now killable), or uses some code which checks for
   fatal_signal_pending() on the kernel somewhere -- e.g.
   pci_read_vpd().

1) Since systemd is the only one logging the sigkill, debugging why
   drivers are not loaded or in the worst case *failed to boot* because
   of a sigkill has proven hard.

2) When and if the signal is received by the driver somehow
   the driver may fail at different points in its initialization
   and unless all error paths on the driver are implemented
   perfectly this could mean leaving a device in a half
   initialized state.

3) The timeout is penalizing device drivers that take long on
   probe(), this wasn't the original motivation.
   Systemd seems to have been under the assumption that probe was
   asynchronous. This perhaps is true as an *objective* and goal for
   *some subsystems* but by no means is it true that we've been on a
   wide crusade to ensure this for all device drivers. It may be a
   good idea for *many* device drivers but penalizing them with a kill
   for taking long on probe is simply unacceptable, especially
   when the timeout is completely arbitrary.

4) The driver core calls probe for *all* devices that a driver can
   claim and it does so serially, so if a device driver will need
   to probe 3 devices and if probe on the device driver is synchronous
   the amount of time that module loading will take will be:

   driver load time = init() + probe for 3 devices serially

   The timeout ultimately ends up limiting the number of devices that
   *any* device driver can support based on the following formula:

   number_devices =          systemd_timeout
                     -------------------------------------
                        max known probe time for driver

   Lastly since the error value passed down is the value of
   the probe for the last device probed, the module will fail
   to load and all devices will fail to be available.

In the Linux kernel we don't want to work around the timeout,
instead systemd must be changed to take all the above into
consideration when issuing any kills on device drivers, ideally
the sigkill should be considered to be ignored at least for
kmod. In addition to this we help systemd by giving it what it
originally considered was there and enable it to ask device
drivers to use asynchronous probe. This patch addresses that
feature.

Systemd should consider enabling async probe on device drivers
it loads through systemd-udev but probably does not want to
enable it for modules loaded through systemd-modules-load
(modules-load.d). At least on my system, enabling async probe
for all modules fails to boot, so in order to make this
a bit more useful we whitelist a few buses where it should be
at least in theory safe to try to enable async probe.
This way even if systemd tried to ask to enable async probe for all its device drivers the kernel won't blindly do this. We also have the sync_probe flag which device drivers can themselves enable *iff* its known the device driver should never async probe. In order to help *test* things folks can use the bus.safe_mod_async_probe=1 kernel parameter which will work as if userspace would have requested all modules to load with async probe. Daring folks can also use bus.force_mod_async_probe=1 which will enable asynch probe even on buses not tested in any way yet, if you use that though you're on your own. Cc: Tejun Heo <tj@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Joseph Salisbury <joseph.salisbury@canonical.com> Cc: Kay Sievers <kay@vrfy.org> Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk> Cc: Tim Gardner <tim.gardner@canonical.com> Cc: Pierre Fersing <pierre-fersing@pierref.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Benjamin Poirier <bpoirier@suse.de> Cc: Nagalakshmi Nandigama <nagalakshmi.nandigama@avagotech.com> Cc: Praveen Krishnamoorthy <praveen.krishnamoorthy@avagotech.com> Cc: Sreekanth Reddy <sreekanth.reddy@avagotech.com> Cc: Abhijit Mahajan <abhijit.mahajan@avagotech.com> Cc: Casey Leedom <leedom@chelsio.com> Cc: Hariprasad S <hariprasad@chelsio.com> Cc: Santosh Rastapur <santosh@chelsio.com> Cc: MPT-FusionLinux.pdl@avagotech.com Cc: linux-scsi@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: netdev@vger.kernel.org Signed-off-by: Luis R. 
Rodriguez <mcgrof@suse.com> --- drivers/base/base.h | 6 +++ drivers/base/bus.c | 137 +++++++++++++++++++++++++++++++++++++++++++++++-- drivers/base/dd.c | 7 +++ include/linux/module.h | 2 + kernel/module.c | 12 ++++- 5 files changed, 159 insertions(+), 5 deletions(-) diff --git a/drivers/base/base.h b/drivers/base/base.h index 251c5d3..24836f1 100644 --- a/drivers/base/base.h +++ b/drivers/base/base.h @@ -43,11 +43,17 @@ struct subsys_private { }; #define to_subsys_private(obj) container_of(obj, struct subsys_private, subsys.kobj) +struct driver_attach_work { + struct work_struct work; + struct device_driver *driver; +}; + struct driver_private { struct kobject kobj; struct klist klist_devices; struct klist_node knode_bus; struct module_kobject *mkobj; + struct driver_attach_work *attach_work; struct device_driver *driver; }; #define to_driver(obj) container_of(obj, struct driver_private, kobj) diff --git a/drivers/base/bus.c b/drivers/base/bus.c index a5f41e4..41e321e6 100644 --- a/drivers/base/bus.c +++ b/drivers/base/bus.c @@ -85,6 +85,7 @@ static void driver_release(struct kobject *kobj) struct driver_private *drv_priv = to_driver(kobj); pr_debug("driver: '%s': %s\n", kobject_name(kobj), __func__); + kfree(drv_priv->attach_work); kfree(drv_priv); } @@ -662,10 +663,125 @@ static void remove_driver_private(struct device_driver *drv) struct driver_private *priv = drv->p; kobject_put(&priv->kobj); + kfree(priv->attach_work); kfree(priv); drv->p = NULL; } +static void driver_attach_workfn(struct work_struct *work) +{ + int ret; + struct driver_attach_work *attach_work = + container_of(work, struct driver_attach_work, work); + struct device_driver *drv = attach_work->driver; + ktime_t calltime, delta, rettime; + unsigned long long duration; + + calltime = ktime_get(); + + ret = driver_attach(drv); + if (ret != 0) { + remove_driver_private(drv); + bus_put(drv->bus); + } + + rettime = ktime_get(); + delta = ktime_sub(rettime, calltime); + duration = (unsigned long 
long) ktime_to_ns(delta) >> 10; + + pr_debug("bus: '%s': add driver %s attach completed after %lld usecs\n", + drv->bus->name, drv->name, duration); +} + +int bus_driver_async_probe(struct device_driver *drv) +{ + struct driver_private *priv = drv->p; + + priv->attach_work = kzalloc(sizeof(struct driver_attach_work), + GFP_KERNEL); + if (!priv->attach_work) + return -ENOMEM; + + priv->attach_work->driver = drv; + INIT_WORK(&priv->attach_work->work, driver_attach_workfn); + + /* Keep this as pr_info() until this is prevalent */ + pr_info("bus: '%s': probe for driver %s is run asynchronously\n", + drv->bus->name, drv->name); + + queue_work(system_unbound_wq, &priv->attach_work->work); + + return 0; +} + +/* + */ +static bool safe_mod_async = false; +module_param_named(safe_mod_async_probe, safe_mod_async, bool, 0400); +MODULE_PARM_DESC(safe_mod_async_probe, + "Enable async probe on all modules safely"); + +static bool force_mod_async = false; +module_param_named(force_mod_async_probe, force_mod_async, bool, 0400); +MODULE_PARM_DESC(force_mod_async_probe, + "Force async probe on all modules"); + +/** + * drv_enable_async_probe - evaluates if async probe should be used + * @drv: device driver to evaluate + * @bus: the bus for the device driver + * + * The driver core supports enabling asynchronous probe on device drivers + * by requiring userspace to pass the module parameter "async_probe". + * Currently only modules are enabled to use this feature. If a device + * driver is known to not work properly with asynchronous probe they + * can force disable asynchronous probe from being enabled through + * userspace by adding setting sync_probe to true on the @drv. We require + * async probe to be requested from userspace given that we have historically + * supported synchronous probe and some userspaces may exist which depend + * on this functionality. 
Userspace may wish to use asynchronous probe for + * most device drivers but since this can fail boot in practice we only + * enable it currently for a set of buses. + * + * If you'd like to test enabling async probe for all buses whitelisted + * you can enable the safe_mod_async_probe module parameter. Note that its + * not a good idea to always enable this, in particular you probably don't + * want drivers under modules-load.d to use this. This module parameter should + * only be used to help test. If you'd like to test even futher you can + * use force_mod_async_probe, that will force enable async probe on all + * drivers, regardless if its bus type, it should however be used with + * caution. + */ +static bool drv_enable_async_probe(struct device_driver *drv, + struct bus_type *bus) +{ + struct module *mod; + + if (!drv->owner || drv->sync_probe) + return false; + + if (force_mod_async) + return true; + + mod = drv->owner; + if (!safe_mod_async && !mod->async_probe_requested) + return false; + + /* For now lets avoid stupid bug reports */ + if (!strcmp(bus->name, "pci") || + !strcmp(bus->name, "pci_express") || + !strcmp(bus->name, "hid") || + !strcmp(bus->name, "sdio") || + !strcmp(bus->name, "gameport") || + !strcmp(bus->name, "mmc") || + !strcmp(bus->name, "i2c") || + !strcmp(bus->name, "platform") || + !strcmp(bus->name, "usb")) + return true; + + return false; +} + /** * bus_add_driver - Add a driver to the bus. * @drv: driver. 
@@ -675,6 +791,7 @@ int bus_add_driver(struct device_driver *drv)
 	struct bus_type *bus;
 	struct driver_private *priv;
 	int error = 0;
+	bool async_probe = false;
 
 	bus = bus_get(drv->bus);
 	if (!bus)
@@ -696,11 +813,19 @@ int bus_add_driver(struct device_driver *drv)
 	if (error)
 		goto out_unregister;
 
+	async_probe = drv_enable_async_probe(drv, bus);
+
 	klist_add_tail(&priv->knode_bus, &bus->p->klist_drivers);
 	if (drv->bus->p->drivers_autoprobe) {
-		error = driver_attach(drv);
-		if (error)
-			goto out_unregister;
+		if (async_probe) {
+			error = bus_driver_async_probe(drv);
+			if (error)
+				goto out_unregister;
+		} else {
+			error = driver_attach(drv);
+			if (error)
+				goto out_unregister;
+		}
 	}
 	module_add_driver(drv->owner, drv);
 
@@ -1267,6 +1392,12 @@ EXPORT_SYMBOL_GPL(subsys_virtual_register);
 
 int __init buses_init(void)
 {
+	if (unlikely(safe_mod_async))
+		pr_info("Enabled safe_mod_async -- you may run into issues\n");
+
+	if (unlikely(force_mod_async))
+		pr_info("Enabling force_mod_async -- you're on your own!\n");
+
 	bus_kset = kset_create_and_add("bus", &bus_uevent_ops, NULL);
 	if (!bus_kset)
 		return -ENOMEM;
diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index e4ffbcf..7999aba 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -507,6 +507,13 @@ static void __device_release_driver(struct device *dev)
 
 	drv = dev->driver;
 	if (drv) {
+		if (drv->owner && !drv->sync_probe) {
+			struct module *mod = drv->owner;
+			struct driver_private *priv = drv->p;
+
+			if (mod->async_probe_requested)
+				flush_work(&priv->attach_work->work);
+		}
 		pm_runtime_get_sync(dev);
 
 		driver_sysfs_remove(dev);
diff --git a/include/linux/module.h b/include/linux/module.h
index 71f282a..1e9e017 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -271,6 +271,8 @@ struct module {
 	bool sig_ok;
 #endif
 
+	bool async_probe_requested;
+
 	/* symbols that will be GPL-only in the near future. */
 	const struct kernel_symbol *gpl_future_syms;
 	const unsigned long *gpl_future_crcs;
diff --git a/kernel/module.c b/kernel/module.c
index 88f3d6c..31d71ff 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -3175,8 +3175,16 @@ out:
 static int unknown_module_param_cb(char *param, char *val, const char *modname,
 				   void *arg)
 {
+	int ret;
+	struct module *mod = arg;
+
+	if (strcmp(param, "async_probe") == 0) {
+		mod->async_probe_requested = true;
+		return 0;
+	}
+
 	/* Check for magic 'dyndbg' arg */
-	int ret = ddebug_dyndbg_module_param_cb(param, val, modname);
+	ret = ddebug_dyndbg_module_param_cb(param, val, modname);
 	if (ret != 0)
 		pr_warn("%s: unknown parameter '%s' ignored\n", modname, param);
 	return 0;
@@ -3278,7 +3286,7 @@ static int load_module(struct load_info *info, const char __user *uargs,
 
 	/* Module is ready to execute: parsing args may do that. */
 	after_dashes = parse_args(mod->name, mod->args, mod->kp, mod->num_kp,
-				  -32768, 32767, NULL,
+				  -32768, 32767, mod,
 				  unknown_module_param_cb);
 	if (IS_ERR(after_dashes)) {
 		err = PTR_ERR(after_dashes);
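As a reading aid, the decision made by drv_enable_async_probe() in the diff above can be modelled in plain userspace C. This is only a sketch of the logic as posted in this patch, not kernel code: struct model_driver, model_enable_async_probe() and model_self_check() are hypothetical names introduced for illustration, and the two globals stand in for the force_mod_async and safe_mod_async variables the patch reads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the fields the patch's check reads. */
struct model_driver {
	bool has_owner;             /* models drv->owner != NULL */
	bool sync_probe;            /* driver opted out of async probe */
	bool async_probe_requested; /* module was loaded with "async_probe" */
	const char *bus_name;       /* models bus->name */
};

/* Stand-ins for the debugging knobs the patch reads. */
bool force_mod_async;
bool safe_mod_async;

/* Bus names copied from the whitelist in the patch. */
static const char *const async_safe_buses[] = {
	"pci", "pci_express", "hid", "sdio", "gameport",
	"mmc", "i2c", "platform", "usb",
};

/* Same order of checks as drv_enable_async_probe() in the patch. */
bool model_enable_async_probe(const struct model_driver *drv)
{
	size_t i;

	/* Built-in drivers and drivers that set sync_probe never go async. */
	if (!drv->has_owner || drv->sync_probe)
		return false;

	/* The force knob skips both the request check and the whitelist. */
	if (force_mod_async)
		return true;

	/* Otherwise someone must have asked: the safe knob or the module. */
	if (!safe_mod_async && !drv->async_probe_requested)
		return false;

	for (i = 0; i < sizeof(async_safe_buses) / sizeof(async_safe_buses[0]); i++)
		if (!strcmp(drv->bus_name, async_safe_buses[i]))
			return true;

	return false;
}

/* Small usage example: walks the main cases with assertions. */
void model_self_check(void)
{
	struct model_driver drv = { true, false, true, "usb" };

	force_mod_async = false;
	safe_mod_async = false;

	assert(model_enable_async_probe(&drv));   /* requested, whitelisted bus */

	drv.bus_name = "scsi";
	assert(!model_enable_async_probe(&drv));  /* requested, bus not listed */

	force_mod_async = true;
	assert(model_enable_async_probe(&drv));   /* force ignores the whitelist */

	drv.sync_probe = true;
	assert(!model_enable_async_probe(&drv));  /* sync_probe always wins */

	force_mod_async = false;
}
```

Calling model_self_check() from a test harness exercises the four cases. In this model, as in the posted patch, sync_probe is checked before the force knob, so a driver that opts out stays synchronous even when async probe is forced.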
From: "Luis R. Rodriguez" <mcgrof@suse.com>

Some init systems may wish to have a device driver's bus probe() run asynchronously. This implements support for this and allows userspace to request async probe as a preference through a generic shared device driver module parameter, async_probe. Async probe is exposed through a module parameter because synchronous probe has been prevalent for years, so some userspace might exist which relies on the device driver probing synchronously and assumes that the devices it provides will be immediately available afterwards.

Some device drivers might not be able to run async probe, so we enable device drivers to annotate this to prevent this module parameter from having any effect on them.

This implementation uses queue_work(system_unbound_wq) to queue async probes. This should enable probe to run slightly *faster* if the driver's probe path does not have much interaction with other workqueues; otherwise it may run _slightly_ slower. Tests were done with cxgb4, which is known to take long on probe, both without having to run request_firmware() [0] and then by requiring it to use request_firmware() [1]. The differences in run time are only measurable in microseconds: