
[RFC,v3,0/7] Extend regulator notification support

Message ID cover.1615454845.git.matti.vaittinen@fi.rohmeurope.com
Series Extend regulator notification support

Message

Matti Vaittinen March 11, 2021, 10:21 a.m. UTC
Extend regulator notification support

This is an RFC series for getting feedback on extending the regulator
notification and error flag support. Initial discussion on the topic can
be found here:
https://lore.kernel.org/lkml/6046836e22b8252983f08d5621c35ececb97820d.camel@fi.rohmeurope.com/

This series is built on top of the BD9576MUF support patch series v9,
which is currently in the MFD tree at immutable branch ib-mfd-watchdog-5.13:
https://lore.kernel.org/lkml/cover.1615219345.git.matti.vaittinen@fi.rohmeurope.com/
(The series should apply without those patches, but there is a compile-time
dependency on definitions brought in by the last patch of the BD9576
series. This should be OK though, as there is a Kconfig dependency in the
BD9576 regulator driver.)

In a nutshell - the RFC adds:

1. WARNING level events/error flags. (Patch 2)
  The current regulator 'ERROR' event notifications for over/under
  voltage, over current and over temperature are used to indicate a
  condition where the monitored entity is so badly "off" that it actually
  indicates a hardware error which cannot be recovered. The most
  typical handling for that is believed to be a (graceful)
  system shutdown. Here we add a set of 'WARNING' level flags to allow
  sending notifications to consumers before things are 'that badly off',
  so that consumer drivers can implement recovery actions.
2. Device-tree properties for specifying limit values. (Patches 1, 4)
  Add limits for the above-mentioned 'ERROR' and 'WARNING' levels (which
  send notifications to consumers) and also for a 'PROTECTION' level
  (which will be used to immediately shut down the regulator(s) without
  informing consumer drivers; typically implemented by hardware).
  Property parsing is implemented in the regulator core, which then calls
  callback operations provided by the IC drivers to set the limits. A
  warning is emitted if protection is requested by the device tree but the
  underlying IC does not support configuring the requested protection.
3. Helpers which can be registered by IC drivers. (Patch 3)
  The target is to avoid implementing IRQ handling and IRQ storm protection
  in each IC driver. (Many of the ICs implementing these IRQs do not allow
  masking or acking the IRQ but keep the IRQ asserted for the whole
  duration of the problem, keeping the processor in an IRQ handling loop.)
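As a rough illustration of the three limit levels in items 1 and 2 above, here is a minimal userspace sketch that classifies an over-voltage reading against PROTECTION, ERROR and WARNING thresholds, checking the most severe level first. All names (`sketch_severity`, `sketch_limits`, `sketch_classify_ov()`) and the convention that a zero limit means "not configured" are hypothetical; this is not the API added by the series.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical severity levels, least to most severe. */
enum sketch_severity {
	SKETCH_SEV_NONE,
	SKETCH_SEV_WARN, /* notify consumers, recovery may still be possible */
	SKETCH_SEV_ERR,  /* notify consumers, unrecoverable HW error */
	SKETCH_SEV_PROT, /* regulator shut down (typically by HW), no notification */
};

/* Hypothetical over-voltage limits in microvolts; 0 = not configured. */
struct sketch_limits {
	int prot_uv;
	int err_uv;
	int warn_uv;
};

/* Return the most severe level whose configured limit the reading reaches. */
static enum sketch_severity sketch_classify_ov(const struct sketch_limits *l,
					       int uv)
{
	assert(l != NULL);

	if (l->prot_uv && uv >= l->prot_uv)
		return SKETCH_SEV_PROT;
	if (l->err_uv && uv >= l->err_uv)
		return SKETCH_SEV_ERR;
	if (l->warn_uv && uv >= l->warn_uv)
		return SKETCH_SEV_WARN;
	return SKETCH_SEV_NONE;
}
```

For example, with limits of 1.30 V / 1.25 V / 1.20 V, a reading of 1.26 V classifies as ERROR, while 1.21 V only reaches WARNING.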

The helper was designed so that it can be used to implement roughly the
same logic as is used in the qcom-labibb regulator. This means, amongst
other things, a safety shut-down if IC registers are not readable.
Using these shut-down retry counters is optional. The idea is that the
helper could also be used by simpler ICs which do not provide status
register(s) that can be used to check whether the error is still active.
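A minimal userspace sketch of the optional safety shut-down described above, assuming a simple model where each failed status read bumps a retry counter and the 'die' callback is invoked once the counter exceeds fatal_cnt. The field names echo the snippet quoted later in this thread (`fatal_cnt`, `retry_cnt`, `die`), but the structures, `sketch_isr_work()` and the simulated IC are hypothetical, not the code in irq_helpers.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper descriptor; not the real kernel structure. */
struct sketch_desc {
	int fatal_cnt;                 /* 0 = safety shut-down disabled */
	int (*read_status)(void *ctx); /* < 0 if registers are unreadable */
	int (*die)(void *ctx);         /* last-resort IC shut-down */
};

/* Hypothetical per-helper runtime state. */
struct sketch_state {
	int retry_cnt;
	bool died;
};

/* Simulated IC used by the example: status < 0 models a failed read. */
struct sketch_ic {
	int status;
	int die_calls;
};

static int sketch_fake_read(void *ctx)
{
	return ((struct sketch_ic *)ctx)->status;
}

static int sketch_fake_die(void *ctx)
{
	((struct sketch_ic *)ctx)->die_calls++;
	return 0;
}

/*
 * One pass of the simulated IRQ work. Returns the status on a successful
 * read, or -1 if the registers could not be read this time.
 */
static int sketch_isr_work(const struct sketch_desc *d,
			   struct sketch_state *h, void *ctx)
{
	int status;

	assert(d != NULL && h != NULL && d->read_status != NULL);

	status = d->read_status(ctx);
	if (status >= 0) {
		h->retry_cnt = 0; /* a good read resets the counter */
		return status;
	}

	h->retry_cnt++;
	if (d->fatal_cnt && h->retry_cnt > d->fatal_cnt && !h->died) {
		/* Registers keep failing: shut the IC down, only once. */
		h->died = true;
		if (d->die)
			d->die(ctx); /* failure of die() is not handled here */
	}
	return -1;
}
```

With fatal_cnt = 2, the third consecutive failed read triggers die(); with fatal_cnt = 0 the counter is tracked but never acted on.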

ICs which do not have such a status register can simply omit the 'renable'
callback (and retry counts etc.), and the helper then assumes the situation
is OK and re-enables the IRQ after a given time period. If the problem
persists, the handler is run again and another notification is sent, but at
least the delay allows the processor to avoid an IRQ loop.
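The no-status-register path can be sketched as a simple timeline simulation. Assuming the IC keeps the IRQ line asserted for fault_ms milliseconds and the helper keeps the IRQ disabled for irq_off_ms after each run, the handler sends one notification per delay period instead of spinning in the IRQ handler. The function name and the millisecond model are hypothetical.

```c
#include <assert.h>

/*
 * Hypothetical timeline model: the IC keeps the IRQ line asserted from
 * t = 0 until t = fault_ms. Each handler run sends one notification,
 * disables the IRQ and lets it be re-enabled irq_off_ms later. Returns
 * the number of notifications sent while the fault persists.
 */
static int sketch_count_notifications(int fault_ms, int irq_off_ms)
{
	int t = 0;
	int sent = 0;

	/* Without a delay the handler would re-run immediately (IRQ loop). */
	assert(irq_off_ms > 0);

	while (t < fault_ms) {
		sent++;          /* handler runs and notifies consumers */
		t += irq_off_ms; /* IRQ stays disabled until the delay expires */
	}
	return sent;
}
```

For a 1000 ms fault and a 300 ms delay this gives notifications at 0, 300, 600 and 900 ms, i.e. four in total.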

Patch 6 takes this notification support into use in the BD9576MUF driver.
Patch 7 is related to an MFD change and is not really related to the RFC
here. It was added to this series in order to avoid potential conflicts.

Changelog v3:
  Regulator core:
   - Fix dangling pointer access in regulator_irq_helper()
  stpmic1_regulator:
   - fix function prototype (compile error)
  bd9576-regulator:
   - Update over current limits to what was given in new data-sheet
     (REV00K)
   - Allow over-current monitoring without external FET. Set limits to
     values given in data-sheet (REV00K).

Changelog v2:
  Generic:
  - rebase on v5.12-rc2 + BD9576 series
  - Split devm variant of delayed wq into its own series
  Regulator framework:
  - Provide non devm variant of IRQ notification helpers
  - shorten dt-property names as suggested by Rob
  - unconditionally call map_event in IRQ handling and require it to be
    populated
  BD9576 regulators:
  - change the FET resistance property to micro-ohms
  - fix voltage computation in OC limit setting

--

Matti Vaittinen (7):
  dt_bindings: Add protection limit properties
  regulator: add warning flags
  regulator: IRQ based event/error notification helpers
  regulator: add property parsing and callbacks to set protection limits
  dt-bindings: regulator: bd9576 add FET ON-resistance for OCW
  regulator: bd9576: Support error reporting
  regulator: bd9576: Fix the driver name in id table

 .../bindings/regulator/regulator.yaml         |   82 ++
 .../regulator/rohm,bd9576-regulator.yaml      |    5 +
 drivers/regulator/Makefile                    |    2 +-
 drivers/regulator/bd9576-regulator.c          | 1060 +++++++++++++++--
 drivers/regulator/core.c                      |  146 ++-
 drivers/regulator/irq_helpers.c               |  423 +++++++
 drivers/regulator/of_regulator.c              |   58 +
 drivers/regulator/qcom-labibb-regulator.c     |   10 +-
 drivers/regulator/qcom_spmi-regulator.c       |    6 +-
 drivers/regulator/stpmic1_regulator.c         |   20 +-
 include/linux/regulator/consumer.h            |   14 +
 include/linux/regulator/driver.h              |  176 ++-
 include/linux/regulator/machine.h             |   26 +
 13 files changed, 1886 insertions(+), 142 deletions(-)
 create mode 100644 drivers/regulator/irq_helpers.c

Comments

Mark Brown April 2, 2021, 5:11 p.m. UTC | #1
On Thu, Mar 11, 2021 at 12:22:36PM +0200, Matti Vaittinen wrote:

> @@ -0,0 +1,423 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2021 ROHM Semiconductors

Please make the entire comment a C++ one so things look more consistent.

> +static void regulator_notifier_isr_work(struct work_struct *work)
> +{

> +	if (d->fatal_cnt && h->retry_cnt > d->fatal_cnt) {
> +		if (d->die)
> +			ret = d->die(rid);
> +		else
> +			BUG();
> +
> +		/*
> +		 * If the 'last resort' IC recovery failed we will have
> +		 * nothing else left to do...
> +		 */
> +		BUG_ON(ret);

This isn't good...  we should be trying to provide more system level
handling of this, if nothing else it's quite possibly not a software bug
here but rather a hardware failure.  An explicit message about what
happened would be more likely to be understood as a hardware failure,
and something which allows handling such as initiating a system shutdown
would be good as well - I'm not sure if there's any existing mechanism
to plumb userspace into, or perhaps some sort of policy configurable via
sysfs.  That could be built on later though, I think the main thing here
is that the logging should be clearer and distinguishable from a random
software fault which is what BUG_ON() looks like.  The backtrace and
whatnot that BUG_ON() provides aren't useful here and the message isn't
going to be very distinctive, some custom prints will attract more
attention.
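A minimal userspace sketch of the kind of explicit, hardware-oriented report suggested above as an alternative to BUG_ON(): format a distinctive message and hand it to an optional policy hook (for example one that starts a system shutdown). In the kernel the print would presumably be a pr_crit()-style message; `sketch_report_hw_failure()`, the policy hook type and the message text are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical policy hook, e.g. one that starts an orderly shutdown. */
typedef void (*sketch_hw_fail_policy)(const char *msg);

/* Example policy: just echo the message to stderr. */
static void sketch_policy_log(const char *msg)
{
	fprintf(stderr, "%s\n", msg);
}

/*
 * Format a distinctive hardware-failure message (instead of relying on a
 * BUG_ON() backtrace) and pass it to the optional policy hook. Returns
 * the length snprintf() reports for the full message.
 */
static int sketch_report_hw_failure(char *buf, size_t len,
				    const char *rdev_name, int die_ret,
				    sketch_hw_fail_policy policy)
{
	int n;

	assert(buf != NULL && len > 0);

	n = snprintf(buf, len,
		     "regulator %s: HW FAILURE, IC recovery failed (%d)",
		     rdev_name, die_ret);
	if (policy)
		policy(buf);
	return n;
}
```

Keeping the policy behind a hook leaves room for the later, system-level handling (user-space or sysfs-configurable) discussed in this thread.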

> +	/* Disable IRQ if HW keeps line asserted */
> +	if (d->irq_off_ms)
> +		disable_irq_nosync(irq);
> +	/*
> +	 * IRQ seems to be for us. Let's fire correct notifiers / store error

Missing blank lines in the file.

> + * This structure is passed to map_event and renable for reporting reulator

regulator.

Mark Brown April 2, 2021, 5:18 p.m. UTC | #2
On Thu, Mar 11, 2021 at 12:23:02PM +0200, Matti Vaittinen wrote:

> +	/*
> +	 * Existing logic does not warn if over_current_protection is given as
> +	 * a constraint but driver does not support that. I think we should
> +	 * warn about this type of issues as it is possible someone changes

The "existing logic" bit here is for a changelog, not the code - as soon
as the patch is applied the comment becomes inaccurate.  This also seems
like a separate patch.

Mark Brown April 2, 2021, 5:19 p.m. UTC | #3
On Thu, Mar 11, 2021 at 12:24:29PM +0200, Matti Vaittinen wrote:
> Driver name was changed in MFD cell:
> https://lore.kernel.org/lkml/560b9748094392493ebf7af11b6cc558776c4fd5.1613031055.git.matti.vaittinen@fi.rohmeurope.com/
> Fix the ID table to match this.

This looks unrelated to the rest of the series?

Mark Brown April 2, 2021, 5:19 p.m. UTC | #4
On Thu, Mar 11, 2021 at 12:21:01PM +0200, Matti Vaittinen wrote:
> Extend regulator notification support
> 
> This is an RFC series for getting feedback on extending the regulator
> notification and error flag support. Initial discussion on the topic can
> be found here:

This looks good apart from the fairly minor comments I sent on a couple
of the patches and the schema problem Rob reported.

Matti Vaittinen April 4, 2021, 3:47 p.m. UTC | #5
Hi Mark,

Thanks for the review(s)!
On Fri, 2021-04-02 at 18:18 +0100, Mark Brown wrote:
> On Thu, Mar 11, 2021 at 12:23:02PM +0200, Matti Vaittinen wrote:
> 
> > +	/*
> > +	 * Existing logic does not warn if over_current_protection is given as
> > +	 * a constraint but driver does not support that. I think we should
> > +	 * warn about this type of issues as it is possible someone changes
> 
> The "existing logic" bit here is for a changelog, not the code - as soon
> as the patch is applied the comment becomes inaccurate.  This also seems
> like a separate patch.

I don't think this patch changed the logic; it kept it as it is now.
E.g., for the existing over_current_protection property we still silently
ignore the case where the property is given but the driver does not support
setting it. To me this sounds like a fragile approach, and I handled the new
properties (like detection) in a different way. Thus the comment should
stay valid, and thus I didn't think this warrants a new patch.

If you think we should change the logic, then we should definitely do
that in a separate patch. That allows a revert if existing setups break.
How would you like this to be? I can change the logic if you think it's
worth the risk of breaking existing setups.
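To illustrate the difference discussed above, here is a minimal userspace sketch of the "warn instead of silently ignoring" behaviour for a device-tree protection request the driver cannot honour. `sketch_ops`, `set_ocp` and `sketch_apply_ocp()` are hypothetical stand-ins, not the regulator core's actual parsing code; the sketch only shows the control flow.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical driver ops; only the callback relevant here. */
struct sketch_ops {
	int (*set_ocp)(int lim_ua, bool enable);
};

/*
 * Apply an over-current protection request parsed from the device tree.
 * Returns 0 if nothing was requested or the driver accepted it,
 * -EOPNOTSUPP (after printing a warning) if the driver has no callback,
 * or the callback's own error code.
 */
static int sketch_apply_ocp(const struct sketch_ops *ops, bool requested,
			    int lim_ua)
{
	assert(ops != NULL);

	if (!requested)
		return 0;

	if (!ops->set_ocp) {
		/* Warn rather than silently ignore the DT request. */
		fprintf(stderr,
			"regulator: OCP requested (%d uA) but not supported\n",
			lim_ua);
		return -EOPNOTSUPP;
	}
	return ops->set_ocp(lim_ua, true);
}
```

Whether the caller then aborts or carries on is a policy choice; as noted above, changing the existing over_current_protection behaviour would belong in a separate patch.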

Best Regards
	Matti Vaittinen

Matti Vaittinen April 4, 2021, 3:51 p.m. UTC | #6
On Fri, 2021-04-02 at 18:19 +0100, Mark Brown wrote:
> On Thu, Mar 11, 2021 at 12:24:29PM +0200, Matti Vaittinen wrote:
> > Driver name was changed in MFD cell:
> > https://lore.kernel.org/lkml/560b9748094392493ebf7af11b6cc558776c4fd5.1613031055.git.matti.vaittinen@fi.rohmeurope.com/
> > Fix the ID table to match this.
> 
> This looks unrelated to the rest of the series?

Correct. I think I mentioned that somewhere - or at least I intended to
do that. Probably in the cover-letter.

I included this change to the series just to avoid conflicts. Do you
want me to send it separately?

Best Regards
	Matti Vaittinen

Matti Vaittinen April 4, 2021, 4:07 p.m. UTC | #7
On Fri, 2021-04-02 at 18:11 +0100, Mark Brown wrote:
> On Thu, Mar 11, 2021 at 12:22:36PM +0200, Matti Vaittinen wrote:
> > +	if (d->fatal_cnt && h->retry_cnt > d->fatal_cnt) {
> > +		if (d->die)
> > +			ret = d->die(rid);
> > +		else
> > +			BUG();
> > +
> > +		/*
> > +		 * If the 'last resort' IC recovery failed we will have
> > +		 * nothing else left to do...
> > +		 */
> > +		BUG_ON(ret);
> 
> This isn't good...  we should be trying to provide more system level
> handling of this, if nothing else it's quite possibly not a software bug
> here but rather a hardware failure.  An explicit message about what
> happened would be more likely to be understood as a hardware failure,
I do agree. I'll add a print in next version.

> and something which allows handling such as initiating a system shutdown
> would be good as well - I'm not sure if there's any existing mechanism
> to plumb userspace into, or perhaps some sort of policy configurable via
> sysfs.

I like the idea but don't know of such an existing mechanism. The input
system power-key event is the closest that comes to my mind, but I don't
think that would be quite right. Additionally, I am unsure what level
of user-space functionality can be expected to work. Maybe the severity
of configured notifications should be used to decide whether to do
in-kernel handling or to alert user-space. Anyway, that is something that
requires further pondering; I'd propose improving this later.

Best Regards
	Matti Vaittinen