diff mbox series

[v3,1/3] gpio: pca953x: Rewrite ->get_multiple() function

Message ID 20200420172752.33588-1-andriy.shevchenko@linux.intel.com
State New
Headers show
Series [v3,1/3] gpio: pca953x: Rewrite ->get_multiple() function | expand

Commit Message

Andy Shevchenko April 20, 2020, 5:27 p.m. UTC
The commit 96d7c7b3e654 ("gpio: gpio-pca953x, Add get_multiple function")
basically did everything wrong from style and code reuse perspective, i.e.
- it didn't utilize existing PCA953x internal helpers
- it didn't utilize bitmap API
- it misses the point that ilog2(), besides that BANK_SFT is useless,
  can be used in macros
- it has indentation issues.

Rewrite the function completely.

Cc: Paul Thomas <pthomas8589@gmail.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
---
 drivers/gpio/gpio-pca953x.c | 41 ++++++++++---------------------------
 1 file changed, 11 insertions(+), 30 deletions(-)

Comments

Paul Thomas April 21, 2020, 3:23 a.m. UTC | #1
On Mon, Apr 20, 2020 at 1:27 PM Andy Shevchenko
<andriy.shevchenko@linux.intel.com> wrote:
>
> The commit 96d7c7b3e654 ("gpio: gpio-pca953x, Add get_multiple function")
> basically did everything wrong from style and code reuse perspective, i.e.
Hi Andy,

Well your version is certainly elegant and simple, and does better
with code reuse. However there are a couple of other goals I had in
mind.
First, the "lazy" approach of 96d7c7b3e654 is actually faster when
user space sets up a 8-bit linehandle[1]146us (single regmap_read())
vs 172us (pca953x_read_regs()) which incidentally is what we do in our
application. In lazily reading 1 byte at a time it is the fastest
access for that, if user space is always setting up the linehandle for
the whole chip pca953x_read_regs() would be faster. Seeing as
get_multiple has been unimplemented for this chip until now perhaps
our use case deserves some consideration? That being said, the
pca953x_read_regs() is still far better than calling regmap_read() 8
times.

Second, your version does not work with a 5.2 kernel, bitmap_replace
is not there yet and pca953x_read_regs() signature is different. So
perhaps we'll all move on and no one will care about 5.2, but as
that's what we are using that was the basis for the patch. Have you
tested this with actual hardware? I actually didn't do a proper test
just the timing of the pca953x_read_regs().

> - it didn't utilize existing PCA953x internal helpers
> - it didn't utilize bitmap API
> - it misses the point that ilog2(), besides that BANK_SFT is useless,
>   can be used in macros
Yes, I know ilog2() can be used in macros, I didn't think it was worth
including the .h file just to calculate 3. Putting the ilog2(x) in the
comments seemed to be common in other kernel sections, but maybe that
was historic before the macro version? Either way is fine. The shift
is not useless, without that you would go into the if statement for
every bit, but you only want to do a regmap_read() for every byte.

> - it has indentation issues.
It passed checkpatch.pl, and any indentation fixes are fine with me.

thanks,
Paul

[1] Tested using 16-bit max7312
Andy Shevchenko April 21, 2020, 12:55 p.m. UTC | #2
On Mon, Apr 20, 2020 at 11:23:57PM -0400, Paul Thomas wrote:
> On Mon, Apr 20, 2020 at 1:27 PM Andy Shevchenko
> <andriy.shevchenko@linux.intel.com> wrote:
> >
> > The commit 96d7c7b3e654 ("gpio: gpio-pca953x, Add get_multiple function")
> > basically did everything wrong from style and code reuse perspective, i.e.
> Hi Andy,
> 
> Well your version is certainly elegant and simple, and does better
> with code reuse. However there are a couple of other goals I had in
> mind.
> First, the "lazy" approach of 96d7c7b3e654 is actually faster when
> user space sets up a 8-bit linehandle[1]146us (single regmap_read())
> vs 172us (pca953x_read_regs()) which incidentally is what we do in our
> application. In lazily reading 1 byte at a time it is the fastest
> access for that, if user space is always setting up the linehandle for
> the whole chip pca953x_read_regs() would be faster. Seeing as
> get_multiple has been unimplemented for this chip until now perhaps
> our use case deserves some consideration?

I understand completely your goal, but
- for I²C expanders timings is the last thing to look for (they are quite slow
  by nature), so, I really don't care about 16% speed up for one call; don't
  forget that we are in multi-task OS, where this can be easily interrupted and
  user will see the result quite after expected quick result
- the code maintenance has a priority over micro-optimization (this driver
  suffered many times of breakages because of some optimizations done)
- it breaks Uwe's approach to fix AI chips, after my patch Uwe's ones are
  applied cleanly

>That being said, the
> pca953x_read_regs() is still far better than calling regmap_read() 8
> times.

Yes, it's a best compromise I can come with.

> Second, your version does not work with a 5.2 kernel, bitmap_replace
> is not there yet and pca953x_read_regs() signature is different. So
> perhaps we'll all move on and no one will care about 5.2, but as
> that's what we are using that was the basis for the patch.

That's not an argument at all. This is feature request, not the fix.
Uwe's ones, though, are fixes, and thus much more important.

> Have you
> tested this with actual hardware?

Do you see any issues with it?

For the record, yes, on Intel Edison/Arduino where 4 PCA9555 are present.

% gpiodetect
gpiochip0 [0000:00:0c.0] (192 lines)
gpiochip1 [i2c-INT3491:00] (16 lines)
gpiochip2 [spi-PRP0001:00] (4 lines)
gpiochip3 [i2c-INT3491:01] (16 lines)
gpiochip4 [i2c-INT3491:02] (16 lines)
gpiochip5 [i2c-INT3491:03] (16 lines)

### Consider gpiochip1: 8 and 10 are PUed

% gpioget gpiochip1 0 1 2 3 6 7 8 9 10 11 14 15
[  201.416858] pca953x i2c-INT3491:00: pca953x_gpio_get_multiple <<< 00,0000b58f
1 1 1 1 0 1 1 0 1 0 0 1

### only 8

% gpioget gpiochip1 0 1 2 3 6 7 8 9 10 11 14 15
[  354.153921] pca953x i2c-INT3491:00: pca953x_gpio_get_multiple <<< 00,0000b18f
1 1 1 1 0 1 1 0 0 0 0 1

### only 10

% gpioget gpiochip1 0 1 2 3 6 7 8 9 10 11 14 15
[  362.723867] pca953x i2c-INT3491:00: pca953x_gpio_get_multiple <<< 00,0000b48f
1 1 1 1 0 1 0 0 1 0 0 1

### None of them

% gpioget gpiochip1 0 1 2 3 6 7 8 9 10 11 14 15
[  371.294910] pca953x i2c-INT3491:00: pca953x_gpio_get_multiple <<< 00,0000b08f
1 1 1 1 0 1 0 0 0 0 0 1

Yes, I specifically added a debug print since GPIO trace does not distinguish
between plain get and complex one.

  dev_info(&chip->client->dev, "%s <<< %*pb\n", __func__, MAX_LINE, reg_val);

> I actually didn't do a proper test
> just the timing of the pca953x_read_regs().
> 
> > - it didn't utilize existing PCA953x internal helpers
> > - it didn't utilize bitmap API
> > - it misses the point that ilog2(), besides that BANK_SFT is useless,
> >   can be used in macros
> Yes, I know ilog2() can be used in macros, I didn't think it was worth
> including the .h file just to calculate 3. Putting the ilog2(x) in the
> comments seemed to be common in other kernel sections, but maybe that
> was historic before the macro version? Either way is fine. The shift
> is not useless, without that you would go into the if statement for
> every bit, but you only want to do a regmap_read() for every byte.
> 
> > - it has indentation issues.
> It passed checkpatch.pl, and any indentation fixes are fine with me.

> [1] Tested using 16-bit max7312
Andy Shevchenko April 21, 2020, 1:03 p.m. UTC | #3
On Mon, Apr 20, 2020 at 08:27:50PM +0300, Andy Shevchenko wrote:
> The commit 96d7c7b3e654 ("gpio: gpio-pca953x, Add get_multiple function")
> basically did everything wrong from style and code reuse perspective, i.e.
> - it didn't utilize existing PCA953x internal helpers
> - it didn't utilize bitmap API
> - it misses the point that ilog2(), besides that BANK_SFT is useless,
>   can be used in macros
> - it has indentation issues.
> 
> Rewrite the function completely.

Bart, Linus, please, consider this series to be applied, because it has Uwe's fixes.
We may still discuss the approach with ->get_multiple(), though.

For the record, should some of us volunteer to be a reviewer for this driver.
It's awful that almost every release we get something either ugly or broken in it.
Uwe, would you like to be a designated reviewer (I would also support)?

> Cc: Paul Thomas <pthomas8589@gmail.com>
> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> ---
>  drivers/gpio/gpio-pca953x.c | 41 ++++++++++---------------------------
>  1 file changed, 11 insertions(+), 30 deletions(-)
> 
> diff --git a/drivers/gpio/gpio-pca953x.c b/drivers/gpio/gpio-pca953x.c
> index 60ae18e4b5f5a..41be681ae77c2 100644
> --- a/drivers/gpio/gpio-pca953x.c
> +++ b/drivers/gpio/gpio-pca953x.c
> @@ -115,7 +115,6 @@ MODULE_DEVICE_TABLE(acpi, pca953x_acpi_ids);
>  
>  #define MAX_BANK 5
>  #define BANK_SZ 8
> -#define BANK_SFT 3 /* ilog2(BANK_SZ) */
>  #define MAX_LINE	(MAX_BANK * BANK_SZ)
>  
>  #define NBANK(chip) DIV_ROUND_UP(chip->gpio_chip.ngpio, BANK_SZ)
> @@ -469,38 +468,20 @@ static int pca953x_gpio_get_direction(struct gpio_chip *gc, unsigned off)
>  }
>  
>  static int pca953x_gpio_get_multiple(struct gpio_chip *gc,
> -				      unsigned long *mask, unsigned long *bits)
> +				     unsigned long *mask, unsigned long *bits)
>  {
>  	struct pca953x_chip *chip = gpiochip_get_data(gc);
> -	unsigned int reg_val;
> -	int offset, value, i, ret = 0;
> -	u8 inreg;
> +	DECLARE_BITMAP(reg_val, MAX_LINE);
> +	int ret;
>  
> -	/* Force offset outside the range of i so that
> -	 * at least the first relevant register is read
> -	 */
> -	offset = gc->ngpio;
> -	for_each_set_bit(i, mask, gc->ngpio) {
> -		/* whenever i goes into a new bank update inreg
> -		 * and read the register
> -		 */
> -		if ((offset >> BANK_SFT) != (i >> BANK_SFT)) {
> -			offset = i;
> -			inreg = pca953x_recalc_addr(chip, chip->regs->input,
> -						    offset, true, false);
> -			mutex_lock(&chip->i2c_lock);
> -			ret = regmap_read(chip->regmap, inreg, &reg_val);
> -			mutex_unlock(&chip->i2c_lock);
> -			if (ret < 0)
> -				return ret;
> -		}
> -		/* reg_val is relative to the last read byte,
> -		 * so only shift the relative bits
> -		 */
> -		value = (reg_val >> (i % 8)) & 0x01;
> -		__assign_bit(i, bits, value);
> -	}
> -	return ret;
> +	mutex_lock(&chip->i2c_lock);
> +	ret = pca953x_read_regs(chip, chip->regs->input, reg_val);
> +	mutex_unlock(&chip->i2c_lock);
> +	if (ret)
> +		return ret;
> +
> +	bitmap_replace(bits, bits, reg_val, mask, gc->ngpio);
> +	return 0;
>  }
>  
>  static void pca953x_gpio_set_multiple(struct gpio_chip *gc,
> -- 
> 2.26.1
>
Uwe Kleine-König April 21, 2020, 2:06 p.m. UTC | #4
Hello Andy,

first of all thanks for picking up my patches, very appreciated.

On Tue, Apr 21, 2020 at 03:55:53PM +0300, Andy Shevchenko wrote:
> On Mon, Apr 20, 2020 at 11:23:57PM -0400, Paul Thomas wrote:
> > On Mon, Apr 20, 2020 at 1:27 PM Andy Shevchenko
> > <andriy.shevchenko@linux.intel.com> wrote:
> > >
> > > The commit 96d7c7b3e654 ("gpio: gpio-pca953x, Add get_multiple function")
> > > basically did everything wrong from style and code reuse perspective, i.e.
> > Hi Andy,
> > 
> > Well your version is certainly elegant and simple, and does better
> > with code reuse. However there are a couple of other goals I had in
> > mind.
> > First, the "lazy" approach of 96d7c7b3e654 is actually faster when
> > user space sets up a 8-bit linehandle[1]146us (single regmap_read())
> > vs 172us (pca953x_read_regs()) which incidentally is what we do in our
> > application. In lazily reading 1 byte at a time it is the fastest
> > access for that, if user space is always setting up the linehandle for
> > the whole chip pca953x_read_regs() would be faster. Seeing as
> > get_multiple has been unimplemented for this chip until now perhaps
> > our use case deserves some consideration?
> 
> I understand completely your goal, but
> - for I²C expanders timings is the last thing to look for (they are quite slow
>   by nature), so, I really don't care about 16% speed up for one call; don't
>   forget that we are in multi-task OS, where this can be easily interrupted and
>   user will see the result quite after expected quick result

I didn't do any timing analysis and while I understand your
argumentation, I'm not sure to agree. I noticed while debugging the
problem that then resulted in my fix that gpioctl uses the .set_multiple
callback. I told my customer to use gpioctl instead of /sys/class/gpio
because it performs better just to notice that when using gpioctl to set
a single GPIO this transfers five bytes instead of only two.

Having said that I think the sane approach is to optimize
.{g,s}et_multiple to reduce the read/write size to the smallest bulk
size possible that covers all bits that have their corresponding bit set
in mask.

> - the code maintenance has a priority over micro-optimization (this driver
>   suffered many times of breakages because of some optimizations done)

ack here. Some parts of the driver were harder to grasp than necessary.

> - it breaks Uwe's approach to fix AI chips, after my patch Uwe's ones are
>   applied cleanly

I didn't check, is 96d7c7b3e654 broken for some chips?

I will add my suggested optimisation to my todo list for evaluation. If
I think it is still nice and maintainable I'll send a patch. Until I
have looked into this (or someone else did) I'm in favour of applying
Andy's patch.

Best regards
Uwe
Andy Shevchenko April 21, 2020, 2:21 p.m. UTC | #5
On Tue, Apr 21, 2020 at 04:06:24PM +0200, Uwe Kleine-König wrote:
> On Tue, Apr 21, 2020 at 03:55:53PM +0300, Andy Shevchenko wrote:
> > On Mon, Apr 20, 2020 at 11:23:57PM -0400, Paul Thomas wrote:
> > > On Mon, Apr 20, 2020 at 1:27 PM Andy Shevchenko
> > > <andriy.shevchenko@linux.intel.com> wrote:
> > > >
> > > > The commit 96d7c7b3e654 ("gpio: gpio-pca953x, Add get_multiple function")
> > > > basically did everything wrong from style and code reuse perspective, i.e.
> > > Hi Andy,
> > > 
> > > Well your version is certainly elegant and simple, and does better
> > > with code reuse. However there are a couple of other goals I had in
> > > mind.
> > > First, the "lazy" approach of 96d7c7b3e654 is actually faster when
> > > user space sets up a 8-bit linehandle[1]146us (single regmap_read())
> > > vs 172us (pca953x_read_regs()) which incidentally is what we do in our
> > > application. In lazily reading 1 byte at a time it is the fastest
> > > access for that, if user space is always setting up the linehandle for
> > > the whole chip pca953x_read_regs() would be faster. Seeing as
> > > get_multiple has been unimplemented for this chip until now perhaps
> > > our use case deserves some consideration?
> > 
> > I understand completely your goal, but
> > - for I²C expanders timings is the last thing to look for (they are quite slow
> >   by nature), so, I really don't care about 16% speed up for one call; don't
> >   forget that we are in multi-task OS, where this can be easily interrupted and
> >   user will see the result quite after expected quick result
> 
> I didn't do any timing analysis and while I understand your
> argumentation, I'm not sure to agree. I noticed while debugging the
> problem that then resulted in my fix that gpioctl uses the .set_multiple
> callback. I told my customer to use gpioctl instead of /sys/class/gpio
> because it performs better just to notice that when using gpioctl to set
> a single GPIO this transfers five bytes instead of only two.
> 
> Having said that I think the sane approach is to optimize
> .{g,s}et_multiple to reduce the read/write size to the smallest bulk
> size possible that covers all bits that have their corresponding bit set
> in mask.
> 
> > - the code maintenance has a priority over micro-optimization (this driver
> >   suffered many times of breakages because of some optimizations done)
> 
> ack here. Some parts of the driver were harder to grasp than necessary.
> 
> > - it breaks Uwe's approach to fix AI chips, after my patch Uwe's ones are
> >   applied cleanly
> 
> I didn't check, is 96d7c7b3e654 broken for some chips?

I was referring to another call to recalc address with additional parameters,
which your second patch actually gets rid of.

> I will add my suggested optimisation to my todo list for evaluation. If
> I think it is still nice and maintainable I'll send a patch. Until I
> have looked into this (or someone else did) I'm in favour of applying
> Andy's patch.
Uwe Kleine-König April 21, 2020, 2:31 p.m. UTC | #6
Hello,

On Tue, Apr 21, 2020 at 04:03:00PM +0300, Andy Shevchenko wrote:
> On Mon, Apr 20, 2020 at 08:27:50PM +0300, Andy Shevchenko wrote:
> > The commit 96d7c7b3e654 ("gpio: gpio-pca953x, Add get_multiple function")
> > basically did everything wrong from style and code reuse perspective, i.e.
> > - it didn't utilize existing PCA953x internal helpers
> > - it didn't utilize bitmap API
> > - it misses the point that ilog2(), besides that BANK_SFT is useless,
> >   can be used in macros
> > - it has indentation issues.
> > 
> > Rewrite the function completely.
> 
> Bart, Linus, please, consider this series to be applied, because it has Uwe's fixes.
> We may still discuss the approach with ->get_multiple(), though.
> 
> For the record, should some of us volunteer to be a reviewer for this driver.
> It's awful that almost every release we get something either ugly or broken in it.
> Uwe, would you like to be a designated reviewer (I would also support)?

Yeah that would be fine. I'm not sure how long I will have access to the
hardware I fixed the problem on, but I can see what to do when it
happens.

Best regards
Uwe
Bartosz Golaszewski April 21, 2020, 3:42 p.m. UTC | #7
wt., 21 kwi 2020 o 15:03 Andy Shevchenko
<andriy.shevchenko@linux.intel.com> napisał(a):
>
> On Mon, Apr 20, 2020 at 08:27:50PM +0300, Andy Shevchenko wrote:
> > The commit 96d7c7b3e654 ("gpio: gpio-pca953x, Add get_multiple function")
> > basically did everything wrong from style and code reuse perspective, i.e.
> > - it didn't utilize existing PCA953x internal helpers
> > - it didn't utilize bitmap API
> > - it misses the point that ilog2(), besides that BANK_SFT is useless,
> >   can be used in macros
> > - it has indentation issues.
> >
> > Rewrite the function completely.
>
> Bart, Linus, please, consider this series to be applied, because it has Uwe's fixes.
> We may still discuss the approach with ->get_multiple(), though.
>

Personally I like these patches - if Linus doesn't object in the
coming days I'll pick them up into my tree as a follow-up to Paul's
patch.

Bart

> For the record, should some of us volunteer to be a reviewer for this driver.
> It's awful that almost every release we get something either ugly or broken in it.
> Uwe, would you like to be a designated reviewer (I would also support)?
Paul Thomas April 22, 2020, 4:33 a.m. UTC | #8
Hi Everyone,

On Tue, Apr 21, 2020 at 10:21 AM Andy Shevchenko
<andriy.shevchenko@linux.intel.com> wrote:
>
> On Tue, Apr 21, 2020 at 04:06:24PM +0200, Uwe Kleine-König wrote:
> > On Tue, Apr 21, 2020 at 03:55:53PM +0300, Andy Shevchenko wrote:
> > > On Mon, Apr 20, 2020 at 11:23:57PM -0400, Paul Thomas wrote:
> > > > On Mon, Apr 20, 2020 at 1:27 PM Andy Shevchenko
> > > > <andriy.shevchenko@linux.intel.com> wrote:
> > > > >
> > > > > The commit 96d7c7b3e654 ("gpio: gpio-pca953x, Add get_multiple function")
> > > > > basically did everything wrong from style and code reuse perspective, i.e.
> > > > Hi Andy,
> > > >
> > > > Well your version is certainly elegant and simple, and does better
> > > > with code reuse. However there are a couple of other goals I had in
> > > > mind.
> > > > First, the "lazy" approach of 96d7c7b3e654 is actually faster when
> > > > user space sets up a 8-bit linehandle[1]146us (single regmap_read())
> > > > vs 172us (pca953x_read_regs()) which incidentally is what we do in our
> > > > application. In lazily reading 1 byte at a time it is the fastest
> > > > access for that, if user space is always setting up the linehandle for
> > > > the whole chip pca953x_read_regs() would be faster. Seeing as
> > > > get_multiple has been unimplemented for this chip until now perhaps
> > > > our use case deserves some consideration?
> > >
> > > I understand completely your goal, but
> > > - for I²C expanders timings is the last thing to look for (they are quite slow
> > >   by nature), so, I really don't care about 16% speed up for one call; don't
> > >   forget that we are in multi-task OS, where this can be easily interrupted and
> > >   user will see the result quite after expected quick result
Sure, it's not a HUGE deal, but this will get worse for 5 bank chips.
Also, 26us is not insignificant, with the preempt-rt kernel we're
using that can be more than the max interrupt latency.

> >
> > I didn't do any timing analysis and while I understand your
> > argumentation, I'm not sure to agree. I noticed while debugging the
> > problem that then resulted in my fix that gpioctl uses the .set_multiple
> > callback. I told my customer to use gpioctl instead of /sys/class/gpio
> > because it performs better just to notice that when using gpioctl to set
> > a single GPIO this transfers five bytes instead of only two.
> >
> > Having said that I think the sane approach is to optimize
> > .{g,s}et_multiple to reduce the read/write size to the smallest bulk
> > size possible that covers all bits that have their corresponding bit set
> > in mask.
> >
> > > - the code maintenance has a priority over micro-optimization (this driver
> > >   suffered many times of breakages because of some optimizations done)
Yeah, I appreciate that maintainability needs to be a big priority,
I'm wondering if comments & syntax could be improved so that the
general idea of 96d7c7b3e654 is clear and maintainable. It is just
walking through mask, and whenever it gets to a new byte it reads it
from the hardware.

> >
> > ack here. Some parts of the driver were harder to grasp than necessary.
> >
> > > - it breaks Uwe's approach to fix AI chips, after my patch Uwe's ones are
> > >   applied cleanly
> >
> > I didn't check, is 96d7c7b3e654 broken for some chips?
>
> I was referring to another call to recalc address with additional parameters,
> which your second patch actually gets rid of.

If it's just the call to pca953x_recalc_addr() that caused the
conflict with Uwe's work with 96d7c7b3e654, we can just remove the
last two arguments so it matches what pca953x_gpio_get_value() is
doing.

thanks,
Paul
Andy Shevchenko April 22, 2020, 8:34 a.m. UTC | #9
On Wed, Apr 22, 2020 at 12:33:34AM -0400, Paul Thomas wrote:
> On Tue, Apr 21, 2020 at 10:21 AM Andy Shevchenko
> <andriy.shevchenko@linux.intel.com> wrote:
> > On Tue, Apr 21, 2020 at 04:06:24PM +0200, Uwe Kleine-König wrote:
> > > On Tue, Apr 21, 2020 at 03:55:53PM +0300, Andy Shevchenko wrote:
> > > > On Mon, Apr 20, 2020 at 11:23:57PM -0400, Paul Thomas wrote:
> > > > > On Mon, Apr 20, 2020 at 1:27 PM Andy Shevchenko
> > > > > <andriy.shevchenko@linux.intel.com> wrote:

...

> > > > I understand completely your goal, but
> > > > - for I²C expanders timings is the last thing to look for (they are quite slow
> > > >   by nature), so, I really don't care about 16% speed up for one call; don't
> > > >   forget that we are in multi-task OS, where this can be easily interrupted and
> > > >   user will see the result quite after expected quick result
> Sure, it's not a HUGE deal, but this will get worse for 5 bank chips.
> Also, 26us is not insignificant, with the preempt-rt kernel we're
> using that can be more than the max interrupt latency.

Using slow buses, no, not just using, but relying on them, in RT environment
seems pretty much bad hardware design. If you have time critical solution,
it should try its best by excluding points of possible failures.

> > > I didn't do any timing analysis and while I understand your
> > > argumentation, I'm not sure to agree. I noticed while debugging the
> > > problem that then resulted in my fix that gpioctl uses the .set_multiple
> > > callback. I told my customer to use gpioctl instead of /sys/class/gpio
> > > because it performs better just to notice that when using gpioctl to set
> > > a single GPIO this transfers five bytes instead of only two.
> > >
> > > Having said that I think the sane approach is to optimize
> > > .{g,s}et_multiple to reduce the read/write size to the smallest bulk
> > > size possible that covers all bits that have their corresponding bit set
> > > in mask.
> > >
> > > > - the code maintenance has a priority over micro-optimization (this driver
> > > >   suffered many times of breakages because of some optimizations done)
> Yeah, I appreciate that maintainability needs to be a big priority,
> I'm wondering if comments & syntax could be improved so that the
> general idea of 96d7c7b3e654 is clear and maintainable. It is just
> walking through mask, and whenever it gets to a new byte it reads it
> from the hardware.

I think the idea per se is worth to consider, though it should not be limited
to the ->get_multiple(), for example reading IRQ status can advantage of this
as well. So, it should be rather some common helper which takes mask as input
parameter and converts it to the set of registers to read. regmap API, IIRC,
has support of sparse reading. I dunno, though, if it is supported by regmap
I²C case.

> > > ack here. Some parts of the driver were harder to grasp than necessary.
> > >
> > > > - it breaks Uwe's approach to fix AI chips, after my patch Uwe's ones are
> > > >   applied cleanly
> > >
> > > I didn't check, is 96d7c7b3e654 broken for some chips?
> >
> > I was referring to another call to recalc address with additional parameters,
> > which your second patch actually gets rid of.
> 
> If it's just the call to pca953x_recalc_addr() that caused the
> conflict with Uwe's work with 96d7c7b3e654, we can just remove the
> last two arguments so it matches what pca953x_gpio_get_value() is
> doing.

Let's rethink entire approach. See above.
Linus Walleij April 28, 2020, 12:13 p.m. UTC | #10
On Tue, Apr 21, 2020 at 5:42 PM Bartosz Golaszewski
<bgolaszewski@baylibre.com> wrote:
> wt., 21 kwi 2020 o 15:03 Andy Shevchenko
> <andriy.shevchenko@linux.intel.com> napisał(a):
> >
> > On Mon, Apr 20, 2020 at 08:27:50PM +0300, Andy Shevchenko wrote:
> > > The commit 96d7c7b3e654 ("gpio: gpio-pca953x, Add get_multiple function")
> > > basically did everything wrong from style and code reuse perspective, i.e.
> > > - it didn't utilize existing PCA953x internal helpers
> > > - it didn't utilize bitmap API
> > > - it misses the point that ilog2(), besides that BANK_SFT is useless,
> > >   can be used in macros
> > > - it has indentation issues.
> > >
> > > Rewrite the function completely.
> >
> > Bart, Linus, please, consider this series to be applied, because it has Uwe's fixes.
> > We may still discuss the approach with ->get_multiple(), though.
> >
>
> Personally I like these patches - if Linus doesn't object in the
> coming days I'll pick them up into my tree as a follow-up to Paul's
> patch.

I don't mind.

I would like that Uwe also agrees that we can merge
these three and use as a base?

I don't mind trying to put in code to optimize use cases
when accessing single bytes here either. But I'd like there
to be explicit comments in the code saying why these
optimizations are there. Can we do those on top of
this (hopefully) known working base?

Yours,
Linus Walleij
Uwe Kleine-König April 28, 2020, 12:19 p.m. UTC | #11
Hello Linus,

On Tue, Apr 28, 2020 at 02:13:21PM +0200, Linus Walleij wrote:
> I would like that Uwe also agrees that we can merge
> these three and use as a base?

That's fine for me.

Best regards
Uwe
Andy Shevchenko April 28, 2020, 12:41 p.m. UTC | #12
On Tue, Apr 28, 2020 at 02:13:21PM +0200, Linus Walleij wrote:
> On Tue, Apr 21, 2020 at 5:42 PM Bartosz Golaszewski
> <bgolaszewski@baylibre.com> wrote:
> > wt., 21 kwi 2020 o 15:03 Andy Shevchenko
> > <andriy.shevchenko@linux.intel.com> napisał(a):
> > > On Mon, Apr 20, 2020 at 08:27:50PM +0300, Andy Shevchenko wrote:

...

> I don't mind trying to put in code to optimize use cases
> when accessing single bytes here either. But I'd like there
> to be explicit comments in the code saying why these
> optimizations are there. Can we do those on top of
> this (hopefully) known working base?

As I pointed out to Paul, the optimization like he proposed is not bad thing
per se, the implementation is. On top of that I suggested to take a look to IRQ
status bits, which presumable will leverage from this optimization as well.

So, After applying this series it would be matter of change one line in the
->get_multiple() to replace read_regs() with optimized version or so along with
IRQ bits changes.
Paul Thomas April 28, 2020, 1:09 p.m. UTC | #13
On Tue, Apr 28, 2020 at 8:41 AM Andy Shevchenko
<andriy.shevchenko@linux.intel.com> wrote:
>
> On Tue, Apr 28, 2020 at 02:13:21PM +0200, Linus Walleij wrote:
> > On Tue, Apr 21, 2020 at 5:42 PM Bartosz Golaszewski
> > <bgolaszewski@baylibre.com> wrote:
> > > wt., 21 kwi 2020 o 15:03 Andy Shevchenko
> > > <andriy.shevchenko@linux.intel.com> napisał(a):
> > > > On Mon, Apr 20, 2020 at 08:27:50PM +0300, Andy Shevchenko wrote:
>
> ...
>
> > I don't mind trying to put in code to optimize use cases
> > when accessing single bytes here either. But I'd like there
> > to be explicit comments in the code saying why these
> > optimizations are there. Can we do those on top of
> > this (hopefully) known working base?
For the record my original get_multiple patch was a known working base.

>
> As I pointed out to Paul, the optimization like he proposed is not bad thing
> per se, the implementation is. On top of that I suggested to take a look to IRQ
> status bits, which presumable will leverage from this optimization as well.
>
> So, After applying this series it would be matter of change one line in the
> ->get_multiple() to replace read_regs() with optimized version or so along with
> IRQ bits changes.

This new function would then at least need to be called with mask as
an additional argument right? Then the bitmap_replace() will set
everything regardless of if it was read, this is fine I suppose
because it doesn't matter if it's setting bits outside of mask. You
just have two loops one in the new function and one in
bitmap_replace(). If this is what people would like to see I can work
on it. I did look into the sparse reads and it seems as though regmap
has a gather_write, but not a gather_read and gather_write isn't in
regmap-i2c.c anyway.

-Paul
Andy Shevchenko April 28, 2020, 2:18 p.m. UTC | #14
On Tue, Apr 28, 2020 at 09:09:03AM -0400, Paul Thomas wrote:
> On Tue, Apr 28, 2020 at 8:41 AM Andy Shevchenko
> <andriy.shevchenko@linux.intel.com> wrote:
> >
> > On Tue, Apr 28, 2020 at 02:13:21PM +0200, Linus Walleij wrote:
> > > On Tue, Apr 21, 2020 at 5:42 PM Bartosz Golaszewski
> > > <bgolaszewski@baylibre.com> wrote:
> > > > wt., 21 kwi 2020 o 15:03 Andy Shevchenko
> > > > <andriy.shevchenko@linux.intel.com> napisał(a):
> > > > > On Mon, Apr 20, 2020 at 08:27:50PM +0300, Andy Shevchenko wrote:
> >
> > ...
> >
> > > I don't mind trying to put in code to optimize use cases
> > > when accessing single bytes here either. But I'd like there
> > > to be explicit comments in the code saying why these
> > > optimizations are there. Can we do those on top of
> > > this (hopefully) known working base?
> For the record my original get_multiple patch was a known working base.
> 
> >
> > As I pointed out to Paul, the optimization like he proposed is not bad thing
> > per se, the implementation is. On top of that I suggested to take a look to IRQ
> > status bits, which presumable will leverage from this optimization as well.
> >
> > So, After applying this series it would be matter of change one line in the
> > ->get_multiple() to replace read_regs() with optimized version or so along with
> > IRQ bits changes.
> 
> This new function would then at least need to be called with mask as
> an additional argument right? Then the bitmap_replace() will set
> everything regardless of if it was read, this is fine I suppose
> because it doesn't matter if it's setting bits outside of mask. You
> just have two loops one in the new function and one in
> bitmap_replace().

Note, on 64-bit architectures there is no loop in bitmap ops (since 40 <= 64 in
this case). On 32-bit it might be (only for 40 case, which I think less present
in the wild than the rest). And bitmap ops are optimized over longs, so, it's
pretty much fast (esp. in comparison to I²C IO).

> If this is what people would like to see I can work
> on it. I did look into the sparse reads and it seems as though regmap
> has a gather_write, but not a gather_read and gather_write isn't in
> regmap-i2c.c anyway.

Yes, I think this is the right way to go.
Paul Thomas April 28, 2020, 2:58 p.m. UTC | #15
On Tue, Apr 28, 2020 at 10:18 AM Andy Shevchenko
<andriy.shevchenko@linux.intel.com> wrote:
>
> On Tue, Apr 28, 2020 at 09:09:03AM -0400, Paul Thomas wrote:
> > On Tue, Apr 28, 2020 at 8:41 AM Andy Shevchenko
> > <andriy.shevchenko@linux.intel.com> wrote:
> > >
> > > On Tue, Apr 28, 2020 at 02:13:21PM +0200, Linus Walleij wrote:
> > > > On Tue, Apr 21, 2020 at 5:42 PM Bartosz Golaszewski
> > > > <bgolaszewski@baylibre.com> wrote:
> > > > > wt., 21 kwi 2020 o 15:03 Andy Shevchenko
> > > > > <andriy.shevchenko@linux.intel.com> napisał(a):
> > > > > > On Mon, Apr 20, 2020 at 08:27:50PM +0300, Andy Shevchenko wrote:
> > >
> > > ...
> > >
> > > > I don't mind trying to put in code to optimize use cases
> > > > when accessing single bytes here either. But I'd like there
> > > > to be explicit comments in the code saying why these
> > > > optimizations are there. Can we do those on top of
> > > > this (hopefully) known working base?
> > For the record my original get_multiple patch was a known working base.
> >
> > >
> > > As I pointed out to Paul, the optimization like he proposed is not bad thing
> > > per se, the implementation is. On top of that I suggested to take a look to IRQ
> > > status bits, which presumable will leverage from this optimization as well.
> > >
> > > So, After applying this series it would be matter of change one line in the
> > > ->get_multiple() to replace read_regs() with optimized version or so along with
> > > IRQ bits changes.
> >
> > This new function would then at least need to be called with mask as
> > an additional argument right? Then the bitmap_replace() will set
> > everything regardless of if it was read, this is fine I suppose
> > because it doesn't matter if it's setting bits outside of mask. You
> > just have two loops one in the new function and one in
> > bitmap_replace().
>
> Note, on 64-bit architectures there is no loop in bitmap ops (since 40 <= 64 in
> this case). On 32-bit it might be (only for 40 case, which I think less present
> in the wild than the rest). And bitmap ops are optimized over longs, so, it's
> pretty much fast (esp. in comparison to I²C IO).
>
> > If this is what people would like to see I can work
> > on it. I did look into the sparse reads and it seems as though regmap
> > has a gather_write, but not a gather_read and gather_write isn't in
> > regmap-i2c.c anyway.
>
> Yes, I think this is the right way to go.
OK, sounds good.

-Paul

>
> --
> With Best Regards,
> Andy Shevchenko
>
>
Bartosz Golaszewski April 29, 2020, 11:21 a.m. UTC | #16
wt., 28 kwi 2020 o 16:58 Paul Thomas <pthomas8589@gmail.com> napisał(a):
>
> On Tue, Apr 28, 2020 at 10:18 AM Andy Shevchenko
> <andriy.shevchenko@linux.intel.com> wrote:
> >
> > On Tue, Apr 28, 2020 at 09:09:03AM -0400, Paul Thomas wrote:
> > > On Tue, Apr 28, 2020 at 8:41 AM Andy Shevchenko
> > > <andriy.shevchenko@linux.intel.com> wrote:
> > > >
> > > > On Tue, Apr 28, 2020 at 02:13:21PM +0200, Linus Walleij wrote:
> > > > > On Tue, Apr 21, 2020 at 5:42 PM Bartosz Golaszewski
> > > > > <bgolaszewski@baylibre.com> wrote:
> > > > > > wt., 21 kwi 2020 o 15:03 Andy Shevchenko
> > > > > > <andriy.shevchenko@linux.intel.com> napisał(a):
> > > > > > > On Mon, Apr 20, 2020 at 08:27:50PM +0300, Andy Shevchenko wrote:
> > > >
> > > > ...
> > > >
> > > > > I don't mind trying to put in code to optimize use cases
> > > > > when accessing single bytes here either. But I'd like there
> > > > > to be explicit comments in the code saying why these
> > > > > optimizations are there. Can we do those on top of
> > > > > this (hopefully) known working base?
> > > For the record my original get_multiple patch was a known working base.
> > >
> > > >
> > > > As I pointed out to Paul, the optimization like he proposed is not bad thing
> > > > per se, the implementation is. On top of that I suggested to take a look to IRQ
> > > > status bits, which presumable will leverage from this optimization as well.
> > > >
> > > > So, After applying this series it would be matter of change one line in the
> > > > ->get_multiple() to replace read_regs() with optimized version or so along with
> > > > IRQ bits changes.
> > >
> > > This new function would then at least need to be called with mask as
> > > an additional argument right? Then the bitmap_replace() will set
> > > everything regardless of if it was read, this is fine I suppose
> > > because it doesn't matter if it's setting bits outside of mask. You
> > > just have two loops one in the new function and one in
> > > bitmap_replace().
> >
> > Note, on 64-bit architectures there is no loop in bitmap ops (since 40 <= 64 in
> > this case). On 32-bit it might be (only for 40 case, which I think less present
> > in the wild than the rest). And bitmap ops are optimized over longs, so, it's
> > pretty much fast (esp. in comparison to I²C IO).
> >
> > > If this is what people would like to see I can work
> > > on it. I did look into the sparse reads and it seems as though regmap
> > > has a gather_write, but not a gather_read and gather_write isn't in
> > > regmap-i2c.c anyway.
> >
> > Yes, I think this is the right way to go.
> OK, sounds good.
>

I applied the whole series to my for-next branch. I'll soon send a PR to Linus.

Bart
diff mbox series

Patch

diff --git a/drivers/gpio/gpio-pca953x.c b/drivers/gpio/gpio-pca953x.c
index 60ae18e4b5f5a..41be681ae77c2 100644
--- a/drivers/gpio/gpio-pca953x.c
+++ b/drivers/gpio/gpio-pca953x.c
@@ -115,7 +115,6 @@  MODULE_DEVICE_TABLE(acpi, pca953x_acpi_ids);
 
 #define MAX_BANK 5
 #define BANK_SZ 8
-#define BANK_SFT 3 /* ilog2(BANK_SZ) */
 #define MAX_LINE	(MAX_BANK * BANK_SZ)
 
 #define NBANK(chip) DIV_ROUND_UP(chip->gpio_chip.ngpio, BANK_SZ)
@@ -469,38 +468,20 @@  static int pca953x_gpio_get_direction(struct gpio_chip *gc, unsigned off)
 }
 
 static int pca953x_gpio_get_multiple(struct gpio_chip *gc,
-				      unsigned long *mask, unsigned long *bits)
+				     unsigned long *mask, unsigned long *bits)
 {
 	struct pca953x_chip *chip = gpiochip_get_data(gc);
-	unsigned int reg_val;
-	int offset, value, i, ret = 0;
-	u8 inreg;
+	DECLARE_BITMAP(reg_val, MAX_LINE);
+	int ret;
 
-	/* Force offset outside the range of i so that
-	 * at least the first relevant register is read
-	 */
-	offset = gc->ngpio;
-	for_each_set_bit(i, mask, gc->ngpio) {
-		/* whenever i goes into a new bank update inreg
-		 * and read the register
-		 */
-		if ((offset >> BANK_SFT) != (i >> BANK_SFT)) {
-			offset = i;
-			inreg = pca953x_recalc_addr(chip, chip->regs->input,
-						    offset, true, false);
-			mutex_lock(&chip->i2c_lock);
-			ret = regmap_read(chip->regmap, inreg, &reg_val);
-			mutex_unlock(&chip->i2c_lock);
-			if (ret < 0)
-				return ret;
-		}
-		/* reg_val is relative to the last read byte,
-		 * so only shift the relative bits
-		 */
-		value = (reg_val >> (i % 8)) & 0x01;
-		__assign_bit(i, bits, value);
-	}
-	return ret;
+	mutex_lock(&chip->i2c_lock);
+	ret = pca953x_read_regs(chip, chip->regs->input, reg_val);
+	mutex_unlock(&chip->i2c_lock);
+	if (ret)
+		return ret;
+
+	bitmap_replace(bits, bits, reg_val, mask, gc->ngpio);
+	return 0;
 }
 
 static void pca953x_gpio_set_multiple(struct gpio_chip *gc,