Message ID | 20200420172752.33588-1-andriy.shevchenko@linux.intel.com |
---|---|
State | New |
Headers | show |
Series | [v3,1/3] gpio: pca953x: Rewrite ->get_multiple() function | expand |
On Mon, Apr 20, 2020 at 1:27 PM Andy Shevchenko <andriy.shevchenko@linux.intel.com> wrote: > > The commit 96d7c7b3e654 ("gpio: gpio-pca953x, Add get_multiple function") > basically did everything wrong from style and code reuse perspective, i.e. Hi Andy, Well your version is certainly elegant and simple, and does better with code reuse. However there are a couple of other goals I had in mind. First, the "lazy" approach of 96d7c7b3e654 is actually faster when user space sets up a 8-bit linehandle[1]146us (single regmap_read()) vs 172us (pca953x_read_regs()) which incidentally is what we do in our application. In lazily reading 1 byte at a time it is the fastest access for that, if user space is always setting up the linehandle for the whole chip pca953x_read_regs() would be faster. Seeing as get_multiple has been unimplemented for this chip until now perhaps our use case deserves some consideration? That being said, the pca953x_read_regs() is still far better than calling regmap_read() 8 times. Second, your version does not work with a 5.2 kernel, bitmap_replace is not there yet and pca953x_read_regs() signature is different. So perhaps we'll all move on and no one will care about 5.2, but as that's what we are using that was the basis for the patch. Have you tested this with actual hardware? I actually didn't do a proper test just the timing of the pca953x_read_regs(). > - it didn't utilize existing PCA953x internal helpers > - it didn't utilize bitmap API > - it misses the point that ilog2(), besides that BANK_SFT is useless, > can be used in macros Yes, I know ilog2() can be used in macros, I didn't think it was worth including the .h file just to calculate 3. Putting the ilog2(x) in the comments seemed to be common in other kernel sections, but maybe that was historic before the macro version? Either way is fine. The shift is not useless, without that you would go into the if statement for every bit, but you only want to do a regmap_read() for every byte. > - it has indentation issues. It passed checkpatch.pl, and any indentation fixes are fine with me. thanks, Paul [1] Tested using 16-bit max7312
On Mon, Apr 20, 2020 at 11:23:57PM -0400, Paul Thomas wrote: > On Mon, Apr 20, 2020 at 1:27 PM Andy Shevchenko > <andriy.shevchenko@linux.intel.com> wrote: > > > > The commit 96d7c7b3e654 ("gpio: gpio-pca953x, Add get_multiple function") > > basically did everything wrong from style and code reuse perspective, i.e. > Hi Andy, > > Well your version is certainly elegant and simple, and does better > with code reuse. However there are a couple of other goals I had in > mind. > First, the "lazy" approach of 96d7c7b3e654 is actually faster when > user space sets up a 8-bit linehandle[1]146us (single regmap_read()) > vs 172us (pca953x_read_regs()) which incidentally is what we do in our > application. In lazily reading 1 byte at a time it is the fastest > access for that, if user space is always setting up the linehandle for > the whole chip pca953x_read_regs() would be faster. Seeing as > get_multiple has been unimplemented for this chip until now perhaps > our use case deserves some consideration? I understand completely your goal, but - for I²C expanders timings is the last thing to look for (they are quite slow by nature), so, I really don't care about 16% speed up for one call; don't forget that we are in multi-task OS, where this can be easily interrupted and user will see the result quite after expected quick result - the code maintenance has a priority over micro-optimization (this driver suffered many times of breakages because of some optimizations done) - it breaks Uwe's approach to fix AI chips, after my patch Uwe's ones are applied cleanly >That being said, the > pca953x_read_regs() is still far better than calling regmap_read() 8 > times. Yes, it's a best compromise I can come with. > Second, your version does not work with a 5.2 kernel, bitmap_replace > is not there yet and pca953x_read_regs() signature is different. So > perhaps we'll all move on and no one will care about 5.2, but as > that's what we are using that was the basis for the patch. That's not an argument at all. This is feature request, not the fix. Uwe's ones, though, are fixes, and thus much more important. > Have you > tested this with actual hardware? Do you see any issues with it? For the record, yes, on Intel Edison/Arduino where 4 PCA9555 are present. % gpiodetect gpiochip0 [0000:00:0c.0] (192 lines) gpiochip1 [i2c-INT3491:00] (16 lines) gpiochip2 [spi-PRP0001:00] (4 lines) gpiochip3 [i2c-INT3491:01] (16 lines) gpiochip4 [i2c-INT3491:02] (16 lines) gpiochip5 [i2c-INT3491:03] (16 lines) ### Consider gpiochip1: 8 and 10 are PUed % gpioget gpiochip1 0 1 2 3 6 7 8 9 10 11 14 15 [ 201.416858] pca953x i2c-INT3491:00: pca953x_gpio_get_multiple <<< 00,0000b58f 1 1 1 1 0 1 1 0 1 0 0 1 ### only 8 % gpioget gpiochip1 0 1 2 3 6 7 8 9 10 11 14 15 [ 354.153921] pca953x i2c-INT3491:00: pca953x_gpio_get_multiple <<< 00,0000b18f 1 1 1 1 0 1 1 0 0 0 0 1 ### only 10 % gpioget gpiochip1 0 1 2 3 6 7 8 9 10 11 14 15 [ 362.723867] pca953x i2c-INT3491:00: pca953x_gpio_get_multiple <<< 00,0000b48f 1 1 1 1 0 1 0 0 1 0 0 1 ### None of them % gpioget gpiochip1 0 1 2 3 6 7 8 9 10 11 14 15 [ 371.294910] pca953x i2c-INT3491:00: pca953x_gpio_get_multiple <<< 00,0000b08f 1 1 1 1 0 1 0 0 0 0 0 1 Yes, I specifically added a debug print since GPIO trace does not distinguish between plain get and complex one. dev_info(&chip->client->dev, "%s <<< %*pb\n", __func__, MAX_LINE, reg_val); > I actually didn't do a proper test > just the timing of the pca953x_read_regs(). > > > - it didn't utilize existing PCA953x internal helpers > > - it didn't utilize bitmap API > > - it misses the point that ilog2(), besides that BANK_SFT is useless, > > can be used in macros > Yes, I know ilog2() can be used in macros, I didn't think it was worth > including the .h file just to calculate 3. Putting the ilog2(x) in the > comments seemed to be common in other kernel sections, but maybe that > was historic before the macro version? Either way is fine. The shift > is not useless, without that you would go into the if statement for > every bit, but you only want to do a regmap_read() for every byte. > > > - it has indentation issues. > It passed checkpatch.pl, and any indentation fixes are fine with me. > [1] Tested using 16-bit max7312
On Mon, Apr 20, 2020 at 08:27:50PM +0300, Andy Shevchenko wrote: > The commit 96d7c7b3e654 ("gpio: gpio-pca953x, Add get_multiple function") > basically did everything wrong from style and code reuse perspective, i.e. > - it didn't utilize existing PCA953x internal helpers > - it didn't utilize bitmap API > - it misses the point that ilog2(), besides that BANK_SFT is useless, > can be used in macros > - it has indentation issues. > > Rewrite the function completely. Bart, Linus, please, consider this series to be applied, because it has Uwe's fixes. We may still discuss the approach with ->get_multiple(), though. For the record, should some of us volunteer to be a reviewer for this driver. It's awful that almost every release we get something either ugly or broken in it. Uwe, would you like to be a designated reviewer (I would also support)? > Cc: Paul Thomas <pthomas8589@gmail.com> > Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> > --- > drivers/gpio/gpio-pca953x.c | 41 ++++++++++--------------------------- > 1 file changed, 11 insertions(+), 30 deletions(-) > > diff --git a/drivers/gpio/gpio-pca953x.c b/drivers/gpio/gpio-pca953x.c > index 60ae18e4b5f5a..41be681ae77c2 100644 > --- a/drivers/gpio/gpio-pca953x.c > +++ b/drivers/gpio/gpio-pca953x.c > @@ -115,7 +115,6 @@ MODULE_DEVICE_TABLE(acpi, pca953x_acpi_ids); > > #define MAX_BANK 5 > #define BANK_SZ 8 > -#define BANK_SFT 3 /* ilog2(BANK_SZ) */ > #define MAX_LINE (MAX_BANK * BANK_SZ) > > #define NBANK(chip) DIV_ROUND_UP(chip->gpio_chip.ngpio, BANK_SZ) > @@ -469,38 +468,20 @@ static int pca953x_gpio_get_direction(struct gpio_chip *gc, unsigned off) > } > > static int pca953x_gpio_get_multiple(struct gpio_chip *gc, > - unsigned long *mask, unsigned long *bits) > + unsigned long *mask, unsigned long *bits) > { > struct pca953x_chip *chip = gpiochip_get_data(gc); > - unsigned int reg_val; > - int offset, value, i, ret = 0; > - u8 inreg; > + DECLARE_BITMAP(reg_val, MAX_LINE); > + int ret; > > - /* Force offset outside the range of i so that > - * at least the first relevant register is read > - */ > - offset = gc->ngpio; > - for_each_set_bit(i, mask, gc->ngpio) { > - /* whenever i goes into a new bank update inreg > - * and read the register > - */ > - if ((offset >> BANK_SFT) != (i >> BANK_SFT)) { > - offset = i; > - inreg = pca953x_recalc_addr(chip, chip->regs->input, > - offset, true, false); > - mutex_lock(&chip->i2c_lock); > - ret = regmap_read(chip->regmap, inreg, ®_val); > - mutex_unlock(&chip->i2c_lock); > - if (ret < 0) > - return ret; > - } > - /* reg_val is relative to the last read byte, > - * so only shift the relative bits > - */ > - value = (reg_val >> (i % 8)) & 0x01; > - __assign_bit(i, bits, value); > - } > - return ret; > + mutex_lock(&chip->i2c_lock); > + ret = pca953x_read_regs(chip, chip->regs->input, reg_val); > + mutex_unlock(&chip->i2c_lock); > + if (ret) > + return ret; > + > + bitmap_replace(bits, bits, reg_val, mask, gc->ngpio); > + return 0; > } > > static void pca953x_gpio_set_multiple(struct gpio_chip *gc, > -- > 2.26.1 >
Hello Andy, first of all thanks for picking up my patches, very appreciated. On Tue, Apr 21, 2020 at 03:55:53PM +0300, Andy Shevchenko wrote: > On Mon, Apr 20, 2020 at 11:23:57PM -0400, Paul Thomas wrote: > > On Mon, Apr 20, 2020 at 1:27 PM Andy Shevchenko > > <andriy.shevchenko@linux.intel.com> wrote: > > > > > > The commit 96d7c7b3e654 ("gpio: gpio-pca953x, Add get_multiple function") > > > basically did everything wrong from style and code reuse perspective, i.e. > > Hi Andy, > > > > Well your version is certainly elegant and simple, and does better > > with code reuse. However there are a couple of other goals I had in > > mind. > > First, the "lazy" approach of 96d7c7b3e654 is actually faster when > > user space sets up a 8-bit linehandle[1]146us (single regmap_read()) > > vs 172us (pca953x_read_regs()) which incidentally is what we do in our > > application. In lazily reading 1 byte at a time it is the fastest > > access for that, if user space is always setting up the linehandle for > > the whole chip pca953x_read_regs() would be faster. Seeing as > > get_multiple has been unimplemented for this chip until now perhaps > > our use case deserves some consideration? > > I understand completely your goal, but > - for I²C expanders timings is the last thing to look for (they are quite slow > by nature), so, I really don't care about 16% speed up for one call; don't > forget that we are in multi-task OS, where this can be easily interrupted and > user will see the result quite after expected quick result I didn't do any timing analysis and while I understand your argumentation, I'm not sure to agree. I noticed while debugging the problem that then resulted in my fix that gpioctl uses the .set_multiple callback. I told my customer to use gpioctl instead of /sys/class/gpio because it performs better just to notice that when using gpioctl to set a single GPIO this transfers five bytes instead of only two. Having said that I think the sane approach is to optimize .{g,s}et_multiple to reduce the read/write size to the smallest bulk size possible that covers all bits that have their corresponding bit set in mask. > - the code maintenance has a priority over micro-optimization (this driver > suffered many times of breakages because of some optimizations done) ack here. Some parts of the driver were harder to grasp than necessary. > - it breaks Uwe's approach to fix AI chips, after my patch Uwe's ones are > applied cleanly I didn't check, is 96d7c7b3e654 broken for some chips? I will add my suggested optimisation to my todo list for evaluation. If I think it is still nice and maintainable I'll send a patch. Until I have looked into this (or someone else did) I'm in favour of applying Andy's patch. Best regards Uwe
On Tue, Apr 21, 2020 at 04:06:24PM +0200, Uwe Kleine-König wrote: > On Tue, Apr 21, 2020 at 03:55:53PM +0300, Andy Shevchenko wrote: > > On Mon, Apr 20, 2020 at 11:23:57PM -0400, Paul Thomas wrote: > > > On Mon, Apr 20, 2020 at 1:27 PM Andy Shevchenko > > > <andriy.shevchenko@linux.intel.com> wrote: > > > > > > > > The commit 96d7c7b3e654 ("gpio: gpio-pca953x, Add get_multiple function") > > > > basically did everything wrong from style and code reuse perspective, i.e. > > > Hi Andy, > > > > > > Well your version is certainly elegant and simple, and does better > > > with code reuse. However there are a couple of other goals I had in > > > mind. > > > First, the "lazy" approach of 96d7c7b3e654 is actually faster when > > > user space sets up a 8-bit linehandle[1]146us (single regmap_read()) > > > vs 172us (pca953x_read_regs()) which incidentally is what we do in our > > > application. In lazily reading 1 byte at a time it is the fastest > > > access for that, if user space is always setting up the linehandle for > > > the whole chip pca953x_read_regs() would be faster. Seeing as > > > get_multiple has been unimplemented for this chip until now perhaps > > > our use case deserves some consideration? > > > > I understand completely your goal, but > > - for I²C expanders timings is the last thing to look for (they are quite slow > > by nature), so, I really don't care about 16% speed up for one call; don't > > forget that we are in multi-task OS, where this can be easily interrupted and > > user will see the result quite after expected quick result > > I didn't do any timing analysis and while I understand your > argumentation, I'm not sure to agree. I noticed while debugging the > problem that then resulted in my fix that gpioctl uses the .set_multiple > callback. I told my customer to use gpioctl instead of /sys/class/gpio > because it performs better just to notice that when using gpioctl to set > a single GPIO this transfers five bytes instead of only two. > > Having said that I think the sane approach is to optimize > .{g,s}et_multiple to reduce the read/write size to the smallest bulk > size possible that covers all bits that have their corresponding bit set > in mask. > > > - the code maintenance has a priority over micro-optimization (this driver > > suffered many times of breakages because of some optimizations done) > > ack here. Some parts of the driver were harder to grasp than necessary. > > > - it breaks Uwe's approach to fix AI chips, after my patch Uwe's ones are > > applied cleanly > > I didn't check, is 96d7c7b3e654 broken for some chips? I was referring to another call to recalc address with additional parameters, which your second patch actually gets rid of. > I will add my suggested optimisation to my todo list for evaluation. If > I think it is still nice and maintainable I'll send a patch. Until I > have looked into this (or someone else did) I'm in favour of applying > Andy's patch.
Hello, On Tue, Apr 21, 2020 at 04:03:00PM +0300, Andy Shevchenko wrote: > On Mon, Apr 20, 2020 at 08:27:50PM +0300, Andy Shevchenko wrote: > > The commit 96d7c7b3e654 ("gpio: gpio-pca953x, Add get_multiple function") > > basically did everything wrong from style and code reuse perspective, i.e. > > - it didn't utilize existing PCA953x internal helpers > > - it didn't utilize bitmap API > > - it misses the point that ilog2(), besides that BANK_SFT is useless, > > can be used in macros > > - it has indentation issues. > > > > Rewrite the function completely. > > Bart, Linus, please, consider this series to be applied, because it has Uwe's fixes. > We may still discuss the approach with ->get_multiple(), though. > > For the record, should some of us volunteer to be a reviewer for this driver. > It's awful that almost every release we get something either ugly or broken in it. > Uwe, would you like to be a designated reviewer (I would also support)? Yeah that would be fine. I'm not sure how long I will have access to the hardware I fixed the problem on, but I can see what to do when it happens. Best regards Uwe
wt., 21 kwi 2020 o 15:03 Andy Shevchenko <andriy.shevchenko@linux.intel.com> napisał(a): > > On Mon, Apr 20, 2020 at 08:27:50PM +0300, Andy Shevchenko wrote: > > The commit 96d7c7b3e654 ("gpio: gpio-pca953x, Add get_multiple function") > > basically did everything wrong from style and code reuse perspective, i.e. > > - it didn't utilize existing PCA953x internal helpers > > - it didn't utilize bitmap API > > - it misses the point that ilog2(), besides that BANK_SFT is useless, > > can be used in macros > > - it has indentation issues. > > > > Rewrite the function completely. > > Bart, Linus, please, consider this series to be applied, because it has Uwe's fixes. > We may still discuss the approach with ->get_multiple(), though. > Personally I like these patches - if Linus doesn't object in the coming days I'll pick them up into my tree as a follow-up to Paul's patch. Bart > For the record, should some of us volunteer to be a reviewer for this driver. > It's awful that almost every release we get something either ugly or broken in it. > Uwe, would you like to be a designated reviewer (I would also support)?
Hi Everyone, On Tue, Apr 21, 2020 at 10:21 AM Andy Shevchenko <andriy.shevchenko@linux.intel.com> wrote: > > On Tue, Apr 21, 2020 at 04:06:24PM +0200, Uwe Kleine-König wrote: > > On Tue, Apr 21, 2020 at 03:55:53PM +0300, Andy Shevchenko wrote: > > > On Mon, Apr 20, 2020 at 11:23:57PM -0400, Paul Thomas wrote: > > > > On Mon, Apr 20, 2020 at 1:27 PM Andy Shevchenko > > > > <andriy.shevchenko@linux.intel.com> wrote: > > > > > > > > > > The commit 96d7c7b3e654 ("gpio: gpio-pca953x, Add get_multiple function") > > > > > basically did everything wrong from style and code reuse perspective, i.e. > > > > Hi Andy, > > > > > > > > Well your version is certainly elegant and simple, and does better > > > > with code reuse. However there are a couple of other goals I had in > > > > mind. > > > > First, the "lazy" approach of 96d7c7b3e654 is actually faster when > > > > user space sets up a 8-bit linehandle[1]146us (single regmap_read()) > > > > vs 172us (pca953x_read_regs()) which incidentally is what we do in our > > > > application. In lazily reading 1 byte at a time it is the fastest > > > > access for that, if user space is always setting up the linehandle for > > > > the whole chip pca953x_read_regs() would be faster. Seeing as > > > > get_multiple has been unimplemented for this chip until now perhaps > > > > our use case deserves some consideration? > > > > > > I understand completely your goal, but > > > - for I²C expanders timings is the last thing to look for (they are quite slow > > > by nature), so, I really don't care about 16% speed up for one call; don't > > > forget that we are in multi-task OS, where this can be easily interrupted and > > > user will see the result quite after expected quick result Sure, it's not a HUGE deal, but this will get worse for 5 bank chips. Also, 26us is not insignificant, with the preempt-rt kernel we're using that can be more than the max interrupt latency. > > > > I didn't do any timing analysis and while I understand your > > argumentation, I'm not sure to agree. I noticed while debugging the > > problem that then resulted in my fix that gpioctl uses the .set_multiple > > callback. I told my customer to use gpioctl instead of /sys/class/gpio > > because it performs better just to notice that when using gpioctl to set > > a single GPIO this transfers five bytes instead of only two. > > > > Having said that I think the sane approach is to optimize > > .{g,s}et_multiple to reduce the read/write size to the smallest bulk > > size possible that covers all bits that have their corresponding bit set > > in mask. > > > > > - the code maintenance has a priority over micro-optimization (this driver > > > suffered many times of breakages because of some optimizations done) Yeah, I appreciate that maintainability needs to be a big priority, I'm wondering if comments & syntax could be improved so that the general idea of 96d7c7b3e654 is clear and maintainable. It is just walking through mask, and whenever it gets to a new byte it reads it from the hardware. > > > > ack here. Some parts of the driver were harder to grasp than necessary. > > > > > - it breaks Uwe's approach to fix AI chips, after my patch Uwe's ones are > > > applied cleanly > > > > I didn't check, is 96d7c7b3e654 broken for some chips? > > I was referring to another call to recalc address with additional parameters, > which your second patch actually gets rid of. If it's just the call to pca953x_recalc_addr() that caused the conflict with Uwe's work with 96d7c7b3e654, we can just remove the last two arguments so it matches what pca953x_gpio_get_value() is doing. thanks, Paul
On Wed, Apr 22, 2020 at 12:33:34AM -0400, Paul Thomas wrote: > On Tue, Apr 21, 2020 at 10:21 AM Andy Shevchenko > <andriy.shevchenko@linux.intel.com> wrote: > > On Tue, Apr 21, 2020 at 04:06:24PM +0200, Uwe Kleine-König wrote: > > > On Tue, Apr 21, 2020 at 03:55:53PM +0300, Andy Shevchenko wrote: > > > > On Mon, Apr 20, 2020 at 11:23:57PM -0400, Paul Thomas wrote: > > > > > On Mon, Apr 20, 2020 at 1:27 PM Andy Shevchenko > > > > > <andriy.shevchenko@linux.intel.com> wrote: ... > > > > I understand completely your goal, but > > > > - for I²C expanders timings is the last thing to look for (they are quite slow > > > > by nature), so, I really don't care about 16% speed up for one call; don't > > > > forget that we are in multi-task OS, where this can be easily interrupted and > > > > user will see the result quite after expected quick result > Sure, it's not a HUGE deal, but this will get worse for 5 bank chips. > Also, 26us is not insignificant, with the preempt-rt kernel we're > using that can be more than the max interrupt latency. Using slow buses, no, not just using, but relying on them, in RT environment seems pretty much bad hardware design. If you have time critical solution, it should try its best by excluding points of possible failures. > > > I didn't do any timing analysis and while I understand your > > > argumentation, I'm not sure to agree. I noticed while debugging the > > > problem that then resulted in my fix that gpioctl uses the .set_multiple > > > callback. I told my customer to use gpioctl instead of /sys/class/gpio > > > because it performs better just to notice that when using gpioctl to set > > > a single GPIO this transfers five bytes instead of only two. > > > > > > Having said that I think the sane approach is to optimize > > > .{g,s}et_multiple to reduce the read/write size to the smallest bulk > > > size possible that covers all bits that have their corresponding bit set > > > in mask. > > > > > > > - the code maintenance has a priority over micro-optimization (this driver > > > > suffered many times of breakages because of some optimizations done) > Yeah, I appreciate that maintainability needs to be a big priority, > I'm wondering if comments & syntax could be improved so that the > general idea of 96d7c7b3e654 is clear and maintainable. It is just > walking through mask, and whenever it gets to a new byte it reads it > from the hardware. I think the idea per se is worth to consider, though it should not be limited to the ->get_multiple(), for example reading IRQ status can advantage of this as well. So, it should be rather some common helper which takes mask as input parameter and converts it to the set of registers to read. regmap API, IIRC, has support of sparse reading. I dunno, though, if it is supported by regmap I²C case. > > > ack here. Some parts of the driver were harder to grasp than necessary. > > > > > > > - it breaks Uwe's approach to fix AI chips, after my patch Uwe's ones are > > > > applied cleanly > > > > > > I didn't check, is 96d7c7b3e654 broken for some chips? > > > > I was referring to another call to recalc address with additional parameters, > > which your second patch actually gets rid of. > > If it's just the call to pca953x_recalc_addr() that caused the > conflict with Uwe's work with 96d7c7b3e654, we can just remove the > last two arguments so it matches what pca953x_gpio_get_value() is > doing. Let's rethink entire approach. See above.
On Tue, Apr 21, 2020 at 5:42 PM Bartosz Golaszewski <bgolaszewski@baylibre.com> wrote: > wt., 21 kwi 2020 o 15:03 Andy Shevchenko > <andriy.shevchenko@linux.intel.com> napisał(a): > > > > On Mon, Apr 20, 2020 at 08:27:50PM +0300, Andy Shevchenko wrote: > > > The commit 96d7c7b3e654 ("gpio: gpio-pca953x, Add get_multiple function") > > > basically did everything wrong from style and code reuse perspective, i.e. > > > - it didn't utilize existing PCA953x internal helpers > > > - it didn't utilize bitmap API > > > - it misses the point that ilog2(), besides that BANK_SFT is useless, > > > can be used in macros > > > - it has indentation issues. > > > > > > Rewrite the function completely. > > > > Bart, Linus, please, consider this series to be applied, because it has Uwe's fixes. > > We may still discuss the approach with ->get_multiple(), though. > > > > Personally I like these patches - if Linus doesn't object in the > coming days I'll pick them up into my tree as a follow-up to Paul's > patch. I don't mind. I would like that Uwe also agrees that we can merge these three and use as a base? I don't mind trying to put in code to optimize use cases when accessing single bytes here either. But I'd like there to be explicit comments in the code saying why these optimizations are there. Can we do those on top of this (hopefully) known working base? Yours, Linus Walleij
Hello Linus, On Tue, Apr 28, 2020 at 02:13:21PM +0200, Linus Walleij wrote: > I would like that Uwe also agrees that we can merge > these three and use as a base? That's fine for me. Best regards Uwe
On Tue, Apr 28, 2020 at 02:13:21PM +0200, Linus Walleij wrote: > On Tue, Apr 21, 2020 at 5:42 PM Bartosz Golaszewski > <bgolaszewski@baylibre.com> wrote: > > wt., 21 kwi 2020 o 15:03 Andy Shevchenko > > <andriy.shevchenko@linux.intel.com> napisał(a): > > > On Mon, Apr 20, 2020 at 08:27:50PM +0300, Andy Shevchenko wrote: ... > I don't mind trying to put in code to optimize use cases > when accessing single bytes here either. But I'd like there > to be explicit comments in the code saying why these > optimizations are there. Can we do those on top of > this (hopefully) known working base? As I pointed out to Paul, the optimization like he proposed is not bad thing per se, the implementation is. On top of that I suggested to take a look to IRQ status bits, which presumable will leverage from this optimization as well. So, After applying this series it would be matter of change one line in the ->get_multiple() to replace read_regs() with optimized version or so along with IRQ bits changes.
On Tue, Apr 28, 2020 at 8:41 AM Andy Shevchenko <andriy.shevchenko@linux.intel.com> wrote: > > On Tue, Apr 28, 2020 at 02:13:21PM +0200, Linus Walleij wrote: > > On Tue, Apr 21, 2020 at 5:42 PM Bartosz Golaszewski > > <bgolaszewski@baylibre.com> wrote: > > > wt., 21 kwi 2020 o 15:03 Andy Shevchenko > > > <andriy.shevchenko@linux.intel.com> napisał(a): > > > > On Mon, Apr 20, 2020 at 08:27:50PM +0300, Andy Shevchenko wrote: > > ... > > > I don't mind trying to put in code to optimize use cases > > when accessing single bytes here either. But I'd like there > > to be explicit comments in the code saying why these > > optimizations are there. Can we do those on top of > > this (hopefully) known working base? For the record my original get_multiple patch was a known working base. > > As I pointed out to Paul, the optimization like he proposed is not bad thing > per se, the implementation is. On top of that I suggested to take a look to IRQ > status bits, which presumable will leverage from this optimization as well. > > So, After applying this series it would be matter of change one line in the > ->get_multiple() to replace read_regs() with optimized version or so along with > IRQ bits changes. This new function would then at least need to be called with mask as an additional argument right? Then the bitmap_replace() will set everything regardless of if it was read, this is fine I suppose because it doesn't matter if it's setting bits outside of mask. You just have two loops one in the new function and one in bitmap_replace(). If this is what people would like to see I can work on it. I did look into the sparse reads and it seems as though regmap has a gather_write, but not a gather_read and gather_write isn't in regmap-i2c.c anyway. -Paul
On Tue, Apr 28, 2020 at 09:09:03AM -0400, Paul Thomas wrote: > On Tue, Apr 28, 2020 at 8:41 AM Andy Shevchenko > <andriy.shevchenko@linux.intel.com> wrote: > > > > On Tue, Apr 28, 2020 at 02:13:21PM +0200, Linus Walleij wrote: > > > On Tue, Apr 21, 2020 at 5:42 PM Bartosz Golaszewski > > > <bgolaszewski@baylibre.com> wrote: > > > > wt., 21 kwi 2020 o 15:03 Andy Shevchenko > > > > <andriy.shevchenko@linux.intel.com> napisał(a): > > > > > On Mon, Apr 20, 2020 at 08:27:50PM +0300, Andy Shevchenko wrote: > > > > ... > > > > > I don't mind trying to put in code to optimize use cases > > > when accessing single bytes here either. But I'd like there > > > to be explicit comments in the code saying why these > > > optimizations are there. Can we do those on top of > > > this (hopefully) known working base? > For the record my original get_multiple patch was a known working base. > > > > > As I pointed out to Paul, the optimization like he proposed is not bad thing > > per se, the implementation is. On top of that I suggested to take a look to IRQ > > status bits, which presumable will leverage from this optimization as well. > > > > So, After applying this series it would be matter of change one line in the > > ->get_multiple() to replace read_regs() with optimized version or so along with > > IRQ bits changes. > > This new function would then at least need to be called with mask as > an additional argument right? Then the bitmap_replace() will set > everything regardless of if it was read, this is fine I suppose > because it doesn't matter if it's setting bits outside of mask. You > just have two loops one in the new function and one in > bitmap_replace(). Note, on 64-bit architectures there is no loop in bitmap ops (since 40 <= 64 in this case). On 32-bit it might be (only for 40 case, which I think less present in the wild than the rest). And bitmap ops are optimized over longs, so, it's pretty much fast (esp. in comparison to I²C IO). > If this is what people would like to see I can work > on it. I did look into the sparse reads and it seems as though regmap > has a gather_write, but not a gather_read and gather_write isn't in > regmap-i2c.c anyway. Yes, I think this is the right way to go.
On Tue, Apr 28, 2020 at 10:18 AM Andy Shevchenko <andriy.shevchenko@linux.intel.com> wrote: > > On Tue, Apr 28, 2020 at 09:09:03AM -0400, Paul Thomas wrote: > > On Tue, Apr 28, 2020 at 8:41 AM Andy Shevchenko > > <andriy.shevchenko@linux.intel.com> wrote: > > > > > > On Tue, Apr 28, 2020 at 02:13:21PM +0200, Linus Walleij wrote: > > > > On Tue, Apr 21, 2020 at 5:42 PM Bartosz Golaszewski > > > > <bgolaszewski@baylibre.com> wrote: > > > > > wt., 21 kwi 2020 o 15:03 Andy Shevchenko > > > > > <andriy.shevchenko@linux.intel.com> napisał(a): > > > > > > On Mon, Apr 20, 2020 at 08:27:50PM +0300, Andy Shevchenko wrote: > > > > > > ... > > > > > > > I don't mind trying to put in code to optimize use cases > > > > when accessing single bytes here either. But I'd like there > > > > to be explicit comments in the code saying why these > > > > optimizations are there. Can we do those on top of > > > > this (hopefully) known working base? > > For the record my original get_multiple patch was a known working base. > > > > > > > > As I pointed out to Paul, the optimization like he proposed is not bad thing > > > per se, the implementation is. On top of that I suggested to take a look to IRQ > > > status bits, which presumable will leverage from this optimization as well. > > > > > > So, After applying this series it would be matter of change one line in the > > > ->get_multiple() to replace read_regs() with optimized version or so along with > > > IRQ bits changes. > > > > This new function would then at least need to be called with mask as > > an additional argument right? Then the bitmap_replace() will set > > everything regardless of if it was read, this is fine I suppose > > because it doesn't matter if it's setting bits outside of mask. You > > just have two loops one in the new function and one in > > bitmap_replace(). > > Note, on 64-bit architectures there is no loop in bitmap ops (since 40 <= 64 in > this case). On 32-bit it might be (only for 40 case, which I think less present > in the wild than the rest). And bitmap ops are optimized over longs, so, it's > pretty much fast (esp. in comparison to I²C IO). > > > If this is what people would like to see I can work > > on it. I did look into the sparse reads and it seems as though regmap > > has a gather_write, but not a gather_read and gather_write isn't in > > regmap-i2c.c anyway. > > Yes, I think this is the right way to go. OK, sounds good. -Paul > > -- > With Best Regards, > Andy Shevchenko > >
wt., 28 kwi 2020 o 16:58 Paul Thomas <pthomas8589@gmail.com> napisał(a): > > On Tue, Apr 28, 2020 at 10:18 AM Andy Shevchenko > <andriy.shevchenko@linux.intel.com> wrote: > > > > On Tue, Apr 28, 2020 at 09:09:03AM -0400, Paul Thomas wrote: > > > On Tue, Apr 28, 2020 at 8:41 AM Andy Shevchenko > > > <andriy.shevchenko@linux.intel.com> wrote: > > > > > > > > On Tue, Apr 28, 2020 at 02:13:21PM +0200, Linus Walleij wrote: > > > > > On Tue, Apr 21, 2020 at 5:42 PM Bartosz Golaszewski > > > > > <bgolaszewski@baylibre.com> wrote: > > > > > > wt., 21 kwi 2020 o 15:03 Andy Shevchenko > > > > > > <andriy.shevchenko@linux.intel.com> napisał(a): > > > > > > > On Mon, Apr 20, 2020 at 08:27:50PM +0300, Andy Shevchenko wrote: > > > > > > > > ... > > > > > > > > > I don't mind trying to put in code to optimize use cases > > > > > when accessing single bytes here either. But I'd like there > > > > > to be explicit comments in the code saying why these > > > > > optimizations are there. Can we do those on top of > > > > > this (hopefully) known working base? > > > For the record my original get_multiple patch was a known working base. > > > > > > > > > > > As I pointed out to Paul, the optimization like he proposed is not bad thing > > > > per se, the implementation is. On top of that I suggested to take a look to IRQ > > > > status bits, which presumable will leverage from this optimization as well. > > > > > > > > So, After applying this series it would be matter of change one line in the > > > > ->get_multiple() to replace read_regs() with optimized version or so along with > > > > IRQ bits changes. > > > > > > This new function would then at least need to be called with mask as > > > an additional argument right? Then the bitmap_replace() will set > > > everything regardless of if it was read, this is fine I suppose > > > because it doesn't matter if it's setting bits outside of mask. You > > > just have two loops one in the new function and one in > > > bitmap_replace(). > > > > Note, on 64-bit architectures there is no loop in bitmap ops (since 40 <= 64 in > > this case). On 32-bit it might be (only for 40 case, which I think less present > > in the wild than the rest). And bitmap ops are optimized over longs, so, it's > > pretty much fast (esp. in comparison to I²C IO). > > > > > If this is what people would like to see I can work > > > on it. I did look into the sparse reads and it seems as though regmap > > > has a gather_write, but not a gather_read and gather_write isn't in > > > regmap-i2c.c anyway. > > > > Yes, I think this is the right way to go. > OK, sounds good. > I applied the whole series to my for-next branch. I'll soon send a PR to Linus. Bart
diff --git a/drivers/gpio/gpio-pca953x.c b/drivers/gpio/gpio-pca953x.c index 60ae18e4b5f5a..41be681ae77c2 100644 --- a/drivers/gpio/gpio-pca953x.c +++ b/drivers/gpio/gpio-pca953x.c @@ -115,7 +115,6 @@ MODULE_DEVICE_TABLE(acpi, pca953x_acpi_ids); #define MAX_BANK 5 #define BANK_SZ 8 -#define BANK_SFT 3 /* ilog2(BANK_SZ) */ #define MAX_LINE (MAX_BANK * BANK_SZ) #define NBANK(chip) DIV_ROUND_UP(chip->gpio_chip.ngpio, BANK_SZ) @@ -469,38 +468,20 @@ static int pca953x_gpio_get_direction(struct gpio_chip *gc, unsigned off) } static int pca953x_gpio_get_multiple(struct gpio_chip *gc, - unsigned long *mask, unsigned long *bits) + unsigned long *mask, unsigned long *bits) { struct pca953x_chip *chip = gpiochip_get_data(gc); - unsigned int reg_val; - int offset, value, i, ret = 0; - u8 inreg; + DECLARE_BITMAP(reg_val, MAX_LINE); + int ret; - /* Force offset outside the range of i so that - * at least the first relevant register is read - */ - offset = gc->ngpio; - for_each_set_bit(i, mask, gc->ngpio) { - /* whenever i goes into a new bank update inreg - * and read the register - */ - if ((offset >> BANK_SFT) != (i >> BANK_SFT)) { - offset = i; - inreg = pca953x_recalc_addr(chip, chip->regs->input, - offset, true, false); - mutex_lock(&chip->i2c_lock); - ret = regmap_read(chip->regmap, inreg, ®_val); - mutex_unlock(&chip->i2c_lock); - if (ret < 0) - return ret; - } - /* reg_val is relative to the last read byte, - * so only shift the relative bits - */ - value = (reg_val >> (i % 8)) & 0x01; - __assign_bit(i, bits, value); - } - return ret; + mutex_lock(&chip->i2c_lock); + ret = pca953x_read_regs(chip, chip->regs->input, reg_val); + mutex_unlock(&chip->i2c_lock); + if (ret) + return ret; + + bitmap_replace(bits, bits, reg_val, mask, gc->ngpio); + return 0; } static void pca953x_gpio_set_multiple(struct gpio_chip *gc,
The commit 96d7c7b3e654 ("gpio: gpio-pca953x, Add get_multiple function") basically did everything wrong from style and code reuse perspective, i.e. - it didn't utilize existing PCA953x internal helpers - it didn't utilize bitmap API - it misses the point that ilog2(), besides that BANK_SFT is useless, can be used in macros - it has indentation issues. Rewrite the function completely. Cc: Paul Thomas <pthomas8589@gmail.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> --- drivers/gpio/gpio-pca953x.c | 41 ++++++++++--------------------------- 1 file changed, 11 insertions(+), 30 deletions(-)