Message ID | 1327674295-3700-2-git-send-email-akinobu.mita@gmail.com |
---|---|
State | New, archived |
Headers | show |
On Fri, 2012-01-27 at 23:24 +0900, Akinobu Mita wrote: > - Use memchr_inv to check if the data contains all 0xFF bytes. > It is faster than looping for each byte. Stupid question: Are there any mtd devices modified that are slower at 64 bit accesses than repeated 8 bit accesses?
On Fri, Jan 27, 2012 at 9:16 AM, Joe Perches <joe@perches.com> wrote: > On Fri, 2012-01-27 at 23:24 +0900, Akinobu Mita wrote: >> - Use memchr_inv to check if the data contains all 0xFF bytes. >> It is faster than looping for each byte. > > Stupid question: > > Are there any mtd devices modified that are slower > at 64 bit accesses than repeated 8 bit accesses? I believe this patch deals with kernel buffers, not any kind of direct access to the MTD, so the question (which is not stupid IMO) should be regarding CPU architectures. And my educated guess is that 64-bit access should not be any slower. I do know that 8-bit access *is* slower for some relevant architectures. Brian
2012/1/28 Brian Norris <computersforpeace@gmail.com>: > On Fri, Jan 27, 2012 at 9:16 AM, Joe Perches <joe@perches.com> wrote: >> On Fri, 2012-01-27 at 23:24 +0900, Akinobu Mita wrote: >>> - Use memchr_inv to check if the data contains all 0xFF bytes. >>> It is faster than looping for each byte. >> >> Stupid question: >> >> Are there any mtd devices modified that are slower >> at 64 bit accesses than repeated 8 bit accesses? > > I believe this patch deals with kernel buffers, not any kind of direct > access to the MTD, so the question (which is not stupid IMO) should be > regarding CPU architectures. And my educated guess is that 64-bit > access should not be any slower. I do know that 8-bit access *is* > slower for some relevant architectures. It could be slower when the number of bytes scanned is very small (found a unmatched character immediately, or the size of the area is very small), because memchr_inv() needs to generate a 64bit pattern to compare before starting the loop. I recalled that Eric Dumazet pointed out it could generate the 64bit pattern more efficiently. (https://lkml.org/lkml/2011/8/8/480) Even if that small scanning is slower, this change can be assumed cleanup patch that simplifies the code.
On Fri, Jan 27, 2012 at 3:52 PM, Akinobu Mita <akinobu.mita@gmail.com> wrote: > 2012/1/28 Brian Norris <computersforpeace@gmail.com>: >> On Fri, Jan 27, 2012 at 9:16 AM, Joe Perches <joe@perches.com> wrote: >>> On Fri, 2012-01-27 at 23:24 +0900, Akinobu Mita wrote: >>>> - Use memchr_inv to check if the data contains all 0xFF bytes. >>>> It is faster than looping for each byte. >>> >>> Stupid question: >>> >>> Are there any mtd devices modified that are slower >>> at 64 bit accesses than repeated 8 bit accesses? >> >> I believe this patch deals with kernel buffers, not any kind of direct >> access to the MTD, so the question (which is not stupid IMO) should be >> regarding CPU architectures. And my educated guess is that 64-bit >> access should not be any slower. I do know that 8-bit access *is* >> slower for some relevant architectures. > > It could be slower when the number of bytes scanned is very small > (found a unmatched character immediately, or the size of the area > is very small), because memchr_inv() needs to generate a 64bit pattern > to compare before starting the loop. I recalled that Eric Dumazet > pointed out it could generate the 64bit pattern more efficiently. > (https://lkml.org/lkml/2011/8/8/480) > > Even if that small scanning is slower, this change can be assumed cleanup > patch that simplifies the code. Well, I agree that it qualifies as cleanup as well, but we should at least make an attempt not to cause performance regression... So by my understanding, the use of memchr_inv() is on buffers of minimum length of 10 in this patch, so we're likely to have decent results. And memcmp() usage looks fine to me. So unless other concerns arise: Acked-by: Brian Norris <computersforpeace@gmail.com>
2012/2/1 Brian Norris <computersforpeace@gmail.com>: > On Fri, Jan 27, 2012 at 3:52 PM, Akinobu Mita <akinobu.mita@gmail.com> wrote: >> 2012/1/28 Brian Norris <computersforpeace@gmail.com>: >>> On Fri, Jan 27, 2012 at 9:16 AM, Joe Perches <joe@perches.com> wrote: >>>> On Fri, 2012-01-27 at 23:24 +0900, Akinobu Mita wrote: >>>>> - Use memchr_inv to check if the data contains all 0xFF bytes. >>>>> It is faster than looping for each byte. >>>> >>>> Stupid question: >>>> >>>> Are there any mtd devices modified that are slower >>>> at 64 bit accesses than repeated 8 bit accesses? >>> >>> I believe this patch deals with kernel buffers, not any kind of direct >>> access to the MTD, so the question (which is not stupid IMO) should be >>> regarding CPU architectures. And my educated guess is that 64-bit >>> access should not be any slower. I do know that 8-bit access *is* >>> slower for some relevant architectures. >> >> It could be slower when the number of bytes scanned is very small >> (found a unmatched character immediately, or the size of the area >> is very small), because memchr_inv() needs to generate a 64bit pattern >> to compare before starting the loop. I recalled that Eric Dumazet >> pointed out it could generate the 64bit pattern more efficiently. >> (https://lkml.org/lkml/2011/8/8/480) >> >> Even if that small scanning is slower, this change can be assumed cleanup >> patch that simplifies the code. > > Well, I agree that it qualifies as cleanup as well, but we should at > least make an attempt not to cause performance regression... > > So by my understanding, the use of memchr_inv() is on buffers of > minimum length of 10 in this patch, so we're likely to have decent > results. And memcmp() usage looks fine to me. Sorry, I answered without checking memchr_inv() carefully. If the size of buffer is less than 16 bytes, memchr_inv() scans for each byte as the original code did. So it is unlikely to be slower in the most cases. But I mentioned in the previous email, there are some problems in memchr_inv(). I'll send the patch in a few days. > So unless other concerns arise: > > Acked-by: Brian Norris <computersforpeace@gmail.com> Thanks.
diff --git a/drivers/mtd/nand/davinci_nand.c b/drivers/mtd/nand/davinci_nand.c index 6e56615..423a7cc 100644 --- a/drivers/mtd/nand/davinci_nand.c +++ b/drivers/mtd/nand/davinci_nand.c @@ -315,10 +315,9 @@ static int nand_davinci_correct_4bit(struct mtd_info *mtd, unsigned long timeo; /* All bytes 0xff? It's an erased page; ignore its ECC. */ - for (i = 0; i < 10; i++) { - if (ecc_code[i] != 0xff) - goto compare; - } + if (memchr_inv(ecc_code, 0xff, 10)) + goto compare; + return 0; compare: diff --git a/drivers/mtd/nand/denali.c b/drivers/mtd/nand/denali.c index 3984d48..a87e3a4 100644 --- a/drivers/mtd/nand/denali.c +++ b/drivers/mtd/nand/denali.c @@ -908,10 +908,9 @@ static void read_oob_data(struct mtd_info *mtd, uint8_t *buf, int page) */ bool is_erased(uint8_t *buf, int len) { - int i = 0; - for (i = 0; i < len; i++) - if (buf[i] != 0xFF) - return false; + if (memchr_inv(buf, 0xFF, len)) + return false; + return true; } #define ECC_SECTOR_SIZE 512 diff --git a/drivers/mtd/nand/diskonchip.c b/drivers/mtd/nand/diskonchip.c index df921e7..bf0602e 100644 --- a/drivers/mtd/nand/diskonchip.c +++ b/drivers/mtd/nand/diskonchip.c @@ -945,12 +945,8 @@ static int doc200x_calculate_ecc(struct mtd_info *mtd, const u_char *dat, unsign /* Note: this somewhat expensive test should not be triggered often. It could be optimized away by examining the data in the writebuf routine, and remembering the result. */ - for (i = 0; i < 512; i++) { - if (dat[i] == 0xff) - continue; + if (memchr_inv(dat, 0xff, 512)) emptymatch = 0; - break; - } } /* If emptymatch still =1, we do have an all-0xff data buffer. Return all-0xff ecc value instead of the computed one, so @@ -1000,24 +996,16 @@ static int doc200x_correct_data(struct mtd_info *mtd, u_char *dat, /* If emptymatch=1, the read syndrome is consistent with an all-0xff data and stored ecc block. Check the stored ecc. */ if (emptymatch) { - for (i = 0; i < 6; i++) { - if (read_ecc[i] == 0xff) - continue; + if (memchr_inv(read_ecc, 0xff, 6)) emptymatch = 0; - break; - } } /* If emptymatch still =1, check the data block. */ if (emptymatch) { /* Note: this somewhat expensive test should not be triggered often. It could be optimized away by examining the data in the readbuf routine, and remembering the result. */ - for (i = 0; i < 512; i++) { - if (dat[i] == 0xff) - continue; + if (memchr_inv(dat, 0xff, 512)) emptymatch = 0; - break; - } } /* If emptymatch still =1, this is almost certainly a freshly- erased block, in which case the ECC will not come out right. diff --git a/drivers/mtd/nand/nand_bbt.c b/drivers/mtd/nand/nand_bbt.c index 20a112f..a1b2dd5 100644 --- a/drivers/mtd/nand/nand_bbt.c +++ b/drivers/mtd/nand/nand_bbt.c @@ -100,10 +100,8 @@ static int check_pattern(uint8_t *buf, int len, int paglen, struct nand_bbt_desc end = paglen + td->offs; if (td->options & NAND_BBT_SCANEMPTY) { - for (i = 0; i < end; i++) { - if (p[i] != 0xff) - return -1; - } + if (memchr_inv(p, 0xff, end)) + return -1; } p += end; @@ -133,14 +131,10 @@ static int check_pattern(uint8_t *buf, int len, int paglen, struct nand_bbt_desc */ static int check_short_pattern(uint8_t *buf, struct nand_bbt_descr *td) { - int i; - uint8_t *p = buf; - /* Compare the pattern */ - for (i = 0; i < td->len; i++) { - if (p[td->offs + i] != td->pattern[i]) - return -1; - } + if (memcmp(buf + td->offs, td->pattern, td->len)) + return -1; + return 0; } diff --git a/drivers/mtd/nand/nand_bcm_umi.c b/drivers/mtd/nand/nand_bcm_umi.c index 46a6bc9..ad51dad 100644 --- a/drivers/mtd/nand/nand_bcm_umi.c +++ b/drivers/mtd/nand/nand_bcm_umi.c @@ -109,13 +109,9 @@ int nand_bcm_umi_bch_correct_page(uint8_t *datap, uint8_t *readEccData, * see if errors are correctible */ if ((regValue & REG_UMI_BCH_CTRL_STATUS_UNCORR_ERR) > 0) { - int i; - - for (i = 0; i < numEccBytes; i++) { - if (readEccData[i] != 0xff) { - /* errors cannot be fixed, return -1 */ - return -1; - } + if (memchr_inv(readEccData, 0xff, numEccBytes)) { + /* errors cannot be fixed, return -1 */ + return -1; } /* If ECC is unprogrammed then we can't correct, * assume everything OK */
- Use memchr_inv to check if the data contains all 0xFF bytes. It is faster than looping for each byte. - Use memcmp to compare memory areas Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Jiandong Zheng <jdzheng@broadcom.com> Cc: Scott Branden <sbranden@broadcom.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: linux-mtd@lists.infradead.org --- drivers/mtd/nand/davinci_nand.c | 7 +++---- drivers/mtd/nand/denali.c | 7 +++---- drivers/mtd/nand/diskonchip.c | 18 +++--------------- drivers/mtd/nand/nand_bbt.c | 16 +++++----------- drivers/mtd/nand/nand_bcm_umi.c | 10 +++------- 5 files changed, 17 insertions(+), 41 deletions(-)