Message ID | 513F4EC7.6010109@dlhnet.de |
---|---|
State | New |
On 03/12/2013 09:50 AM, Peter Lieven wrote:
> performance gain on SSE2 is approx. 20-25%. altivec
> is not tested. performance for unsigned long arithmetic
> is unchanged.
>
> Signed-off-by: Peter Lieven <pl@kamp.de>
> ---
>  util/cutils.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/util/cutils.c b/util/cutils.c
> index a09d8e8..23f0cd6 100644
> --- a/util/cutils.c
> +++ b/util/cutils.c
> @@ -186,6 +186,11 @@ bool buffer_is_zero(const void *buf, size_t len)
>       * latency.
>       */
>
> +    if (((uintptr_t) buf) % sizeof(VECTYPE) == 0
> +        && len % 8*sizeof(VECTYPE) == 0) {

Space around binary operators. Use CHAR_BITS instead of a magic number 8. Also, did you mean:

len % (CHAR_BITS * sizeof(VECTYPE))

instead of what you wrote as '(len % 8) * sizeof(VECTYPE)'?

> +        return buffer_find_nonzero_offset(buf, len)==len;
> +    }
> +
>      size_t i;
>      long d0, d1, d2, d3;
>      const long * const data = buf;
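The precedence point raised in this review can be checked in isolation. The sketch below is not QEMU code: `vec_t` is a hypothetical 16-byte stand-in for `VECTYPE`, and both function names are made up for illustration. It relies only on the C rule that `%` and `*` share one precedence level and associate left to right, so `len % 8*sizeof(...)` groups as `(len % 8) * sizeof(...)`.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for VECTYPE: 16 bytes, like an SSE2 vector. */
typedef struct { unsigned char b[16]; } vec_t;

/* Condition as written in the patch.
 * Parsed as ((len % 8) * sizeof(vec_t)) == 0, i.e. "len is a multiple of 8". */
static inline bool len_ok_as_written(size_t len)
{
    return len % 8*sizeof(vec_t) == 0;
}

/* Condition with explicit grouping.
 * True only if len is a multiple of 8 whole vectors (128 bytes here). */
static inline bool len_ok_grouped(size_t len)
{
    return len % (8 * sizeof(vec_t)) == 0;
}
```

Under these assumptions the two checks disagree for any length that is a multiple of 8 but not of 128: a length of 8 or 136 passes the as-written check and fails the grouped one.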
On 12.03.2013 at 17:01, Eric Blake <eblake@redhat.com> wrote:

> On 03/12/2013 09:50 AM, Peter Lieven wrote:
>> performance gain on SSE2 is approx. 20-25%. altivec
>> is not tested. performance for unsigned long arithmetic
>> is unchanged.
>>
>> Signed-off-by: Peter Lieven <pl@kamp.de>
>> ---
>>  util/cutils.c | 5 +++++
>>  1 file changed, 5 insertions(+)
>>
>> diff --git a/util/cutils.c b/util/cutils.c
>> index a09d8e8..23f0cd6 100644
>> --- a/util/cutils.c
>> +++ b/util/cutils.c
>> @@ -186,6 +186,11 @@ bool buffer_is_zero(const void *buf, size_t len)
>>       * latency.
>>       */
>>
>> +    if (((uintptr_t) buf) % sizeof(VECTYPE) == 0
>> +        && len % 8*sizeof(VECTYPE) == 0) {
>
> Space around binary operators. Use CHAR_BITS instead of a magic number
> 8. Also, did you mean:
>
> len % (CHAR_BITS * sizeof(VECTYPE))
>
> instead of what you wrote as '(len % 8) * sizeof(VECTYPE)'?

The 8 is not BITS_PER_BYTE or CHAR_BITS; it's the number of
vectors in one loop in buffer_find_nonzero_offset(). I will add
a constant for this to make it clearer.

Peter

>
>> +        return buffer_find_nonzero_offset(buf, len)==len;
>> +    }
>> +
>>      size_t i;
>>      long d0, d1, d2, d3;
>>      const long * const data = buf;
>
> --
> Eric Blake eblake redhat com +1-919-301-3266
> Libvirt virtualization library http://libvirt.org
>
On 03/12/2013 10:03 AM, Peter Lieven wrote:
>>> +    if (((uintptr_t) buf) % sizeof(VECTYPE) == 0
>>> +        && len % 8*sizeof(VECTYPE) == 0) {
>>
>> Space around binary operators. Use CHAR_BITS instead of a magic number
>> 8.

> the 8 is not BITS_PER_BYTE or CHAR_BITS; it's the number of
> vectors in one loop in buffer_find_nonzero_offset(). I will add
> a constant for this to make it clearer.

Indeed, now I see it - 8 is the unroll factor. Well, all the more evidence that a named constant makes the code easier to read, compared to me misinterpreting the magic number.
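A minimal sketch of the direction agreed on here, assuming nothing about the final patch: the unroll factor gets a name and the length check is grouped explicitly. `UNROLL_FACTOR`, `vec_t`, and `can_use_vector_path()` are hypothetical names chosen for illustration; `vec_t` stands in for `VECTYPE` at 16 bytes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for VECTYPE: 16 bytes, like an SSE2 vector. */
typedef struct { unsigned char b[16]; } vec_t;

/* Hypothetical name for the number of vectors handled per iteration
 * of the unrolled loop (the "8" discussed in the thread). */
#define UNROLL_FACTOR 8

/* Returns true if buf is vector-aligned and len is a whole number of
 * unrolled iterations. Only the address is inspected; buf is not read. */
static inline bool can_use_vector_path(const void *buf, size_t len)
{
    return (uintptr_t)buf % sizeof(vec_t) == 0
        && len % (UNROLL_FACTOR * sizeof(vec_t)) == 0;
}
```

With the constant named, the reader no longer has to guess whether 8 means bits per byte, and the parentheses make the intended grouping independent of operator precedence.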
diff --git a/util/cutils.c b/util/cutils.c
index a09d8e8..23f0cd6 100644
--- a/util/cutils.c
+++ b/util/cutils.c
@@ -186,6 +186,11 @@ bool buffer_is_zero(const void *buf, size_t len)
      * latency.
      */
 
+    if (((uintptr_t) buf) % sizeof(VECTYPE) == 0
+        && len % 8*sizeof(VECTYPE) == 0) {
+        return buffer_find_nonzero_offset(buf, len)==len;
+    }
+
     size_t i;
     long d0, d1, d2, d3;
     const long * const data = buf;
performance gain on SSE2 is approx. 20-25%. altivec
is not tested. performance for unsigned long arithmetic
is unchanged.

Signed-off-by: Peter Lieven <pl@kamp.de>
---
 util/cutils.c | 5 +++++
 1 file changed, 5 insertions(+)