Message ID | 20240909162327.3163100-3-visitorckw@gmail.com |
---|---|
State | New |
Headers | show |
Series | Optimization and benchmarking of bsearch() | expand |
On Wed, Sep 11, 2024 at 06:16:32PM -0400, DJ Delorie wrote: > Kuan-Wei Chiu <visitorckw@gmail.com> writes: > > Optimize the bsearch() function to improve binary search performance. > > Although the code size grew by 8 bytes, the new implementation achieves > > a 15% reduction in execution time on my x86 machine, according to the > > bench-bsearch benchmark results. > > LGTM as-is but I noted something to try that might speed it up more. > Reviewed-by: DJ Delorie <dj@redhat.com> Thank you for your review and suggestions! > > > diff --git a/bits/stdlib-bsearch.h b/bits/stdlib-bsearch.h > > > bsearch (const void *__key, const void *__base, size_t __nmemb, size_t __size, > > __compar_fn_t __compar) > > { > > - size_t __l, __u, __idx; > > const void *__p; > > int __comparison; > > > > - __l = 0; > > - __u = __nmemb; > > - while (__l < __u) > > + while (__nmemb) > > { > > - __idx = (__l + __u) / 2; > > - __p = (const void *) (((const char *) __base) + (__idx * __size)); > > + __p = (const void *) (((const char *) __base) + ((__nmemb >> 1) * __size)); > > We're replacing an add/shift with just a shift, but using the existing > __base and __nmemb instead of the new __l and __u. Ok. > > After this, __p points to base[__nmemb>>1] > > > __comparison = (*__compar) (__key, __p); > > - if (__comparison < 0) > > - __u = __idx; > > - else if (__comparison > 0) > > - __l = __idx + 1; > > - else > > + if (__comparison == 0) > > It seems to me performance would be better if the more likely cases went > first - checking "__comparison > 0" would, approximately 50% of the > time, skip the "__comparison == 0" check, but not the other way around. > Have you tried this with the ">0" test before the "==0" test, or with > marking the "==0" test as unlikely? > > Keeping the >0 and <0 paths close to the top of the loop might help with > cache hits too, depending on link alignment. I tried adding unlikely(), but it seems that gcc generates the exact same code (Note: I only checked on x86). As for moving the > 0 test first, I also gave it a try, but it actually worsened the benchmark results. I haven’t looked closely at the generated code to understand why, but here are the benchmark results: For the current patch: { "timing_type": "hp_timing", "functions": { "bsearch": { "bench-variant": "default", "results": [ { "array-size": 100000, "element-size": 4, "key-pattern": "descending", "contained": "no", "simple": "yes", "timing": 110.092 }, { "array-size": 100000, "element-size": 4, "key-pattern": "descending", "contained": "yes", "simple": "yes", "timing": 99.0945 }, { "array-size": 100000, "element-size": 4, "key-pattern": "ascending", "contained": "no", "simple": "yes", "timing": 112.395 }, { "array-size": 100000, "element-size": 4, "key-pattern": "ascending", "contained": "yes", "simple": "yes", "timing": 98.5953 }] } } } With the > 0 test moved to the front: { "timing_type": "hp_timing", "functions": { "bsearch": { "bench-variant": "default", "results": [ { "array-size": 100000, "element-size": 4, "key-pattern": "descending", "contained": "no", "simple": "yes", "timing": 113.875 }, { "array-size": 100000, "element-size": 4, "key-pattern": "descending", "contained": "yes", "simple": "yes", "timing": 101.111 }, { "array-size": 100000, "element-size": 4, "key-pattern": "ascending", "contained": "no", "simple": "yes", "timing": 113.713 }, { "array-size": 100000, "element-size": 4, "key-pattern": "ascending", "contained": "yes", "simple": "yes", "timing": 105.429 }] } } } > > > { > > #if __GNUC_PREREQ(4, 6) > > # pragma GCC diagnostic push > > @@ -46,6 +38,12 @@ bsearch (const void *__key, const void *__base, size_t __nmemb, size_t __size, > > # pragma GCC diagnostic pop > > #endif > > } > > + if (__comparison > 0) > > + { > > + __base = ((const char *) __p) + __size; > > + --__nmemb; > > + } > > + __nmemb >>= 1; > > So if comparison < 0 (key is before probe): > __nmemb is set to the number of elements before __p which is base[__nmemb>>1] > __base is unchanged > - range is 0..__nmemb-1 > > if comparison is > 0 (key is after probe): > __base is set to the element after the probe > If (__nmemb is odd) > probe point is at (__nmemb-1) >> 1 > same numbers before and after probe > subtracting one is unneeded but has no effect > > 0 1 2 3 4 5 6 7 8 (n == 9) (n>>1 == 4) > P > 0 1 2 3 4 5 6 (n == 7) (n>>1 == 3) > P > > If (__nmemb is even) > probe point is at (__nmemb) >> 1 > even numbers before but odd number after > subtracting one is needed > > 0 1 2 3 4 5 6 7 (n == 8) (n>>1 == 4) > P > 0 1 2 3 4 5 (n == 6) (n>>1 == 3) > P > > Ok. > > > } > > > > return NULL; >
diff --git a/bits/stdlib-bsearch.h b/bits/stdlib-bsearch.h index 540d718952..57f060b504 100644 --- a/bits/stdlib-bsearch.h +++ b/bits/stdlib-bsearch.h @@ -20,22 +20,14 @@ __extern_inline void * bsearch (const void *__key, const void *__base, size_t __nmemb, size_t __size, __compar_fn_t __compar) { - size_t __l, __u, __idx; const void *__p; int __comparison; - __l = 0; - __u = __nmemb; - while (__l < __u) + while (__nmemb) { - __idx = (__l + __u) / 2; - __p = (const void *) (((const char *) __base) + (__idx * __size)); + __p = (const void *) (((const char *) __base) + ((__nmemb >> 1) * __size)); __comparison = (*__compar) (__key, __p); - if (__comparison < 0) - __u = __idx; - else if (__comparison > 0) - __l = __idx + 1; - else + if (__comparison == 0) { #if __GNUC_PREREQ(4, 6) # pragma GCC diagnostic push @@ -46,6 +38,12 @@ bsearch (const void *__key, const void *__base, size_t __nmemb, size_t __size, # pragma GCC diagnostic pop #endif } + if (__comparison > 0) + { + __base = ((const char *) __p) + __size; + --__nmemb; + } + __nmemb >>= 1; } return NULL;