Message ID | 20141216202438.GA5612@domone |
---|---|
State | New |
Headers | show |
Thanks, this is much better than worrying about how to pacify GCC. The code could be made a bit shorter and clearer with mempcpy, and there's no longer any need to distinguish between s and s1, so I suggest the following minor rewrite, which shrinks the code size by another 26 bytes (16%) on my x86-64 platform. char * STRNCAT (char *s1, const char *s2, size_t n) { char *s1_end = mempcpy (s1 + strlen (s1), s2, __strnlen (s2, n)); *s1_end = '\0'; return s1; }
From: Paul Eggert <eggert@cs.ucla.edu> Date: Tue, 16 Dec 2014 12:50:38 -0800 > Thanks, this is much better than worrying about how to pacify GCC. The > code could be made a bit shorter and clearer with mempcpy, and there's > no longer any need to distinguish between s and s1, so I suggest the > following minor rewrite, which shrinks the code size by another 26 > bytes (16%) on my x86-64 platform. The fact that we are talking about complete rewrites for what should be a one line warning fix shows that we enabled -Werror prematurely. This should have been a no-brainer, yet it is going to take a week to fix something that is breaking the build for people.
On Tue, 16 Dec 2014, David Miller wrote: > The fact that we are talking about complete rewrites for what should > be a one line warning fix shows that we enabled -Werror prematurely. No, it simply shows people getting side-tracked and confusing the questions of how to fix the build for an existing warning and how to clean up the issue it shows up. The variety of different configurations in which people build glibc fundamentally requires the process of fixing warnings to be done in parallel by people seeing those warnings (even restricted to x86_64 with GCC 4.9, some people are reporting different warnings I don't see in such a configuration, so testing all combinations of architectures and compiler versions would not cover the warning space). The process of fixing warnings started while the -Werror policy was under discussion (there was plenty of opportunity for people to fix warnings they saw in advance of -Werror being enabled - if they didn't take that opportunity, announcing "-Werror will be enabled at X time in the future" would hardly have resulted in much more fixing). My position remains as in <https://sourceware.org/ml/libc-alpha/2014-11/msg00693.html>: I think we should be perfectly willing to file a bug in Bugzilla and then add a suppression referencing the bug, if there's something tricky about determining if there's a real glibc bug there or a better way to address the warning - that means the build stays working on more platforms (in the presence of -Werror). I see no reason why we should assume a possible problem in existing code shown up by a warning (from new GCC or on another architecture) is more serious than other bugs. -Werror is first about stopping new issues getting in accidentally, rather than forcing certain existing issues (those that cause warnings) to be higher priority than other existing issues. (If the warning that it's hard to work out the correct fix for arises from a recent change to glibc, we should be rather more willing to revert that change until the fix has been worked out, although that's not necessarily the right thing to do in all cases.)
On Tue, Dec 16, 2014 at 12:36:38PM -0800, Roland McGrath wrote: > Use '\0', not '\000'. > > IIRC we have a general policy about having a benchtests case and citing > numbers on that. That is not policy as benchtest results on string function are still meaningless. Here I could simply improve benchtest score by inlining strnlen and memcpy implementations, for example with #define memcpy memcpy2 #include <string/memcpy.c> As it avoids call overhead. However it would be big mistake to do that "optimization" as strncat is cold function while memcpy is likely in memory it degrade performance due to additonal icache misses.
On 12/16/2014 07:24 PM, Joseph Myers wrote: > On Tue, 16 Dec 2014, David Miller wrote: > >> The fact that we are talking about complete rewrites for what should >> be a one line warning fix shows that we enabled -Werror prematurely. > > No, it simply shows people getting side-tracked and confusing the > questions of how to fix the build for an existing warning and how to clean > up the issue it shows up. I agree with Joseph. Cheers, Carlos.
On 17-12-2014 14:32, Ondřej Bílka wrote: > On Tue, Dec 16, 2014 at 12:36:38PM -0800, Roland McGrath wrote: >> Use '\0', not '\000'. >> >> IIRC we have a general policy about having a benchtests case and citing >> numbers on that. > That is not policy as benchtest results on string function are still > meaningless. Here I could simply improve benchtest score by inlining > strnlen and memcpy implementations, for example with > > #define memcpy memcpy2 > #include <string/memcpy.c> > > As it avoids call overhead. However it would be big mistake to do that > "optimization" as strncat is cold function while memcpy is likely in > memory it degrade performance due to additonal icache misses. > This is a fair criticize, but benchtests from string functions are far for 'meanigless'. On powerpc side, for instance, I noted unaligned cases that current code for strcpy was outperforming. It leads to check more result using different workloads and profiler and code a better strategy. Which resulted in a better implementation in the end (patch just posted). Better would be to characterize a better string benchtest and provide patches.
On Wed, Dec 17, 2014 at 04:31:25PM -0200, Adhemerval Zanella wrote: > On 17-12-2014 14:32, Ondřej Bílka wrote: > > On Tue, Dec 16, 2014 at 12:36:38PM -0800, Roland McGrath wrote: > >> Use '\0', not '\000'. > >> > >> IIRC we have a general policy about having a benchtests case and citing > >> numbers on that. > > That is not policy as benchtest results on string function are still > > meaningless. Here I could simply improve benchtest score by inlining > > strnlen and memcpy implementations, for example with > > > > #define memcpy memcpy2 > > #include <string/memcpy.c> > > > > As it avoids call overhead. However it would be big mistake to do that > > "optimization" as strncat is cold function while memcpy is likely in > > memory it degrade performance due to additonal icache misses. > > > This is a fair criticize, but benchtests from string functions are far for > 'meanigless'. On powerpc side, for instance, I noted unaligned cases > that current code for strcpy was outperforming. It leads to check more > result using different workloads and profiler and code a better strategy. > Which resulted in a better implementation in the end (patch just posted). > Which is quite dangerous, if benchmark is unreliable you cannot be sure if that improvement is real or will turn into regression. That could now easily be case as function in benchtest predicts all branches perfectly which makes path fast in benchtest does not happen in reality. I will post several changes and we will see truth (I did not read patch so I cannot say what will happen.)
On 18-12-2014 17:59, Ondřej Bílka wrote: > On Wed, Dec 17, 2014 at 04:31:25PM -0200, Adhemerval Zanella wrote: >> On 17-12-2014 14:32, Ondřej Bílka wrote: >>> On Tue, Dec 16, 2014 at 12:36:38PM -0800, Roland McGrath wrote: >>>> Use '\0', not '\000'. >>>> >>>> IIRC we have a general policy about having a benchtests case and citing >>>> numbers on that. >>> That is not policy as benchtest results on string function are still >>> meaningless. Here I could simply improve benchtest score by inlining >>> strnlen and memcpy implementations, for example with >>> >>> #define memcpy memcpy2 >>> #include <string/memcpy.c> >>> >>> As it avoids call overhead. However it would be big mistake to do that >>> "optimization" as strncat is cold function while memcpy is likely in >>> memory it degrade performance due to additonal icache misses. >>> >> This is a fair criticize, but benchtests from string functions are far for >> 'meanigless'. On powerpc side, for instance, I noted unaligned cases >> that current code for strcpy was outperforming. It leads to check more >> result using different workloads and profiler and code a better strategy. >> Which resulted in a better implementation in the end (patch just posted). >> > Which is quite dangerous, if benchmark is unreliable you cannot be sure > if that improvement is real or will turn into regression. That could now > easily be case as function in benchtest predicts all branches perfectly > which makes path fast in benchtest does not happen in reality. I will post > several changes and we will see truth (I did not read patch so I cannot say > what will happen.) > And that is *exactly* why we should push for more patches and a better coverage in benchmarks. I do see your criticizes as positive, what I do not see is just bashing someone that use the benchmarks it without providing anything better instead. And do not get me wrong here, but I checked sometime ago your personal page and your benchmark work and I lost interested because it was *way* confusing and without much organization. And that is exactly what I am trying to avoid here: comments in separate threads with meaningful comments, but lost without proper organization. For instance, I noticed you used x86_64 specific instructions to flush the cache and other things to try mimic a real world usage. I think is a good idea try to came up with a benchmark that may use some arch-specific, but we will need some discussion on how to characterize it, how to code for multiple archs, etc.
diff --git a/string/strncat.c b/string/strncat.c index 6d29114..e130a1f 100644 --- a/string/strncat.c +++ b/string/strncat.c @@ -29,52 +29,15 @@ char * STRNCAT (char *s1, const char *s2, size_t n) { - char c; char *s = s1; /* Find the end of S1. */ s1 += strlen (s1); - /* Make S1 point before next character, so we can increment - it while memory is read (wins on pipelined cpus). */ - s1 -= 1; + size_t ss = __strnlen (s2, n); - if (n >= 4) - { - size_t n4 = n >> 2; - do - { - c = *s2++; - *++s1 = c; - if (c == '\0') - return s; - c = *s2++; - *++s1 = c; - if (c == '\0') - return s; - c = *s2++; - *++s1 = c; - if (c == '\0') - return s; - c = *s2++; - *++s1 = c; - if (c == '\0') - return s; - } while (--n4 > 0); - n &= 3; - } - - while (n > 0) - { - c = *s2++; - *++s1 = c; - if (c == '\0') - return s; - n--; - } - - if (c != '\0') - *++s1 = '\0'; + memcpy (s1, s2, ss); + s1[ss] = '\000'; return s; }