Message ID | 000101cfb243$63a5b1b0$2af11510$@com |
---|---|
State | New |
Headers | show |
On 07-08-2014 10:27, Wilco Dijkstra wrote: > Hi, > > This patch improves strcat performance by using strlen and strcpy. Strlen has a fast C > implementation, so this improves performance even on targets which don't have an optimized strlen > and strcpy - it is 25% faster in bench-strcat. On targets which don't provide an optimized strcat > but which do have an optimized strlen and strcpy, performance gain is > 2x. > > OK for commit? > > ChangeLog: > 2014-08-07 Wilco Dijkstra <wdijkstr@arm.com> > > * string/strcat.c (strcat): Improve performance by using strlen/strcpy. > --- > string/strcat.c | 21 +-------------------- > 1 file changed, 1 insertion(+), 20 deletions(-) > > diff --git a/string/strcat.c b/string/strcat.c > > - do > - { > - c = *s2++; > - *++s1 = c; > - } > - while (c != '\0'); > - > + strcpy (dest + strlen (dest), src); Should it be __strcpy/__strlen ?
On Thu, 7 Aug 2014, Adhemerval Zanella wrote: > > + strcpy (dest + strlen (dest), src); > > Should it be __strcpy/__strlen ? As explained in recent discussion, there is no need for use of __* when calling functions in ISO C90 that haven't been removed in more recent standards (or more generally, when calling function A from function B if function A is in all the supported standards containing function B). You do need *_hidden_* for PLT avoidance, but include/string.h already has libc_hidden_builtin_proto calls for strcpy and strlen (and if any definition of those functions is missing the corresponding libc_hidden_builtin_def, there will be an obvious error linking glibc).
On 08/07/2014 03:27 PM, Wilco Dijkstra wrote: > This patch improves strcat performance by using strlen and strcpy. Strlen has a fast C > implementation, so this improves performance even on targets which don't have an optimized strlen > and strcpy - it is 25% faster in bench-strcat. On targets which don't provide an optimized strcat > but which do have an optimized strlen and strcpy, performance gain is > 2x. > > OK for commit? This is okay for master. Thanks.
On 08/07/2014 09:27 AM, Wilco Dijkstra wrote: > Hi, > > This patch improves strcat performance by using strlen and strcpy. Strlen has a fast C > implementation, so this improves performance even on targets which don't have an optimized strlen > and strcpy - it is 25% faster in bench-strcat. On targets which don't provide an optimized strcat > but which do have an optimized strlen and strcpy, performance gain is > 2x. What benchmarks did you use to test this performance gain? Did you use glibc's microbenchmark? What numbers did you get? c.
On Thu, Sep 11, 2014 at 03:42:31PM -0400, Carlos O'Donell wrote: > On 08/07/2014 09:27 AM, Wilco Dijkstra wrote: > > Hi, > > > > This patch improves strcat performance by using strlen and strcpy. Strlen has a fast C > > implementation, so this improves performance even on targets which don't have an optimized strlen > > and strcpy - it is 25% faster in bench-strcat. On targets which don't provide an optimized strcat > > but which do have an optimized strlen and strcpy, performance gain is > 2x. > > What benchmarks did you use to test this performance gain? > > Did you use glibc's microbenchmark? > > What numbers did you get? > Ah same patch that I send like year ago, your question is answered in quoted email. Also trying to benchmark C implementations is mostly meaningless, if you want good performance you need write at least assembly implementation of memcpy, strlen and strcpy. Optimizing one of these has bigger performance impact than rest of functions combined and if you do not care about that you do not have to care about this c implementation as well. Also in case of strcat its easy to see that no matter how well you optimize it you cannot get faster than strcpy (dest+strlen (dest), src); minus constant overhead of like one function call. Reason is that you could use strcat both as strcpy (char *x, char *y) { x[0] = 0; return strcat (x, y); } and strlen (char *x) { char y[] = "x"; return strcat2 (x, y); // where we in assembly replace each write instruction jump that calculates end of string. } Now if you got input where strcat is faster than strcpy+strlen pair it would give you faster strlen or strcpy by using formulas above.
> Carlos O'Donell wrote: > On 08/07/2014 09:27 AM, Wilco Dijkstra wrote: > > Hi, > > > > This patch improves strcat performance by using strlen and strcpy. Strlen has a fast C > > implementation, so this improves performance even on targets which don't have an optimized > strlen > > and strcpy - it is 25% faster in bench-strcat. On targets which don't provide an optimized > strcat > > but which do have an optimized strlen and strcpy, performance gain is > 2x. > > What benchmarks did you use to test this performance gain? > > Did you use glibc's microbenchmark? > > What numbers did you get? These results are for the GLIBC benchtests/bench-strcat.c - I increased the iterations significantly and profiled the results with a high tickrate to verify the timings. 65.74% 11343 bench-strcat bench-strcat [.] simple_strcat 24.90% 4307 bench-strcat libc-2.20.90.so [.] strcpy 5.20% 902 bench-strcat libc-2.20.90.so [.] strlen 1.22% 214 bench-strcat bench-strcat [.] do_test 1.08% 190 bench-strcat libc-2.20.90.so [.] strcat So strcat+strlen+strcpy=31.18% vs simple_strcat 65.74%, ie. 2.1x speedup. Wilco
> Ondřej Bílka wrote: > Also trying to benchmark C implementations is mostly meaningless, if you > want good performance you need write at least assembly implementation of memcpy, > strlen and strcpy. Optimizing one of these has bigger performance impact > than rest of functions combined and if you do not care about that you do > not have to care about this c implementation as well. The issue is that you're forced to write lots of highly optimized assembly code in order to get good string performance for any new architecture. This issue exists across GLIBC, for example the math functions used to be extremely inefficient due to a bad generic implementation. With a simple architecture independent patch I achieved 99% of the performance of the best optimized implementation. Rather than forcing all targets to add large amounts of highly optimized assembler, why not put a little more effort into GLIBC generic code to ensure it is efficient to start with? So the goal here is to provide good string performance using just C routines and ensure they benefit further from assembler implementations of strlen, memcpy, memset etc. > Also in case of strcat its easy to see that no matter how well you > optimize it you cannot get faster than strcpy (dest+strlen (dest), src); > minus constant overhead of like one function call. Exactly, there is no need for a specialized assembler variant for strcat. Wilco
diff --git a/string/strcat.c b/string/strcat.c index 2cbe8b3..983d115 100644 --- a/string/strcat.c +++ b/string/strcat.c @@ -23,26 +23,7 @@ char * strcat (char *dest, const char *src) { - char *s1 = dest; - const char *s2 = src; - char c; - - /* Find the end of the string. */ - do - c = *s1++; - while (c != '\0'); - - /* Make S1 point before the next character, so we can increment - it while memory is read (wins on pipelined cpus). */ - s1 -= 2; - - do - { - c = *s2++; - *++s1 = c; - } - while (c != '\0'); - + strcpy (dest + strlen (dest), src); return dest; } libc_hidden_builtin_def (strcat)