Message ID | CAGQ9bdyxCW-_3rLy6uLg4Vc2FPx+gUL7PChaXA4i6aKmnjGVZg@mail.gmail.com |
---|---|
State | New |
Headers | show |
Konstantin Serebryany <konstantin.s.serebryany@gmail.com> writes: > Yep. I've realized that I've interpreted the test results incorrectly > (missed the failing tests). > The following patch passes tests. How does it affect code generation? Andreas.
Are you asking about the results from my particular compiler (GCC 4.8.2) or in theoretical sense? With 4.8.2 in both cases the calls to hack_digit are not inlined (which I think is correct). W/o the change the calls look like this: 4b362: 4c 8b 95 30 ff ff ff mov -0xd0(%rbp),%r10 4b369: 89 95 38 ff ff ff mov %edx,-0xc8(%rbp) 4b36f: e8 1c f0 ff ff callq 4a390 <hack_digit.13608> W/ this change the call takes more instructions to execute: 4b1e4: 4c 89 ad 50 ff ff ff mov %r13,-0xb0(%rbp) 4b1eb: 4c 89 bd 00 ff ff ff mov %r15,-0x100(%rbp) 4b1f2: 4c 8b a5 40 ff ff ff mov -0xc0(%rbp),%r12 4b1f9: 4c 8b ad f8 fe ff ff mov -0x108(%rbp),%r13 4b200: 31 db xor %ebx,%ebx 4b202: 4c 8b b5 70 ff ff ff mov -0x90(%rbp),%r14 4b209: 4c 8b bd 48 ff ff ff mov -0xb8(%rbp),%r15 4b210: 48 89 85 58 ff ff ff mov %rax,-0xa8(%rbp) 4b217: 66 0f 1f 84 00 00 00 nopw 0x0(%rax,%rax,1) 4b21e: 00 00 4b220: 4c 8b 8d 78 ff ff ff mov -0x88(%rbp),%r9 4b227: 4c 8b 85 58 ff ff ff mov -0xa8(%rbp),%r8 4b22e: 4c 89 fa mov %r15,%rdx 4b231: 48 8b 4d 80 mov -0x80(%rbp),%rcx 4b235: 8b b5 68 ff ff ff mov -0x98(%rbp),%esi 4b23b: 8b bd 60 ff ff ff mov -0xa0(%rbp),%edi 4b241: 4c 89 74 24 08 mov %r14,0x8(%rsp) 4b246: 4c 89 24 24 mov %r12,(%rsp) 4b24a: e8 41 f1 ff ff callq 4a390 <hack_digit> It doesn't mean the new code is worse or better. There is no miracle in the nested function call, you still need to pass all the parameters and this is just done with a different calling convention. --kcc On Tue, Sep 16, 2014 at 12:44 AM, Andreas Schwab <schwab@linux-m68k.org> wrote: > Konstantin Serebryany <konstantin.s.serebryany@gmail.com> writes: > >> Yep. I've realized that I've interpreted the test results incorrectly >> (missed the failing tests). >> The following patch passes tests. > > How does it affect code generation? > > Andreas. > > -- > Andreas Schwab, schwab@linux-m68k.org > GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 > "And now for something completely different."
Konstantin Serebryany <konstantin.s.serebryany@gmail.com> writes:
> It doesn't mean the new code is worse or better.
That's something we should find out.
Andreas.
Suggestions on how to do that? On Tue, Sep 16, 2014 at 10:31 AM, Andreas Schwab <schwab@linux-m68k.org> wrote: > Konstantin Serebryany <konstantin.s.serebryany@gmail.com> writes: > >> It doesn't mean the new code is worse or better. > > That's something we should find out. > > Andreas. > > -- > Andreas Schwab, schwab@linux-m68k.org > GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 > "And now for something completely different."
> Are you asking about the results from my particular compiler (GCC > 4.8.2) or in theoretical sense? When we're concerned with generated code quality, in general it is sufficient to demonstrate the effects just with the most-recent or a pretty-recent GCC version. 4.8.2 is a fine reference version today. > It doesn't mean the new code is worse or better. What you showed does indeed mean that the new code is worse: more code to do the same thing. You did not show the code of hack_digit itself. It's possible that things improved inside the function such that the total effect of your change is in fact an improvement or a wash, but you have to actually analyze all the code and make a real judgment to say one or the other. > There is no miracle in the nested function call, > you still need to pass all the parameters and this is just done with a > different calling convention. This is not really true, or at least presented in a rather misleading way. A call to a nested function does not pass "all the parameters" at all. Semantically, you could say that it "passes" them all by reference. But what it actually does is pass a single pointer parameter (the static chain) that is used inside the nested function for indirect access to the containing function's local variables. So the closest approximation of what the existing code is doing is to put all the referenced parent locals together into a local struct and pass a pointer to that. In this particular case, that might be worth doing or it might be just as good (or better) from a source maintenance perspective and/or a generated code quality perspective to pass individual variables by reference as your last patch does. The onus is on you to propose a particular change, demonstrate with thorough analysis that it improves or at least does not harm generated code quality, and make the aesthetic case that improves or does not substantially harm source maintenance.
On Mon, Sep 22, 2014 at 2:43 PM, Roland McGrath <roland@hack.frob.com> wrote: >> Are you asking about the results from my particular compiler (GCC >> 4.8.2) or in theoretical sense? > > When we're concerned with generated code quality, in general it is > sufficient to demonstrate the effects just with the most-recent or a > pretty-recent GCC version. 4.8.2 is a fine reference version today. > >> It doesn't mean the new code is worse or better. > > What you showed does indeed mean that the new code is worse: more code to > do the same thing. You did not show the code of hack_digit itself. hack_digit becomes longer too due to longer function prologue epilogue: Was: 000000000004a390 <hack_digit.13608>: 4a390: 55 push %rbp 4a391: 53 push %rbx 4a392: 4c 89 d3 mov %r10,%rbx 4a395: 48 83 ec 08 sub $0x8,%rsp 4a399: 41 8b 42 30 mov 0x30(%r10),%eax 4a39d: 85 c0 test %eax,%eax Now: 000000000004a390 <hack_digit>: 4a390: 41 55 push %r13 4a392: 41 54 push %r12 4a394: 55 push %rbp 4a395: 4c 89 c5 mov %r8,%rbp 4a398: 4d 89 c8 mov %r9,%r8 4a39b: 53 push %rbx 4a39c: 48 89 cb mov %rcx,%rbx 4a39f: 48 83 ec 08 sub $0x8,%rsp 4a3a3: 85 ff test %edi,%edi (similar for multiple epilogues) > It's > possible that things improved inside the function such that the total > effect of your change is in fact an improvement or a wash, but you have to > actually analyze all the code and make a real judgment to say one or the > other. > >> There is no miracle in the nested function call, >> you still need to pass all the parameters and this is just done with a >> different calling convention. > > This is not really true, or at least presented in a rather misleading way. > A call to a nested function does not pass "all the parameters" at all. > Semantically, you could say that it "passes" them all by reference. But > what it actually does is pass a single pointer parameter (the static chain) > that is used inside the nested function for indirect access to the > containing function's local variables. So the closest approximation of > what the existing code is doing is to put all the referenced parent locals > together into a local struct and pass a pointer to that. I considered doing such a patch but it turned out a huge textual change that will make the code much less readable. Still, let me do it and send it here anyway, unless you tell me no to. --kcc > > In this particular case, that might be worth doing or it might be just as > good (or better) from a source maintenance perspective and/or a generated > code quality perspective to pass individual variables by reference as your > last patch does. The onus is on you to propose a particular change, > demonstrate with thorough analysis that it improves or at least does not > harm generated code quality, and make the aesthetic case that improves or > does not substantially harm source maintenance.
> hack_digit becomes longer too due to longer function prologue epilogue: Clearly that cannot be the only difference. Was the function's code itself actually all nearly identical, modulo trivial differences like different register allocation choices? Each access to one of the parent's locals surely looks different, and how different that code looks is probably where the most important differences are. > I considered doing such a patch but it turned out a huge textual > change that will make the code much less readable. > Still, let me do it and send it here anyway, unless you tell me no to. Of course readability is very subjective, so there really is no substitute for each interested person just seeing how things look and giving their opinion. The most trivial mechanical change might harm readability in ways that can be improved with a little thought.
On 09/22/2014 02:43 PM, Roland McGrath wrote: > So the closest approximation of > what the existing code is doing is to put all the referenced parent locals > together into a local struct and pass a pointer to that. That description is no longer the closest approximation, that is *exactly* what happens. See the block comment at the start of gcc/tree-nested.c which explains how insane the gcc 2.x scheme really was. r~
diff --git a/stdio-common/printf_fp.c b/stdio-common/printf_fp.c index 9cd4b4b..679cb70 100644 --- a/stdio-common/printf_fp.c +++ b/stdio-common/printf_fp.c @@ -148,6 +148,49 @@ static wchar_t *group_number (wchar_t *buf, wchar_t *bufend, wchar_t thousands_sep, int ngroups) internal_function; +static wchar_t hack_digit (int expsign, int type, int *exponent, + mp_limb_t *frac, mp_size_t *fracsize, + mp_limb_t *scale, mp_size_t scalesize, + mp_limb_t *tmp) +{ + mp_limb_t hi; + + if (expsign != 0 && type == 'f' && (*exponent)-- > 0) + hi = 0; + else if (scalesize == 0) + { + hi = frac[*fracsize - 1]; + frac[*fracsize - 1] = __mpn_mul_1 (frac, frac, *fracsize - 1, 10); + } + else + { + if (*fracsize < scalesize) + hi = 0; + else + { + hi = mpn_divmod (tmp, frac, *fracsize, scale, scalesize); + tmp[*fracsize - scalesize] = hi; + hi = tmp[0]; + + *fracsize = scalesize; + while (*fracsize != 0 && frac[*fracsize - 1] == 0) + --(*fracsize); + if (*fracsize == 0) + { + /* We're not prepared for an mpn variable with zero + limbs. */ + *fracsize = 1; + return L'0' + hi; + } + } + + mp_limb_t _cy = __mpn_mul_1 (frac, frac, *fracsize, 10); + if (_cy != 0) + frac[(*fracsize)++] = _cy; + } + + return L'0' + hi; +} int ___printf_fp (FILE *fp, @@ -213,50 +256,6 @@ ___printf_fp (FILE *fp, /* Flag whether wbuffer is malloc'ed or not. */ int buffer_malloced = 0; - auto wchar_t hack_digit (void); - - wchar_t hack_digit (void) - { - mp_limb_t hi; - - if (expsign != 0 && type == 'f' && exponent-- > 0) - hi = 0; - else if (scalesize == 0) - { - hi = frac[fracsize - 1]; - frac[fracsize - 1] = __mpn_mul_1 (frac, frac, fracsize - 1, 10); - } - else - { - if (fracsize < scalesize) - hi = 0; - else - { - hi = mpn_divmod (tmp, frac, fracsize, scale, scalesize); - tmp[fracsize - scalesize] = hi; - hi = tmp[0]; - - fracsize = scalesize; - while (fracsize != 0 && frac[fracsize - 1] == 0) - --fracsize; - if (fracsize == 0) - { - /* We're not prepared for an mpn variable with zero - limbs. */ - fracsize = 1; - return L'0' + hi; - } - } - - mp_limb_t _cy = __mpn_mul_1 (frac, frac, fracsize, 10); - if (_cy != 0) - frac[fracsize++] = _cy; - } - - return L'0' + hi; - } - - /* Figure out the decimal point character. */ if (info->extra == 0) { @@ -914,7 +913,8 @@ ___printf_fp (FILE *fp, while (intdig_no < intdig_max) { ++intdig_no; - *wcp++ = hack_digit (); + *wcp++ = hack_digit (expsign, type, &exponent, frac, &fracsize, + scale, scalesize, tmp); } significant = 1; if (info->alt @@ -938,7 +938,8 @@ ___printf_fp (FILE *fp, || (fracdig_no < fracdig_max && (fracsize > 1 || frac[0] != 0))) { ++fracdig_no; - *wcp = hack_digit (); + *wcp = hack_digit (expsign, type, &exponent, frac, &fracsize, + scale, scalesize, tmp); if (*wcp++ != L'0') significant = 1; else if (significant == 0) @@ -951,7 +952,8 @@ ___printf_fp (FILE *fp, /* Do rounding. */ wchar_t last_digit = wcp[-1] != decimalwc ? wcp[-1] : wcp[-2]; - wchar_t next_digit = hack_digit (); + wchar_t next_digit = hack_digit (expsign, type, &exponent, frac, &fracsize, + scale, scalesize, tmp); bool more_bits; if (next_digit != L'0' && next_digit != L'5') more_bits = true;