Message ID | CADnVucB50UQqXTqn+ODx0QQ-5rFY15tM_iSart4nWXbbrW2zGQ@mail.gmail.com |
---|---|
State | New |
Headers | show |
ping? On 22 May 2014 11:08, Charles Baylis <charles.baylis@linaro.org> wrote: > On 1 May 2014 16:41, Richard Earnshaw <rearnsha@arm.com> wrote: >> I think really, you've got three independent changes here: > > Version 2 of this patch series is now a 9 patch series which addresses > most of the following. Exceptions discussed below. > >> 1) Optimize the prologue/epilogue sequences when ldrd is available. >> 2) Replace the call to __gnu_ldivmod_helper with __udivmoddi4 > > I assume you mean __gnu_uldivmod_helper here, as __gnu_ldivmod_helper > performs signed division and can't be directly replaced with the > unsigned division performed by __udivmoddi4. > >> 3) Optimize the code to __aeabi_ldivmod. > > Converting to call __udivmoddi4, fixing up signedness of operands and > results and optimisation are all one change. > >> Ideally, therefore, this is a three patch series, but it's then missing >> a few bits. >> >> 4) Step 2 can also be trivially applied to bpabi-v6m.S as well, since >> it's a direct swap of one function for another (unless I've misread the >> changes, I think the ABI of the two helper functions are the same). > > For __aeabi_uldivmod this is true. For __aeabi_ldivmod this is not > trivial as the signedness fix-ups must be written. > >> 5) Step 4 then makes __gnu_ldivmod_helper in bpabi.c a dead function >> which can be deleted. This is good because currently pulling in either >> 64-bit division function causes both these helper functions to be pulled >> in and thus the whole of the 64-bit div-mod code for both signed and >> unsigned values. That's particularly unfortunate for ARMv6m class >> devices as that's potentially a lot of redundant code. > > Similarly, __gnu_uldivmod_helper not __gnu_ldivmod_helper. > > I've included two patches which do the trivial steps for the unsigned case. > >> >> Finally, I know this was the original code, but the complete lack of >> comments in this code made reviewing even the trivial parts a complete >> nightmare -- it took me half an hour before I remembered that >> __udivmoddi4 took three parameters, the third of which was on the stack: >> thus the messing around with sp/ip in the prologue wasn't just trivial >> padding but a necessary part of the function. Please could you add, at >> least some short comments clarifying the register disposition on input >> and what that prologue code is up to... > > Done. > >> Finally, how was this code tested? > > It has been built and "make check" has been run with no regressions on: > arm-unknown-linux-gnueabihf --with-mode=thumb --with-arch=armv7-a > arm-unknown-linux-gnueabihf --with-mode=arm --with-arch=armv7-a > arm-unknown-linux-gnueabi --with-mode=arm --with-arch=armv5te > arm-unknown-linux-gnueabi --with-mode=arm --with-arch=armv4t > > I have also run a simple test harness which checks the result of > several 64 bit division operations where gcc has been built with the > above configurations. > > I am not currently set up with a way to test v6M, so those parts aren't tested. > >> Anyway, some additional comments below: >> >> Don't repeat the function name for multiple tweaks to the same function; >> as mentioned above, if these are really separate changes they should be >> in separate submissions. Mixing unrelated changes just makes the >> reviewing step that much harder. > > Done. > > >>> + strd ip,lr, [sp, #-16]! >> >> Space after comma. > > Done > >> Also, since you've essentially rewritten the entire function, can you >> please also reformat them to follow the coding style of the rest of the >> file: namely "<tab>OP<tab>operands". > > Done > >>> #else >>> + sub sp, sp, #8 >>> do_push {sp, lr} >>> #endif >> >> Please add a comment that the value at *sp is the address of the the >> slot for the remainder. > > Done >>> +#if defined(__thumb2__) && CAN_USE_LDRD >>> + sub ip, sp, #8 >>> + strd ip,lr, [sp, #-16]! >> >> Space after comma. > > Done > >>> #else >>> + sub sp, sp, #8 >>> do_push {sp, lr} >>> #endif >>> + cmp xxh, #0 >>> + blt 1f >>> + cmp yyh, #0 >>> + blt 2f >>> + >>> +98: cfi_push 98b - __aeabi_ldivmod, 0xe, -0xc, 0x10 >> >> The CFI push should really precede your conditional tests, it relates to >> the do_push expression. > > Done. > >>> + bl SYM(__udivmoddi4) __PLT__ >>> + ldr lr, [sp, #4] >>> +#if CAN_USE_LDRD >>> + ldrd r2, r3, [sp, #8] >>> + add sp, sp, #16 >>> +#else >>> + add sp, sp, #8 >>> + do_pop {r2, r3} >>> +#endif >> >> You're missing a CFI pop, which is needed when the values on the stack >> go out of scope. > > The existing code doesn't do this. Since there are multiple exit > points from the optimised function the existing cfi_* macros aren't > sufficient (there is no cfi_save_state/cfi_restore_state), so I have > included a patch which uses the gas .cfi_* directives. This may be > interesting on non-DWARF or non-ELF platforms, if any are still > supported . > >>> + RET >>> +1: /* xxh:xxl is negative */ >>> + rsbs xxl, xxl, #0 >> >> We're using unified syntax, so NEGS is preferable. > > Done > >>> + sbc xxh, xxh, xxh, lsl #1 >> >> Worthy of a comment, Thumb2 has no RSC instruction, so use X - 2X. > > Done > >>> + cmp yyh, #0 >>> + blt 3f >>> +98: cfi_push 98b - __aeabi_ldivmod, 0xe, -0xc, 0x10 >> >> This CFI push looks wrong. You've already pushed things earlier. On >> the other hand, you should save the state before the CFI pop above, so >> that you can restore the state again for the next (ie this) block of code. > > Done (see above) > >>> +98: cfi_push 98b - __aeabi_ldivmod, 0xe, -0xc, 0x10 >>> + bl SYM(__udivmoddi4) __PLT__ >>> + ldr lr, [sp, #4] >>> +#if CAN_USE_LDRD >>> + ldrd r2, r3, [sp, #8] >>> + add sp, sp, #16 >>> +#else >>> + add sp, sp, #8 >>> + do_pop {r2, r3} >>> +#endif >>> + rsbs yyl, yyl, #0 >>> + sbc yyh, yyh, yyh, lsl #1 >>> + RET >>> >>> #endif /* L_aeabi_ldivmod */ >>> >> >> You use the LDRD vs do_pop sequence identically several times. To avoid >> a lot of ifdefs, it might be worth considering a macro for this idiom to >> reduce the overall amount of conditionalized code. > > Done. > > > The updated patch series is attached. Hopefully, patches 1 through 6 > are now ready. Patches 7 through 9 can be dropped if necessary. > > > > > 0001-Whitespace.patch > > 2014-05-22 Charles Baylis <charles.baylis@linaro.org> > > * config/arm/bpabi.S (__aeabi_uldivmod): Fix whitespace. > (__aeabi_ldivmod): Fix whitespace. > > > > 0002-Add-comments.patch > > 2014-05-22 Charles Baylis <charles.baylis@linaro.org> > > * config/arm/bpabi.S (__aeabi_uldivmod, __aeabi_ldivmod): Add comment > describing register usage on function entry and exit. > > > > 0003-Optimise-__aeabi_uldivmod-stack-manipulation.patch > > 2014-05-22 Charles Baylis <charles.baylis@linaro.org> > > * config/arm/bpabi.S (__aeabi_uldivmod): Optimise stack pointer > manipulation. > > > > 0004-Optimise-__aeabi_uldivmod.patch > > 2014-05-22 Charles Baylis <charles.baylis@linaro.org> > > * config/arm/bpabi.S (__aeabi_uldivmod): Perform division using call > to __udivmoddi4. > > > > 0005-Optimise-__aeabi_ldivmod-stack-manipulation.patch > > 2014-05-22 Charles Baylis <charles.baylis@linaro.org> > > * config/arm/bpabi.S (__aeabi_ldivmod): Optimise stack manipulation. > > > > 0006-Optimise-__aeabi_ldivmod.patch > > 2014-05-22 Charles Baylis <charles.baylis@linaro.org> > > * config/arm/bpabi.S (__aeabi_ldivmod): Perform division using > __udivmoddi4, and fixups for negative operands. > > > > 0007-Fix-cfi-annotations.patch > > 2014-05-22 Charles Baylis <charles.baylis@linaro.org> > > * config/arm/bpabi.S (__aeabi_ldivmod, __aeabi_uldivmod, > push_for_divide, pop_for_divide): Use .cfi_* directives for DWARF > annotations. Fix DWARF information. > > > > 0008-Use-__udivmoddi4-for-v6M-aeabi_uldivmod.patch > > 2014-05-22 Charles Baylis <charles.baylis@linaro.org> > > * config/arm/bpabi-v6m.S (__aeabi_uldivmod): Perform division using > __udivmoddi4. > > > > 0009-Remove-__gnu_uldivmod_helper.patch > > 2014-05-22 Charles Baylis <charles.baylis@linaro.org> > > * config/arm/bpabi.c (__gnu_uldivmod_helper): Remove.
On 11/06/14 10:30, Charles Baylis wrote: > ping? > Sorry, can you resend this as a series of mails, with one patch per mail. R. > On 22 May 2014 11:08, Charles Baylis <charles.baylis@linaro.org> wrote: >> On 1 May 2014 16:41, Richard Earnshaw <rearnsha@arm.com> wrote: >>> I think really, you've got three independent changes here: >> >> Version 2 of this patch series is now a 9 patch series which addresses >> most of the following. Exceptions discussed below. >> >>> 1) Optimize the prologue/epilogue sequences when ldrd is available. >>> 2) Replace the call to __gnu_ldivmod_helper with __udivmoddi4 >> >> I assume you mean __gnu_uldivmod_helper here, as __gnu_ldivmod_helper >> performs signed division and can't be directly replaced with the >> unsigned division performed by __udivmoddi4. >> >>> 3) Optimize the code to __aeabi_ldivmod. >> >> Converting to call __udivmoddi4, fixing up signedness of operands and >> results and optimisation are all one change. >> >>> Ideally, therefore, this is a three patch series, but it's then missing >>> a few bits. >>> >>> 4) Step 2 can also be trivially applied to bpabi-v6m.S as well, since >>> it's a direct swap of one function for another (unless I've misread the >>> changes, I think the ABI of the two helper functions are the same). >> >> For __aeabi_uldivmod this is true. For __aeabi_ldivmod this is not >> trivial as the signedness fix-ups must be written. >> >>> 5) Step 4 then makes __gnu_ldivmod_helper in bpabi.c a dead function >>> which can be deleted. This is good because currently pulling in either >>> 64-bit division function causes both these helper functions to be pulled >>> in and thus the whole of the 64-bit div-mod code for both signed and >>> unsigned values. That's particularly unfortunate for ARMv6m class >>> devices as that's potentially a lot of redundant code. >> >> Similarly, __gnu_uldivmod_helper not __gnu_ldivmod_helper. >> >> I've included two patches which do the trivial steps for the unsigned case. >> >>> >>> Finally, I know this was the original code, but the complete lack of >>> comments in this code made reviewing even the trivial parts a complete >>> nightmare -- it took me half an hour before I remembered that >>> __udivmoddi4 took three parameters, the third of which was on the stack: >>> thus the messing around with sp/ip in the prologue wasn't just trivial >>> padding but a necessary part of the function. Please could you add, at >>> least some short comments clarifying the register disposition on input >>> and what that prologue code is up to... >> >> Done. >> >>> Finally, how was this code tested? >> >> It has been built and "make check" has been run with no regressions on: >> arm-unknown-linux-gnueabihf --with-mode=thumb --with-arch=armv7-a >> arm-unknown-linux-gnueabihf --with-mode=arm --with-arch=armv7-a >> arm-unknown-linux-gnueabi --with-mode=arm --with-arch=armv5te >> arm-unknown-linux-gnueabi --with-mode=arm --with-arch=armv4t >> >> I have also run a simple test harness which checks the result of >> several 64 bit division operations where gcc has been built with the >> above configurations. >> >> I am not currently set up with a way to test v6M, so those parts aren't tested. >> >>> Anyway, some additional comments below: >>> >>> Don't repeat the function name for multiple tweaks to the same function; >>> as mentioned above, if these are really separate changes they should be >>> in separate submissions. Mixing unrelated changes just makes the >>> reviewing step that much harder. >> >> Done. >> >> >>>> + strd ip,lr, [sp, #-16]! >>> >>> Space after comma. >> >> Done >> >>> Also, since you've essentially rewritten the entire function, can you >>> please also reformat them to follow the coding style of the rest of the >>> file: namely "<tab>OP<tab>operands". >> >> Done >> >>>> #else >>>> + sub sp, sp, #8 >>>> do_push {sp, lr} >>>> #endif >>> >>> Please add a comment that the value at *sp is the address of the the >>> slot for the remainder. >> >> Done >>>> +#if defined(__thumb2__) && CAN_USE_LDRD >>>> + sub ip, sp, #8 >>>> + strd ip,lr, [sp, #-16]! >>> >>> Space after comma. >> >> Done >> >>>> #else >>>> + sub sp, sp, #8 >>>> do_push {sp, lr} >>>> #endif >>>> + cmp xxh, #0 >>>> + blt 1f >>>> + cmp yyh, #0 >>>> + blt 2f >>>> + >>>> +98: cfi_push 98b - __aeabi_ldivmod, 0xe, -0xc, 0x10 >>> >>> The CFI push should really precede your conditional tests, it relates to >>> the do_push expression. >> >> Done. >> >>>> + bl SYM(__udivmoddi4) __PLT__ >>>> + ldr lr, [sp, #4] >>>> +#if CAN_USE_LDRD >>>> + ldrd r2, r3, [sp, #8] >>>> + add sp, sp, #16 >>>> +#else >>>> + add sp, sp, #8 >>>> + do_pop {r2, r3} >>>> +#endif >>> >>> You're missing a CFI pop, which is needed when the values on the stack >>> go out of scope. >> >> The existing code doesn't do this. Since there are multiple exit >> points from the optimised function the existing cfi_* macros aren't >> sufficient (there is no cfi_save_state/cfi_restore_state), so I have >> included a patch which uses the gas .cfi_* directives. This may be >> interesting on non-DWARF or non-ELF platforms, if any are still >> supported . >> >>>> + RET >>>> +1: /* xxh:xxl is negative */ >>>> + rsbs xxl, xxl, #0 >>> >>> We're using unified syntax, so NEGS is preferable. >> >> Done >> >>>> + sbc xxh, xxh, xxh, lsl #1 >>> >>> Worthy of a comment, Thumb2 has no RSC instruction, so use X - 2X. >> >> Done >> >>>> + cmp yyh, #0 >>>> + blt 3f >>>> +98: cfi_push 98b - __aeabi_ldivmod, 0xe, -0xc, 0x10 >>> >>> This CFI push looks wrong. You've already pushed things earlier. On >>> the other hand, you should save the state before the CFI pop above, so >>> that you can restore the state again for the next (ie this) block of code. >> >> Done (see above) >> >>>> +98: cfi_push 98b - __aeabi_ldivmod, 0xe, -0xc, 0x10 >>>> + bl SYM(__udivmoddi4) __PLT__ >>>> + ldr lr, [sp, #4] >>>> +#if CAN_USE_LDRD >>>> + ldrd r2, r3, [sp, #8] >>>> + add sp, sp, #16 >>>> +#else >>>> + add sp, sp, #8 >>>> + do_pop {r2, r3} >>>> +#endif >>>> + rsbs yyl, yyl, #0 >>>> + sbc yyh, yyh, yyh, lsl #1 >>>> + RET >>>> >>>> #endif /* L_aeabi_ldivmod */ >>>> >>> >>> You use the LDRD vs do_pop sequence identically several times. To avoid >>> a lot of ifdefs, it might be worth considering a macro for this idiom to >>> reduce the overall amount of conditionalized code. >> >> Done. >> >> >> The updated patch series is attached. Hopefully, patches 1 through 6 >> are now ready. Patches 7 through 9 can be dropped if necessary. >> >> >> >> >> 0001-Whitespace.patch >> >> 2014-05-22 Charles Baylis <charles.baylis@linaro.org> >> >> * config/arm/bpabi.S (__aeabi_uldivmod): Fix whitespace. >> (__aeabi_ldivmod): Fix whitespace. >> >> >> >> 0002-Add-comments.patch >> >> 2014-05-22 Charles Baylis <charles.baylis@linaro.org> >> >> * config/arm/bpabi.S (__aeabi_uldivmod, __aeabi_ldivmod): Add comment >> describing register usage on function entry and exit. >> >> >> >> 0003-Optimise-__aeabi_uldivmod-stack-manipulation.patch >> >> 2014-05-22 Charles Baylis <charles.baylis@linaro.org> >> >> * config/arm/bpabi.S (__aeabi_uldivmod): Optimise stack pointer >> manipulation. >> >> >> >> 0004-Optimise-__aeabi_uldivmod.patch >> >> 2014-05-22 Charles Baylis <charles.baylis@linaro.org> >> >> * config/arm/bpabi.S (__aeabi_uldivmod): Perform division using call >> to __udivmoddi4. >> >> >> >> 0005-Optimise-__aeabi_ldivmod-stack-manipulation.patch >> >> 2014-05-22 Charles Baylis <charles.baylis@linaro.org> >> >> * config/arm/bpabi.S (__aeabi_ldivmod): Optimise stack manipulation. >> >> >> >> 0006-Optimise-__aeabi_ldivmod.patch >> >> 2014-05-22 Charles Baylis <charles.baylis@linaro.org> >> >> * config/arm/bpabi.S (__aeabi_ldivmod): Perform division using >> __udivmoddi4, and fixups for negative operands. >> >> >> >> 0007-Fix-cfi-annotations.patch >> >> 2014-05-22 Charles Baylis <charles.baylis@linaro.org> >> >> * config/arm/bpabi.S (__aeabi_ldivmod, __aeabi_uldivmod, >> push_for_divide, pop_for_divide): Use .cfi_* directives for DWARF >> annotations. Fix DWARF information. >> >> >> >> 0008-Use-__udivmoddi4-for-v6M-aeabi_uldivmod.patch >> >> 2014-05-22 Charles Baylis <charles.baylis@linaro.org> >> >> * config/arm/bpabi-v6m.S (__aeabi_uldivmod): Perform division using >> __udivmoddi4. >> >> >> >> 0009-Remove-__gnu_uldivmod_helper.patch >> >> 2014-05-22 Charles Baylis <charles.baylis@linaro.org> >> >> * config/arm/bpabi.c (__gnu_uldivmod_helper): Remove. >
From 540a4016a32953cb37dbe2f7ea1ec3e1cf0bfdcf Mon Sep 17 00:00:00 2001 From: Charles Baylis <charles.baylis@linaro.org> Date: Mon, 12 May 2014 18:58:04 +0100 Subject: [PATCH 9/9] Remove __gnu_uldivmod_helper 2014-05-12 Charles Baylis <charles.baylis@linaro.org> * config/arm/bpabi.c (__gnu_uldivmod_helper): Remove. --- libgcc/config/arm/bpabi.c | 14 -------------- 1 file changed, 14 deletions(-) diff --git a/libgcc/config/arm/bpabi.c b/libgcc/config/arm/bpabi.c index 7b155cc..e90d044 100644 --- a/libgcc/config/arm/bpabi.c +++ b/libgcc/config/arm/bpabi.c @@ -26,9 +26,6 @@ extern long long __divdi3 (long long, long long); extern unsigned long long __udivdi3 (unsigned long long, unsigned long long); extern long long __gnu_ldivmod_helper (long long, long long, long long *); -extern unsigned long long __gnu_uldivmod_helper (unsigned long long, - unsigned long long, - unsigned long long *); long long @@ -43,14 +40,3 @@ __gnu_ldivmod_helper (long long a, return quotient; } -unsigned long long -__gnu_uldivmod_helper (unsigned long long a, - unsigned long long b, - unsigned long long *remainder) -{ - unsigned long long quotient; - - quotient = __udivdi3 (a, b); - *remainder = a - b * quotient; - return quotient; -} -- 1.9.1