Message ID | 78bc7fef-20cd-4161-905c-6b93582974ec@gjlay.de |
---|---|
State | New |
Headers | show |
Series | [avr] PR87376: Disable -ftree-ter | expand |
On Tue, Jul 2, 2024 at 3:43 PM Georg-Johann Lay <avr@gjlay.de> wrote: > > Hi Jeff, > > This is a patch to get correct code out of 64-bit > loads from address-space __memx. > > The AVR address-spaces may require that move insns issue > calls to library support functions, a fact that -ftree-ter > doesn't account for. tree-ssa-ter.cc then replaces an > expression across such a library call, resulting in wrong code. > > This patch disables that pass per default on avr, as there is no > more fine grained way to avoid malicious optimizations. > The pass can still be re-enabled by means of explicit -ftree-ter. > > Ok to apply? I think this requires more details on what goes wrong - I assume it's not stmt reordering that effectively happens but recursive expand_expr on SSA defs when those invoke libcalls? In that case this would point to a deeper issue. So - if the wrongness is already apparent in the RTL expansion pass dump can you quote the respective pieces and explain why? > As an alternative, the option could be disabled permanently in > avr.cc::avr_option_override(). > > Johann > > -- > > AVR: middle-end/87376 - Use -fno-tree-ter per default. > > Temporary expression replacement might replace expressions across > library calls, for example with move insn from address-space __memx > like in PR87376. -ftree-ter has no way where the backend could hook > in to avoid only problematic replacements, thus kick it out altogether. > > PR middle-end/87376 > gcc/ > * common/config/avr/avr-common.cc (avr_option_optimization_table) > <OPT_ftree_ter>: Set to 0. > gcc/testsuite/ > * gcc.target/avr/torture/pr87376-memx.c: New test.
...just noticed that disabling -ftree-ter would also fix PR53049 where it moves a volatile asm across a library call. Am 02.07.24 um 15:42 schrieb Georg-Johann Lay: > Hi Jeff, > > This is a patch to get correct code out of 64-bit > loads from address-space __memx. > > The AVR address-spaces may require that move insns issue > calls to library support functions, a fact that -ftree-ter > doesn't account for. tree-ssa-ter.cc then replaces an > expression across such a library call, resulting in wrong code. > > This patch disables that pass per default on avr, as there is no > more fine grained way to avoid malicious optimizations. > The pass can still be re-enabled by means of explicit -ftree-ter. > > Ok to apply? > > As an alternative, the option could be disabled permanently in > avr.cc::avr_option_override(). > > Johann > > -- > > AVR: middle-end/87376 - Use -fno-tree-ter per default. > > Temporary expression replacement might replace expressions across > library calls, for example with move insn from address-space __memx > like in PR87376. -ftree-ter has no way where the backend could hook > in to avoid only problematic replacements, thus kick it out altogether. > > PR middle-end/87376 > gcc/ > * common/config/avr/avr-common.cc (avr_option_optimization_table) > <OPT_ftree_ter>: Set to 0. > gcc/testsuite/ > * gcc.target/avr/torture/pr87376-memx.c: New test.
On Tue, Jul 2, 2024 at 3:52 PM Georg-Johann Lay <avr@gjlay.de> wrote: > > ...just noticed that disabling -ftree-ter would also fix PR53049 > where it moves a volatile asm across a library call. As explained in the PR this only avoids the issue by accident (and with the other comment I'm trying to make sure this is not another such case). You can always use -O0 (disable everything) to "fix" bugs, but that's of course not a fix. Disabling TER might hide a bug in TER but it might just do things differently and _not_ fix or hide a bug. Richard. > > Am 02.07.24 um 15:42 schrieb Georg-Johann Lay: > > Hi Jeff, > > > > This is a patch to get correct code out of 64-bit > > loads from address-space __memx. > > > > The AVR address-spaces may require that move insns issue > > calls to library support functions, a fact that -ftree-ter > > doesn't account for. tree-ssa-ter.cc then replaces an > > expression across such a library call, resulting in wrong code. > > > > This patch disables that pass per default on avr, as there is no > > more fine grained way to avoid malicious optimizations. > > The pass can still be re-enabled by means of explicit -ftree-ter. > > > > Ok to apply? > > > > As an alternative, the option could be disabled permanently in > > avr.cc::avr_option_override(). > > > > Johann > > > > -- > > > > AVR: middle-end/87376 - Use -fno-tree-ter per default. > > > > Temporary expression replacement might replace expressions across > > library calls, for example with move insn from address-space __memx > > like in PR87376. -ftree-ter has no way where the backend could hook > > in to avoid only problematic replacements, thus kick it out altogether. > > > > PR middle-end/87376 > > gcc/ > > * common/config/avr/avr-common.cc (avr_option_optimization_table) > > <OPT_ftree_ter>: Set to 0. > > gcc/testsuite/ > > * gcc.target/avr/torture/pr87376-memx.c: New test.
On 7/2/24 7:57 AM, Richard Biener wrote: > On Tue, Jul 2, 2024 at 3:52 PM Georg-Johann Lay <avr@gjlay.de> wrote: >> >> ...just noticed that disabling -ftree-ter would also fix PR53049 >> where it moves a volatile asm across a library call. > > As explained in the PR this only avoids the issue by accident (and > with the other comment > I'm trying to make sure this is not another such case). > > You can always use -O0 (disable everything) to "fix" bugs, but that's > of course not a fix. Disabling TER might hide a bug in TER but it > might just do things differently and _not_ fix or hide a bug. Agreed. It's most likely just papering over the problem. Based on what I quickly read in the PR, most likely a problem in the AVR backend. jeff
Am 02.07.24 um 15:48 schrieb Richard Biener: > On Tue, Jul 2, 2024 at 3:43 PM Georg-Johann Lay <avr@gjlay.de> wrote: >> >> Hi Jeff, >> >> This is a patch to get correct code out of 64-bit >> loads from address-space __memx. >> >> The AVR address-spaces may require that move insns issue >> calls to library support functions, a fact that -ftree-ter >> doesn't account for. tree-ssa-ter.cc then replaces an >> expression across such a library call, resulting in wrong code. >> >> This patch disables that pass per default on avr, as there is no >> more fine grained way to avoid malicious optimizations. >> The pass can still be re-enabled by means of explicit -ftree-ter. >> >> Ok to apply? > > I think this requires more details on what goes wrong - I assume > it's not stmt reordering that effectively happens but recursive > expand_expr on SSA defs when those invoke libcalls? In that > case this would point to a deeper issue. The difference is that with TER, we get a hard reg in .expand for a movdi from 24-bit address-space __memx. Such moves require library calls, which in turn require specific hard registers. As avr backend has no movdi, the moddi gets expanded as 8 * movqi, and that does not work when the target registers are hard regs, as some of them are clobbered by the libcalls. Moreover, even with TER, the code is no more efficient than without it, so it's not clear what's the point in propagating hard regs into expander operands. Later passes like fwprop1 and combine can do that, too. Requiring libcalls in a mov insn is quite special indeed, and there is no way to tell that to TER. TER itself does not optimize code involving libcalls, so it knows they are fragile. > So - if the wrongness is already apparent in the RTL expansion > pass dump can you quote the respective pieces and explain why? It expand a 64-bit move from __memx address-space to registers R18...R25. This is broken into 8 QI moves to these regs, but the movqi requires a libcall in some situations, which pass their arguments in R21...R25. Hence the libcalls clobber some of the destination regs. It would already help when TER would not propagate hard-regs into expander operands. Johann >> As an alternative, the option could be disabled permanently in >> avr.cc::avr_option_override(). >> >> Johann >> >> -- >> >> AVR: middle-end/87376 - Use -fno-tree-ter per default. >> >> Temporary expression replacement might replace expressions across >> library calls, for example with move insn from address-space __memx >> like in PR87376. -ftree-ter has no way where the backend could hook >> in to avoid only problematic replacements, thus kick it out altogether. >> >> PR middle-end/87376 >> gcc/ >> * common/config/avr/avr-common.cc (avr_option_optimization_table) >> <OPT_ftree_ter>: Set to 0. >> gcc/testsuite/ >> * gcc.target/avr/torture/pr87376-memx.c: New test.
On 7/3/24 1:26 PM, Georg-Johann Lay wrote: > > > Am 02.07.24 um 15:48 schrieb Richard Biener: >> On Tue, Jul 2, 2024 at 3:43 PM Georg-Johann Lay <avr@gjlay.de> wrote: >>> >>> Hi Jeff, >>> >>> This is a patch to get correct code out of 64-bit >>> loads from address-space __memx. >>> >>> The AVR address-spaces may require that move insns issue >>> calls to library support functions, a fact that -ftree-ter >>> doesn't account for. tree-ssa-ter.cc then replaces an >>> expression across such a library call, resulting in wrong code. >>> >>> This patch disables that pass per default on avr, as there is no >>> more fine grained way to avoid malicious optimizations. >>> The pass can still be re-enabled by means of explicit -ftree-ter. >>> >>> Ok to apply? >> >> I think this requires more details on what goes wrong - I assume >> it's not stmt reordering that effectively happens but recursive >> expand_expr on SSA defs when those invoke libcalls? In that >> case this would point to a deeper issue. > > The difference is that with TER, we get a hard reg in .expand > for a movdi from 24-bit address-space __memx. > > Such moves require library calls, which in turn require > specific hard registers. As avr backend has no movdi, the > moddi gets expanded as 8 * movqi, and that does not work > when the target registers are hard regs, as some of them > are clobbered by the libcalls. But this is something that's handled for other targets. I don't remember all the details, but there's generic code to handle this situation. Jeff
Am 03.07.24 um 21:39 schrieb Jeff Law: > > > On 7/3/24 1:26 PM, Georg-Johann Lay wrote: >> >> >> Am 02.07.24 um 15:48 schrieb Richard Biener: >>> On Tue, Jul 2, 2024 at 3:43 PM Georg-Johann Lay <avr@gjlay.de> wrote: >>>> >>>> Hi Jeff, >>>> >>>> This is a patch to get correct code out of 64-bit >>>> loads from address-space __memx. >>>> >>>> The AVR address-spaces may require that move insns issue >>>> calls to library support functions, a fact that -ftree-ter >>>> doesn't account for. tree-ssa-ter.cc then replaces an >>>> expression across such a library call, resulting in wrong code. >>>> >>>> This patch disables that pass per default on avr, as there is no >>>> more fine grained way to avoid malicious optimizations. >>>> The pass can still be re-enabled by means of explicit -ftree-ter. >>>> >>>> Ok to apply? >>> >>> I think this requires more details on what goes wrong - I assume >>> it's not stmt reordering that effectively happens but recursive >>> expand_expr on SSA defs when those invoke libcalls? In that >>> case this would point to a deeper issue. >> >> The difference is that with TER, we get a hard reg in .expand >> for a movdi from 24-bit address-space __memx. >> >> Such moves require library calls, which in turn require >> specific hard registers. As avr backend has no movdi, the >> moddi gets expanded as 8 * movqi, and that does not work >> when the target registers are hard regs, as some of them >> are clobbered by the libcalls. > But this is something that's handled for other targets. I don't > remember all the details, but there's generic code to handle this > situation. > > Jeff A libcall in a move insn? How would the middle-end know that? Johann
On Wed, Jul 3, 2024 at 9:26 PM Georg-Johann Lay <avr@gjlay.de> wrote: > > > > Am 02.07.24 um 15:48 schrieb Richard Biener: > > On Tue, Jul 2, 2024 at 3:43 PM Georg-Johann Lay <avr@gjlay.de> wrote: > >> > >> Hi Jeff, > >> > >> This is a patch to get correct code out of 64-bit > >> loads from address-space __memx. > >> > >> The AVR address-spaces may require that move insns issue > >> calls to library support functions, a fact that -ftree-ter > >> doesn't account for. tree-ssa-ter.cc then replaces an > >> expression across such a library call, resulting in wrong code. > >> > >> This patch disables that pass per default on avr, as there is no > >> more fine grained way to avoid malicious optimizations. > >> The pass can still be re-enabled by means of explicit -ftree-ter. > >> > >> Ok to apply? > > > > I think this requires more details on what goes wrong - I assume > > it's not stmt reordering that effectively happens but recursive > > expand_expr on SSA defs when those invoke libcalls? In that > > case this would point to a deeper issue. > > The difference is that with TER, we get a hard reg in .expand > for a movdi from 24-bit address-space __memx. > > Such moves require library calls, which in turn require > specific hard registers. As avr backend has no movdi, the > moddi gets expanded as 8 * movqi, and that does not work > when the target registers are hard regs, as some of them > are clobbered by the libcalls. So I see (insn 18 17 19 2 (parallel [ (set (reg:QI 22 r22 [+4 ]) (mem/u/c:QI (reg/f:PSI 55) [1 aa+4 S1 A8 AS7])) (clobber (reg:QI 22 r22)) (clobber (reg:QI 21 r21)) (clobber (reg:HI 30 r30)) ]) "t.c":12:13 38 {xloadqi_A} (nil)) (insn 19 18 20 2 (set (reg:PSI 56) (reg/f:PSI 47)) "t.c":12:13 112 {*movpsi_split} (nil)) (insn 20 19 21 2 (parallel [ (set (reg/f:PSI 57) (plus:PSI (reg/f:PSI 47) (const_int 5 [0x5]))) (clobber (scratch:QI)) ]) "t.c":12:13 205 {addpsi3} (nil)) (insn 21 20 22 2 (parallel [ (set (reg:QI 23 r23 [+5 ]) (mem/u/c:QI (reg/f:PSI 57) [1 aa+5 S1 A8 AS7])) (clobber (reg:QI 22 r22)) (clobber (reg:QI 21 r21)) (clobber (reg:HI 30 r30)) ]) "t.c":12:13 38 {xloadqi_A} (nil)) for example - insn 21 clobbers r22 which is also the destination of insn 18. With -fno-tree-ter those oddly get _no_ intermediate reg but we have (insn 9 8 10 2 (parallel [ (set (subreg:QI (reg:DI 43 [ aa.0_1 ]) 1) (mem/u/c:QI (reg/f:PSI 48) [1 aa+1 S1 A8 AS7])) (clobber (reg:QI 22 r22)) (clobber (reg:QI 21 r21)) (clobber (reg:HI 30 r30)) ]) "t.c":12:13 38 {xloadqi_A} (nil)) but since on GIMPLE we have DImode loads I don't see how TER comes into play here - TER should favor the second code generation, not the first ... (or TER shouldn't play into here at all). with -fno-tree-ter we come via #0 expand_expr_real (exp=<var_decl 0x7ffff7162000 aa>, target=0x7ffff716c9a8, tmode=E_DImode, modifier=EXPAND_NORMAL, alt_rtl=0x7fffffffcff8, inner_reference_p=false) at /space/rguenther/src/gcc/gcc/expr.cc:9433 #1 0x000000000109fe63 in store_expr (exp=<var_decl 0x7ffff7162000 aa>, target=0x7ffff716c9a8, call_param_p=0, nontemporal=false, reverse=false) at /space/rguenther/src/gcc/gcc/expr.cc:6740 #2 0x000000000109e626 in expand_assignment (to=<ssa_name 0x7ffff713f678 1>, from=<var_decl 0x7ffff7162000 aa>, nontemporal=false) at /space/rguenther/src/gcc/gcc/expr.cc:6461 while with TER we instead have #0 expand_expr_real (exp=<var_decl 0x7ffff7162000 aa>, target=0x0, tmode=E_VOIDmode, modifier=EXPAND_NORMAL, alt_rtl=0x0, inner_reference_p=false) at /space/rguenther/src/gcc/gcc/expr.cc:9433 #1 0x00000000010b279f in expand_expr_real_gassign (g=0x7ffff71613c0, target=0x0, tmode=E_VOIDmode, modifier=EXPAND_NORMAL, alt_rtl=0x0, inner_reference_p=false) at /space/rguenther/src/gcc/gcc/expr.cc:11100 #2 0x00000000010b3294 in expand_expr_real_1 (exp=<ssa_name 0x7ffff713f678 1>, target=0x0, tmode=E_VOIDmode, modifier=EXPAND_NORMAL, alt_rtl=0x0, inner_reference_p=false) at /space/rguenther/src/gcc/gcc/expr.cc:11278 the difference is -fno-tree-ter has a target (reg:DI 43 [...]) but with TER we are not passing a target or a mode. I think the issue is that the expansion at some point doesn't expect the result to end up in a hard register. Maybe define_expand are not supposed to do that but maybe expansion needs to fix up. A first thought was diff --git a/gcc/expr.cc b/gcc/expr.cc index ffbac513692..1509acad02e 100644 --- a/gcc/expr.cc +++ b/gcc/expr.cc @@ -11111,6 +11111,12 @@ expand_expr_real_gassign (gassign *g, rtx target, machine_mode tmode, gcc_unreachable (); } set_curr_insn_location (saved_loc); + if (!target && REG_P (r) && REGNO (r) < FIRST_PSEUDO_REGISTER) + { + rtx tem = gen_reg_rtx (GET_MODE (r)); + emit_move_insn (tem, r); + r = tem; + } if (REG_P (r) && !REG_EXPR (r)) set_reg_attrs_for_decl_rtl (lhs, r); return r; but of course that's not the place to fix - this sees (mem/u/c:DI (reg/f:PSI 47) [1 aa+0 S8 A8 AS7]) as result and things go wrong somewhere in the chain of expanding things from the return, possibly at the point of expanding the plus and there possibly when building subregs of the DImode mem. You'd have to trace that down but the fix in the end is to do sth like the above or alternatively, in the expander producing the 'xload' make sure to not allow a hardreg as destination when you can still create pseudos? Richard. > Moreover, even with TER, the code is no more efficient than > without it, so it's not clear what's the point in propagating > hard regs into expander operands. Later passes like fwprop1 and > combine can do that, too. > > Requiring libcalls in a mov insn is quite special indeed, > and there is no way to tell that to TER. TER itself does not > optimize code involving libcalls, so it knows they are fragile. > > > So - if the wrongness is already apparent in the RTL expansion > > pass dump can you quote the respective pieces and explain why? > > It expand a 64-bit move from __memx address-space to registers > R18...R25. This is broken into 8 QI moves to these regs, but > the movqi requires a libcall in some situations, which pass their > arguments in R21...R25. Hence the libcalls clobber some of the > destination regs. > > It would already help when TER would not propagate hard-regs into > expander operands. > > Johann > > >> As an alternative, the option could be disabled permanently in > >> avr.cc::avr_option_override(). > >> > >> Johann > >> > >> -- > >> > >> AVR: middle-end/87376 - Use -fno-tree-ter per default. > >> > >> Temporary expression replacement might replace expressions across > >> library calls, for example with move insn from address-space __memx > >> like in PR87376. -ftree-ter has no way where the backend could hook > >> in to avoid only problematic replacements, thus kick it out altogether. > >> > >> PR middle-end/87376 > >> gcc/ > >> * common/config/avr/avr-common.cc (avr_option_optimization_table) > >> <OPT_ftree_ter>: Set to 0. > >> gcc/testsuite/ > >> * gcc.target/avr/torture/pr87376-memx.c: New test.
On Thu, Jul 4, 2024 at 11:24 AM Richard Biener <richard.guenther@gmail.com> wrote: > > On Wed, Jul 3, 2024 at 9:26 PM Georg-Johann Lay <avr@gjlay.de> wrote: > > > > > > > > Am 02.07.24 um 15:48 schrieb Richard Biener: > > > On Tue, Jul 2, 2024 at 3:43 PM Georg-Johann Lay <avr@gjlay.de> wrote: > > >> > > >> Hi Jeff, > > >> > > >> This is a patch to get correct code out of 64-bit > > >> loads from address-space __memx. > > >> > > >> The AVR address-spaces may require that move insns issue > > >> calls to library support functions, a fact that -ftree-ter > > >> doesn't account for. tree-ssa-ter.cc then replaces an > > >> expression across such a library call, resulting in wrong code. > > >> > > >> This patch disables that pass per default on avr, as there is no > > >> more fine grained way to avoid malicious optimizations. > > >> The pass can still be re-enabled by means of explicit -ftree-ter. > > >> > > >> Ok to apply? > > > > > > I think this requires more details on what goes wrong - I assume > > > it's not stmt reordering that effectively happens but recursive > > > expand_expr on SSA defs when those invoke libcalls? In that > > > case this would point to a deeper issue. > > > > The difference is that with TER, we get a hard reg in .expand > > for a movdi from 24-bit address-space __memx. > > > > Such moves require library calls, which in turn require > > specific hard registers. As avr backend has no movdi, the > > moddi gets expanded as 8 * movqi, and that does not work > > when the target registers are hard regs, as some of them > > are clobbered by the libcalls. > > So I see > > (insn 18 17 19 2 (parallel [ > (set (reg:QI 22 r22 [+4 ]) > (mem/u/c:QI (reg/f:PSI 55) [1 aa+4 S1 A8 AS7])) > (clobber (reg:QI 22 r22)) > (clobber (reg:QI 21 r21)) > (clobber (reg:HI 30 r30)) > ]) "t.c":12:13 38 {xloadqi_A} > (nil)) > (insn 19 18 20 2 (set (reg:PSI 56) > (reg/f:PSI 47)) "t.c":12:13 112 {*movpsi_split} > (nil)) > (insn 20 19 21 2 (parallel [ > (set (reg/f:PSI 57) > (plus:PSI (reg/f:PSI 47) > (const_int 5 [0x5]))) > (clobber (scratch:QI)) > ]) "t.c":12:13 205 {addpsi3} > (nil)) > (insn 21 20 22 2 (parallel [ > (set (reg:QI 23 r23 [+5 ]) > (mem/u/c:QI (reg/f:PSI 57) [1 aa+5 S1 A8 AS7])) > (clobber (reg:QI 22 r22)) > (clobber (reg:QI 21 r21)) > (clobber (reg:HI 30 r30)) > ]) "t.c":12:13 38 {xloadqi_A} > (nil)) > > for example - insn 21 clobbers r22 which is also the destination of insn 18. > > With -fno-tree-ter those oddly get _no_ intermediate reg but we have > > (insn 9 8 10 2 (parallel [ > (set (subreg:QI (reg:DI 43 [ aa.0_1 ]) 1) > (mem/u/c:QI (reg/f:PSI 48) [1 aa+1 S1 A8 AS7])) > (clobber (reg:QI 22 r22)) > (clobber (reg:QI 21 r21)) > (clobber (reg:HI 30 r30)) > ]) "t.c":12:13 38 {xloadqi_A} > (nil)) > > but since on GIMPLE we have DImode loads I don't see how TER comes into > play here - TER should favor the second code generation, not the first ... > (or TER shouldn't play into here at all). > > with -fno-tree-ter we come via > > #0 expand_expr_real (exp=<var_decl 0x7ffff7162000 aa>, target=0x7ffff716c9a8, > tmode=E_DImode, modifier=EXPAND_NORMAL, alt_rtl=0x7fffffffcff8, > inner_reference_p=false) at /space/rguenther/src/gcc/gcc/expr.cc:9433 > #1 0x000000000109fe63 in store_expr (exp=<var_decl 0x7ffff7162000 aa>, > target=0x7ffff716c9a8, call_param_p=0, nontemporal=false, reverse=false) > at /space/rguenther/src/gcc/gcc/expr.cc:6740 > #2 0x000000000109e626 in expand_assignment (to=<ssa_name 0x7ffff713f678 1>, > from=<var_decl 0x7ffff7162000 aa>, nontemporal=false) > at /space/rguenther/src/gcc/gcc/expr.cc:6461 > > while with TER we instead have > > #0 expand_expr_real (exp=<var_decl 0x7ffff7162000 aa>, target=0x0, > tmode=E_VOIDmode, modifier=EXPAND_NORMAL, alt_rtl=0x0, > inner_reference_p=false) at /space/rguenther/src/gcc/gcc/expr.cc:9433 > #1 0x00000000010b279f in expand_expr_real_gassign (g=0x7ffff71613c0, > target=0x0, tmode=E_VOIDmode, modifier=EXPAND_NORMAL, alt_rtl=0x0, > inner_reference_p=false) at /space/rguenther/src/gcc/gcc/expr.cc:11100 > #2 0x00000000010b3294 in expand_expr_real_1 (exp=<ssa_name 0x7ffff713f678 1>, > target=0x0, tmode=E_VOIDmode, modifier=EXPAND_NORMAL, alt_rtl=0x0, > inner_reference_p=false) at /space/rguenther/src/gcc/gcc/expr.cc:11278 > > the difference is -fno-tree-ter has a target (reg:DI 43 [...]) but with TER we > are not passing a target or a mode. > > I think the issue is that the expansion at some point doesn't expect > the result to end up in > a hard register. Maybe define_expand are not supposed to do that but maybe > expansion needs to fix up. > > A first thought was > > diff --git a/gcc/expr.cc b/gcc/expr.cc > index ffbac513692..1509acad02e 100644 > --- a/gcc/expr.cc > +++ b/gcc/expr.cc > @@ -11111,6 +11111,12 @@ expand_expr_real_gassign (gassign *g, rtx > target, machine_mode tmode, > gcc_unreachable (); > } > set_curr_insn_location (saved_loc); > + if (!target && REG_P (r) && REGNO (r) < FIRST_PSEUDO_REGISTER) > + { > + rtx tem = gen_reg_rtx (GET_MODE (r)); > + emit_move_insn (tem, r); > + r = tem; > + } > if (REG_P (r) && !REG_EXPR (r)) > set_reg_attrs_for_decl_rtl (lhs, r); > return r; > > but of course that's not the place to fix - this sees (mem/u/c:DI > (reg/f:PSI 47) [1 aa+0 S8 A8 AS7]) > as result and things go wrong somewhere in the chain of expanding > things from the return, possibly at the point of expanding the plus and > there possibly when building subregs of the DImode mem. > > You'd have to trace that down but the fix in the end is to do sth like the above > or alternatively, in the expander producing the 'xload' make sure to not allow > a hardreg as destination when you can still create pseudos? diff --git a/gcc/config/avr/avr.md b/gcc/config/avr/avr.md index dabf4c0fc5a..f897113c885 100644 --- a/gcc/config/avr/avr.md +++ b/gcc/config/avr/avr.md @@ -746,9 +746,7 @@ else { rtx reg_22 = gen_rtx_REG (<MODE>mode, REG_22); - if (reg_overlap_mentioned_p (dest2, reg_22) - || reg_overlap_mentioned_p (dest2, all_regs_rtx[REG_21])) - dest2 = gen_reg_rtx (<MODE>mode); + dest2 = gen_reg_rtx (<MODE>mode); emit_insn (gen_xload<mode>_A (dest2, src)); } seems to fix it. I'm not sure what reg_overlap_mentioned_p should achieve in a define-expand. Richard. > Richard. > > > Moreover, even with TER, the code is no more efficient than > > without it, so it's not clear what's the point in propagating > > hard regs into expander operands. Later passes like fwprop1 and > > combine can do that, too. > > > > Requiring libcalls in a mov insn is quite special indeed, > > and there is no way to tell that to TER. TER itself does not > > optimize code involving libcalls, so it knows they are fragile. > > > > > So - if the wrongness is already apparent in the RTL expansion > > > pass dump can you quote the respective pieces and explain why? > > > > It expand a 64-bit move from __memx address-space to registers > > R18...R25. This is broken into 8 QI moves to these regs, but > > the movqi requires a libcall in some situations, which pass their > > arguments in R21...R25. Hence the libcalls clobber some of the > > destination regs. > > > > It would already help when TER would not propagate hard-regs into > > expander operands. > > > > Johann > > > > >> As an alternative, the option could be disabled permanently in > > >> avr.cc::avr_option_override(). > > >> > > >> Johann > > >> > > >> -- > > >> > > >> AVR: middle-end/87376 - Use -fno-tree-ter per default. > > >> > > >> Temporary expression replacement might replace expressions across > > >> library calls, for example with move insn from address-space __memx > > >> like in PR87376. -ftree-ter has no way where the backend could hook > > >> in to avoid only problematic replacements, thus kick it out altogether. > > >> > > >> PR middle-end/87376 > > >> gcc/ > > >> * common/config/avr/avr-common.cc (avr_option_optimization_table) > > >> <OPT_ftree_ter>: Set to 0. > > >> gcc/testsuite/ > > >> * gcc.target/avr/torture/pr87376-memx.c: New test.
Am 04.07.24 um 11:49 schrieb Richard Biener: > On Thu, Jul 4, 2024 at 11:24 AM Richard Biener > <richard.guenther@gmail.com> wrote: >> >> On Wed, Jul 3, 2024 at 9:26 PM Georg-Johann Lay <avr@gjlay.de> wrote: >>> >>> >>> >>> Am 02.07.24 um 15:48 schrieb Richard Biener: >>>> On Tue, Jul 2, 2024 at 3:43 PM Georg-Johann Lay <avr@gjlay.de> wrote: >>>>> >>>>> Hi Jeff, >>>>> >>>>> This is a patch to get correct code out of 64-bit >>>>> loads from address-space __memx. >>>>> >>>>> The AVR address-spaces may require that move insns issue >>>>> calls to library support functions, a fact that -ftree-ter >>>>> doesn't account for. tree-ssa-ter.cc then replaces an >>>>> expression across such a library call, resulting in wrong code. >>>>> >>>>> This patch disables that pass per default on avr, as there is no >>>>> more fine grained way to avoid malicious optimizations. >>>>> The pass can still be re-enabled by means of explicit -ftree-ter. >>>>> >>>>> Ok to apply? >>>> >>>> I think this requires more details on what goes wrong - I assume >>>> it's not stmt reordering that effectively happens but recursive >>>> expand_expr on SSA defs when those invoke libcalls? In that >>>> case this would point to a deeper issue. >>> >>> The difference is that with TER, we get a hard reg in .expand >>> for a movdi from 24-bit address-space __memx. >>> >>> Such moves require library calls, which in turn require >>> specific hard registers. As avr backend has no movdi, the >>> moddi gets expanded as 8 * movqi, and that does not work >>> when the target registers are hard regs, as some of them >>> are clobbered by the libcalls. >> >> So I see >> >> (insn 18 17 19 2 (parallel [ >> (set (reg:QI 22 r22 [+4 ]) >> (mem/u/c:QI (reg/f:PSI 55) [1 aa+4 S1 A8 AS7])) >> (clobber (reg:QI 22 r22)) >> (clobber (reg:QI 21 r21)) >> (clobber (reg:HI 30 r30)) >> ]) "t.c":12:13 38 {xloadqi_A} >> (nil)) >> (insn 19 18 20 2 (set (reg:PSI 56) >> (reg/f:PSI 47)) "t.c":12:13 112 {*movpsi_split} >> (nil)) >> (insn 20 19 21 2 (parallel [ >> (set (reg/f:PSI 57) >> (plus:PSI (reg/f:PSI 47) >> (const_int 5 [0x5]))) >> (clobber (scratch:QI)) >> ]) "t.c":12:13 205 {addpsi3} >> (nil)) >> (insn 21 20 22 2 (parallel [ >> (set (reg:QI 23 r23 [+5 ]) >> (mem/u/c:QI (reg/f:PSI 57) [1 aa+5 S1 A8 AS7])) >> (clobber (reg:QI 22 r22)) >> (clobber (reg:QI 21 r21)) >> (clobber (reg:HI 30 r30)) >> ]) "t.c":12:13 38 {xloadqi_A} >> (nil)) >> >> for example - insn 21 clobbers r22 which is also the destination of insn 18. >> >> With -fno-tree-ter those oddly get _no_ intermediate reg but we have >> >> (insn 9 8 10 2 (parallel [ >> (set (subreg:QI (reg:DI 43 [ aa.0_1 ]) 1) >> (mem/u/c:QI (reg/f:PSI 48) [1 aa+1 S1 A8 AS7])) >> (clobber (reg:QI 22 r22)) >> (clobber (reg:QI 21 r21)) >> (clobber (reg:HI 30 r30)) >> ]) "t.c":12:13 38 {xloadqi_A} >> (nil)) >> >> but since on GIMPLE we have DImode loads I don't see how TER comes into >> play here - TER should favor the second code generation, not the first ... >> (or TER shouldn't play into here at all). >> >> with -fno-tree-ter we come via >> >> #0 expand_expr_real (exp=<var_decl 0x7ffff7162000 aa>, target=0x7ffff716c9a8, >> tmode=E_DImode, modifier=EXPAND_NORMAL, alt_rtl=0x7fffffffcff8, >> inner_reference_p=false) at /space/rguenther/src/gcc/gcc/expr.cc:9433 >> #1 0x000000000109fe63 in store_expr (exp=<var_decl 0x7ffff7162000 aa>, >> target=0x7ffff716c9a8, call_param_p=0, nontemporal=false, reverse=false) >> at /space/rguenther/src/gcc/gcc/expr.cc:6740 >> #2 0x000000000109e626 in expand_assignment (to=<ssa_name 0x7ffff713f678 1>, >> from=<var_decl 0x7ffff7162000 aa>, nontemporal=false) >> at /space/rguenther/src/gcc/gcc/expr.cc:6461 >> >> while with TER we instead have >> >> #0 expand_expr_real (exp=<var_decl 0x7ffff7162000 aa>, target=0x0, >> tmode=E_VOIDmode, modifier=EXPAND_NORMAL, alt_rtl=0x0, >> inner_reference_p=false) at /space/rguenther/src/gcc/gcc/expr.cc:9433 >> #1 0x00000000010b279f in expand_expr_real_gassign (g=0x7ffff71613c0, >> target=0x0, tmode=E_VOIDmode, modifier=EXPAND_NORMAL, alt_rtl=0x0, >> inner_reference_p=false) at /space/rguenther/src/gcc/gcc/expr.cc:11100 >> #2 0x00000000010b3294 in expand_expr_real_1 (exp=<ssa_name 0x7ffff713f678 1>, >> target=0x0, tmode=E_VOIDmode, modifier=EXPAND_NORMAL, alt_rtl=0x0, >> inner_reference_p=false) at /space/rguenther/src/gcc/gcc/expr.cc:11278 >> >> the difference is -fno-tree-ter has a target (reg:DI 43 [...]) but with TER we >> are not passing a target or a mode. >> >> I think the issue is that the expansion at some point doesn't expect >> the result to end up in >> a hard register. Maybe define_expand are not supposed to do that but maybe >> expansion needs to fix up. >> >> A first thought was >> >> diff --git a/gcc/expr.cc b/gcc/expr.cc >> index ffbac513692..1509acad02e 100644 >> --- a/gcc/expr.cc >> +++ b/gcc/expr.cc >> @@ -11111,6 +11111,12 @@ expand_expr_real_gassign (gassign *g, rtx >> target, machine_mode tmode, >> gcc_unreachable (); >> } >> set_curr_insn_location (saved_loc); >> + if (!target && REG_P (r) && REGNO (r) < FIRST_PSEUDO_REGISTER) >> + { >> + rtx tem = gen_reg_rtx (GET_MODE (r)); >> + emit_move_insn (tem, r); >> + r = tem; >> + } >> if (REG_P (r) && !REG_EXPR (r)) >> set_reg_attrs_for_decl_rtl (lhs, r); >> return r; >> >> but of course that's not the place to fix - this sees (mem/u/c:DI >> (reg/f:PSI 47) [1 aa+0 S8 A8 AS7]) >> as result and things go wrong somewhere in the chain of expanding >> things from the return, possibly at the point of expanding the plus and >> there possibly when building subregs of the DImode mem. >> >> You'd have to trace that down but the fix in the end is to do sth like the above >> or alternatively, in the expander producing the 'xload' make sure to not allow >> a hardreg as destination when you can still create pseudos? > > diff --git a/gcc/config/avr/avr.md b/gcc/config/avr/avr.md > index dabf4c0fc5a..f897113c885 100644 > --- a/gcc/config/avr/avr.md > +++ b/gcc/config/avr/avr.md > @@ -746,9 +746,7 @@ > else > { > rtx reg_22 = gen_rtx_REG (<MODE>mode, REG_22); > - if (reg_overlap_mentioned_p (dest2, reg_22) > - || reg_overlap_mentioned_p (dest2, all_regs_rtx[REG_21])) > - dest2 = gen_reg_rtx (<MODE>mode); > + dest2 = gen_reg_rtx (<MODE>mode); > > emit_insn (gen_xload<mode>_A (dest2, src)); > } > > seems to fix it. I'm not sure what reg_overlap_mentioned_p should achieve in > a define-expand. > > Richard. I tried that, but it is still producing wrong code (test case aborts). The problem is that the middle-end comes up with a hard register as destination. Just using a pseudo register at that place does not help because you have to copy that pseudo (dest2) back to the hard register (dest) two lines below at avr.md:757. Then, in the next such call of gen_movqi, that hard register lives, and will be clobbered. Copying back and forth pseudos and hard regs does not help because the hard reg result from the previous gen_movqi is still supposed to live across the next gen_movqi *and* the 2nd gen_movqi is told to put a result in that exact same hard register. That cannot work. For hard regs that are inputs, we could save / restore them across the xload insns like for PR63633, but that doesn't work for output operands that are hard regs. Johann >> Richard. >> >>> Moreover, even with TER, the code is no more efficient than >>> without it, so it's not clear what's the point in propagating >>> hard regs into expander operands. Later passes like fwprop1 and >>> combine can do that, too. >>> >>> Requiring libcalls in a mov insn is quite special indeed, >>> and there is no way to tell that to TER. TER itself does not >>> optimize code involving libcalls, so it knows they are fragile. >>> >>>> So - if the wrongness is already apparent in the RTL expansion >>>> pass dump can you quote the respective pieces and explain why? >>> >>> It expand a 64-bit move from __memx address-space to registers >>> R18...R25. This is broken into 8 QI moves to these regs, but >>> the movqi requires a libcall in some situations, which pass their >>> arguments in R21...R25. Hence the libcalls clobber some of the >>> destination regs. >>> >>> It would already help when TER would not propagate hard-regs into >>> expander operands. >>> >>> Johann >>> >>>>> As an alternative, the option could be disabled permanently in >>>>> avr.cc::avr_option_override(). >>>>> >>>>> Johann >>>>> >>>>> -- >>>>> >>>>> AVR: middle-end/87376 - Use -fno-tree-ter per default. >>>>> >>>>> Temporary expression replacement might replace expressions across >>>>> library calls, for example with move insn from address-space __memx >>>>> like in PR87376. -ftree-ter has no way where the backend could hook >>>>> in to avoid only problematic replacements, thus kick it out altogether. >>>>> >>>>> PR middle-end/87376 >>>>> gcc/ >>>>> * common/config/avr/avr-common.cc (avr_option_optimization_table) >>>>> <OPT_ftree_ter>: Set to 0. >>>>> gcc/testsuite/ >>>>> * gcc.target/avr/torture/pr87376-memx.c: New test.
On Thu, Jul 4, 2024 at 1:08 PM Georg-Johann Lay <avr@gjlay.de> wrote: > > > > Am 04.07.24 um 11:49 schrieb Richard Biener: > > On Thu, Jul 4, 2024 at 11:24 AM Richard Biener > > <richard.guenther@gmail.com> wrote: > >> > >> On Wed, Jul 3, 2024 at 9:26 PM Georg-Johann Lay <avr@gjlay.de> wrote: > >>> > >>> > >>> > >>> Am 02.07.24 um 15:48 schrieb Richard Biener: > >>>> On Tue, Jul 2, 2024 at 3:43 PM Georg-Johann Lay <avr@gjlay.de> wrote: > >>>>> > >>>>> Hi Jeff, > >>>>> > >>>>> This is a patch to get correct code out of 64-bit > >>>>> loads from address-space __memx. > >>>>> > >>>>> The AVR address-spaces may require that move insns issue > >>>>> calls to library support functions, a fact that -ftree-ter > >>>>> doesn't account for. tree-ssa-ter.cc then replaces an > >>>>> expression across such a library call, resulting in wrong code. > >>>>> > >>>>> This patch disables that pass per default on avr, as there is no > >>>>> more fine grained way to avoid malicious optimizations. > >>>>> The pass can still be re-enabled by means of explicit -ftree-ter. > >>>>> > >>>>> Ok to apply? > >>>> > >>>> I think this requires more details on what goes wrong - I assume > >>>> it's not stmt reordering that effectively happens but recursive > >>>> expand_expr on SSA defs when those invoke libcalls? In that > >>>> case this would point to a deeper issue. > >>> > >>> The difference is that with TER, we get a hard reg in .expand > >>> for a movdi from 24-bit address-space __memx. > >>> > >>> Such moves require library calls, which in turn require > >>> specific hard registers. As avr backend has no movdi, the > >>> moddi gets expanded as 8 * movqi, and that does not work > >>> when the target registers are hard regs, as some of them > >>> are clobbered by the libcalls. > >> > >> So I see > >> > >> (insn 18 17 19 2 (parallel [ > >> (set (reg:QI 22 r22 [+4 ]) > >> (mem/u/c:QI (reg/f:PSI 55) [1 aa+4 S1 A8 AS7])) > >> (clobber (reg:QI 22 r22)) > >> (clobber (reg:QI 21 r21)) > >> (clobber (reg:HI 30 r30)) > >> ]) "t.c":12:13 38 {xloadqi_A} > >> (nil)) > >> (insn 19 18 20 2 (set (reg:PSI 56) > >> (reg/f:PSI 47)) "t.c":12:13 112 {*movpsi_split} > >> (nil)) > >> (insn 20 19 21 2 (parallel [ > >> (set (reg/f:PSI 57) > >> (plus:PSI (reg/f:PSI 47) > >> (const_int 5 [0x5]))) > >> (clobber (scratch:QI)) > >> ]) "t.c":12:13 205 {addpsi3} > >> (nil)) > >> (insn 21 20 22 2 (parallel [ > >> (set (reg:QI 23 r23 [+5 ]) > >> (mem/u/c:QI (reg/f:PSI 57) [1 aa+5 S1 A8 AS7])) > >> (clobber (reg:QI 22 r22)) > >> (clobber (reg:QI 21 r21)) > >> (clobber (reg:HI 30 r30)) > >> ]) "t.c":12:13 38 {xloadqi_A} > >> (nil)) > >> > >> for example - insn 21 clobbers r22 which is also the destination of insn 18. > >> > >> With -fno-tree-ter those oddly get _no_ intermediate reg but we have > >> > >> (insn 9 8 10 2 (parallel [ > >> (set (subreg:QI (reg:DI 43 [ aa.0_1 ]) 1) > >> (mem/u/c:QI (reg/f:PSI 48) [1 aa+1 S1 A8 AS7])) > >> (clobber (reg:QI 22 r22)) > >> (clobber (reg:QI 21 r21)) > >> (clobber (reg:HI 30 r30)) > >> ]) "t.c":12:13 38 {xloadqi_A} > >> (nil)) > >> > >> but since on GIMPLE we have DImode loads I don't see how TER comes into > >> play here - TER should favor the second code generation, not the first ... > >> (or TER shouldn't play into here at all). > >> > >> with -fno-tree-ter we come via > >> > >> #0 expand_expr_real (exp=<var_decl 0x7ffff7162000 aa>, target=0x7ffff716c9a8, > >> tmode=E_DImode, modifier=EXPAND_NORMAL, alt_rtl=0x7fffffffcff8, > >> inner_reference_p=false) at /space/rguenther/src/gcc/gcc/expr.cc:9433 > >> #1 0x000000000109fe63 in store_expr (exp=<var_decl 0x7ffff7162000 aa>, > >> target=0x7ffff716c9a8, call_param_p=0, nontemporal=false, reverse=false) > >> at /space/rguenther/src/gcc/gcc/expr.cc:6740 > >> #2 0x000000000109e626 in expand_assignment (to=<ssa_name 0x7ffff713f678 1>, > >> from=<var_decl 0x7ffff7162000 aa>, nontemporal=false) > >> at /space/rguenther/src/gcc/gcc/expr.cc:6461 > >> > >> while with TER we instead have > >> > >> #0 expand_expr_real (exp=<var_decl 0x7ffff7162000 aa>, target=0x0, > >> tmode=E_VOIDmode, modifier=EXPAND_NORMAL, alt_rtl=0x0, > >> inner_reference_p=false) at /space/rguenther/src/gcc/gcc/expr.cc:9433 > >> #1 0x00000000010b279f in expand_expr_real_gassign (g=0x7ffff71613c0, > >> target=0x0, tmode=E_VOIDmode, modifier=EXPAND_NORMAL, alt_rtl=0x0, > >> inner_reference_p=false) at /space/rguenther/src/gcc/gcc/expr.cc:11100 > >> #2 0x00000000010b3294 in expand_expr_real_1 (exp=<ssa_name 0x7ffff713f678 1>, > >> target=0x0, tmode=E_VOIDmode, modifier=EXPAND_NORMAL, alt_rtl=0x0, > >> inner_reference_p=false) at /space/rguenther/src/gcc/gcc/expr.cc:11278 > >> > >> the difference is -fno-tree-ter has a target (reg:DI 43 [...]) but with TER we > >> are not passing a target or a mode. > >> > >> I think the issue is that the expansion at some point doesn't expect > >> the result to end up in > >> a hard register. Maybe define_expand are not supposed to do that but maybe > >> expansion needs to fix up. > >> > >> A first thought was > >> > >> diff --git a/gcc/expr.cc b/gcc/expr.cc > >> index ffbac513692..1509acad02e 100644 > >> --- a/gcc/expr.cc > >> +++ b/gcc/expr.cc > >> @@ -11111,6 +11111,12 @@ expand_expr_real_gassign (gassign *g, rtx > >> target, machine_mode tmode, > >> gcc_unreachable (); > >> } > >> set_curr_insn_location (saved_loc); > >> + if (!target && REG_P (r) && REGNO (r) < FIRST_PSEUDO_REGISTER) > >> + { > >> + rtx tem = gen_reg_rtx (GET_MODE (r)); > >> + emit_move_insn (tem, r); > >> + r = tem; > >> + } > >> if (REG_P (r) && !REG_EXPR (r)) > >> set_reg_attrs_for_decl_rtl (lhs, r); > >> return r; > >> > >> but of course that's not the place to fix - this sees (mem/u/c:DI > >> (reg/f:PSI 47) [1 aa+0 S8 A8 AS7]) > >> as result and things go wrong somewhere in the chain of expanding > >> things from the return, possibly at the point of expanding the plus and > >> there possibly when building subregs of the DImode mem. > >> > >> You'd have to trace that down but the fix in the end is to do sth like the above > >> or alternatively, in the expander producing the 'xload' make sure to not allow > >> a hardreg as destination when you can still create pseudos? > > > > diff --git a/gcc/config/avr/avr.md b/gcc/config/avr/avr.md > > index dabf4c0fc5a..f897113c885 100644 > > --- a/gcc/config/avr/avr.md > > +++ b/gcc/config/avr/avr.md > > @@ -746,9 +746,7 @@ > > else > > { > > rtx reg_22 = gen_rtx_REG (<MODE>mode, REG_22); > > - if (reg_overlap_mentioned_p (dest2, reg_22) > > - || reg_overlap_mentioned_p (dest2, all_regs_rtx[REG_21])) > > - dest2 = gen_reg_rtx (<MODE>mode); > > + dest2 = gen_reg_rtx (<MODE>mode); > > > > emit_insn (gen_xload<mode>_A (dest2, src)); > > } > > > > seems to fix it. I'm not sure what reg_overlap_mentioned_p should achieve in > > a define-expand. > > > > Richard. > > I tried that, but it is still producing wrong code (test case aborts). If that's so you need to look at where the middle-end comes up with that hard register as target but still continues to expand other "stuff" without emitting the hardreg use immediately (the call for what the hardreg is the argument?). It's definitely the fault of either the middle-end or the target and has nothing to do with TER. I do not have time to track this down further though. > The problem is that the middle-end comes up with a hard > register as destination. Just using a pseudo register at that > place does not help because you have to copy that pseudo (dest2) > back to the hard register (dest) two lines below at avr.md:757. > > Then, in the next such call of gen_movqi, that hard register > lives, and will be clobbered. Copying back and forth pseudos > and hard regs does not help because the hard reg result > from the previous gen_movqi is still supposed to live across > the next gen_movqi *and* the 2nd gen_movqi is told to put > a result in that exact same hard register. > > That cannot work. Exactly. > For hard regs that are inputs, we could save / restore them across > the xload insns like for PR63633, but that doesn't work for > output operands that are hard regs. RTL expansion is not supposed to produce defs or uses of hard registers that are otherwise free to use. IIRC we disabled TER across hardreg assignments also because of libcalls (PR70184 which looks similar to your issue). It seems that in your case we're expanding the arguments of a libcall to a libcall which would be the same issue. I'll note the libcalls only appear very late - after RTL expansion I still see (insn 44 43 53 2 (set (reg:DI 18 r18) (plus:DI (reg:DI 18 r18) (reg:DI 10 r10))) "t.c":12:13 2653 {adddi3_insn} (nil)) for example but those seem to have hardreg constraints already? I do wonder how say softfp targets avoid this when doing, say (a + b) + (d + e)? Richard. > Johann > > > >> Richard. > >> > >>> Moreover, even with TER, the code is no more efficient than > >>> without it, so it's not clear what's the point in propagating > >>> hard regs into expander operands. Later passes like fwprop1 and > >>> combine can do that, too. > >>> > >>> Requiring libcalls in a mov insn is quite special indeed, > >>> and there is no way to tell that to TER. TER itself does not > >>> optimize code involving libcalls, so it knows they are fragile. > >>> > >>>> So - if the wrongness is already apparent in the RTL expansion > >>>> pass dump can you quote the respective pieces and explain why? > >>> > >>> It expand a 64-bit move from __memx address-space to registers > >>> R18...R25. This is broken into 8 QI moves to these regs, but > >>> the movqi requires a libcall in some situations, which pass their > >>> arguments in R21...R25. Hence the libcalls clobber some of the > >>> destination regs. > >>> > >>> It would already help when TER would not propagate hard-regs into > >>> expander operands. > >>> > >>> Johann > >>> > >>>>> As an alternative, the option could be disabled permanently in > >>>>> avr.cc::avr_option_override(). > >>>>> > >>>>> Johann > >>>>> > >>>>> -- > >>>>> > >>>>> AVR: middle-end/87376 - Use -fno-tree-ter per default. > >>>>> > >>>>> Temporary expression replacement might replace expressions across > >>>>> library calls, for example with move insn from address-space __memx > >>>>> like in PR87376. -ftree-ter has no way where the backend could hook > >>>>> in to avoid only problematic replacements, thus kick it out altogether. > >>>>> > >>>>> PR middle-end/87376 > >>>>> gcc/ > >>>>> * common/config/avr/avr-common.cc (avr_option_optimization_table) > >>>>> <OPT_ftree_ter>: Set to 0. > >>>>> gcc/testsuite/ > >>>>> * gcc.target/avr/torture/pr87376-memx.c: New test.
Am 04.07.24 um 13:25 schrieb Richard Biener: > On Thu, Jul 4, 2024 at 1:08 PM Georg-Johann Lay <avr@gjlay.de> wrote: >> Am 04.07.24 um 11:49 schrieb Richard Biener: >>> On Thu, Jul 4, 2024 at 11:24 AM Richard Biener >>> <richard.guenther@gmail.com> wrote: >>>> On Wed, Jul 3, 2024 at 9:26 PM Georg-Johann Lay <avr@gjlay.de> wrote: >>>>> Am 02.07.24 um 15:48 schrieb Richard Biener: >>>>>> On Tue, Jul 2, 2024 at 3:43 PM Georg-Johann Lay <avr@gjlay.de> wrote: >>>>>>> >>>>>>> This is a patch to get correct code out of 64-bit >>>>>>> loads from address-space __memx. >>>>>>> >>>>>>> The AVR address-spaces may require that move insns issue >>>>>>> calls to library support functions, a fact that -ftree-ter >>>>>>> doesn't account for. tree-ssa-ter.cc then replaces an >>>>>>> expression across such a library call, resulting in wrong code. >>>>>>> >>>>>>> This patch disables that pass per default on avr, as there is no >>>>>>> more fine grained way to avoid malicious optimizations. >>>>>>> The pass can still be re-enabled by means of explicit -ftree-ter. >>>>>>> >>>>>>> Ok to apply? >>>>>> >>>>>> I think this requires more details on what goes wrong - I assume >>>>>> it's not stmt reordering that effectively happens but recursive >>>>>> expand_expr on SSA defs when those invoke libcalls? In that >>>>>> case this would point to a deeper issue. >>>>> >>>>> The difference is that with TER, we get a hard reg in .expand >>>>> for a movdi from 24-bit address-space __memx. >>>>> >>>>> Such moves require library calls, which in turn require >>>>> specific hard registers. As avr backend has no movdi, the >>>>> moddi gets expanded as 8 * movqi, and that does not work >>>>> when the target registers are hard regs, as some of them >>>>> are clobbered by the libcalls. >>>> >>>> So I see >>>> >>>> (insn 18 17 19 2 (parallel [ >>>> (set (reg:QI 22 r22 [+4 ]) >>>> (mem/u/c:QI (reg/f:PSI 55) [1 aa+4 S1 A8 AS7])) >>>> (clobber (reg:QI 22 r22)) >>>> (clobber (reg:QI 21 r21)) >>>> (clobber (reg:HI 30 r30)) >>>> ]) "t.c":12:13 38 {xloadqi_A} >>>> (nil)) >>>> (insn 19 18 20 2 (set (reg:PSI 56) >>>> (reg/f:PSI 47)) "t.c":12:13 112 {*movpsi_split} >>>> (nil)) >>>> (insn 20 19 21 2 (parallel [ >>>> (set (reg/f:PSI 57) >>>> (plus:PSI (reg/f:PSI 47) >>>> (const_int 5 [0x5]))) >>>> (clobber (scratch:QI)) >>>> ]) "t.c":12:13 205 {addpsi3} >>>> (nil)) >>>> (insn 21 20 22 2 (parallel [ >>>> (set (reg:QI 23 r23 [+5 ]) >>>> (mem/u/c:QI (reg/f:PSI 57) [1 aa+5 S1 A8 AS7])) >>>> (clobber (reg:QI 22 r22)) >>>> (clobber (reg:QI 21 r21)) >>>> (clobber (reg:HI 30 r30)) >>>> ]) "t.c":12:13 38 {xloadqi_A} >>>> (nil)) >>>> >>>> for example - insn 21 clobbers r22 which is also the destination of insn 18. >>>> >>>> With -fno-tree-ter those oddly get _no_ intermediate reg but we have >>>> >>>> (insn 9 8 10 2 (parallel [ >>>> (set (subreg:QI (reg:DI 43 [ aa.0_1 ]) 1) >>>> (mem/u/c:QI (reg/f:PSI 48) [1 aa+1 S1 A8 AS7])) >>>> (clobber (reg:QI 22 r22)) >>>> (clobber (reg:QI 21 r21)) >>>> (clobber (reg:HI 30 r30)) >>>> ]) "t.c":12:13 38 {xloadqi_A} >>>> (nil)) >>>> >>>> but since on GIMPLE we have DImode loads I don't see how TER comes into >>>> play here - TER should favor the second code generation, not the first ... >>>> (or TER shouldn't play into here at all). >>>> >>>> with -fno-tree-ter we come via >>>> >>>> #0 expand_expr_real (exp=<var_decl 0x7ffff7162000 aa>, target=0x7ffff716c9a8, >>>> tmode=E_DImode, modifier=EXPAND_NORMAL, alt_rtl=0x7fffffffcff8, >>>> inner_reference_p=false) at /space/rguenther/src/gcc/gcc/expr.cc:9433 >>>> #1 0x000000000109fe63 in store_expr (exp=<var_decl 0x7ffff7162000 aa>, >>>> target=0x7ffff716c9a8, call_param_p=0, nontemporal=false, reverse=false) >>>> at /space/rguenther/src/gcc/gcc/expr.cc:6740 >>>> #2 0x000000000109e626 in expand_assignment (to=<ssa_name 0x7ffff713f678 1>, >>>> from=<var_decl 0x7ffff7162000 aa>, nontemporal=false) >>>> at /space/rguenther/src/gcc/gcc/expr.cc:6461 >>>> >>>> while with TER we instead have >>>> >>>> #0 expand_expr_real (exp=<var_decl 0x7ffff7162000 aa>, target=0x0, >>>> tmode=E_VOIDmode, modifier=EXPAND_NORMAL, alt_rtl=0x0, >>>> inner_reference_p=false) at /space/rguenther/src/gcc/gcc/expr.cc:9433 >>>> #1 0x00000000010b279f in expand_expr_real_gassign (g=0x7ffff71613c0, >>>> target=0x0, tmode=E_VOIDmode, modifier=EXPAND_NORMAL, alt_rtl=0x0, >>>> inner_reference_p=false) at /space/rguenther/src/gcc/gcc/expr.cc:11100 >>>> #2 0x00000000010b3294 in expand_expr_real_1 (exp=<ssa_name 0x7ffff713f678 1>, >>>> target=0x0, tmode=E_VOIDmode, modifier=EXPAND_NORMAL, alt_rtl=0x0, >>>> inner_reference_p=false) at /space/rguenther/src/gcc/gcc/expr.cc:11278 >>>> >>>> the difference is -fno-tree-ter has a target (reg:DI 43 [...]) but with TER we >>>> are not passing a target or a mode. >>>> >>>> I think the issue is that the expansion at some point doesn't expect >>>> the result to end up in >>>> a hard register. Maybe define_expand are not supposed to do that but maybe >>>> expansion needs to fix up. >>>> >>>> A first thought was >>>> >>>> diff --git a/gcc/expr.cc b/gcc/expr.cc >>>> index ffbac513692..1509acad02e 100644 >>>> --- a/gcc/expr.cc >>>> +++ b/gcc/expr.cc >>>> @@ -11111,6 +11111,12 @@ expand_expr_real_gassign (gassign *g, rtx >>>> target, machine_mode tmode, >>>> gcc_unreachable (); >>>> } >>>> set_curr_insn_location (saved_loc); >>>> + if (!target && REG_P (r) && REGNO (r) < FIRST_PSEUDO_REGISTER) >>>> + { >>>> + rtx tem = gen_reg_rtx (GET_MODE (r)); >>>> + emit_move_insn (tem, r); >>>> + r = tem; >>>> + } >>>> if (REG_P (r) && !REG_EXPR (r)) >>>> set_reg_attrs_for_decl_rtl (lhs, r); >>>> return r; >>>> >>>> but of course that's not the place to fix - this sees (mem/u/c:DI >>>> (reg/f:PSI 47) [1 aa+0 S8 A8 AS7]) >>>> as result and things go wrong somewhere in the chain of expanding >>>> things from the return, possibly at the point of expanding the plus and >>>> there possibly when building subregs of the DImode mem. >>>> >>>> You'd have to trace that down but the fix in the end is to do sth like the above >>>> or alternatively, in the expander producing the 'xload' make sure to not allow >>>> a hardreg as destination when you can still create pseudos? >>> >>> diff --git a/gcc/config/avr/avr.md b/gcc/config/avr/avr.md >>> index dabf4c0fc5a..f897113c885 100644 >>> --- a/gcc/config/avr/avr.md >>> +++ b/gcc/config/avr/avr.md >>> @@ -746,9 +746,7 @@ >>> else >>> { >>> rtx reg_22 = gen_rtx_REG (<MODE>mode, REG_22); >>> - if (reg_overlap_mentioned_p (dest2, reg_22) >>> - || reg_overlap_mentioned_p (dest2, all_regs_rtx[REG_21])) >>> - dest2 = gen_reg_rtx (<MODE>mode); >>> + dest2 = gen_reg_rtx (<MODE>mode); >>> >>> emit_insn (gen_xload<mode>_A (dest2, src)); >>> } >>> >>> seems to fix it. I'm not sure what reg_overlap_mentioned_p should achieve in >>> a define-expand. >>> >>> Richard. >> >> I tried that, but it is still producing wrong code (test case aborts). > > If that's so you need to look at where the middle-end comes up with > that hard register as target but still continues to expand other "stuff" > without emitting the hardreg use immediately (the call for what the > hardreg is the argument?). > > It's definitely the fault of either the middle-end or the target and has > nothing to do with TER. I do not have time to track this down further > though. Found it: It's the emit_move_insn (acc_a, operands[1]) and alike in avr-dimode.md. acc_a is a hard reg and operands[1] is general_operand and thus may be MEM. I'll prepare a patch to use intermediate pseudo. Johann >> The problem is that the middle-end comes up with a hard >> register as destination. Just using a pseudo register at that >> place does not help because you have to copy that pseudo (dest2) >> back to the hard register (dest) two lines below at avr.md:757. >> >> Then, in the next such call of gen_movqi, that hard register >> lives, and will be clobbered. Copying back and forth pseudos >> and hard regs does not help because the hard reg result >> from the previous gen_movqi is still supposed to live across >> the next gen_movqi *and* the 2nd gen_movqi is told to put >> a result in that exact same hard register. >> >> That cannot work. > > Exactly. > >> For hard regs that are inputs, we could save / restore them across >> the xload insns like for PR63633, but that doesn't work for >> output operands that are hard regs. > > RTL expansion is not supposed to produce defs or uses of hard registers > that are otherwise free to use. IIRC we disabled TER across hardreg > assignments also because of libcalls (PR70184 which looks similar to > your issue). It seems that in your case we're expanding the arguments > of a libcall to a libcall which would be the same issue. > > I'll note the libcalls only appear very late - after RTL expansion I still see > > (insn 44 43 53 2 (set (reg:DI 18 r18) > (plus:DI (reg:DI 18 r18) > (reg:DI 10 r10))) "t.c":12:13 2653 {adddi3_insn} > (nil)) > > for example but those seem to have hardreg constraints already? > > I do wonder how say softfp targets avoid this when doing, say > (a + b) + (d + e)? > > Richard. > >> Johann >> >> >>>> Richard. >>>> >>>>> Moreover, even with TER, the code is no more efficient than >>>>> without it, so it's not clear what's the point in propagating >>>>> hard regs into expander operands. Later passes like fwprop1 and >>>>> combine can do that, too. >>>>> >>>>> Requiring libcalls in a mov insn is quite special indeed, >>>>> and there is no way to tell that to TER. TER itself does not >>>>> optimize code involving libcalls, so it knows they are fragile. >>>>> >>>>>> So - if the wrongness is already apparent in the RTL expansion >>>>>> pass dump can you quote the respective pieces and explain why? >>>>> >>>>> It expand a 64-bit move from __memx address-space to registers >>>>> R18...R25. This is broken into 8 QI moves to these regs, but >>>>> the movqi requires a libcall in some situations, which pass their >>>>> arguments in R21...R25. Hence the libcalls clobber some of the >>>>> destination regs. >>>>> >>>>> It would already help when TER would not propagate hard-regs into >>>>> expander operands. >>>>> >>>>> Johann >>>>> >>>>>>> As an alternative, the option could be disabled permanently in >>>>>>> avr.cc::avr_option_override(). >>>>>>> >>>>>>> Johann >>>>>>> >>>>>>> -- >>>>>>> >>>>>>> AVR: middle-end/87376 - Use -fno-tree-ter per default. >>>>>>> >>>>>>> Temporary expression replacement might replace expressions across >>>>>>> library calls, for example with move insn from address-space __memx >>>>>>> like in PR87376. -ftree-ter has no way where the backend could hook >>>>>>> in to avoid only problematic replacements, thus kick it out altogether. >>>>>>> >>>>>>> PR middle-end/87376 >>>>>>> gcc/ >>>>>>> * common/config/avr/avr-common.cc (avr_option_optimization_table) >>>>>>> <OPT_ftree_ter>: Set to 0. >>>>>>> gcc/testsuite/ >>>>>>> * gcc.target/avr/torture/pr87376-memx.c: New test.
On 7/3/24 2:32 PM, Georg-Johann Lay wrote: > > > Am 03.07.24 um 21:39 schrieb Jeff Law: >> >> >> On 7/3/24 1:26 PM, Georg-Johann Lay wrote: >>> >>> >>> Am 02.07.24 um 15:48 schrieb Richard Biener: >>>> On Tue, Jul 2, 2024 at 3:43 PM Georg-Johann Lay <avr@gjlay.de> wrote: >>>>> >>>>> Hi Jeff, >>>>> >>>>> This is a patch to get correct code out of 64-bit >>>>> loads from address-space __memx. >>>>> >>>>> The AVR address-spaces may require that move insns issue >>>>> calls to library support functions, a fact that -ftree-ter >>>>> doesn't account for. tree-ssa-ter.cc then replaces an >>>>> expression across such a library call, resulting in wrong code. >>>>> >>>>> This patch disables that pass per default on avr, as there is no >>>>> more fine grained way to avoid malicious optimizations. >>>>> The pass can still be re-enabled by means of explicit -ftree-ter. >>>>> >>>>> Ok to apply? >>>> >>>> I think this requires more details on what goes wrong - I assume >>>> it's not stmt reordering that effectively happens but recursive >>>> expand_expr on SSA defs when those invoke libcalls? In that >>>> case this would point to a deeper issue. >>> >>> The difference is that with TER, we get a hard reg in .expand >>> for a movdi from 24-bit address-space __memx. >>> >>> Such moves require library calls, which in turn require >>> specific hard registers. As avr backend has no movdi, the >>> moddi gets expanded as 8 * movqi, and that does not work >>> when the target registers are hard regs, as some of them >>> are clobbered by the libcalls. >> But this is something that's handled for other targets. I don't >> remember all the details, but there's generic code to handle this >> situation. >> >> Jeff > > A libcall in a move insn? How would the middle-end know that? No, cases where setting up one argument may clobber arguments that were already set up. But looking at the code in question (it's been at least a decade since I've looked at these bits) indicate it's just for arguments on the stack. So not necessarily applicable here. For reference I'm referring to the stack_usage_map and related code in calls.c. jeff
diff --git a/gcc/common/config/avr/avr-common.cc b/gcc/common/config/avr/avr-common.cc index fdf130f1e1a..2300d602cfe 100644 --- a/gcc/common/config/avr/avr-common.cc +++ b/gcc/common/config/avr/avr-common.cc @@ -32,6 +32,11 @@ static const struct default_options avr_option_optimization_table[] = // The only effect of -fcaller-saves might be that it triggers // a frame without need when it tries to be smart around calls. { OPT_LEVELS_ALL, OPT_fcaller_saves, NULL, 0 }, + // Temporary expression replacement might replace expressions across + // library calls, for example with move insn from address-space __memx + // like in PR87376. -ftree-ter has no way where the backend could hook + // in to avoid only problematic replacements, thus kick it out altogether. + { OPT_LEVELS_ALL, OPT_ftree_ter, NULL, 0 }, { OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_mgas_isr_prologues, NULL, 1 }, { OPT_LEVELS_1_PLUS, OPT_mmain_is_OS_task, NULL, 1 }, { OPT_LEVELS_1_PLUS, OPT_mfuse_add_, NULL, 1 }, diff --git a/gcc/testsuite/gcc.target/avr/torture/pr87376-memx.c b/gcc/testsuite/gcc.target/avr/torture/pr87376-memx.c new file mode 100644 index 00000000000..e8373854ac8 --- /dev/null +++ b/gcc/testsuite/gcc.target/avr/torture/pr87376-memx.c @@ -0,0 +1,33 @@ +/* { dg-do run { target { ! avr_tiny } } } */ +/* { dg-additional-options "-std=gnu99" } */ + +typedef __UINT64_TYPE__ uint64_t; + +extern const __memx uint64_t aa __asm ("real_aa"); +extern const uint64_t bb __asm ("real_bb"); + +const __memx uint64_t real_aa = 0x1122334455667788; +const uint64_t real_bb = 0x0908070605040302; + +__attribute__((noinline,noclone)) +uint64_t add1 (void) +{ + return aa + bb; +} + +__attribute__((noinline,noclone)) +uint64_t add2 (void) +{ + return bb + aa; +} + +int main (void) +{ + if (add1() != 0x1a2a3a4a5a6a7a8a) + __builtin_exit (__LINE__); + + if (add2() != 0x1a2a3a4a5a6a7a8a) + __builtin_exit (__LINE__); + + return 0; +}