Message ID | 20200910034405.39647-1-kito.cheng@sifive.com |
---|---|
State | New |
Headers | show |
Series | PR target/96759 - Handle global variable assignment from misaligned structure/PARALLEL return values. | expand |
On Thu, 10 Sep 2020, Kito Cheng wrote: > In g:70cdb21e579191fe9f0f1d45e328908e59c0179e, DECL/global variable has handled > misaligned stores, but it didn't handle PARALLEL values, and I refer the > other part of this function, I found the PARALLEL need handled by > emit_group_* functions, so I add a check, and using emit_group_store if > storing a PARALLEL value, also checked this change didn't break the > testcase(gcc.target/arm/unaligned-argument-3.c) added by the orginal changes. > > For riscv64 target, struct S {int a; double b;} will pack into a parallel > value to return and it has TImode when misaligned access is supported, > however TImode required 16-byte align, but it only 8-byte align, so it go to > the misaligned stores handling, then it will try to generate move > instruction from a PARALLEL value. > > Tested on following target without introduced new reguression: > - riscv32/riscv64 elf > - x86_64-linux > - arm-eabi riscv doesn't seem to have movmisalign and I don't see why movmisalign should not support a TImode parallel RHS so at least you should put this check after the icode != CODE_FOR_nothing check? Also wouldn't it be better to support PARALLEL from within store_bit_field? After all this is a misaligned access and since you didn't quote the RTL involved I'm guessing that emit_group_store knows nothing about this fact. Richard. > gcc/ChangeLog: > > PR target/96759 > * expr.c (expand_assignment): Handle misaligned stores with PARALLEL > value. > > gcc/testsuite/ChangeLog: > > PR target/96759 > * g++.target/riscv/pr96759.C: New. > * gcc.target/riscv/pr96759.c: New. > --- > gcc/expr.c | 9 ++++++++- > gcc/testsuite/g++.target/riscv/pr96759.C | 8 ++++++++ > gcc/testsuite/gcc.target/riscv/pr96759.c | 13 +++++++++++++ > 3 files changed, 29 insertions(+), 1 deletion(-) > create mode 100644 gcc/testsuite/g++.target/riscv/pr96759.C > create mode 100644 gcc/testsuite/gcc.target/riscv/pr96759.c > > diff --git a/gcc/expr.c b/gcc/expr.c > index 1a15f24b3979..cc2bc52c457f 100644 > --- a/gcc/expr.c > +++ b/gcc/expr.c > @@ -5173,7 +5173,14 @@ expand_assignment (tree to, tree from, bool nontemporal) > if (TREE_CODE (to) == MEM_REF && REF_REVERSE_STORAGE_ORDER (to)) > reg = flip_storage_order (mode, reg); > > - if (icode != CODE_FOR_nothing) > + if (GET_CODE (reg) == PARALLEL) > + { > + push_temp_slots (); > + emit_group_store (mem, reg, TREE_TYPE (from), > + int_size_in_bytes (TREE_TYPE (from))); > + pop_temp_slots (); > + } > + else if (icode != CODE_FOR_nothing) > { > class expand_operand ops[2]; > > diff --git a/gcc/testsuite/g++.target/riscv/pr96759.C b/gcc/testsuite/g++.target/riscv/pr96759.C > new file mode 100644 > index 000000000000..673999a4baf7 > --- /dev/null > +++ b/gcc/testsuite/g++.target/riscv/pr96759.C > @@ -0,0 +1,8 @@ > +/* { dg-options "-mno-strict-align -std=gnu++17" } */ > +/* { dg-do compile } */ > +struct S { > + int a; > + double b; > +}; > +S GetNumbers(); > +auto [globalC, globalD] = GetNumbers(); > diff --git a/gcc/testsuite/gcc.target/riscv/pr96759.c b/gcc/testsuite/gcc.target/riscv/pr96759.c > new file mode 100644 > index 000000000000..621c39196fca > --- /dev/null > +++ b/gcc/testsuite/gcc.target/riscv/pr96759.c > @@ -0,0 +1,13 @@ > +/* { dg-options "-mno-strict-align" } */ > +/* { dg-do compile } */ > + > +struct S { > + int a; > + double b; > +}; > +struct S GetNumbers(); > +struct S g; > + > +void foo(){ > + g = GetNumbers(); > +} >
Hi Richard: Thanks for your review :) > riscv doesn't seem to have movmisalign and I don't see why movmisalign > should not support a TImode parallel RHS so at least you should > put this check after the icode != CODE_FOR_nothing check? RISC-V has an option `-mno-strict-align` to enable mis-aligned access, but we didn't provide movmisalign pattern, I guess might be another issue, I tried to add movmisalign pattern and then it will feed PARALLEL into that, and got into a similar situation again, most targets are not except got PARALLEL, so that's the reason why I put before icode != CODE_FOR_nothing check. > Also wouldn't it be better to support PARALLEL from within > store_bit_field? For the above reason, I think store_bit_field supports PARALLEL is not enough. > After all this is a misaligned access and > since you didn't quote the RTL involved I'm guessing that > emit_group_store knows nothing about this fact. Oh, good point, how do you think about loading into temp and then storing it in the DECL, so that we could use original logic to handle misaligned store. A PoC for this idea, didn't fully tested yet: diff --git a/gcc/expr.c b/gcc/expr.c index 1a15f24b3979..6ef97740c764 100644 --- a/gcc/expr.c +++ b/gcc/expr.c @@ -5166,8 +5166,19 @@ expand_assignment (tree to, tree from, bool nontemporal) || targetm.slow_unaligned_access (mode, align))) { rtx reg, mem; + push_temp_slots (); reg = expand_expr (from, NULL_RTX, VOIDmode, EXPAND_NORMAL); + + if (GET_CODE (reg) == PARALLEL) + { + rtx temp = assign_stack_temp (mode, + GET_MODE_SIZE (mode)); + emit_group_store (temp, reg, TREE_TYPE (from), + int_size_in_bytes (TREE_TYPE (from))); + reg = temp; + } + reg = force_not_mem (reg); mem = expand_expr (to, NULL_RTX, VOIDmode, EXPAND_WRITE); if (TREE_CODE (to) == MEM_REF && REF_REVERSE_STORAGE_ORDER (to)) @@ -5186,6 +5197,7 @@ expand_assignment (tree to, tree from, bool nontemporal) else store_bit_field (mem, GET_MODE_BITSIZE (mode), 0, 0, 0, mode, reg, false); + pop_temp_slots (); return; }
On Thu, 10 Sep 2020, Kito Cheng wrote: > Hi Richard: > > Thanks for your review :) > > > riscv doesn't seem to have movmisalign and I don't see why movmisalign > > should not support a TImode parallel RHS so at least you should > > put this check after the icode != CODE_FOR_nothing check? > > RISC-V has an option `-mno-strict-align` to enable mis-aligned access, > but we didn't provide movmisalign pattern, I guess might be another issue, > I tried to add movmisalign pattern and then it will feed PARALLEL into that, > and got into a similar situation again, most targets are not except > got PARALLEL, > so that's the reason why I put before icode != CODE_FOR_nothing check. > > > Also wouldn't it be better to support PARALLEL from within > > store_bit_field? > For the above reason, I think store_bit_field supports PARALLEL is not enough. > > > After all this is a misaligned access and > > since you didn't quote the RTL involved I'm guessing that > > emit_group_store knows nothing about this fact. > > Oh, good point, how do you think about loading into temp and then > storing it in the DECL, so that we could use original logic to > handle misaligned store. > > A PoC for this idea, didn't fully tested yet: > > diff --git a/gcc/expr.c b/gcc/expr.c > index 1a15f24b3979..6ef97740c764 100644 > --- a/gcc/expr.c > +++ b/gcc/expr.c > @@ -5166,8 +5166,19 @@ expand_assignment (tree to, tree from, bool nontemporal) > || targetm.slow_unaligned_access (mode, align))) > { > rtx reg, mem; > + push_temp_slots (); > > reg = expand_expr (from, NULL_RTX, VOIDmode, EXPAND_NORMAL); I'd say there must be (wishful thinking) some expand_expr modifier that guarantees this? Alternatively use reg = force_reg (mode, reg); instead of assign_stack_temp & emit_group_store? That said, I don't know very much about the fancy ways to handle stores from PARALLEL - but doesn't PARALLEL mean we can just take any of its members as source? How does 'reg' look like exactly in your case? What does 'to' expand to? Richard. > + > + if (GET_CODE (reg) == PARALLEL) > + { > + rtx temp = assign_stack_temp (mode, > + GET_MODE_SIZE (mode)); > + emit_group_store (temp, reg, TREE_TYPE (from), > + int_size_in_bytes (TREE_TYPE (from))); > + reg = temp; > + } > + > reg = force_not_mem (reg); > mem = expand_expr (to, NULL_RTX, VOIDmode, EXPAND_WRITE); > if (TREE_CODE (to) == MEM_REF && REF_REVERSE_STORAGE_ORDER (to)) > @@ -5186,6 +5197,7 @@ expand_assignment (tree to, tree from, bool nontemporal) > else > store_bit_field (mem, GET_MODE_BITSIZE (mode), 0, 0, 0, mode, reg, > false); > + pop_temp_slots (); > return; > } >
> I'd say there must be (wishful thinking) some expand_expr > modifier that guarantees this? Alternatively use > > reg = force_reg (mode, reg); > > instead of assign_stack_temp & emit_group_store? > > That said, I don't know very much about the fancy ways to handle > stores from PARALLEL - but doesn't PARALLEL mean we can just take > any of its members as source? How does 'reg' look like exactly > in your case? What does 'to' expand to? No, PARALLEL describes a multi-register layout, typically a return value with aggregate type, and you need to use emit_group_store to decipher it properly. emit_group_store should already be able to handle misaligned stores I think.
On Thu, Sep 10, 2020 at 5:41 PM Eric Botcazou <botcazou@adacore.com> wrote: > > > I'd say there must be (wishful thinking) some expand_expr > > modifier that guarantees this? Alternatively use > > > > reg = force_reg (mode, reg); I did a try, seems like force_reg didn't handle PARALLE :( > > > > instead of assign_stack_temp & emit_group_store? > > > > That said, I don't know very much about the fancy ways to handle > > stores from PARALLEL - but doesn't PARALLEL mean we can just take > > any of its members as source? How does 'reg' look like exactly > > in your case? What does 'to' expand to? 'reg' look like this; (parallel:TI [ (expr_list:REG_DEP_TRUE (reg:SI 72) (const_int 0 [0])) (expr_list:REG_DEP_TRUE (reg:DF 73) (const_int 8 [0x8]))]) 'to' is global var decl as struct S {int a; double b;}, align to 4, and TYPE_MODE is TImode, and TImode requires 8 byte alignment. > > No, PARALLEL describes a multi-register layout, typically a return value with > aggregate type, and you need to use emit_group_store to decipher it properly. > > emit_group_store should already be able to handle misaligned stores I think. > > -- > Eric Botcazou > >
Optimized version of the v2 patch, get rid of assign_stack_temp. diff --git a/gcc/expr.c b/gcc/expr.c index 1a15f24b3979..5f744a6c1b8d 100644 --- a/gcc/expr.c +++ b/gcc/expr.c @@ -5168,7 +5168,16 @@ expand_assignment (tree to, tree from, bool nontemporal) rtx reg, mem; reg = expand_expr (from, NULL_RTX, VOIDmode, EXPAND_NORMAL); - reg = force_not_mem (reg); + + if (GET_CODE (reg) == PARALLEL) + { + rtx temp = gen_reg_rtx (mode); + emit_group_store (temp, reg, TREE_TYPE (from), + int_size_in_bytes (TREE_TYPE (from))); + reg = temp; + } + + reg = force_not_mem (mode, reg); mem = expand_expr (to, NULL_RTX, VOIDmode, EXPAND_WRITE); if (TREE_CODE (to) == MEM_REF && REF_REVERSE_STORAGE_ORDER (to)) reg = flip_storage_order (mode, reg);
diff --git a/gcc/expr.c b/gcc/expr.c index 1a15f24b3979..cc2bc52c457f 100644 --- a/gcc/expr.c +++ b/gcc/expr.c @@ -5173,7 +5173,14 @@ expand_assignment (tree to, tree from, bool nontemporal) if (TREE_CODE (to) == MEM_REF && REF_REVERSE_STORAGE_ORDER (to)) reg = flip_storage_order (mode, reg); - if (icode != CODE_FOR_nothing) + if (GET_CODE (reg) == PARALLEL) + { + push_temp_slots (); + emit_group_store (mem, reg, TREE_TYPE (from), + int_size_in_bytes (TREE_TYPE (from))); + pop_temp_slots (); + } + else if (icode != CODE_FOR_nothing) { class expand_operand ops[2]; diff --git a/gcc/testsuite/g++.target/riscv/pr96759.C b/gcc/testsuite/g++.target/riscv/pr96759.C new file mode 100644 index 000000000000..673999a4baf7 --- /dev/null +++ b/gcc/testsuite/g++.target/riscv/pr96759.C @@ -0,0 +1,8 @@ +/* { dg-options "-mno-strict-align -std=gnu++17" } */ +/* { dg-do compile } */ +struct S { + int a; + double b; +}; +S GetNumbers(); +auto [globalC, globalD] = GetNumbers(); diff --git a/gcc/testsuite/gcc.target/riscv/pr96759.c b/gcc/testsuite/gcc.target/riscv/pr96759.c new file mode 100644 index 000000000000..621c39196fca --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/pr96759.c @@ -0,0 +1,13 @@ +/* { dg-options "-mno-strict-align" } */ +/* { dg-do compile } */ + +struct S { + int a; + double b; +}; +struct S GetNumbers(); +struct S g; + +void foo(){ + g = GetNumbers(); +}