
[avr,pr71676,and,pr71678] Issues with casesi expand

Message ID c190b63b-a993-e0f4-a103-239b528d08c1@microchip.com
State New

Commit Message

Pitchumani Sivanupandi Sept. 26, 2016, 1:19 p.m. UTC
Attached patch for PR71676 and PR71678.

PR71676: the AVR target generates wrong code when the switch index is
wider than 16 bits.

Switch indices wider than SImode are checked for out of range before the
'casesi' expansion. The RTL expansion of casesi receives the index in SImode,
but the index is compared in HImode, which ignores the upper 16 bits.

The attached patch changes the casesi expansion to perform the index
comparison in SImode and adjusts the code generation accordingly.
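
To make the PR71676 failure mode concrete, here is a small host-side C model of the two range checks. It is only a sketch of the arithmetic, not the AVR code that GCC emits; the function names `in_table_himode` and `in_table_simode` are hypothetical and chosen for illustration. The check "offset <= range" mirrors the unsigned compare against the case range that casesi performs.

```c
#include <stdint.h>

/* Model of the old expansion: only the low 16 bits (HImode) of the
   32-bit (SImode) index take part in the subtract and compare.
   Returns 1 if the index is treated as a valid jump-table entry. */
int in_table_himode(uint32_t index, uint16_t lower, uint16_t range)
{
    uint16_t offset = (uint16_t)((uint16_t)index - lower);
    return offset <= range;
}

/* Model of the patched expansion: subtract and compare on the full
   32-bit index, so set upper bits push the offset out of range. */
int in_table_simode(uint32_t index, uint32_t lower, uint32_t range)
{
    uint32_t offset = index - lower;
    return offset <= range;
}
```

With the value used in pr71676.c, `in_table_himode (7UL + 0x10000UL, 0, 7)` returns 1 (the index is wrongly dispatched as case 7), while `in_table_simode (7UL + 0x10000UL, 0, 7)` returns 0.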

PR71678 is an ICE because the pattern below, generated in 'casesi', is not recognized:
(set (reg:HI 47)
      (minus:HI (subreg:HI (subreg:SI (reg:DI 44) 0) 0)
                (reg:HI 45)))

The fix for PR71676 avoids the above pattern because it changes the comparison
to SImode.

Regtested using avrtest. No regressions found.

If OK, could someone commit please?

Is this OK for gcc-5-branch?

Regards,
Pitchumani

gcc/ChangeLog

2016-09-26  Pitchumani Sivanupandi  <pitchumani.s@atmel.com>

     PR target/71676
     PR target/71678
     * config/avr/avr.md (casesi): Change index compare to SI mode.

gcc/testsuite/ChangeLog

2016-09-26  Pitchumani Sivanupandi  <pitchumani.s@atmel.com>

     PR target/71676
     PR target/71678
     * gcc.target/avr/pr71676-1.c: New test.
     * gcc.target/avr/pr71676.c: New test.
     * gcc.target/avr/pr71678.c: New test.

Comments

Georg-Johann Lay Sept. 26, 2016, 2:49 p.m. UTC | #1
On 26.09.2016 15:19, Pitchumani Sivanupandi wrote:
> Attached patch for PR71676 and PR71678.
>
> PR71676 is for AVR target that generates wrong code when switch case index is
> more than 16 bits.
>
> Switch case index of larger than SImode are checked for out of range before
> 'casesi' expand. RTL expand of casesi gets index as SImode, but index is
> compared in HImode and ignores upper 16bits.
>
> Attached patch changes the expansion for casesi to make the index comparison
> in SImode and code generation accordingly.
>
> PR71678 is ICE because below pattern in 'casesi' is not recognized.
> (set (reg:HI 47)
>      (minus:HI (subreg:HI (subreg:SI (reg:DI 44) 0) 0)
>                (reg:HI 45)))
>
> Fix of PR71676 avoids the above pattern as it changes the comparison
> to SImode.

But this means that all comparisons are now performed in SImode which is a 
great performance loss for most programs which will switch on 16-bit values.

IMO we need a less intrusive (w.r.t. performance) approach.

Johann


Pitchumani Sivanupandi Oct. 13, 2016, 11:44 a.m. UTC | #2
On Monday 26 September 2016 08:19 PM, Georg-Johann Lay wrote:
> But this means that all comparisons are now performed in SImode which 
> is a great performance loss for most programs which will switch on 
> 16-bit values.
>
> IMO we need a less intrusive (w.r.t. performance) approach.

Yes.

I tried to split 'casesi' into several variants based on the case values so
that the compare is done in a less expensive mode (i.e. QI or HI). In a few
cases this is not possible without an SImode subtract/compare.

The casesi pattern receives the index in SImode, so the out-of-range checks
will be expensive, even though the most common case values (on AVR) fit in
QI/HI mode.

e.g.
   if case values in QI range
     if upper three bytes index is set
       goto out_of_range

     offset = index - lower_bound (QImode)
     if offset > case_range       (QImode)
       goto out_of_range
     goto jump_table + offset

   else if case values in HI range
     if index[2,3] is set
       goto out_of_range

     offset = index - lower_bound (HImode)
     if offset > case_range       (HImode)
       goto out_of_range
     goto jump_table + offset
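
As a rough illustration of the "case values in HI range" branch above, the following host-side C sketch models the same steps. It is an assumption-laden model, not the proposed RTL: `dispatch_hi` is a hypothetical name, and returning -1 stands in for `goto out_of_range`.

```c
#include <stdint.h>

/* Returns the jump-table offset for a 32-bit (SImode) index, or -1 when
   the index is out of range.  Subtract and compare are done on 16 bits. */
int32_t dispatch_hi(uint32_t index, uint16_t lower_bound, uint16_t case_range)
{
    /* "index[2,3] is set": any bit in the two upper bytes is nonzero. */
    if ((index >> 16) != 0)
        return -1;

    /* offset = index - lower_bound (HImode), wrapping modulo 2^16. */
    uint16_t offset = (uint16_t)((uint16_t)index - lower_bound);

    /* if offset > case_range (HImode) goto out_of_range */
    if (offset > case_range)
        return -1;

    return offset;
}
```

For case values 0x101 to 0x108 (lower bound 0x101, range 7), `dispatch_hi (0x103, 0x101, 7)` returns 2, and `dispatch_hi (0x10103UL, 0x101, 7)` returns -1 because the upper bytes are set.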

This modification will not work for negative index values, because the code
to check the upper bytes of the index would be more expensive than the SImode
subtract/compare.

So I'm trying to update the fix to use an SImode subtract/compare if the case
values include negative integers. For the others, I will try to optimize as
described above. Is that approach OK?
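
A short C sketch of why negative case values do not fit the simple upper-byte test: a negative index sign-extended to 32 bits has all upper bits set, so a plain "upper bytes nonzero" test would reject it even when it is a valid case value. The helper names below are hypothetical and the code only models the arithmetic on a host machine.

```c
#include <stdint.h>

/* The cheap test from the narrowed scheme: a nonzero upper half of the
   32-bit index is treated as out of range.  Returns 1 if it would reject. */
int upper_half_set(int32_t index)
{
    return ((uint32_t)index >> 16) != 0;
}

/* Full-width (SImode) range check; wraps modulo 2^32, so it also handles
   a negative lower bound.  Returns 1 if the index is a valid table entry. */
int in_range_si(int32_t index, int32_t lower_bound, uint32_t case_range)
{
    return (uint32_t)index - (uint32_t)lower_bound <= case_range;
}
```

With case values -3 to 4 (lower bound -3, range 7) and index -2, `upper_half_set (-2)` returns 1 although `in_range_si (-2, -3, 7)` returns 1, so the cheap test alone would send a valid index to out_of_range.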

Alternatively, we could have flags to generate shorter code for 'casesi' using
an HImode subtract/compare, but then correctness is not guaranteed (PR71676).

Regards,
Pitchumani
Georg-Johann Lay Oct. 13, 2016, 3:12 p.m. UTC | #3
On 13.10.2016 13:44, Pitchumani Sivanupandi wrote:
> On Monday 26 September 2016 08:19 PM, Georg-Johann Lay wrote:
>> But this means that all comparisons are now performed in SImode which is a
>> great performance loss for most programs which will switch on 16-bit values.
>>
>> IMO we need a less intrusive (w.r.t. performance) approach.
>
> Yes.
>
> I tried to split 'casesi' into several based on case values so that compare is
> done
> in less expensive modes (i.e. QI or HI). In few cases it is not possible without
> SImode subtract/ compare.
>
> Pattern casesi will have index in SI mode. So, out of range checks will be
> expensive
> as most common uses (in AVR) of case values will be in QI/HI mode.
>
> e.g.
>   if case values in QI range
>     if upper three bytes index is set
>       goto out_of_range
>
>     offset = index - lower_bound (QImode)
>     if offset > case_range       (QImode)
>       goto out_of_range
>     goto jump_table + offset
>
>   else if case values in HI range
>     if index[2,3] is set
>       goto out_of_range
>
>     offset = index - lower_bound (HImode)
>     if offset > case_range       (HImode)
>       goto out_of_range
>     goto jump_table + offset
>
> This modification will not work for the negative index values. Because code to
> check
> upper bytes of index will be expensive than the SImode subtract/ compare.
>
> So, I'm trying to update fix to have SImode subtract/ compare if the case
> values include
> negative integers. For, others will try to optimize as mentioned above. Is that
> approach OK?

But the above code will be executed at run time and add even more overhead, or 
am I missing something?  If you conclude statically at expand time from the 
case ranges then we might hit a similar problem as with the original subreg 
computation.

Unfortunately, the generated code (setting cc0, a reg and pc) cannot be wrapped 
into an unspec or parallel and then later be rectified...

I am thinking about a new avr target pass to tidy up the code if no 32-bit 
computation is needed, but this will be some effort.


Johann


Patch

diff --git a/gcc/config/avr/avr.md b/gcc/config/avr/avr.md
index 97f3561..4b1bf9c 100644
--- a/gcc/config/avr/avr.md
+++ b/gcc/config/avr/avr.md
@@ -5155,12 +5155,12 @@ 
 
 (define_expand "casesi"
   [(parallel [(set (match_dup 6)
-                   (minus:HI (subreg:HI (match_operand:SI 0 "register_operand" "") 0)
-                             (match_operand:HI 1 "register_operand" "")))
+                   (minus:SI (match_operand:SI 0 "register_operand" "")
+                             (match_operand:SI 1 "register_operand" "")))
               (clobber (scratch:QI))])
    (parallel [(set (cc0)
                    (compare (match_dup 6)
-                            (match_operand:HI 2 "register_operand" "")))
+                            (match_operand:SI 2 "register_operand" "")))
               (clobber (match_scratch:QI 9 ""))])
 
    (set (pc)
@@ -5179,20 +5179,20 @@ 
               (clobber (match_dup 8))])]
   ""
   {
-    operands[6] = gen_reg_rtx (HImode);
+    operands[6] = gen_reg_rtx (SImode);
 
     if (AVR_HAVE_EIJMP_EICALL)
       {
-        operands[7] = operands[6];
+        operands[7] = simplify_gen_subreg (HImode, operands[6], SImode, 0);
         operands[8] = all_regs_rtx[24];
         operands[10] = gen_rtx_REG (HImode, REG_Z);
       }
     else
       {
-        operands[7] = gen_rtx_PLUS (HImode, operands[6],
+        operands[7] = gen_rtx_PLUS (HImode, simplify_gen_subreg (HImode, operands[6], SImode, 0),
                                     gen_rtx_LABEL_REF (VOIDmode, operands[3]));
         operands[8] = const0_rtx;
-        operands[10] = operands[6];
+        operands[10] = simplify_gen_subreg (HImode, operands[6], SImode, 0);
       }
   })
 
diff --git a/gcc/testsuite/gcc.target/avr/pr71676-1.c b/gcc/testsuite/gcc.target/avr/pr71676-1.c
new file mode 100644
index 0000000..9a74909
--- /dev/null
+++ b/gcc/testsuite/gcc.target/avr/pr71676-1.c
@@ -0,0 +1,332 @@ 
+/* { dg-do run } */
+/* { dg-options "-Os -Wno-overflow" } */
+
+#include "exit-abort.h"
+volatile unsigned char y;
+
+unsigned char __attribute__((noinline)) foo1 (char x)
+{
+    switch (x)
+    {
+      case (char)0x11: y = 7; break;
+      case (char)0x12: y = 4; break;
+      case (char)0x13: y = 8; break;
+      case (char)0x14: y = 21; break;
+      case (char)0x15: y = 65; break;
+      case (char)0x16: y = 27; break;
+      case (char)0x17: y = 72; break;
+      case (char)0x18: y = 39; break;
+      default: y=0;
+    }
+    return y;
+}
+
+unsigned char __attribute__((noinline)) foo2 (char x)
+{
+    switch (x)
+    {
+      case 0x01: y = 7; break;
+      case 0x02: y = 4; break;
+      case 0x03: y = 8; break;
+      case 0x04: y = 21; break;
+      case 0x05: y = 65; break;
+      case 0x06: y = 27; break;
+      case 0x07: y = 72; break;
+      case 0x08: y = 39; break;
+      default: y=0;
+    }
+    return y;
+}
+
+unsigned char __attribute__((noinline)) foo3 (char x)
+{
+    switch (x)
+    {
+      case 0x1000001L: y = 7; break;
+      case 0x1000002L: y = 4; break;
+      case 0x1000003L: y = 8; break;
+      case 0x1000004L: y = 21; break;
+      case 0x1000005L: y = 65; break;
+      case 0x1000006L: y = 27; break;
+      case 0x1000007L: y = 72; break;
+      case 0x1000008L: y = 39; break;
+      default: y=0;
+    }
+    return y;
+}
+
+unsigned char __attribute__((noinline)) foo4 (char x)
+{
+    switch (x)
+    {
+      case 0x100000001LL: y = 7; break;
+      case 0x100000002LL: y = 4; break;
+      case 0x100000003LL: y = 8; break;
+      case 0x100000004LL: y = 21; break;
+      case 0x100000005LL: y = 65; break;
+      case 0x100000006LL: y = 27; break;
+      case 0x100000007LL: y = 72; break;
+      case 0x100000008LL: y = 39; break;
+      default: y=0;
+    }
+    return y;
+}
+
+unsigned char __attribute__((noinline)) foo5 (int x)
+{
+    switch (x)
+    {
+      case (char)0x11: y = 7; break;
+      case (char)0x12: y = 4; break;
+      case (char)0x13: y = 8; break;
+      case (char)0x14: y = 21; break;
+      case (char)0x15: y = 65; break;
+      case (char)0x16: y = 27; break;
+      case (char)0x17: y = 72; break;
+      case (char)0x18: y = 39; break;
+      default: y=0;
+    }
+    return y;
+}
+
+unsigned char __attribute__((noinline)) foo6 (int x)
+{
+    switch (x)
+    {
+      case 0x101: y = 7; break;
+      case 0x102: y = 4; break;
+      case 0x103: y = 8; break;
+      case 0x104: y = 21; break;
+      case 0x105: y = 65; break;
+      case 0x106: y = 27; break;
+      case 0x107: y = 72; break;
+      case 0x108: y = 39; break;
+      default: y=0;
+    }
+    return y;
+}
+
+unsigned char __attribute__((noinline)) foo7 (int x)
+{
+    switch (x)
+    {
+      case 0x1000001L: y = 7; break;
+      case 0x1000002L: y = 4; break;
+      case 0x1000003L: y = 8; break;
+      case 0x1000004L: y = 21; break;
+      case 0x1000005L: y = 65; break;
+      case 0x1000006L: y = 27; break;
+      case 0x1000007L: y = 72; break;
+      case 0x1000008L: y = 39; break;
+      default: y=0;
+    }
+    return y;
+}
+
+unsigned char __attribute__((noinline)) foo8 (int x)
+{
+    switch (x)
+    {
+      case 0x100000001LL: y = 7; break;
+      case 0x100000002LL: y = 4; break;
+      case 0x100000003LL: y = 8; break;
+      case 0x100000004LL: y = 21; break;
+      case 0x100000005LL: y = 65; break;
+      case 0x100000006LL: y = 27; break;
+      case 0x100000007LL: y = 72; break;
+      case 0x100000008LL: y = 39; break;
+      default: y=0;
+    }
+    return y;
+}
+
+unsigned char __attribute__((noinline)) foo9 (long x)
+{
+    switch (x)
+    {
+      case (char)0x11: y = 7; break;
+      case (char)0x12: y = 4; break;
+      case (char)0x13: y = 8; break;
+      case (char)0x14: y = 21; break;
+      case (char)0x15: y = 65; break;
+      case (char)0x16: y = 27; break;
+      case (char)0x17: y = 72; break;
+      case (char)0x18: y = 39; break;
+      default: y=0;
+    }
+    return y;
+}
+
+unsigned char __attribute__((noinline)) foo10 (unsigned long x)
+{
+    switch (x)
+    {
+      case 0x100: y = 39; break;
+      case 0x101: y = 7; break;
+      case 0x102: y = 4; break;
+      case 0x103: y = 8; break;
+      case 0x104: y = 21; break;
+      case 0x105: y = 65; break;
+      case 0x106: y = 27; break;
+      case 0x107: y = 72; break;
+      default: y=0;
+    }
+    return y;
+}
+
+unsigned char __attribute__((noinline)) foo11 (long x)
+{
+    switch (x)
+    {
+      case 0x1000001L: y = 7; break;
+      case 0x1000002L: y = 4; break;
+      case 0x1000003L: y = 8; break;
+      case 0x1000004L: y = 21; break;
+      case 0x1000005L: y = 65; break;
+      case 0x1000006L: y = 27; break;
+      case 0x1000007L: y = 72; break;
+      case 0x1000008L: y = 39; break;
+      default: y=0;
+    }
+    return y;
+}
+
+unsigned char __attribute__((noinline)) foo12 (long x)
+{
+    switch (x)
+    {
+      case 0x100000001LL: y = 7; break;
+      case 0x100000002LL: y = 4; break;
+      case 0x100000003LL: y = 8; break;
+      case 0x100000004LL: y = 21; break;
+      case 0x100000005LL: y = 65; break;
+      case 0x100000006LL: y = 27; break;
+      case 0x100000007LL: y = 72; break;
+      case 0x100000008LL: y = 39; break;
+      default: y=0;
+    }
+    return y;
+}
+
+unsigned char __attribute__((noinline)) foo13 (long long x)
+{
+    switch (x)
+    {
+      case (char)0x11: y = 7; break;
+      case (char)0x12: y = 4; break;
+      case (char)0x13: y = 8; break;
+      case (char)0x14: y = 21; break;
+      case (char)0x15: y = 65; break;
+      case (char)0x16: y = 27; break;
+      case (char)0x17: y = 72; break;
+      case (char)0x18: y = 39; break;
+      default: y=0;
+    }
+    return y;
+}
+
+unsigned char __attribute__((noinline)) foo14 (long long x)
+{
+    switch (x)
+    {
+      case 0x101: y = 7; break;
+      case 0x102: y = 4; break;
+      case 0x103: y = 8; break;
+      case 0x104: y = 21; break;
+      case 0x105: y = 65; break;
+      case 0x106: y = 27; break;
+      case 0x107: y = 72; break;
+      case 0x108: y = 39; break;
+      default: y=0;
+    }
+    return y;
+}
+
+unsigned char __attribute__((noinline)) foo15 (long long x)
+{
+    switch (x)
+    {
+      case 0x1000001L: y = 7; break;
+      case 0x1000002L: y = 4; break;
+      case 0x1000003L: y = 8; break;
+      case 0x1000004L: y = 21; break;
+      case 0x1000005L: y = 65; break;
+      case 0x1000006L: y = 27; break;
+      case 0x1000007L: y = 72; break;
+      case 0x1000008L: y = 39; break;
+      default: y=0;
+    }
+    return y;
+}
+
+
+unsigned char __attribute__((noinline)) foo16 (long long x)
+{
+    switch (x)
+    {
+      case 0x100000001LL: y = 7; break;
+      case 0x100000002LL: y = 4; break;
+      case 0x100000003LL: y = 8; break;
+      case 0x100000004LL: y = 21; break;
+      case 0x100000005LL: y = 65; break;
+      case 0x100000006LL: y = 27; break;
+      case 0x100000007LL: y = 72; break;
+      case 0x100000008LL: y = 39; break;
+      default: y=0;
+    }
+    return y;
+}
+
+int main ()
+{
+	if (foo1 (0x13) != 8)
+	  abort();
+
+	if (foo2 (0x06) != 27)
+	  abort();
+
+	if (foo3 (0x02) != 4)
+	  abort();
+
+	if (foo4 (0x01) != 7)
+	  abort();
+
+	if (foo5 (0x15) != 65)
+	  abort();
+
+	if (foo6 (0x103) != 8)
+	  abort();
+
+	if (foo7 (0x04) != 21)
+	  abort();
+
+	if (foo8 (0x07) != 72)
+	  abort();
+
+	if (foo9 (0x10000011L) != 0)
+	  abort();
+
+	if (foo10 (0x1000105L) != 0)
+	  abort();
+
+	if (foo11 (0x1000008L) != 39)
+	  abort();
+
+	if (foo12 (0x1000004L) != 0)
+	  abort();
+
+	if (foo13 (0x109LL) != 0)
+	  abort();
+
+	if (foo14 (0x108LL) != 39)
+	  abort();
+
+	if (foo15 (0x1000001LL) != 7)
+	  abort();
+
+	if (foo16 (0x100000004LL) != 21)
+	  abort();
+
+    return 0;
+}
+
diff --git a/gcc/testsuite/gcc.target/avr/pr71676.c b/gcc/testsuite/gcc.target/avr/pr71676.c
new file mode 100644
index 0000000..d7a543a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/avr/pr71676.c
@@ -0,0 +1,31 @@ 
+/* { dg-do run } */
+/* { dg-options "-Os" } */
+
+#include "exit-abort.h"
+
+volatile unsigned char y;
+
+__attribute__((noinline,noclone))
+unsigned char foo (unsigned long x) 
+{
+    switch (x)
+    {
+        case 0:	y = 67; break;
+        case 1:	y = 20; break;
+        case 2:	y = 109; break;
+        case 3:	y = 33; break;
+        case 4:	y = 44; break;
+        case 5:	y = 37; break;
+        case 6:	y = 10; break;
+        case 7:	y = 98; break;
+    }
+    return y;
+}
+
+int main (void)
+{
+    if (0 != foo (7L + 0x10000L))
+        abort();
+    return 0;
+}
+
diff --git a/gcc/testsuite/gcc.target/avr/pr71678.c b/gcc/testsuite/gcc.target/avr/pr71678.c
new file mode 100644
index 0000000..290d6cc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/avr/pr71678.c
@@ -0,0 +1,20 @@ 
+/* { dg-do compile } */
+/* { dg-options "-Os -fno-tree-switch-conversion" } */
+
+unsigned char foo (long long x) 
+{
+    unsigned char y = 0;
+    switch (x)
+    {
+        case 0: y = 67; break;
+        case 1: y = 20; break;
+        case 2: y = 109; break;
+        case 3: y = 33; break;
+        case 4: y = 44; break;
+        case 5: y = 37; break;
+        case 6: y = 10; break;
+        case 7: y = 11; break;
+    }
+    return y;
+}
+