Message ID | 20100825170157.GW702@tyan-ft48-01.lab.bos.redhat.com |
---|---|
State | New |
Headers | show |
On Wed, Aug 25, 2010 at 7:01 PM, Jakub Jelinek <jakub@redhat.com> wrote: > Hi! > > On this testcase, simplify_shift_const_1 is called on LSHIFTRT 24 > (subreg:SI (ashift:V4HI (reg:V4HI ...) (const_int 16)) > varop and happily first optimizes this into AND 255 on the > reg:V4HI and then crashes soon afterwards. > IMHO the only case when we should look through low part subregs > is when both modes are MODE_INT, when crossing mode class boundary > it will be wrong (e.g. vector mode shift is something completely > different from integral shift) and even when changing vector mode size, > or doing subreg between different floating point modes etc. isn't going > to work very well. > > Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux. > Ok for trunk/4.5? Ok. Thanks, Richard. > 2010-08-25 Jakub Jelinek <jakub@redhat.com> > > PR rtl-optimization/45400 > * combine.c (simplify_shift_const_1) <case SUBREG>: Only use > SUBREG_REG if both modes are of MODE_INT class. > > * g++.dg/other/i386-8.C: New test. > > --- gcc/combine.c.jj 2010-08-12 11:12:13.000000000 +0200 > +++ gcc/combine.c 2010-08-25 11:10:01.694939344 +0200 > @@ -9505,7 +9505,9 @@ simplify_shift_const_1 (enum rtx_code co > > GET_MODE_SIZE (GET_MODE (varop))) > && (unsigned int) ((GET_MODE_SIZE (GET_MODE (SUBREG_REG (varop))) > + (UNITS_PER_WORD - 1)) / UNITS_PER_WORD) > - == mode_words) > + == mode_words > + && GET_MODE_CLASS (GET_MODE (varop)) == MODE_INT > + && GET_MODE_CLASS (GET_MODE (SUBREG_REG (varop))) == MODE_INT) > { > varop = SUBREG_REG (varop); > if (GET_MODE_SIZE (GET_MODE (varop)) > GET_MODE_SIZE (mode)) > --- gcc/testsuite/g++.dg/other/i386-8.C.jj 2010-08-25 11:19:54.799157615 +0200 > +++ gcc/testsuite/g++.dg/other/i386-8.C 2010-08-25 11:21:20.688108083 +0200 > @@ -0,0 +1,23 @@ > +// PR rtl-optimization/45400 > +// { dg-do compile { target i?86-*-* x86_64-*-* } } > +// { dg-options "-O2 -msse2" } > +// { dg-options "-O2 -msse2 -fpic" { target fpic } } > +// { dg-require-effective-target sse2 } > + > +#include <xmmintrin.h> > + > +static inline unsigned short > +bar (unsigned short x) > +{ > + return ((x << 8) | (x >> 8)); > +} > + > +unsigned int > +foo (float *x, short *y) > +{ > + __m128 a = _mm_set_ps1 (32767.5f); > + __m128 b = _mm_mul_ps (_mm_load_ps (x), a); > + __m64 c = _mm_cvtps_pi16 (b); > + __builtin_memcpy (y, &c, sizeof (short) * 4); > + y[0] = bar (y[0]); > +} > > Jakub >
different from integral shift) and even when changing vector mode size, or doing subreg between different floating point modes etc. isn't going to work very well. Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux. Ok for trunk/4.5? 2010-08-25 Jakub Jelinek <jakub@redhat.com> PR rtl-optimization/45400 * combine.c (simplify_shift_const_1) <case SUBREG>: Only use SUBREG_REG if both modes are of MODE_INT class. * g++.dg/other/i386-8.C: New test. --- gcc/combine.c.jj 2010-08-12 11:12:13.000000000 +0200 +++ gcc/combine.c 2010-08-25 11:10:01.694939344 +0200 @@ -9505,7 +9505,9 @@ simplify_shift_const_1 (enum rtx_code co > GET_MODE_SIZE (GET_MODE (varop))) && (unsigned int) ((GET_MODE_SIZE (GET_MODE (SUBREG_REG (varop))) + (UNITS_PER_WORD - 1)) / UNITS_PER_WORD) - == mode_words) + == mode_words + && GET_MODE_CLASS (GET_MODE (varop)) == MODE_INT + && GET_MODE_CLASS (GET_MODE (SUBREG_REG (varop))) == MODE_INT) { varop = SUBREG_REG (varop); if (GET_MODE_SIZE (GET_MODE (varop)) > GET_MODE_SIZE (mode)) --- gcc/testsuite/g++.dg/other/i386-8.C.jj 2010-08-25 11:19:54.799157615 +0200 +++ gcc/testsuite/g++.dg/other/i386-8.C 2010-08-25 11:21:20.688108083 +0200 @@ -0,0 +1,23 @@ +// PR rtl-optimization/45400 +// { dg-do compile { target i?86-*-* x86_64-*-* } } +// { dg-options "-O2 -msse2" } +// { dg-options "-O2 -msse2 -fpic" { target fpic } } +// { dg-require-effective-target sse2 } + +#include <xmmintrin.h> + +static inline unsigned short +bar (unsigned short x) +{ + return ((x << 8) | (x >> 8)); +} + +unsigned int +foo (float *x, short *y) +{ + __m128 a = _mm_set_ps1 (32767.5f); + __m128 b = _mm_mul_ps (_mm_load_ps (x), a); + __m64 c = _mm_cvtps_pi16 (b); + __builtin_memcpy (y, &c, sizeof (short) * 4); + y[0] = bar (y[0]); +}