Message ID | 000001cfafc3$d5372c00$7fa58400$@arm.com |
---|---|
State | New |
Headers | show |
On Mon, Aug 4, 2014 at 11:09 AM, Zhenqiang Chen <zhenqiang.chen@arm.com> wrote: > > >> -----Original Message----- >> From: Bin.Cheng [mailto:amker.cheng@gmail.com] >> Sent: Monday, August 04, 2014 4:41 PM >> To: Zhenqiang Chen >> Cc: gcc-patches List >> Subject: Re: [PATCH, ivopt] Try aligned offset when get_address_cost >> >> On Mon, Aug 4, 2014 at 2:28 PM, Zhenqiang Chen >> <zhenqiang.chen@arm.com> wrote: >> > Hi, >> > >> > For some TARGET, like ARM THUMB1, the offset in load/store should be >> > nature aligned. But in function get_address_cost, when computing >> > max_offset, it only tries byte-aligned offsets: >> > >> > ((unsigned HOST_WIDE_INT) 1 << i) - 1 >> > >> > which can not meet thumb_legitimate_offset_p check called from >> > thumb1_legitimate_address_p for HImode and SImode. >> > >> > The patch adds additional try for aligned offset: >> > >> > ((unsigned HOST_WIDE_INT) 1 << i) - GET_MODE_SIZE (address_mode). >> > >> > Bootstrap and no make check regression on X86-64. >> > No make check regression on qemu for Cortex-m0 and Cortex-m3. >> > For Cortex-m0, no performance changes with coremark and dhrystone. >> > Coremark code size is ~0.44 smaller. And eembcv2 code size is ~0.22 >> > smaller. CSiBE code size is ~0.05% smaller. >> > >> > OK for trunk? >> > >> > Thanks! >> > -Zhenqiang >> > >> > ChangeLog >> > 2014-08-04 Zhenqiang Chen <zhenqiang.chen@arm.com> >> > >> > * tree-ssa-loop-ivopts.c (get_address_cost): Try aligned offset. >> > >> > testsuite/ChangeLog: >> > 2014-08-04 Zhenqiang Chen <zhenqiang.chen@arm.com> >> > >> > * gcc.target/arm/get_address_cost_aligned_max_offset.c: New > test. >> > >> > diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c >> > index 3b4a6cd..562122a 100644 >> > --- a/gcc/tree-ssa-loop-ivopts.c >> > +++ b/gcc/tree-ssa-loop-ivopts.c >> > @@ -3308,6 +3308,18 @@ get_address_cost (bool symbol_present, bool >> > var_present, >> > XEXP (addr, 1) = gen_int_mode (off, address_mode); >> > if (memory_address_addr_space_p (mem_mode, addr, as)) >> > break; >> > + /* For some TARGET, like ARM THUMB1, the offset should be > nature >> > + aligned. Try an aligned offset if address_mode is not > QImode. >> > */ >> > + off = (address_mode == QImode) >> > + ? 0 >> > + : ((unsigned HOST_WIDE_INT) 1 << i) >> > + - GET_MODE_SIZE (address_mode); >> > + if (off > 0) >> > + { >> > + XEXP (addr, 1) = gen_int_mode (off, address_mode); >> > + if (memory_address_addr_space_p (mem_mode, addr, as)) >> > + break; >> > + } >> Hi, Why not just check "address_mode != QImode"? Set off to 0 then check > it >> seems unnecessary. > > Thanks for the comments. > > ((unsigned HOST_WIDE_INT) 1 << i) - GET_MODE_SIZE (address_mode) might be a > negative value except QImode. A negative value can not be max_offset. So we > do not need to check it. > > For QImode, "((unsigned HOST_WIDE_INT) 1 << i) - GET_MODE_SIZE > (address_mode)" == "((unsigned HOST_WIDE_INT) 1 << i) - 1". It is already > checked. So no need to check it again. > > I think the compiler can optimize the patch like > > diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c > index 3b4a6cd..213598a 100644 > --- a/gcc/tree-ssa-loop-ivopts.c > +++ b/gcc/tree-ssa-loop-ivopts.c > @@ -3308,6 +3308,19 @@ get_address_cost (bool symbol_present, bool > var_present, > XEXP (addr, 1) = gen_int_mode (off, address_mode); > if (memory_address_addr_space_p (mem_mode, addr, as)) > break; > + /* For some TARGET, like ARM THUMB1, the offset should be nature > + aligned. Try an aligned offset if address_mode is not QImode. > */ > + if (address_mode != QImode) > + { > + off = ((unsigned HOST_WIDE_INT) 1 << i) > + - GET_MODE_SIZE (address_mode); > + if (off > 0) > + { > + XEXP (addr, 1) = gen_int_mode (off, address_mode); > + if (memory_address_addr_space_p (mem_mode, addr, as)) > + break; > + } > + } > } > if (i == -1) > off = 0; But is off now guaranteed to be the max value? (1 << (i-1) ) - 1 for small i is larger than (1 << i) - GET_MODE_SIZE (address_mode). That is, I think you want to guard this with 1 << (i - 1) > GET_MODE_SIZE (address_mode)? You don't adjust the negative offset side - why? Richard. > >> Thanks, >> bin >> > } >> > if (i == -1) >> > off = 0; >> > diff --git >> > a/gcc/testsuite/gcc.target/arm/get_address_cost_aligned_max_offset.c >> > b/gcc/testsuite/gcc.target/arm/get_address_cost_aligned_max_offset.c >> > new file mode 100644 >> > index 0000000..cc3e2f7 >> > --- /dev/null >> > +++ >> b/gcc/testsuite/gcc.target/arm/get_address_cost_aligned_max_offset >> > +++ .c >> > @@ -0,0 +1,28 @@ >> > +/* { dg-do compile } */ >> > +/* { dg-options "-mthumb -O2" } */ >> > +/* { dg-require-effective-target arm_thumb1_ok } */ >> > + >> > +unsigned int >> > +test (const short p16[6 * 64]) >> > +{ >> > + unsigned int i = 6; >> > + unsigned int ret = 0; >> > + >> > + do >> > + { >> > + unsigned long long *p64 = (unsigned long long*) p16; >> > + unsigned int *p32 = (unsigned int*) p16; >> > + ret += ret; >> > + if (p16[1] || p32[1]) >> > + ret++; >> > + else if (p64[1] | p64[2] | p64[3]) >> > + ret++; >> > + p16 += 64; >> > + i--; >> > + } while (i != 0); >> > + >> > + return ret; >> > +} >> > + >> > +/* { dg-final { scan-assembler-not "#22" } } */ >> > +/* { dg-final { scan-assembler-not "#14" } } */ >> > >> > >> > > > > >
On 5 August 2014 21:59, Richard Biener <richard.guenther@gmail.com> wrote: > On Mon, Aug 4, 2014 at 11:09 AM, Zhenqiang Chen <zhenqiang.chen@arm.com> wrote: >> >> >>> -----Original Message----- >>> From: Bin.Cheng [mailto:amker.cheng@gmail.com] >>> Sent: Monday, August 04, 2014 4:41 PM >>> To: Zhenqiang Chen >>> Cc: gcc-patches List >>> Subject: Re: [PATCH, ivopt] Try aligned offset when get_address_cost >>> >>> On Mon, Aug 4, 2014 at 2:28 PM, Zhenqiang Chen >>> <zhenqiang.chen@arm.com> wrote: >>> > Hi, >>> > >>> > For some TARGET, like ARM THUMB1, the offset in load/store should be >>> > nature aligned. But in function get_address_cost, when computing >>> > max_offset, it only tries byte-aligned offsets: >>> > >>> > ((unsigned HOST_WIDE_INT) 1 << i) - 1 >>> > >>> > which can not meet thumb_legitimate_offset_p check called from >>> > thumb1_legitimate_address_p for HImode and SImode. >>> > >>> > The patch adds additional try for aligned offset: >>> > >>> > ((unsigned HOST_WIDE_INT) 1 << i) - GET_MODE_SIZE (address_mode). >>> > >>> > Bootstrap and no make check regression on X86-64. >>> > No make check regression on qemu for Cortex-m0 and Cortex-m3. >>> > For Cortex-m0, no performance changes with coremark and dhrystone. >>> > Coremark code size is ~0.44 smaller. And eembcv2 code size is ~0.22 >>> > smaller. CSiBE code size is ~0.05% smaller. >>> > >>> > OK for trunk? >>> > >>> > Thanks! >>> > -Zhenqiang >>> > >>> > ChangeLog >>> > 2014-08-04 Zhenqiang Chen <zhenqiang.chen@arm.com> >>> > >>> > * tree-ssa-loop-ivopts.c (get_address_cost): Try aligned offset. >>> > >>> > testsuite/ChangeLog: >>> > 2014-08-04 Zhenqiang Chen <zhenqiang.chen@arm.com> >>> > >>> > * gcc.target/arm/get_address_cost_aligned_max_offset.c: New >> test. >>> > >>> > diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c >>> > index 3b4a6cd..562122a 100644 >>> > --- a/gcc/tree-ssa-loop-ivopts.c >>> > +++ b/gcc/tree-ssa-loop-ivopts.c >>> > @@ -3308,6 +3308,18 @@ get_address_cost (bool symbol_present, bool >>> > var_present, >>> > XEXP (addr, 1) = gen_int_mode (off, address_mode); >>> > if (memory_address_addr_space_p (mem_mode, addr, as)) >>> > break; >>> > + /* For some TARGET, like ARM THUMB1, the offset should be >> nature >>> > + aligned. Try an aligned offset if address_mode is not >> QImode. >>> > */ >>> > + off = (address_mode == QImode) >>> > + ? 0 >>> > + : ((unsigned HOST_WIDE_INT) 1 << i) >>> > + - GET_MODE_SIZE (address_mode); >>> > + if (off > 0) >>> > + { >>> > + XEXP (addr, 1) = gen_int_mode (off, address_mode); >>> > + if (memory_address_addr_space_p (mem_mode, addr, as)) >>> > + break; >>> > + } >>> Hi, Why not just check "address_mode != QImode"? Set off to 0 then check >> it >>> seems unnecessary. >> >> Thanks for the comments. >> >> ((unsigned HOST_WIDE_INT) 1 << i) - GET_MODE_SIZE (address_mode) might be a >> negative value except QImode. A negative value can not be max_offset. So we >> do not need to check it. >> >> For QImode, "((unsigned HOST_WIDE_INT) 1 << i) - GET_MODE_SIZE >> (address_mode)" == "((unsigned HOST_WIDE_INT) 1 << i) - 1". It is already >> checked. So no need to check it again. >> >> I think the compiler can optimize the patch like >> >> diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c >> index 3b4a6cd..213598a 100644 >> --- a/gcc/tree-ssa-loop-ivopts.c >> +++ b/gcc/tree-ssa-loop-ivopts.c >> @@ -3308,6 +3308,19 @@ get_address_cost (bool symbol_present, bool >> var_present, >> XEXP (addr, 1) = gen_int_mode (off, address_mode); >> if (memory_address_addr_space_p (mem_mode, addr, as)) >> break; >> + /* For some TARGET, like ARM THUMB1, the offset should be nature >> + aligned. Try an aligned offset if address_mode is not QImode. >> */ >> + if (address_mode != QImode) >> + { >> + off = ((unsigned HOST_WIDE_INT) 1 << i) >> + - GET_MODE_SIZE (address_mode); >> + if (off > 0) >> + { >> + XEXP (addr, 1) = gen_int_mode (off, address_mode); >> + if (memory_address_addr_space_p (mem_mode, addr, as)) >> + break; >> + } >> + } >> } >> if (i == -1) >> off = 0; > > But is off now guaranteed to be the max value? (1 << (i-1) ) - 1 for > small i is larger than (1 << i) - GET_MODE_SIZE (address_mode). > > That is, I think you want to guard this with 1 << (i - 1) > > GET_MODE_SIZE (address_mode)? Yes. Without off > 0, it can not guarantee the off is the max value. With off > 0, it can guarantee that (1 << i) - GET_MODE_SIZE (address_mode) is greater than (1 << (i-1) ) - 1. > You don't adjust the negative offset side - why? -((unsigned HOST_WIDE_INT) 1 << i) is already the min aligned offset. Thanks! -Zhenqiang > Richard. > >> >>> Thanks, >>> bin >>> > } >>> > if (i == -1) >>> > off = 0; >>> > diff --git >>> > a/gcc/testsuite/gcc.target/arm/get_address_cost_aligned_max_offset.c >>> > b/gcc/testsuite/gcc.target/arm/get_address_cost_aligned_max_offset.c >>> > new file mode 100644 >>> > index 0000000..cc3e2f7 >>> > --- /dev/null >>> > +++ >>> b/gcc/testsuite/gcc.target/arm/get_address_cost_aligned_max_offset >>> > +++ .c >>> > @@ -0,0 +1,28 @@ >>> > +/* { dg-do compile } */ >>> > +/* { dg-options "-mthumb -O2" } */ >>> > +/* { dg-require-effective-target arm_thumb1_ok } */ >>> > + >>> > +unsigned int >>> > +test (const short p16[6 * 64]) >>> > +{ >>> > + unsigned int i = 6; >>> > + unsigned int ret = 0; >>> > + >>> > + do >>> > + { >>> > + unsigned long long *p64 = (unsigned long long*) p16; >>> > + unsigned int *p32 = (unsigned int*) p16; >>> > + ret += ret; >>> > + if (p16[1] || p32[1]) >>> > + ret++; >>> > + else if (p64[1] | p64[2] | p64[3]) >>> > + ret++; >>> > + p16 += 64; >>> > + i--; >>> > + } while (i != 0); >>> > + >>> > + return ret; >>> > +} >>> > + >>> > +/* { dg-final { scan-assembler-not "#22" } } */ >>> > +/* { dg-final { scan-assembler-not "#14" } } */ >>> > >>> > >>> > >> >> >> >>
On Wed, Aug 6, 2014 at 8:34 AM, Zhenqiang Chen <zhenqiang.chen@linaro.org> wrote: > On 5 August 2014 21:59, Richard Biener <richard.guenther@gmail.com> wrote: >> On Mon, Aug 4, 2014 at 11:09 AM, Zhenqiang Chen <zhenqiang.chen@arm.com> wrote: >>> >>> >>>> -----Original Message----- >>>> From: Bin.Cheng [mailto:amker.cheng@gmail.com] >>>> Sent: Monday, August 04, 2014 4:41 PM >>>> To: Zhenqiang Chen >>>> Cc: gcc-patches List >>>> Subject: Re: [PATCH, ivopt] Try aligned offset when get_address_cost >>>> >>>> On Mon, Aug 4, 2014 at 2:28 PM, Zhenqiang Chen >>>> <zhenqiang.chen@arm.com> wrote: >>>> > Hi, >>>> > >>>> > For some TARGET, like ARM THUMB1, the offset in load/store should be >>>> > nature aligned. But in function get_address_cost, when computing >>>> > max_offset, it only tries byte-aligned offsets: >>>> > >>>> > ((unsigned HOST_WIDE_INT) 1 << i) - 1 >>>> > >>>> > which can not meet thumb_legitimate_offset_p check called from >>>> > thumb1_legitimate_address_p for HImode and SImode. >>>> > >>>> > The patch adds additional try for aligned offset: >>>> > >>>> > ((unsigned HOST_WIDE_INT) 1 << i) - GET_MODE_SIZE (address_mode). >>>> > >>>> > Bootstrap and no make check regression on X86-64. >>>> > No make check regression on qemu for Cortex-m0 and Cortex-m3. >>>> > For Cortex-m0, no performance changes with coremark and dhrystone. >>>> > Coremark code size is ~0.44 smaller. And eembcv2 code size is ~0.22 >>>> > smaller. CSiBE code size is ~0.05% smaller. >>>> > >>>> > OK for trunk? >>>> > >>>> > Thanks! >>>> > -Zhenqiang >>>> > >>>> > ChangeLog >>>> > 2014-08-04 Zhenqiang Chen <zhenqiang.chen@arm.com> >>>> > >>>> > * tree-ssa-loop-ivopts.c (get_address_cost): Try aligned offset. >>>> > >>>> > testsuite/ChangeLog: >>>> > 2014-08-04 Zhenqiang Chen <zhenqiang.chen@arm.com> >>>> > >>>> > * gcc.target/arm/get_address_cost_aligned_max_offset.c: New >>> test. >>>> > >>>> > diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c >>>> > index 3b4a6cd..562122a 100644 >>>> > --- a/gcc/tree-ssa-loop-ivopts.c >>>> > +++ b/gcc/tree-ssa-loop-ivopts.c >>>> > @@ -3308,6 +3308,18 @@ get_address_cost (bool symbol_present, bool >>>> > var_present, >>>> > XEXP (addr, 1) = gen_int_mode (off, address_mode); >>>> > if (memory_address_addr_space_p (mem_mode, addr, as)) >>>> > break; >>>> > + /* For some TARGET, like ARM THUMB1, the offset should be >>> nature >>>> > + aligned. Try an aligned offset if address_mode is not >>> QImode. >>>> > */ >>>> > + off = (address_mode == QImode) >>>> > + ? 0 >>>> > + : ((unsigned HOST_WIDE_INT) 1 << i) >>>> > + - GET_MODE_SIZE (address_mode); >>>> > + if (off > 0) >>>> > + { >>>> > + XEXP (addr, 1) = gen_int_mode (off, address_mode); >>>> > + if (memory_address_addr_space_p (mem_mode, addr, as)) >>>> > + break; >>>> > + } >>>> Hi, Why not just check "address_mode != QImode"? Set off to 0 then check >>> it >>>> seems unnecessary. >>> >>> Thanks for the comments. >>> >>> ((unsigned HOST_WIDE_INT) 1 << i) - GET_MODE_SIZE (address_mode) might be a >>> negative value except QImode. A negative value can not be max_offset. So we >>> do not need to check it. >>> >>> For QImode, "((unsigned HOST_WIDE_INT) 1 << i) - GET_MODE_SIZE >>> (address_mode)" == "((unsigned HOST_WIDE_INT) 1 << i) - 1". It is already >>> checked. So no need to check it again. >>> >>> I think the compiler can optimize the patch like >>> >>> diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c >>> index 3b4a6cd..213598a 100644 >>> --- a/gcc/tree-ssa-loop-ivopts.c >>> +++ b/gcc/tree-ssa-loop-ivopts.c >>> @@ -3308,6 +3308,19 @@ get_address_cost (bool symbol_present, bool >>> var_present, >>> XEXP (addr, 1) = gen_int_mode (off, address_mode); >>> if (memory_address_addr_space_p (mem_mode, addr, as)) >>> break; >>> + /* For some TARGET, like ARM THUMB1, the offset should be nature >>> + aligned. Try an aligned offset if address_mode is not QImode. >>> */ >>> + if (address_mode != QImode) >>> + { >>> + off = ((unsigned HOST_WIDE_INT) 1 << i) >>> + - GET_MODE_SIZE (address_mode); >>> + if (off > 0) >>> + { >>> + XEXP (addr, 1) = gen_int_mode (off, address_mode); >>> + if (memory_address_addr_space_p (mem_mode, addr, as)) >>> + break; >>> + } >>> + } >>> } >>> if (i == -1) >>> off = 0; >> >> But is off now guaranteed to be the max value? (1 << (i-1) ) - 1 for >> small i is larger than (1 << i) - GET_MODE_SIZE (address_mode). >> >> That is, I think you want to guard this with 1 << (i - 1) > >> GET_MODE_SIZE (address_mode)? > > Yes. Without off > 0, it can not guarantee the off is the max value. > With off > 0, it can guarantee that > > (1 << i) - GET_MODE_SIZE (address_mode) is greater than (1 << (i-1) ) - 1. > >> You don't adjust the negative offset side - why? > > -((unsigned HOST_WIDE_INT) 1 << i) is already the min aligned offset. Ok. The patch is ok then. Thanks, Richard. > Thanks! > -Zhenqiang > >> Richard. >> >>> >>>> Thanks, >>>> bin >>>> > } >>>> > if (i == -1) >>>> > off = 0; >>>> > diff --git >>>> > a/gcc/testsuite/gcc.target/arm/get_address_cost_aligned_max_offset.c >>>> > b/gcc/testsuite/gcc.target/arm/get_address_cost_aligned_max_offset.c >>>> > new file mode 100644 >>>> > index 0000000..cc3e2f7 >>>> > --- /dev/null >>>> > +++ >>>> b/gcc/testsuite/gcc.target/arm/get_address_cost_aligned_max_offset >>>> > +++ .c >>>> > @@ -0,0 +1,28 @@ >>>> > +/* { dg-do compile } */ >>>> > +/* { dg-options "-mthumb -O2" } */ >>>> > +/* { dg-require-effective-target arm_thumb1_ok } */ >>>> > + >>>> > +unsigned int >>>> > +test (const short p16[6 * 64]) >>>> > +{ >>>> > + unsigned int i = 6; >>>> > + unsigned int ret = 0; >>>> > + >>>> > + do >>>> > + { >>>> > + unsigned long long *p64 = (unsigned long long*) p16; >>>> > + unsigned int *p32 = (unsigned int*) p16; >>>> > + ret += ret; >>>> > + if (p16[1] || p32[1]) >>>> > + ret++; >>>> > + else if (p64[1] | p64[2] | p64[3]) >>>> > + ret++; >>>> > + p16 += 64; >>>> > + i--; >>>> > + } while (i != 0); >>>> > + >>>> > + return ret; >>>> > +} >>>> > + >>>> > +/* { dg-final { scan-assembler-not "#22" } } */ >>>> > +/* { dg-final { scan-assembler-not "#14" } } */ >>>> > >>>> > >>>> > >>> >>> >>> >>>
diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c index 3b4a6cd..213598a 100644 --- a/gcc/tree-ssa-loop-ivopts.c +++ b/gcc/tree-ssa-loop-ivopts.c @@ -3308,6 +3308,19 @@ get_address_cost (bool symbol_present, bool var_present, XEXP (addr, 1) = gen_int_mode (off, address_mode); if (memory_address_addr_space_p (mem_mode, addr, as)) break; + /* For some TARGET, like ARM THUMB1, the offset should be nature + aligned. Try an aligned offset if address_mode is not QImode. */ + if (address_mode != QImode) + { + off = ((unsigned HOST_WIDE_INT) 1 << i) + - GET_MODE_SIZE (address_mode); + if (off > 0) + { + XEXP (addr, 1) = gen_int_mode (off, address_mode); + if (memory_address_addr_space_p (mem_mode, addr, as)) + break; + } + } }