Message ID | 5c5c2878-6828-42aa-8cfb-2778aea8050b@linux.ibm.com |
---|---|
State | New |
Series | rs6000: Enable overlapped by-pieces operations |
Hi,

on 2024/5/8 14:47, HAO CHEN GUI wrote:
> Hi,
>   This patch enables overlapped by-piece operations. On rs6000, default
> move/set/clear ratio is 2. So the overlap is only enabled with compare
> by-pieces.

Thanks for enabling this, did you evaluate if it can help some benchmark?

>
>   Bootstrapped and tested on powerpc64-linux BE and LE with no
> regressions. Is it OK for the trunk?
>
> Thanks
> Gui Haochen
>
> ChangeLog
> rs6000: Enable overlapped by-pieces operations
>
> This patch enables overlapped by-piece operations by defining
> TARGET_OVERLAP_OP_BY_PIECES_P to true. On rs6000, default move/set/clear
> ratio is 2. So the overlap is only enabled with compare by-pieces.
>
> gcc/
> 	* config/rs6000/rs6000.cc (TARGET_OVERLAP_OP_BY_PIECES_P): Define.
>
> gcc/testsuite/
> 	* gcc.target/powerpc/block-cmp-9.c: New.
>
>
> patch.diff
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index 6b9a40fcc66..2b5f5cf1d86 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -1774,6 +1774,9 @@ static const scoped_attribute_specs *const rs6000_attribute_table[] =
>  #undef TARGET_CONST_ANCHOR
>  #define TARGET_CONST_ANCHOR 0x8000
>
> +#undef TARGET_OVERLAP_OP_BY_PIECES_P
> +#define TARGET_OVERLAP_OP_BY_PIECES_P hook_bool_void_true
> +
>
>
>  /* Processor table.  */
> diff --git a/gcc/testsuite/gcc.target/powerpc/block-cmp-9.c b/gcc/testsuite/gcc.target/powerpc/block-cmp-9.c
> new file mode 100644
> index 00000000000..b5f51affbb7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/block-cmp-9.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power8" } */

Why does it need power8 forced here?

BR,
Kewen

> +/* { dg-final { scan-assembler-not {\ml[hb]z\M} } } */
> +
> +/* Test if by-piece overlap compare is enabled and following case is
> +   implemented by two overlap word loads and compares.  */
> +
> +int foo (const char* s1, const char* s2)
> +{
> +  return __builtin_memcmp (s1, s2, 7) == 0;
> +}
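A source-level illustration of what the test comment describes, assuming "two overlap word loads and compares" means 4-byte loads at offsets 0 and 3 of the 7-byte block. The function name eq7_overlap is hypothetical; this is a sketch of the idea in portable C, not the code GCC emits for rs6000.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: equivalent to memcmp (s1, s2, 7) == 0, but covers the
   7 bytes with two overlapping 4-byte loads per operand: bytes 0..3 and
   bytes 3..6.  Byte 3 is compared twice, which is harmless for an equality
   test.  Without overlap the same block would need a word, a halfword and a
   byte load (4 + 2 + 1).  memcpy keeps the unaligned loads portable;
   optimizing compilers typically turn each call into one 32-bit load.  */
static int eq7_overlap(const char *s1, const char *s2)
{
    uint32_t a0, b0, a1, b1;

    memcpy(&a0, s1, 4);      /* bytes 0..3 */
    memcpy(&b0, s2, 4);
    memcpy(&a1, s1 + 3, 4);  /* bytes 3..6, overlapping byte 3 */
    memcpy(&b1, s2 + 3, 4);

    /* Any differing bit in either window makes the OR nonzero.  */
    return ((a0 ^ b0) | (a1 ^ b1)) == 0;
}
```

Only equality is handled here, matching the `== 0` in the test. An ordering result (the sign of memcmp) would additionally have to account for byte order when comparing the loaded words.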
Hi Kewen,
  Thanks for your comments.

On 2024/5/9 13:44, Kewen.Lin wrote:
> Hi,
>
> on 2024/5/8 14:47, HAO CHEN GUI wrote:
>> Hi,
>>   This patch enables overlapped by-piece operations. On rs6000, default
>> move/set/clear ratio is 2. So the overlap is only enabled with compare
>> by-pieces.
>
> Thanks for enabling this, did you evaluate if it can help some benchmark?

Tested it with SPEC2017. No obvious performance impact. I think memory
compare might not be hot enough.

Tested it with my micro benchmark. 5-10% performance gain when compare
length is 7.

> [...]
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -mdejagnu-cpu=power8" } */
>
> Why does it need power8 forced here?

I just want to exclude P7 LE, as targetm.slow_unaligned_access returns false
for it and the cmpmemsi expansion won't be invoked.

> [...]

Thanks
Gui Haochen
Hi Kewen,
On 2024/5/9 13:44, Kewen.Lin wrote:
> Why does it need power8 forced here?
I thought it over, and it's not needed. For the sub-targets on which the
library is called, l[hb]z won't be generated either.
Thanks
Gui Haochen
Hi,

on 2024/5/9 15:35, HAO CHEN GUI wrote:
> Hi Kewen,
>   Thanks for your comments.
>
> On 2024/5/9 13:44, Kewen.Lin wrote:
>> Hi,
>>
>> on 2024/5/8 14:47, HAO CHEN GUI wrote:
>>> Hi,
>>>   This patch enables overlapped by-piece operations. On rs6000, default
>>> move/set/clear ratio is 2. So the overlap is only enabled with compare
>>> by-pieces.
>>
>> Thanks for enabling this, did you evaluate if it can help some benchmark?
>
> Tested it with SPEC2017. No obvious performance impact. I think memory
> compare might not be hot enough.
>
> Tested it with my micro benchmark. 5-10% performance gain when compare
> length is 7.

Nice!

>> [...]
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-O2 -mdejagnu-cpu=power8" } */
>>
>> Why does it need power8 forced here?
>
> I just want to exclude P7 LE, as targetm.slow_unaligned_access returns false
> for it and the cmpmemsi expansion won't be invoked.
> I thought it over, and it's not needed. For the sub-targets on which the
> library is called, l[hb]z won't be generated either.

Thanks for checking, OK with dropping this forced power8.

BR,
Kewen

> [...]
>
> Thanks
> Gui Haochen
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 6b9a40fcc66..2b5f5cf1d86 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -1774,6 +1774,9 @@ static const scoped_attribute_specs *const rs6000_attribute_table[] =
 #undef TARGET_CONST_ANCHOR
 #define TARGET_CONST_ANCHOR 0x8000
 
+#undef TARGET_OVERLAP_OP_BY_PIECES_P
+#define TARGET_OVERLAP_OP_BY_PIECES_P hook_bool_void_true
+
 
 
 /* Processor table.  */
diff --git a/gcc/testsuite/gcc.target/powerpc/block-cmp-9.c b/gcc/testsuite/gcc.target/powerpc/block-cmp-9.c
new file mode 100644
index 00000000000..b5f51affbb7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/block-cmp-9.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mdejagnu-cpu=power8" } */
+/* { dg-final { scan-assembler-not {\ml[hb]z\M} } } */
+
+/* Test if by-piece overlap compare is enabled and following case is
+   implemented by two overlap word loads and compares.  */
+
+int foo (const char* s1, const char* s2)
+{
+  return __builtin_memcmp (s1, s2, 7) == 0;
+}
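Why the test can scan for the absence of l[hb]z: the sketch below splits a block into power-of-two load pieces with and without overlap. It is a simplified model written only for illustration (plan_pieces, pow2_floor and pow2_ceil are hypothetical names), not GCC's by-pieces implementation, and it assumes the widest piece is a 4-byte word, matching the word loads the test comment mentions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: largest power of two <= n (n >= 1).  */
static size_t pow2_floor(size_t n)
{
    size_t p = 1;
    while (p * 2 <= n)
        p *= 2;
    return p;
}

/* Hypothetical helper: smallest power of two >= n (n >= 1).  */
static size_t pow2_ceil(size_t n)
{
    size_t p = 1;
    while (p < n)
        p *= 2;
    return p;
}

/* Hypothetical simplified model: split LEN bytes into load pieces no wider
   than MAX_PIECE bytes, writing each piece's offset and size.  The caller
   must size OFFSETS and SIZES for the worst case.  Returns the piece count.
   With OVERLAP nonzero, a leftover tail that is narrower than MAX_PIECE is
   covered by a single wider power-of-two piece that is slid back so it ends
   exactly at LEN, re-reading some bytes, instead of a chain of smaller
   pieces.  */
static size_t plan_pieces(size_t len, size_t max_piece, int overlap,
                          size_t offsets[], size_t sizes[])
{
    size_t n = 0, off = 0;

    assert(max_piece != 0 && (max_piece & (max_piece - 1)) == 0);
    while (off < len) {
        size_t rem = len - off;
        size_t size;

        if (rem >= max_piece) {
            size = max_piece;            /* full-width piece */
        } else if (overlap && off > 0) {
            size = pow2_ceil(rem);       /* widen the tail piece ...      */
            off = len - size;            /* ... and move it back to fit   */
        } else {
            size = pow2_floor(rem);      /* largest piece that still fits */
        }
        offsets[n] = off;
        sizes[n] = size;
        n++;
        off += size;
    }
    return n;
}
```

For len = 7 and max_piece = 4 this model gives 4@0, 2@4, 1@6 (word, halfword and byte loads, i.e. lwz, lhz, lbz) without overlap, and 4@0, 4@3 (two word loads) with overlap, which is the shape the scan-assembler-not {\ml[hb]z\M} directive checks for.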