mbox series

[00/57] Replace the Power target-specific built-in machinery

Message ID cover.1619537141.git.wschmidt@linux.ibm.com
Headers show
Series Replace the Power target-specific built-in machinery | expand

Message

Bill Schmidt April 27, 2021, 3:32 p.m. UTC
The design of the target-specific built-in function support in the
Power back end has not stood the test of time.  The machinery is
grossly inefficient, confusing, and arcane; and adding new built-in
functions is inefficient and error-prone.  This patch set introduces a
replacement. 

Because of the scope of the changes, it's important to be able to
verify that the new system makes only intended changes to the
functions that are supported.  Therefore this patch set adds a new
mechanism, and (in the final patch) enables it instead of the existing
support, but does not yet remove the old support.  That will happen in
a follow-up patch once we're comfortable with the new system.

Most of the patches in this set are specific to the rs6000 back end.
However, the first two patches make changes in common code and require
review from the appropriate maintainers.  Jakub and Jeff, I would
appreciate it if you could look at these two small patches.

After these changes are upstream, adding new built-in functions will
usually be as simple as adding two lines to a file,
rs6000-builtin-new.def, that give the prototype of the function and a
little additional information.  Adding new overloaded functions will
require adding a new section to another file, rs6000-overload.def,
with one line describing the overload information, and two lines for
each function to be dispatched to from the overloaded function.

The patches are divided into the following sections.

Patches 0001-0002: Common code patches

  Patch 0001 adds a mechanism to the Makefile to allow specifying
  additional dependencies for "out_object_file", which is rs6000.o for
  the rs6000 back end.  I found this necessary to be able to have
  rs6000.o depend on a header file generated during the build.

  Patch 0002 expands the gengtype machinery to scan header files
  created during the build for GC roots.

Patches 0003, 0005-0023: Generator program

  A new program, rs6000-gen-builtins, is created and executed during
  the build.  It reads rs6000-builtin-new.def and rs6000-overload.def
  and produces three output files:  rs6000-builtins.h,
  rs6000-builtins.c, and rs6000-vecdefines.h.  rs6000-builtins.h
  defines the data structures representing the built-in functions,
  overloaded functions, overload instantiations, and function type
  specifiers.  rs6000-builtins.c contains static initializers for the
  data structures, as well as the function rs6000_autoinit_builtins
  that performs additional run-time initialization.
  rs6000-vecdefines.h contains a set of #defines that map external
  identifiers such as vec_add to their internal builtin names, such as
  __builtin_vec_add.  This replaces most of the similar #defines
  previously contained in altivec.h, which now #includes the new file
  instead.

  This set of patches adds the source for the generator program.

Patches 0024-0025: Target build machinery

  These patches make changes to config.gcc and t-rs6000 to build and
  run the new generator program, and to ensure that the garbage
  collection roots in rs6000-builtins.h are scanned by gengtype.

Patches 0004, 0026-0031, 0033-0037: Input files

  These patches build up the input files to the generator program,
  listing all of the built-in functions and overloads to be
  processed.

Patch 0032: Add pointer types

  This patch creates and caches a bunch of pointer type nodes.  The
  existing built-in machinery, for some reason, only created base
  types up front and created the pointer types on demand (over and
  over and over again).  The new mechanism needs all the type nodes
  available, so we add them here.

Patch 0038: Call rs6000_autoinit_builtins

Patch 0039: A little special handling for Darwin

Patches 0040-0041: Miscellaneous support patches

Patch 0042: Rewrite the overload processing

  Most of this code remains largely the same as before, with the same
  special handling for a few interesting built-in functions.  But the
  general handling of overloaded functions is now much more efficient
  since the new data structures are designed for quick lookup, whereas
  the old machinery does a brutal linear search.

Patch 0043: Rewrite gimple folding

  The "rewrite" here consists entirely of changing the names of the
  builtins to be processed, since we need a separate enumeration of
  builtins for the new machinery.

Patch 0044: Vectorization support

  Small updates to the functions used for mapping built-ins to their
  vectorized counterparts.

Patches 0045-0050: Rewrite built-in function expansion

  This is where most of the meat comes in.  Lookup of built-ins at
  expand time is again much more efficient, replacing the old
  mechanism of multiple linear searches over the whole built-in
  table.  Another major change is that all built-in functions are
  always defined, but a test at expand time is used to determine
  whether they are enabled.  This allows proper handling of
  built-ins in the presence of "#pragma target" directives.  Also,
  handling of special cases is made more uniform using an attribute
  system, which I hope makes this much easier to maintain.

Patches 0051-0052: Miscellaneous changes

Patch 0053: Debug support

  Small changes here to allow gathering of a little more data from
  -mdebug=builtin.  I used this to look for differences between
  functions defined by the old and new built-in support.

Patch 0054: Change altivec.h to use rs6000-vecdefines.h

Patch 0055: Test case adjustments

  Most of these changes are due to automating checks for literal
  arguments that must be within a certain range.  This gives us more
  regular error messages, which don't always match the previous error
  messages.  There are also some adjustments because altivec.h now
  includes rs6000-vecdefines.h.

Patch 0056: Flip the switch to enable the new support

  Victory is ours...

Patch 0057: Fix one last late-breaking change

  Keeping the code up-to-date with upstream has been fun.  When I
  rebased to create the patch set, I found one new issue where a
  small change had been made to the overload handling for the
  vec_insert builtins.  This patch reflects that change into the
  new handling.  My version of git is having trouble with
  interactive rebasing, so it was easier to just add the extra patch.

Now, with all that done, there are a few things that are not yet
done:

(1) A future patch will remove the old code.

(2) There are times where we ought to dispatch an overload to one
function if VSX is available, and to another function if it is not.
We need a general mechanism for allowing conditional dispatch.  I've
outlined a method for this in rs6000-overload.def that I want to
implement down the road.

(3) I want to investigate why vec_mul requires special handling in
rs6000-c.c; it doesn't seem like it should.

(4) Similarly, can we remove some of the special handling for
vec_adde, vec_addec, vec_sube, and vec_subec?

(5) The parser in the generator program doesn't yet handle
"escape-newline" sequences for breaking long lines.  I should add that
capability.

(6) Longer term, can we use a similar mechanism for built-in functions
used for all targets in common code?


A word about compatibility:

I deliberately implemented all the old built-ins exactly as previously
defined, wherever possible, despite an overwhelming desire to pitch
out a bunch of them that have already been considered deprecated for
ages.  I found that it was too difficult to both implement a new
system and remove deprecated things at the same time, and in general
it seems like a dangerous thing to do.  Better to do this in stages if
we're going to do it at all.  Unfortunately a lot of deprecated things
still appear all over our own test suite, and I'm afraid we can assume
they appear in user code all over the place as well.

What I've done instead is to make very clear which interfaces are
considered deprecated in the input files themselves.  Over time,
perhaps we can start to remove some of these, but in reality I think
we're going to just continue to be stuck with them.

Here is a complete list of known incompatibilities with the old
mechanism:

(1) __builtin_vec_vpopcntu[bdhw] were all registered as overloads but
didn't have any instantiations.  Therefore they could not have been
used anywhere, and I haven't implemented them.

(2) I added ten new built-ins named __builtin_vsx_xxpermx_<type> to be
used for the overloaded vec_xxpermx function, instead of the bloody
hack that was used before.  The functionality of vec_xxpermx is
unchanged.

(3) A number of built-ins used "long" for DImode types, which would
break these for 32-bit.  I changed those arguments and return values
to "long long" to avoid such problems, when those built-ins were not
restricted to 64-bit mode already.  There aren't many such cases.

(4) A small handful of builtins didn't have the correct return type to
match the mode of the pattern, so I fixed those.  They are all new in
GCC 11 and can't have worked properly.

(5) I handled the MMA internal functions slightly differently, so that
all the ones with extra vector_quad arguments are listed as such,
rather than having that hacked on during expand time.

(6) __builtin_vsx_xl_len_r took only a char * rather than a void *;
fixing this was backward compatible.

(7) __builtin_vsx_splat_2d[fi] were incompletely defined and couldn't
have ever worked; fixed.

(8) A small handful of builtins weren't marked as "const," but are
obviously const, so I fixed those.

I've kept a complete list of discrepancies for my records, in case any
issues arise from my misunderstanding something.

I do want to thank all the people who have contributed to the built-in
design over the years.  For all my griping, there are some marvelous
bits in there that I hope I have kept intact.  My hope is to make the
whole system much easier to use and maintain going forward.  Time will
tell.

The patches have been bootstrapped and tested on a Power10
little-endian system, and on a Power8 big-endian system with both 32-
and 64-bit enabled, with no regressions.  I'm not crazy enough to
believe I don't have any errors in here, but I have endeavoured to
test and minimize them to the best of my ability.

Is this series okay for trunk, in GCC 12 stage 1?

Thanks!
Bill

Bill Schmidt (57):
  Allow targets to specify build dependencies for out_object_file
  Support scanning of build-time GC roots in gengtype
  rs6000: Initial create of rs6000-gen-builtins.c
  rs6000: Add initial input files
  rs6000: Add file support and functions for diagnostic support
  rs6000: Add helper functions for parsing
  rs6000: Add functions for matching types, part 1 of 3
  rs6000: Add functions for matching types, part 2 of 3
  rs6000: Add functions for matching types, part 3 of 3
  rs6000: Red-black tree implementation for balanced tree search
  rs6000: Main function with stubs for parsing and output
  rs6000: Parsing built-in input file, part 1 of 3
  rs6000: Parsing built-in input file, part 2 of 3
  rs6000: Parsing built-in input file, part 3 of 3
  rs6000: Parsing of overload input file
  rs6000: Build and store function type identifiers
  rs6000: Write output to the builtin definition include file
  rs6000: Write output to the builtins header file
  rs6000: Write output to the builtins init file, part 1 of 3
  rs6000: Write output to the builtins init file, part 2 of 3
  rs6000: Write output to the builtins init file, part 3 of 3
  rs6000: Write static initializations for built-in table
  rs6000: Write static initializations for overload tables
  rs6000: Incorporate new builtins code into the build machinery
  rs6000: Add gengtype handling to the build machinery
  rs6000: Add the rest of the [altivec] stanza to the builtins file
  rs6000: Add VSX builtins
  rs6000: Add available-everywhere and ancient builtins
  rs6000: Add power7 and power7-64 builtins
  rs6000: Add power8-vector builtins
  rs6000: Add Power9 builtins
  rs6000: Add more type nodes to support builtin processing
  rs6000: Add Power10 builtins
  rs6000: Add MMA builtins
  rs6000: Add miscellaneous builtins
  rs6000: Add Cell builtins
  rs6000: Add remaining overloads
  rs6000: Execute the automatic built-in initialization code
  rs6000: Darwin builtin support
  rs6000: Add sanity to V2DI_type_node definitions
  rs6000: Always initialize vector_pair and vector_quad nodes
  rs6000: Handle overloads during program parsing
  rs6000: Handle gimple folding of target built-ins
  rs6000: Support for vectorizing built-in functions
  rs6000: Builtin expansion, part 1
  rs6000: Builtin expansion, part 2
  rs6000: Builtin expansion, part 3
  rs6000: Builtin expansion, part 4
  rs6000: Builtin expansion, part 5
  rs6000: Builtin expansion, part 6
  rs6000: Update rs6000_builtin_decl
  rs6000: Miscellaneous uses of rs6000_builtin_decls_x
  rs6000: Debug support
  rs6000: Update altivec.h for automated interfaces
  rs6000: Test case adjustments
  rs6000: Enable the new builtin support
  rs6000: Adjust to late-breaking change

 gcc/Makefile.in                               |    8 +-
 gcc/config.gcc                                |    2 +
 gcc/config/rs6000/altivec.h                   |  516 +-
 gcc/config/rs6000/darwin.h                    |    8 +-
 gcc/config/rs6000/rbtree.c                    |  233 +
 gcc/config/rs6000/rbtree.h                    |   51 +
 gcc/config/rs6000/rs6000-builtin-new.def      | 3875 +++++++++++
 gcc/config/rs6000/rs6000-c.c                  | 1083 +++
 gcc/config/rs6000/rs6000-call.c               | 3371 ++++++++-
 gcc/config/rs6000/rs6000-gen-builtins.c       | 2997 ++++++++
 gcc/config/rs6000/rs6000-overload.def         | 6076 +++++++++++++++++
 gcc/config/rs6000/rs6000.c                    |  219 +-
 gcc/config/rs6000/rs6000.h                    |   82 +
 gcc/config/rs6000/t-rs6000                    |   44 +-
 gcc/gengtype-state.c                          |   29 +-
 gcc/gengtype.c                                |   19 +-
 gcc/gengtype.h                                |    5 +
 .../powerpc/bfp/scalar-extract-exp-2.c        |    2 +-
 .../powerpc/bfp/scalar-extract-sig-2.c        |    2 +-
 .../powerpc/bfp/scalar-insert-exp-2.c         |    2 +-
 .../powerpc/bfp/scalar-insert-exp-5.c         |    2 +-
 .../powerpc/bfp/scalar-insert-exp-8.c         |    2 +-
 .../powerpc/bfp/scalar-test-neg-2.c           |    2 +-
 .../powerpc/bfp/scalar-test-neg-3.c           |    2 +-
 .../powerpc/bfp/scalar-test-neg-5.c           |    2 +-
 .../gcc.target/powerpc/byte-in-set-2.c        |    2 +-
 gcc/testsuite/gcc.target/powerpc/cmpb-2.c     |    2 +-
 gcc/testsuite/gcc.target/powerpc/cmpb32-2.c   |    2 +-
 .../gcc.target/powerpc/crypto-builtin-2.c     |   14 +-
 .../powerpc/fold-vec-splat-floatdouble.c      |    4 +-
 .../powerpc/fold-vec-splat-longlong.c         |   10 +-
 .../powerpc/fold-vec-splat-misc-invalid.c     |    8 +-
 .../gcc.target/powerpc/p8vector-builtin-8.c   |    1 +
 gcc/testsuite/gcc.target/powerpc/pr80315-1.c  |    2 +-
 gcc/testsuite/gcc.target/powerpc/pr80315-2.c  |    2 +-
 gcc/testsuite/gcc.target/powerpc/pr80315-3.c  |    2 +-
 gcc/testsuite/gcc.target/powerpc/pr80315-4.c  |    2 +-
 gcc/testsuite/gcc.target/powerpc/pr88100.c    |   12 +-
 .../gcc.target/powerpc/pragma_misc9.c         |    2 +-
 .../gcc.target/powerpc/pragma_power8.c        |    2 +
 .../gcc.target/powerpc/pragma_power9.c        |    3 +
 .../powerpc/test_fpscr_drn_builtin_error.c    |    4 +-
 .../powerpc/test_fpscr_rn_builtin_error.c     |   12 +-
 gcc/testsuite/gcc.target/powerpc/test_mffsl.c |    3 +-
 gcc/testsuite/gcc.target/powerpc/vec-gnb-2.c  |    2 +-
 .../gcc.target/powerpc/vsu/vec-all-nez-7.c    |    2 +-
 .../gcc.target/powerpc/vsu/vec-any-eqz-7.c    |    2 +-
 .../gcc.target/powerpc/vsu/vec-cmpnez-7.c     |    2 +-
 .../gcc.target/powerpc/vsu/vec-cntlz-lsbb-2.c |    2 +-
 .../gcc.target/powerpc/vsu/vec-cnttz-lsbb-2.c |    2 +-
 .../gcc.target/powerpc/vsu/vec-xl-len-13.c    |    2 +-
 .../gcc.target/powerpc/vsu/vec-xst-len-12.c   |    2 +-
 52 files changed, 17915 insertions(+), 824 deletions(-)
 create mode 100644 gcc/config/rs6000/rbtree.c
 create mode 100644 gcc/config/rs6000/rbtree.h
 create mode 100644 gcc/config/rs6000/rs6000-builtin-new.def
 create mode 100644 gcc/config/rs6000/rs6000-gen-builtins.c
 create mode 100644 gcc/config/rs6000/rs6000-overload.def

Comments

Li, Pan2 via Gcc-patches May 11, 2021, 3:57 p.m. UTC | #1
Hi!  I'd like to ping this series.  This is a big change, so I'd like to 
get it committed fairly early in stage 1.  I know you have a lot stacked 
up, though.

Thanks!
Bill

On 4/27/21 10:32 AM, Bill Schmidt wrote:
> The design of the target-specific built-in function support in the
> Power back end has not stood the test of time.  The machinery is
> grossly inefficient, confusing, and arcane; and adding new built-in
> functions is inefficient and error-prone.  This patch set introduces a
> replacement.
>
> Because of the scope of the changes, it's important to be able to
> verify that the new system makes only intended changes to the
> functions that are supported.  Therefore this patch set adds a new
> mechanism, and (in the final patch) enables it instead of the existing
> support, but does not yet remove the old support.  That will happen in
> a follow-up patch once we're comfortable with the new system.
>
> Most of the patches in this set are specific to the rs6000 back end.
> However, the first two patches make changes in common code and require
> review from the appropriate maintainers.  Jakub and Jeff, I would
> appreciate it if you could look at these two small patches.
>
> After these changes are upstream, adding new built-in functions will
> usually be as simple as adding two lines to a file,
> rs6000-builtin-new.def, that give the prototype of the function and a
> little additional information.  Adding new overloaded functions will
> require adding a new section to another file, rs6000-overload.def,
> with one line describing the overload information, and two lines for
> each function to be dispatched to from the overloaded function.
>
> The patches are divided into the following sections.
>
> Patches 0001-0002: Common code patches
>
>    Patch 0001 adds a mechanism to the Makefile to allow specifying
>    additional dependencies for "out_object_file", which is rs6000.o for
>    the rs6000 back end.  I found this necessary to be able to have
>    rs6000.o depend on a header file generated during the build.
>
>    Patch 0002 expands the gengtype machinery to scan header files
>    created during the build for GC roots.
>
> Patches 0003, 0005-0023: Generator program
>
>    A new program, rs6000-gen-builtins, is created and executed during
>    the build.  It reads rs6000-builtin-new.def and rs6000-overload.def
>    and produces three output files:  rs6000-builtins.h,
>    rs6000-builtins.c, and rs6000-vecdefines.h.  rs6000-builtins.h
>    defines the data structures representing the built-in functions,
>    overloaded functions, overload instantiations, and function type
>    specifiers.  rs6000-builtins.c contains static initializers for the
>    data structures, as well as the function rs6000_autoinit_builtins
>    that performs additional run-time initialization.
>    rs6000-vecdefines.h contains a set of #defines that map external
>    identifiers such as vec_add to their internal builtin names, such as
>    __builtin_vec_add.  This replaces most of the similar #defines
>    previously contained in altivec.h, which now #includes the new file
>    instead.
>
>    This set of patches adds the source for the generator program.
>
> Patches 0024-0025: Target build machinery
>
>    These patches make changes to config.gcc and t-rs6000 to build and
>    run the new generator program, and to ensure that the garbage
>    collection roots in rs6000-builtins.h are scanned by gengtype.
>
> Patches 0004, 0026-0031, 0033-0037: Input files
>
>    These patches build up the input files to the generator program,
>    listing all of the built-in functions and overloads to be
>    processed.
>
> Patch 0032: Add pointer types
>
>    This patch creates and caches a bunch of pointer type nodes.  The
>    existing built-in machinery, for some reason, only created base
>    types up front and created the pointer types on demand (over and
>    over and over again).  The new mechanism needs all the type nodes
>    available, so we add them here.
>
> Patch 0038: Call rs6000_autoinit_builtins
>
> Patch 0039: A little special handling for Darwin
>
> Patches 0040-0041: Miscellaneous support patches
>
> Patch 0042: Rewrite the overload processing
>
>    Most of this code remains largely the same as before, with the same
>    special handling for a few interesting built-in functions.  But the
>    general handling of overloaded functions is now much more efficient
>    since the new data structures are designed for quick lookup, whereas
>    the old machinery does a brutal linear search.
>
> Patch 0043: Rewrite gimple folding
>
>    The "rewrite" here consists entirely of changing the names of the
>    builtins to be processed, since we need a separate enumeration of
>    builtins for the new machinery.
>
> Patch 0044: Vectorization support
>
>    Small updates to the functions used for mapping built-ins to their
>    vectorized counterparts.
>
> Patches 0045-0050: Rewrite built-in function expansion
>
>    This is where most of the meat comes in.  Lookup of built-ins at
>    expand time is again much more efficient, replacing the old
>    mechanism of multiple linear searches over the whole built-in
>    table.  Another major change is that all built-in functions are
>    always defined, but a test at expand time is used to determine
>    whether they are enabled.  This allows proper handling of
>    built-ins in the presence of "#pragma target" directives.  Also,
>    handling of special cases is made more uniform using an attribute
>    system, which I hope makes this much easier to maintain.
>
> Patches 0051-0052: Miscellaneous changes
>
> Patch 0053: Debug support
>
>    Small changes here to allow gathering of a little more data from
>    -mdebug=builtin.  I used this to look for differences between
>    functions defined by the old and new built-in support.
>
> Patch 0054: Change altivec.h to use rs6000-vecdefines.h
>
> Patch 0055: Test case adjustments
>
>    Most of these changes are due to automating checks for literal
>    arguments that must be within a certain range.  This gives us more
>    regular error messages, which don't always match the previous error
>    messages.  There are also some adjustments because altivec.h now
>    includes rs6000-vecdefines.h.
>
> Patch 0056: Flip the switch to enable the new support
>
>    Victory is ours...
>
> Patch 0057: Fix one last late-breaking change
>
>    Keeping the code up-to-date with upstream has been fun.  When I
>    rebased to create the patch set, I found one new issue where a
>    small change had been made to the overload handling for the
>    vec_insert builtins.  This patch reflects that change into the
>    new handling.  My version of git is having trouble with
>    interactive rebasing, so it was easier to just add the extra patch.
>
> Now, with all that done, there are a few things that are not yet
> done:
>
> (1) A future patch will remove the old code.
>
> (2) There are times where we ought to dispatch an overload to one
> function if VSX is available, and to another function if it is not.
> We need a general mechanism for allowing conditional dispatch.  I've
> outlined a method for this in rs6000-overload.def that I want to
> implement down the road.
>
> (3) I want to investigate why vec_mul requires special handling in
> rs6000-c.c; it doesn't seem like it should.
>
> (4) Similarly, can we remove some of the special handling for
> vec_adde, vec_addec, vec_sube, and vec_subec?
>
> (5) The parser in the generator program doesn't yet handle
> "escape-newline" sequences for breaking long lines.  I should add that
> capability.
>
> (6) Longer term, can we use a similar mechanism for built-in functions
> used for all targets in common code?
>
>
> A word about compatibility:
>
> I deliberately implemented all the old built-ins exactly as previously
> defined, wherever possible, despite an overwhelming desire to pitch
> out a bunch of them that have already been considered deprecated for
> ages.  I found that it was too difficult to both implement a new
> system and remove deprecated things at the same time, and in general
> it seems like a dangerous thing to do.  Better to do this in stages if
> we're going to do it at all.  Unfortunately a lot of deprecated things
> still appear all over our own test suite, and I'm afraid we can assume
> they appear in user code all over the place as well.
>
> What I've done instead is to make very clear which interfaces are
> considered deprecated in the input files themselves.  Over time,
> perhaps we can start to remove some of these, but in reality I think
> we're going to just continue to be stuck with them.
>
> Here is a complete list of known incompatibilities with the old
> mechanism:
>
> (1) __builtin_vec_vpopcntu[bdhw] were all registered as overloads but
> didn't have any instantiations.  Therefore they could not have been
> used anywhere, and I haven't implemented them.
>
> (2) I added ten new built-ins named __builtin_vsx_xxpermx_<type> to be
> used for the overloaded vec_xxpermx function, instead of the bloody
> hack that was used before.  The functionality of vec_xxpermx is
> unchanged.
>
> (3) A number of built-ins used "long" for DImode types, which would
> break these for 32-bit.  I changed those arguments and return values
> to "long long" to avoid such problems, when those built-ins were not
> restricted to 64-bit mode already.  There aren't many such cases.
>
> (4) A small handful of builtins didn't have the correct return type to
> match the mode of the pattern, so I fixed those.  They are all new in
> GCC 11 and can't have worked properly.
>
> (5) I handled the MMA internal functions slightly differently, so that
> all the ones with extra vector_quad arguments are listed as such,
> rather than having that hacked on during expand time.
>
> (6) __builtin_vsx_xl_len_r took only a char * rather than a void *;
> fixing this was backward compatible.
>
> (7) __builtin_vsx_splat_2d[fi] were incompletely defined and couldn't
> have ever worked; fixed.
>
> (8) A small handful of builtins weren't marked as "const," but are
> obviously const, so I fixed those.
>
> I've kept a complete list of discrepancies for my records, in case any
> issues arise from my misunderstanding something.
>
> I do want to thank all the people who have contributed to the built-in
> design over the years.  For all my griping, there are some marvelous
> bits in there that I hope I have kept intact.  My hope is to make the
> whole system much easier to use and maintain going forward.  Time will
> tell.
>
> The patches have been bootstrapped and tested on a Power10
> little-endian system, and on a Power8 big-endian system with both 32-
> and 64-bit enabled, with no regressions.  I'm not crazy enough to
> believe I don't have any errors in here, but I have endeavoured to
> test and minimize them to the best of my ability.
>
> Is this series okay for trunk, in GCC 12 stage 1?
>
> Thanks!
> Bill
>
> Bill Schmidt (57):
>    Allow targets to specify build dependencies for out_object_file
>    Support scanning of build-time GC roots in gengtype
>    rs6000: Initial create of rs6000-gen-builtins.c
>    rs6000: Add initial input files
>    rs6000: Add file support and functions for diagnostic support
>    rs6000: Add helper functions for parsing
>    rs6000: Add functions for matching types, part 1 of 3
>    rs6000: Add functions for matching types, part 2 of 3
>    rs6000: Add functions for matching types, part 3 of 3
>    rs6000: Red-black tree implementation for balanced tree search
>    rs6000: Main function with stubs for parsing and output
>    rs6000: Parsing built-in input file, part 1 of 3
>    rs6000: Parsing built-in input file, part 2 of 3
>    rs6000: Parsing built-in input file, part 3 of 3
>    rs6000: Parsing of overload input file
>    rs6000: Build and store function type identifiers
>    rs6000: Write output to the builtin definition include file
>    rs6000: Write output to the builtins header file
>    rs6000: Write output to the builtins init file, part 1 of 3
>    rs6000: Write output to the builtins init file, part 2 of 3
>    rs6000: Write output to the builtins init file, part 3 of 3
>    rs6000: Write static initializations for built-in table
>    rs6000: Write static initializations for overload tables
>    rs6000: Incorporate new builtins code into the build machinery
>    rs6000: Add gengtype handling to the build machinery
>    rs6000: Add the rest of the [altivec] stanza to the builtins file
>    rs6000: Add VSX builtins
>    rs6000: Add available-everywhere and ancient builtins
>    rs6000: Add power7 and power7-64 builtins
>    rs6000: Add power8-vector builtins
>    rs6000: Add Power9 builtins
>    rs6000: Add more type nodes to support builtin processing
>    rs6000: Add Power10 builtins
>    rs6000: Add MMA builtins
>    rs6000: Add miscellaneous builtins
>    rs6000: Add Cell builtins
>    rs6000: Add remaining overloads
>    rs6000: Execute the automatic built-in initialization code
>    rs6000: Darwin builtin support
>    rs6000: Add sanity to V2DI_type_node definitions
>    rs6000: Always initialize vector_pair and vector_quad nodes
>    rs6000: Handle overloads during program parsing
>    rs6000: Handle gimple folding of target built-ins
>    rs6000: Support for vectorizing built-in functions
>    rs6000: Builtin expansion, part 1
>    rs6000: Builtin expansion, part 2
>    rs6000: Builtin expansion, part 3
>    rs6000: Builtin expansion, part 4
>    rs6000: Builtin expansion, part 5
>    rs6000: Builtin expansion, part 6
>    rs6000: Update rs6000_builtin_decl
>    rs6000: Miscellaneous uses of rs6000_builtin_decls_x
>    rs6000: Debug support
>    rs6000: Update altivec.h for automated interfaces
>    rs6000: Test case adjustments
>    rs6000: Enable the new builtin support
>    rs6000: Adjust to late-breaking change
>
>   gcc/Makefile.in                               |    8 +-
>   gcc/config.gcc                                |    2 +
>   gcc/config/rs6000/altivec.h                   |  516 +-
>   gcc/config/rs6000/darwin.h                    |    8 +-
>   gcc/config/rs6000/rbtree.c                    |  233 +
>   gcc/config/rs6000/rbtree.h                    |   51 +
>   gcc/config/rs6000/rs6000-builtin-new.def      | 3875 +++++++++++
>   gcc/config/rs6000/rs6000-c.c                  | 1083 +++
>   gcc/config/rs6000/rs6000-call.c               | 3371 ++++++++-
>   gcc/config/rs6000/rs6000-gen-builtins.c       | 2997 ++++++++
>   gcc/config/rs6000/rs6000-overload.def         | 6076 +++++++++++++++++
>   gcc/config/rs6000/rs6000.c                    |  219 +-
>   gcc/config/rs6000/rs6000.h                    |   82 +
>   gcc/config/rs6000/t-rs6000                    |   44 +-
>   gcc/gengtype-state.c                          |   29 +-
>   gcc/gengtype.c                                |   19 +-
>   gcc/gengtype.h                                |    5 +
>   .../powerpc/bfp/scalar-extract-exp-2.c        |    2 +-
>   .../powerpc/bfp/scalar-extract-sig-2.c        |    2 +-
>   .../powerpc/bfp/scalar-insert-exp-2.c         |    2 +-
>   .../powerpc/bfp/scalar-insert-exp-5.c         |    2 +-
>   .../powerpc/bfp/scalar-insert-exp-8.c         |    2 +-
>   .../powerpc/bfp/scalar-test-neg-2.c           |    2 +-
>   .../powerpc/bfp/scalar-test-neg-3.c           |    2 +-
>   .../powerpc/bfp/scalar-test-neg-5.c           |    2 +-
>   .../gcc.target/powerpc/byte-in-set-2.c        |    2 +-
>   gcc/testsuite/gcc.target/powerpc/cmpb-2.c     |    2 +-
>   gcc/testsuite/gcc.target/powerpc/cmpb32-2.c   |    2 +-
>   .../gcc.target/powerpc/crypto-builtin-2.c     |   14 +-
>   .../powerpc/fold-vec-splat-floatdouble.c      |    4 +-
>   .../powerpc/fold-vec-splat-longlong.c         |   10 +-
>   .../powerpc/fold-vec-splat-misc-invalid.c     |    8 +-
>   .../gcc.target/powerpc/p8vector-builtin-8.c   |    1 +
>   gcc/testsuite/gcc.target/powerpc/pr80315-1.c  |    2 +-
>   gcc/testsuite/gcc.target/powerpc/pr80315-2.c  |    2 +-
>   gcc/testsuite/gcc.target/powerpc/pr80315-3.c  |    2 +-
>   gcc/testsuite/gcc.target/powerpc/pr80315-4.c  |    2 +-
>   gcc/testsuite/gcc.target/powerpc/pr88100.c    |   12 +-
>   .../gcc.target/powerpc/pragma_misc9.c         |    2 +-
>   .../gcc.target/powerpc/pragma_power8.c        |    2 +
>   .../gcc.target/powerpc/pragma_power9.c        |    3 +
>   .../powerpc/test_fpscr_drn_builtin_error.c    |    4 +-
>   .../powerpc/test_fpscr_rn_builtin_error.c     |   12 +-
>   gcc/testsuite/gcc.target/powerpc/test_mffsl.c |    3 +-
>   gcc/testsuite/gcc.target/powerpc/vec-gnb-2.c  |    2 +-
>   .../gcc.target/powerpc/vsu/vec-all-nez-7.c    |    2 +-
>   .../gcc.target/powerpc/vsu/vec-any-eqz-7.c    |    2 +-
>   .../gcc.target/powerpc/vsu/vec-cmpnez-7.c     |    2 +-
>   .../gcc.target/powerpc/vsu/vec-cntlz-lsbb-2.c |    2 +-
>   .../gcc.target/powerpc/vsu/vec-cnttz-lsbb-2.c |    2 +-
>   .../gcc.target/powerpc/vsu/vec-xl-len-13.c    |    2 +-
>   .../gcc.target/powerpc/vsu/vec-xst-len-12.c   |    2 +-
>   52 files changed, 17915 insertions(+), 824 deletions(-)
>   create mode 100644 gcc/config/rs6000/rbtree.c
>   create mode 100644 gcc/config/rs6000/rbtree.h
>   create mode 100644 gcc/config/rs6000/rs6000-builtin-new.def
>   create mode 100644 gcc/config/rs6000/rs6000-gen-builtins.c
>   create mode 100644 gcc/config/rs6000/rs6000-overload.def
>
Segher Boessenkool May 11, 2021, 11:20 p.m. UTC | #2
On Tue, May 11, 2021 at 10:57:56AM -0500, Bill Schmidt wrote:
> Hi!  I'd like to ping this series.  This is a big change, so I'd like to 
> get it committed fairly early in stage 1.  I know you have a lot stacked 
> up, though.

I haven't received most of this series (only the last three patches).
I'll dig it up from the archives.


Segher
Segher Boessenkool May 20, 2021, 9:57 p.m. UTC | #3
Hi!

Just a few small things about this -- I'll reply to more of it later.

On Tue, Apr 27, 2021 at 10:32:35AM -0500, Bill Schmidt via Gcc-patches wrote:
> The design of the target-specific built-in function support in the
> Power back end has not stood the test of time.  The machinery is
> grossly inefficient, confusing, and arcane; and adding new built-in
> functions is inefficient and error-prone.

You are too nice to it.  People have had to work with it over the
years, there is some pent-up anger :-)

> Because of the scope of the changes, it's important to be able to
> verify that the new system makes only intended changes to the
> functions that are supported.  Therefore this patch set adds a new
> mechanism, and (in the final patch) enables it instead of the existing
> support, but does not yet remove the old support.  That will happen in
> a follow-up patch once we're comfortable with the new system.

Is there some (semi-)automatic way to compare the results of the old
and new systems?

> Patch 0057: Fix one last late-breaking change
> 
>   Keeping the code up-to-date with upstream has been fun.  When I
>   rebased to create the patch set, I found one new issue where a
>   small change had been made to the overload handling for the
>   vec_insert builtins.  This patch reflects that change into the
>   new handling.  My version of git is having trouble with
>   interactive rebasing, so it was easier to just add the extra patch.

What breaks by keeping this fix after the other patches?

> I deliberately implemented all the old built-ins exactly as previously
> defined, wherever possible, despite an overwhelming desire to pitch
> out a bunch of them that have already been considered deprecated for
> ages.  I found that it was too difficult to both implement a new
> system and remove deprecated things at the same time, and in general
> it seems like a dangerous thing to do.  Better to do this in stages if
> we're going to do it at all.

Independent fixes you can put before the meat of the series.  This often
is the best way to do it, since then you don't have to duplicate the
weird / buggy / whatever behaviour of the old system.

But things that aren't simple fixes, that need deprecation periods and
everything...  no no no, you want this done *this* decade!

> Unfortunately a lot of deprecated things
> still appear all over our own test suite, and I'm afraid we can assume
> they appear in user code all over the place as well.

Pretty much the only old features you can remove are features that have
been broken for many years.  You can break something on purpose to see
if anyone still uses it, too, but :-)

> What I've done instead is to make very clear which interfaces are
> considered deprecated in the input files themselves.  Over time,
> perhaps we can start to remove some of these, but in reality I think
> we're going to just continue to be stuck with them.

It is extremely useful to have it clearly documented which interfaces
shoul;d be considered deprecated, even if we will never remove it.  It
is useful in the documentation, but it is even more useful in the code,
for ourselves!

> (3) A number of built-ins used "long" for DImode types, which would
> break these for 32-bit.  I changed those arguments and return values
> to "long long" to avoid such problems, when those built-ins were not
> restricted to 64-bit mode already.  There aren't many such cases.

You can do this for 64-bit-only builtins as well -- the actual argument
type is never visible (to the user), and everything becomes modes early.


Segher
Li, Pan2 via Gcc-patches May 21, 2021, 12:53 p.m. UTC | #4
On 5/20/21 4:57 PM, Segher Boessenkool wrote:
> Is there some (semi-)automatic way to compare the results of the old
> and new systems?
Yes...very "semi".  There's a patch in the series that updates the 
information printed from -mdebug=builtin.  I use this to print the 
builtins generated by the old and new versions, and run them through a 
Python script to look for anything missing or added, any mismatches in 
the type system, mismatches in const/pure/etc., that sort of thing.  I 
found quite a lot of mistakes this way and got them fixed ahead of 
time.  In the script I maintain a table of expected differences, where 
there were just errors before, etc., and there's a short doc string with 
each of them.  So I can go back and look at that if we run into any 
discrepancies in the future.  The script doesn't have long-term value 
beyond that; once we delete the old version in a future patch, it won't 
have anything to compare against.
>> Patch 0057: Fix one last late-breaking change
>>
>>    Keeping the code up-to-date with upstream has been fun.  When I
>>    rebased to create the patch set, I found one new issue where a
>>    small change had been made to the overload handling for the
>>    vec_insert builtins.  This patch reflects that change into the
>>    new handling.  My version of git is having trouble with
>>    interactive rebasing, so it was easier to just add the extra patch.
> What breaks by keeping this fix after the other patches?
Nothing truly breaks.  There is a VSX-efficient implementation of 
vec_insert and a less-efficient implementation.  Late in GCC 11, 
somebody recognized the VSX-efficient implementation could be used in 
more situations.  Without this patch, we still use the less-efficient 
implementation in the new version.
>
>> (3) A number of built-ins used "long" for DImode types, which would
>> break these for 32-bit.  I changed those arguments and return values
>> to "long long" to avoid such problems, when those built-ins were not
>> restricted to 64-bit mode already.  There aren't many such cases.
> You can do this for 64-bit-only builtins as well -- the actual argument
> type is never visible (to the user), and everything becomes modes early.
>
Yep -- in fact, the old system generally used a (V2)DImode type for such 
things, and I've just translated all of those to (vector) long long in 
the new prototypes.

Thanks for the ongoing review!

Bill