Message ID | cover.1619537141.git.wschmidt@linux.ibm.com |
---|---|
Headers | show |
Series | Replace the Power target-specific built-in machinery | expand |
Hi! I'd like to ping this series. This is a big change, so I'd like to get it committed fairly early in stage 1. I know you have a lot stacked up, though. Thanks! Bill On 4/27/21 10:32 AM, Bill Schmidt wrote: > The design of the target-specific built-in function support in the > Power back end has not stood the test of time. The machinery is > grossly inefficient, confusing, and arcane; and adding new built-in > functions is inefficient and error-prone. This patch set introduces a > replacement. > > Because of the scope of the changes, it's important to be able to > verify that the new system makes only intended changes to the > functions that are supported. Therefore this patch set adds a new > mechanism, and (in the final patch) enables it instead of the existing > support, but does not yet remove the old support. That will happen in > a follow-up patch once we're comfortable with the new system. > > Most of the patches in this set are specific to the rs6000 back end. > However, the first two patches make changes in common code and require > review from the appropriate maintainers. Jakub and Jeff, I would > appreciate it if you could look at these two small patches. > > After these changes are upstream, adding new built-in functions will > usually be as simple as adding two lines to a file, > rs6000-builtin-new.def, that give the prototype of the function and a > little additional information. Adding new overloaded functions will > require adding a new section to another file, rs6000-overload.def, > with one line describing the overload information, and two lines for > each function to be dispatched to from the overloaded function. > > The patches are divided into the following sections. > > Patches 0001-0002: Common code patches > > Patch 0001 adds a mechanism to the Makefile to allow specifying > additional dependencies for "out_object_file", which is rs6000.o for > the rs6000 back end. I found this necessary to be able to have > rs6000.o depend on a header file generated during the build. > > Patch 0002 expands the gengtype machinery to scan header files > created during the build for GC roots. > > Patches 0003, 0005-0023: Generator program > > A new program, rs6000-gen-builtins, is created and executed during > the build. It reads rs6000-builtin-new.def and rs6000-overload.def > and produces three output files: rs6000-builtins.h, > rs6000-builtins.c, and rs6000-vecdefines.h. rs6000-builtins.h > defines the data structures representing the built-in functions, > overloaded functions, overload instantiations, and function type > specifiers. rs6000-builtins.c contains static initializers for the > data structures, as well as the function rs6000_autoinit_builtins > that performs additional run-time initialization. > rs6000-vecdefines.h contains a set of #defines that map external > identifiers such as vec_add to their internal builtin names, such as > __builtin_vec_add. This replaces most of the similar #defines > previously contained in altivec.h, which now #includes the new file > instead. > > This set of patches adds the source for the generator program. > > Patches 0024-0025: Target build machinery > > These patches make changes to config.gcc and t-rs6000 to build and > run the new generator program, and to ensure that the garbage > collection roots in rs6000-builtins.h are scanned by gengtype. > > Patches 0004, 0026-0031, 0033-0037: Input files > > These patches build up the input files to the generator program, > listing all of the built-in functions and overloads to be > processed. > > Patch 0032: Add pointer types > > This patch creates and caches a bunch of pointer type nodes. The > existing built-in machinery, for some reason, only created base > types up front and created the pointer types on demand (over and > over and over again). The new mechanism needs all the type nodes > available, so we add them here. > > Patch 0038: Call rs6000_autoinit_builtins > > Patch 0039: A little special handling for Darwin > > Patches 0040-0041: Miscellaneous support patches > > Patch 0042: Rewrite the overload processing > > Most of this code remains largely the same as before, with the same > special handling for a few interesting built-in functions. But the > general handling of overloaded functions is now much more efficient > since the new data structures are designed for quick lookup, whereas > the old machinery does a brutal linear search. > > Patch 0043: Rewrite gimple folding > > The "rewrite" here consists entirely of changing the names of the > builtins to be processed, since we need a separate enumeration of > builtins for the new machinery. > > Patch 0044: Vectorization support > > Small updates to the functions used for mapping built-ins to their > vectorized counterparts. > > Patches 0045-0050: Rewrite built-in function expansion > > This is where most of the meat comes in. Lookup of built-ins at > expand time is again much more efficient, replacing the old > mechanism of multiple linear searches over the whole built-in > table. Another major change is that all built-in functions are > always defined, but a test at expand time is used to determine > whether they are enabled. This allows proper handling of > built-ins in the presence of "#pragma target" directives. Also, > handling of special cases is made more uniform using an attribute > system, which I hope makes this much easier to maintain. > > Patches 0051-0052: Miscellaneous changes > > Patch 0053: Debug support > > Small changes here to allow gathering of a little more data from > -mdebug=builtin. I used this to look for differences between > functions defined by the old and new built-in support. > > Patch 0054: Change altivec.h to use rs6000-vecdefines.h > > Patch 0055: Test case adjustments > > Most of these changes are due to automating checks for literal > arguments that must be within a certain range. This gives us more > regular error messages, which don't always match the previous error > messages. There are also some adjustments because altivec.h now > includes rs6000-vecdefines.h. > > Patch 0056: Flip the switch to enable the new support > > Victory is ours... > > Patch 0057: Fix one last late-breaking change > > Keeping the code up-to-date with upstream has been fun. When I > rebased to create the patch set, I found one new issue where a > small change had been made to the overload handling for the > vec_insert builtins. This patch reflects that change into the > new handling. My version of git is having trouble with > interactive rebasing, so it was easier to just add the extra patch. > > Now, with all that done, there are a few things that are not yet > done: > > (1) A future patch will remove the old code. > > (2) There are times where we ought to dispatch an overload to one > function if VSX is available, and to another function if it is not. > We need a general mechanism for allowing conditional dispatch. I've > outlined a method for this in rs6000-overload.def that I want to > implement down the road. > > (3) I want to investigate why vec_mul requires special handling in > rs6000-c.c; it doesn't seem like it should. > > (4) Similarly, can we remove some of the special handling for > vec_adde, vec_addec, vec_sube, and vec_subec? > > (5) The parser in the generator program doesn't yet handle > "escape-newline" sequences for breaking long lines. I should add that > capability. > > (6) Longer term, can we use a similar mechanism for built-in functions > used for all targets in common code? > > > A word about compatibility: > > I deliberately implemented all the old built-ins exactly as previously > defined, wherever possible, despite an overwhelming desire to pitch > out a bunch of them that have already been considered deprecated for > ages. I found that it was too difficult to both implement a new > system and remove deprecated things at the same time, and in general > it seems like a dangerous thing to do. Better to do this in stages if > we're going to do it at all. Unfortunately a lot of deprecated things > still appear all over our own test suite, and I'm afraid we can assume > they appear in user code all over the place as well. > > What I've done instead is to make very clear which interfaces are > considered deprecated in the input files themselves. Over time, > perhaps we can start to remove some of these, but in reality I think > we're going to just continue to be stuck with them. > > Here is a complete list of known incompatibilities with the old > mechanism: > > (1) __builtin_vec_vpopcntu[bdhw] were all registered as overloads but > didn't have any instantiations. Therefore they could not have been > used anywhere, and I haven't implemented them. > > (2) I added ten new built-ins named __builtin_vsx_xxpermx_<type> to be > used for the overloaded vec_xxpermx function, instead of the bloody > hack that was used before. The functionality of vec_xxpermx is > unchanged. > > (3) A number of built-ins used "long" for DImode types, which would > break these for 32-bit. I changed those arguments and return values > to "long long" to avoid such problems, when those built-ins were not > restricted to 64-bit mode already. There aren't many such cases. > > (4) A small handful of builtins didn't have the correct return type to > match the mode of the pattern, so I fixed those. They are all new in > GCC 11 and can't have worked properly. > > (5) I handled the MMA internal functions slightly differently, so that > all the ones with extra vector_quad arguments are listed as such, > rather than having that hacked on during expand time. > > (6) __builtin_vsx_xl_len_r took only a char * rather than a void *; > fixing this was backward compatible. > > (7) __builtin_vsx_splat_2d[fi] were incompletely defined and couldn't > have ever worked; fixed. > > (8) A small handful of builtins weren't marked as "const," but are > obviously const, so I fixed those. > > I've kept a complete list of discrepancies for my records, in case any > issues arise from my misunderstanding something. > > I do want to thank all the people who have contributed to the built-in > design over the years. For all my griping, there are some marvelous > bits in there that I hope I have kept intact. My hope is to make the > whole system much easier to use and maintain going forward. Time will > tell. > > The patches have been bootstrapped and tested on a Power10 > little-endian system, and on a Power8 big-endian system with both 32- > and 64-bit enabled, with no regressions. I'm not crazy enough to > believe I don't have any errors in here, but I have endeavoured to > test and minimize them to the best of my ability. > > Is this series okay for trunk, in GCC 12 stage 1? > > Thanks! > Bill > > Bill Schmidt (57): > Allow targets to specify build dependencies for out_object_file > Support scanning of build-time GC roots in gengtype > rs6000: Initial create of rs6000-gen-builtins.c > rs6000: Add initial input files > rs6000: Add file support and functions for diagnostic support > rs6000: Add helper functions for parsing > rs6000: Add functions for matching types, part 1 of 3 > rs6000: Add functions for matching types, part 2 of 3 > rs6000: Add functions for matching types, part 3 of 3 > rs6000: Red-black tree implementation for balanced tree search > rs6000: Main function with stubs for parsing and output > rs6000: Parsing built-in input file, part 1 of 3 > rs6000: Parsing built-in input file, part 2 of 3 > rs6000: Parsing built-in input file, part 3 of 3 > rs6000: Parsing of overload input file > rs6000: Build and store function type identifiers > rs6000: Write output to the builtin definition include file > rs6000: Write output to the builtins header file > rs6000: Write output to the builtins init file, part 1 of 3 > rs6000: Write output to the builtins init file, part 2 of 3 > rs6000: Write output to the builtins init file, part 3 of 3 > rs6000: Write static initializations for built-in table > rs6000: Write static initializations for overload tables > rs6000: Incorporate new builtins code into the build machinery > rs6000: Add gengtype handling to the build machinery > rs6000: Add the rest of the [altivec] stanza to the builtins file > rs6000: Add VSX builtins > rs6000: Add available-everywhere and ancient builtins > rs6000: Add power7 and power7-64 builtins > rs6000: Add power8-vector builtins > rs6000: Add Power9 builtins > rs6000: Add more type nodes to support builtin processing > rs6000: Add Power10 builtins > rs6000: Add MMA builtins > rs6000: Add miscellaneous builtins > rs6000: Add Cell builtins > rs6000: Add remaining overloads > rs6000: Execute the automatic built-in initialization code > rs6000: Darwin builtin support > rs6000: Add sanity to V2DI_type_node definitions > rs6000: Always initialize vector_pair and vector_quad nodes > rs6000: Handle overloads during program parsing > rs6000: Handle gimple folding of target built-ins > rs6000: Support for vectorizing built-in functions > rs6000: Builtin expansion, part 1 > rs6000: Builtin expansion, part 2 > rs6000: Builtin expansion, part 3 > rs6000: Builtin expansion, part 4 > rs6000: Builtin expansion, part 5 > rs6000: Builtin expansion, part 6 > rs6000: Update rs6000_builtin_decl > rs6000: Miscellaneous uses of rs6000_builtin_decls_x > rs6000: Debug support > rs6000: Update altivec.h for automated interfaces > rs6000: Test case adjustments > rs6000: Enable the new builtin support > rs6000: Adjust to late-breaking change > > gcc/Makefile.in | 8 +- > gcc/config.gcc | 2 + > gcc/config/rs6000/altivec.h | 516 +- > gcc/config/rs6000/darwin.h | 8 +- > gcc/config/rs6000/rbtree.c | 233 + > gcc/config/rs6000/rbtree.h | 51 + > gcc/config/rs6000/rs6000-builtin-new.def | 3875 +++++++++++ > gcc/config/rs6000/rs6000-c.c | 1083 +++ > gcc/config/rs6000/rs6000-call.c | 3371 ++++++++- > gcc/config/rs6000/rs6000-gen-builtins.c | 2997 ++++++++ > gcc/config/rs6000/rs6000-overload.def | 6076 +++++++++++++++++ > gcc/config/rs6000/rs6000.c | 219 +- > gcc/config/rs6000/rs6000.h | 82 + > gcc/config/rs6000/t-rs6000 | 44 +- > gcc/gengtype-state.c | 29 +- > gcc/gengtype.c | 19 +- > gcc/gengtype.h | 5 + > .../powerpc/bfp/scalar-extract-exp-2.c | 2 +- > .../powerpc/bfp/scalar-extract-sig-2.c | 2 +- > .../powerpc/bfp/scalar-insert-exp-2.c | 2 +- > .../powerpc/bfp/scalar-insert-exp-5.c | 2 +- > .../powerpc/bfp/scalar-insert-exp-8.c | 2 +- > .../powerpc/bfp/scalar-test-neg-2.c | 2 +- > .../powerpc/bfp/scalar-test-neg-3.c | 2 +- > .../powerpc/bfp/scalar-test-neg-5.c | 2 +- > .../gcc.target/powerpc/byte-in-set-2.c | 2 +- > gcc/testsuite/gcc.target/powerpc/cmpb-2.c | 2 +- > gcc/testsuite/gcc.target/powerpc/cmpb32-2.c | 2 +- > .../gcc.target/powerpc/crypto-builtin-2.c | 14 +- > .../powerpc/fold-vec-splat-floatdouble.c | 4 +- > .../powerpc/fold-vec-splat-longlong.c | 10 +- > .../powerpc/fold-vec-splat-misc-invalid.c | 8 +- > .../gcc.target/powerpc/p8vector-builtin-8.c | 1 + > gcc/testsuite/gcc.target/powerpc/pr80315-1.c | 2 +- > gcc/testsuite/gcc.target/powerpc/pr80315-2.c | 2 +- > gcc/testsuite/gcc.target/powerpc/pr80315-3.c | 2 +- > gcc/testsuite/gcc.target/powerpc/pr80315-4.c | 2 +- > gcc/testsuite/gcc.target/powerpc/pr88100.c | 12 +- > .../gcc.target/powerpc/pragma_misc9.c | 2 +- > .../gcc.target/powerpc/pragma_power8.c | 2 + > .../gcc.target/powerpc/pragma_power9.c | 3 + > .../powerpc/test_fpscr_drn_builtin_error.c | 4 +- > .../powerpc/test_fpscr_rn_builtin_error.c | 12 +- > gcc/testsuite/gcc.target/powerpc/test_mffsl.c | 3 +- > gcc/testsuite/gcc.target/powerpc/vec-gnb-2.c | 2 +- > .../gcc.target/powerpc/vsu/vec-all-nez-7.c | 2 +- > .../gcc.target/powerpc/vsu/vec-any-eqz-7.c | 2 +- > .../gcc.target/powerpc/vsu/vec-cmpnez-7.c | 2 +- > .../gcc.target/powerpc/vsu/vec-cntlz-lsbb-2.c | 2 +- > .../gcc.target/powerpc/vsu/vec-cnttz-lsbb-2.c | 2 +- > .../gcc.target/powerpc/vsu/vec-xl-len-13.c | 2 +- > .../gcc.target/powerpc/vsu/vec-xst-len-12.c | 2 +- > 52 files changed, 17915 insertions(+), 824 deletions(-) > create mode 100644 gcc/config/rs6000/rbtree.c > create mode 100644 gcc/config/rs6000/rbtree.h > create mode 100644 gcc/config/rs6000/rs6000-builtin-new.def > create mode 100644 gcc/config/rs6000/rs6000-gen-builtins.c > create mode 100644 gcc/config/rs6000/rs6000-overload.def >
On Tue, May 11, 2021 at 10:57:56AM -0500, Bill Schmidt wrote: > Hi! I'd like to ping this series. This is a big change, so I'd like to > get it committed fairly early in stage 1. I know you have a lot stacked > up, though. I haven't received most of this series (only the last three patches). I'll dig it up from the archives. Segher
Hi! Just a few small things about this -- I'll reply to more of it later. On Tue, Apr 27, 2021 at 10:32:35AM -0500, Bill Schmidt via Gcc-patches wrote: > The design of the target-specific built-in function support in the > Power back end has not stood the test of time. The machinery is > grossly inefficient, confusing, and arcane; and adding new built-in > functions is inefficient and error-prone. You are too nice to it. People have had to work with it over the years, there is some pent-up anger :-) > Because of the scope of the changes, it's important to be able to > verify that the new system makes only intended changes to the > functions that are supported. Therefore this patch set adds a new > mechanism, and (in the final patch) enables it instead of the existing > support, but does not yet remove the old support. That will happen in > a follow-up patch once we're comfortable with the new system. Is there some (semi-)automatic way to compare the results of the old and new systems? > Patch 0057: Fix one last late-breaking change > > Keeping the code up-to-date with upstream has been fun. When I > rebased to create the patch set, I found one new issue where a > small change had been made to the overload handling for the > vec_insert builtins. This patch reflects that change into the > new handling. My version of git is having trouble with > interactive rebasing, so it was easier to just add the extra patch. What breaks by keeping this fix after the other patches? > I deliberately implemented all the old built-ins exactly as previously > defined, wherever possible, despite an overwhelming desire to pitch > out a bunch of them that have already been considered deprecated for > ages. I found that it was too difficult to both implement a new > system and remove deprecated things at the same time, and in general > it seems like a dangerous thing to do. Better to do this in stages if > we're going to do it at all. Independent fixes you can put before the meat of the series. This often is the best way to do it, since then you don't have to duplicate the weird / buggy / whatever behaviour of the old system. But things that aren't simple fixes, that need deprecation periods and everything... no no no, you want this done *this* decade! > Unfortunately a lot of deprecated things > still appear all over our own test suite, and I'm afraid we can assume > they appear in user code all over the place as well. Pretty much the only old features you can remove are features that have been broken for many years. You can break something on purpose to see if anyone still uses it, too, but :-) > What I've done instead is to make very clear which interfaces are > considered deprecated in the input files themselves. Over time, > perhaps we can start to remove some of these, but in reality I think > we're going to just continue to be stuck with them. It is extremely useful to have it clearly documented which interfaces shoul;d be considered deprecated, even if we will never remove it. It is useful in the documentation, but it is even more useful in the code, for ourselves! > (3) A number of built-ins used "long" for DImode types, which would > break these for 32-bit. I changed those arguments and return values > to "long long" to avoid such problems, when those built-ins were not > restricted to 64-bit mode already. There aren't many such cases. You can do this for 64-bit-only builtins as well -- the actual argument type is never visible (to the user), and everything becomes modes early. Segher
On 5/20/21 4:57 PM, Segher Boessenkool wrote: > Is there some (semi-)automatic way to compare the results of the old > and new systems? Yes...very "semi". There's a patch in the series that updates the information printed from -mdebug=builtin. I use this to print the builtins generated by the old and new versions, and run them through a Python script to look for anything missing or added, any mismatches in the type system, mismatches in const/pure/etc., that sort of thing. I found quite a lot of mistakes this way and got them fixed ahead of time. In the script I maintain a table of expected differences, where there were just errors before, etc., and there's a short doc string with each of them. So I can go back and look at that if we run into any discrepancies in the future. The script doesn't have long-term value beyond that; once we delete the old version in a future patch, it won't have anything to compare against. >> Patch 0057: Fix one last late-breaking change >> >> Keeping the code up-to-date with upstream has been fun. When I >> rebased to create the patch set, I found one new issue where a >> small change had been made to the overload handling for the >> vec_insert builtins. This patch reflects that change into the >> new handling. My version of git is having trouble with >> interactive rebasing, so it was easier to just add the extra patch. > What breaks by keeping this fix after the other patches? Nothing truly breaks. There is a VSX-efficient implementation of vec_insert and a less-efficient implementation. Late in GCC 11, somebody recognized the VSX-efficient implementation could be used in more situations. Without this patch, we still use the less-efficient implementation in the new version. > >> (3) A number of built-ins used "long" for DImode types, which would >> break these for 32-bit. I changed those arguments and return values >> to "long long" to avoid such problems, when those built-ins were not >> restricted to 64-bit mode already. There aren't many such cases. > You can do this for 64-bit-only builtins as well -- the actual argument > type is never visible (to the user), and everything becomes modes early. > Yep -- in fact, the old system generally used a (V2)DImode type for such things, and I've just translated all of those to (vector) long long in the new prototypes. Thanks for the ongoing review! Bill