Message ID | 87egn5yis1.fsf@e105548-lin.cambridge.arm.com |
---|---|
State | New |
Headers | show |
On 04/27/2015 04:20 AM, Richard Sandiford wrote: > I think it's been the case for a while that parallel builds of GCC tend > to serialise around the compilation of insn-recog.c, especially with > higher --enable-checking settings. This patch tries to speed that > up by replacing most of genrecog with a new algorithm. > > I think the main problems with the current code are: > > 1. Vector architectures have added lots of new instructions that have > a similar shape and differ only in mode, code or unspec number. > The current algorithm doesn't have any way of factoring out those > similarities. > > 2. When matching a particular instruction, the current code examines > everything about a SET_DEST before moving on to the SET_SRC. This has > two subproblems: > > 2a. The destination of a SET isn't very distinctive. It's usually > just a register_operand, a memory_operand, a nonimmediate_operand > or a flags register. We therefore tend to backtrack to the > SET_DEST a lot, oscillating between groups of instructions with > the same kind of destination. > > 2b. Backtracking through predicate checks is relatively expensive. > It would be good to narrow down the "shape" of the instruction > first and only then check the predicates. (The backtracking is > expensive in terms of insn-recog.o compile time too, both because > we need to copy into argument registers and out of the result > register, and because it adds more sites where spills are needed.) > > 3. The code keeps one local variable per rtx depth, so it ends up > loading the same rtx many times over (mostly when backtracking). > This is very expensive in rtl-checking builds because each XEXP > includes a code check and a line-specific failure call. > > In principle the idea of having one local variable per depth > is good. But it was originally written that way when all optimisations > were done at the rtl level and I imagine each local variable mapped > to one pseudo register. These days the statements that reload the > value needed on backtracking lead to many more SSA names and phi > statements than you'd get with just a single variable per position > (loaded once, so naturally SSA). There is still the potential benefit > of avoiding having sibling rtxes live at once, but fixing (2) above > reduces that problem. > > Also, the code is all goto-based, which makes it rather hard to step through. > > The patch deals with these as follows: > > 1. Detect subpatterns that differ only by mode, code and/or integer > (e.g. unspec number) and split them out into a common routine. > > 2. Match the "shape" of the instruction first, in terms of codes, > integers and vector lengths, and only then check the modes, predicates > and dups. When checking the shape, handle SET_SRCs before SET_DESTs. > In practice this seems to greatly reduce the amount of backtracking. > > 3. Have one local variable per rtx position. I tested the patch with > and without the change and it helped a lot with rtl-checking builds > without seeming to affect release builds much either way. > > As far as debuggability goes, the new code avoids gotos and just > uses "natural" control flow. > > The headline stat is that a stage 3 --enable-checking=yes,rtl,df > build of insn-recog.c on my box goes from 7m43s to 2m2s (using the > same stage 2 compiler). The corresponding --enable-checking=release > change is from 49s to 24s (less impressive, as expected). > > The patch seems to speed up recog. E.g. the time taken to build > fold-const.ii goes from 6.74s before the patch to 6.69s after it; > not a big speed-up, but reproducible. [ big snip ] I don't see anything that makes me think we don't want this :-) It's interesting to note that the regressions in loadable size of a release-checking insn-recog aren't any mainstream ports. Hell, I'd probably claim they're all fringe ports (iq2000, m68k, mcore, microblaze, pdp11, rx) > Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-none-eabi. > Also tested by building the testsuite for each of the targets above > and making sure there were no assembly differences. Made sure that no > objects in spec2k6 changed for aarch64-linux-gnu (except for perlbench > perl.o and cactusADM datestamp.o, where the differences are timestamps). > OK to install? To be very blunt, I'm probably not capable of reviewing the new code. You're going to know it best and you should probably own it. Given your history with gcc and attention to detail, I'm comfortable with approving this knowing you'll deal with any issues as they arise. The one thing I would ask is that you make sure to include the recently added matching constraint checking bits in genrecog. I'm assuming all the older sanity checks that have been in genrecog for a longer period of time have been kept. Jeff
> I think it's been the case for a while that parallel builds of GCC tend > to serialise around the compilation of insn-recog.c, especially with > higher --enable-checking settings. This patch tries to speed that > up by replacing most of genrecog with a new algorithm. I can confirm this, especially with --enable-checking=rtl. > Also, the code is all goto-based, which makes it rather hard to step > through. Do you mean the code in genrecog.c or the generated code in insn-recog.c? > The patch deals with these as follows: > > 1. Detect subpatterns that differ only by mode, code and/or integer > (e.g. unspec number) and split them out into a common routine. > > 2. Match the "shape" of the instruction first, in terms of codes, > integers and vector lengths, and only then check the modes, predicates > and dups. When checking the shape, handle SET_SRCs before SET_DESTs. > In practice this seems to greatly reduce the amount of backtracking. > > 3. Have one local variable per rtx position. I tested the patch with > and without the change and it helped a lot with rtl-checking builds > without seeming to affect release builds much either way. > > As far as debuggability goes, the new code avoids gotos and just > uses "natural" control flow. See below. > The headline stat is that a stage 3 --enable-checking=yes,rtl,df > build of insn-recog.c on my box goes from 7m43s to 2m2s (using the > same stage 2 compiler). The corresponding --enable-checking=release > change is from 49s to 24s (less impressive, as expected). The first figure is quite impressive, great work! > PS. I've attached the new genrecog.c since the diff version is unreadable. Small request: you didn't change a single line of the head comment, yet the size of the file is doubled. Could you add a sketch of the algorithm to the head comment, e.g. by copy-and-pasting the above part of your message? The old code contained a couple of DEBUG_FUNCTIONs but the new one doesn't.
Eric Botcazou <ebotcazou@adacore.com> writes: >> Also, the code is all goto-based, which makes it rather hard to step >> through. > > Do you mean the code in genrecog.c or the generated code in insn-recog.c? The generated code. genrecog.c itself isn't bad. :-) >> PS. I've attached the new genrecog.c since the diff version is unreadable. > > Small request: you didn't change a single line of the head comment, yet the > size of the file is doubled. Could you add a sketch of the algorithm to the > head comment, e.g. by copy-and-pasting the above part of your message? OK. I'd left the head comment alone because it just described the interface, which hasn't changed. But I suppose past lack of commentary doesn't justify future lack of commentary. Here's what I added: At a high level, the algorithm used in this file is as follows: 1. Build up a decision tree for each routine, using the following approach to matching an rtx: - First determine the "shape" of the rtx, based on GET_CODE, XVECLEN and XINT. This phase examines SET_SRCs before SET_DESTs since SET_SRCs tend to be more distinctive. It examines other operands in numerical order, since the canonicalization rules prefer putting complex operands of commutative operators first. - Next check modes and predicates. This phase examines all operands in numerical order, even for SETs, since the mode of a SET_DEST is exact while the mode of a SET_SRC can be VOIDmode for constant integers. - Next check match_dups. - Finally check the C condition and (where appropriate) pnum_clobbers. 2. Try to optimize the tree by removing redundant tests, CSEing tests, folding tests together, etc. 3. Look for common subtrees and split them out into "pattern" routines. These common subtrees can be identical or they can differ in mode, code, or integer (usually an UNSPEC or UNSPEC_VOLATILE code). In the latter case the users of the pattern routine pass the appropriate mode, etc., as argument. For example, if two patterns contain: (plus:SI (match_operand:SI 1 "register_operand") (match_operand:SI 2 "register_operand")) we can split the associated matching code out into a subroutine. If a pattern contains: (minus:DI (match_operand:DI 1 "register_operand") (match_operand:DI 2 "register_operand")) then we can consider using the same matching routine for both the plus and minus expressions, passing PLUS and SImode in the former case and MINUS and DImode in the latter case. The main aim of this phase is to reduce the compile time of the insn-recog.c code and to reduce the amount of object code in insn-recog.o. 4. Split the matching trees into functions, trying to limit the size of each function to a sensible amount. Again, the main aim of this phase is to reduce the compile time of insn-recog.c. (It doesn't help with the size of insn-recog.o.) 5. Write out C++ code for each function. BTW, hope at least part of the doubling in size is due to more commentary in the code itself. > The old code contained a couple of DEBUG_FUNCTIONs but the new one doesn't. Yeah, but I don't know how useful they were. I had to debug the old code several times and never used them. I'd rather leave stuff like that to someone who wants it rather than try to write routines speculatively in the hope that someone would find them useful. Thanks, Richard
Jeff Law <law@redhat.com> writes: > On 04/27/2015 04:20 AM, Richard Sandiford wrote: >> Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-none-eabi. >> Also tested by building the testsuite for each of the targets above >> and making sure there were no assembly differences. Made sure that no >> objects in spec2k6 changed for aarch64-linux-gnu (except for perlbench >> perl.o and cactusADM datestamp.o, where the differences are timestamps). >> OK to install? > To be very blunt, I'm probably not capable of reviewing the new code. > You're going to know it best and you should probably own it. > > Given your history with gcc and attention to detail, I'm comfortable > with approving this knowing you'll deal with any issues as they arise. Thanks, applied. > The one thing I would ask is that you make sure to include the recently > added matching constraint checking bits in genrecog. I'm assuming all > the older sanity checks that have been in genrecog for a longer period > of time have been kept. Yeah, Chen Gang's changes are all still in there. All the older checks should still be in there too. Richard
On Mon, Apr 27, 2015 at 6:20 PM, Richard Sandiford <richard.sandiford@arm.com> wrote: > I think it's been the case for a while that parallel builds of GCC tend > to serialise around the compilation of insn-recog.c, especially with > higher --enable-checking settings. This patch tries to speed that > up by replacing most of genrecog with a new algorithm. > > I think the main problems with the current code are: > > 1. Vector architectures have added lots of new instructions that have > a similar shape and differ only in mode, code or unspec number. > The current algorithm doesn't have any way of factoring out those > similarities. > > 2. When matching a particular instruction, the current code examines > everything about a SET_DEST before moving on to the SET_SRC. This has > two subproblems: > > 2a. The destination of a SET isn't very distinctive. It's usually > just a register_operand, a memory_operand, a nonimmediate_operand > or a flags register. We therefore tend to backtrack to the > SET_DEST a lot, oscillating between groups of instructions with > the same kind of destination. > > 2b. Backtracking through predicate checks is relatively expensive. > It would be good to narrow down the "shape" of the instruction > first and only then check the predicates. (The backtracking is > expensive in terms of insn-recog.o compile time too, both because > we need to copy into argument registers and out of the result > register, and because it adds more sites where spills are needed.) > > 3. The code keeps one local variable per rtx depth, so it ends up > loading the same rtx many times over (mostly when backtracking). > This is very expensive in rtl-checking builds because each XEXP > includes a code check and a line-specific failure call. > > In principle the idea of having one local variable per depth > is good. But it was originally written that way when all optimisations > were done at the rtl level and I imagine each local variable mapped > to one pseudo register. These days the statements that reload the > value needed on backtracking lead to many more SSA names and phi > statements than you'd get with just a single variable per position > (loaded once, so naturally SSA). There is still the potential benefit > of avoiding having sibling rtxes live at once, but fixing (2) above > reduces that problem. > > Also, the code is all goto-based, which makes it rather hard to step through. > > The patch deals with these as follows: > > 1. Detect subpatterns that differ only by mode, code and/or integer > (e.g. unspec number) and split them out into a common routine. > > 2. Match the "shape" of the instruction first, in terms of codes, > integers and vector lengths, and only then check the modes, predicates > and dups. When checking the shape, handle SET_SRCs before SET_DESTs. > In practice this seems to greatly reduce the amount of backtracking. > > 3. Have one local variable per rtx position. I tested the patch with > and without the change and it helped a lot with rtl-checking builds > without seeming to affect release builds much either way. > > As far as debuggability goes, the new code avoids gotos and just > uses "natural" control flow. > > The headline stat is that a stage 3 --enable-checking=yes,rtl,df > build of insn-recog.c on my box goes from 7m43s to 2m2s (using the > same stage 2 compiler). The corresponding --enable-checking=release > change is from 49s to 24s (less impressive, as expected). > > The patch seems to speed up recog. E.g. the time taken to build > fold-const.ii goes from 6.74s before the patch to 6.69s after it; > not a big speed-up, but reproducible. > > Here's a comparison of the number of lines of code in insn-recog.c > before and after the patch on one target per config/ CPU: > > aarch64-linux-gnueabi 115526 38169 : 33.04% > alpha-linux-gnu 24479 10740 : 43.87% > arm-linux-gnueabi 169208 67759 : 40.04% > avr-rtems 55647 22127 : 39.76% > bfin-elf 13928 6498 : 46.65% > c6x-elf 29928 13324 : 44.52% > cr16-elf 2650 1419 : 53.55% > cris-elf 18669 7257 : 38.87% > epiphany-elf 19308 6131 : 31.75% > fr30-elf 2204 1112 : 50.45% > frv-linux-gnu 13541 5950 : 43.94% > h8300-elf 19584 9327 : 47.63% > hppa64-hp-hpux11.23 18299 8549 : 46.72% > ia64-linux-gnu 37629 17101 : 45.45% > iq2000-elf 2752 1609 : 58.47% > lm32-elf 1536 872 : 56.77% > m32c-elf 10040 4145 : 41.28% > m32r-elf 4436 2307 : 52.01% > m68k-linux-gnu 15739 8147 : 51.76% > mcore-elf 4816 2577 : 53.51% > mep-elf 67335 15929 : 23.66% > microblaze-elf 2656 1587 : 59.75% > mips-linux-gnu 54543 24186 : 44.34% > mmix 2597 1487 : 57.26% > mn10300-elf 6384 3294 : 51.60% > moxie-elf 1311 659 : 50.27% > msp430-elf 6054 2382 : 39.35% > nds32le-elf 5953 3152 : 52.95% > nios2-linux-gnu 3735 2143 : 57.38% > pdp11 2137 1157 : 54.14% > powerpc-eabispe 109322 40582 : 37.12% > powerpc-linux-gnu 82976 32192 : 38.80% > rl78-elf 5321 2432 : 45.71% > rx-elf 14454 7534 : 52.12% > s390-linux-gnu 48487 20454 : 42.18% > sh-linux-gnu 104087 41820 : 40.18% > sparc-linux-gnu 21912 10509 : 47.96% > spu-elf 19937 8182 : 41.04% > tilegx-elf 15412 6970 : 45.22% > tilepro-elf 11998 5479 : 45.67% > v850-elf 8725 4438 : 50.87% > vax-netbsdelf 4537 2410 : 53.12% > visium-elf 15190 7224 : 47.56% > x86_64-darwin 346396 119593 : 34.52% > xstormy16-elf 4660 2229 : 47.83% > xtensa-elf 2799 1514 : 54.09% > > Here's the loadable size of insn-recog.o in an --enable-checking=release > build on an x86_64-linux-gnu box: > > aarch64-linux-gnueabi 443955 298026 : 67.13% > alpha-linux-gnu 97194 80893 : 83.23% > arm-linux-gnueabi 782325 632248 : 80.82% > avr-rtems 226827 159763 : 70.43% > bfin-elf 52563 42376 : 80.62% > c6x-elf 112512 90142 : 80.12% > cr16-elf 10446 10006 : 95.79% > cris-elf 74771 52634 : 70.39% > epiphany-elf 87577 52284 : 59.70% > fr30-elf 8041 7713 : 95.92% > frv-linux-gnu 53699 47543 : 88.54% > h8300-elf 70708 66274 : 93.73% > hppa64-hp-hpux11.23 71597 57484 : 80.29% > ia64-linux-gnu 147286 130632 : 88.69% > iq2000-elf 11002 11774 : 107.02% > lm32-elf 5894 5798 : 98.37% > m32c-elf 36563 28855 : 78.92% > m32r-elf 17252 16910 : 98.02% > m68k-linux-gnu 58248 59781 : 102.63% > mcore-elf 17708 18948 : 107.00% > mep-elf 314466 150771 : 47.95% > microblaze-elf 10257 10534 : 102.70% > mips-linux-gnu 230991 191155 : 82.75% > mmix 10782 10678 : 99.04% > mn10300-elf 24035 22802 : 94.87% > moxie-elf 4622 4198 : 90.83% > msp430-elf 21707 16539 : 76.19% > nds32le-elf 22041 19444 : 88.22% > nios2-linux-gnu 15595 13238 : 84.89% > pdp11 7630 8254 : 108.18% > powerpc-eabispe 430816 308481 : 71.60% > powerpc-linux-gnu 317738 248534 : 78.22% > rl78-elf 18904 16329 : 86.38% > rx-elf 55015 56632 : 102.94% > s390-linux-gnu 190584 148961 : 78.16% > sh-linux-gnu 408446 307927 : 75.39% > sparc-linux-gnu 91016 80640 : 88.60% > spu-elf 80387 69151 : 86.02% > tilegx-elf 63815 49977 : 78.32% > tilepro-elf 51763 39252 : 75.83% > v850-elf 32812 28462 : 86.74% > vax-netbsdelf 18350 18259 : 99.50% > visium-elf 56872 46790 : 82.27% > x86_64-darwin 1306182 883169 : 67.61% > xstormy16-elf 17044 14430 : 84.66% > xtensa-elf 10780 9678 : 89.78% > > The same for --enable-checking=yes,rtl: > > aarch64-linux-gnueabi 1790272 507488 : 28.35% > alpha-linux-gnu 440058 193826 : 44.05% > arm-linux-gnueabi 2845568 1299337 : 45.66% > avr-rtems 885672 294387 : 33.24% > bfin-elf 280606 142836 : 50.90% > c6x-elf 486345 259256 : 53.31% > cr16-elf 46626 20044 : 42.99% > cris-elf 426813 144414 : 33.84% > epiphany-elf 353078 122166 : 34.60% > fr30-elf 40414 21042 : 52.07% > frv-linux-gnu 259550 111335 : 42.90% > h8300-elf 355199 158411 : 44.60% > hppa64-hp-hpux11.23 340584 149009 : 43.75% > ia64-linux-gnu 661364 293710 : 44.41% > iq2000-elf 41123 26709 : 64.95% > lm32-elf 20370 14781 : 72.56% > m32c-elf 174344 62000 : 35.56% > m32r-elf 74357 41773 : 56.18% > m68k-linux-gnu 275733 117445 : 42.59% > mcore-elf 85180 48018 : 56.37% > mep-elf 1450168 376020 : 25.93% > microblaze-elf 44189 26295 : 59.51% > mips-linux-gnu 876650 375753 : 42.86% > mmix 49882 25363 : 50.85% > mn10300-elf 128148 66768 : 52.10% > moxie-elf 23388 9011 : 38.53% > msp430-elf 114200 34426 : 30.15% > nds32le-elf 101416 73677 : 72.65% > nios2-linux-gnu 58799 29825 : 50.72% > pdp11 32836 18557 : 56.51% > powerpc-eabispe 1976098 626942 : 31.73% > powerpc-linux-gnu 1510652 526841 : 34.88% > rl78-elf 93675 40538 : 43.28% > rx-elf 279748 137284 : 49.07% > s390-linux-gnu 857009 316494 : 36.93% > sh-linux-gnu 2154337 806571 : 37.44% > sparc-linux-gnu 367682 169019 : 45.97% > spu-elf 341945 135629 : 39.66% > tilegx-elf 235480 112103 : 47.61% > tilepro-elf 246231 104137 : 42.29% > v850-elf 158028 72875 : 46.12% > vax-netbsdelf 85057 37578 : 44.18% > visium-elf 257148 103331 : 40.18% > x86_64-darwin 5514235 1721777 : 31.22% > xstormy16-elf 83456 46128 : 55.27% > xtensa-elf 52652 29880 : 56.75% > > Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-none-eabi. > Also tested by building the testsuite for each of the targets above > and making sure there were no assembly differences. Made sure that no > objects in spec2k6 changed for aarch64-linux-gnu (except for perlbench > perl.o and cactusADM datestamp.o, where the differences are timestamps). > OK to install? > > Thanks, > Richard > > PS. I've attached the new genrecog.c since the diff version is unreadable. > > > gcc/ > * Makefile.in (build/genrecog.o): Depend on inchash.h. > (build/genrecog$(build_exeext): Depend on build/hash-table.o and > build/inchash.o > * genrecog.c: Rewrite most of the code except for the third page. > > Index: gcc/Makefile.in > =================================================================== > --- gcc/Makefile.in 2015-04-27 10:42:57.783191573 +0100 > +++ gcc/Makefile.in 2015-04-27 10:43:42.878643078 +0100 > @@ -2527,7 +2527,8 @@ build/genpeep.o : genpeep.c $(RTL_BASE_H > build/genpreds.o : genpreds.c $(RTL_BASE_H) $(BCONFIG_H) $(SYSTEM_H) \ > coretypes.h $(GTM_H) errors.h $(READ_MD_H) gensupport.h $(OBSTACK_H) > build/genrecog.o : genrecog.c $(RTL_BASE_H) $(BCONFIG_H) $(SYSTEM_H) \ > - coretypes.h $(GTM_H) errors.h $(READ_MD_H) gensupport.h > + coretypes.h $(GTM_H) errors.h $(READ_MD_H) gensupport.h \ > + $(HASH_TABLE_H) inchash.h > build/genhooks.o : genhooks.c $(TARGET_DEF) $(C_TARGET_DEF) \ > $(COMMON_TARGET_DEF) $(BCONFIG_H) $(SYSTEM_H) errors.h > build/genmddump.o : genmddump.c $(RTL_BASE_H) $(BCONFIG_H) $(SYSTEM_H) \ > @@ -2559,6 +2560,8 @@ genprog = $(genprogerr) check checksum c > # These programs need libs over and above what they get from the above list. > build/genautomata$(build_exeext) : BUILD_LIBS += -lm > > +build/genrecog$(build_exeext) : build/hash-table.o build/inchash.o > + > # For stage1 and when cross-compiling use the build libcpp which is > # built with NLS disabled. For stage2+ use the host library and > # its dependencies. > Hi Richard, I noticed that this patch caused ICE for gcc.target/arm/mmx-2.c on arm-none-linux-gnueabi. Could you please have a look at it? The log message is as below, /projects/.../src/gcc/gcc/testsuite/gcc.target/arm/mmx-2.c: In function 'foo': /projects/.../src/gcc/gcc/testsuite/gcc.target/arm/mmx-2.c:166:1: error: unrecognizable insn: (insn 541 540 542 2 (set (reg:V4HI 512 [ D.4809 ]) (vec_merge:V4HI (vec_select:V4HI (reg:V4HI 510 [ D.4806 ]) (parallel [ (const_int 2 [0x2]) (const_int 0 [0]) (const_int 3 [0x3]) (const_int 1 [0x1]) ])) (vec_select:V4HI (reg:V4HI 511 [ D.4806 ]) (parallel [ (const_int 0 [0]) (const_int 2 [0x2]) (const_int 1 [0x1]) (const_int 3 [0x3]) ])) (const_int 5 [0x5]))) /projects/.../src/gcc/gcc/testsuite/gcc.target/arm/mmx-2.c:159 -1 (nil)) /projects/.../src/gcc/gcc/testsuite/gcc.target/arm/mmx-2.c:166:1: internal compiler error: in extract_insn, at recog.c:2341 0xa42d2a _fatal_insn(char const*, rtx_def const*, char const*, int, char const*) /projects/.../src/gcc/gcc/rtl-error.c:110 0xa42d59 _fatal_insn_not_found(rtx_def const*, char const*, int, char const*) /projects/.../src/gcc/gcc/rtl-error.c:118 0xa15ff7 extract_insn(rtx_insn*) /projects/.../src/gcc/gcc/recog.c:2341 0x7ffb42 instantiate_virtual_regs_in_insn /projects/.../src/gcc/gcc/function.c:1598 0x7ffb42 instantiate_virtual_regs /projects/.../src/gcc/gcc/function.c:1966 0x7ffb42 execute /projects/.../src/gcc/gcc/function.c:2015 Please submit a full bug report, with preprocessed source if appropriate. Please include the complete backtrace with any bug report. See <http://gcc.gnu.org/bugs.html> for instructions. GCC is configured with gcc/configure --target=arm-none-linux-gnueabi --prefix= --with-sysroot=... --enable-shared --disable-libsanitizer --disable-libssp --disable-libmudflap --with-plugin-ld=arm-none-linux-gnueabi-ld --enable-checking=yes --enable-languages=c,c++,fortran --with-gmp=... --with-mpfr=... --with-mpc=... --with-isl=... --with-cloog=... --with-arch=armv7-a --with-fpu=vfpv3-d16 --with-float=softfp --with-arch=armv7-a Thanks, bin
Richard Sandiford <richard.sandiford@arm.com> writes: > /* Represents a test and the action that should be taken on the result. > If a transition exists for the test outcome, the machine switches > to the transition's target state. If no suitable transition exists, > the machine either falls through to the next decision or, if there are no > more decisions to try, fails the match. */ > struct decision : list_head <transition> > { > decision (const test &); > > void set_parent (list_head <decision> *s); > bool if_statement_p (uint64_t * = 0) const; > > /* The state to which this decision belongs. */ > state *s; > > /* Links to other decisions in the same state. */ > decision *prev, *next; > > /* The test to perform. */ > struct test test; > }; ../../gcc/genrecog.c:1467: error: declaration of 'test decision::test' ../../gcc/genrecog.c:1051: error: changes meaning of 'test' from 'struct test' Bootstrap compiler is gcc 4.3.4. Andreas.
Andreas Schwab <schwab@linux-m68k.org> writes: > Richard Sandiford <richard.sandiford@arm.com> writes: > >> /* Represents a test and the action that should be taken on the result. >> If a transition exists for the test outcome, the machine switches >> to the transition's target state. If no suitable transition exists, >> the machine either falls through to the next decision or, if there are no >> more decisions to try, fails the match. */ >> struct decision : list_head <transition> >> { >> decision (const test &); >> >> void set_parent (list_head <decision> *s); >> bool if_statement_p (uint64_t * = 0) const; >> >> /* The state to which this decision belongs. */ >> state *s; >> >> /* Links to other decisions in the same state. */ >> decision *prev, *next; >> >> /* The test to perform. */ >> struct test test; >> }; > > ../../gcc/genrecog.c:1467: error: declaration of 'test decision::test' > ../../gcc/genrecog.c:1051: error: changes meaning of 'test' from 'struct test' > > Bootstrap compiler is gcc 4.3.4. Bah. Does it like "::test test" instead of "struct test test"? Richard
> The generated code. genrecog.c itself isn't bad. :-) Nice work then. > OK. I'd left the head comment alone because it just described the > interface, which hasn't changed. But I suppose past lack of commentary > doesn't justify future lack of commentary. Here's what I added: > [...] > BTW, hope at least part of the doubling in size is due to more commentary > in the code itself. I see. Thanks a lot for writing down the description of the algorithm! > I'd rather leave stuff like that to someone who wants it rather than try > to write routines speculatively in the hope that someone would find them > useful. OK.
Richard Sandiford <richard.sandiford@arm.com> writes: > Andreas Schwab <schwab@linux-m68k.org> writes: >> Richard Sandiford <richard.sandiford@arm.com> writes: >> >>> /* Represents a test and the action that should be taken on the result. >>> If a transition exists for the test outcome, the machine switches >>> to the transition's target state. If no suitable transition exists, >>> the machine either falls through to the next decision or, if there are no >>> more decisions to try, fails the match. */ >>> struct decision : list_head <transition> >>> { >>> decision (const test &); >>> >>> void set_parent (list_head <decision> *s); >>> bool if_statement_p (uint64_t * = 0) const; >>> >>> /* The state to which this decision belongs. */ >>> state *s; >>> >>> /* Links to other decisions in the same state. */ >>> decision *prev, *next; >>> >>> /* The test to perform. */ >>> struct test test; >>> }; >> >> ../../gcc/genrecog.c:1467: error: declaration of 'test decision::test' >> ../../gcc/genrecog.c:1051: error: changes meaning of 'test' from 'struct test' >> >> Bootstrap compiler is gcc 4.3.4. > > Bah. Does it like "::test test" instead of "struct test test"? Same error. Andreas.
On Thu, Apr 30, 2015 at 2:08 PM, Andreas Schwab <schwab@linux-m68k.org> wrote: > Richard Sandiford <richard.sandiford@arm.com> writes: > >> Andreas Schwab <schwab@linux-m68k.org> writes: >>> Richard Sandiford <richard.sandiford@arm.com> writes: >>> >>>> /* Represents a test and the action that should be taken on the result. >>>> If a transition exists for the test outcome, the machine switches >>>> to the transition's target state. If no suitable transition exists, >>>> the machine either falls through to the next decision or, if there are no >>>> more decisions to try, fails the match. */ >>>> struct decision : list_head <transition> >>>> { >>>> decision (const test &); >>>> >>>> void set_parent (list_head <decision> *s); >>>> bool if_statement_p (uint64_t * = 0) const; >>>> >>>> /* The state to which this decision belongs. */ >>>> state *s; >>>> >>>> /* Links to other decisions in the same state. */ >>>> decision *prev, *next; >>>> >>>> /* The test to perform. */ >>>> struct test test; >>>> }; >>> >>> ../../gcc/genrecog.c:1467: error: declaration of 'test decision::test' >>> ../../gcc/genrecog.c:1051: error: changes meaning of 'test' from 'struct test' >>> >>> Bootstrap compiler is gcc 4.3.4. >> >> Bah. Does it like "::test test" instead of "struct test test"? > > Same error. You have to use a different name I believe (or -fpermissive). Richard. > Andreas. > > -- > Andreas Schwab, schwab@linux-m68k.org > GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 > "And now for something completely different."
Richard Biener <richard.guenther@gmail.com> writes: > On Thu, Apr 30, 2015 at 2:08 PM, Andreas Schwab <schwab@linux-m68k.org> wrote: >> Richard Sandiford <richard.sandiford@arm.com> writes: >> >>> Andreas Schwab <schwab@linux-m68k.org> writes: >>>> Richard Sandiford <richard.sandiford@arm.com> writes: >>>> >>>>> /* Represents a test and the action that should be taken on the result. >>>>> If a transition exists for the test outcome, the machine switches >>>>> to the transition's target state. If no suitable transition exists, >>>>> the machine either falls through to the next decision or, if there are no >>>>> more decisions to try, fails the match. */ >>>>> struct decision : list_head <transition> >>>>> { >>>>> decision (const test &); >>>>> >>>>> void set_parent (list_head <decision> *s); >>>>> bool if_statement_p (uint64_t * = 0) const; >>>>> >>>>> /* The state to which this decision belongs. */ >>>>> state *s; >>>>> >>>>> /* Links to other decisions in the same state. */ >>>>> decision *prev, *next; >>>>> >>>>> /* The test to perform. */ >>>>> struct test test; >>>>> }; >>>> >>>> ../../gcc/genrecog.c:1467: error: declaration of 'test decision::test' >>>> ../../gcc/genrecog.c:1051: error: changes meaning of 'test' from >>>> struct test' >>>> >>>> Bootstrap compiler is gcc 4.3.4. >>> >>> Bah. Does it like "::test test" instead of "struct test test"? >> >> Same error. > > You have to use a different name I believe (or -fpermissive). Hmm, but then why does it work with more recent compilers? Thanks, Richard
Hi! On Mon, 27 Apr 2015 11:20:30 +0100, Richard Sandiford <richard.sandiford@arm.com> wrote: > I think it's been the case for a while that parallel builds of GCC tend > to serialise around the compilation of insn-recog.c, especially with > higher --enable-checking settings. This patch tries to speed that > up by replacing most of genrecog with a new algorithm. [...] > Here's a comparison of the number of lines of code in insn-recog.c > before and after the patch on one target per config/ CPU: > > aarch64-linux-gnueabi 115526 38169 : 33.04% > alpha-linux-gnu 24479 10740 : 43.87% > arm-linux-gnueabi 169208 67759 : 40.04% > avr-rtems 55647 22127 : 39.76% > bfin-elf 13928 6498 : 46.65% > c6x-elf 29928 13324 : 44.52% > [...] Nice work! For a nvptx-none build of r222446 and r222860, respectively: $ wc -l {prev/,}build-gcc/gcc/insn-recog.c | head -n -1 5042 prev/build-gcc/gcc/insn-recog.c 2570 build-gcc/gcc/insn-recog.c I'm mostly illiterate regarding GCC's machine description files, and genrecog, but noticed one thing: with the genrecog rewrite, I get the following diff compared to a previous build log: build/genrecog [...]/source-gcc/gcc/common.md [...]/source-gcc/gcc/config/nvptx/nvptx.md \ insn-conditions.md > tmp-recog.c -[...]/source-gcc/gcc/config/nvptx/nvptx.md:1206: warning: operand 0 missing mode? -[...]/source-gcc/gcc/config/nvptx/nvptx.md:1206: warning: operand 1 missing mode? +Statistics for recog: + Number of decisions: 799 + longest path: 28 (code: 208) + longest backtrack: 2 (code: 136) +Statistics for split_insns: + Number of decisions: 0 + longest path: 0 (code: -1) + longest backtrack: 0 (code: -1) +Statistics for peephole2_insns: + Number of decisions: 0 + longest path: 0 (code: -1) + longest backtrack: 0 (code: -1) +Shared 655 out of 1350 states by creating 153 new states, saving 502 gcc/config/nvptx/nvptx.md: 1206 (define_insn "allocate_stack" 1207 [(set (match_operand 0 "nvptx_register_operand" "=R") 1208 (unspec [(match_operand 1 "nvptx_register_operand" "R")] 1209 UNSPEC_ALLOCA))] 1210 "" 1211 "%.\\tcall (%0), %%alloca, (%1);") Are these two (former) warnings a) something that should still be reported by genrecog, and b) something that should be addressed (Bernd)? Grüße, Thomas
On Thu, May 07, 2015 at 10:59:01AM +0200, Thomas Schwinge wrote: > build/genrecog [...]/source-gcc/gcc/common.md [...]/source-gcc/gcc/config/nvptx/nvptx.md \ > insn-conditions.md > tmp-recog.c > -[...]/source-gcc/gcc/config/nvptx/nvptx.md:1206: warning: operand 0 missing mode? > -[...]/source-gcc/gcc/config/nvptx/nvptx.md:1206: warning: operand 1 missing mode? > > gcc/config/nvptx/nvptx.md: > > 1206 (define_insn "allocate_stack" > 1207 [(set (match_operand 0 "nvptx_register_operand" "=R") > 1208 (unspec [(match_operand 1 "nvptx_register_operand" "R")] > 1209 UNSPEC_ALLOCA))] > 1210 "" > 1211 "%.\\tcall (%0), %%alloca, (%1);") > > Are these two (former) warnings a) something that should still be > reported by genrecog, Yes. > and b) something that should be addressed (Bernd)? Yes. Supposedly you want :P on both match_operand and unspec too, but as this serves not just as an insn pattern, but also as expander that needs to have this particular name, supposedly you want: (define_expand "allocate_stack" [(match_operand 0 "nvptx_register_operand") (match_operand 1 "nvptx_register_operand")] "" { if (TARGET_ABI64) emit_insn (gen_allocate_stack_di (operands[0], operands[1])); else emit_insn (gen_allocate_stack_si (operands[0], operands[1])); DONE; }) (define_insn "allocate_stack_<mode>" [(set (match_operand:P 0 "nvptx_register_operand" "=R") (unspec:P [(match_operand:P 1 "nvptx_register_operand" "R")] UNSPEC_ALLOCA))] "" "%.\\tcall (%0), %%alloca, (%1);") rr so. Of course, as even latest Cuda drop doesn't support alloca, this is quite dubious, perhaps better would be sorry on it. BTW, with Cuda 7.0, even printf doesn't work anymore, is that known? Jakub
Hi Richard, I see regressions with the current IBM z13 vector patchset which appear to be related to the new genrecog. The following two insn definitions only differ in the mode and predicate of the shift count operand. (define_insn "lshr<mode>3" [(set (match_operand:VI 0 "register_operand" "=v") (lshiftrt:VI (match_operand:VI 1 "register_operand" "v") (match_operand:SI 2 "shift_count_or_setmem_operand" "Y")))] "TARGET_VX" "vesrl<bhfgq>\t%v0,%v1,%Y2" [(set_attr "op_type" "VRS")]) (define_insn "vlshr<mode>3" [(set (match_operand:VI 0 "register_operand" "=v") (lshiftrt:VI (match_operand:VI 1 "register_operand" "v") (match_operand:VI 2 "register_operand" "v")))] "TARGET_VX" "vesrlv<bhfgq>\t%v0,%v1,%v2" [(set_attr "op_type" "VRR")]) However, the insn-recog.c code only seem to check the predicate. This is a problem since shift_count_or_setmem_operand does not check the mode. if (shift_count_or_setmem_operand (operands[2], SImode) && #line 717 "/home3/andreas/patched/../gcc/gcc/config/s390/vector.md" (TARGET_VX)) return 600; /* lshrv2qi3 */ if (register_operand (operands[2], V2QImode) && #line 747 "/home3/andreas/patched/../gcc/gcc/config/s390/vector.md" (TARGET_VX)) return 630; /* vlshrv2qi3 */ break; I could add a mode check to the predicate. However, I just wanted to check whether this change was intentional. Bye, -Andreas-
Andreas Krebbel <krebbel@linux.vnet.ibm.com> writes: > Hi Richard, > > I see regressions with the current IBM z13 vector patchset which appear to be related to the new > genrecog. > > The following two insn definitions only differ in the mode and predicate of the shift count operand. > > (define_insn "lshr<mode>3" > [(set (match_operand:VI 0 "register_operand" "=v") > (lshiftrt:VI (match_operand:VI 1 "register_operand" "v") > (match_operand:SI 2 "shift_count_or_setmem_operand" "Y")))] > "TARGET_VX" > "vesrl<bhfgq>\t%v0,%v1,%Y2" > [(set_attr "op_type" "VRS")]) > > (define_insn "vlshr<mode>3" > [(set (match_operand:VI 0 "register_operand" "=v") > (lshiftrt:VI (match_operand:VI 1 "register_operand" "v") > (match_operand:VI 2 "register_operand" "v")))] > "TARGET_VX" > "vesrlv<bhfgq>\t%v0,%v1,%v2" > [(set_attr "op_type" "VRR")]) > > > However, the insn-recog.c code only seem to check the predicate. This is a problem since > shift_count_or_setmem_operand does not check the mode. Yeah, it's a bug if a "non-special" predicate doesn't check the mode. Even old genreog relied on that: /* After factoring, try to simplify the tests on any one node. Tests that are useful for switch statements are recognizable by having only a single test on a node -- we'll be manipulating nodes with multiple tests: If we have mode tests or code tests that are redundant with predicates, remove them. */ although it sounds like the old optimisation didn't trigger for your case. genpreds.c:mark_mode_tests is supposed to add these tests automatically if needed. I suppose it isn't doing so here because the predicate accepts const_int and because of: /* Given an RTL expression EXP, find all subexpressions which we may assume to perform mode tests. Normal MATCH_OPERAND does; MATCH_CODE does if it applies to the whole expression and accepts CONST_INT or CONST_DOUBLE; and we have to assume that MATCH_TEST does not. [...] */ static void mark_mode_tests (rtx exp) { switch (GET_CODE (exp)) { [...] case MATCH_CODE: if (XSTR (exp, 1)[0] != '\0' || (!strstr (XSTR (exp, 0), "const_int") && !strstr (XSTR (exp, 0), "const_double"))) NO_MODE_TEST (exp) = 1; break; The code matches the comment, but it doesn't look right. Perhaps it was supposed to mean match_codes that _only_ contain const_int and const_double? Knowing that the rtx is one of those codes guarantees that the mode is VOIDmode, but a match_code that includes other rtxes as well doesn't itself test the mode of those rtxes. Even then, a predicate that matches const_ints and is passed SImode mustn't accept const_ints outside the SImode range. I suppose we just rely on all predicates to perform some kind of range check. (The standard ones do of course.) As a quick workaround, try replacing the case above with: case MATCH_CODE: if (XSTR (exp, 1)[0] != '\0') NO_MODE_TEST (exp) = 1; break; I'll try to come up with a better fix in the meantime. Thanks, Richard
On 05/17/2015 11:12 PM, Richard Sandiford wrote: > Andreas Krebbel <krebbel@linux.vnet.ibm.com> writes: >> Hi Richard, >> >> I see regressions with the current IBM z13 vector patchset which appear to be related to the new >> genrecog. >> >> The following two insn definitions only differ in the mode and predicate of the shift count operand. >> >> (define_insn "lshr<mode>3" >> [(set (match_operand:VI 0 "register_operand" "=v") >> (lshiftrt:VI (match_operand:VI 1 "register_operand" "v") >> (match_operand:SI 2 "shift_count_or_setmem_operand" "Y")))] >> "TARGET_VX" >> "vesrl<bhfgq>\t%v0,%v1,%Y2" >> [(set_attr "op_type" "VRS")]) >> >> (define_insn "vlshr<mode>3" >> [(set (match_operand:VI 0 "register_operand" "=v") >> (lshiftrt:VI (match_operand:VI 1 "register_operand" "v") >> (match_operand:VI 2 "register_operand" "v")))] >> "TARGET_VX" >> "vesrlv<bhfgq>\t%v0,%v1,%v2" >> [(set_attr "op_type" "VRR")]) >> >> >> However, the insn-recog.c code only seem to check the predicate. This is a problem since >> shift_count_or_setmem_operand does not check the mode. > > Yeah, it's a bug if a "non-special" predicate doesn't check the mode. > Even old genreog relied on that: > > /* After factoring, try to simplify the tests on any one node. > Tests that are useful for switch statements are recognizable > by having only a single test on a node -- we'll be manipulating > nodes with multiple tests: > > If we have mode tests or code tests that are redundant with > predicates, remove them. */ > > although it sounds like the old optimisation didn't trigger for your case. > > genpreds.c:mark_mode_tests is supposed to add these tests automatically > if needed. I suppose it isn't doing so here because the predicate > accepts const_int and because of: > > /* Given an RTL expression EXP, find all subexpressions which we may > assume to perform mode tests. Normal MATCH_OPERAND does; > MATCH_CODE does if it applies to the whole expression and accepts > CONST_INT or CONST_DOUBLE; and we have to assume that MATCH_TEST > does not. [...] > */ > static void > mark_mode_tests (rtx exp) > { > switch (GET_CODE (exp)) > { > [...] > case MATCH_CODE: > if (XSTR (exp, 1)[0] != '\0' > || (!strstr (XSTR (exp, 0), "const_int") > && !strstr (XSTR (exp, 0), "const_double"))) > NO_MODE_TEST (exp) = 1; > break; > > The code matches the comment, but it doesn't look right. Perhaps it > was supposed to mean match_codes that _only_ contain const_int and > const_double? Knowing that the rtx is one of those codes guarantees > that the mode is VOIDmode, but a match_code that includes other rtxes > as well doesn't itself test the mode of those rtxes. > > Even then, a predicate that matches const_ints and is passed SImode > mustn't accept const_ints outside the SImode range. I suppose we > just rely on all predicates to perform some kind of range check. > (The standard ones do of course.) > > As a quick workaround, try replacing the case above with: > > case MATCH_CODE: > if (XSTR (exp, 1)[0] != '\0') > NO_MODE_TEST (exp) = 1; > break; > > I'll try to come up with a better fix in the meantime. Thanks for looking into this! I've applied a patch adding a mode check to shift_count_or_setmem_operand which as expected fixes the issue for me. -Andreas-
Andreas Krebbel <krebbel@linux.vnet.ibm.com> writes: > On 05/17/2015 11:12 PM, Richard Sandiford wrote: >> Andreas Krebbel <krebbel@linux.vnet.ibm.com> writes: >>> Hi Richard, >>> >>> I see regressions with the current IBM z13 vector patchset which appear to be related to the new >>> genrecog. >>> >>> The following two insn definitions only differ in the mode and predicate of the shift count operand. >>> >>> (define_insn "lshr<mode>3" >>> [(set (match_operand:VI 0 "register_operand" "=v") >>> (lshiftrt:VI (match_operand:VI 1 "register_operand" "v") >>> (match_operand:SI 2 "shift_count_or_setmem_operand" "Y")))] >>> "TARGET_VX" >>> "vesrl<bhfgq>\t%v0,%v1,%Y2" >>> [(set_attr "op_type" "VRS")]) >>> >>> (define_insn "vlshr<mode>3" >>> [(set (match_operand:VI 0 "register_operand" "=v") >>> (lshiftrt:VI (match_operand:VI 1 "register_operand" "v") >>> (match_operand:VI 2 "register_operand" "v")))] >>> "TARGET_VX" >>> "vesrlv<bhfgq>\t%v0,%v1,%v2" >>> [(set_attr "op_type" "VRR")]) >>> >>> >>> However, the insn-recog.c code only seem to check the predicate. This is a problem since >>> shift_count_or_setmem_operand does not check the mode. >> >> Yeah, it's a bug if a "non-special" predicate doesn't check the mode. >> Even old genreog relied on that: >> >> /* After factoring, try to simplify the tests on any one node. >> Tests that are useful for switch statements are recognizable >> by having only a single test on a node -- we'll be manipulating >> nodes with multiple tests: >> >> If we have mode tests or code tests that are redundant with >> predicates, remove them. */ >> >> although it sounds like the old optimisation didn't trigger for your case. >> >> genpreds.c:mark_mode_tests is supposed to add these tests automatically >> if needed. I suppose it isn't doing so here because the predicate >> accepts const_int and because of: >> >> /* Given an RTL expression EXP, find all subexpressions which we may >> assume to perform mode tests. Normal MATCH_OPERAND does; >> MATCH_CODE does if it applies to the whole expression and accepts >> CONST_INT or CONST_DOUBLE; and we have to assume that MATCH_TEST >> does not. [...] >> */ >> static void >> mark_mode_tests (rtx exp) >> { >> switch (GET_CODE (exp)) >> { >> [...] >> case MATCH_CODE: >> if (XSTR (exp, 1)[0] != '\0' >> || (!strstr (XSTR (exp, 0), "const_int") >> && !strstr (XSTR (exp, 0), "const_double"))) >> NO_MODE_TEST (exp) = 1; >> break; >> >> The code matches the comment, but it doesn't look right. Perhaps it >> was supposed to mean match_codes that _only_ contain const_int and >> const_double? Knowing that the rtx is one of those codes guarantees >> that the mode is VOIDmode, but a match_code that includes other rtxes >> as well doesn't itself test the mode of those rtxes. >> >> Even then, a predicate that matches const_ints and is passed SImode >> mustn't accept const_ints outside the SImode range. I suppose we >> just rely on all predicates to perform some kind of range check. >> (The standard ones do of course.) >> >> As a quick workaround, try replacing the case above with: >> >> case MATCH_CODE: >> if (XSTR (exp, 1)[0] != '\0') >> NO_MODE_TEST (exp) = 1; >> break; >> >> I'll try to come up with a better fix in the meantime. > > Thanks for looking into this! > I've applied a patch adding a mode check to > shift_count_or_setmem_operand which as expected fixes > the issue for me. OK. After a false start earlier in the week, I've now got a patch that I think fixes genpreds.c "properly". Hope to post it soon. Thanks, Richard
Index: gcc/Makefile.in =================================================================== --- gcc/Makefile.in 2015-04-27 10:42:57.783191573 +0100 +++ gcc/Makefile.in 2015-04-27 10:43:42.878643078 +0100 @@ -2527,7 +2527,8 @@ build/genpeep.o : genpeep.c $(RTL_BASE_H build/genpreds.o : genpreds.c $(RTL_BASE_H) $(BCONFIG_H) $(SYSTEM_H) \ coretypes.h $(GTM_H) errors.h $(READ_MD_H) gensupport.h $(OBSTACK_H) build/genrecog.o : genrecog.c $(RTL_BASE_H) $(BCONFIG_H) $(SYSTEM_H) \ - coretypes.h $(GTM_H) errors.h $(READ_MD_H) gensupport.h + coretypes.h $(GTM_H) errors.h $(READ_MD_H) gensupport.h \ + $(HASH_TABLE_H) inchash.h build/genhooks.o : genhooks.c $(TARGET_DEF) $(C_TARGET_DEF) \ $(COMMON_TARGET_DEF) $(BCONFIG_H) $(SYSTEM_H) errors.h build/genmddump.o : genmddump.c $(RTL_BASE_H) $(BCONFIG_H) $(SYSTEM_H) \ @@ -2559,6 +2560,8 @@ genprog = $(genprogerr) check checksum c # These programs need libs over and above what they get from the above list. build/genautomata$(build_exeext) : BUILD_LIBS += -lm +build/genrecog$(build_exeext) : build/hash-table.o build/inchash.o + # For stage1 and when cross-compiling use the build libcpp which is # built with NLS disabled. For stage2+ use the host library and # its dependencies.