Message ID | 001401d7a2a5$5bf07db0$13d17910$@nextmovesoftware.com |
---|---|
State | New |
Headers | show |
Series | Simplify paradoxical subreg extensions of TRUNCATE | expand |
On Sun, Sep 05, 2021 at 11:28:30PM +0100, Roger Sayle wrote: > This patch simplifies the RTX (subreg:HI (truncate:QI (reg:SI))) as > (truncate:HI (reg:SI)), and closely related variants. Subregs of other than regs are undefined in RTL. You will first have to define this (in documentation as well as in other code that handles subregs). I doubt this is possible to do, subreg have so many overloaded meanings already. Segher
On Mon, Sep 6, 2021 at 12:15 PM Segher Boessenkool <segher@kernel.crashing.org> wrote: > > On Sun, Sep 05, 2021 at 11:28:30PM +0100, Roger Sayle wrote: > > This patch simplifies the RTX (subreg:HI (truncate:QI (reg:SI))) as > > (truncate:HI (reg:SI)), and closely related variants. > > Subregs of other than regs are undefined in RTL. You will first have to > define this (in documentation as well as in other code that handles > subregs). I doubt this is possible to do, subreg have so many > overloaded meanings already. I suppose (subreg:MODE1 (xyz:MODE2 ..)) where xyz is not REG or MEM is equal to (set (reg:MODE2) (xyz:MODE2 ..)) (subreg:MODE1 (reg:MODE2) ...) with 'reg' being a pseudo reg is the (only?) sensible way of defining it. That would make the simplification apply when substituting the pseudos definition inside the subreg which maybe is what this targets? Richard. > > > Segher
Hi Segher, I think the current documentation is sufficient. During compilation, GCC's combine pass will often substitute a register with an expression defining it's value, and then attempt to simplify it. As you point out, this often produces (temporary intermediate) expressions with poorly defined semantics. The purpose of my patch is to avoid (constructing) SUBREGs that have TRUNCATE as an argument (that some folks consider undefined) and instead simplify them to either a single TRUNCATE or a SUBREG of a REG, both of which are well defined. I'm making/helping the implementation match the documentation. From the current output of -fdump-rtl-combine-all=stdout Trying 10 -> 15: 10: r29:QI=trunc(r32:SI) REG_DEAD r32:SI 15: r38:HI=r29:QI#0 REG_DEAD r29:QI Failed to match this instruction: (set (reg:HI 38) (subreg:HI (truncate:QI (reg:SI 32)) 0)) with this patch: Trying 10 -> 15: 10: r29:QI=trunc(r32:SI) REG_DEAD r32:SI 15: r38:HI=r29:QI#0 REG_DEAD r29:QI Successfully matched this instruction: (set (reg:HI 38) (truncate:HI (reg:SI 32))) allowing combination of insns 10 and 15 original costs 4 + 12 = 16 replacement cost 4 deferring deletion of insn with uid = 10. A few lines earlier in this function, it simplifies SUBREGs of CONST_INT, which would also have poorly defined semantics. Perhaps your interpretation of the RTL documentation doesn't strictly apply to this part of the RTL optimizers? Let me know what you think. Best regards, Roger -- -----Original Message----- From: Segher Boessenkool <segher@kernel.crashing.org> Sent: 06 September 2021 11:14 To: Roger Sayle <roger@nextmovesoftware.com> Cc: 'GCC Patches' <gcc-patches@gcc.gnu.org> Subject: Re: [PATCH] Simplify paradoxical subreg extensions of TRUNCATE On Sun, Sep 05, 2021 at 11:28:30PM +0100, Roger Sayle wrote: > This patch simplifies the RTX (subreg:HI (truncate:QI (reg:SI))) as > (truncate:HI (reg:SI)), and closely related variants. Subregs of other than regs are undefined in RTL. You will first have to define this (in documentation as well as in other code that handles subregs). I doubt this is possible to do, subreg have so many overloaded meanings already. Segher
Hi! On Mon, Sep 06, 2021 at 12:32:13PM +0100, Roger Sayle wrote: > I think the current documentation is sufficient. During compilation, GCC's > combine pass will often substitute a register with an expression defining > it's value, and then attempt to simplify it. As you point out, this often > produces > (temporary intermediate) expressions with poorly defined semantics. Not "poorly defined". The documentation says (in rtl.texi) @findex subreg @item (subreg:@var{m1} @var{reg:m2} @var{bytenum}) @code{subreg} expressions are used to refer to a register in a machine mode other than its natural one, or to refer to one register of a multi-part @code{reg} that actually refers to several registers. It goes on to say there also are subregs of mem currently; it exhaustively lists all things you can take a subreg of, what the different semantics of those are, how the semantics are further "modified" (read: completely different) if some RTL flags are set, etc. The instruction combiner very often creates invalid RTL (only structurally valid, like, no missing operands). Invalid RTL is supposed to never match (because backends will not have patterns that match these). combine even creates invalid RTL on purpose (a clobber of const0_rtx) to ensure its result is deemed invalid, when it wants to abort a combination attempt for some reason. > The > purpose of my patch is to avoid (constructing) SUBREGs that have TRUNCATE > as an argument (that some folks consider undefined) and instead simplify > them to either a single TRUNCATE or a SUBREG of a REG, both of which are > well defined. I'm making/helping the implementation match the > documentation. But this should never make it to simplify-rtx at all. You can only validly do such transformations in combine, because you need to know what insns you started with etc. simplify-rtx is a part of combine, sure, but it is used from other contexts as well nowadays. If you can safely do this from combine, you can do it in combine. > Trying 10 -> 15: > 10: r29:QI=trunc(r32:SI) > REG_DEAD r32:SI > 15: r38:HI=r29:QI#0 > REG_DEAD r29:QI > Failed to match this instruction: > (set (reg:HI 38) > (subreg:HI (truncate:QI (reg:SI 32)) 0)) > > with this patch: > > Trying 10 -> 15: > 10: r29:QI=trunc(r32:SI) > REG_DEAD r32:SI > 15: r38:HI=r29:QI#0 > REG_DEAD r29:QI > Successfully matched this instruction: > (set (reg:HI 38) > (truncate:HI (reg:SI 32))) > allowing combination of insns 10 and 15 > original costs 4 + 12 = 16 > replacement cost 4 > deferring deletion of insn with uid = 10. (It appears costs are a bit off on your target? Or is HImode especially expensive for some reason?) There is no subreg of a non-reg in there, so this is fine. > A few lines earlier in this function, it simplifies SUBREGs of CONST_INT, > which > would also have poorly defined semantics. Perhaps your interpretation of > the > RTL documentation doesn't strictly apply to this part of the RTL optimizers? Perhaps this is a bug. It is if it actually allows subregs of ints as input. Segher
On 9/6/2021 8:24 AM, Segher Boessenkool wrote: > Hi! > > On Mon, Sep 06, 2021 at 12:32:13PM +0100, Roger Sayle wrote: >> I think the current documentation is sufficient. During compilation, GCC's >> combine pass will often substitute a register with an expression defining >> it's value, and then attempt to simplify it. As you point out, this often >> produces >> (temporary intermediate) expressions with poorly defined semantics. > Not "poorly defined". The documentation says (in rtl.texi) > > @findex subreg > @item (subreg:@var{m1} @var{reg:m2} @var{bytenum}) > > @code{subreg} expressions are used to refer to a register in a machine > mode other than its natural one, or to refer to one register of > a multi-part @code{reg} that actually refers to several registers. > > It goes on to say there also are subregs of mem currently; it > exhaustively lists all things you can take a subreg of, what the > different semantics of those are, how the semantics are further > "modified" (read: completely different) if some RTL flags are set, etc. > > The instruction combiner very often creates invalid RTL (only > structurally valid, like, no missing operands). Invalid RTL is supposed > to never match (because backends will not have patterns that match > these). combine even creates invalid RTL on purpose (a clobber of > const0_rtx) to ensure its result is deemed invalid, when it wants to > abort a combination attempt for some reason. > >> The >> purpose of my patch is to avoid (constructing) SUBREGs that have TRUNCATE >> as an argument (that some folks consider undefined) and instead simplify >> them to either a single TRUNCATE or a SUBREG of a REG, both of which are >> well defined. I'm making/helping the implementation match the >> documentation. > But this should never make it to simplify-rtx at all. You can only > validly do such transformations in combine, because you need to know > what insns you started with etc. > > simplify-rtx is a part of combine, sure, but it is used from other > contexts as well nowadays. If you can safely do this from combine, > you can do it in combine. [ ... ] So what I think is missing here is some concrete path forward. If I'm understanding your objection Segher, you'd like to see Roger look at where these (subreg (truncate)) expressions came from and address them earlier than simplify_rtx? The theory being that the simplification bits could be used from other contexts than just combine and in those other contexts (subreg (truncate)) isn't valid? Jeff
Hi! On Sun, Sep 19, 2021 at 09:14:55AM -0600, Jeff Law wrote: > On 9/6/2021 8:24 AM, Segher Boessenkool wrote: > >On Mon, Sep 06, 2021 at 12:32:13PM +0100, Roger Sayle wrote: > >>I think the current documentation is sufficient. During compilation, > >>GCC's > >>combine pass will often substitute a register with an expression defining > >>it's value, and then attempt to simplify it. As you point out, this often > >>produces > >>(temporary intermediate) expressions with poorly defined semantics. > >Not "poorly defined". The documentation says (in rtl.texi) > > > >@findex subreg > >@item (subreg:@var{m1} @var{reg:m2} @var{bytenum}) > > > >@code{subreg} expressions are used to refer to a register in a machine > >mode other than its natural one, or to refer to one register of > >a multi-part @code{reg} that actually refers to several registers. > > > >It goes on to say there also are subregs of mem currently; it > >exhaustively lists all things you can take a subreg of, what the > >different semantics of those are, how the semantics are further > >"modified" (read: completely different) if some RTL flags are set, etc. > > > >The instruction combiner very often creates invalid RTL (only > >structurally valid, like, no missing operands). Invalid RTL is supposed > >to never match (because backends will not have patterns that match > >these). combine even creates invalid RTL on purpose (a clobber of > >const0_rtx) to ensure its result is deemed invalid, when it wants to > >abort a combination attempt for some reason. > > > >>The > >>purpose of my patch is to avoid (constructing) SUBREGs that have TRUNCATE > >>as an argument (that some folks consider undefined) and instead simplify > >>them to either a single TRUNCATE or a SUBREG of a REG, both of which are > >>well defined. I'm making/helping the implementation match the > >>documentation. > >But this should never make it to simplify-rtx at all. You can only > >validly do such transformations in combine, because you need to know > >what insns you started with etc. > > > >simplify-rtx is a part of combine, sure, but it is used from other > >contexts as well nowadays. If you can safely do this from combine, > >you can do it in combine. > [ ... ] > So what I think is missing here is some concrete path forward. If I'm > understanding your objection Segher, you'd like to see Roger look at > where these (subreg (truncate)) expressions came from and address them > earlier than simplify_rtx? There is no such thing as "earlier than simplify-rtx", that is the point. simplify-rtx is not a pass: it is like a library that is used from all over the RTL routines. If something belongs in combine, it should be in combine, not in simplify-rtx. > The theory being that the simplification > bits could be used from other contexts than just combine and in those > other contexts (subreg (truncate)) isn't valid? Or not actually simplifying (or canonicalising). Yup. It also is the case that invalid RTL should simply never be constructed, we should not (try to) fix it up later. I'm not sure if that was what was happening in this case, too much of this stuff recently, everything blurs together, sorry :-/ I would love to see this all improved. But if we just poke at it with a stick we'll be in worse shape instead of better by the time stage 1 closes. So we need a plan, and/or a topic tree (some devel/* branch or whatever). Segher
On 9/20/2021 6:23 PM, Segher Boessenkool wrote: > Hi! > > On Sun, Sep 19, 2021 at 09:14:55AM -0600, Jeff Law wrote: >> On 9/6/2021 8:24 AM, Segher Boessenkool wrote: >>> On Mon, Sep 06, 2021 at 12:32:13PM +0100, Roger Sayle wrote: >>>> I think the current documentation is sufficient. During compilation, >>>> GCC's >>>> combine pass will often substitute a register with an expression defining >>>> it's value, and then attempt to simplify it. As you point out, this often >>>> produces >>>> (temporary intermediate) expressions with poorly defined semantics. >>> Not "poorly defined". The documentation says (in rtl.texi) >>> >>> @findex subreg >>> @item (subreg:@var{m1} @var{reg:m2} @var{bytenum}) >>> >>> @code{subreg} expressions are used to refer to a register in a machine >>> mode other than its natural one, or to refer to one register of >>> a multi-part @code{reg} that actually refers to several registers. >>> >>> It goes on to say there also are subregs of mem currently; it >>> exhaustively lists all things you can take a subreg of, what the >>> different semantics of those are, how the semantics are further >>> "modified" (read: completely different) if some RTL flags are set, etc. >>> >>> The instruction combiner very often creates invalid RTL (only >>> structurally valid, like, no missing operands). Invalid RTL is supposed >>> to never match (because backends will not have patterns that match >>> these). combine even creates invalid RTL on purpose (a clobber of >>> const0_rtx) to ensure its result is deemed invalid, when it wants to >>> abort a combination attempt for some reason. >>> >>>> The >>>> purpose of my patch is to avoid (constructing) SUBREGs that have TRUNCATE >>>> as an argument (that some folks consider undefined) and instead simplify >>>> them to either a single TRUNCATE or a SUBREG of a REG, both of which are >>>> well defined. I'm making/helping the implementation match the >>>> documentation. >>> But this should never make it to simplify-rtx at all. You can only >>> validly do such transformations in combine, because you need to know >>> what insns you started with etc. >>> >>> simplify-rtx is a part of combine, sure, but it is used from other >>> contexts as well nowadays. If you can safely do this from combine, >>> you can do it in combine. >> [ ... ] >> So what I think is missing here is some concrete path forward. If I'm >> understanding your objection Segher, you'd like to see Roger look at >> where these (subreg (truncate)) expressions came from and address them >> earlier than simplify_rtx? > There is no such thing as "earlier than simplify-rtx", that is the > point. simplify-rtx is not a pass: it is like a library that is used > from all over the RTL routines. I'm referring to earlier in the call chain, not an earlier pass. Sorry I wasn't clear about that. If we were catching the scenario which led to the creation of (subreg (truncate)) in combine and instead of creating (subreg (truncate)) we instead created the simplified, correct form would that ease your concerns? Jeff
On Tue, Sep 21, 2021 at 6:18 AM Jeff Law via Gcc-patches <gcc-patches@gcc.gnu.org> wrote: > > > > On 9/20/2021 6:23 PM, Segher Boessenkool wrote: > > Hi! > > > > On Sun, Sep 19, 2021 at 09:14:55AM -0600, Jeff Law wrote: > >> On 9/6/2021 8:24 AM, Segher Boessenkool wrote: > >>> On Mon, Sep 06, 2021 at 12:32:13PM +0100, Roger Sayle wrote: > >>>> I think the current documentation is sufficient. During compilation, > >>>> GCC's > >>>> combine pass will often substitute a register with an expression defining > >>>> it's value, and then attempt to simplify it. As you point out, this often > >>>> produces > >>>> (temporary intermediate) expressions with poorly defined semantics. > >>> Not "poorly defined". The documentation says (in rtl.texi) > >>> > >>> @findex subreg > >>> @item (subreg:@var{m1} @var{reg:m2} @var{bytenum}) > >>> > >>> @code{subreg} expressions are used to refer to a register in a machine > >>> mode other than its natural one, or to refer to one register of > >>> a multi-part @code{reg} that actually refers to several registers. > >>> > >>> It goes on to say there also are subregs of mem currently; it > >>> exhaustively lists all things you can take a subreg of, what the > >>> different semantics of those are, how the semantics are further > >>> "modified" (read: completely different) if some RTL flags are set, etc. > >>> > >>> The instruction combiner very often creates invalid RTL (only > >>> structurally valid, like, no missing operands). Invalid RTL is supposed > >>> to never match (because backends will not have patterns that match > >>> these). combine even creates invalid RTL on purpose (a clobber of > >>> const0_rtx) to ensure its result is deemed invalid, when it wants to > >>> abort a combination attempt for some reason. > >>> > >>>> The > >>>> purpose of my patch is to avoid (constructing) SUBREGs that have TRUNCATE > >>>> as an argument (that some folks consider undefined) and instead simplify > >>>> them to either a single TRUNCATE or a SUBREG of a REG, both of which are > >>>> well defined. I'm making/helping the implementation match the > >>>> documentation. > >>> But this should never make it to simplify-rtx at all. You can only > >>> validly do such transformations in combine, because you need to know > >>> what insns you started with etc. > >>> > >>> simplify-rtx is a part of combine, sure, but it is used from other > >>> contexts as well nowadays. If you can safely do this from combine, > >>> you can do it in combine. > >> [ ... ] > >> So what I think is missing here is some concrete path forward. If I'm > >> understanding your objection Segher, you'd like to see Roger look at > >> where these (subreg (truncate)) expressions came from and address them > >> earlier than simplify_rtx? > > There is no such thing as "earlier than simplify-rtx", that is the > > point. simplify-rtx is not a pass: it is like a library that is used > > from all over the RTL routines. > I'm referring to earlier in the call chain, not an earlier pass. Sorry I > wasn't clear about that. > > If we were catching the scenario which led to the creation of (subreg > (truncate)) in combine and instead of creating (subreg (truncate)) we > instead created the simplified, correct form would that ease your concerns? Sorry to chime in, but if (subreg (truncate ...)) is invalid it would be OK for simplify_gen_subreg () to 1) return NULL_RTX 2) return valid RTX, in this case the suggested canoncalized form I agree that simplify_gen_subreg should not return (subreg (truncate ..)). So I'm not understanding if you are suggesting that even calling simplify_gen_subreg on a (truncate ...) is invalid? Now, the patch changes simplify_subreg where it's not clear whether the subreg already exists (it should not) or whether we are about to construct it, seeing if there's a valid RTX that expresses the semantic as if the op were pushed to a pseudo (aka we're combining on the fly). It's a bit like fold_convert (double_type_node, integer_one_node) second-guessing and doing a FLOAT_EXPR when folding the implicit CONVERT_EXPR. Richard. > > > Jeff
[Using this is a convenient place to reply to the thread as a whole] Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes: > On Mon, Sep 6, 2021 at 12:15 PM Segher Boessenkool > <segher@kernel.crashing.org> wrote: >> >> On Sun, Sep 05, 2021 at 11:28:30PM +0100, Roger Sayle wrote: >> > This patch simplifies the RTX (subreg:HI (truncate:QI (reg:SI))) as >> > (truncate:HI (reg:SI)), and closely related variants. >> >> Subregs of other than regs are undefined in RTL. You will first have to >> define this (in documentation as well as in other code that handles >> subregs). I doubt this is possible to do, subreg have so many >> overloaded meanings already. > > I suppose (subreg:MODE1 (xyz:MODE2 ..)) where xyz is not REG or MEM > is equal to > > (set (reg:MODE2) (xyz:MODE2 ..)) > (subreg:MODE1 (reg:MODE2) ...) > > with 'reg' being a pseudo reg is the (only?) sensible way of defining it. Agreed. And I think that's already the de facto definition (and has been for a long time). Subreg as an operation has to have defined semantics for all the cases that simplify_subreg handles, otherwise we have GIGO and a lot of the function could be deleted. We can (and do) choose to prevent some of those operations becoming actual rtxes, but even there, the de facto rules are quite broad. E.g.: - Like you said later, simplify_gen_subreg is opt-out rather than opt-in in terms of the subreg rtxes that it's prepared to create. - Even rs6000.md has: (define_insn "*<su>mul<mode>3_highpart" [(set (match_operand:GPR 0 "gpc_reg_operand" "=r") (subreg:GPR (mult:<DMODE> (any_extend:<DMODE> (match_operand:GPR 1 "gpc_reg_operand" "r")) (any_extend:<DMODE> (match_operand:GPR 2 "gpc_reg_operand" "r"))) 0))] "WORDS_BIG_ENDIAN && !(<MODE>mode == SImode && TARGET_POWERPC64)" "mulh<wd><u> %0,%1,%2" [(set_attr "type" "mul") (set_attr "size" "<bits>")]) Many other ports have similar patterns. The problem with “combine can generate invalid rtl but backends should reject it” is that, generally, people write combine patterns by looking at what combine _wants_ to generate and then writing .md patterns to match that. In other words, combine in practice defines the (de facto) correct rtl representation of a combined sequence. Given: Trying 10 -> 15: 10: r29:QI=trunc(r32:SI) REG_DEAD r32:SI 15: r38:HI=r29:QI#0 REG_DEAD r29:QI Failed to match this instruction: (set (reg:HI 38) (subreg:HI (truncate:QI (reg:SI 32)) 0)) I'm sure there's a temptation to add an .md pattern that matches the subreg. :-) Thanks, Richard > That would make the simplification apply when substituting the pseudos > definition inside the subreg which maybe is what this targets? > > Richard. > >> >> >> Segher
On Mon, Sep 20, 2021 at 10:18:15PM -0600, Jeff Law wrote: > On 9/20/2021 6:23 PM, Segher Boessenkool wrote: > >There is no such thing as "earlier than simplify-rtx", that is the > >point. simplify-rtx is not a pass: it is like a library that is used > >from all over the RTL routines. > I'm referring to earlier in the call chain, not an earlier pass. Sorry I > wasn't clear about that. Ah okay, I see. > If we were catching the scenario which led to the creation of (subreg > (truncate)) in combine and instead of creating (subreg (truncate)) we > instead created the simplified, correct form would that ease your concerns? Yes please. And combine has access to the original instructions, so can make sure to only ever create something with the same semantics. Thanks, Segher
On Tue, Sep 21, 2021 at 08:24:04AM +0200, Richard Biener wrote: > On Tue, Sep 21, 2021 at 6:18 AM Jeff Law via Gcc-patches > <gcc-patches@gcc.gnu.org> wrote: > > If we were catching the scenario which led to the creation of (subreg > > (truncate)) in combine and instead of creating (subreg (truncate)) we > > instead created the simplified, correct form would that ease your concerns? > > Sorry to chime in, No, please do! > but if (subreg (truncate ...)) is invalid it would be OK for > simplify_gen_subreg () to > 1) return NULL_RTX Yes, that is valid, but the simplify-rtx routines (as well as pretty much everything dealing with RTL) doesn't have to check it is valid RTL, as a matter of principle: it can just assume anything passed around *is* valid. Anything simplify-rtx gets as input should be valid RTL, and the only thing it should be concerned with is *generating* only valid RTL as well :-) > 2) return valid RTX, in this case the suggested canoncalized form > I agree that simplify_gen_subreg should not return (subreg (truncate ..)). > > So I'm not understanding if you are suggesting that even calling > simplify_gen_subreg on a (truncate ...) is invalid? What would the semantics be? A subreg has a reg as operand, and its semantics are different for pseudos and for hard regs! We could define the semantics of calling simplify_gen_subreg on non-regs, but that will make the interface only more confused imo, much harder to use correctly. > Now, the patch changes simplify_subreg where it's not clear whether > the subreg already exists (it should not) or whether we are about to > construct it, seeing if there's a valid RTX that expresses the semantic > as if the op were pushed to a pseudo (aka we're combining on the fly). > > It's a bit like fold_convert (double_type_node, integer_one_node) > second-guessing and doing a FLOAT_EXPR when folding > the implicit CONVERT_EXPR. Yeah :-/ Segher
That define_insn is making my eyes bleed! I think that's the most convincing argument I've ever read on gcc-patches, and I can see now what Segher is so opposed to. My new goal is to add sh_mul and uh_mul as RTL expression codes for highpart multiplication, so that that pattern will go away [and never ever been seen again]. Fortunately, the proposed patch is part of the solution, rather than part of the problem. simplify_subreg is used to find a sensible alternate form for a hypothetical SUBREG before it's created (and normally it'll only be created if it's valid c.f. validate_subreg in simplify_gen_subreg). This function is called over 30 million times during stage2/stage3 of a bootstrap on x86_64, with the most common "SUBREG" operand RTX codes being: 16160294 reg 5627157 const_int 3278692 mem 960839 lshiftrt 902685 plus 690246 and 430131 subreg 319407 zero_extract 319145 minus 255043 value 232959 ashift 225468 mult 217819 xor 207645 clobber 207454 ashiftrt 100213 ior Adding a simplify_subreg for mult, replacing them with a highpart rtx should avoid (reduce) combine's Frankenstein patterns appearing in machine descriptions. Presumably, the reason that this define_insn starts with a "*" even for a recognized named pattern, is that if expand ever attempted emitting this instruction, the internal sanity checks for a sane/valid SUBREG would fail/ICE. p.s. You're right that there's a temptation to add ugly patterns to a backend, instead of fixing their simplification in the middle-end; but only when backend patches get reviewed faster/easier than middle-end patches 😊 Awesome (counter-)example! Roger -- -----Original Message----- From: Richard Sandiford <richard.sandiford@arm.com> Sent: 21 September 2021 13:02 To: Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> Cc: Segher Boessenkool <segher@kernel.crashing.org>; Richard Biener <richard.guenther@gmail.com>; Roger Sayle <roger@nextmovesoftware.com> Subject: Re: [PATCH] Simplify paradoxical subreg extensions of TRUNCATE [Using this is a convenient place to reply to the thread as a whole] Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes: > On Mon, Sep 6, 2021 at 12:15 PM Segher Boessenkool > <segher@kernel.crashing.org> wrote: >> >> On Sun, Sep 05, 2021 at 11:28:30PM +0100, Roger Sayle wrote: >> > This patch simplifies the RTX (subreg:HI (truncate:QI (reg:SI))) as >> > (truncate:HI (reg:SI)), and closely related variants. >> >> Subregs of other than regs are undefined in RTL. You will first have >> to define this (in documentation as well as in other code that >> handles subregs). I doubt this is possible to do, subreg have so >> many overloaded meanings already. > > I suppose (subreg:MODE1 (xyz:MODE2 ..)) where xyz is not REG or MEM is > equal to > > (set (reg:MODE2) (xyz:MODE2 ..)) > (subreg:MODE1 (reg:MODE2) ...) > > with 'reg' being a pseudo reg is the (only?) sensible way of defining it. Agreed. And I think that's already the de facto definition (and has been for a long time). Subreg as an operation has to have defined semantics for all the cases that simplify_subreg handles, otherwise we have GIGO and a lot of the function could be deleted. We can (and do) choose to prevent some of those operations becoming actual rtxes, but even there, the de facto rules are quite broad. E.g.: - Like you said later, simplify_gen_subreg is opt-out rather than opt-in in terms of the subreg rtxes that it's prepared to create. - Even rs6000.md has: (define_insn "*<su>mul<mode>3_highpart" [(set (match_operand:GPR 0 "gpc_reg_operand" "=r") (subreg:GPR (mult:<DMODE> (any_extend:<DMODE> (match_operand:GPR 1 "gpc_reg_operand" "r")) (any_extend:<DMODE> (match_operand:GPR 2 "gpc_reg_operand" "r"))) 0))] "WORDS_BIG_ENDIAN && !(<MODE>mode == SImode && TARGET_POWERPC64)" "mulh<wd><u> %0,%1,%2" [(set_attr "type" "mul") (set_attr "size" "<bits>")]) Many other ports have similar patterns. The problem with “combine can generate invalid rtl but backends should reject it” is that, generally, people write combine patterns by looking at what combine _wants_ to generate and then writing .md patterns to match that. In other words, combine in practice defines the (de facto) correct rtl representation of a combined sequence. Given: Trying 10 -> 15: 10: r29:QI=trunc(r32:SI) REG_DEAD r32:SI 15: r38:HI=r29:QI#0 REG_DEAD r29:QI Failed to match this instruction: (set (reg:HI 38) (subreg:HI (truncate:QI (reg:SI 32)) 0)) I'm sure there's a temptation to add an .md pattern that matches the subreg. :-) Thanks, Richard > That would make the simplification apply when substituting the pseudos > definition inside the subreg which maybe is what this targets? > > Richard. > >> >> >> Segher
On 9/21/2021 6:01 AM, Richard Sandiford via Gcc-patches wrote: > [Using this is a convenient place to reply to the thread as a whole] > > Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes: >> On Mon, Sep 6, 2021 at 12:15 PM Segher Boessenkool >> <segher@kernel.crashing.org> wrote: >>> On Sun, Sep 05, 2021 at 11:28:30PM +0100, Roger Sayle wrote: >>>> This patch simplifies the RTX (subreg:HI (truncate:QI (reg:SI))) as >>>> (truncate:HI (reg:SI)), and closely related variants. >>> Subregs of other than regs are undefined in RTL. You will first have to >>> define this (in documentation as well as in other code that handles >>> subregs). I doubt this is possible to do, subreg have so many >>> overloaded meanings already. >> I suppose (subreg:MODE1 (xyz:MODE2 ..)) where xyz is not REG or MEM >> is equal to >> >> (set (reg:MODE2) (xyz:MODE2 ..)) >> (subreg:MODE1 (reg:MODE2) ...) >> >> with 'reg' being a pseudo reg is the (only?) sensible way of defining it. > Agreed. And I think that's already the de facto definition (and has been > for a long time). Subreg as an operation has to have defined semantics > for all the cases that simplify_subreg handles, otherwise we have GIGO > and a lot of the function could be deleted. We can (and do) choose > to prevent some of those operations becoming actual rtxes, but even there, > the de facto rules are quite broad. E.g.: > > - Like you said later, simplify_gen_subreg is opt-out rather than opt-in > in terms of the subreg rtxes that it's prepared to create. > > - Even rs6000.md has: > > (define_insn "*<su>mul<mode>3_highpart" > [(set (match_operand:GPR 0 "gpc_reg_operand" "=r") > (subreg:GPR > (mult:<DMODE> (any_extend:<DMODE> > (match_operand:GPR 1 "gpc_reg_operand" "r")) > (any_extend:<DMODE> > (match_operand:GPR 2 "gpc_reg_operand" "r"))) > 0))] > "WORDS_BIG_ENDIAN && !(<MODE>mode == SImode && TARGET_POWERPC64)" > "mulh<wd><u> %0,%1,%2" > [(set_attr "type" "mul") > (set_attr "size" "<bits>")]) > > Many other ports have similar patterns. > > The problem with “combine can generate invalid rtl but backends > should reject it” is that, generally, people write combine patterns > by looking at what combine _wants_ to generate and then writing > .md patterns to match that. In other words, combine in practice > defines the (de facto) correct rtl representation of a combined > sequence. Yup. In fact, I'm having this exact concern with an internal chunk of work right now :-) > > Given: > > Trying 10 -> 15: > 10: r29:QI=trunc(r32:SI) > REG_DEAD r32:SI > 15: r38:HI=r29:QI#0 > REG_DEAD r29:QI > Failed to match this instruction: > (set (reg:HI 38) > (subreg:HI (truncate:QI (reg:SI 32)) 0)) > > I'm sure there's a temptation to add an .md pattern that matches > the subreg. :-) And that's why my stated position is that any subreg in a backend pattern needs to be justified. Obviously all kinds exist, but when I see them I ask for a justification. Jeff
On 9/21/2021 6:54 AM, Roger Sayle wrote: > That define_insn is making my eyes bleed! I think that's the most convincing > argument I've ever read on gcc-patches, and I can see now what Segher is so > opposed to. Then you haven't seen enough patterns or your eyes haven't toughened up through the years :-) Jeff
diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c index ebad5cb..3040136 100644 --- a/gcc/simplify-rtx.c +++ b/gcc/simplify-rtx.c @@ -7403,13 +7403,34 @@ simplify_context::simplify_subreg (machine_mode outermode, rtx op, return immed_wide_int_const (val, int_outermode); } - if (GET_MODE_PRECISION (int_outermode) - < GET_MODE_PRECISION (int_innermode)) + unsigned int outerprec = GET_MODE_PRECISION (int_outermode); + unsigned int innerprec = GET_MODE_PRECISION (int_innermode); + if (outerprec < innerprec) { rtx tem = simplify_truncation (int_outermode, op, int_innermode); if (tem) return tem; } + else if (outerprec > innerprec + && GET_CODE (op) == TRUNCATE) + { + /* Optimize paradoxial subreg extension of a truncate, where + we can eliminate the truncation, or widen the truncation + to the desired mode. */ + scalar_int_mode int_origmode; + rtx orig = XEXP (op, 0); + if (is_a <scalar_int_mode> (GET_MODE (orig), &int_origmode)) + { + unsigned int origprec = GET_MODE_PRECISION (int_origmode); + if (outerprec < origprec) + return simplify_gen_unary (TRUNCATE, outermode, orig, + GET_MODE (orig)); + else if (outerprec > origprec) + return lowpart_subreg (int_outermode, orig, int_origmode); + else if (outermode == GET_MODE (orig)) + return orig; + } + } } /* If OP is a vector comparison and the subreg is not changing the