Message ID: 4C52F946.6010404@codesourcery.com
On Fri, Jul 30, 2010 at 6:09 PM, Jie Zhang <jie@codesourcery.com> wrote:
> PR tree-optimization/45144 shows an issue that SRA causes. I used the
> arm-none-eabi target as an example in PR tree-optimization/45144, but
> the same issue can also be seen on the x86_64-linux-gnu target using
> the same test case in the PR.
>
> SRA completely scalarizes a small record. But when the record is later
> used as a whole, GCC has to rebuild the record from the scalar parts.
> When the record contains bit-fields, GCC generates ugly code to
> assemble the scalar parts into a record.
>
> Until aggregate copy propagation is implemented, I think it would be
> better to disable full scalarization for such records. The patch is
> attached. It has been bootstrapped on x86_64-linux-gnu and regression
> tested.
>
> Is it OK for now? We can remove it after aggregate copy propagation is
> implemented.
>
> Would it be better to add the bit-field check in
> type_consists_of_records_p instead of using a new function,
> "type_contains_bit_field_p"?

Without looking at the patch I have two comments.

First, I have heard talk about an aggregate copy propagation pass.
Instead of a new pass I would suggest rewriting aggregate variables
(partly) into SSA form, extending the DECL_GIMPLE_REG_P facility. Thus,
for each full definition

  aggr = ...;

you get

  aggr_2 = ...;

and partial reads of that SSA name should be fine, as should aggregate
uses. This works for non-aliased variables only, of course, and
requires some thinking about DECL_GIMPLE_REG_P, as that applies to the
underlying decl, which would prohibit rewriting

  a.a = x;
  a = b;

into

  a.a = x;
  a_2 = b;

This means we need to replace the full definitions with a new temporary
(with DECL_DEBUG_EXPR_FROM set appropriately), thus:

  a.a = x;
  tmp_2 = b; // was a

This rewriting should happen in update_address_taken.

Second comment: SRA should be the place to lower bit-field accesses to
a load of the underlying scalar plus a bit extraction via
BIT_FIELD_REF. Stores should be handled via a read-modify-write cycle,
borrowing BIT_FIELD_EXPR as done on the old mem-ref branch.

Richard.

> Regards,
> --
> Jie Zhang
> CodeSourcery
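To make the "ugly code" in the problem report concrete: for a small
record with bit-fields like the struct A in the test case later in this
thread (a1:10, a2:3, 19 bits of padding), rebuilding the record from
its fully scalarized parts amounts to shift-and-mask code along these
lines (an illustrative C sketch, not actual GCC output; it assumes a1
lives in the low bits of the underlying word):

/* Reassemble struct A { unsigned a1:10; unsigned a2:3; unsigned:19; }
   from the scalars SRA created for a1 and a2.  */
unsigned
reassemble (unsigned a1_scalar, unsigned a2_scalar)
{
  unsigned word = 0;
  word |= a1_scalar & 0x3ff;        /* a1 occupies bits 0..9 */
  word |= (a2_scalar & 0x7) << 10;  /* a2 occupies bits 10..12 */
  return word;
}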
Richard Guenther wrote:

>> Until aggregate copy propagation is implemented, I think it would be
>> better to disable full scalarization for such records. The patch is
>> attached. It has been bootstrapped on x86_64-linux-gnu and regression
>> tested.
>
> Without looking at the patch I have two comments.

From reading your comments, I'm not sure whether you're saying that you
don't like Jie's idea, or that you think it's an OK idea but there are
some other things that you would like to see done as well or instead.
It would be helpful if you could make that clear.

To me, Jie's change seems like a plausible heuristic within the current
infrastructure. I'm all for building new infrastructure when
possible/necessary, but I don't think we should prevent these kinds of
tweaks to heuristics just because we can think of another way of doing
things. To me, the question ought to be "does this make the compiler
better for users?"

To answer that question, I'd like to know what the impact is on some
benchmark that matters. Here, I don't think SPEC, EEMBC, etc. are the
right places to look; they probably don't have much of this kind of
code. Perhaps it would be interesting to know how many SRA
opportunities we lose in the Linux kernel because of this change -- and
then spot-check some of them to see whether they are cases where we
really lose by not doing SRA, or really win because we avoid the kind
of ugly code that inspired this change.

Thanks,
On Fri, Jul 30, 2010 at 10:11:42AM -0700, Mark Mitchell wrote:
> To me, Jie's change seems like a plausible heuristic within the
> current infrastructure. I'm all for building new infrastructure when
> possible/necessary, but I don't think we should prevent these kinds of
> tweaks to heuristics just because we can think of another way of doing
> things. To me, the question ought to be "does this make the compiler
> better for users?"

I wouldn't call it a tweak to heuristics; it looks to me like an ugly
hack. In many cases SRA of bit-fields is very desirable, especially for
apps that use bit-fields heavily, like the Linux kernel. I remember
Alex spending quite some time improving SRA to handle bit-fields more
efficiently, and this hack just disables it altogether.

What we IMHO need is a pass late in the gimple pipeline which will
optimize adjacent bit-field operations and lower them to BIT_FIELD_REF
ops or something similar, because bit-field ops are too hard to handle
efficiently after expansion. The combiner helps sometimes, but for many
bit-field ops the 3-insn limit is too limiting. Such a pass could help
with many more things than just what SRA creates for bit-fields in some
cases; consider, say:

struct A { unsigned short h; int i : 4, j : 4, k : 4, l : 4; } a, b, c;

void
foo (void)
{
  a.i |= 5; a.j = 3; a.k &= 2; a.l = 8;
  b.i = c.i; b.j = c.j; b.k = c.k; b.l = c.l;
}

which on, say, i?86/x86_64 can be optimized to roughly:

((unsigned short *)&a)[1] = (((unsigned short *)&a)[1] & 0x20a) | 0x8035;
((unsigned short *)&b)[1] = ((unsigned short *)&c)[1];

(of course in some alias-friendly way). See e.g.
PR37135/PR22121/PR42172.

	Jakub
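For reference, one alias-friendly way to express the combined form
Jakub sketches is to go through memcpy instead of the type-punning
casts. This is only a sketch, and it assumes the little-endian
i?86/x86_64 layout in which the four 4-bit fields occupy the two bytes
following h:

#include <string.h>

struct A { unsigned short h; int i : 4, j : 4, k : 4, l : 4; } a, b, c;

void
foo_combined (void)
{
  unsigned short w;

  /* a.i |= 5; a.j = 3; a.k &= 2; a.l = 8; done as one
     read-modify-write of the 16 bits holding the four fields.  */
  memcpy (&w, (char *) &a + 2, sizeof w);
  w = (w & 0x20a) | 0x8035;
  memcpy ((char *) &a + 2, &w, sizeof w);

  /* b.i = c.i; b.j = c.j; b.k = c.k; b.l = c.l; as one 16-bit copy.  */
  memcpy (&w, (char *) &c + 2, sizeof w);
  memcpy ((char *) &b + 2, &w, sizeof w);
}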
Jakub Jelinek wrote:
> On Fri, Jul 30, 2010 at 10:11:42AM -0700, Mark Mitchell wrote:
>> To me, Jie's change seems like a plausible heuristic within the
>> current infrastructure. I'm all for building new infrastructure when
>> possible/necessary, but I don't think we should prevent these kinds
>> of tweaks to heuristics just because we can think of another way of
>> doing things. To me, the question ought to be "does this make the
>> compiler better for users?"
>
> I wouldn't call it a tweak to heuristics; it looks to me like an ugly
> hack.

Well, call it what you like. It's making a guess about when SRA is or
isn't profitable. It might be a good guess or a bad guess, but it's a
guess, which is why I used the term "heuristic".

> In many cases SRA of bit-fields is very desirable, especially for apps
> that use bit-fields heavily, like the Linux kernel. I remember Alex
> spending quite some time improving SRA to handle bit-fields more
> efficiently, and this hack just disables it altogether.

That's precisely why I wanted some data about that. Maybe we already
know enough to conclude that this isn't a good heuristic, because we
already know that SRA on bit-fields is a big win in lots of cases.
That's fine; in that case, the patch is not a good thing.

> What we IMHO need is a pass late in the gimple pipeline which will
> optimize adjacent bit-field operations and lower them to BIT_FIELD_REF
> ops or something similar, because bit-field ops are too hard to handle
> efficiently after expansion.

That's an interesting idea. Does anyone else have an idea as to the
best plan here?
> > What we IMHO need is a pass late in the gimple pipeline which will
> > optimize adjacent bit-field operations and lower them to
> > BIT_FIELD_REF ops or something similar, because bit-field ops are
> > too hard to handle efficiently after expansion.
>
> That's an interesting idea. Does anyone else have an idea as to the
> best plan here?

Note that fold-const.c has done this in some cases for a very long
time.
On Jul 30, 2010, at 11:54 AM, kenner@vlsi1.ultra.nyu.edu (Richard Kenner) wrote:
>>> What we IMHO need is a pass late in the gimple pipeline which will
>>> optimize adjacent bit-field operations and lower them to
>>> BIT_FIELD_REF ops or something similar, because bit-field ops are
>>> too hard to handle efficiently after expansion.
>>
>> That's an interesting idea. Does anyone else have an idea as to the
>> best plan here?
>
> Note that fold-const.c has done this in some cases for a very long
> time.

Interesting, because we have exactly this pass here at Cavium. I can
post the patch set that adds it to 4.3.3 if anyone wants to look at it.
Andrew Pinski wrote:

> Interesting, because we have exactly this pass here at Cavium. I can
> post the patch set that adds it to 4.3.3 if anyone wants to look at
> it.

It certainly seems like it would be helpful, assuming Cavium will
assign copyright.
On Fri, Jul 30, 2010 at 8:54 PM, Richard Kenner
<kenner@vlsi1.ultra.nyu.edu> wrote:
>> > What we IMHO need is a pass late in the gimple pipeline which will
>> > optimize adjacent bit-field operations and lower them to
>> > BIT_FIELD_REF ops or something similar, because bit-field ops are
>> > too hard to handle efficiently after expansion.
>>
>> That's an interesting idea. Does anyone else have an idea as to the
>> best plan here?
>
> Note that fold-const.c has done this in some cases for a very long
> time.

Indeed - also in a very ugly way.

Richard.
On Fri, Jul 30, 2010 at 7:53 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Fri, Jul 30, 2010 at 10:11:42AM -0700, Mark Mitchell wrote:
>> To me, Jie's change seems like a plausible heuristic within the
>> current infrastructure. I'm all for building new infrastructure when
>> possible/necessary, but I don't think we should prevent these kinds
>> of tweaks to heuristics just because we can think of another way of
>> doing things. To me, the question ought to be "does this make the
>> compiler better for users?"
>
> I wouldn't call it a tweak to heuristics; it looks to me like an ugly
> hack. In many cases SRA of bit-fields is very desirable, especially
> for apps that use bit-fields heavily, like the Linux kernel. I
> remember Alex spending quite some time improving SRA to handle
> bit-fields more efficiently, and this hack just disables it
> altogether.
>
> What we IMHO need is a pass late in the gimple pipeline which will
> optimize adjacent bit-field operations and lower them to BIT_FIELD_REF
> ops or something similar, because bit-field ops are too hard to handle
> efficiently after expansion. The combiner helps sometimes, but for
> many bit-field ops the 3-insn limit is too limiting. Such a pass could
> help with many more things than just what SRA creates for bit-fields
> in some cases; consider, say:
>
> struct A { unsigned short h; int i : 4, j : 4, k : 4, l : 4; } a, b, c;
>
> void
> foo (void)
> {
>   a.i |= 5; a.j = 3; a.k &= 2; a.l = 8;
>   b.i = c.i; b.j = c.j; b.k = c.k; b.l = c.l;
> }
>
> which on, say, i?86/x86_64 can be optimized to roughly:
>
> ((unsigned short *)&a)[1] = (((unsigned short *)&a)[1] & 0x20a) | 0x8035;
> ((unsigned short *)&b)[1] = ((unsigned short *)&c)[1];
>
> (of course in some alias-friendly way). See e.g.
> PR37135/PR22121/PR42172.

Note that on the old mem-ref branch I lowered bit-field references very
early, but people complained that we eventually lose the target's
ability to directly perform bit-field stores.

Basically I lowered each bit-field access (by "bit-field access" I mean
those using a COMPONENT_REF with a FIELD_DECL that has DECL_BIT_FIELD
set) to either a load of the underlying type plus a BIT_FIELD_REF on
the loaded scalar, or a load of the underlying type plus a
BIT_FIELD_EXPR (a new code which inserts bits into a scalar) plus a
store of the underlying type. CSE and DSE handled the redundant loads
and stores very nicely, and BIT_FIELD_REFs and BIT_FIELD_EXPRs were
easily combined.

Now, instead of lowering very early, or very late as Jakub suggests, we
can drive (and perform) that lowering in SRA, for example by simply
walking through its list of accesses and combining them.

Richard.

> 	Jakub
>
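Modeled in plain C, the lowering Richard describes looks roughly like
the helpers below. This is only a sketch: BIT_FIELD_EXPR existed only
on the old mem-ref branch, so bit_field_insert here merely stands in
for it, and both helpers assume pos + width <= 32 and width < 32.

/* Models "load the underlying word, then BIT_FIELD_REF" for a read of
   a WIDTH-bit field at bit position POS.  */
static unsigned
bit_field_extract (unsigned word, unsigned pos, unsigned width)
{
  return (word >> pos) & ((1u << width) - 1);
}

/* Models BIT_FIELD_EXPR: insert VALUE into WORD at POS.  A bit-field
   store then becomes load word, bit_field_insert, store word -- the
   read-modify-write cycle described above.  */
static unsigned
bit_field_insert (unsigned word, unsigned value, unsigned pos,
                  unsigned width)
{
  unsigned mask = ((1u << width) - 1) << pos;
  return (word & ~mask) | ((value << pos) & mask);
}

Adjacent field writes then turn into chains of bit_field_insert calls
on the same loaded word, which CSE and DSE can collapse into a single
read-modify-write, as in Jakub's example.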
On Fri, Jul 30, 2010 at 6:09 PM, Jie Zhang <jie@codesourcery.com> wrote:
> PR tree-optimization/45144 shows an issue that SRA causes. I used the
> arm-none-eabi target as an example in PR tree-optimization/45144, but
> the same issue can also be seen on the x86_64-linux-gnu target using
> the same test case in the PR.
>
> SRA completely scalarizes a small record. But when the record is later
> used as a whole, GCC has to rebuild the record from the scalar parts.
> When the record contains bit-fields, GCC generates ugly code to
> assemble the scalar parts into a record.
>
> Until aggregate copy propagation is implemented, I think it would be
> better to disable full scalarization for such records. The patch is
> attached. It has been bootstrapped on x86_64-linux-gnu and regression
> tested.
>
> Is it OK for now? We can remove it after aggregate copy propagation is
> implemented.
>
> Would it be better to add the bit-field check in
> type_consists_of_records_p instead of using a new function,
> "type_contains_bit_field_p"?

The patch looks like a hack. Can you instead make SRA treat the
underlying type of a bit-field as the object for scalarization? I'm not
100% familiar with the internals, but IIRC SRA builds an access tree,
so for each bit-field load/store the analysis phase should record an
access of the underlying field covering all its bits and a sub-access
for the respective member.

Maybe Martin can weigh in here.

Richard.

> Regards,
> --
> Jie Zhang
> CodeSourcery
>
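For the struct A in the test case, the access layout Richard suggests
might look like the sketch below. This is hypothetical -- loosely
modeled on struct access in tree-sra.c, with illustrative field names
and types, not the real data structure:

/* Offsets and sizes are in bits, as in SRA's access records.  */
struct access_sketch
{
  unsigned long offset;               /* bit offset from start of base */
  unsigned long size;                 /* size in bits */
  struct access_sketch *first_child;  /* sub-accesses */
  struct access_sketch *next_sibling;
};

/* For struct A { unsigned a1:10; unsigned a2:3; unsigned:19; }, the
   underlying word would be recorded as { 0, 32 } with children
   { 0, 10 } for a1 and { 10, 3 } for a2: the word becomes the
   scalarized object, and bit-field reads/writes become extractions
   from and insertions into that one scalar.  */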
On 07/31/2010 01:11 AM, Mark Mitchell wrote:
> Richard Guenther wrote:
>
>>> Until aggregate copy propagation is implemented, I think it would be
>>> better to disable full scalarization for such records. The patch is
>>> attached. It has been bootstrapped on x86_64-linux-gnu and
>>> regression tested.
>
>> Without looking at the patch I have two comments.
>
> From reading your comments, I'm not sure whether you're saying that
> you don't like Jie's idea, or that you think it's an OK idea but there
> are some other things that you would like to see done as well or
> instead. It would be helpful if you could make that clear.
>
> To me, Jie's change seems like a plausible heuristic within the
> current infrastructure. I'm all for building new infrastructure when
> possible/necessary, but I don't think we should prevent these kinds of
> tweaks to heuristics just because we can think of another way of doing
> things. To me, the question ought to be "does this make the compiler
> better for users?"
>
> To answer that question, I'd like to know what the impact is on some
> benchmark that matters. Here, I don't think SPEC, EEMBC, etc. are the
> right places to look; they probably don't have much of this kind of
> code. Perhaps it would be interesting to know how many SRA
> opportunities we lose in the Linux kernel because of this change --
> and then spot-check some of them to see whether they are cases where
> we really lose by not doing SRA, or really win because we avoid the
> kind of ugly code that inspired this change.
>
My patch causes no changes on EEMBC for the arm-none-eabi target.

My patch prevents several full scalarizations of records with
bit-fields when compiling the Linux kernel for x86_64, but none of them
causes any difference in the final assemblies. I used 2.6.34.1 and the
default config for x86_64, and checked both -O2 and -Os.

I also checked the effect of my patch on GCC itself. It prevents one
full scalarization of a record with bit-fields when compiling the files
under gcc/ with -O2, but there is no difference in the final
assemblies.

Regards,
On 07/31/2010 01:53 AM, Jakub Jelinek wrote:
> On Fri, Jul 30, 2010 at 10:11:42AM -0700, Mark Mitchell wrote:
>> To me, Jie's change seems like a plausible heuristic within the
>> current infrastructure. I'm all for building new infrastructure when
>> possible/necessary, but I don't think we should prevent these kinds
>> of tweaks to heuristics just because we can think of another way of
>> doing things. To me, the question ought to be "does this make the
>> compiler better for users?"
>
> I wouldn't call it a tweak to heuristics; it looks to me like an ugly
> hack. In many cases SRA of bit-fields is very desirable, especially
> for apps that use bit-fields heavily, like the Linux kernel. I
> remember Alex spending quite some time improving SRA to handle
> bit-fields more efficiently, and this hack just disables it
> altogether.
>
My patch does not disable SRA of bit-fields. It adjusts the tweak made
for PR 42585: it only disables total scalarization of small records
that contain bit-fields. As you said, it's difficult for the RTL
combiner to combine the scalarized bit-fields again, so we had better
avoid totally scalarizing them. If the bit-fields are accessed
explicitly in the IL, SRA still works in the usual way.

As in my reply to Mark's email, my patch does not change the final
assembly of the Linux kernel, at least for 2.6.34.1 with the default
configuration for x86_64.

Regards,
On 07/31/2010 02:59 AM, Mark Mitchell wrote:
> Andrew Pinski wrote:
>
>> Interesting, because we have exactly this pass here at Cavium. I can
>> post the patch set that adds it to 4.3.3 if anyone wants to look at
>> it.
>
> It certainly seems like it would be helpful, assuming Cavium will
> assign copyright.
>
This is very interesting! If Cavium will assign copyright and the patch
set is posted, I can help to bring it to trunk.

Regards,
On 07/31/2010 05:48 PM, Richard Guenther wrote:
> On Fri, Jul 30, 2010 at 6:09 PM, Jie Zhang<jie@codesourcery.com> wrote:
>> PR tree-optimization/45144 shows an issue that SRA causes. I used the
>> arm-none-eabi target as an example in PR tree-optimization/45144, but
>> the same issue can also be seen on the x86_64-linux-gnu target using
>> the same test case in the PR.
>>
>> SRA completely scalarizes a small record. But when the record is
>> later used as a whole, GCC has to rebuild the record from the scalar
>> parts. When the record contains bit-fields, GCC generates ugly code
>> to assemble the scalar parts into a record.
>>
>> Until aggregate copy propagation is implemented, I think it would be
>> better to disable full scalarization for such records. The patch is
>> attached. It has been bootstrapped on x86_64-linux-gnu and regression
>> tested.
>>
>> Is it OK for now? We can remove it after aggregate copy propagation
>> is implemented.
>>
>> Would it be better to add the bit-field check in
>> type_consists_of_records_p instead of using a new function,
>> "type_contains_bit_field_p"?
>
> The patch looks like a hack. Can you instead make SRA treat the
> underlying type of a bit-field as the object for scalarization? I'm
> not 100% familiar with the internals, but IIRC SRA builds an access
> tree, so for each bit-field load/store the analysis phase should
> record an access of the underlying field covering all its bits and a
> sub-access for the respective member.
>
Yes, it's a hack, but it's a hack on top of the hack for PR 42585. ;-)
The fix for PR 42585 scalarizes small records unconditionally, in the
hope that later passes can fix things up if the scalars end up unused.
But for bit-fields, the RTL combiner fails to do so. :-(

> Maybe Martin can weigh in here.
>
Martin?

Regards,
Hi,

On Sat, Jul 31, 2010 at 12:09:42AM +0800, Jie Zhang wrote:
> PR tree-optimization/45144 shows an issue that SRA causes. I used the
> arm-none-eabi target as an example in PR tree-optimization/45144, but
> the same issue can also be seen on the x86_64-linux-gnu target using
> the same test case in the PR.
>
> SRA completely scalarizes a small record. But when the record is later
> used as a whole, GCC has to rebuild the record from the scalar parts.
> When the record contains bit-fields, GCC generates ugly code to
> assemble the scalar parts into a record.
>
> Until aggregate copy propagation is implemented, I think it would be
> better to disable full scalarization for such records. The patch is
> attached. It has been bootstrapped on x86_64-linux-gnu and regression
> tested.
>
> Is it OK for now? We can remove it after aggregate copy propagation is
> implemented.
>
> Would it be better to add the bit-field check in
> type_consists_of_records_p instead of using a new function,
> "type_contains_bit_field_p"?
>
When I was implementing the total scalarization bit of SRA I thought of
disabling it for structures with bit-fields too. I did not really
examine the effects in any way, but I never expected it to result in
nice code at the places where we use SRA to do poor-man's copy
propagation.

However, I eventually decided to keep total scalarization for these
structures because doing so can save stack space, and it would be a
shame if adding one such field to a structure made us use that space
again (in fact, total scalarization was introduced as a fix for
unnecessary stack-frame setup bugs like PR 42585). But given your
results with the kernel and GCC, I don't object to disabling it...
people will scream if something slows down for them.

On the other hand, if we decide to go this way, we need to do the check
at a different place; walking the whole type every time we look at an
assignment is unnecessary and wasteful. The check would be most
appropriate as part of type_consists_of_records_p, where it would be
performed only once for each variable in question.

Thanks,

Martin

> Regards,
> --
> Jie Zhang
> CodeSourcery
>
> 	PR tree-optimization/45144
> 	* tree-sra.c (type_contains_bit_field_p): New.
> 	(build_accesses_from_assign): Don't completely scalarize
> 	a record if it contains a bit-field.
>
> 	testsuite/
> 	PR tree-optimization/45144
> 	* gcc.dg/tree-ssa/pr45144.c: New test.
>
> Index: testsuite/gcc.dg/tree-ssa/pr45144.c
> ===================================================================
> --- testsuite/gcc.dg/tree-ssa/pr45144.c (revision 0)
> +++ testsuite/gcc.dg/tree-ssa/pr45144.c (revision 0)
> @@ -0,0 +1,46 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +
> +void baz (unsigned);
> +
> +extern unsigned buf[];
> +
> +struct A
> +{
> +  unsigned a1:10;
> +  unsigned a2:3;
> +  unsigned:19;
> +};
> +
> +union TMP
> +{
> +  struct A a;
> +  unsigned int b;
> +};
> +
> +static unsigned
> +foo (struct A *p)
> +{
> +  union TMP t;
> +  struct A x;
> +
> +  x = *p;
> +  t.a = x;
> +  return t.b;
> +}
> +
> +void
> +bar (unsigned orig, unsigned *new)
> +{
> +  struct A a;
> +  union TMP s;
> +
> +  s.b = orig;
> +  a = s.a;
> +  if (a.a1)
> +    baz (a.a2);
> +  *new = foo (&a);
> +}
> +
> +/* { dg-final { scan-tree-dump "x = a;" "optimized"} } */
> +/* { dg-final { cleanup-tree-dump "optimized" } } */
> Index: tree-sra.c
> ===================================================================
> --- tree-sra.c (revision 162704)
> +++ tree-sra.c (working copy)
> @@ -998,6 +998,30 @@ disqualify_ops_if_throwing_stmt (gimple
>    return false;
>  }
>
> +static bool
> +type_contains_bit_field_p (const_tree type)
> +{
> +  tree fld;
> +
> +  if (TREE_CODE (type) != RECORD_TYPE)
> +    return false;
> +
> +  for (fld = TYPE_FIELDS (type); fld; fld = DECL_CHAIN (fld))
> +    if (TREE_CODE (fld) == FIELD_DECL)
> +      {
> +        tree ft = TREE_TYPE (fld);
> +
> +        if (DECL_BIT_FIELD (fld))
> +          return true;
> +
> +        if (TREE_CODE (ft) == RECORD_TYPE
> +            && type_contains_bit_field_p (ft))
> +          return true;
> +      }
> +
> +  return false;
> +}
> +
>  /* Scan expressions occuring in STMT, create access structures for all accesses
>     to candidates for scalarization and remove those candidates which occur in
>     statements or expressions that prevent them from being split apart.  Return
> @@ -1025,7 +1049,8 @@ build_accesses_from_assign (gimple stmt)
>      {
>        racc->grp_assignment_read = 1;
>        if (should_scalarize_away_bitmap && !gimple_has_volatile_ops (stmt)
> -          && !is_gimple_reg_type (racc->type))
> +          && !is_gimple_reg_type (racc->type)
> +          && !type_contains_bit_field_p (racc->type))
>          bitmap_set_bit (should_scalarize_away_bitmap, DECL_UID (racc->base));
>      }
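Concretely, Martin's suggestion would fold the bit-field test into the
type walk that type_consists_of_records_p already performs once per
candidate variable, instead of a separate walk at every assignment. A
sketch, based on the helper from the posted patch; the surrounding
function body is paraphrased from tree-sra.c, not quoted verbatim:

/* Return true iff TYPE is a RECORD_TYPE consisting solely of records
   and scalars, with no bit-fields anywhere, so that it is safe and
   cheap to totally scalarize variables of this type.  */
static bool
type_consists_of_records_p (tree type)
{
  tree fld;

  if (TREE_CODE (type) != RECORD_TYPE)
    return false;

  for (fld = TYPE_FIELDS (type); fld; fld = DECL_CHAIN (fld))
    if (TREE_CODE (fld) == FIELD_DECL)
      {
        tree ft = TREE_TYPE (fld);

        if (DECL_BIT_FIELD (fld))  /* New: reject bit-fields here.  */
          return false;

        if (!is_gimple_reg_type (ft)
            && !type_consists_of_records_p (ft))
          return false;
      }

  return true;
}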
Hi,

On Sat, Jul 31, 2010 at 11:44:24AM +0200, Richard Guenther wrote:
> On Fri, Jul 30, 2010 at 7:53 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> > On Fri, Jul 30, 2010 at 10:11:42AM -0700, Mark Mitchell wrote:
> >> To me, Jie's change seems like a plausible heuristic within the
> >> current infrastructure. I'm all for building new infrastructure
> >> when possible/necessary, but I don't think we should prevent these
> >> kinds of tweaks to heuristics just because we can think of another
> >> way of doing things. To me, the question ought to be "does this
> >> make the compiler better for users?"
> >
> > I wouldn't call it a tweak to heuristics; it looks to me like an
> > ugly hack. In many cases SRA of bit-fields is very desirable,
> > especially for apps that use bit-fields heavily, like the Linux
> > kernel. I remember Alex spending quite some time improving SRA to
> > handle bit-fields more efficiently, and this hack just disables it
> > altogether.
> >
> > What we IMHO need is a pass late in the gimple pipeline which will
> > optimize adjacent bit-field operations and lower them to
> > BIT_FIELD_REF ops or something similar, because bit-field ops are
> > too hard to handle efficiently after expansion. The combiner helps
> > sometimes, but for many bit-field ops the 3-insn limit is too
> > limiting. Such a pass could help with many more things than just
> > what SRA creates for bit-fields in some cases; consider, say:
> >
> > struct A { unsigned short h; int i : 4, j : 4, k : 4, l : 4; } a, b, c;
> >
> > void
> > foo (void)
> > {
> >   a.i |= 5; a.j = 3; a.k &= 2; a.l = 8;
> >   b.i = c.i; b.j = c.j; b.k = c.k; b.l = c.l;
> > }
> >
> > which on, say, i?86/x86_64 can be optimized to roughly:
> >
> > ((unsigned short *)&a)[1] = (((unsigned short *)&a)[1] & 0x20a) | 0x8035;
> > ((unsigned short *)&b)[1] = ((unsigned short *)&c)[1];
> >
> > (of course in some alias-friendly way). See e.g.
> > PR37135/PR22121/PR42172.
>
> Note that on the old mem-ref branch I lowered bit-field references
> very early, but people complained that we eventually lose the target's
> ability to directly perform bit-field stores.
>
> Basically I lowered each bit-field access (by "bit-field access" I
> mean those using a COMPONENT_REF with a FIELD_DECL that has
> DECL_BIT_FIELD set) to either a load of the underlying type plus a
> BIT_FIELD_REF on the loaded scalar, or a load of the underlying type
> plus a BIT_FIELD_EXPR (a new code which inserts bits into a scalar)
> plus a store of the underlying type. CSE and DSE handled the redundant
> loads and stores very nicely, and BIT_FIELD_REFs and BIT_FIELD_EXPRs
> were easily combined.
>
> Now, instead of lowering very early, or very late as Jakub suggests,
> we can drive (and perform) that lowering in SRA, for example by simply
> walking through its list of accesses and combining them.
>
With something like BIT_FIELD_EXPR... yes, this is probably the way to
go, also for other reasons, like replacing the implementation of
build_ref_for_offset with the building of a MEM_REF. But as you well
know, this is not particularly easy. I hope I'll get there eventually
:-)

Martin
Jie Zhang wrote:

> My patch prevents several full scalarizations of records with
> bit-fields when compiling the Linux kernel for x86_64, but none of
> them causes any difference in the final assemblies. I used 2.6.34.1
> and the default config for x86_64, and checked both -O2 and -Os.

That seems at odds with the statement made earlier in this thread that
this optimization is essential for Linux kernel performance.

If Jie's measurements are accurate then, whether or not this is a
"hack", it seems like a win. I don't see anything wrong with accepting
a small, local improvement that has no user-observable negative impact;
we can always rip it out and replace it with something better when
something better exists.
On Mon, Aug 2, 2010 at 6:52 PM, Mark Mitchell <mark@codesourcery.com> wrote:
> Jie Zhang wrote:
>
>> My patch prevents several full scalarizations of records with
>> bit-fields when compiling the Linux kernel for x86_64, but none of
>> them causes any difference in the final assemblies. I used 2.6.34.1
>> and the default config for x86_64, and checked both -O2 and -Os.
>
> That seems at odds with the statement made earlier in this thread that
> this optimization is essential for Linux kernel performance.
>
> If Jie's measurements are accurate then, whether or not this is a
> "hack", it seems like a win. I don't see anything wrong with accepting
> a small, local improvement that has no user-observable negative
> impact; we can always rip it out and replace it with something better
> when something better exists.

OTOH, the lack of any change in code generation doesn't speak in favor
of the patch either. Why didn't it improve anything? Or was that
expected?

Can you adjust the patch according to Martin's suggestion?

Richard.
On 08/03/2010 05:00 PM, Richard Guenther wrote:
> On Mon, Aug 2, 2010 at 6:52 PM, Mark Mitchell<mark@codesourcery.com> wrote:
>> Jie Zhang wrote:
>>
>>> My patch prevents several full scalarizations of records with
>>> bit-fields when compiling the Linux kernel for x86_64, but none of
>>> them causes any difference in the final assemblies. I used 2.6.34.1
>>> and the default config for x86_64, and checked both -O2 and -Os.
>>
>> That seems at odds with the statement made earlier in this thread
>> that this optimization is essential for Linux kernel performance.
>>
>> If Jie's measurements are accurate then, whether or not this is a
>> "hack", it seems like a win. I don't see anything wrong with
>> accepting a small, local improvement that has no user-observable
>> negative impact; we can always rip it out and replace it with
>> something better when something better exists.
>
> OTOH, the lack of any change in code generation doesn't speak in favor
> of the patch either. Why didn't it improve anything? Or was that
> expected?
>
Only two total scalarizations are prevented by my patch when building
the Linux kernel with the default x86_64 config, which compiles 1646 C
files. One is "struct IO_APIC_route_entry e" in
io_apic.c:ioapic_write_entry (); the other is "struct wb_writeback_args
args" in fs-writeback.c:bdi_sync_writeback (). In both cases the
structs are used only as a whole, and GCC can already optimize away the
parts generated by total scalarization, so there is no difference when
it is disabled.

> Can you adjust the patch according to Martin's suggestion?
>
OK.

Thanks,
	PR tree-optimization/45144
	* tree-sra.c (type_contains_bit_field_p): New.
	(build_accesses_from_assign): Don't completely scalarize
	a record if it contains a bit-field.

	testsuite/
	PR tree-optimization/45144
	* gcc.dg/tree-ssa/pr45144.c: New test.

Index: testsuite/gcc.dg/tree-ssa/pr45144.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/pr45144.c (revision 0)
+++ testsuite/gcc.dg/tree-ssa/pr45144.c (revision 0)
@@ -0,0 +1,46 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+void baz (unsigned);
+
+extern unsigned buf[];
+
+struct A
+{
+  unsigned a1:10;
+  unsigned a2:3;
+  unsigned:19;
+};
+
+union TMP
+{
+  struct A a;
+  unsigned int b;
+};
+
+static unsigned
+foo (struct A *p)
+{
+  union TMP t;
+  struct A x;
+
+  x = *p;
+  t.a = x;
+  return t.b;
+}
+
+void
+bar (unsigned orig, unsigned *new)
+{
+  struct A a;
+  union TMP s;
+
+  s.b = orig;
+  a = s.a;
+  if (a.a1)
+    baz (a.a2);
+  *new = foo (&a);
+}
+
+/* { dg-final { scan-tree-dump "x = a;" "optimized"} } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
Index: tree-sra.c
===================================================================
--- tree-sra.c (revision 162704)
+++ tree-sra.c (working copy)
@@ -998,6 +998,30 @@ disqualify_ops_if_throwing_stmt (gimple
   return false;
 }

+static bool
+type_contains_bit_field_p (const_tree type)
+{
+  tree fld;
+
+  if (TREE_CODE (type) != RECORD_TYPE)
+    return false;
+
+  for (fld = TYPE_FIELDS (type); fld; fld = DECL_CHAIN (fld))
+    if (TREE_CODE (fld) == FIELD_DECL)
+      {
+        tree ft = TREE_TYPE (fld);
+
+        if (DECL_BIT_FIELD (fld))
+          return true;
+
+        if (TREE_CODE (ft) == RECORD_TYPE
+            && type_contains_bit_field_p (ft))
+          return true;
+      }
+
+  return false;
+}
+
 /* Scan expressions occuring in STMT, create access structures for all accesses
    to candidates for scalarization and remove those candidates which occur in
    statements or expressions that prevent them from being split apart.  Return
@@ -1025,7 +1049,8 @@ build_accesses_from_assign (gimple stmt)
     {
       racc->grp_assignment_read = 1;
       if (should_scalarize_away_bitmap && !gimple_has_volatile_ops (stmt)
-          && !is_gimple_reg_type (racc->type))
+          && !is_gimple_reg_type (racc->type)
+          && !type_contains_bit_field_p (racc->type))
         bitmap_set_bit (should_scalarize_away_bitmap, DECL_UID (racc->base));
     }