Message ID | 4C5A0F50.4000904@codesourcery.com |
---|---|
State | New |
Headers | show |
On Thu, 2010-08-05 at 03:09 +0200, Bernd Schmidt wrote: > With -mtune=xscale, arm_gen_load_multiple_1 doesn't actually generate a > load-multiple for two loads, but single instructions. This causes a > peephole2 loop where we try to optimize the same instructions over and > over again. > > Fixed by teaching multiple_operation_profitable_p about this situation. Ok? > > > Bernd + if (nops <= 2 && arm_tune_xscale && !optimize_size) It seems wrong to me to have an explicit test for XScale here. Other cores could surely exist that result in a similar loop. Ie the test should be on the cause, not on 'it's an xscale' R
On Thu, 2010-08-05 at 08:46 +0100, Richard Earnshaw wrote: > On Thu, 2010-08-05 at 03:09 +0200, Bernd Schmidt wrote: > > With -mtune=xscale, arm_gen_load_multiple_1 doesn't actually generate a > > load-multiple for two loads, but single instructions. This causes a > > peephole2 loop where we try to optimize the same instructions over and > > over again. > > > > Fixed by teaching multiple_operation_profitable_p about this situation. Ok? > > > > > > Bernd > > + if (nops <= 2 && arm_tune_xscale && !optimize_size) > > It seems wrong to me to have an explicit test for XScale here. Other > cores could surely exist that result in a similar loop. Ie the test > should be on the cause, not on 'it's an xscale' I think the equivalent condition in arm_gen_load_multiple (and arm_gen_store_multiple, fwiw) is also written in terms of arm_tune_xcale. So, to that extent, "it's an xscale" is actually the cause here, at least until and unless another CPU appears which has the same pipeline behaviour. It does seem a little bit fragile to require the conditions in the two places to match in order to avoid loops, though. Maybe there should be a comment at the appropriate place in arm_gen_xx_multiple to say that it needs to stay in sync with the code in multiple_operation_profitable_p, or maybe those two functions could be reworked to actually use multiple_operation_profitable_p() rather than duplicating its logic. p.
Index: config/arm/arm.c =================================================================== --- config/arm/arm.c (revision 162821) +++ config/arm/arm.c (working copy) @@ -9185,7 +9192,11 @@ multiple_operation_profitable_p (bool is changes, then the test below needs to be reworked. */ if (nops == 2 && arm_ld_sched && add_offset != 0) return false; - + /* Return false if arm_gen_load_multiple_1 or + arm_gen_store_multiple_1 would just emit single operations. See + the discussion there. */ + if (nops <= 2 && arm_tune_xscale && !optimize_size) + return false; return true; }